Ignore everything before and after the regex pattern strings


31 Replies - 2580 Views - Last Post: 12 September 2016 - 01:12 AM

#1 dr4

Posted 08 September 2016 - 01:49 PM

Hi people!

I have a regex pattern to find everything between two strings. It works great in situations like this:

string myText = "sometext change:THIS TEXT:change sometext";

string mystring1 = "change:";
string mystring2 = ":change";

MatchCollection temporary = new Regex(mystring1 + "(.*?)" + mystring2, RegexOptions.Singleline).Matches(myText);


But in this situation the code does not work:

string myText = "sometextchange:THIS TEXT:changesometext";

string mystring1 = "change:";
string mystring2 = ":change";


So I need to add something like:

temporary = new Regex("it doesn't care what is here" + mystring1 + "(.*?)" + mystring2 + "it doesn't care what is here", RegexOptions.Singleline).Matches(myText);



so that the regex ignores everything before and after the two given strings.

I'm reading the regex documentation and have tried ".", \w, \A and \Z, but none of them worked.



Edit:

Aaaand the answer is in my own code =_= I feel so stupid when this happens...

MatchCollection theanswer = new Regex("(.*?)" + mystring1 + "(.*?)" + mystring2 + "(.*?)", RegexOptions.Singleline).Matches(myText);

This post has been edited by dr4: 08 September 2016 - 01:50 PM



#2 andrewsw

Posted 08 September 2016 - 02:14 PM

Your original version does work. You don't need to capture all the surrounding content; the text you are looking for is simply in Group 1:

        string myText = "sometextchange:THIS TEXT:changesometext";

        string mystring1 = "change:";
        string mystring2 = ":change";

        MatchCollection temporary = new Regex(mystring1 + "(.*?)" + mystring2, RegexOptions.Singleline).Matches(myText);

        foreach (Match swan in temporary) {
            Console.WriteLine("{0}", swan.Groups[1].Value);     // THIS TEXT
        }

You should also become familiar with string.Format (and other string methods) rather than concatenating several values, which is harder to read and makes mistakes more likely.

(If you aren't used to seeing string.Format expressions then you might argue that they are harder to read and interpret, but you'll soon become very familiar with them.)

#3 andrewsw

Posted 08 September 2016 - 02:20 PM

I also encourage you to split statements like new Regex(..).Matches(..) into separate statements. You aren't gaining much (a single line does not count); you lose some clarity and have code that is harder to debug.
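To illustrate the point about separate statements, here is a minimal sketch. The class and method names (SplitStatementsDemo, Between) are made up for this example, and the Regex.Escape calls are an addition not present in the original code; they only matter if the delimiters ever contain regex metacharacters.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SplitStatementsDemo
{
    // Hypothetical helper: returns the text between each start/end delimiter pair.
    public static List<string> Between(string text, string start, string end)
    {
        // Step 1: build the pattern. Regex.Escape (an addition here) makes the
        // delimiters match literally even if they contain characters like '.' or '('.
        string pattern = Regex.Escape(start) + "(.*?)" + Regex.Escape(end);

        // Step 2: construct the regex object.
        Regex regex = new Regex(pattern, RegexOptions.Singleline);

        // Step 3: run it. Each intermediate value can be inspected in a debugger.
        MatchCollection matches = regex.Matches(text);

        List<string> results = new List<string>();
        foreach (Match match in matches)
        {
            results.Add(match.Groups[1].Value);
        }
        return results;
    }

    public static void Main()
    {
        string myText = "sometextchange:THIS TEXT:changesometext";
        foreach (string found in Between(myText, "change:", ":change"))
        {
            Console.WriteLine(found); // THIS TEXT
        }
    }
}
```

With the steps separated, a breakpoint on the second or third statement shows the exact pattern string that was built, which is usually where this kind of bug hides.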

Of course, there are times when it is convenient to chain a couple of methods together, but this doesn't strike me (continuing the matches theme) as one of them.

#4 andrewsw

Posted 08 September 2016 - 02:26 PM

string.Format("{0}(.*?){1}", mystring1, mystring2)

This isn't as complex as regex itself!

#5 dr4

Posted 09 September 2016 - 10:17 AM

Thank you very much for the info. You are totally right, I never thought about that.

This post has been edited by dr4: 09 September 2016 - 10:18 AM


#6 dr4

Posted 09 September 2016 - 02:24 PM

Group 1 doesn't work like that, though. It just determines whether the two strings are included or not; it doesn't have any role in the matching process.

The output in group 0 would be:
THIS TEXT
while in group 1 it would be:
:change:THIS TEXT:change



The code that I posted works fine, but the problem now is that it takes ages to find all the matches. With "(.*?)" before and after the match, it scans the whole text multiple times, so if there are 40 matches the code reads the whole text 40 times. Even the computer fan starts spinning faster. Any idea how to make it simpler?

Something that acts like:

theanswer = new Regex("(.*?)" + mystring1 + "(.*?)" + mystring2 + "(.*?)", RegexOptions.Singleline).Matches(myText);



but ignores only the character immediately before mystring1 and the one immediately after mystring2, instead of ignoring all the preceding text (because it is actually reading all the text to know what to ignore, which doesn't make sense).

This post has been edited by dr4: 09 September 2016 - 02:24 PM


#7 andrewsw

Posted 10 September 2016 - 02:44 AM

Group 0 is the entire pattern match; Group 1 is the first capture group.
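A small sketch of the difference, using the sample text from earlier in the thread. The class name GroupsDemo and the helper FirstMatch are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupsDemo
{
    // Hypothetical helper: returns the first match of the thread's pattern.
    public static Match FirstMatch(string text)
    {
        return Regex.Match(text, "change:(.*?):change", RegexOptions.Singleline);
    }

    public static void Main()
    {
        Match match = FirstMatch("sometextchange:THIS TEXT:changesometext");

        // Group 0: everything the whole pattern matched, delimiters included.
        Console.WriteLine(match.Groups[0].Value); // change:THIS TEXT:change

        // Group 1: only what the first pair of parentheses captured.
        Console.WriteLine(match.Groups[1].Value); // THIS TEXT
    }
}
```

Because a regex match is not anchored unless the pattern says so, the text before "change:" and after ":change" is skipped without any extra "(.*?)" around the delimiters.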

The code I posted works. You could compile the regex and iterate the matches, but generally universal patterns like .* should be avoided when you have a more specific pattern you can use.
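As a rough sketch of both suggestions: the regex below is built once with RegexOptions.Compiled, and the capture uses a negated character class instead of .*?. This assumes the wanted text never contains a ':' itself, which is an assumption about the data and not something stated in the thread. The class and member names are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SpecificPatternDemo
{
    // Built once and reused. [^:]* assumes the captured text contains no colon.
    private static readonly Regex Pattern =
        new Regex("change:([^:]*):change", RegexOptions.Compiled);

    // Hypothetical helper: collects the captured text of every match.
    public static List<string> FindAll(string text)
    {
        List<string> results = new List<string>();
        foreach (Match match in Pattern.Matches(text))
        {
            results.Add(match.Groups[1].Value);
        }
        return results;
    }

    public static void Main()
    {
        foreach (string found in FindAll("a change:ONE:change b change:TWO:change"))
        {
            Console.WriteLine(found); // ONE, then TWO
        }
    }
}
```

Whether compiling actually pays off depends on how often the pattern is reused, so it is worth measuring on the real input.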

If you are parsing a large file then I guess you could do so line by line, if appropriate.

I don't fully understand your post, though.

#8 andrewsw

Posted 10 September 2016 - 02:57 AM

Best Practices for Regular Expressions in the .NET Framework

#9 aidenkael

Posted 10 September 2016 - 07:47 AM

andrewsw, on 08 September 2016 - 04:26 PM, said:

string.Format("{0}(.*?){1}", mystring1, mystring2)

This isn't as complex as regex itself!


Just want to add that string interpolation is also a solid idea if you are using C# 6.0:

string interpolated = $"{mystring1}(.*?){mystring2}";



Either way is fine, just thought I would add interpolation here :)

#10 Skydiver

Posted 10 September 2016 - 11:13 AM

dr4: Can you explain in plain words what your application is trying to do and why? Pretend you are making a 30 second elevator ride sales pitch to an investor you run across by accident. Tell us what the value is in using your program.

My gut feeling is that this problem can be solved more efficiently by simply using string.IndexOf(). Consider the overhead of building a state machine for the regular expression versus the simplicity of the loops normally used by well-known string matching algorithms.
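One possible shape for the IndexOf approach, as a sketch only. The class and method names are invented, and it assumes ordinal (exact, case-sensitive) comparison and non-empty delimiters.

```csharp
using System;
using System.Collections.Generic;

public static class IndexOfDemo
{
    // Hypothetical helper: returns the text between each start/end delimiter pair
    // using plain string searching, with no regular expression involved.
    public static List<string> Between(string text, string start, string end)
    {
        if (string.IsNullOrEmpty(start) || string.IsNullOrEmpty(end))
        {
            throw new ArgumentException("Delimiters must be non-empty.");
        }

        List<string> results = new List<string>();
        int position = 0;
        while (true)
        {
            // Find the next opening delimiter at or after the current position.
            int startIndex = text.IndexOf(start, position, StringComparison.Ordinal);
            if (startIndex < 0) break;

            // Find the first closing delimiter after the opening one.
            int contentStart = startIndex + start.Length;
            int endIndex = text.IndexOf(end, contentStart, StringComparison.Ordinal);
            if (endIndex < 0) break;

            results.Add(text.Substring(contentStart, endIndex - contentStart));

            // Continue searching after the closing delimiter.
            position = endIndex + end.Length;
        }
        return results;
    }

    public static void Main()
    {
        string myText = "sometextchange:THIS TEXT:changesometext";
        foreach (string found in Between(myText, "change:", ":change"))
        {
            Console.WriteLine(found); // THIS TEXT
        }
    }
}
```

Each character of the input is visited a bounded number of times, so the cost grows with the length of the text and not with the number of matches.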

#11 aidenkael

Posted 10 September 2016 - 11:46 AM

Okay, never posting on a phone again. Tried to upvote this comment and accidentally downvoted... I give up on today.

This post has been edited by andrewsw: 10 September 2016 - 11:56 AM
Reason for edit:: Removed previous quote, just press REPLY


#12 andrewsw

Posted 10 September 2016 - 11:55 AM

I have upvoted.

You do not have to quote the previous post, there is a Reply button further down the page, or use the Fast Reply box.

#13 dr4

Posted 10 September 2016 - 12:23 PM

aidenkael, on 10 September 2016 - 07:47 AM, said:

andrewsw, on 08 September 2016 - 04:26 PM, said:

string.Format("{0}(.*?){1}", mystring1, mystring2)

This isn't as complex as regex itself!



string interpolated = $"{mystring1}(.*?){mystring2}";



That's it! Just add the $ and it works like magic. It is finding all the matches in less than 1 second. Thank you, and thank you all for your time!

#14 Skydiver

Posted 10 September 2016 - 12:52 PM

Huh? The point of that set of posts was to show that the following all result in the same thing:
string pattern1 = myString1 + "(.*)" + myString2;
string pattern2 = string.Format("{0}(.*){1}", myString1, myString2);
string pattern3 = $"{myString1}(.*){myString2}";



Using any of the above patterns will get you the same resulting matches.
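A quick way to check this is to build all three and compare them. The class name and the BuildAll helper below are made up for the sketch; the three expressions are the ones listed above.

```csharp
using System;

public static class SamePatternDemo
{
    // Hypothetical helper: builds the pattern three ways and returns all three.
    public static string[] BuildAll(string myString1, string myString2)
    {
        string pattern1 = myString1 + "(.*)" + myString2;                    // concatenation
        string pattern2 = string.Format("{0}(.*){1}", myString1, myString2); // string.Format
        string pattern3 = $"{myString1}(.*){myString2}";                     // interpolation (C# 6.0+)
        return new[] { pattern1, pattern2, pattern3 };
    }

    public static void Main()
    {
        string[] patterns = BuildAll("change:", ":change");
        foreach (string pattern in patterns)
        {
            Console.WriteLine(pattern); // change:(.*):change each time
        }
        Console.WriteLine(patterns[0] == patterns[1] && patterns[1] == patterns[2]); // True
    }
}
```

Since the resulting strings are identical, the regex engine sees exactly the same pattern whichever way it was built.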

#15 dr4

Posted 10 September 2016 - 01:11 PM

Skydiver, on 10 September 2016 - 12:52 PM, said:

Huh? The point of that set of posts was to show that the following all result in the same thing:
string pattern1 = myString1 + "(.*)" + myString2;
string pattern2 = string.Format("{0}(.*){1}", myString1, myString2);
string pattern3 = $"{myString1}(.*){myString2}";



Using any of the above patterns will get you the same resulting matches.



With .* it doesn't find any match; it has to be (.*?). And the first one was my first option, and it doesn't work if mystring has another character next to it instead of white space.
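For reference, the difference between the greedy (.*) and lazy (.*?) forms can be sketched as follows. The sample text with two delimiter pairs and the names GreedyLazyDemo and Captures are invented for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GreedyLazyDemo
{
    // Hypothetical helper: returns Group 1 of every match of the given pattern.
    public static List<string> Captures(string text, string pattern)
    {
        List<string> results = new List<string>();
        foreach (Match match in Regex.Matches(text, pattern, RegexOptions.Singleline))
        {
            results.Add(match.Groups[1].Value);
        }
        return results;
    }

    public static void Main()
    {
        string text = "change:A:change change:B:change";

        // Greedy: .* grabs as much as possible, so there is a single match
        // running from the first "change:" to the last ":change".
        foreach (string found in Captures(text, "change:(.*):change"))
        {
            Console.WriteLine(found); // A:change change:B
        }

        // Lazy: .*? stops at the nearest ":change", so each pair matches on its own.
        foreach (string found in Captures(text, "change:(.*?):change"))
        {
            Console.WriteLine(found); // A, then B
        }
    }
}
```

So on text with several delimiter pairs the greedy form still matches, but it returns one large match where the lazy form returns one match per pair.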

This post has been edited by dr4: 10 September 2016 - 01:12 PM
