Ignore everything before and after the regex pattern strings


31 Replies - 7734 Views - Last Post: 12 September 2016 - 01:12 AM

#16 andrewsw

Posted 10 September 2016 - 01:33 PM

You do not have to quote the previous post; there is a Reply button further down the page, or you can use the Fast Reply box.

#17 Skydiver

Posted 10 September 2016 - 03:02 PM

What?

You were talking about adding $ in post #13, and now in post #15 you make no mention of $ and are talking about the ( and ) as well as needing to add a ? to make things work. To make things worse, in your original post, you said that it was working fine with (.*), but now you say it doesn't work.

It almost feels like you are just randomly throwing code and regular expressions to see what sticks without any thought or analysis why some things work, and others do not.

#18 dr4

Posted 10 September 2016 - 03:12 PM

The code that I posted always had (.*?); I never used (.*) or mentioned it at any point. Read it again... and, as I said, adding $ to my original code fixed everything.

my original code was:

temporary = new Regex(mystring1+ "(.*?)" + mystring2, RegexOptions.Singleline).Matches(myText);



now it is

temporary = new Regex($+""+mystring1+ "(.*?)" + mystring2, RegexOptions.Singleline).Matches(myText);




The second piece of code that andrewsw posted in post #4 also works like a charm; I just implemented it wrong the first time.

#19 Skydiver

Posted 10 September 2016 - 03:23 PM

Oops. Sorry. I missed that.

I didn't see that you were using the non-greedy version of the * quantifier.

I still don't see how adding that $ in there makes any difference to whether things work. What was more significant was adding in the space.

#20 andrewsw

Posted 10 September 2016 - 03:42 PM

(A space?)

Yes, I'm trying to think what difference the $ makes. The only thing that I could think of was possibly to do with the colon. With string interpolation:

Quote

You do not need to quote the quotation characters within the contained interpolation expressions because interpolated string expressions start with $, and the compiler scans the contained interpolation expressions as balanced text until it finds a comma, colon, or close curly brace.

repeated link

But a colon isn't a special character with regex anyway, so I'm still at a loss to recognise any difference between those two statements.

#21 Skydiver

Posted 10 September 2016 - 04:34 PM

Time to break out LINQPad and do some experiments.

This post has been edited by Skydiver: 10 September 2016 - 07:56 PM
Reason for edit:: Fix LINWood to LINQpad. Gotta love autocorrect.


#22 dr4

Posted 10 September 2016 - 05:25 PM

I have no idea. I had never used regex before, and it took me ages to make that simple code work (I learnt a lot, though), but yes, that $ changed everything.

It may be obvious, but just to complete the post (in case someone comes here from Google some day), the output of:

temporary = new Regex("(.*?)" +mystring1 + "(.*?)" +mystring2+ "(.*?)", RegexOptions.Singleline).Matches(mytext);



and

temporary = new Regex($""+mystring1+ "(.*?)" + mystring2, RegexOptions.Singleline).Matches(myText);



is exactly the same, but the first one, as I said, takes ages to load because it reads all the text in each match.

This post has been edited by dr4: 11 September 2016 - 02:40 AM


#23 Skydiver

Posted 10 September 2016 - 08:00 PM

dr4, on 10 September 2016 - 06:12 PM, said:

now it is

temporary = new Regex($+""+mystring1+ "(.*?)" + mystring2, RegexOptions.Singleline).Matches(myText);




The second piece of code that andrewsw posted in post #4 also works like a charm; I just implemented it wrong the first time.


That doesn't even compile:
using System;
using System.Linq;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace SimpleCSConsole
{
    class Program
    {
        void Run()
        {
            string myString1 = "change:";
            string myString2 = ":change";
            string myText = "sometextchange:THIS IS A TEST:changesometext";

            MatchCollection temporary;
            temporary = new Regex($+""+myString1+"(.*?)"+myString2, RegexOptions.Singleline).Matches(myText);
            foreach (var match in temporary)
                Console.WriteLine(match);
        }

        static void Main()
        {
            new Program().Run();
            Console.ReadKey();
        }
    }
}



Gives the following errors:
CS1056	Unexpected character '$'	SimpleCSConsole	D:\z\Test\SimpleCSConsole\SimpleCSConsole\Program.cs	17
CS0023	Operator '+' cannot be applied to operand of type 'string'	SimpleCSConsole	D:\z\Test\SimpleCSConsole\SimpleCSConsole\Program.cs	17



#24 Skydiver

Posted 10 September 2016 - 08:58 PM

andrewsw, on 10 September 2016 - 06:42 PM, said:

(A space?)


Sorry, I was having phone input issues. Empty string is what I was really trying to Swype in.

#25 Skydiver

Posted 10 September 2016 - 09:05 PM

The only way I could get it to compile was to take away the extra + to make the pattern $"" + myString1 + "(.*?)" + myString2.

Unfortunately, the claim in post #22 was that the results were the same whether using "(.*?)" + myString1 + "(.*?)" + myString2 + "(.*?)" or $"" + myString1 + "(.*?)" + myString2, but I'm getting these results:
Trying to match pattern: '(.*?)change:(.*?):change(.*?)'
>>> begin >>>
sometextchange:THIS IS A TEST:change
<<< end <<<
Trying to match pattern: 'change:(.*?):change'
>>> begin >>>
change:THIS IS A TEST:change
<<< end <<<


using this code:
using System;
using System.Linq;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace SimpleCSConsole
{
    class Program
    {
        void WriteMatches(IEnumerable<Match> matches)
        {
            Console.WriteLine(">>> begin >>>");
            foreach (var match in matches)
                Console.WriteLine(match);
            Console.WriteLine("<<< end <<<");
        }

        IEnumerable<Match> GetMatches(string input, string pattern)
        {
            Console.WriteLine($"Trying to match pattern: '{pattern}'");
            return Regex.Matches(input, pattern, RegexOptions.Singleline).Cast<Match>();
        }

        void Run()
        {
            string myString1 = "change:";
            string myString2 = ":change";
            string myText = "sometextchange:THIS IS A TEST:changesometext";

            var pattern1 = "(.*?)" + myString1 + "(.*?)" + myString2 + "(.*?)";
            var pattern2 = $"" + myString1 + "(.*?)" + myString2;

            WriteMatches(GetMatches(myText, pattern1));
            WriteMatches(GetMatches(myText, pattern2));
        }

        static void Main()
        {
            new Program().Run();
            Console.ReadKey();
        }
    }
}



#26 andrewsw

Posted 10 September 2016 - 10:18 PM

OP said:

is exactly the same, but the first one, as I said, takes ages to load because it reads all the text in each match.

I was distracted by the $ at the front, didn't notice the (.*?) at the end, mainly because it was added in post 22 and wasn't present in post 18.

I cannot describe how that is working, technically, but it isn't too surprising that it performs badly. There is nothing to anchor or terminate it, so it probably does a lot of running back and forth (cf. catastrophic backtracking). I mentioned earlier that the use of .* should be kept to a minimum.
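
A rough way to check this locally is to time both patterns against the same text. The sketch below is only an illustration: the class name, the helper names and the synthetic input (50 delimited items separated by filler, followed by a run of filler with no delimiters) are all assumptions, not the OP's data, and the timings will vary by runtime and machine. It makes no attempt to reproduce the 20-second figure.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

static class LazyPrefixTiming
{
    // Builds hypothetical sample text: 'count' delimited items, each preceded
    // by filler, plus a final run of filler that contains no delimiters.
    public static string BuildText(int count, int fillerLength)
    {
        var sb = new StringBuilder();
        string filler = new string('x', fillerLength);
        for (int i = 0; i < count; i++)
            sb.Append(filler).Append("change:item").Append(i).Append(":change");
        sb.Append(filler);
        return sb.ToString();
    }

    // Counts every match of the pattern and reports how long that took.
    // Reading Count forces the lazily evaluated MatchCollection to run fully.
    public static int CountMatches(string text, string pattern, out TimeSpan elapsed)
    {
        var sw = Stopwatch.StartNew();
        int n = Regex.Matches(text, pattern, RegexOptions.Singleline).Count;
        sw.Stop();
        elapsed = sw.Elapsed;
        return n;
    }

    static void Main()
    {
        string text = BuildText(50, 2000);
        TimeSpan wideTime, narrowTime;

        int wide = CountMatches(text, "(.*?)change:(.*?):change(.*?)", out wideTime);
        int narrow = CountMatches(text, "change:(.*?):change", out narrowTime);

        Console.WriteLine("wide:   " + wide + " matches, " + wideTime.TotalMilliseconds + " ms");
        Console.WriteLine("narrow: " + narrow + " matches, " + narrowTime.TotalMilliseconds + " ms");
    }
}
```

Both patterns find the same number of matches on this input; any difference shows up in the elapsed time and in what each match contains.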

#27 dr4

Posted 11 September 2016 - 02:39 AM

Yes, sorry, there is no + after the $, my bad! I wrote it from memory. But yes, both patterns have the same output if I give them the same strings, but the first one takes about 20 seconds and the second less than 1.

#28 andrewsw

Posted 11 September 2016 - 03:15 AM

Please, in future, copy and paste exact code, or state that you are attempting to recall something from memory, to prevent this kind of bedevilment.

Also note that, with $"" + "something", the interpolation applies only to the empty string.
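
A minimal sketch of that point, assuming the delimiter values Skydiver used in post #25 (the class and method names are only illustrative): the $"" prefix contributes an empty string, so both concatenations build the same pattern.

```csharp
using System;

static class InterpolationDemo
{
    // Builds the pattern with a leading interpolated empty string, as in post #22.
    public static string WithDollar(string start, string end)
    {
        return $"" + start + "(.*?)" + end;
    }

    // Builds the pattern by plain concatenation, as in the original code in post #18.
    public static string WithoutDollar(string start, string end)
    {
        return start + "(.*?)" + end;
    }

    static void Main()
    {
        string a = WithDollar("change:", ":change");
        string b = WithoutDollar("change:", ":change");

        Console.WriteLine(a);       // change:(.*?):change
        Console.WriteLine(a == b);  // True
    }
}
```

Since the two strings are identical, the regex engine receives the same pattern either way.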

Five Invaluable Techniques to Improve Regex Performance

#29 Skydiver

Posted 11 September 2016 - 07:18 AM

dr4, on 11 September 2016 - 05:39 AM, said:

both patterns have the same output if I give them the same strings, but the first one takes about 20 seconds and the second less than 1.

No, they do not have the same output. Notice that one of them contains "sometext", while the other does not. Additionally, if you examine the Groups within the matches, you'll see that one has only 2 groups, while the other has 4 groups. All that extra processing is done at matching time, not in a lazy, on-demand manner.
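
The group counts can be checked with a short sketch like the one below. It reuses the test string and patterns from post #25; the class and helper names are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupCountDemo
{
    // Returns the first match of the pattern, using the same option as the thread.
    public static Match FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern, RegexOptions.Singleline);
    }

    static void Main()
    {
        string text = "sometextchange:THIS IS A TEST:changesometext";
        Match wide = FirstMatch(text, "(.*?)change:(.*?):change(.*?)");
        Match narrow = FirstMatch(text, "change:(.*?):change");

        // Groups[0] is the whole match, so Count is one more than the
        // number of capturing groups in the pattern.
        Console.WriteLine(wide.Groups.Count);    // 4
        Console.WriteLine(narrow.Groups.Count);  // 2

        // With the wide pattern, group 1 is the text before the first delimiter.
        Console.WriteLine(wide.Groups[1].Value);    // sometext
        Console.WriteLine(wide.Groups[2].Value);    // THIS IS A TEST
        Console.WriteLine(narrow.Groups[1].Value);  // THIS IS A TEST
    }
}
```

Note that Groups[1] holds different text for the two patterns on this input, so code that reads only Groups[1] is not interchangeable between them.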

#30 dr4

Posted 11 September 2016 - 09:46 AM

Out of curiosity I tried again, and with both versions I'm getting exactly the same output for all the matches in the text (there are about 40 or 50 matches). I don't know whether it has 2 or 4 groups; I'm using only Groups[1]. The problem was solved long ago anyway.

temporary[x].Groups[1].ToString()

This post has been edited by dr4: 11 September 2016 - 09:47 AM

