31 Replies - 7734 Views - Last Post: 12 September 2016 - 01:12 AM
#16
Re: ignore everything before and after the regex paterns strings
Posted 10 September 2016 - 01:33 PM
#17
Re: ignore everything before and after the regex paterns strings
Posted 10 September 2016 - 03:02 PM
You were talking about adding $ in post #13, and now in post #15 you make no mention of $ and are talking about the ( and ) as well as needing to add a ? to make things work. To make things worse, in your original post, you said that it was working fine with (.*), but now you say it doesn't work.
It almost feels like you are just randomly throwing code and regular expressions to see what sticks without any thought or analysis why some things work, and others do not.
#18
Re: ignore everything before and after the regex paterns strings
Posted 10 September 2016 - 03:12 PM
my original code was:
temporary = new Regex(mystring1+ "(.*?)" + mystring2, RegexOptions.Singleline).Matches(myText);
now it is
temporary = new Regex($+""+mystring1+ "(.*?)" + mystring2, RegexOptions.Singleline).Matches(myText);
The second code that andrewsw posted in post #4 also works like a charm; I just implemented it wrong the first time.
#19
Re: ignore everything before and after the regex paterns strings
Posted 10 September 2016 - 03:23 PM
I didn't see that you were using the non-greedy version of the * quantifier.
I still don't see how adding that $ in there makes a difference to making things work. What was more significant is adding in the space.
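To make the greedy versus non-greedy point concrete, here is a minimal sketch. The input text with two delimited sections and the `FirstCapture` helper are made up for illustration; they are not from the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Hypothetical helper: text captured by group 1 of the first match, or "" if nothing matched.
    public static string FirstCapture(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : "";
    }

    public static void Main()
    {
        // Assumed sample input containing two delimited sections.
        string text = "change:ONE:change junk change:TWO:change";

        // Greedy: .* extends to the LAST ":change" in the input.
        Console.WriteLine(FirstCapture(text, "change:(.*):change"));   // ONE:change junk change:TWO

        // Non-greedy: .*? stops at the FIRST ":change" it can reach.
        Console.WriteLine(FirstCapture(text, "change:(.*?):change"));  // ONE
    }
}
```

With only one delimited section in the input both forms capture the same text, so the difference only shows up once the end delimiter appears more than once.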
#20
Re: ignore everything before and after the regex paterns strings
Posted 10 September 2016 - 03:42 PM
Yes, I'm trying to think what difference the $ makes. The only thing that I could think of was possibly to do with the colon, which has a special meaning inside the braces of an interpolated string (it introduces a format string).
But a colon isn't a special character with regex anyway, so I'm still at a loss to recognise any difference between those two statements.
#21
Re: ignore everything before and after the regex paterns strings
Posted 10 September 2016 - 04:34 PM
This post has been edited by Skydiver: 10 September 2016 - 07:56 PM
Reason for edit:: Fix LINWood to LINQpad. Gotta love autocorrect.
#22
Re: ignore everything before and after the regex paterns strings
Posted 10 September 2016 - 05:25 PM
It may be obvious, but just to complete the post (in case someone comes here from Google some day), the output of:
temporary = new Regex("(.*?)" +mystring1 + "(.*?)" +mystring2+ "(.*?)", RegexOptions.Singleline).Matches(mytext);
and
temporary = new Regex($""+mystring1+ "(.*?)" + mystring2, RegexOptions.Singleline).Matches(myText);
is exactly the same, but the first one, as I said, takes ages to load because it reads all the text in each match.
This post has been edited by dr4: 11 September 2016 - 02:40 AM
#23
Re: ignore everything before and after the regex paterns strings
Posted 10 September 2016 - 08:00 PM
dr4, on 10 September 2016 - 06:12 PM, said:
temporary = new Regex($+""+mystring1+ "(.*?)" + mystring2, RegexOptions.Singleline).Matches(myText);
the second code that andrewsw posted in post #4 also works like a charm , I just implemented it wrong the first time
That doesn't even compile:
using System;
using System.Linq;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace SimpleCSConsole
{
    class Program
    {
        void Run()
        {
            string myString1 = "change:";
            string myString2 = ":change";
            string myText = "sometextchange:THIS IS A TEST:changesometext";

            MatchCollection temporary;
            temporary = new Regex($+""+myString1+"(.*?)"+myString2, RegexOptions.Singleline).Matches(myText);
            foreach (var match in temporary)
                Console.WriteLine(match);
        }

        static void Main()
        {
            new Program().Run();
            Console.ReadKey();
        }
    }
}
Gives the following errors:
CS1056 Unexpected character '$' SimpleCSConsole D:\z\Test\SimpleCSConsole\SimpleCSConsole\Program.cs 17
CS0023 Operator '+' cannot be applied to operand of type 'string' SimpleCSConsole D:\z\Test\SimpleCSConsole\SimpleCSConsole\Program.cs 17
#24
Re: ignore everything before and after the regex paterns strings
Posted 10 September 2016 - 08:58 PM
#25
Re: ignore everything before and after the regex paterns strings
Posted 10 September 2016 - 09:05 PM
Unfortunately, the claim in post #22 was that the results were the same whether using "(.*?)" + myString1 + "(.*?)" + myString2 + "(.*?)" or $"" + myString1 + "(.*?)" + myString2, but I'm getting these results:
Trying to match pattern: '(.*?)change:(.*?):change(.*?)'
>>> begin >>>
sometextchange:THIS IS A TEST:change
<<< end <<<
Trying to match pattern: 'change:(.*?):change'
>>> begin >>>
change:THIS IS A TEST:change
<<< end <<<
using this code:
using System;
using System.Linq;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace SimpleCSConsole
{
    class Program
    {
        void WriteMatches(IEnumerable<Match> matches)
        {
            Console.WriteLine(">>> begin >>>");
            foreach (var match in matches)
                Console.WriteLine(match);
            Console.WriteLine("<<< end <<<");
        }

        IEnumerable<Match> GetMatches(string input, string pattern)
        {
            Console.WriteLine($"Trying to match pattern: '{pattern}'");
            return Regex.Matches(input, pattern, RegexOptions.Singleline).Cast<Match>();
        }

        void Run()
        {
            string myString1 = "change:";
            string myString2 = ":change";
            string myText = "sometextchange:THIS IS A TEST:changesometext";

            var pattern1 = "(.*?)" + myString1 + "(.*?)" + myString2 + "(.*?)";
            var pattern2 = $"" + myString1 + "(.*?)" + myString2;

            WriteMatches(GetMatches(myText, pattern1));
            WriteMatches(GetMatches(myText, pattern2));
        }

        static void Main()
        {
            new Program().Run();
            Console.ReadKey();
        }
    }
}
#26
Re: ignore everything before and after the regex paterns strings
Posted 10 September 2016 - 10:18 PM
OP said:
I was distracted by the $ at the front, didn't notice the (.*?) at the end, mainly because it was added in post 22 and wasn't present in post 18.
I cannot describe how that is working, technically, but it isn't too surprising that it performs badly. There is nothing to anchor/terminate it so it probably does a lot of running back and forth; cf catastrophic backtracking. I mentioned earlier that the use of .* should be kept to a minimum.
#27
Re: ignore everything before and after the regex paterns strings
Posted 11 September 2016 - 02:39 AM
#28
Re: ignore everything before and after the regex paterns strings
Posted 11 September 2016 - 03:15 AM
Also note that, with $"" + "something", the interpolation applies only to the empty string literal, not to anything concatenated after it.
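A small sketch of that point, using the delimiter values from the test program earlier in the thread. The class and method names are made up for illustration. It builds the pattern three ways and shows that the leading `$""` contributes nothing.

```csharp
using System;

public static class InterpolationDemo
{
    // $"" is an interpolated string with no placeholders, so it evaluates to "".
    public static string WithDollar(string a, string b)
    {
        return $"" + a + "(.*?)" + b;
    }

    // Plain concatenation, as in the original code from post #18.
    public static string WithoutDollar(string a, string b)
    {
        return a + "(.*?)" + b;
    }

    // What interpolation would look like if it were actually used for the whole pattern.
    public static string FullyInterpolated(string a, string b)
    {
        return $"{a}(.*?){b}";
    }

    public static void Main()
    {
        string myString1 = "change:";
        string myString2 = ":change";

        Console.WriteLine(WithDollar(myString1, myString2));        // change:(.*?):change
        Console.WriteLine(WithoutDollar(myString1, myString2));     // change:(.*?):change
        Console.WriteLine(FullyInterpolated(myString1, myString2)); // change:(.*?):change
    }
}
```

All three produce the same pattern string, so the regex engine sees no difference between them.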
Five Invaluable Techniques to Improve Regex Performance
#29
Re: ignore everything before and after the regex paterns strings
Posted 11 September 2016 - 07:18 AM
dr4, on 11 September 2016 - 05:39 AM, said:
No, they do not have the same output. Notice that one of them contains "sometext", while the other does not. Additionally, if you examine the Groups within the matches, you'll see that one has only 2 groups, while the other has 4 groups. All that extra processing is done at matching time, not lazily on demand.
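To check the group counts for yourself, something along these lines should do. It reuses the patterns and test text from post #25; the class name and the `First` helper are just for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupCountDemo
{
    // Illustrative helper: first match of pattern in input.
    public static Match First(string input, string pattern)
    {
        return Regex.Match(input, pattern, RegexOptions.Singleline);
    }

    public static void Main()
    {
        string myText = "sometextchange:THIS IS A TEST:changesometext";

        Match wide = First(myText, "(.*?)change:(.*?):change(.*?)");
        Match narrow = First(myText, "change:(.*?):change");

        // Groups[0] is always the whole match, so Count is the number of capture groups plus one.
        Console.WriteLine(wide.Value + " | groups: " + wide.Groups.Count);     // 4 groups
        Console.WriteLine(narrow.Value + " | groups: " + narrow.Groups.Count); // 2 groups

        // The leading lazy group has to swallow everything before the start delimiter;
        // the trailing lazy group matches as little as possible, which is nothing.
        Console.WriteLine("leading:  '" + wide.Groups[1].Value + "'");  // 'sometext'
        Console.WriteLine("trailing: '" + wide.Groups[3].Value + "'");  // ''
    }
}
```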
#30
Re: ignore everything before and after the regex paterns strings
Posted 11 September 2016 - 09:46 AM
temporary[x].Groups[1].ToString()
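For anyone landing here later, a minimal sketch of pulling out just the text between the two markers via Groups[1]. The `Between` helper is hypothetical, and wrapping the delimiters in `Regex.Escape` is an addition not discussed in the thread; it only matters if the delimiters ever contain regex metacharacters.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BetweenDemo
{
    // Hypothetical helper: returns the text between each start/end delimiter pair.
    public static List<string> Between(string input, string start, string end)
    {
        // Escape the delimiters so they are matched literally (an assumption about intent).
        string pattern = Regex.Escape(start) + "(.*?)" + Regex.Escape(end);

        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern, RegexOptions.Singleline))
            results.Add(m.Groups[1].Value); // group 1 is the part between the delimiters
        return results;
    }

    public static void Main()
    {
        string myText = "sometextchange:THIS IS A TEST:changesometext";
        foreach (string s in Between(myText, "change:", ":change"))
            Console.WriteLine(s); // THIS IS A TEST
    }
}
```

Because the pattern has no leading or trailing (.*?), each match covers only the delimited section, and Groups[1] holds the inner text.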
This post has been edited by dr4: 11 September 2016 - 09:47 AM
