Welcome to Dream.In.Code

Regular Expressions?


Regular Expressions?, C#

reCoded
25 Feb, 2008 - 05:42 PM
Post #1


Hey everyone,

I am looking for some help doing the following in C#. I have saved an HTML file (a website), and I want to read that file (its name is user input), search it for a string (also user input), and print the string if it is found. Since this is an HTML file with borders, columns, etc., how can I go about tackling this problem?

Thanks,
reCoded

PsychoCoder
RE: Regular Expressions?
25 Feb, 2008 - 06:51 PM
Post #2

Have a look at this snippet, "Strip HTML from string with Regular Expressions"; it will help with your problem. :)
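
For readers without the linked snippet at hand, here is a minimal sketch of the general idea (not the snippet itself): replace anything that looks like a tag with a space, then collapse the leftover whitespace. The names HtmlStripper and StripTags are made up for this illustration, and a pattern this simple will not cope with comments, scripts, or a '>' inside an attribute value. Entities such as &nbsp; are left as they are.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlStripper
{
    // Hypothetical helper: removes anything of the form <...> and normalizes whitespace.
    public static string StripTags(string html)
    {
        // Replace each tag with a space so neighbouring texts do not run together.
        string noTags = Regex.Replace(html, "<[^>]+>", " ");
        // Collapse the runs of whitespace left behind by the removed tags.
        return Regex.Replace(noTags, "\\s+", " ").Trim();
    }

    public static void Main()
    {
        string cell = "<td headers=\"Office\"><b>St. Stephen</b><br />St. Stephen, NB / Calais, ME</td>";
        Console.WriteLine(StripTags(cell)); // St. Stephen St. Stephen, NB / Calais, ME
    }
}
```

Once the tags are gone, a plain string search over the remaining text is enough to tell whether the user's input appears on the page.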

reCoded
RE: Regular Expressions?
28 Feb, 2008 - 08:07 PM
Post #3


Thanks for the reply!

It was recently stated that I cannot use regular expressions to solve this problem! I guess I am now thinking of plan B. Also, I am looking at the HTML file's source code and loading that into my program, so it's a text file rather than the rendered HTML page.

Now my question: since C# processes whitespace, should I concatenate all the lines in the file into one string and then search for the header under which I want to find a particular string? (This website has columns, headers, etc.; it is basically a table that I have to search for the string the user enters.)

Please let me know if I am taking the right approach in tackling this problem.

Thanks,
recoded

Martyr2
RE: Regular Expressions?
28 Feb, 2008 - 10:02 PM
Post #4

Well, first of all, you can do this using regular expressions. That is what regular expressions are for: finding complex textual patterns in a haystack full of text. Whitespace isn't an issue if you set up the pattern correctly to take whitespace characters into account. Regular expressions are actually the ideal solution for this: they are fast, they can find complex patterns, and the matches are easily collected.
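
To illustrate the point about whitespace, here is a rough sketch (the sample HTML and the names WhitespaceDemo and NextCell are assumptions made up for the example). Putting \s* between the tokens lets one pattern match whether the cells sit on a single line or are spread over several indented lines.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Returns the text of the cell that follows the cell containing exactly `name`,
    // or null when no such pair of cells is found.
    public static string NextCell(string html, string name)
    {
        // \s* tolerates any run of spaces, tabs, or newlines around the tags and the text.
        string pattern = "<td[^>]*>\\s*" + Regex.Escape(name) +
                         "\\s*</td>\\s*<td[^>]*>\\s*(?<value>[^<]*?)\\s*</td>";
        Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Groups["value"].Value : null;
    }

    public static void Main()
    {
        string compact = "<tr><td>Woodstock Road</td><td>No delay</td></tr>";
        string spread = "<tr>\n  <td>\n    Woodstock Road\n  </td>\n  <td>\n    No delay\n  </td>\n</tr>";
        Console.WriteLine(NextCell(compact, "Woodstock Road")); // No delay
        Console.WriteLine(NextCell(spread, "Woodstock Road"));  // No delay
    }
}
```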

Perhaps you can give us an example of some text and the typical search string that someone would want to search for and perhaps someone here will be kind enough to build the pattern for you.

:)

reCoded
RE: Regular Expressions?
2 Mar, 2008 - 06:57 PM
Post #5


CODE
</tr>
    <tr>
        <td><sup>All times local.</sup></td>
        <th id="ComCanada" headers="Com">Canada<br /> bound</th>
        <th id="ComUS" headers="Com">U.S.<br /> bound</th>
        <th id="TravCanada" headers="Trav">Canada<br /> bound</th>
        <th id="TravUS" headers="Trav">U.S.<br /> bound</th>
    </tr><tr><td headers="Office"><b>St. Stephen</b><br />St. Stephen, NB / Calais, ME<br />Last updated:<br /><b>2008-03-02&nbsp;&nbsp;13:09 AST</b></td><td headers="Com ComCanada">No delay</td><td headers="Com ComUS">No delay</td><td headers="Trav TravCanada">No delay</td><td headers="Trav TravUS">No delay</td></tr>
<tr><td headers="Office"><b>Woodstock Road</b><br />Belleville, NB / Houlton, ME<br />Last updated:<br /><b>2008-03-02&nbsp;&nbsp;12:44 AST</b></td><td headers="Com ComCanada">No delay</td><td headers="Com ComUS">No delay</td><td headers="Trav TravCanada">No delay</td><td headers="Trav TravUS">No delay</td></tr>


This is a snippet from the source file. I want to search one column for a name and return the text in the cell next to it (does that make sense?). I can't use regular expressions, but I was thinking of locating the column header first, perhaps with some kind of string search. I am still planning how to tackle this problem without regular expressions, and any suggestions will be appreciated!

Thanks,
recoded

orcasquall
RE: Regular Expressions?
5 Mar, 2008 - 05:54 AM
Post #6

Actually, I think it would have been better if you had just given the URL... Is this the one?
Border wait times

So, say you search for "Stephen", and you want "No delay"? If so, try the following code:
csharp

// Requires: using System; using System.Text.RegularExpressions;
// sHtml contains the HTML code
string sSearch = "Stephen";
// Escape the search term so that characters such as '.' are matched literally.
string sPattern = string.Format("<td.*?>.*?{0}.*?</td>\\s*<td.*?>(?<req>.*?)</td>", Regex.Escape(sSearch));
MatchCollection mc = Regex.Matches(sHtml, sPattern, RegexOptions.IgnoreCase);
if (mc.Count > 0)
{
    // The named group "req" holds the text of the cell after the matching one.
    string sRequired = mc[0].Groups["req"].Value;
    Console.WriteLine(sRequired);
}
else
{
    Console.WriteLine("No results");
}


As Martyr2 mentioned, provide the example source, a typical search string and the desired result. It's very hard to know if we got it right if we've got nothing to compare it with...

The regular expressions you need depend heavily on the structure of the HTML. So if the source HTML changes, the regular expressions will need to be modified.

Hope this helps!

reCoded
RE: Regular Expressions?
5 Mar, 2008 - 09:08 AM
Post #7


Thank you for the reply!

Yes that helped out a lot. I appreciate all the responses.

-recoded

reCoded
RE: Regular Expressions?
5 Mar, 2008 - 11:53 PM
Post #8


Using the website above, should I read the file into one long string before I start breaking it down to find the crossing the user wants and check whether there is a delay?



orcasquall
RE: Regular Expressions?
6 Mar, 2008 - 04:22 AM
Post #9

This was how I did it (in full)...
csharp

// Requires: using System; using System.IO; using System.Text.RegularExpressions;
string sHtml;
// The using block closes the reader even if reading throws.
using (StreamReader sr = new StreamReader("yoursource.html"))
{
    sHtml = sr.ReadToEnd();
}

// sHtml contains the HTML code
string sSearch = "Stephen";
// Escape the search term so that characters such as '.' are matched literally.
string sPattern = string.Format("<td.*?>.*?{0}.*?</td>\\s*<td.*?>(?<req>.*?)</td>", Regex.Escape(sSearch));
MatchCollection mc = Regex.Matches(sHtml, sPattern, RegexOptions.IgnoreCase);
if (mc.Count > 0)
{
    // The named group "req" holds the text of the cell after the matching one.
    string sRequired = mc[0].Groups["req"].Value;
    Console.WriteLine(sRequired);
}
else
{
    Console.WriteLine("No results");
}

A couple of lines of code, and you've got the entire HTML in one string. Then you parse the HTML. You'll have to correctly assign whatever you want to search into sSearch of course.

You'd probably want to separate the part that reads in the HTML from the part that searches it, so that you don't read in the entire HTML every time you search. There's something called a variable for storing stuff... :)

I don't think there's much of a performance decrease if you parse the entire HTML every time you search, if that's what you mean. So just store everything in one string; there's no need to break down each row in the table.

This will work well if the number of searches is low. If you need to search frequently, a better way might be to parse the entire HTML structure once and store the resulting name/status pairs somewhere. You would then have to re-parse about once an hour, depending on how often the source site updates.

HINT: With a pattern that matches every row rather than a single search term, the number of Match objects in the MatchCollection is the number of rows in the HTML table. Use a foreach to loop through the MatchCollection and collect the result for each row.
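
A rough sketch of that hint, under some assumptions: every data row starts with a cell whose first <b> element holds the office name, and only the first status cell after it is wanted. The names WaitTimeTable and Parse and the "10 minutes" value are invented for the example; the row layout follows the sample posted earlier in this topic.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WaitTimeTable
{
    // Parses every data row once and maps the office name to the text of the next cell.
    public static Dictionary<string, string> Parse(string html)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        // Singleline lets '.' cross line breaks, so rows split over several lines still match.
        string pattern = "<td[^>]*>\\s*<b>(?<office>.*?)</b>.*?</td>\\s*<td[^>]*>(?<status>.*?)</td>";
        foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline))
        {
            result[m.Groups["office"].Value.Trim()] = m.Groups["status"].Value.Trim();
        }
        return result;
    }

    public static void Main()
    {
        string html =
            "<tr><td headers=\"Office\"><b>St. Stephen</b><br />St. Stephen, NB / Calais, ME</td>" +
            "<td headers=\"Com ComCanada\">No delay</td><td headers=\"Com ComUS\">No delay</td></tr>\n" +
            "<tr><td headers=\"Office\"><b>Woodstock Road</b><br />Belleville, NB / Houlton, ME</td>" +
            "<td headers=\"Com ComCanada\">10 minutes</td><td headers=\"Com ComUS\">No delay</td></tr>";

        Dictionary<string, string> table = Parse(html);   // parse once
        Console.WriteLine(table.Count);                   // 2
        Console.WriteLine(table["woodstock road"]);       // 10 minutes (lookup ignores case)
    }
}
```

After the one-off parse, each search is a dictionary lookup, and the dictionary can simply be rebuilt whenever the saved page is refreshed.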

reCoded
RE: Regular Expressions?
6 Mar, 2008 - 04:52 PM
Post #10


I see you used regular expressions? I was told not to use regular expressions, which is why I keep running into dead ends programming this. Thanks for the help, though.

Jayman
RE: Regular Expressions?
6 Mar, 2008 - 07:02 PM
Post #11

Since you can't use RegEx, simply read the file into a string and use the String methods to look for the tags, then retrieve the text of each one using the IndexOf and Substring methods.
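
Here is a minimal sketch of that approach, assuming the table layout from the sample posted earlier: find the name, skip to the end of its cell, then cut out the text of the next cell. The names CellFinder and NextCellText are made up for the example. Note that it takes the first occurrence of the name anywhere in the file and returns the next cell's contents as-is, including any inner tags.

```csharp
using System;

public static class CellFinder
{
    // Finds `name` in the HTML and returns the contents of the next <td> cell,
    // or null if the name or the following cell cannot be found.
    public static string NextCellText(string html, string name)
    {
        int nameAt = html.IndexOf(name, StringComparison.OrdinalIgnoreCase);
        if (nameAt < 0) return null;

        // End of the cell that contains the name.
        int cellEnd = html.IndexOf("</td>", nameAt, StringComparison.OrdinalIgnoreCase);
        if (cellEnd < 0) return null;

        // Opening tag of the next cell, then the '>' that closes that tag.
        int nextOpen = html.IndexOf("<td", cellEnd, StringComparison.OrdinalIgnoreCase);
        if (nextOpen < 0) return null;
        int textStart = html.IndexOf('>', nextOpen);
        if (textStart < 0) return null;
        textStart++; // first character after the opening tag

        int textEnd = html.IndexOf("</td>", textStart, StringComparison.OrdinalIgnoreCase);
        if (textEnd < 0) return null;

        return html.Substring(textStart, textEnd - textStart).Trim();
    }

    public static void Main()
    {
        string row = "<tr><td headers=\"Office\"><b>Woodstock Road</b><br />Belleville, NB / Houlton, ME</td>" +
                     "<td headers=\"Com ComCanada\">No delay</td><td headers=\"Com ComUS\">No delay</td></tr>";
        Console.WriteLine(NextCellText(row, "Woodstock") ?? "No results"); // No delay
    }
}
```

To read the other columns, the same IndexOf steps can be repeated starting from textEnd.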

It will take some work, but you can do it. Give it a try, and if you have problems, post your questions here; if it's unrelated, create a new topic.

reCoded
RE: Regular Expressions?
6 Mar, 2008 - 10:55 PM
Post #12


Yeah I am working on it now. Thanks a lot everyone for all the replies, really helped!


Thanks,
reCoded

