This was how I did it (in full)...
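```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// Read the whole file into one string; the using block closes the reader for us.
string sHtml = string.Empty;
using (StreamReader sr = new StreamReader("yoursource.html"))
{
    sHtml = sr.ReadToEnd();
}
// sHtml contains the HTML code

// Escape the search term so characters like '.' or '(' are matched literally.
string sSearch = "Stephen";
// Note: '.' does not match newlines here, so this assumes each table row sits on its own line.
MatchCollection mc = Regex.Matches(sHtml, string.Format("<td.*?>.*?{0}.*?</td>\\s*<td.*?>.*?</td>", Regex.Escape(sSearch)), RegexOptions.IgnoreCase);
if (mc.Count > 0)
{
    // Take the first matching row and keep only the contents of its second cell.
    Match m = mc[0];
    string sRequired = Regex.Replace(m.ToString(), "<td.*?>.*?</td>\\s*<td.*?>(?<req>.*?)</td>", "${req}", RegexOptions.IgnoreCase);
    Console.WriteLine(sRequired);
}
else
{
    Console.WriteLine("No results");
}
```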
A couple of lines of code, and you've got the entire HTML in one string. Then you parse the HTML. You'll have to set `sSearch` to whatever you want to search for, of course.
You'd probably want to separate the reading-in-HTML part from the search part, so you don't read in the entire HTML every time you search. Keep the HTML in a variable and reuse it; that's what variables are for...
I don't think there's much of a performance decrease if you parse the entire HTML every time you search, if that's what you mean. So just store everything in one string; there's no need to break down each row in the table.
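Here's a rough sketch of that split, in case it helps. The class and method names (`RelationSearch`, `LoadHtml`, `Find`) are made up for illustration, and I've folded the second regex into a named group on the first one, so treat it as one possible arrangement rather than the only way to do it.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper: names are illustrative, not from any library.
static class RelationSearch
{
    // Read the file once; the caller keeps the returned string around.
    public static string LoadHtml(string path)
    {
        return File.ReadAllText(path);
    }

    // Search HTML that is already in memory.
    // Returns the contents of the cell after the first cell containing
    // the search term, or null when nothing matches.
    public static string Find(string html, string search)
    {
        string pattern = string.Format(
            "<td.*?>.*?{0}.*?</td>\\s*<td.*?>(?<req>.*?)</td>",
            Regex.Escape(search));
        Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Groups["req"].Value : null;
    }
}
```

You'd call `LoadHtml("yoursource.html")` once, hold on to the string, and then call `Find(html, "Stephen")` as often as you like.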
This works well if you only search occasionally. If you need to search frequently, a better way might be to parse the entire HTML structure once and store the relation results somewhere. You'd then have to re-parse periodically, say once an hour, depending on how often the source site updates.
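The parse-once idea could look something like the sketch below. Everything here is an assumption on my part: the `RelationCache` name, the one-hour `MaxAge`, that every row has exactly two cells (name, then relation), and that you look names up exactly rather than by substring as the code above does.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical cache: parses the table once and re-parses when stale.
class RelationCache
{
    // Assumed refresh interval; tune it to the source site's update frequency.
    static readonly TimeSpan MaxAge = TimeSpan.FromHours(1);

    readonly Func<string> loadHtml;      // supplies the raw HTML on demand
    Dictionary<string, string> table;    // name -> relation
    DateTime parsedAt;

    public RelationCache(Func<string> loadHtml)
    {
        this.loadHtml = loadHtml;
    }

    // Turn every pair of adjacent cells into a name -> relation entry.
    public static Dictionary<string, string> Parse(string html)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        MatchCollection rows = Regex.Matches(
            html,
            "<td[^>]*>(?<name>.*?)</td>\\s*<td[^>]*>(?<req>.*?)</td>",
            RegexOptions.IgnoreCase);
        foreach (Match m in rows)
        {
            result[m.Groups["name"].Value.Trim()] = m.Groups["req"].Value.Trim();
        }
        return result;
    }

    // Look a name up, re-parsing first if the cache is empty or too old.
    public string Lookup(string name)
    {
        if (table == null || DateTime.UtcNow - parsedAt > MaxAge)
        {
            table = Parse(loadHtml());
            parsedAt = DateTime.UtcNow;
        }
        string value;
        return table.TryGetValue(name, out value) ? value : null;
    }
}
```

Construct it with something like `new RelationCache(() => File.ReadAllText("yoursource.html"))` (that needs `using System.IO;`), and every `Lookup` within the hour is just a dictionary hit.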
HINT: The number of Match objects in the MatchCollection is the number of table rows that matched your search. Use a `foreach` to loop through the MatchCollection to get all the relation results.
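A minimal sketch of that loop, assuming the same two-cell row layout as above. `RelationList` and `FindAll` are names I made up, and the named group `req` stands in for the second `Regex.Replace` call.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: collects the second cell of every matching row.
static class RelationList
{
    public static List<string> FindAll(string html, string search)
    {
        string pattern = string.Format(
            "<td.*?>.*?{0}.*?</td>\\s*<td.*?>(?<req>.*?)</td>",
            Regex.Escape(search));
        var results = new List<string>();
        // One Match per matching row; pull the captured cell out of each.
        foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
        {
            results.Add(m.Groups["req"].Value);
        }
        return results;
    }
}
```

Calling `FindAll(sHtml, "Stephen")` gives you a list you can print or count, and an empty list means no rows matched.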