Regex.Replace with a string of HTML

HTML quotes conflicting with VB

6 Replies - 3552 Views - Last Post: 26 June 2009 - 07:06 AM

#1 SpeeDemon

Posted 25 June 2009 - 09:16 PM

I have a long string of HTML code that contains several quotes around attribute values in the string. I use a Regex.Replace() call to change the "img src" attribute, due to the way it's stored in a database.

Here is my string: <p><strong><u><font color="#cc0099">RICH TEXT BOLD UNDERLINE. PICTURE TO APPEAR BELOW</font></u></strong></p><p><strong><u><font color="#cc0099"/></u></strong></p><p><img src="/inlineimages/WorkOrder/6/1245981403232.jpg"/> </p><p /><p>W00T!</p>

I need to change <img src="/inlineimages/WorkOrder/6/1245981403232.jpg"/> to <img src="http://localhost:8080/inlineimages/WorkOrder/6/1245981403232.jpg"/>

Here is my code:
Dim input As String = "<p><strong><u><font color= '" & "#cc0099" & "'>RICH TEXT BOLD UNDERLINE. PICTURE TO APPEAR BELOW</font></u></strong></p><p><strong><u><font color= '" & "#cc0099" & "'/></u></strong></p><p><img src='" & "/inlineimages/WorkOrder/6/1245981403232.jpg" & "'/> </p><p /><p>W00T!</p>"
WebBrowser1.DocumentText = Regex.Replace(input, "<img src=", "<img src= '" & "http://localhost:8080" & "'")


The problem is, my output looks like: <p><strong><u><font color= '#cc0099'>RICH TEXT BOLD UNDERLINE. PICTURE TO APPEAR BELOW</font></u></strong></p><p><strong><u><font color= '#cc0099'/></u></strong></p><p><img src= 'http://localhost:8080''/inlineimages/WorkOrder/6/1245981403232.jpg'/> </p><p /><p>W00T!</p>

Notice how the "<font color=" has single quotes around its value, and "<img src=" has single quotes, with a double quote jammed in between?

How can I resolve this?

Also, is there an automated way to go from my initial string, to the "Dim input As String" I created? I had to manually type that out, editing the quotes just to try and make it work.

Thank you!

Replies To: Regex.Replace with a string of HTML

#2 erickh

Posted 25 June 2009 - 10:30 PM

Can I ask why you included the closing apostrophe after your localhost:8080?

Quote

& "'"


#3 SpeeDemon

Posted 25 June 2009 - 10:37 PM

"   <img src=   ' "   & "http://localhost:8080" &   " '             "
^ start         ^ escape ^ value                      ^ end escape  ^ end


This is my understanding anyway. Without those quotes I get "String constants must end with a double quote"

If I remove the apostrophe the syntax is correct, but the output is:

<img src= 'http://localhost:8080''/inlineimages/WorkOrder/6/1245981403232.jpg'/>

Same thing as the original post. I was wrong about it being a double quote in the result, though. It's two apostrophes side by side.

This post has been edited by SpeeDemon: 25 June 2009 - 10:41 PM


#4 erickh

Posted 25 June 2009 - 11:18 PM

The "quote" is really 2 apostrophes in a row. Your text to replace doesn't include an apostrophe, so the original opening apostrophe is left in your resulting string. The other apostrophe (which actually comes first in your resulting string) comes from the closing apostrophe at the end of your replacement text.

Try changing your search string to <img src=' instead of <img src= and drop the ending apostrophe from the replacement. That should cure your double apostrophe problem.
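
To illustrate the suggestion above, here is a minimal sketch using a shortened version of the test input. The module name and the shortened HTML are assumptions made for the example; the search and replacement strings follow the advice in this post.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ApostropheDemo
    Sub Main()
        ' Shortened, single-quoted test input (hypothetical, for illustration only).
        Dim input As String = "<p><img src='/inlineimages/WorkOrder/6/1245981403232.jpg'/> </p>"

        ' Original attempt: the pattern stops before the opening apostrophe,
        ' so that apostrophe survives right after the inserted closing one.
        Dim broken As String = Regex.Replace(input, "<img src=", "<img src= 'http://localhost:8080'")

        ' Suggested fix: consume the opening apostrophe in the pattern and do not
        ' append a closing one; the apostrophe after .jpg already closes the value.
        Dim fixed As String = Regex.Replace(input, "<img src='", "<img src='http://localhost:8080")

        Console.WriteLine(broken)
        Console.WriteLine(fixed)
    End Sub
End Module
```

The first line printed should show the two apostrophes side by side, and the second should show a single, well-formed src value.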

I'm still not sure why you don't have the replacement text as one string instead of joining 3 strings with ampersands, but it shouldn't matter either way.

Also, I'm not sure why you didn't just make the change directly when you declare the variable "input".

Good luck!

#5 SpeeDemon

Posted 25 June 2009 - 11:34 PM

I used that variable input just to test how regex works. I won't have the freedom to manually edit the string when I connect to the database in my code. All I did was copy and paste from the SQL query browser to get that string. As you can see though, those quotes really screw it up. I'll try your advice on the search string.

I don't join the 3 strings with ampersands because they are a text replacement, not a join... if that's what you mean?

#6 SpeeDemon

Posted 25 June 2009 - 11:41 PM

<p><strong><u><font color= '#cc0099'>RICH TEXT BOLD UNDERLINE. PICTURE TO APPEAR BELOW</font></u></strong></p><p><strong><u><font color= '#cc0099'/></u></strong></p><p><img src= 'http://localhost:8080/inlineimages/WorkOrder/6/1245981403232.jpg'/> </p><p /><p>W00T!</p>

^ Works, your 2 suggestions were successful.

Regex.Replace(input, "<img src='", "<img src= '" & "http://localhost:8080" & "")


So the final issue: since my "input" is pure HTML with double quotes, I get syntax errors left, right, and center when I paste it into the code, and I CANNOT make the regex statement work without MANUALLY adjusting the quotes the way I demonstrated to you. Is there a way for it to just... work?

For reference, my HTML is:

<p><strong><u><font color="#cc0099">RICH TEXT BOLD UNDERLINE. PICTURE TO APPEAR BELOW</font></u></strong></p><p><strong><u><font color="#cc0099"/></u></strong></p><p><img src="/inlineimages/WorkOrder/6/1245981403232.jpg"/> </p><p /><p>W00T!</p>

Thanks again.
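
On the quoting question, a minimal sketch under some assumptions: the doubling is only needed when the HTML is typed into VB source code as a literal, where a double quote inside the literal is written as two double quotes. A string that arrives at run time (for example from a database query) already contains its quote characters and should not need editing. The module name and the shortened HTML below are made up for the example.

```vbnet
Imports System

Module QuoteEscapeDemo
    Sub Main()
        ' Inside a VB string literal, "" stands for one double-quote character.
        ' In the real program this value would come from the database instead.
        Dim html As String = "<p><img src=""/inlineimages/WorkOrder/6/1245981403232.jpg""/> </p>"

        ' The search text is also a literal, so its double quote is doubled as well.
        Dim result As String = html.Replace("<img src=""/", "<img src=""http://localhost:8080/")

        Console.WriteLine(result)
    End Sub
End Module
```

With this approach the HTML keeps its original double quotes, so no conversion to apostrophes is required.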

#7 erickh

Posted 26 June 2009 - 07:06 AM

Hello,

I think you are using an older version of Visual Studio if you are using the Regex method. I have VS2003 and it doesn't have that method, so I couldn't replicate your code. But the code below works just fine in VS2005 using the String.Replace method (screenshot attached).

Try updating your input variable before assigning it to the DocumentText like I did in the code below.

Good luck!

Erick


Dim input As String = "<p><strong><u><font color= '" & "#cc0099" & "'>RICH TEXT BOLD UNDERLINE. PICTURE TO APPEAR BELOW</font></u></strong></p><p><strong><u><font color= '" & "#cc0099" & "'/></u></strong></p><p><img src='" & "/inlineimages/WorkOrder/6/1245981403232.jpg" & "'/> </p><p /><p>W00T!</p>"
input = input.Replace("<img src='", "<img src= '" & "http://localhost:8080" & "")
'input = input.Replace("<img src='", "<img src= 'http://localhost:8080")   'better way to phrase the statement above
WebBrowser1.DocumentText = input
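
If the stored HTML might use either quote style, one possible variation is a Regex pattern that captures whichever opening quote is present and puts it back. This is only a sketch: AddImageHost is a hypothetical helper name, the sample strings are shortened, and the pattern assumes that src is the first attribute of the img tag and that its value starts with a slash. Note that Regex lives in the System.Text.RegularExpressions namespace, so the Imports line is needed.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexQuoteDemo
    ' Hypothetical helper: prefixes root-relative img src values with a host.
    Function AddImageHost(ByVal html As String, ByVal host As String) As String
        ' Group 1 captures "<img src=", any spaces, and the opening quote (' or ").
        ' The lookahead (?=/) only accepts values that begin with a slash.
        Dim pattern As String = "(<img\s+src\s*=\s*['""])(?=/)"

        ' ${1} re-inserts the captured text; "$" in host is escaped as "$$"
        ' so it is not treated as a substitution token.
        Return Regex.Replace(html, pattern, "${1}" & host.Replace("$", "$$"))
    End Function

    Sub Main()
        Dim doubleQuoted As String = "<p><img src=""/inlineimages/WorkOrder/6/1245981403232.jpg""/> </p>"
        Dim singleQuoted As String = "<p><img src= '/inlineimages/WorkOrder/6/1245981403232.jpg'/> </p>"

        Console.WriteLine(AddImageHost(doubleQuoted, "http://localhost:8080"))
        Console.WriteLine(AddImageHost(singleQuoted, "http://localhost:8080"))
    End Sub
End Module
```

For a single, fixed quote style the plain String.Replace call above is simpler; the Regex version mainly helps when spacing or quoting varies.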

