Reg Expression To Find Broken Hyperlinks

Need Regular Expression To Find Broken Hyperlinks


6 Replies - 1723 Views - Last Post: 25 October 2008 - 04:32 PM

#1 HowdeeDoodee

Posted 24 October 2008 - 10:36 AM

I need a macro, perhaps using a regular expression, to find hyperlinks that are broken or have improper syntax. Put another way, I want to find hyperlinks that do not match the following patterns. In the first two hyperlinks below, only the number changes from record to record (39762 in the first, 39764 in the second). Everything else in the hyperlink is constant from one record to the next.
<a href="http://www.findthepower.net/CP/CommentaryProject/PostNewABC2_PrevNext.php?BRN=ON&PrevNextNum=39762">PREVIOUS</a>

<a href="http://www.findthepower.net/CP/CommentaryProject/PostNewABC2_PrevNext.php?BRN=ON&PrevNextNum=39764">NEXT</a>


In the hyperlink below, the only part that changes from record to record is Rev 21:6. Everything else in the hyperlink is constant from one record to the next.


<U><a href="http://www.findthepower.net/CP/CommentaryProject/PostNewABC2_R.php?BRN=ON&SeeAlso=Rev 21:6">Rev 21:6</a></U>




If a hyperlink does not match the above patterns, I would like the macro either to insert three left brackets ([[[) so I can find the improper hyperlink after the macro has run, or to stop at the point in the text file where the error occurs.

Here is a Word macro that finds some, but not all, of the errors in the URL syntax.


Sub FindBadLinks()
  Const Pattern1 = "<a href=""http://www.findthepower.net/CP/CommentaryProject/PostNewABC2_PrevNext.php?BRN=ON&PrevNextNum=*"">PREVIOUS</a>"
  Const Pattern2 = "<a href=""http://www.findthepower.net/CP/CommentaryProject/PostNewABC2_PrevNext.php?BRN=ON&PrevNextNum=*"">NEXT</a>"
  Const Pattern3 = "<U><a href=""http://www.findthepower.net/CP/CommentaryProject/PostNewABC2_R.php?BRN=ON&SeeAlso=*"">*</a></U>"
  Dim lngStart As Long
  Dim lngEnd As Long
  Dim strText As String
  Dim intFlag As Integer
  Application.ScreenUpdating = False
  Selection.HomeKey Unit:=wdStory
  With Selection.Find
	.ClearFormatting
	.MatchCase = False
	.MatchWholeWord = False
	.MatchWildcards = False
	.Forward = True
	.Wrap = wdFindStop
	Do While .Execute(FindText:="<a href=") = True
	  lngStart = Selection.Start
	  lngEnd = Selection.End
	  Selection.Collapse Direction:=wdCollapseEnd
	  If .Execute(FindText:="</a>") = False Then
		' No matching end tag
		Activedocument.Range(Start:=lngStart, End:=lngEnd).Select
		Selection.TypeText Text:="[[["
		Activedocument.Range(Start:=lngEnd + 3, End:=lngEnd + 3).Select
	  Else
		' Found end tag
		intFlag = 0
		lngEnd = Selection.End
		strText = Activedocument.Range(Start:=lngStart, End:=lngEnd).Text
		If strText Like Pattern1 Then
		  intFlag = -1
		ElseIf strText Like Pattern2 Then
		  intFlag = -1
		Else
		  ' Include Underline tags
		  strText = Activedocument.Range(Start:=lngStart - 3, End:=lngEnd + 4).Text
		  If strText Like Pattern3 Then
			intFlag = -1
		  Else
			intFlag = 3
		  End If
		End If
		If intFlag >= 0 Then
		  Activedocument.Range(Start:=lngStart, End:=lngStart).Select
		  Selection.TypeText Text:="[[["
		End If
		Activedocument.Range(Start:=lngEnd, End:=lngEnd).Select
	  End If
	Loop
  End With
  Application.ScreenUpdating = True
End Sub



Replies To: Reg Expression To Find Broken Hyperlinks

#2 brds

Posted 24 October 2008 - 05:52 PM

Check your docs for Like.

Sub check()
	Dim a As String
	a = "this is a test"
	If a Like "*is*" Then
		MsgBox "Correct", vbOKOnly, "Yeah!"
	End If
End Sub


When it encounters a hyperlink that doesn't match, exit the Sub.
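
To make the Like suggestion concrete, here is a minimal sketch rather than a finished solution. In a Like pattern, # matches exactly one digit and [?] matches a literal question mark, whereas a bare ? matches any single character and * matches any run of characters. The function name IsGoodLink is hypothetical, and the sketch assumes the PrevNextNum value always has five digits, as in the two sample links.

```vba
Option Explicit

' Hypothetical helper: True when strLink matches one of the three expected link shapes.
' Assumes PrevNextNum is always five digits, as in the sample links.
Function IsGoodLink(ByVal strLink As String) As Boolean
    Const Base As String = "http://www.findthepower.net/CP/CommentaryProject/"
    Dim strPrevNext As String
    Dim strSeeAlso As String

    ' "#" = one digit, "[?]" = a literal question mark (a bare "?" matches any character).
    strPrevNext = "<a href=""" & Base & "PostNewABC2_PrevNext.php[?]BRN=ON&PrevNextNum=#####"">"
    ' "*" = any run of characters; the reference text is not constrained further here.
    strSeeAlso = "<U><a href=""" & Base & "PostNewABC2_R.php[?]BRN=ON&SeeAlso=*"">*</a></U>"

    IsGoodLink = (strLink Like strPrevNext & "PREVIOUS</a>") _
              Or (strLink Like strPrevNext & "NEXT</a>") _
              Or (strLink Like strSeeAlso)
End Function
```

Because * also matches quotes and angle brackets, the SeeAlso pattern is still loose; a string that runs two links together could slip through it.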

This post has been edited by brds: 24 October 2008 - 05:53 PM


#3 HowdeeDoodee

Posted 25 October 2008 - 02:52 AM

Thank you for the reply. Can someone provide me with a regular expression to find a hyperlink? If I could get a regular expression going, I think I could adapt that expression to do what I need done. Thank you again.

#4 AdamSpeight2008

Posted 25 October 2008 - 03:20 AM

The regular expression would be
<a href="http://www\.findthepower\.net/CP/CommentaryProject/PostNewABC2_PrevNext\.php\?BRN=ON&PrevNextNum=\d{5,5}\">
for the first, and I think you can figure out the second.
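
As a rough sketch of how a pattern like this could be run from VBA: Word's own Find box does not understand regular expression syntax, but the VBScript.RegExp object (available on Windows) does. The function name IsPrevNextLink is hypothetical, and the five-digit rule is an assumption taken from the sample links. The pattern is extended to cover the link text and closing tag, and it drops the backslash before the quote, which is not needed.

```vba
Option Explicit

' Hypothetical helper: True when strLink is a well-formed PREVIOUS or NEXT link.
' Uses late binding, so no library reference has to be set in the VBA editor.
Function IsPrevNextLink(ByVal strLink As String) As Boolean
    Dim objRegex As Object
    Set objRegex = CreateObject("VBScript.RegExp")
    With objRegex
        .IgnoreCase = False
        ' ^ and $ anchor the pattern so the whole string must match.
        ' \d{5} is the same as \d{5,5}: exactly five digits (an assumption from the samples).
        .Pattern = "^<a href=""http://www\.findthepower\.net/CP/CommentaryProject/" & _
                   "PostNewABC2_PrevNext\.php\?BRN=ON&PrevNextNum=\d{5}"">" & _
                   "(PREVIOUS|NEXT)</a>$"
        IsPrevNextLink = .Test(strLink)
    End With
End Function
```

Each link's text, from <a href= through </a>, would be passed to the function, and any link for which it returns False would get the [[[ marker.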

#5 HowdeeDoodee

Posted 25 October 2008 - 05:40 AM

Quote

The regular expression would be
<a href="http://www\.findthepower\.net/CP/CommentaryProject/PostNewABC2_PrevNext\.php\?BRN=ON&PrevNextNum=\d{5,5}\">
for the first and think you can figure out the second.


Thank you very much. I am using MS Word to execute the regex.

I used your suggestion with a slight alteration as shown below. I could not get the \d{5,5}\ to work.

Quote

<a href="http://www\.findthepower\.net/CP/CommentaryProject/PostNewABC2_PrevNext\.php\?BRN=ON&PrevNextNum=?????\"\>PREVIOUS\<\/a\>


Thank you again.
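
A likely reason \d{5,5} fails here is that Word's "Use wildcards" search is not a regular expression engine and has no \d. The sketch below shows what seems to be the closest wildcard equivalent, assuming the number is always five digits: [0-9]{5} accepts only digits, where ????? accepts any five characters. The function name SelectNextGoodPreviousLink is hypothetical.

```vba
Option Explicit

' Hypothetical helper: selects the next well-formed PREVIOUS link after the
' cursor and returns True, or returns False when none is left.
Function SelectNextGoodPreviousLink() As Boolean
    With Selection.Find
        .ClearFormatting
        .Forward = True
        .Wrap = wdFindStop
        .MatchWildcards = True
        ' In wildcard mode < > and ? are special, so each is escaped with "\".
        ' A plain "." is literal here and needs no escape.
        ' [0-9]{5} means exactly five digits.
        .Text = "\<a href=""http://www.findthepower.net/CP/CommentaryProject/" & _
                "PostNewABC2_PrevNext.php\?BRN=ON&PrevNextNum=[0-9]{5}""\>PREVIOUS\</a\>"
        SelectNextGoodPreviousLink = .Execute
    End With
End Function
```

This finds links that are well formed. Flagging the ones that are not still needs a loop over every <a href= in the document, as in the macro from the first post.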

#6 thava

Posted 25 October 2008 - 03:26 PM

One thing I found in your code:

When <a href is found, the macro searches for the next </a> in the document. If the current link has no end tag but another hyperlink follows it, the search moves on to that next hyperlink's end position, which is why no text gets changed. That is the logic error in your code; avoid it and the macro should work correctly.

Best of luck :)
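
One way to avoid that error, sketched here on a plain string rather than on the Selection: look for the next </a> and the next <a href= after the current opening tag, and treat the link as unclosed when the opening tag comes first. The function name LinkEnd is hypothetical.

```vba
Option Explicit

' Hypothetical helper: returns the position just past the "</a>" that closes the
' link starting at lngOpen (1-based, as InStr reports it), or 0 when the link is
' unclosed, i.e. another "<a href=" or the end of the text comes first.
Function LinkEnd(ByVal strText As String, ByVal lngOpen As Long) As Long
    Const OpenTag As String = "<a href="
    Const CloseTag As String = "</a>"
    Dim lngClose As Long
    Dim lngNextOpen As Long

    ' Both searches start just after the current opening tag.
    lngClose = InStr(lngOpen + Len(OpenTag), strText, CloseTag, vbTextCompare)
    lngNextOpen = InStr(lngOpen + Len(OpenTag), strText, OpenTag, vbTextCompare)

    If lngClose = 0 Then
        LinkEnd = 0                      ' no end tag anywhere after this link
    ElseIf lngNextOpen > 0 And lngNextOpen < lngClose Then
        LinkEnd = 0                      ' the end tag belongs to a later link
    Else
        LinkEnd = lngClose + Len(CloseTag)
    End If
End Function
```

Note that InStr positions are 1-based while Range.Start in Word is 0-based, so an offset is needed if these positions are used to build a Range.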

#7 AdamSpeight2008

Posted 25 October 2008 - 04:32 PM

\d{5,5}
\d is a digit.
The first 5 is the minimum number of digits.
The second 5 is the maximum number of digits.
