2 Replies - 15891 Views - Last Post: 24 March 2009 - 02:56 PM

#1 jayfella

Get string between 2 values

Posted 24 March 2009 - 02:07 PM

Hi.

I have a public function that I use quite a lot in the program I'm making. It returns the part of a string that lies between two values:

Public Function GetStringBetween(ByVal InputText As String, _
	ByVal starttext As String, _
	ByVal endtext As String, _
	Optional ByVal StartPosition As Integer = 1) As String

		' InStr is 1-based and returns 0 when the text is not found.
		Dim lnTextStart As Integer
		Dim lnTextEnd As Integer

		lnTextStart = InStr(StartPosition, InputText, starttext, vbTextCompare) + Len(starttext)
		lnTextEnd = InStr(lnTextStart, InputText, endtext, vbTextCompare)
		If lnTextStart >= (StartPosition + Len(starttext)) AndAlso lnTextEnd > lnTextStart Then
			GetStringBetween = Mid$(InputText, lnTextStart, lnTextEnd - lnTextStart)
		Else
			GetStringBetween = "ERROR"
		End If
	End Function



So basically, if I do this:

Dim Testpage as string = "hello my name is jayfella and i am seeking help"
Dim MyOutput as string

MyOutput = getstringbetween(testpage, "name", "seeking")


MyOutput will contain: " is jayfella and i am " (including the space on either side)

Which is exactly what I want to do, but this is a very lame and slow way to go about it because it uses very old methods. The function takes long enough that it interferes with the timing of my program: GetStringBetween does not finish fast enough, so I have to insert a Thread.Sleep (sometimes for over 1500 ms!) just for the program to wait for it. That is my problem.

I understand VB.NET has vastly improved its number-crunching and searching abilities, but I'm struggling to figure out how to do it with as little coding as possible.

I have tried this:
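Public Function midReturn(ByVal total As String, ByVal first As String, ByVal last As String) As String
		Try
			' No end marker: return everything from the start marker onwards.
			If last.Length < 1 Then
				Return total.Substring(total.IndexOf(first))
			End If
			' No start marker: return everything before the end marker.
			If first.Length < 1 Then
				Return total.Substring(0, total.IndexOf(last))
			End If
			' Note: Replace also strips any copies of the markers inside the result.
			Return ((total.Substring(total.IndexOf(first), (total.IndexOf(last) - total.IndexOf(first)))).Replace(first, "")).Replace(last, "")
		Catch ex As ArgumentOutOfRangeException
			' IndexOf returned -1 (marker not found) or the markers are in the wrong order.
			Return "ERROR"
		End Try
	End Function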



Which pretty much does exactly the same thing, except it doesn't seem to work for me. The text I search through is HTML, so it can contain pretty much any character in the universe, and that function doesn't cope very well with that (I think it's because of the use of "IndexOf") :S

Does anybody have a simple and efficient way of doing this, without the "special character" complications?


Many thanks.


Replies To: Get string between 2 values

#2 krum110487

Re: Get string between 2 values

Posted 24 March 2009 - 02:42 PM

Well, the problem is that either way you do it, you are basically searching the HTML sequentially.

One speed-up you can try is, instead of sending ByVal, send ByRef.

The idea is that this will only send the starting memory cell instead of the entire string.

But beware: if you have never used ByRef, assigning to the parameter inside the function changes the ORIGINAL variable in the caller, so don't set that parameter equal to anything (unless you want the original to change). You can avoid this with a "const."
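A minimal sketch of the ByRef warning above. The module and the two helper names (ChangeByVal, ChangeByRef) are made up for this illustration. It only shows how assignment to the parameter behaves and makes no claim about speed.

```vbnet
Imports System

Module ByRefDemo
    ' Hypothetical helper: assigns to a ByVal parameter.
    Sub ChangeByVal(ByVal value As String)
        value = "changed"   ' Rebinds the local parameter only.
    End Sub

    ' Hypothetical helper: assigns to a ByRef parameter.
    Sub ChangeByRef(ByRef value As String)
        value = "changed"   ' Rebinds the caller's variable as well.
    End Sub

    Sub Main()
        Dim page As String = "original"

        ChangeByVal(page)
        Console.WriteLine(page)   ' prints: original

        ChangeByRef(page)
        Console.WriteLine(page)   ' prints: changed
    End Sub
End Module
```

So as long as the function never assigns to the ByRef parameter, the caller's string is left alone.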

Passing ByRef should clear up some overhead, but when it comes down to it, HTML has to be read sequentially one way or another (IndexOf is a sequential search).

If you are trying to make a program that parses/creates a website from the HTML, then I would make a loop that goes through each word one after another. That would be the fastest way, as far as I know at least!

If you want just PART of a website, then the way you are doing it should work fine.

Something like:
Public Function midReturn(ByRef total As String, ByVal first As String, ByVal last As String)
		Dim FirstStart as Short = total.indexOf(first) + first.length
		Return Trim(Mid$(total, FirstStart, total.indexOf(last)) - FirstStart)		



So you take ALL of the text from the END of the first match to the BEGINNING of the last match, then Trim.

One way or another you will have to search an HTML page sequentially, because it will change. You can reduce the cost with a cache and hash values.

So if the hash value has not changed since last time, use the cache (which could be pre-parsed and ready to go!).
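A rough sketch of the cache-and-hash idea, under some assumptions: the names HashOf, Extract and ExtractCached are hypothetical, SHA-256 is just one possible choice of hash, and the cache is a plain in-memory Dictionary. Keep in mind that hashing the page is itself one sequential pass over it, so this is only likely to pay off when the parsing step costs more than the hash.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Security.Cryptography
Imports System.Text

Module PageCacheDemo
    ' Hypothetical cache: (page hash + markers) -> previously extracted text.
    Private ReadOnly cache As New Dictionary(Of String, String)()

    ' Hash the page contents so that an unchanged page maps to the same key.
    Function HashOf(ByVal page As String) As String
        Using sha As SHA256 = SHA256.Create()
            Return Convert.ToBase64String(sha.ComputeHash(Encoding.UTF8.GetBytes(page)))
        End Using
    End Function

    ' Plain extraction; returns Nothing when either marker is missing.
    Function Extract(ByVal page As String, ByVal first As String, ByVal last As String) As String
        Dim startIndex As Integer = page.IndexOf(first, StringComparison.Ordinal)
        If startIndex < 0 Then Return Nothing
        startIndex += first.Length
        Dim endIndex As Integer = page.IndexOf(last, startIndex, StringComparison.Ordinal)
        If endIndex < 0 Then Return Nothing
        Return page.Substring(startIndex, endIndex - startIndex)
    End Function

    ' Look the result up by hash first and only parse on a cache miss.
    Function ExtractCached(ByVal page As String, ByVal first As String, ByVal last As String) As String
        Dim key As String = HashOf(page) & "|" & first & "|" & last
        Dim result As String = Nothing
        If cache.TryGetValue(key, result) Then Return result

        result = Extract(page, first, last)
        cache(key) = result
        Return result
    End Function

    Sub Main()
        Dim page As String = "<p>before</p><textarea>cached text</textarea>"
        Console.WriteLine(ExtractCached(page, "<textarea>", "</textarea>"))   ' parsed on the first call
        Console.WriteLine(ExtractCached(page, "<textarea>", "</textarea>"))   ' served from the cache
    End Sub
End Module
```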

EDIT:

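Public Function midReturn(ByRef total As String, ByVal first As String, ByVal last As String) As String
		' Integer rather than Short: HTML pages are often longer than 32,767 characters.
		Dim FirstStart As Integer = total.IndexOf(first) + first.Length
		' IndexOf is 0-based, so use the 0-based Substring rather than the 1-based Mid$.
		Return Trim(total.Substring(FirstStart, total.Substring(FirstStart).IndexOf(last)))
	End Function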



I had an error in my previous code; this should fix it!

So there are only 2 sequential searches within this code, and the second is smaller than the first. For the end value, I use a substring of the total, from the end of the first value to the end of the document.

Then I did an IndexOf on that substring to find the last value. This is the fastest possible way I can think of at the moment.
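As a possible variation on the two-search approach above, here is a sketch that uses the IndexOf overload taking a start index, so the second search resumes where the first one ended without building an intermediate substring. GetBetween is a hypothetical name, and returning Nothing for a missing marker is just one convention. StringComparison.Ordinal compares raw character values rather than applying culture rules, which may also help with pages full of unusual characters.

```vbnet
Imports System

Module BetweenDemo
    ' Hypothetical helper; returns Nothing when either marker is missing.
    Function GetBetween(ByVal source As String, ByVal first As String, ByVal last As String) As String
        ' First sequential search: locate the start marker.
        Dim startIndex As Integer = source.IndexOf(first, StringComparison.Ordinal)
        If startIndex < 0 Then Return Nothing
        startIndex += first.Length

        ' Second search: resume from the end of the start marker.
        Dim endIndex As Integer = source.IndexOf(last, startIndex, StringComparison.Ordinal)
        If endIndex < 0 Then Return Nothing

        Return source.Substring(startIndex, endIndex - startIndex)
    End Function

    Sub Main()
        Dim page As String = "hello my name is jayfella and i am seeking help"
        ' Brackets make the surrounding spaces visible.
        Console.WriteLine("[" & GetBetween(page, "name", "seeking") & "]")   ' prints: [ is jayfella and i am ]
    End Sub
End Module
```

Call Trim on the result if the surrounding whitespace is not wanted.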

This post has been edited by krum110487: 24 March 2009 - 02:54 PM


#3 jayfella

Re: Get string between 2 values

Posted 24 March 2009 - 02:56 PM

Well, what I am doing is taking the HTML source of a page and extracting certain parts of it, e.g.

take the part of an HTML page between the <textarea> and </textarea> tags.
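For the <textarea> case specifically, one option is a regular expression, which also copes with attributes on the opening tag. This is only a sketch: GetTextAreaContent is a hypothetical name, the sample HTML is made up, and a regular expression will not handle every page that a real HTML parser would.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TextAreaDemo
    ' Hypothetical helper: returns the contents of the first <textarea>,
    ' or Nothing when the page has none.
    Function GetTextAreaContent(ByVal html As String) As String
        ' Opening tag with optional attributes, lazily captured contents, closing tag.
        ' Singleline lets "." match line breaks; IgnoreCase also accepts <TEXTAREA>.
        Dim m As Match = Regex.Match(html, "<textarea\b[^>]*>(.*?)</textarea>", _
            RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        If m.Success Then Return m.Groups(1).Value
        Return Nothing
    End Function

    Sub Main()
        Dim html As String = "<form><textarea name=""msg"" rows=""3"">first line" & _
            Environment.NewLine & "second line</textarea></form>"
        ' Prints the two lines found between the tags.
        Console.WriteLine(GetTextAreaContent(html))
    End Sub
End Module
```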

I understand HTML pages are huge, so there will be a slight performance hit because of this. I'll keep meddling around with it, and let you know how I get on.

Thanks for your advice, much appreciated.

EDIT: the ByRef change managed to speed up my GetStringBetween function, to the point where I didn't need to add the Thread.Sleep anymore, so in that respect, it's a success :D

thanks man.

This post has been edited by jayfella: 24 March 2009 - 03:11 PM
