extract values from string

extracting a string from between two other strings


4 Replies - 3046 Views - Last Post: 12 July 2009 - 10:06 AM

#1 g0ofygoober

Posted 10 July 2009 - 05:05 PM

Hi,

I am making a YouTube downloader script, but I am having trouble getting the values from the YouTube page source. To create a download URL you need the video's id and the video's signature code.

I have written this code, which uses the text before and after each value I want to find in order to extract it, but it just isn't working and I can't tell why.

<?php
$url = $_GET['url'];
$source = file_get_contents($url);
$source = htmlentities($source);

function extractinfo ($string, $start, $end) {

$startpos = strpos($string, $start);
$startlen = strlen($start);

$endpos = strpos($string, $end);
$endlen = strlen($end);

return substr($string, $startpos+$startlen, ($endpos-1)-($startpos));
}

$video_id = extractinfo($source, "ajax?v=", "&action_get_statistics_and_data=");

$video_sig = extractinfo($source, "&t=", "&keywords=");

echo "id= " . $video_id;

echo "<br />";

echo "sig= " . $video_sig;


?>


Any help would be really appreciated; these strings are really confusing me.


Replies To: extract values from string

#2 KuroTsuto

Posted 10 July 2009 - 10:32 PM

It seems to me that your $video_id can be taken from the URL alone. For instance, in the URL 'http://www.youtube.com/watch?v=MLUJdpDfXZA', I would think that your id is MLUJdpDfXZA, as this value matches what shows up between "ajax?v=" and "&action_get_statistics_and_data=". Though this is a URL, you unfortunately cannot simply read the id via $_GET['v'], because the entire YouTube URL is itself contained within a single $_GET variable. Not knowing how long the id will be, nor where it will start and stop in the string provided by the user, I personally would use a regular expression to extract it from the $url variable (though I can't help but feel there's an easier way...).

Additionally, your code seems to assume that the data you're extracting always appears in a fixed layout (everything between "&t=" and "&keywords="), which is not always the case. For more reliable extraction, I would recommend that you take a look at regular expressions, which can cope with not knowing exactly where the info you're looking for is going to pop up.
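One candidate for that easier way, sketched under the assumption that the user passes a plain watch URL with the id in its v parameter: PHP's built-in parse_url() and parse_str() can split the query string without a regular expression. The helper name youtube_id_from_url is made up for this example.

```php
<?php
// Hypothetical helper: pull the "v" parameter out of a watch URL.
function youtube_id_from_url($url) {
    // With PHP_URL_QUERY, parse_url() returns only the query string,
    // e.g. "v=MLUJdpDfXZA&feature=related"
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null; // no query string, or the URL could not be parsed
    }
    // parse_str() decodes the query string into an associative array
    parse_str($query, $params);
    return (isset($params['v']) && is_string($params['v'])) ? $params['v'] : null;
}

echo youtube_id_from_url('http://www.youtube.com/watch?v=MLUJdpDfXZA');
```

This avoids having to wrap the URL in quotes, though it does nothing for the signature, which still has to come out of the page source.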

If you haven't looked at them before, they can be crazy confusing, but they're totally worth taking the time to learn. In your scenario, I believe you could snag your $video_sig and $video_id with patterns like these:

//video_sig from source
preg_match("/[&|\?]t=([^&|^']*)[&|']/", $source, $match);
$video_sig = $match[1];

//video_id from url (assumes the URL is wrapped in single quotes)
preg_match("/[&|\?]v=([^&|^']*)[&|']/", $url, $match);
$video_id = $match[1];



Note that I'm no expert at regular expressions, and what I just threw you is probably in terrible form or some such. The expression "/[&|\?]t=([^&|^']*)[&|']/" basically translates to this: look for a section of $source where either a '&' or a '?' is followed by 't=', which should be followed by (any number of characters that are NOT a '&' and are NOT a '''), which should be followed by either a '&' or a '''. Everything that matches inside the first set of parentheses is stored under key 1 of the array $match. (Inside square brackets a '|' is a literal character rather than "or", and only a leading '^' negates, so these classes also accept or exclude a literal '|' and '^'; that is harmless here.)

The expression for $video_id works in a similar manner, but it assumes that the YouTube URL is surrounded with single quotes (') in the address bar (i.e. your address bar would be http://www.mysite.com/index.php?url='h...LUJdpDfXZA'), which avoids conflicts with the other variables being passed as GET variables and keeps the full YouTube URL intact.
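To see what ends up in $match, here is a small sketch that runs the signature pattern against a made-up string. The sample text and the value EXAMPLE_SIG are invented for illustration; real page source will look different.

```php
<?php
// Made-up sample text standing in for a fragment of the page source.
$sample = "video_id=MLUJdpDfXZA&t=EXAMPLE_SIG&keywords=demo";

if (preg_match("/[&|\?]t=([^&|^']*)[&|']/", $sample, $match)) {
    // $match[0] holds the whole matched text, delimiters included
    echo $match[0] . "\n";
    // $match[1] holds only what the first pair of parentheses captured
    echo $match[1] . "\n";
}
```

Only $match[1] is the value of interest; $match[0] still carries the leading "&t=" and the trailing "&".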

And voila. Delicious. So your modified code would be something like this, I'd imagine:

<?php
$url = $_GET['url'];
$source = file_get_contents($url);
$source = htmlentities($source);

//video_sig from source
preg_match("/[&|\?]t=([^&|^']*)[&|']/", $source, $match);
$video_sig = $match[1];
	
//video_id from url
preg_match("/[&|\?]v=([^&|^']*)[&|']/", $url, $match);
$video_id = $match[1];

echo "id= " . $video_id;

echo "<br />";

echo "sig= " . $video_sig;
?>


So yeah, regular expressions are great... Learn them well and such. Note that preg_match() returns 1 when the pattern matches and 0 when it doesn't (or false on error), so it would be wise to use this to notify the end-user if no video id or sig was located, or some such.
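A minimal sketch of that check, assuming the quoted-URL convention described above. The function name find_video_id is hypothetical and only wraps the id pattern from this post.

```php
<?php
// Hypothetical helper wrapping the id pattern from the post above.
function find_video_id($url_string) {
    // preg_match() returns 1 on a match, 0 on no match, false on error
    if (preg_match("/[&|\?]v=([^&|^']*)[&|']/", $url_string, $match) === 1) {
        return $match[1];
    }
    return null;
}

$video_id = find_video_id("'http://www.youtube.com/watch?v=MLUJdpDfXZA'");
if ($video_id === null) {
    echo "No video id found in that URL.";
} else {
    echo "id= " . $video_id;
}
```

Because the pattern requires a closing '&' or quote after the id, the same URL passed without the surrounding single quotes yields null here, which is exactly the case the end-user should be told about.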

Cheerios,
~KuroTsuto

This post has been edited by KuroTsuto: 10 July 2009 - 10:33 PM


#3 KuroTsuto

Posted 10 July 2009 - 10:59 PM

Just tested my own code, lol. A few modifications:

<?php
$url_string = $_GET['url'];
$url = str_replace('\'','',$url_string);
$source = file_get_contents($url);

//video_sig from source
echo preg_match("/&t=([^&|^']*)[&|']/", $source, $match);
$video_sig = $match[1];
   
//video_id from url
preg_match("/[&|\?]v=([^&|^']*)[&|']/", $url_string, $match);
$video_id = $match[1];

echo "id= " . $video_id;

echo "<br />";

echo "sig= " . $video_sig;
?>


Using the htmlentities() function to encode the source was swapping the '&'s in $source with '&amp;'s, which, now that I think about it, was probably the real issue with your own code in the first place. In both my modifications and your original code, this was preventing the string and preg functions from matching anything in $source, as we were both looking for & instead of &amp;.

As for the rest of my modifications, I also added the $url_string variable. The only difference between $url and $url_string is that $url_string still includes the single quotes around the YouTube URL. Should all work tastily now, if you choose to use these regular expressions.
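The effect is easy to reproduce in isolation. This sketch uses a made-up fragment in place of the fetched page source and searches for the same end marker as the original code.

```php
<?php
// Made-up fragment standing in for the fetched page source.
$raw = "ajax?v=MLUJdpDfXZA&action_get_statistics_and_data=1";
$encoded = htmlentities($raw);

// Every "&" has become "&amp;" in the encoded copy
echo $encoded . "\n";

// Found in the raw text: strpos() returns the integer position
var_dump(strpos($raw, "&action_get_statistics_and_data="));

// Not found in the encoded text: strpos() returns false
var_dump(strpos($encoded, "&action_get_statistics_and_data="));
```

If the markup needs escaping for display, it is safer to do that at the echo stage and keep the searches running on the raw string.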

Goodnight, and all that jazz,
~KuroTsuto

#4 g0ofygoober

Posted 11 July 2009 - 03:10 PM

Thanks a lot for your help. My original code for getting the id and sig is working now, and I will definitely learn more about regular expressions. Unfortunately my script still isn't working: I have got the id and sig and use them to make a link that should lead to the video file, but when you click the link it just leads to a blank page. I have worked out that the problem is with the video signature, because it works when you replace the signature grabbed by the code with a signature taken straight from the page using view source. I'm not sure if this is caused by using htmlentities, or if it's something to do with the way YouTube works, or something else completely.
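One thing that could be checked, offered as a guess rather than a diagnosis: in the original extractinfo() the length passed to substr() is ($endpos-1)-($startpos), which never subtracts the length of the start marker, so the result runs strlen($start)-1 characters past the end marker. For "&t=" that would append two extra characters to the signature. The sketch below computes the length from the end of the start marker instead; extract_between is a made-up name and the sample string is invented.

```php
<?php
// Sketch of a corrected helper; extract_between is a hypothetical name.
function extract_between($string, $start, $end) {
    $startpos = strpos($string, $start);
    if ($startpos === false) {
        return null; // start marker not present
    }
    // First character after the start marker
    $from = $startpos + strlen($start);
    // Look for the end marker only after the start marker
    $endpos = strpos($string, $end, $from);
    if ($endpos === false) {
        return null; // end marker not present after the start marker
    }
    return substr($string, $from, $endpos - $from);
}

// Made-up sample; the real page source will differ.
$sample = "ajax?v=MLUJdpDfXZA&action_get_statistics_and_data=1&t=EXAMPLE_SIG&keywords=demo";
echo "id= " . extract_between($sample, "ajax?v=", "&action_get_statistics_and_data=") . "\n";
echo "sig= " . extract_between($sample, "&t=", "&keywords=") . "\n";
```

Comparing the extracted signature character by character with the one copied from view source would show whether stray trailing characters or encoded entities are the difference.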

This post has been edited by g0ofygoober: 11 July 2009 - 03:10 PM


#5 KuroTsuto

Posted 12 July 2009 - 10:06 AM

Sure thing, man. If you want any help working out the rest of this thing, I would be happy to help. I also promise I won't keep throwing down completely obscure regular expressions ;)