4 Replies - 6296 Views - Last Post: 12 June 2012 - 11:49 AM

#1 roob

web crawler

Posted 16 March 2012 - 09:56 AM

Hi, I'm wondering if anyone could help me out with this really basic web crawler and point out any errors.
Here it is:
<?php
set_time_limit(9999999999999999);

$url = 'http://www.robertflynn.co.uk';
$page = file_get_contents($url);
if (preg_match("www.",$page,$match)) {
    print "$match[1]";
}

?>



thanks :bigsmile:


Replies To: web crawler

#2 Atli

Re: web crawler

Posted 16 March 2012 - 01:04 PM

Hey.

Well, that regular expression is clearly not going to work. It's missing its delimiters, and the expression itself doesn't really make any sense. Do you actually know how to use regular expressions? If not, you should go through a tutorial or two before attempting to use them. They aren't a simple tool.
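
For reference, here is a rough sketch of what a delimited pattern looks like. PHP's preg functions expect the pattern to be wrapped in a non-alphanumeric delimiter, and `$match[1]` only exists if the pattern contains a capture group. The sample `$page` text and the pattern itself are made up for illustration; this is not a robust way to find URLs.

```php
<?php
// Sample text standing in for a downloaded page (an assumption for this sketch).
$page = 'Visit www.example.com or www.example.org today.';

// The pattern is wrapped in delimiters (# ... #) and followed by the "i" flag.
// The parentheses form capture group 1, which is what $match[1] refers to.
// $match[0] always holds the full match.
if (preg_match('#(www\.[a-z0-9.-]+)#i', $page, $match)) {
    print $match[1] . "\n";
}
```

Note that preg_match stops at the first match; preg_match_all would be needed to collect every occurrence.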

However, in this case regular expressions aren't the only way, and in fact they may not be the best way. Some argue, vehemently, that HTML should not be parsed with regular expressions, and I would have to agree with that. Instead, use an actual HTML parser, like the DOMDocument class. It allows you to interact with the HTML page much like you'd do with the DOM in JavaScript.

You would be especially interested in the DOMDocument::loadHTML and DOMDocument::getElementsByTagName methods.
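
A minimal sketch of that approach, assuming the goal is to list the links on a page. The helper name `extractLinks` and the sample HTML are hypothetical; in the original script the HTML would come from `file_get_contents($url)`.

```php
<?php
// Hypothetical helper: returns the href of every <a> element in an HTML string.
function extractLinks(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so collect parser warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // getAttribute returns an empty string when the attribute is missing.
        $href = $anchor->getAttribute('href');
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return $links;
}

// Sample markup standing in for a downloaded page (an assumption for this sketch).
$html = '<html><body>'
      . '<a href="http://www.example.com/">One</a>'
      . '<a href="/about">Two</a>'
      . '<a name="top">No href</a>'
      . '</body></html>';

print_r(extractLinks($html));
```

A real crawler would still need to resolve relative links such as `/about` against the page URL and keep track of pages it has already visited; neither is shown here.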

#3 creativecoding

Re: web crawler

Posted 16 March 2012 - 04:06 PM

Also, for set_time_limit: if you want no time limit, you can just pass 0.
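
As a quick sketch of that suggestion (the read-back line is only there for illustration):

```php
<?php
// Passing 0 removes the execution time limit for the rest of the script.
set_time_limit(0);

// The current limit can be read back from the max_execution_time setting.
print ini_get('max_execution_time') . "\n";
```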

#4 Bituser

Re: web crawler

Posted 31 March 2012 - 03:24 AM

Hi there,

I didn't have a clue about regular expressions until I read this guide: PHP Freaks. Hopefully you find it as helpful as I did.

This post has been edited by JackOfAllTrades: 31 March 2012 - 03:58 AM
Reason for edit:: Changed link to bit.ly, as the board software screws up the word "expression" in markup.


#5 roob

Re: web crawler

Posted 12 June 2012 - 11:49 AM

In reply to Atli's post of 16 March 2012 - 01:04 PM:

Thank you, this was very helpful :sorcerer:

Thank you, people.
