
#1 squibby

Speed up query time in scraper script (php)

Posted 13 March 2013 - 12:56 PM

I have written the following code to scrape information from Yahoo Local business listings and return it to me in a table and a CSV file. Because the results are paginated on the site (only 10 results are returned per page), I have to query the site with my search terms, plugging them into the URL along with a variable that sets the start page.
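To make the URL scheme described above concrete, here is a minimal sketch of how one page URL is assembled. The function name `build_search_url` is hypothetical, and the use of `rawurlencode` is an added assumption so that spaces in the search terms do not break the path; the `search-16342.html?fr=sfp&cb=` part is copied from the posted script.

```php
<?php
// Hypothetical helper: builds the listing URL for one results page.
// $startfrom is the value passed in the "cb" query parameter.
function build_search_url($area, $industry, $startfrom)
{
    return 'http://uk.local.yahoo.com/'
        . rawurlencode($area) . '/'
        . rawurlencode($industry)
        . '/search-16342.html?fr=sfp&cb=' . (int) $startfrom;
}

// Prints http://uk.local.yahoo.com/London/schools/search-16342.html?fr=sfp&cb=0
echo build_search_url('London', 'schools', 0), "\n";
```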

This is OK for result sets of maybe 20 or 30. If I have a large result set, e.g. all schools in London, there is so much data that the script has to loop through many pages of results and it times out.

Is there a way to send out just one query and bypass all the looping?

For example, if a query returns 1960 results, my script would need to loop 196 times, requesting the data for each page. This is really inefficient.
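A minimal sketch of that arithmetic, assuming 10 results per page as stated above. The function names `pages_needed` and `start_offsets` are made up for illustration, and the offsets simply mirror the increment rule in the posted script (0 first, then +11, then +10); whether the site really expects those values is not verified here.

```php
<?php
// Hypothetical helper: number of result pages for a given result count.
function pages_needed($total_results, $per_page = 10)
{
    return (int) ceil($total_results / $per_page);
}

// Hypothetical helper: start values the script would request, following
// its own increment rule (0 first, then +11, then +10 for each later page).
function start_offsets($pages)
{
    $offsets = array();
    $startfrom = 0;
    for ($i = 1; $i <= $pages; $i++) {
        $offsets[] = $startfrom;
        $startfrom += ($startfrom == 0) ? 11 : 10;
    }
    return $offsets;
}

echo pages_needed(1960), "\n";               // 196
echo implode(', ', start_offsets(4)), "\n";  // 0, 11, 21, 31
```

Each entry in the offsets list corresponds to one HTTP request, which is why the run time grows linearly with the number of results.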

I would really appreciate any suggestions from any PHP gurus on here. Thanks for reading.



<style>
body {font-family: 'Lucida Sans Unicode', 'Lucida Grande', sans-serif;}

table.archive {width:980px;position:relative;border-width: 0px;border-spacing: 0px;border-style: none;border-color: gray;border-collapse: collapse;background-color: white;border-left:solid 2px #fafafa;border-right:solid 2px #fafafa;border-bottom:solid 2px #fafafa;margin-top:10px;margin-bottom:10px;margin-left:auto;margin-right:auto;}
table.archive th {border-width: 1px;padding: 0px;border-bottom: solid 1px #fafafa;border-color:  #fafafa;background-color: #fafafa;text-align:left;padding:10px;}

table.archive tr:hover td {background-color: yellow; color: #000;}
table.archive tr {border-bottom:solid 1px  #fafafa;}

table.archive td {padding:10px;font-size:12px;}

.info {width:960px; padding:10px;border:solid 1px silver;margin-bottom:10px;font-size:0.8em;margin-left:auto;margin-right:auto;}

.form {width:960px; padding:10px;border:solid 1px silver;margin-bottom:10px;font-size:0.7em;margin-left:auto;margin-right:auto;}
</style>




<div class = "info">
<p>Quick Scraping Tool</p>
</div>

<div class = "form">
	<form method = "POST" action = "index.php" >
	<label>Type (e.g. electrician, massage, chinese, wine): </label><input type = "text" name = "industry">
	<label>Area: (e.g. Clitheroe, Leeds, Blackburn) </label><input type = "text" name = "area">
	<input type = "submit" value = "get" name = "submit">
	</form>
</div>


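<?php
// file_get_html() comes from PHP Simple HTML DOM Parser; the include path is an assumption.
include 'simple_html_dom.php';

// Do nothing until the form has been submitted with both fields.
if (!isset($_POST['industry'], $_POST['area'])) {
	exit;
}
$industry = $_POST['industry'];
$area = $_POST['area'];

$startfrom = 0;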


// Create DOM from URL
$html = file_get_html('http://uk.local.yahoo.com/'.$area.'/'.$industry.'/search-16342.html?fr=sfp&cb='.$startfrom);

// find number of results (sixth word of the page heading, thousands separator removed)
$results =  $html->find('div#top h1',0)->plaintext;
$split_results = explode(' ', $results);
$number_of_results = $split_results[5];
$number_of_results = (int) str_replace(",", "",$number_of_results);

// determine how many results pages there will be.
$pages = ceil($number_of_results/10);
if ($pages == 0){
	echo "<div class ='info'>There were no results found - try different search terms</div>";
}

// loop over the result pages and append each listing to the array
$articles = array();
for ($i=1; $i<=$pages; $i++) {
	$html = file_get_html('http://uk.local.yahoo.com/'.$area.'/'.$industry.'/search-16342.html?fr=sfp&cb='.$startfrom);
	//echo 'http://uk.local.yahoo.com/Lancashire/'.$area.'/'.$industry.'/search-16342.html?fr=sfp&cb='.$startfrom."<br>";

	foreach($html->find('li.vcard') as $article) {
		$item['name']   = $article->find('a.fn', 0)->plaintext;
		$item['number'] = $article->find('h3.tel', 0)->plaintext;
		$item['addr']   = $article->find('p.street-address', 0)->plaintext;
		$item['pcode']  = $article->find('p.postal-code', 0)->plaintext;
		$articles[] = $item;
	}

	// increment start page for url
	if ($startfrom == 0){
		$startfrom = $startfrom + 11;
	} else {
		$startfrom = $startfrom + 10;
	}
}

echo "<table class = 'archive'>";

foreach($articles as $item){
	echo "<tr>";
		echo "<td>".$item['name']."</td>";
		echo "<td>".$item['addr']."</td>";
		echo "<td>".$item['pcode']."</td>";
		echo "<td>".$item['number']."</td>";
	echo "</tr>";
}

echo "</table>";

// write the results to a downloadable CSV file (opens in Excel)
$list = $articles;

$fp = fopen('file.csv', 'w');

foreach ($list as $fields) {
    fputcsv($fp, $fields);
}
fclose($fp);

echo "<div class = 'info'>Download as excel file <a href = 'file.csv'>here</a></div>";

echo "<div class = 'info'>There were ".$pages." pages scraped <br> There are ".$number_of_results." companies that match your search terms</div>";



Replies To: Speed up query time in scraper script (php)

#2 modi123_1

Re: Speed up query time in scraper script (php)

Posted 13 March 2013 - 01:23 PM

We will not help you violate Yahoo's TOS by scraping content. I am closing the topic. Do not persist in asking for help with illegal activities. If you have further questions about why, feel free to send me a PM.