This is OK for result sets of maybe 20 or 30. If I have a result set that is large, e.g. all schools in London, there is so much data that the script has to loop through many result pages and it times out.
Is there a way to send out just one query and bypass all the looping?
For example, if a query returns 1960 results, my script would need to loop 196 times (10 results per page), requesting the data for each page. This is really inefficient.
I would really appreciate any suggestions from any PHP gurus on here. Thanks for reading.
<!DOCTYPE HTML>
<html>
<head>
<style>
body {font-family: 'Lucida Sans Unicode', 'Lucida Grande', sans-serif;}
table.archive {width:980px;position:relative;border-width: 0px;border-spacing: 0px;border-style: none;border-color: gray;border-collapse: collapse;background-color: white;border-left:solid 2px #fafafa;border-right:solid 2px #fafafa;border-bottom:solid 2px #fafafa;margin-top:10px;margin-bottom:10px;margin-left:auto;margin-right:auto;}
table.archive th {border-width: 1px;padding: 0px;border-bottom: solid 1px #fafafa;border-color: #fafafa;background-color: #fafafa;text-align:left;padding:10px;}
table.archive tr:hover td {background-color: yellow; color: #000;}
table.archive tr {border-bottom:solid 1px #fafafa;}
table.archive td {padding:10px;font-size:12px;}
.info {width:960px; padding:10px;border:solid 1px silver;margin-bottom:10px;font-size:0.8em;margin-left:auto;margin-right:auto;}
.form {width:960px; padding:10px;border:solid 1px silver;margin-bottom:10px;font-size:0.7em;margin-left:auto;margin-right:auto;}
</style>
</head>
<body>
<div class = "info">
<p>Quick Scraping Tool</p>
</div>
<div class = "form">
<form method = "POST" action = "index.php" >
<label>Type (e.g. electrician, massage, chinese, wine): </label><input type = "text" name = "industry">
<label>Area (e.g. Clitheroe, Leeds, Blackburn): </label><input type = "text" name = "area">
<input type = "submit" value = "get" name = "submit">
</form>
</div>
<?php
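// Read the search terms; default to empty strings when the form has not been submitted.
$industry = isset($_POST['industry']) ? trim($_POST['industry']) : '';
$area = isset($_POST['area']) ? trim($_POST['area']) : '';
if ($industry === '' || $area === '') {
// Nothing to search for yet: close the page and stop before any scraping.
echo "</body></html>";
exit;
}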
$startfrom = 0;
include('simple_html_dom.php');
// Create DOM from URL (search terms are URL-encoded because they come from user input)
$html = file_get_html('http://uk.local.yahoo.com/'.rawurlencode($area).'/'.rawurlencode($industry).'/search-16342.html?fr=sfp&cb='.$startfrom);
// find number of results; the count is taken as the sixth word of the heading text
$heading = $html ? $html->find('div#top h1', 0) : null;
$split_results = $heading ? explode(' ', $heading->plaintext) : array();
$number_of_results = isset($split_results[5]) ? (int) str_replace(",", "", $split_results[5]) : 0;
// determine how many results pages there will be (10 results per page).
$pages = (int) ceil($number_of_results / 10);
if ($pages == 0){
echo "<div class ='info'>There were no results found - try different search terms</div>";
}
// loop over every results page and append each listing to $articles
$articles = array();
for ($i=1; $i<=$pages; $i++)
{
$html = file_get_html('http://uk.local.yahoo.com/'.rawurlencode($area).'/'.rawurlencode($industry).'/search-16342.html?fr=sfp&cb='.$startfrom);
//echo 'http://uk.local.yahoo.com/Lancashire/'.$area.'/'.$industry.'/search-16342.html?fr=sfp&cb='.$startfrom."<br>";
if (!$html) {
// stop if a page cannot be fetched or parsed
break;
}
foreach($html->find('li.vcard') as $article) {
$item = array();
$item['name'] = $article->find('a.fn', 0)->plaintext;
$item['number'] = $article->find('h3.tel', 0)->plaintext;
$item['addr'] = $article->find('p.street-address', 0)->plaintext;
$item['pcode'] = $article->find('p.postal-code', 0)->plaintext;
$articles[] = $item;
}
// release the parsed DOM so memory does not build up across many pages
$html->clear();
// increment start offset for url: 0 for the first page, 11 for the second, then steps of 10
$startfrom = ($startfrom == 0) ? 11 : $startfrom + 10;
}
echo "<table class = 'archive'>
<tr>
<th>Company</th>
<th>Address</th>
<th>Postcode</th>
<th>Number</th>
</tr>";
foreach($articles as $item){
echo "<tr>";
echo "<td>".$item['name']."</td>";
echo "<td>".$item['addr']."</td>";
echo "<td>".$item['pcode']."</td>";
echo "<td>".$item['number']."</td>";
echo "</tr>";
}
echo "</table>";
// write the results to a CSV file that can be downloaded and opened in Excel
$fp = fopen('file.csv', 'w');
foreach ($articles as $fields) {
fputcsv($fp, $fields);
}
fclose($fp);
echo "<div class = 'info'>Download as a CSV file (opens in Excel) <a href = 'file.csv'>here</a></div>";
echo "<div class = 'info'>There were ".$pages." pages scraped <br> There are ".$number_of_results." companies that match your search terms</div>";
?>
</body>
</html>
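
To make the arithmetic concrete, here is a minimal sketch of how the request count follows from the result count. It runs on its own, without simple_html_dom or any network access. The helper name page_offsets and its $maxPages cap are hypothetical and not part of the script above; the cap is only one possible stopgap for the timeout, assuming a partial result set is acceptable, and it does not answer the question of fetching everything in one query.

```php
<?php
// Hypothetical helper (not part of the original script): lists the `cb` offsets
// the loop would request for a given result count, optionally capped at $maxPages.
function page_offsets($totalResults, $perPage = 10, $maxPages = 0)
{
    $pages = (int) ceil($totalResults / $perPage);
    if ($maxPages > 0) {
        // Assumption: a partial scrape is acceptable when a cap is given.
        $pages = min($pages, $maxPages);
    }
    $offsets = array();
    $startfrom = 0;
    for ($i = 1; $i <= $pages; $i++) {
        $offsets[] = $startfrom;
        // Same stepping as the script: 0, then 11, then +10 per page.
        $startfrom = ($startfrom == 0) ? 11 : $startfrom + 10;
    }
    return $offsets;
}

// 1960 results at 10 per page means 196 separate page requests.
echo count(page_offsets(1960)), " requests without a cap\n";

// With a cap of 5 pages only the first five offsets are requested.
echo implode(', ', page_offsets(1960, 10, 5)), "\n";
```

Each offset corresponds to one file_get_html call in the loop, so the running time grows in step with the number of pages.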
