I am looking for advice on which language / environment / technology will best fit my objective. The project is straightforward. A user enters specific criteria in a web page, and on the server side, the server goes to work for him, initiating multiple web searches and scrapes, and then delivers the results to the user. The web sites scraped do not offer any specific APIs or web services.
In a desktop environment there are plenty of front-end tools such as iMacros, but on the server side I am clueless. There may be 1000 or more simultaneous searches, and results must all be delivered instantly. I have to assume that each session will trigger its own crawler / scraper. Which technologies should be used server side? What are the options? Future scalability is obviously a factor.
web data scratching - server side
3 Replies - 816 Views - Last Post: 18 November 2012 - 10:19 AM
Replies To: web data scratching - server side
#2
Re: web data scratching - server side
Posted 16 November 2012 - 09:22 AM
The sites you are scraping... do they allow this sort of behavior?
#3
Re: web data scratching - server side
Posted 16 November 2012 - 10:00 AM
Quote
A user enters specific criteria in a web page, and on the server side, the server goes to work for him, initiating multiple web searches and scrapes, and then delivers the results to the user. The web sites scraped do not offer any specific APIs or web services.
A starting point is to read their TOS and robots.txt to see if they allow crawling.
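As a rough illustration of the robots.txt part of that check, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The helper name `is_crawl_allowed`, the user agent string `my-scraper`, the sample rules, and the `example.com` URLs are all made up for the example. It only covers robots.txt; the TOS still has to be read by a person.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network call is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, used only to demonstrate the check.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for target in (
        "https://example.com/search?q=test",
        "https://example.com/private/data",
    ):
        print(target, is_crawl_allowed(SAMPLE_ROBOTS, "my-scraper", target))
```

Against a live site you would instead point the parser at the real file with `set_url(...)` followed by `read()`, and run the check once per site before starting any scrape.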
#4
Re: web data scratching - server side
Posted 18 November 2012 - 10:19 AM
Yes, they do allow crawling.