#1 Xioshin

Web Crawler fundamental question

Posted 22 February 2010 - 11:56 PM

Hey DIC! :) I have a question about Web Crawlers.

Disclaimer: the examples below are not exactly what I am trying to build; I am only using them to illustrate the question.

Suppose I wanted to create a website or desktop application that finds images based on a particular subject. For example, you search for "President Obama" and it links to every website that has had an image of President Obama on it. The main goals of the crawler would be to search the web and find the images, save some information about each image, save the absolute URL of the image, and save the URL of the page the image was on. That's pretty much it.
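To make the image case concrete, here is a minimal sketch of the per-page step, using only Python's standard library. It assumes the page's HTML has already been downloaded; the names `ImageRecord`, `ImageCollector`, and `extract_images`, the sample markup, and the example.com URLs are all made up for illustration. It only reads `<img>` tags and their `alt` text, so dimensions or file size would need extra work.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.parse import urljoin


@dataclass
class ImageRecord:
    """Hypothetical record holding what the crawler saves per image."""
    image_url: str  # absolute URL of the image
    page_url: str   # URL of the page the image was found on
    alt_text: str   # descriptive text from the alt attribute, if any


class ImageCollector(HTMLParser):
    """Collects one ImageRecord for every <img> tag that has a src."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        if not src:
            return
        self.records.append(ImageRecord(
            # Resolve relative paths against the page URL.
            image_url=urljoin(self.page_url, src),
            page_url=self.page_url,
            alt_text=attributes.get("alt") or "",
        ))


def extract_images(page_url, html):
    """Return the image records found in one page's HTML."""
    collector = ImageCollector(page_url)
    collector.feed(html)
    return collector.records


# Made-up sample page, standing in for a downloaded document.
sample_html = '<p>News</p><img src="/photos/obama.jpg" alt="President Obama">'
records = extract_images("http://example.com/news/story.html", sample_html)
for record in records:
    print(record.image_url, "found on", record.page_url)
```

Nothing from the page body is kept here beyond the `alt` text, which is the sense in which this kind of crawler stores references rather than content.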

Now let's say I wanted to create a website or desktop application that lets an end user type in a particular car (make and model) and displays results for all websites that mention that text. That crawler would be fundamentally different from the previous example, correct?

Am I wrong in assuming that any web crawler that deals with text-based searches is responsible for saving and indexing every webpage's ENTIRE TEXT (stripped of tags, of course), whereas a crawler/data miner that stores references to images is only responsible for the location of the image, its title, and keywords (plus additional info such as dimensions and file size, if desired)?
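For comparison, here is a rough sketch of the text case. Instead of saving each page's full text, it builds an inverted index, a common structure that maps each word to the set of pages containing it. This is only one possible approach under simplifying assumptions: pages are already downloaded, words are plain lowercase alphanumeric runs, and a query matches a page only if every query word appears on it. The names `TextExtractor`, `strip_tags`, `build_index`, and `search`, and the sample pages, are invented for the example.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_tags(html):
    """Return the page text with the markup removed."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.parts)


def build_index(pages):
    """Map each lowercase word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, html in pages.items():
        for word in re.findall(r"[a-z0-9]+", strip_tags(html).lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs of pages that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Made-up sample pages, standing in for crawled documents.
pages = {
    "http://example.com/a": "<h1>Honda Civic review</h1><script>var x = 1;</script>",
    "http://example.com/b": "<p>Toyota Corolla for sale</p>",
}
index = build_index(pages)
print(search(index, "Honda Civic"))
```

An index like this records which pages mention a word, not where or in what order, so matching an exact phrase or showing a snippet would require storing word positions or the page text as well.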

Any information you can provide, even if it goes beyond this initial question, is very much appreciated. Thanks! I'm not sure this is entirely relevant to SEO, but if Skyhawk123 has any comments on the subject, I'd be more than happy to hear them. (I loved watching the presentation on SEO he uploaded a few months back.)
