10 Replies - 542 Views - Last Post: 27 January 2014 - 02:18 AM

#1 Lieoften

  • D.I.C Regular

Reputation: 17
  • Posts: 260
  • Joined: 06-January 10

Creating a better search engine

Posted 22 January 2014 - 07:06 PM

Alright, so I've recently noticed that every search engine out there (Google, Bing, Ask, Yahoo, etc.) has become more and more annoying when searching for questions. The first page of links will be Wikipedia, Wiki.Answers, or Ask.yahoo... so I'm setting out to make my own search engine where you can choose to exclude sites (in your preferences, instead of by adding lines to your search)...
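A minimal sketch of the preference-based exclusion idea, assuming the search results are already available as a list of URLs and the user's preferences are stored as a list of domain names. The function names and the sample domains here are hypothetical, made up for illustration only:

```python
from urllib.parse import urlsplit


def is_excluded(url, excluded_domains):
    """True if the URL's host is an excluded domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in (d.lower() for d in excluded_domains)
    )


def filter_results(result_urls, excluded_domains):
    """Drop every result whose host appears in the user's exclusion list."""
    return [url for url in result_urls if not is_excluded(url, excluded_domains)]


# Hypothetical stored preference and result list
excluded = ["wikipedia.org", "answers.example"]
results = [
    "https://en.wikipedia.org/wiki/Web_crawler",
    "https://example.com/crawler-tutorial",
    "http://answers.example/q/1",
]
print(filter_results(results, excluded))
```

Matching on the host name (rather than a substring of the whole URL) is one possible design choice; it keeps a site such as notwikipedia.org from being dropped by accident.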

Now, I know how to do just about everything required to make a search engine, save for one crucial thing: how does one make a web crawler/bot?

Replies To: Creating a better search engine

#2 Atli

  • D.I.C Lover

Reputation: 3730
  • Posts: 6,017
  • Joined: 08-June 10

Re: Creating a better search engine

Posted 22 January 2014 - 09:11 PM

Companies like Google and Microsoft employ armies of some of the smartest people in the world, all with the goal of making their search engines smarter and smarter. Unless you have unprecedented, genius-level ability in all sorts of fields relating to mathematics and complex data processing, you don't really stand a chance of coming close to the kind of relevant search results you'll get from them.

Building a web crawler is the simple part of building a search engine. Being able to sift through all the data and pick out the most appropriate results for any given query is the complicated part. A crawler is essentially simple: it goes through HTML, indexes the contents, follows the links it finds, and then does it all over again.
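As a rough illustration of that loop, here is a minimal sketch in Python using only the standard library. It assumes pages are plain HTML decodable as UTF-8, and every name in it (`PageParser`, `fetch_over_http`, `crawl`, `max_pages`) is made up for this example. It does not check robots.txt or rate-limit its requests, which a real crawler should do.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects href values from <a> tags and the text content of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Note: this also picks up text inside <script> and <style> tags.
        if data.strip():
            self.text_parts.append(data.strip())


def fetch_over_http(url):
    """Default fetcher: download a page and decode it as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_over_http, max_pages=50):
    """Breadth-first crawl; returns {url: page text} for the pages visited."""
    index = {}
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(index) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        parser = PageParser()
        parser.feed(html)
        # "Index" the page: here that just means storing its text.
        index[url] = " ".join(parser.text_parts)
        # Follow the links: resolve relative URLs and drop #fragments.
        for href in parser.links:
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return index
```

Calling `crawl("http://example.com/", max_pages=10)` would fetch up to ten pages starting from that address. The `fetch` parameter is there so the loop can be tried against an in-memory dictionary of pages before it is pointed at real sites.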

#3 AfterBurner66

  • D.I.C Head

Reputation: 16
  • Posts: 116
  • Joined: 02-August 08

Re: Creating a better search engine

Posted 23 January 2014 - 07:19 AM

I think that when you start trying to implement this, you'll soon enough realize what Atli said. If it were that easy, we would have been flooded by smart search engines.

#4 Lieoften

  • D.I.C Regular

Reputation: 17
  • Posts: 260
  • Joined: 06-January 10

Re: Creating a better search engine

Posted 23 January 2014 - 01:32 PM

Okay... You still didn't answer my question, though.

How do you make a webcrawler? I've been searching for tutorials for a good time now and still haven't found a single one.

#5 Atli

  • D.I.C Lover

Reputation: 3730
  • Posts: 6,017
  • Joined: 08-June 10

Re: Creating a better search engine

Posted 24 January 2014 - 01:27 AM

I did explain the core concept in my last post:

Atli said:

A crawler is essentially simple: it goes through HTML, indexes the contents, follows the links it finds, and then does it all over again.


Lieoften said:

I've been searching for tutorials for a good time now and still haven't found a single one.

I entered "PHP how to make a web crawler" into Google and found a bunch of examples and a few open-source projects, all on the first page of results.

#6 Ryano121

  • D.I.C Lover

Reputation: 1363
  • Posts: 3,002
  • Joined: 30-January 11

Re: Creating a better search engine

Posted 24 January 2014 - 05:51 AM

An even better question would be: what are you going to run this crawler on? Your own machine? Companies like Google have massive server farms whose sole purpose is to crawl and index, and even then they've only scratched the surface of things. How do you think you're going to get hold of all this data?

It's one thing building the software - it's another thing entirely having something to run it on.

Quote

Now, I know how to do just about everything required to make a search engine


Really? What's your method of keyword retrieval? How are you going to deal with synonymy and polysemy? What's your ranking methodology?
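To make those questions concrete, here is about the simplest possible baseline, sketched in Python: an inverted index for keyword retrieval and a TF-IDF-style score for ranking. This is only an illustration under my own assumptions (the function names and the exact scoring formula are made up for the example), not how any real search engine ranks results. It matches exact tokens only, so it does nothing at all about synonymy or polysemy.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Inverted index: map each term to {doc_id: number of occurrences}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def search(query, index, total_docs):
    """Return doc ids ordered by a simple TF-IDF score, best match first."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term appears in no document
        # Rare terms weigh more than terms found in most documents.
        idf = math.log(total_docs / len(postings)) + 1.0
        for doc_id, count in postings.items():
            scores[doc_id] += count * idf
    # Sort by descending score, breaking ties by doc id.
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    "a": "cats chase mice",
    "b": "dogs chase cats and cats",
    "c": "mice eat cheese",
}
index = build_index(docs)
print(search("cats", index, len(docs)))
```

A query for "kittens" returns nothing here even though document "a" is obviously relevant, which is exactly the synonymy problem.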

This post has been edited by Ryano121: 24 January 2014 - 05:54 AM


#7 blankwavercade

  • D.I.C Head

Reputation: 30
  • Posts: 117
  • Joined: 13-December 11

Re: Creating a better search engine

Posted 25 January 2014 - 07:37 PM

There are a lot of things that actually go into search engines. It's not just a web crawler. For instance, where are you going to deploy your web crawler? Just put it out there and hope it starts crawling sites, without a start or end point? There's also the issue that your computer/server is going to cripple itself trying to both crawl and manage the links it has found. So that leads to "another" server/computer to store the data. So now you're using two servers. Which isn't a big deal, but now that you have more links crawled, you need "another" server. OK, so now this is getting bigger and more expensive.

So in reality a search engine isn't something you just do. There's a reason why Microsoft launched Bing: they wanted to compete with Google. They have so much money that it's possible for them to throw stupid money at the project. Google, on the other hand, had investors throwing money at it (note: not stupid money, $100,000 for their first round). But this also happened in the late 90's, not in the 00's.

You could always try to get people to invest in your idea of a search engine that does not include sites like ask.yahoo and wiki.answers.

#8 moonclown

  • New D.I.C Head

Reputation: 0
  • Posts: 19
  • Joined: 23-January 14

Re: Creating a better search engine

Posted 26 January 2014 - 12:44 PM

Atli, on 24 January 2014 - 01:27 AM, said:

I did explain the core concept in my last post:

Atli said:

A crawler is essentially simple: it goes through HTML, indexes the contents, follows the links it finds, and then does it all over again.


Lieoften said:

I've been searching for tutorials for a good time now and still haven't found a single one.

I entered "PHP how to make a web crawler" into Google and found a bunch of examples and a few open-source projects, all on the first page of results.



<3 Google!

#9 no2pencil

  • Admiral Fancy Pants

Reputation: 5413
  • Posts: 27,430
  • Joined: 10-May 07

Re: Creating a better search engine

Posted 26 January 2014 - 12:47 PM

Using Google to destroy Google! Muhahahahaha.

Wake me when it's over :)

#10 Atli

  • D.I.C Lover

Reputation: 3730
  • Posts: 6,017
  • Joined: 08-June 10

Re: Creating a better search engine

Posted 27 January 2014 - 01:48 AM

That'll be a damn long nap :)

#11 jon.kiparsky

  • Pancakes!

Reputation: 8037
  • Posts: 13,757
  • Joined: 19-March 11

Re: Creating a better search engine

Posted 27 January 2014 - 02:18 AM

I think it wouldn't be impossible to work out a scheme for anonymous distributed meta-search. You'd need a bunch of Googlephobes to make it work, but it could be made to work. That would more or less be "using Google to annoy Google". The cute thing would be that, if it were set up correctly, Google would have no way of knowing which searches were coming from the anonymous search scheme and which were coming from ordinary civilians. Very annoying, I'd think.