3 Replies - 4862 Views - Last Post: 20 December 2012 - 09:36 PM

#1 sniderj1

Finding an Organization's Name Given their Homepage's URL

Posted 20 December 2012 - 08:08 AM

I'm not sure which forum this belongs in, so I posted it in Software Development. I'm trying to write a program that, given a homepage URL, automatically reports the name of the organization that runs the site. I've had a few ideas, all of which have severe flaws. I've considered looking the domain up in a WHOIS database, but many websites are privately registered. I've considered looking the site up on Alexa.com, but that only works if the owner has edited their site listing. I've also thought about just truncating the URL, but that's not very accurate. What other ideas are there, and which would work?
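
For reference, the URL-truncation idea might look something like the sketch below. It is only an illustration: the function name `guess_name_from_url` is made up here, and the rule "take the label before the last dot" is an assumption that shows exactly why the approach is inaccurate for hosts such as `example.co.uk`.

```python
from urllib.parse import urlparse


def guess_name_from_url(url):
    """Naive guess: the host-name label just before the final one."""
    # urlparse only fills in .hostname when a scheme or leading // is present.
    if "//" not in url:
        url = "//" + url
    host = urlparse(url).hostname or ""
    labels = [part for part in host.split(".") if part]
    # Drop a leading "www" since it says nothing about the owner.
    if labels and labels[0] == "www":
        labels = labels[1:]
    if len(labels) >= 2:
        return labels[-2]
    return labels[0] if labels else ""


if __name__ == "__main__":
    for sample in ("http://www.example.com/about", "www.example.co.uk"):
        print(sample, "->", guess_name_from_url(sample))
```

The second sample yields `co` rather than `example`, so a real version would need a list of multi-part suffixes, and even then the result is a domain label, not necessarily the organization's name.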

Replies To: Finding an Organization's Name Given their Homepage's URL

#2 cfoley

Re: Finding an Organization's Name Given their Homepage's URL

Posted 20 December 2012 - 11:23 AM

You could consider combining a number of approaches. As well as the ones you have mentioned, you could look for copyright info in the page, in metadata, and in source comments. You could run OCR on images, especially the ones near the top of the page. You could also apply standard data mining techniques to the page text, then follow links to other pages on the site and apply the same techniques to them.
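
A rough sketch of the copyright, metadata, and source-comment idea, using only the Python standard library. Everything named here (`candidate_names`, `NameCandidateParser`, the `META_KEYS` set, and the copyright regex) is an assumption made for illustration; real pages vary a lot, so the output is a list of candidates to rank, not an answer.

```python
import re
from html.parser import HTMLParser

# Meta keys that often carry a site or owner name (an assumption; sites vary).
META_KEYS = {"og:site_name", "author", "copyright", "application-name"}

# Crude pattern for notices such as "(c) 2012 Example Corp" or
# "Copyright 2010-2012 Example Corp."; it will also produce false positives.
COPYRIGHT_RE = re.compile(
    r"(?:copyright|©|\(c\))(?:\s*(?:©|\(c\)))?\s*"
    r"(?:\d{4}(?:\s*-\s*\d{4})?)?[\s,]*([^.|\n]{2,60})",
    re.IGNORECASE,
)


class NameCandidateParser(HTMLParser):
    """Collects (source, text) pairs that might contain the organization name."""

    def __init__(self):
        super().__init__()
        self.candidates = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = (attrs.get("property") or attrs.get("name") or "").lower()
            content = (attrs.get("content") or "").strip()
            if key in META_KEYS and content:
                self.candidates.append(("meta:" + key, content))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def _scan_for_copyright(self, text):
        for match in COPYRIGHT_RE.finditer(text):
            self.candidates.append(("copyright", match.group(1).strip()))

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.candidates.append(("title", data.strip()))
        self._scan_for_copyright(data)

    def handle_comment(self, data):
        # Source comments sometimes carry a copyright line too.
        self._scan_for_copyright(data)


def candidate_names(html):
    parser = NameCandidateParser()
    parser.feed(html)
    parser.close()
    return parser.candidates
```

Counting how often the same string turns up across sources, and across several pages of the same site, is one simple way to rank the candidates.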

As a crude check, you could Google for the company your algorithm chooses and see if the page is near the top of the results.

This sounds like a very interesting project. Let us know what you decide on and how well it works for you!

#3 sniderj1

Re: Finding an Organization's Name Given their Homepage's URL

Posted 20 December 2012 - 11:27 AM

Thanks. OCR is definitely not an option, both because I know nothing about it and because it's more processor-intensive than I'm going for, but looking for copyright markers is definitely a good idea.

#4 GWatt

Re: Finding an Organization's Name Given their Homepage's URL

Posted 20 December 2012 - 09:36 PM

I would recommend doing a whois query.
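
A minimal sketch of what a whois query could look like in Python, assuming the plain-text WHOIS protocol on TCP port 43. The helper names are made up for illustration, the server to ask is left as a parameter because it depends on the TLD and registrar, and the "Registrant Organization" field and the privacy keywords are assumptions about the reply format, which is not standardized.

```python
import socket

# Substrings that suggest a privacy or proxy registration (a guess, not a standard).
PRIVACY_HINTS = ("privacy", "proxy", "redacted", "whoisguard")


def query_whois(domain, server, port=43, timeout=10):
    """Send one WHOIS query (the domain followed by CRLF) and return the reply text."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def registrant_organization(whois_text):
    """Return the registrant organization, or None if absent or privacy-protected."""
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "registrant organization":
            value = value.strip()
            if not value:
                return None
            if any(hint in value.lower() for hint in PRIVACY_HINTS):
                return None
            return value
    return None
```

As the original post points out, privately registered domains will come back empty or with a proxy company's name, so this works best as one signal among several.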