I'm not sure which forum this belongs in, so I've posted it in Software Development. I'm trying to write an automated program that reports the name of the organization that runs a given webpage. I've had a few ideas, all of which have serious flaws. I've considered looking the site up in a WHOIS database, but many websites are privately registered. I've considered looking up the website on Alexa.com, but that only works if the owner has edited their site listing. I've also thought about just truncating the URL, but that's not very accurate. What other ideas are there, and what would work?
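For reference, the URL-truncation idea fits in a few lines of Python. This is a minimal sketch, assuming the organization's name is the label just left of the top-level domain; `guess_name_from_url` is a hypothetical helper, and the "last two labels" rule is an assumption that breaks on suffixes such as `.co.uk`, which is exactly the inaccuracy described above.

```python
from urllib.parse import urlparse


def guess_name_from_url(url: str) -> str:
    """Guess an organization name from a URL by truncating the hostname.

    Assumption: the name is the label just left of the top-level domain
    (i.e. the last two labels form the registrable domain). This is wrong
    for multi-part suffixes such as .co.uk.
    """
    # urlparse only fills in the hostname when a scheme is present
    if "://" not in url:
        url = "http://" + url
    host = urlparse(url).hostname or ""  # .hostname is already lowercased
    labels = [part for part in host.split(".") if part]
    if not labels:
        return ""
    label = labels[-2] if len(labels) >= 2 else labels[0]
    # Turn "my-company" into "My Company"
    words = label.replace("_", "-").split("-")
    return " ".join(word.capitalize() for word in words if word)


if __name__ == "__main__":
    for url in ["https://www.example.com/about", "my-company.org", "http://bbc.co.uk"]:
        print(url, "->", guess_name_from_url(url))
```

For example, `https://www.example.com/about` gives `Example`, but `http://bbc.co.uk` gives `Co`. Checking the host against the Public Suffix List would fix the second case, though no amount of truncation can recover a name that doesn't appear in the domain at all.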
3 Replies - 1285 Views - Last Post: 20 December 2012 - 09:36 PM
#1
Finding an Organization's Name Given their Homepage's URL
Posted 20 December 2012 - 08:08 AM
#2
Re: Finding an Organization's Name Given their Homepage's URL
Posted 20 December 2012 - 11:23 AM
You could consider combining a number of approaches. As well as the ones you have mentioned, you could look for copyright information in the page text, in the metadata, and in source comments. You could also run OCR on images, especially the ones near the top of the page. You could apply standard data-mining techniques to the page text, then follow links to other pages on the site and apply the same techniques to them too.
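Here is a rough Python sketch of the metadata side of this, using only the standard library's `html.parser`. It assumes the page's HTML has already been downloaded; `CandidateParser`, `best_candidate`, and `NAME_KEYS` are hypothetical names chosen for illustration. It collects name candidates from the `<title>` and from the `og:site_name`, `application-name`, and `author` meta tags, keeps source comments for separate inspection, and combines the candidates with a simple majority vote.

```python
from collections import Counter
from html.parser import HTMLParser

# Meta keys that often carry a site or organization name. "og:site_name"
# (Open Graph) and "application-name" are real conventions, but many sites
# omit them, and "author" frequently names a person instead.
NAME_KEYS = {"og:site_name", "application-name", "author"}


class CandidateParser(HTMLParser):
    """Collects organization-name candidates and source comments from HTML."""

    def __init__(self):
        super().__init__()
        self.candidates = []  # possible names, in document order
        self.comments = []    # <!-- ... --> contents, kept for later checks
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Open Graph uses property=, most other meta tags use name=
            key = (attrs.get("property") or attrs.get("name") or "").lower()
            content = (attrs.get("content") or "").strip()
            if key in NAME_KEYS and content:
                self.candidates.append(content)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            # Titles often look like "Page - Company" or "Page | Company";
            # keep only the last segment
            last = data.split(" - ")[-1].split(" | ")[-1].strip()
            if last:
                self.candidates.append(last)

    def handle_comment(self, data):
        self.comments.append(data.strip())


def best_candidate(html):
    """Return the most frequent candidate (case-insensitive), or None."""
    parser = CandidateParser()
    parser.feed(html)
    parser.close()
    if not parser.candidates:
        return None
    counts = Counter(c.lower() for c in parser.candidates)
    winner = counts.most_common(1)[0][0]
    # Report the winner in the casing it first appeared with
    return next(c for c in parser.candidates if c.lower() == winner)
```

A vote like this is one simple way to combine approaches: each extra source (copyright lines, linked pages, a WHOIS result) just adds more candidates before counting. Since `author` often names a person rather than the organization, it may be worth weighting it lower.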
As a crude check, you could Google for the company your algorithm chooses and see if the page is near the top of the results.
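If you do try that check, the comparison step itself is simple. This sketch assumes you already have the result URLs from some search API (retrieving them isn't shown); `host_of`, `appears_near_top`, and the `top_n` cutoff are hypothetical. It compares hostnames with any leading `www.` removed.

```python
from urllib.parse import urlparse


def host_of(url):
    """Lowercased hostname with a leading "www." removed."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def appears_near_top(site_url, result_urls, top_n=5):
    """True if site_url's host matches one of the first top_n result hosts.

    result_urls is assumed to come from whatever search API you use;
    fetching the results is not shown here.
    """
    target = host_of(site_url)
    return any(host_of(u) == target for u in result_urls[:top_n])
```

An exact hostname match is deliberately strict: a result on `shop.acme.example` won't count for `acme.example`, so you may want to compare registrable domains instead.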
This sounds like a very interesting project. Let us know what you decide on and how well it works for you!
#3
Re: Finding an Organization's Name Given their Homepage's URL
Posted 20 December 2012 - 11:27 AM
Thanks. OCR is definitely not an option, both because I know nothing about it and because it's more processor-intensive than I'm aiming for, but looking for copyright markers is definitely a good idea.
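A minimal sketch of copyright-marker extraction in Python, assuming the page has already been reduced to plain text with line breaks. `copyright_owners` and the regular expressions are hypothetical; they look for ©, `&copy;`, `(c)`, or the word "copyright", then strip years, "All rights reserved", and anything after a `|` separator.

```python
import re

# "\u00a9" is the copyright sign; "&copy;" covers un-decoded HTML entities
MARKER_RE = re.compile(r"\u00a9|&copy;|\(c\)|\bcopyright\b", re.IGNORECASE)
# A year or year range such as "2012" or "2010-2012", plus an optional comma
YEARS_RE = re.compile(r"\b\d{4}(?:\s*[-\u2013]\s*\d{4})?\b,?")
RESERVED_RE = re.compile(r"all rights reserved.*$", re.IGNORECASE)


def copyright_owners(text):
    """Return the owner text from each line that contains a copyright marker."""
    owners = []
    for line in text.splitlines():
        match = MARKER_RE.search(line)
        if not match:
            continue
        # Keep what follows the first marker, up to any "|" separator
        rest = line[match.end():].split("|")[0]
        rest = MARKER_RE.sub("", rest)  # e.g. "Copyright (c) 2012 ..." has two markers
        rest = YEARS_RE.sub("", rest)
        rest = RESERVED_RE.sub("", rest)
        owner = rest.strip(" .,;:-")
        if owner:
            owners.append(owner)
    return owners
```

The year pattern removes any four-digit number, so a name like "Studio 2000" would come back as "Studio"; treat the output as one more candidate rather than a final answer.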
#4
Re: Finding an Organization's Name Given their Homepage's URL
Posted 20 December 2012 - 09:36 PM
I would recommend doing a WHOIS query.
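A WHOIS query is just a line of text sent over TCP port 43 (RFC 3912), so it can be done without a third-party library. This Python sketch assumes a `.com` or `.net` domain: it asks Verisign's registry server, follows the `Registrar WHOIS Server:` referral if one is present, and looks for a `Registrant Organization:` line. The helper names and the `PRIVACY_HINTS` list are hypothetical, and field names vary between registries and registrars, so the patterns may need adjusting. Privately registered domains will usually return a redacted or proxy value, which the sketch treats as no answer.

```python
import re
import socket

# Verisign's registry WHOIS server covers .com and .net; other TLDs use
# other servers
DEFAULT_SERVER = "whois.verisign-grs.com"

REFERRAL_RE = re.compile(r"^[ \t]*Registrar WHOIS Server:[ \t]*(\S+)",
                         re.IGNORECASE | re.MULTILINE)
ORG_RE = re.compile(r"^[ \t]*Registrant Organi[sz]ation[ \t]*:[ \t]*(\S.*)$",
                    re.IGNORECASE | re.MULTILINE)
# Hypothetical heuristic: substrings that suggest a privacy/proxy service
PRIVACY_HINTS = ("privacy", "redacted", "proxy", "whoisguard")


def whois_query(domain, server=DEFAULT_SERVER, timeout=10.0):
    """Send a raw WHOIS query (RFC 3912): the domain plus CRLF on TCP port 43."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall((domain + "\r\n").encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def registrant_org(whois_text):
    """Return the first non-private Registrant Organization value, or None."""
    for match in ORG_RE.finditer(whois_text):
        value = match.group(1).strip()
        if value and not any(hint in value.lower() for hint in PRIVACY_HINTS):
            return value
    return None


def lookup_org(domain):
    """Query the registry, follow a registrar referral if given, and parse."""
    text = whois_query(domain)
    referral = REFERRAL_RE.search(text)
    if referral:
        text = whois_query(domain, server=referral.group(1))
    return registrant_org(text)
```

Calling `lookup_org("example.com")` needs network access, and some WHOIS servers rate-limit repeated queries, so it's worth caching results if you're checking many sites.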