
#1 yashagrawal57

Extracting specific data from a website to an Access Database

Posted 22 November 2012 - 06:20 AM

There is a specific website from which I need to extract certain data, which is far too much to do manually. The data is in the form of normal, plain text on the webpage.

But I want to specify certain conditions for extracting the data, such as extracting only the data under a certain subtitle (e.g. the lines of text under the title "Research" on several thousand webpages).

If possible, I would also like to place the extracted data in a Microsoft Access database, again with some conditions deciding which column of the table the data will go into (e.g. hyperlinked text from the webpage will go into one column, while the aforementioned text under the title "Research" will go into another column).

I need a lot of this kind of extraction to be done automatically from a few thousand webpages. Time is not really an issue; it can take days or weeks. But I don't know any programming language (except for high school Visual Basic), nor do I know what kind of program, script, or language I would need to accomplish such a task, if it is even possible/practical.

Any help would be greatly appreciated! Thanks.

Yash

P.S. I have absolutely no idea what language or program is needed for this task, which is why I have posted this in the "Other Languages" section. Sorry if it belongs elsewhere!



Replies To: Extracting specific data from a website to an Access Database

#2 ishkabible

Re: Extracting specific data from a website to an Access Database

Posted 22 November 2012 - 10:44 AM

It's highly practical and possible; I do this sort of thing all the time at work. You can't learn to do this in 3 weeks, however. This is a non-trivial task that will probably take at least a year of programming experience, if not more, before you could handle it.

This sounds like it is going to require a number of things:

  • an HTML parser to get the information into a usable format
  • code that extracts the information you want from the HTML parser's output
  • an Access database library for writing the extracted information to the database


None of these tasks is trivial either. Luckily, the first and last are already done for you in libraries. You will still need to learn how to use those libraries, and there probably isn't going to be a library that does the kind of extraction you want, so you will need to learn to write that part yourself.
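To give a rough idea of how those three pieces fit together, here is a minimal sketch in Python, not a finished solution. It makes several assumptions that are not from your description: that the subtitle is a real heading tag (h1 to h6), that a section ends at the next heading, and that the table is called pages with url, research and links columns. The names SectionExtractor, extract, open_db and save are made up for illustration. It also uses Python's built-in sqlite3 module as a stand-in for Access so it runs anywhere; for a real Access file you would most likely swap in an ODBC library such as pyodbc together with the Access ODBC driver.

```python
from html.parser import HTMLParser
import sqlite3

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SectionExtractor(HTMLParser):
    """Collect the text under one heading, plus the text of every link."""

    def __init__(self, wanted_heading):
        super().__init__()
        self.wanted = wanted_heading.strip().lower()
        self.in_heading = False   # currently inside an <h1>..<h6> tag
        self.in_section = False   # currently below the wanted heading
        self.in_link = False      # currently inside an <a> tag
        self.heading_text = []
        self.section_text = []
        self.link_text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            # Any new heading ends the section we were collecting.
            self.in_heading = True
            self.in_section = False
            self.heading_text = []
        elif tag == "a":
            self.in_link = True

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self.in_heading = False
            title = "".join(self.heading_text).strip().lower()
            self.in_section = (title == self.wanted)
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.in_heading:
            self.heading_text.append(data)
        elif self.in_section:
            self.section_text.append(text)
        if self.in_link:
            self.link_text.append(text)


def extract(html, heading="Research"):
    """Return (text under the heading, text of all links) for one page."""
    parser = SectionExtractor(heading)
    parser.feed(html)
    parser.close()
    return " ".join(parser.section_text), "; ".join(parser.link_text)


def open_db(path):
    """Open the stand-in database and create the (assumed) table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT, research TEXT, links TEXT)"
    )
    return conn


def save(conn, url, research, links):
    """Write one page's extracted data, one column per kind of data."""
    conn.execute(
        "INSERT INTO pages (url, research, links) VALUES (?, ?, ?)",
        (url, research, links),
    )
    conn.commit()


# A made-up page used only to try the extractor out.
SAMPLE = """
<html><body>
<h2>About</h2><p>Some intro text.</p>
<h2>Research</h2>
<p>Works on graph algorithms.</p>
<p>See the <a href="/lab">lab page</a> for details.</p>
<h2>Contact</h2><p>Email only.</p>
</body></html>
"""

if __name__ == "__main__":
    conn = open_db(":memory:")
    research, links = extract(SAMPLE)
    save(conn, "http://example.com/sample", research, links)
    print(conn.execute("SELECT * FROM pages").fetchall())
```

Downloading the few thousand pages is left out here; each page's HTML would be fetched and passed to extract in a loop. Real pages are rarely this tidy, so expect the "what counts as the Research section" logic to be where most of the work goes.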

The language doesn't really matter. C# already has everything in its standard library to do this pretty easily, so it wouldn't be a bad place to look. This sort of thing is often done in languages like Perl, Python, and Ruby, which have a reputation for making it easy. It could still be done in languages like C and C++ (with extra work).

This post has been edited by ishkabible: 22 November 2012 - 10:45 AM
