2 Replies - 8327 Views - Last Post: 14 August 2009 - 05:10 PM

#1 fsloke

How to extract data from pdf file?

Posted 14 August 2009 - 06:48 AM

The PDF is not a scanned paper document; it was generated by a computer.

It contains a well-defined table with rows and columns.

Is there an external library I can use to extract data from a PDF file?

Thanks

Replies To: How to extract data from pdf file?

#2 pbl

Re: How to extract data from pdf file?

Posted 14 August 2009 - 04:09 PM

fsloke, don't tell me we have to say this to an old DIC like you:

Google "java pdf library"

This post has been edited by pbl: 14 August 2009 - 04:10 PM

#3 fsloke

Re: How to extract data from pdf file?

Posted 14 August 2009 - 05:10 PM

Yes sir, pbl.

Yesterday I was searching around and happened upon a forum post that linked to this URL:

http://www.jguru.com...jsp?EID=1074237

The author there said:

Quote

How can I index PDF documents?

In order to index PDF documents you need to first parse them to extract text that you want to index from them. Here are some PDF parsers that can help you with that:
PDFBox is a Java API from Ben Litchfield that will let you access the contents of a PDF document. It comes with integration classes for Lucene to translate a PDF into a Lucene document.

XPDF is an open source tool that is licensed under the GPL. It's not a Java tool, but there is a utility called pdftotext that can translate PDF files into text files on most platforms from the command line.

Based on xpdf, there is a utility called pdftohtml that can translate PDF files into HTML files. This is also not a Java application.

JPedal is a Java API for extracting text and images from PDF documents.

Simple Text Extractor Library for use with PDF documents. Relies on PDFBox.


I am still searching and experimenting with these libraries.

Has anyone tried any of the libraries listed above?

:)
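
For a computer-generated PDF with a well-defined table, one possible route from the list quoted above is the pdftotext command-line tool, called from Java. The sketch below is only an illustration under assumptions: it assumes pdftotext is installed and on the PATH, that its -layout option keeps the table columns visually aligned, and that columns are separated by at least two spaces. The class and method names (PdfTableText, splitRow, extractRows) are made up for this example and are not part of any library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns layout-preserved pdftotext output into rows of cells.
class PdfTableText {

    // Splits one text line into cells, treating a run of two or more
    // whitespace characters as a column gap. This is a heuristic, not a
    // real PDF table parser.
    static List<String> splitRow(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s{2,}")));
    }

    // Runs "pdftotext -layout -enc UTF-8 <pdf> -" (the trailing "-" sends the
    // text to stdout) and returns one list of cells per non-blank line.
    static List<List<String>> extractRows(String pdfPath)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(
                "pdftotext", "-layout", "-enc", "UTF-8", pdfPath, "-");
        // Let error messages go to our own stderr so they never mix with the text.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = pb.start();

        List<List<String>> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                List<String> cells = splitRow(line);
                if (!cells.isEmpty()) {
                    rows.add(cells);
                }
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("pdftotext exited with code " + exitCode);
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java PdfTableText <file.pdf>");
            return;
        }
        for (List<String> row : extractRows(args[0])) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Splitting on runs of spaces will misread cells that are empty or that contain double spaces themselves, so the result should be checked against the actual PDF. If a pure-Java solution is needed, PDFBox or JPedal from the quoted list would be the places to look instead.
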