All pages PDF stripper


19 Replies - 729 Views - Last Post: 30 May 2020 - 11:47 AM

#16 leace

Posted 27 May 2020 - 10:03 AM

NormR, on 27 May 2020 - 10:01 AM, said:

What are lines 74 and 75 supposed to do?

Creating and closing the PrintWriter inside the loop means that each pass through the loop creates a new file and writes to it, overwriting whatever the previous pass wrote.
Those statements should be outside the loop:
- Create the file
- Loop to write data to the file
- Close the file


Lines 74 and 75 are not active; they are commented out:

// PDPage firstPage = document.getPages().get(0);
//stripper.extractRegions(firstPage);

#17 NormR

Posted 27 May 2020 - 10:06 AM

Ok, what happens when the new version is executed?

#18 leace

Posted 27 May 2020 - 10:15 AM

NormR, on 27 May 2020 - 10:06 AM, said:

Ok, what happens when the new version is executed?


It's picking up mixed content from the second page.

#19 NormR

Posted 27 May 2020 - 10:18 AM

Please post the output that you are talking about.
Add some comments that point out where the problem is.
Show what the desired output is.

Problem continued here: https://stackoverflo...741715_62047716

This post has been edited by NormR: 27 May 2020 - 02:02 PM


#20 g00se

Posted 30 May 2020 - 11:47 AM

I'm not quite sure what the problem is. All you need is a loop. Here is a looped, shortened version of your main method:

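    // Imports this method needs (PDFBox 2.x package names):
    import java.awt.geom.Rectangle2D;
    import java.io.File;
    import java.io.IOException;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.text.PDFTextStripperByArea;

    // Goes inside your existing class.
    public static void main(String[] args) {

        // try-with-resources closes the document even if extraction fails
        try (PDDocument document = PDDocument.load(new File("test2.pdf"))) {

            if (!document.isEncrypted()) {
                PDFTextStripperByArea stripper = new PDFTextStripperByArea();
                stripper.setSortByPosition(true);
                // The region is defined once and reused for every page
                Rectangle2D rect4 = new Rectangle2D.Double(210, 160, 230, 25);
                stripper.addRegion("class5", rect4);
                int numPages = document.getNumberOfPages();
                for (int i = 0; i < numPages; i++) {
                    PDPage page = document.getPages().get(i);
                    // Re-extract for the current page, then read that page's text
                    stripper.extractRegions(page);
                    System.out.println(stripper.getTextForRegion("class5"));
                }// end for
            }
        } catch (IOException e) {
            System.err.println("Exception while trying to read pdf document - " + e);
        }
    }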
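
If the per-page text should go to a file rather than to System.out, the earlier advice in this thread still applies: create the PrintWriter once before the loop and close it after the loop. Below is a rough sketch of just that pattern, using only the standard library. The class name PageTextWriter, the method writeAll and the sample strings are made up for illustration; the list of strings stands in for whatever getTextForRegion("class5") returns for each page.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of PDFBox or of the original program.
public class PageTextWriter {

    // Writes one line per page. The writer is created once, before the loop,
    // so text from earlier pages is not overwritten by later pages.
    public static void writeAll(Path out, List<String> pageTexts) throws IOException {
        // 1. Create the file (try-with-resources takes care of step 3).
        try (PrintWriter writer = new PrintWriter(
                Files.newBufferedWriter(out, StandardCharsets.UTF_8))) {
            // 2. Loop to write the data.
            for (String text : pageTexts) {
                // trim() drops surrounding whitespace such as a trailing newline
                writer.println(text.trim());
            }
        } // 3. The file is closed here, after the loop has finished.
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample data standing in for the per-page region text.
        List<String> pageTexts = List.of("page 1 text", "page 2 text");
        Path out = Files.createTempFile("regions", ".txt");
        writeAll(out, pageTexts);
        System.out.println(Files.readAllLines(out, StandardCharsets.UTF_8));
    }
}
```

In the PDFBox loop above, the println on the stripper result would become a writer.println call, with the writer opened before the for loop starts.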

