9 Replies - 279 Views - Last Post: 26 May 2020 - 02:35 PM

#1 leace   User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 18
  • Joined: 26-May 20

Transposing a matrix from a PDF stripper

Posted 26 May 2020 - 01:59 PM

I am reading text from a PDF file using PDFBox. I can read the text correctly using "Rectangle2D" regions, but each region's result is printed on its own line. Instead, I want the results shown side by side on one line (a transposed view).

Current output

PO-003334823
Vendor : WL-00051

---------------------------------------------------------------------------------------------
Expected Output

PO-003334823 Vendor : WL-00051
--------------------------------------------------------------------------------------------------

import java.awt.geom.Rectangle2D;
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.text.PDFTextStripperByArea;

public class PDFBoxReadFromFile {

  public static void main(String[] args) throws Exception {

    // PDDocument.load(File) is the PDFBox 2.x way to open a document.
    try (PDDocument document = PDDocument.load(new File("C:\\Users\\ed\\Documents\\test2.pdf"))) {

      if (!document.isEncrypted()) {
        PDFTextStripperByArea stripper = new PDFTextStripperByArea();
        stripper.setSortByPosition(true);

        // Each region is given as (x, y, width, height).
        Rectangle2D rect4 = new Rectangle2D.Double(210, 160, 230, 25);
        Rectangle2D rect1 = new Rectangle2D.Double(55, 290, 225, 17);
        // rect2 and rect3 are defined but not registered with the stripper.
        Rectangle2D rect2 = new Rectangle2D.Double(281, 255, 255, 20);
        Rectangle2D rect3 = new Rectangle2D.Double(2, 365, 660, 1900);
        stripper.addRegion("class2", rect1);
        stripper.addRegion("class5", rect4);

        PDPage firstPage = document.getPages().get(0);
        stripper.extractRegions(firstPage);

        // Print the text extracted from each registered region.
        System.out.print(stripper.getTextForRegion("class5"));
        System.out.print(stripper.getTextForRegion("class2"));

        // Write the same text to a file; try-with-resources closes the writer.
        File file = new File("C:/Users/ed/eclipse-workspace/pdfboxreadfromfile/file.txt");
        try (PrintWriter pw = new PrintWriter(file)) {
          pw.println(stripper.getTextForRegion("class5"));
          pw.println(stripper.getTextForRegion("class2"));
        }
      }
    } catch (IOException e) {
      System.err.println("Exception while trying to read pdf document - " + e);
    }
  }
}




Replies To: Transposing a matrix from a PDF stripper

#2 NormR   User is online

  • D.I.C Lover
  • member icon

Reputation: 836
  • View blog
  • Posts: 6,441
  • Joined: 25-December 13

Re: Transposing a matrix from a PDF stripper

Posted 26 May 2020 - 02:04 PM

See the discussion here: https://www.javaprog...html#post168761

The difference appears to be a line-end character.
What was the output when the print method was used?
Whoops! I see this code is using the print method.

This post has been edited by NormR: 26 May 2020 - 02:11 PM


#3 leace   User is offline


Re: Transposing a matrix from a PDF stripper

Posted 26 May 2020 - 02:05 PM

NormR, on 26 May 2020 - 02:04 PM, said:

See the discussion here: https://www.javaprog...html#post168761

What was the output when the print method was used?



It's shown in my original post, under "Current output".

#4 NormR   User is online


Re: Transposing a matrix from a PDF stripper

Posted 26 May 2020 - 02:10 PM

If the Strings returned by the getTextForRegion method end with line-end characters, the String class's trim method will remove them.
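
A minimal sketch of this suggestion, using hard-coded strings in place of the getTextForRegion results. The class name TrimDemo, the helper onOneLine, and the trailing line separators in the sample values are assumptions for illustration, not taken from the posted program:

```java
class TrimDemo {
    // Strips leading/trailing whitespace (including \r and \n) from both
    // strings and puts them on one line, separated by a single space.
    static String onOneLine(String first, String second) {
        return first.trim() + " " + second.trim();
    }

    public static void main(String[] args) {
        // Stand-ins for stripper.getTextForRegion(...); the trailing
        // line separators are assumed.
        String po = "PO-003334823\r\n";
        String vendor = "Vendor : WL-00051\r\n";
        System.out.println(onOneLine(po, vendor));
    }
}
```

In the real program the two arguments would be the Strings returned by getTextForRegion for each region.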

#5 leace   User is offline


Re: Transposing a matrix from a PDF stripper

Posted 26 May 2020 - 02:17 PM

NormR, on 26 May 2020 - 02:10 PM, said:

If the Strings returned by the getTextForRegion method end with line-end characters, the String class's trim method will remove them.



I tried the following, though I am not sure it matches your suggestion. No luck:
System.out.print(stripper.getTextForRegion ("class5" ) + ("class2" ));

Below is the output

PO-003334823
class2

#6 NormR   User is online


Re: Transposing a matrix from a PDF stripper

Posted 26 May 2020 - 02:20 PM

System.out.print(stripper.getTextForRegion ("class5" ).trim() + ("class2" ));

I did not see the use of trim in the posted code. I have added it so that it will remove the line-end character from the String returned by getTextForRegion

#7 leace   User is offline


Re: Transposing a matrix from a PDF stripper

Posted 26 May 2020 - 02:22 PM

NormR, on 26 May 2020 - 02:20 PM, said:

System.out.print(stripper.getTextForRegion ("class5" ).trim() + ("class2" ));

I did not see the use of trim in the posted code. I have added it so that it will remove the line-end character from the String returned by getTextForRegion


Now the output is on one line as required, but there is no data from the second region; only the name "class2" is printed:
PO-003847823class2

#8 NormR   User is online


Re: Transposing a matrix from a PDF stripper

Posted 26 May 2020 - 02:27 PM

Quote

System.out.print(stripper.getTextForRegion ("class5" ).trim() + ("class2" ));

That statement shows that the String "class2" should be printed at the end of the String returned by the getTextForRegion method.

If you do not want to see "class2" at the end of the String, remove it:
System.out.print(stripper.getTextForRegion ("class5" ).trim());
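
To print the second region's text rather than its name, the name has to be passed to getTextForRegion again. A small sketch of the difference, assuming a hypothetical stand-in method in place of the real stripper (the class LiteralVsLookup and the sample values are illustrative only):

```java
class LiteralVsLookup {
    // Hypothetical stand-in for stripper.getTextForRegion(name).
    static String getTextForRegion(String name) {
        switch (name) {
            case "class5": return "PO-003334823\n";
            case "class2": return "Vendor : WL-00051\n";
            default: return "";
        }
    }

    public static void main(String[] args) {
        // Appends the literal text "class2" to the first region's text.
        System.out.println(getTextForRegion("class5").trim() + ("class2"));
        // Appends the text extracted for the region named "class2".
        System.out.println(getTextForRegion("class5").trim()
                + getTextForRegion("class2").trim());
    }
}
```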

This post has been edited by NormR: 26 May 2020 - 02:27 PM


#9 leace   User is offline


Re: Transposing a matrix from a PDF stripper

Posted 26 May 2020 - 02:30 PM

NormR, on 26 May 2020 - 02:27 PM, said:

Quote

System.out.print(stripper.getTextForRegion ("class5" ).trim() + ("class2" ));

That statement shows that the String "class2" should be printed at the end of the String returned by the getTextForRegion method.

If you do not want to see "class2" at the end of the String, remove it:
System.out.print(stripper.getTextForRegion ("class5" ).trim());



Hello Norm. We reached the desired output by using the following:
System.out.print(stripper.getTextForRegion ("class5" ).trim() + (stripper.getTextForRegion ("class2" ).trim() +(stripper.getTextForRegion ("class3" ).trim())));

trim was the key.

But how do I put a pipe character in between the regions?

Current
PO-001841823Vendor : eL-00014Date...................................: 5/10/2021
Expected
PO-001841823|Vendor : eL-00014|Date...................................: 5/10/2021

#10 NormR   User is online


Re: Transposing a matrix from a PDF stripper

Posted 26 May 2020 - 02:35 PM

Quote

how to make pipe in between the regions

Use the String concatenation operator: + to build a String with all the desired bits:
"one" + "!" + "two"

You are currently using it to build the String given to the print method.
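
Applied to the three regions, that could look like the sketch below. The class name PipeJoin and the helper joinWithPipe are hypothetical, and the hard-coded strings stand in for the getTextForRegion results (the trailing line separators are assumed):

```java
class PipeJoin {
    // Trims each piece and joins the pieces with "|" using the + operator.
    static String joinWithPipe(String a, String b, String c) {
        return a.trim() + "|" + b.trim() + "|" + c.trim();
    }

    public static void main(String[] args) {
        // Stand-ins for the three getTextForRegion results.
        String po = "PO-001841823\n";
        String vendor = "Vendor : eL-00014\n";
        String date = "Date...................................: 5/10/2021\n";
        System.out.println(joinWithPipe(po, vendor, date));
    }
}
```

For a longer list of regions, String.join("|", ...) over the trimmed values would be an alternative to chaining + by hand.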

This post has been edited by NormR: 26 May 2020 - 02:36 PM

