2 Replies - 2196 Views - Last Post: 28 April 2012 - 11:59 AM

#1 ABOU

  • New D.I.C Head

Reputation: 0
  • Posts: 31
  • Joined: 09-December 08

[Powershell] Merging text files by criteria provided in a CSV

Posted 31 August 2011 - 07:06 AM

I have a folder with a number of files containing OCR text for documents that were split into separate pages for processing.

Now that the processing has been carried out, I wish to merge the OCR files so that there is one file for each original document.

The text files are stored in the parent folder as follows:

ts001.txt
ts002.txt
ts003.txt
ts004.txt
ts00*.txt
ts050.txt


Also stored in the folder is a CSV file containing markers that indicate the start of each new document:

ts001.txt  Y
ts002.txt
ts003.txt  Y
ts004.txt
ts005.txt
ts006.txt  Y
ts007.txt


The finished combined files should be:

ts001.txt = ts001.txt + ts002.txt
ts003.txt = ts003.txt + ts004.txt + ts005.txt
ts006.txt = ts006.txt + ts007.txt

I am aware that PowerShell can read in a CSV file and let me reference its columns, which would give me access to the document-heading column containing the Ys. However, as there is no closing value to indicate the end of each document, I am unsure how to group the pages that make up each original document in order to merge them.

I understand the principle of merging text documents in PowerShell; it is the grouping of the documents that is giving me trouble.
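Because there is no closing marker, one way to group the pages is to walk the CSV rows in order and start a new group every time a row is marked Y; each unmarked row then belongs to the most recent Y. A minimal sketch, assuming PowerShell 3.0 or later and a CSV with a header row naming the columns File and HeadingPage (the names used in the follow-up post). The helper name Get-DocumentGroup is hypothetical, and the sample rows stand in for Import-Csv output.

```powershell
# Hypothetical helper: split page rows (already in page order) into documents.
# Assumes each row has File and HeadingPage properties and that HeadingPage
# is "Y" on the first page of every document.
function Get-DocumentGroup {
    param([object[]]$Rows)

    $groups  = @()
    $current = $null
    foreach ($row in $Rows) {
        # A Y starts a new document; a leading unmarked row also starts one
        if ($row.HeadingPage -eq 'Y' -or $null -eq $current) {
            $current = [pscustomobject]@{
                Head  = $row.File
                Pages = New-Object System.Collections.ArrayList
            }
            $groups += $current
        }
        # Every row, marked or not, is a page of the current document
        [void]$current.Pages.Add($row.File)
    }
    $groups
}

# Sample rows matching the example CSV (in practice: Import-Csv -Path ...)
$rows = 'ts001.txt,Y', 'ts002.txt,', 'ts003.txt,Y', 'ts004.txt,',
        'ts005.txt,', 'ts006.txt,Y', 'ts007.txt,' |
        ConvertFrom-Csv -Header File, HeadingPage

$docs = @(Get-DocumentGroup -Rows $rows)
foreach ($doc in $docs) {
    # e.g. "ts003.txt = ts003.txt + ts004.txt + ts005.txt"
    "{0} = {1}" -f $doc.Head, ($doc.Pages -join ' + ')
}
```

Each group's Pages list is exactly the set of files to concatenate, in page order, so the merge itself becomes a simple loop over the groups.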

Any help would be greatly appreciated

Regards, Craig

Replies To: [Powershell] Merging text files by criteria provided in a CSV

#2 ABOU

  • New D.I.C Head

Reputation: 0
  • Posts: 31
  • Joined: 09-December 08

Re: [Powershell] Merging text files by criteria provided in a CSV

Posted 31 August 2011 - 10:23 AM

OK, I don't know how to update the original post, so I'm going to post here.

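# Prompt for the CSV path (Read-Host adds its own ": ") and load it;
# each row becomes an object with File and HeadingPage properties
$path = Read-Host "Please insert path and file"
$List = Import-Csv -Path $path
#| select File, HeadingPage, @{Name="PageNumber"; Expression = ""}

# Print each page with its heading marker (Y marks a document's first page)
foreach ($item in $List) {
    "{0}  {1}" -f $item.File, $item.HeadingPage
}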


This loads the CSV. To make the merge easier, I was thinking of adding a new column holding a page number that counts up from each Y, but I haven't found a way to do this yet.
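The page-number column described above can be built with a running counter that resets to 1 at every Y and increases by one on every other row. A minimal sketch, assuming the CSV columns are File and HeadingPage; PageNumber is the column name from the commented-out select line, and the sample rows stand in for the Import-Csv output.

```powershell
# Sample rows standing in for: $List = Import-Csv -Path <your csv>
$List = 'ts001.txt,Y', 'ts002.txt,', 'ts003.txt,Y', 'ts004.txt,', 'ts005.txt,' |
        ConvertFrom-Csv -Header File, HeadingPage

$page = 0
foreach ($item in $List) {
    # Restart numbering on a document's first page, otherwise count up
    if ($item.HeadingPage -eq 'Y') { $page = 1 } else { $page++ }
    # Attach the running count to the row as a new PageNumber property
    Add-Member -InputObject $item -MemberType NoteProperty -Name PageNumber -Value $page
}

$List | Format-Table File, HeadingPage, PageNumber -AutoSize
```

A row with PageNumber 1 marks where each document begins, so a merge step only has to start a new output file whenever it sees a 1.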

What I am most lost on is the logic for collecting the files into groups so that I can merge them.
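The grouping and the merge can also be done in a single pass: whenever a Y row appears, start a new combined file named after that page, and append every page's text to whichever file is currently open. A minimal sketch, assuming the CSV has File and HeadingPage columns in page order. The function name Merge-OcrPages and its parameters are hypothetical, and the combined files go to a separate output folder so the original page files are not overwritten while they are still being read.

```powershell
# Hypothetical helper: write one combined text file per original document.
# Assumes the CSV lists the pages in order with File and HeadingPage columns,
# and that HeadingPage is "Y" on the first page of each document.
function Merge-OcrPages {
    param(
        [string]$CsvPath,    # the CSV describing the pages
        [string]$SourceDir,  # folder holding ts001.txt, ts002.txt, ...
        [string]$OutDir      # folder for the merged documents
    )

    if (-not (Test-Path -Path $OutDir)) {
        New-Item -ItemType Directory -Path $OutDir | Out-Null
    }

    $target = $null
    foreach ($row in (Import-Csv -Path $CsvPath)) {
        if ($row.HeadingPage -eq 'Y' -or $null -eq $target) {
            # First page of a new document: create (or empty) its output file
            $target = Join-Path $OutDir $row.File
            New-Item -ItemType File -Path $target -Force | Out-Null
        }
        # Append this page's text to the current document's file
        Get-Content -Path (Join-Path $SourceDir $row.File) | Add-Content -Path $target
    }
}

# Example call (paths are placeholders):
# Merge-OcrPages -CsvPath C:\ocr\pages.csv -SourceDir C:\ocr -OutDir C:\ocr\merged
```

Writing to a separate folder also means the script can be re-run safely, because -Force on New-Item empties any merged file left over from a previous run before the pages are appended again.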

#3 webmin88

  • New D.I.C Head

Reputation: 0
  • Posts: 8
  • Joined: 22-October 11

Re: [Powershell] Merging text files by criteria provided in a CSV

Posted 28 April 2012 - 11:59 AM

Hello,

So it sounds like your CSV file is acting as a database of sorts, telling your script which files you have and which ones should be merged. My suggestion is the following:

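$outDir = "FILE_PATH"                  # folder for the merged files
$lines  = @(Get-Content file.csv)      # rows: filename,Y,#ofpages (no header)

for ($i = 0; $i -lt $lines.Count; $i++) {
     $cols = $lines[$i].Split(",")
     # A first page is marked Y and says how many pages follow it
     if ($cols.Count -ge 3 -and $cols[1] -eq "Y" -and $cols[2] -ne "") {
          $filename = $cols[0]
          $target   = Join-Path $outDir $filename
          Get-Content $filename | Set-Content $target
          $pagecount = 0
          # Append the following rows until #ofpages extra pages are added
          for ($j = $i + 1; $j -lt $lines.Count -and $pagecount -lt [int]$cols[2]; $j++) {
               $pagetwo = $lines[$j].Split(",")[0]
               Get-Content $pagetwo | Add-Content $target
               $pagecount++
          }
     }
}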



So it's not perfect, but this should give you an idea. Add one more column to your CSV file for the number of pages associated with the first page, so that each row looks like filename, Y, #ofpages. The outer loop reads each line of the CSV and looks for a line that contains the file name, the new-file marker (Y), and the number of pages associated with it. When it finds that line, it creates the new file, adds the first page to it, then continues through the CSV adding the other pages that belong to that first file until $pagecount reaches that number.

I hope that makes sense. Lemme know if you have any questions.

You may also want to add another column to your CSV file that says which initial page each additional page belongs to, and tweak the inner loop accordingly. Sorry, I only just noticed that.
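With such a column, the pages can be grouped without any page counting. A minimal sketch of that idea, assuming a CSV with a header row and a hypothetical third column named Document that holds the first page of the document each row belongs to; Group-Object then collects the rows that share a Document value.

```powershell
# Sample CSV rows with a hypothetical Document column naming each page's first page
$rows = 'File,HeadingPage,Document',
        'ts001.txt,Y,ts001.txt',
        'ts002.txt,,ts001.txt',
        'ts003.txt,Y,ts003.txt',
        'ts004.txt,,ts003.txt',
        'ts005.txt,,ts003.txt' | ConvertFrom-Csv

# Group-Object collects the rows sharing a Document value;
# within each group the rows keep their CSV order
$summary = foreach ($doc in ($rows | Group-Object -Property Document)) {
    $pages = $doc.Group | ForEach-Object { $_.File }
    "{0} = {1}" -f $doc.Name, ($pages -join ' + ')
}
$summary
```

To write the merged files instead of a summary, each group's page list could be passed straight to Get-Content, for example Get-Content -Path $pages | Set-Content -Path (Join-Path $outDir $doc.Name), where $outDir is a folder of your choosing.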