10 Replies - 749 Views - Last Post: 01 June 2012 - 10:05 AM

#1 CapC

  • New D.I.C Head

Reputation: 3
  • Posts: 30
  • Joined: 27-September 11

Assistance with ignoring case script

Posted 31 May 2012 - 01:21 PM

Gentlemen,

I am faced with an issue that I have been attempting to remedy for some time and just can't find the best way to handle it.

I am not very well versed in Python, though I figured this may be a good project for me to start with (assuming it can't be done with basic UNIX/BASH).

Essentially, I need to process received files through some scripts I have written. The issue is that several of the clients I process for never use the same capitalization in their file names. If I receive 500 files per client and have to manually lowercase every file name before processing, it becomes very "data entry-ish", and this menial task consumes far too much of my time.


So essentially my scripting points to a certain directory... for example:

dos2unix -n ~/untz/update/current/abc.prn ~/untz/update/current/abc.txt

The current directory is a symbolic link, pointing to the correct directory I am needing to work with.

The problem I am running into lies in that abc.prn is not always abc.prn, it could be Abc.prn one week, aBc.prn the next, abC.prn etc etc.

So all I really am trying to achieve is for the case of the .prn files to be ignored, and I have absolutely no idea how to go about this.

I am hoping someone more versed in this could lend a hand and point me in the right direction, as I said I know only basic Python/Perl.

So I guess to summarize:
1) Can this be done with just a unix/bash command?

If not..

2) Can someone point me at how I would go about incorporating this necessity into a python script? I know it can be done with Python or Perl, I am just not sure how.

This post has been edited by CapC: 31 May 2012 - 01:24 PM



Replies To: Assistance with ignoring case script

#2 jon.kiparsky

  • Pancakes!

Reputation: 7996
  • Posts: 13,694
  • Joined: 19-March 11

Posted 31 May 2012 - 01:27 PM

Could you just process all the .prn files in a given directory?

Globbing them as *.prn would probably be the easiest thing. You can then save each output with the .txt extension.
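
For readers who would rather do the globbing in Python, here is a minimal sketch of the idea, not a definitive implementation. It assumes the extension should be matched in any letter case and that each output name should be the lower-cased stem plus .txt. The function name prn_to_txt_pairs and the demo file names are made up for illustration.

```python
import os
import tempfile


def prn_to_txt_pairs(directory):
    """Pair every *.prn file (any letter case) with a lower-case .txt output path."""
    pairs = []
    for name in sorted(os.listdir(directory)):
        stem, ext = os.path.splitext(name)
        if ext.lower() == ".prn":
            source = os.path.join(directory, name)
            target = os.path.join(directory, stem.lower() + ".txt")
            pairs.append((source, target))
    return pairs


if __name__ == "__main__":
    # Demo in a throwaway directory with hypothetical file names.
    with tempfile.TemporaryDirectory() as demo:
        for name in ("Abc.prn", "xYz.prn", "notes.log"):
            open(os.path.join(demo, name), "w").close()
        for source, target in prn_to_txt_pairs(demo):
            print(os.path.basename(source), "->", os.path.basename(target))
```

Each file keeps its own output, so nothing is merged into a single .txt file.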

#3 CapC

Posted 31 May 2012 - 01:34 PM

The issue with that would lie in that each file has a different data layout - different columns, different fixed lengths, different carriage returns. If I globbed them all together as *.prn and output them as a single .txt I would have to parse them back out again.

Unless I'm not following your intent correctly?

This post has been edited by CapC: 31 May 2012 - 01:35 PM


#4 jon.kiparsky

Posted 31 May 2012 - 01:41 PM

No - do you care about the name of the file? Do you run different scripts depending on the name of the file, or are they all dos2unix?

If they're all getting the same treatment, then you just need a script that will take a list of files and run your script, sending the output to filename.txt instead of filename.prn
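
A rough Python sketch of such a wrapper follows. It assumes dos2unix is on the PATH and supports the -n infile outfile form used earlier in the thread. The names dos2unix_command and convert_all are hypothetical, and the run parameter exists only so the commands can be printed in a dry run instead of executed.

```python
import os
import subprocess


def dos2unix_command(source):
    """Build the dos2unix call that writes <name>.txt next to <name>.prn."""
    stem, _ext = os.path.splitext(source)
    return ["dos2unix", "-n", source, stem + ".txt"]


def convert_all(files, run=subprocess.check_call):
    """Run one dos2unix command per file; `run` can be swapped for a dry run."""
    for source in files:
        run(dos2unix_command(source))


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    convert_all(["current/abc.prn", "current/Xyz.prn"], run=print)
```

Passing the command as a list avoids shell quoting problems with unusual file names.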

This post has been edited by jon.kiparsky: 31 May 2012 - 01:41 PM


#5 CapC

Posted 31 May 2012 - 01:48 PM


I follow. Unfortunately, I do care about the file name. At this step I am only doing dos2unix processing, but I have scripting in place to convert column names, split up some fixed-length files by their carriage returns, and load them into my t_tables. For example, file abc.txt would be processed further beyond this step and loaded into t_abc; every file is loaded into its own t_table or s_table for later processing.

Beyond that point processing pulls directly from the t_tables and is mostly done using SQL.

But at this point, the dos2unix is only the first step, there is a significant amount of processing beyond this point that requires the file names to remain the same.

This post has been edited by CapC: 31 May 2012 - 01:54 PM


#6 jon.kiparsky

Posted 31 May 2012 - 02:07 PM

Okay, to process a file in perl you might do something like


#!/usr/bin/perl
use strict;
use warnings;

foreach my $file (@ARGV) {                # for each file (you can glob these: *.prn will work fine)
    my $new_filename = $file;             # start from the original name
    $new_filename =~ s/\.prn$/.txt/;      # change the extension (anchored to the end of the name)
    system('dos2unix', '-n', $file, $new_filename);  # or whatever command you want to do
}


so if you saved this as foo.pl, for your example above you could do

foo.pl ~/untz/update/current/abc.prn

or

foo.pl ~/untz/update/current/*.prn

to do all of the files in that directory at one go.

#7 CapC

Posted 31 May 2012 - 02:28 PM

Thank you very much for your help with this.

I renamed the file ABC.prn to test this.

I created the perl file on a unix server, made it executable and ran it exactly as follows (at this point the file is sitting in the current directory as ABC.prn):

perl foo.pl ~/untz/update/current/abc.prn

dos2unix: problems converting file /untz/update/current/abc.prn to file /untz/update/current/abc.txt.

I then typed the command as:

perl foo.pl ~/untz/update/current/ABC.prn

and it worked as intended.

So it seems this still does not account for the different cases of the file name for some reason.

#8 atraub

  • Pythoneer

Reputation: 759
  • Posts: 2,010
  • Joined: 23-December 08

Posted 31 May 2012 - 02:44 PM

Here's a short and sweet python version I whipped up (based on stackoverflow).

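import glob
import os

def bulkLowerCase(directory=None, pattern=None):
    # Default to the current working directory and to every name containing a dot.
    if directory is None:
        directory = os.getcwd()

    if pattern is None:
        pattern = "*.*"

    # glob.glob builds the full list first, so renaming cannot disturb the iteration.
    for pathAndFilename in glob.glob(os.path.join(directory, pattern)):
        # Lowercase only the file name, not the directories leading to it.
        folder, filename = os.path.split(pathAndFilename)
        os.rename(pathAndFilename, os.path.join(folder, filename.lower()))


You'll need to make a couple of minor alterations to use it from the command line, but this should give you the basic idea.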
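
One possible shape for those command-line alterations is sketched below. This is an illustration under assumptions, not the poster's code: the option names --pattern and --apply and the helper names plan_lowercase and apply_plan are invented here. It only lists the planned renames unless --apply is given, and it skips a file when the lower-case name already exists, since os.rename can silently overwrite an existing file on Unix.

```python
import argparse
import glob
import os


def plan_lowercase(directory, pattern="*"):
    """Return (old_path, new_path) pairs for names that contain upper-case letters."""
    plan = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        folder, name = os.path.split(path)
        if name != name.lower():
            plan.append((path, os.path.join(folder, name.lower())))
    return plan


def apply_plan(plan):
    """Rename each pair, skipping any whose lower-case target already exists."""
    done = []
    for old, new in plan:
        if os.path.exists(new):
            print("skipped (target exists):", old)
            continue
        os.rename(old, new)
        done.append((old, new))
    return done


def main(argv=None):
    parser = argparse.ArgumentParser(description="Lower-case file names in a directory.")
    parser.add_argument("directory", nargs="?", default=".")
    parser.add_argument("--pattern", default="*")
    parser.add_argument("--apply", action="store_true", help="rename instead of only listing")
    args = parser.parse_args(argv)
    plan = plan_lowercase(args.directory, args.pattern)
    if args.apply:
        plan = apply_plan(plan)
    for old, new in plan:
        print(old, "->", new)


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as lowercase.py, it could be run as: python lowercase.py ~/untz/update/current --pattern "*.prn" --apply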

This post has been edited by atraub: 31 May 2012 - 02:46 PM


#9 jon.kiparsky

Posted 31 May 2012 - 03:49 PM


Slowly but surely, I come to understand your requirements. You want to transform just files with the pattern [aA][bB][cC].prn, but not any other file?

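#!/usr/bin/perl
use strict;
use warnings;

# list the current directory and keep the names equal to the pattern, ignoring case
# (-x matches the whole name, -F treats the pattern as a literal string)
my @files = `ls | grep -i -x -F -- '$ARGV[0]'`;

foreach my $file (@files) {
    chomp($file);                        # strip the trailing newline from the ls output
    my $new_filename = $file;
    $new_filename =~ s/\.prn$/.txt/;     # change the extension

    system('cp', $file, $new_filename);  # or whatever command you want to do
}



This will take a filename to use as a pattern, i.e. "abc.prn", and use ls and grep to get any file in the current directory matching that pattern, in a case-insensitive manner. That list of files is then treated much as before. Because ls lists the current working directory, run the script from the directory that holds the files and pass a bare file name rather than a full path.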
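
As a side note on the matching itself: a plain grep -i abc.prn treats the dot as "any character" and also accepts partial matches, so names such as abcXprn or abc.prn.bak would slip through. The Python sketch below shows one way to do an exact, case-insensitive comparison. The function name names_matching and the sample listing are hypothetical.

```python
import re


def names_matching(names, wanted):
    """Return the names equal to `wanted` when letter case is ignored."""
    # re.escape keeps the dot in "abc.prn" literal; fullmatch rejects partial hits.
    pattern = re.compile(re.escape(wanted), re.IGNORECASE)
    return [name for name in names if pattern.fullmatch(name)]


if __name__ == "__main__":
    listing = ["ABC.prn", "abcXprn", "abc.prn.bak", "aBc.PRN"]
    print(names_matching(listing, "abc.prn"))
```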

As always, the hard part is understanding the requirements correctly!

This post has been edited by jon.kiparsky: 31 May 2012 - 03:50 PM


#10 CapC

Posted 01 June 2012 - 06:25 AM

This doesn't seem to be working for me either unfortunately.

The prn to txt conversion part is good, I think the logic may still be slightly confused as to what I am needing.

So I get a bucket of the same 500 files. They always have the .prn extension, and each file is named essentially the same each time I receive it, with the only variance being capitalization.

The file names xxx.prn could range from a.prn to sdfj_aASD_LfjSDFa.prn

What I need the perl script to do is essentially interpret all capital letters as lower case.

So when I receive the file "sdfj_aASD_LfjSDFa.prn" and my hard-coded script calls:

perl foo.pl ~/untz/update/current/sdfj_aasd_lfjsdfa.prn

it would find and process the file capitalized as sdfj_aASD_LfjSDFa.prn in the current directory.

This way I can hard code everything in lower case, and no matter how the client varies the capitalization of the file name each time they send it, the case will always be ignored so the lower-case hard-coded script will find it.
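
The requirement described here, a hard-coded lower-case path that should find the file however it is capitalized, could be sketched in Python roughly as follows. This is only an illustration under assumptions: the function name find_ignoring_case is invented, only the file name (not the directory part) is compared without regard to case, and two files that differ only by case are treated as an error rather than guessed at.

```python
import os
import tempfile


def find_ignoring_case(path):
    """Return the real path whose file name equals `path`'s name, ignoring case."""
    directory, wanted = os.path.split(os.path.expanduser(path))
    directory = directory or "."
    matches = [n for n in os.listdir(directory) if n.lower() == wanted.lower()]
    if not matches:
        raise FileNotFoundError(path)
    if len(matches) > 1:
        raise ValueError("ambiguous names: %s" % ", ".join(sorted(matches)))
    return os.path.join(directory, matches[0])


if __name__ == "__main__":
    # Demo in a throwaway directory using the file name from the post.
    with tempfile.TemporaryDirectory() as demo:
        open(os.path.join(demo, "sdfj_aASD_LfjSDFa.prn"), "w").close()
        print(find_ignoring_case(os.path.join(demo, "sdfj_aasd_lfjsdfa.prn")))
```

The returned path can then be handed to dos2unix or any later step, so the files on disk never need to be renamed.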

#11 CapC

Posted 01 June 2012 - 10:05 AM

Hey gents,

I came up with something that achieves the first step I am attempting, which is to set everything to lower case.


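#!/usr/bin/perl

use warnings;
use strict;
use File::Copy;

# Perl does not expand '~', so build the path from $HOME and stop if chdir fails
chdir "$ENV{HOME}/untz/update/current" or die "Cannot chdir: $!";

# rename every entry whose name contains upper-case letters to its lower-case form
foreach my $name (glob '*') {
    move($name, lc($name)) if $name ne lc($name);
}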




This should work for what I'm intending to do.

I will just have to set it up once for each file, as opposed to having to manually change 500 files weekly to match whatever case they sent the first time.

Then I should be able to just call a BASH script with a series of hard-coded commands like:

dos2unix -n ~/untz/update/current/abc.prn ~/untz/update/current/abc.txt
dos2unix -n ~/untz/update/current/abcdefghij.prn ~/untz/update/current/abcdefghij.txt
dos2unix -n ~/untz/update/current/eee11ghif.prn ~/untz/update/current/eee11ghif.txt

etc


This post has been edited by CapC: 01 June 2012 - 10:09 AM
