
#1 Zel2008

Extracting data from text with a Perl one liner

Posted 13 September 2010 - 08:37 AM

Hi everybody,
This is my first foray into Perl in quite some time, so I hope this isn't too simple a question. :)

I have a text file that's set up in blocks like this:
Line 1 with spaces
Line 2 with spaces
Number I need: 1
Percentage1: 1/3 = 33.33%
Percentage2: 3/4 = 75.00%
<blank line>



The blocks repeat down the text file in this same format. I'd like to create 3 Perl one-liners that I can use in a bash script, one to extract each of the three pieces of information I need. In the above example, the data I want is:

1
1/3 = 33.33%
3/4 = 75.00%

So, basically, the end of the last three lines that aren't blank.

I've tried this so far:
while read line
do
DATA1=`echo $line | perl -e '"s/Number I need: (*)/\1/g"; print'`
DATA2=`echo $line | perl -e '"s/Percentage1: (*)/\1/g"; print'`
DATA3='echo $line | perl -e '"s/Percentage2: (*)/\1/g"; print'`
echo "$DATA1 $DATA2 $DATA3";
done < test.out



But nothing prints out. Can anyone please help me figure out where I'm going wrong here?

Thanks,
Zel2008

This post has been edited by Zel2008: 13 September 2010 - 08:39 AM



Replies To: Extracting data from text with a Perl one liner

#2 dsherohman

Re: Extracting data from text with a Perl one liner

Posted 14 September 2010 - 07:12 AM

Zel2008, on 13 September 2010 - 03:37 PM, said:

while read line
do
DATA1=`echo $line | perl -e '"s/Number I need: (*)/\1/g"; print'`
DATA2=`echo $line | perl -e '"s/Percentage1: (*)/\1/g"; print'`
DATA3='echo $line | perl -e '"s/Percentage2: (*)/\1/g"; print'`
echo "$DATA1 $DATA2 $DATA3";
done < test.out



I wouldn't really call this a Perl question... What you've got there isn't a Perl program, but rather a shell script that happens to invoke some Perl one-liners. Since you're actually working in the shell, I'd suggest using grep for this:

grep -B3 '^$' datafile.txt | grep '[^-]' | sed -e 's/.*: *//'


The first grep looks for empty lines (beginning of line (^) immediately followed by end of line ($)) in datafile.txt and the -B3 tells it to also show the three lines immediately preceding each match, giving you "the last three lines that aren't blank", plus the blank line and a "--" separator between matches.

The second grep returns all lines which contain at least one non-"-" character, thus removing the blank lines and separators.

Finally, I used sed to remove everything up to the first non-space character following the (last) colon on each line.
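
To see what each stage contributes, the pipeline can be run one stage at a time. This is only a minimal sketch: it assumes a grep that supports -B (GNU and BSD grep do; the option is not in POSIX), and the two-block sample file, the `sample` variable and the `last_three` function name are invented for illustration.

```shell
# Hypothetical sample data: two blocks, each followed by a blank line.
sample=$(mktemp)
printf '%s\n' \
  'Line 1 with spaces' \
  'Line 2 with spaces' \
  'Number I need: 1' \
  'Percentage1: 1/3 = 33.33%' \
  'Percentage2: 3/4 = 75.00%' \
  '' \
  'Line 1 with spaces' \
  'Line 2 with spaces' \
  'Number I need: 2' \
  'Percentage1: 1/2 = 50.00%' \
  'Percentage2: 1/4 = 25.00%' \
  '' > "$sample"

# Stage 1: every empty line plus the three lines before it.
# Groups that are not adjacent are separated by a "--" line.
grep -B3 '^$' "$sample"

# Stage 2: keep only lines containing a character other than "-",
# which drops both the empty lines and the "--" separators.
grep -B3 '^$' "$sample" | grep '[^-]'

# Stage 3 (the full pipeline, wrapped in a function for reuse):
# strip everything up to the last colon and the spaces after it.
last_three() {
  grep -B3 '^$' "$1" | grep '[^-]' | sed -e 's/.*: *//'
}
last_three "$sample"
```

One caveat with stage 2: a data line made up entirely of hyphens would be dropped along with the separators.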

Two points to note:

1) I interpreted your spec literally: return data from the last three lines before each blank line, rather than from lines which begin with "Number I need", "Percentage1" or "Percentage2", which is what your shell script was attempting to do. If your actual intention is to retrieve data from lines with those prefixes, only one grep is needed:

grep -E '^(Number I need|Percentage[12])' datafile.txt | sed -e 's/.*: *//'


2) Because of my literal interpretation of the stated spec, the final block in the data file will be ignored unless it's followed by a blank line. (You can't take the last three items from before a blank line if there's no blank line, after all.) The prefix-based alternate version in point 1 above also removes this dependency on a final blank line.
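
The difference described in point 2 is easy to reproduce. The following is a rough sketch, assuming a grep that supports -B and -E and an invented sample file whose second block is not followed by a blank line. The function names `by_blank_line` and `by_prefix` are made up for this illustration, and `by_prefix` writes the prefix match as an extended regular expression with a trailing colon.

```shell
# Hypothetical sample: the first block ends with a blank line,
# the second (final) block does not.
sample=$(mktemp)
printf '%s\n' \
  'Line 1 with spaces' \
  'Line 2 with spaces' \
  'Number I need: 1' \
  'Percentage1: 1/3 = 33.33%' \
  'Percentage2: 3/4 = 75.00%' \
  '' \
  'Line 1 with spaces' \
  'Line 2 with spaces' \
  'Number I need: 2' \
  'Percentage1: 1/2 = 50.00%' \
  'Percentage2: 1/4 = 25.00%' > "$sample"

# Literal reading of the spec: the three lines before each blank line.
by_blank_line() {
  grep -B3 '^$' "$1" | grep '[^-]' | sed -e 's/.*: *//'
}

# Prefix-based reading: any line starting with one of the three labels.
by_prefix() {
  grep -E '^(Number I need|Percentage[12]):' "$1" | sed -e 's/.*: *//'
}

echo "blank-line version:"
by_blank_line "$sample"   # only the first block's three values

echo "prefix version:"
by_prefix "$sample"       # values from both blocks
```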


As for the Perl one-liners that aren't working for you:
  • "*" is solely a quantifier in regular expressions and needs something before it to tell it what to match zero-or-more times, such as "." to match any character.
  • You're not giving the regex anything to match against - your one-liner doesn't read any data from STDIN, which is where the text to match is being provided.
  • The double quotes around your regex turn it into a literal string, not a command to be executed.
  • Although this doesn't break it, there's no point in replacing the original string with the matched text and then printing the original string when you can skip a few steps by printing the match directly.


Fixing these up,
echo 'Percentage1: 1/3 = 33.33%' | perl -e '<> =~ /Percentage1: (.*)/; print $1'
prints
1/3 = 33.33%
as you intended.