3 Replies - 1690 Views - Last Post: 30 May 2009 - 02:53 AM

#1 n00bc0der  Icon User is offline

  • D.I.C Head

Reputation: 3
  • View blog
  • Posts: 62
  • Joined: 27-November 08

Comparing text files

Post icon  Posted 29 May 2009 - 09:41 AM

Hey Everyone,

Alrighty this is like the 3rd post I've made regarding a different step in this code... I'm getting a lot closer to my final product.

What I've done so far is create a baseline for a series of files and directories that should be found on the drive. [It's for a product for the company I work for]. Note: This is my first time using Perl hence the reason I'm having some trouble with it. So far this is what I have:

#!/usr/bin/perl

###############################################################
#This script is used to ensure the permissions of significant
#files are set correctly, and that the correct number of files
#and directories are existent in the proper location.
###############################################################

#Used to detect all directories in a given original directory
sub getfulldirectorylist {
	my $dirtop = @_[0];
	my @dirlist = `find $dirtop -type d`;
	return @dirlist;
}

#Used to detect all files found in a given directory
sub getdirectorycontents {
	my $dirname = @_[0];
	my @contents = `ls -A $dirname`;
	return @contents;
}

#Used to detect the permissions of a given file

#note I'm not using this method yet.. just working through other parts
#before I get to here
sub getpermissions {
	my ($filename, $location) = @_;

	my $permission = `cd $location; ls -l | grep $filename | grep nobody`;
	if ($permission) {
			return "nobody";
	}

	my $permission = `cd $location; ls -l | grep $filename | grep root`;
	if ($permission) {
		return "root";
	}
}

sub compare {
	my ($baseline, $test) = @_;
	open(FILE, "$baseline") or die "Can't open $baseline : $!";
	my @baselinefile = <FILE>;
	close(FILE);
	open(FILE, "$test") or die "Can't open $test : $!";
	my @testfile = <FILE>;
	close(FILE);
	
	for ($i = 0; i < $baselinefile; $i++) {
		if (@baseline[i] == @testfile[i]) {
		}
		else {
			print "@baselinefile[i] does not match @testfile[i]";
		}
	}
}

sub write_file {
	my ($f, @data ) = @_;
	@data = () unless @data;
	open F, "> $f" or die "Can't open $f : $!";
	print F @data;
	close F;
}

for my $topdir ("/store", "/opt/qradar") {
	my @dir = getfulldirectorylist("$topdir");
	my @list = ();
	foreach $dir (@dir) {
		my @content = getdirectorycontents($dir);
		push(@list, "$dir");
		foreach $content(@content) {
			push(@list, "$content");
		}
		push(@list, "\n");
	}
	write_file("test.txt", @list);
}

&compare("baseline.txt", "test.txt");



So I can't get my compare subroutine to work like I want it to work.
I don't get it to print anything to the screen when there are differences that it should be printing.
I know it's going to print one thing and then it should mess up everything else because if one file doesn't exist in my search then no other files will be in the same location either. Again, I know this is going to happen but I'm trying to accomplish the "verifying that it's not the same" step first.

Any help/criticism is well appreciated.

Thanks,
Chris

Is This A Good Question/Topic? 0
  • +

Replies To: Comparing text files

#2 KevinADC  Icon User is offline

  • D.I.C Regular
  • member icon

Reputation: 27
  • View blog
  • Posts: 401
  • Joined: 23-January 07

Re: Comparing text files

Posted 29 May 2009 - 12:33 PM

Without looking over your code very well, I see this line:

if (@baseline[i] == @testfile[i]) {


== is a math operator, not a string operator, what I think you want is "eq":

if (@baseline[i] eq @testfile[i]) {


"eq" compares two strings but for it to be true they must be exact matches.

On closer look there are more problems with this section of code:

for ($i = 0; i < $baselinefile; $i++) {
		if (@baseline[i] == @testfile[i]) {
		}
		else {
			print "@baselinefile[i] does not match @testfile[i]";
		}
	}
}


It should be written like so:

for ($i = 0; $i < $baselinefile; $i++) {
		if ($baseline[$i] eq $testfile[$i]) {
		}
		else {
			print "$baselinefile[$i] does not match $testfile[$i]";
		}
	}
}



You left out $ before most of the i's.

You also want to use the scalar context of an array element: $array[$i] instead of the list context: @array[$i]. While that will work it is long deprecated and is not gauranteed to work in the future.
Was This Post Helpful? 0
  • +
  • -

#3 dsherohman  Icon User is offline

  • Perl Parson
  • member icon

Reputation: 226
  • View blog
  • Posts: 654
  • Joined: 29-March 09

Re: Comparing text files

Posted 30 May 2009 - 02:16 AM

View PostKevinADC, on 29 May, 2009 - 07:33 PM, said:

for ($i = 0; $i < $baselinefile; $i++) {


One other detail here:

I would recommend using
for $i (0..$baselinefile) {
instead of the C-style loop. It's much more resistant to off-by-one errors, incorrect termination conditions, etc.

View PostKevinADC, on 29 May, 2009 - 07:33 PM, said:

You also want to use the scalar context of an array element: $array[$i] instead of the list context: @array[$i]. While that will work it is long deprecated and is not gauranteed to work in the future.

...and the Perl interpreter will tell you that you should change this if you "use warnings".
Was This Post Helpful? 0
  • +
  • -

#4 dsherohman  Icon User is offline

  • Perl Parson
  • member icon

Reputation: 226
  • View blog
  • Posts: 654
  • Joined: 29-March 09

Re: Comparing text files

Posted 30 May 2009 - 02:53 AM

View Postn00bc0der, on 29 May, 2009 - 04:41 PM, said:

#!/usr/bin/perl

As I recall, the initial version of your code that you posted with the first question included "use strict". Is there a particular reason why you removed it?

View Postn00bc0der, on 29 May, 2009 - 04:41 PM, said:

	my $permission = `cd $location; ls -l | grep $filename | grep nobody`;


Take a look at Perl's built-in stat function.

View Postn00bc0der, on 29 May, 2009 - 04:41 PM, said:

sub compare {
	my ($baseline, $test) = @_;
	open(FILE, "$baseline") or die "Can't open $baseline : $!";
	my @baselinefile = <FILE>;
	close(FILE);
	open(FILE, "$test") or die "Can't open $test : $!";
	my @testfile = <FILE>;
	close(FILE);
	
	for ($i = 0; i < $baselinefile; $i++) {
		if (@baseline[i] == @testfile[i]) {
		}
		else {
			print "@baselinefile[i] does not match @testfile[i]";
		}
	}
}


  • This code reads both files, in full, into memory, which will be unnecessarily slow and require much more memory than is really needed. This might not be noticeable on smaller files, but can become crippling if you have to deal with larger amounts of data.
  • You're reading the files in text mode, which will cause problems if your files contain non-ASCII data. "binmode FILEHANDLE" on each handle after opening it will take care of this as far as the initial data read is concerned, but, even then, I wouldn't necessarily expect text comparison of binary data to be 100% reliable.
  • The two-argument form of open has a number of issues and largely considered obsolete. Using the three-argument form with lexical filehandles is the preferred technique in Perl 5.6 or later: open($fh, '<', $test);
  • As already pointed out, == is numeric comparison and will treat each argument as being equal to the value of any leading digits, or 0. Use eq for string comparison.
  • You don't test for the case of the files being different because $test contains additional lines after the end of $base. Doing a stat on each file before opening them and declaring them different if they're not the same size will take care of this case, as well as avoiding the need to read all that data into memory at all when you hit the most trivial (and probably most common) case for detecting a difference.
  • The proper way to get the index of the last item in an array is with $#array, not $array. Your for loop should, therefore, be "for my $i (0..$#baselinefile) {". (My mention of that form of loop in my earlier response incorrectly used $baselinefile and omitted the "my". That's what I get for commenting on code without checking its context first...)
  • Is there a reason to use "if ($x eq $y) {} else {do_something}" rather than "if ($x ne $y) {do_something}" or "unless ($x eq $y) {do_something}"? "else" following an empty block is a pretty nasty smell.
  • When you find a line that's different, you probably want to include "last;" to abort the loop after the print statement. There's likely no need to continue comparing the rest of the file when you already know they're different.
  • Finally, since you're calling out to the shell for everything else, why not just have the shell diff the files for you?

View Postn00bc0der, on 29 May, 2009 - 04:41 PM, said:

&compare("baseline.txt", "test.txt");



You don't need the & and using it can produce non-intuitive side-effects. I strongly recommend removing it.
Was This Post Helpful? 0
  • +
  • -

Page 1 of 1