11 Replies - 1247 Views - Last Post: 06 May 2009 - 12:07 AM

#1 Sun751

Fork

Posted 29 April 2009 - 07:52 PM

I am working on a project where I have to extract two different versions of the same database schema and diff them.
I am new to Perl, and I am using pattern matching to extract just the DDL from the database's .sql file.

My pattern matching works like this:

$/ = ";\n"; # this is my separator

if ($lines =~ /(create\s+(table|view|index|unique index|trigger).*?;)/gs)
{
    my $stmt = $1;
    #$stmt =~ s/\n/ /g; # remove newlines so the statement is on one line
    print FILEWRITE "$stmt\n";
}

But the problem is that in the .sql source file there is sometimes whitespace between ";" and "\n".

To get around this I am trying to use fork: first remove the whitespace, then read the file with $/ = ";\n" as the separator. For example:
FORK: {
    if ($pid = fork)
    {
        while (<FILE>)
        {
            $_ =~ s/;\s*$/;/;
        }
    }
    elsif (defined($pid))
    {
        $/ = ";\n";
        while (my $line = <FILE>)
        {
            #---Do pattern matching and writing to file
        }
    }
}

But this is not working as I expected. Does anyone have any suggestions?


Replies To: Fork

#2 rharriso

Re: Fork

Posted 29 April 2009 - 11:53 PM

Well, how is the program behaving? I guess I can't really tell how the program is supposed to run. Perhaps you could post your code with text highlighting?

This post has been edited by rharriso: 29 April 2009 - 11:55 PM

#3 Sun751

Re: Fork

Posted 30 April 2009 - 12:19 AM

View Postrharriso, on 29 Apr, 2009 - 10:53 PM, said:

Well, how is the program behaving? I guess I can't really tell how the program is supposed to run. Perhaps you could post your code with text highlighting?

This is my code, what I have done so far:



use strict;
use warnings;

my $dir1 = $ARGV[0];
my $dir2 = $ARGV[1];
my $file = 'authority.sql';

read_file($dir1, $file);    # call subroutine to process file1
read_file($dir2, $file);    # call subroutine to process file2

# Linux diff command
my $cmd = "diff -bwy '$dir1/authoritySchema.sql' '$dir2/authoritySchema.sql' > output";
system $cmd;
if ($? != 0)
{
    print "Files Differ\n";
}

sub read_file
{
    my $dlocation = shift;
    my $fname     = shift;

    open(FILEREAD, "<$dlocation/$fname") or
        die "Can't open '$dlocation/$fname': $!";

    my $floc = "$dlocation/authoritySchema.sql";
    $floc =~ s/\//\\/g if ($^O eq 'MSWin32');
    open(FILEWRITE, '>', $floc) or    # the output file
        die "$dlocation\n Unable to open output file 'authoritySchema.sql' : $!";

    my $pid;
    FORK: {
        if ($pid = fork)
        {
            while (<FILEREAD>)
            {
                $_ =~ s/;\s*$/;/;
            }
        }
        else
        {
            $/ = ";\n";    # input separator

            while (my $lines = <FILEREAD>)
            {
                if ($lines =~ /(create\s+(table|view|index|unique index|trigger).*?;)/gs)
                {
                    print FILEWRITE "$1\n";
                }
                elsif ($lines =~ /(^\s*grant\s+(select|update|insert|delete|index|alter|references).*?;)/m)
                {
                    print FILEWRITE "$1\n";
                }
                elsif ($lines =~ /(^\s*update.*?;)/s)    # /s for reading multiple lines
                {
                    print FILEWRITE "$1\n";
                }
                elsif ($lines =~ /(^\s*alter table.*?;)/s)
                {
                    print FILEWRITE "$1\n";
                }
                elsif ($lines =~ /(^\s*create\s*procedure.*?end\s+procedure\s*;)/gs)
                {
                    print FILEWRITE "$1\n";
                }
            }
        }
    }
    close FILEREAD;
    close FILEWRITE;
}


This post has been edited by Sun751: 30 April 2009 - 04:37 PM

#4 Ed_Bighead

Re: Fork

Posted 30 April 2009 - 06:00 AM

Edit your post and put your code in between the code blocks. It really does make things easier to read.

#5 rharriso

Re: Fork

Posted 30 April 2009 - 12:15 PM

Once I get a chance I'll see if I can get this program working. Keep plugging at it though. It might be a bit.

#6 castaway

Re: Fork

Posted 30 April 2009 - 03:02 PM

Any particular reason you need to do this by hand?

Try SQL::Translator which has a "Diff" tool.

Jess

#7 dsherohman

Re: Fork

Posted 02 May 2009 - 06:45 AM

Can you explain the reasoning behind your decision to approach the problem in this manner? "fork" creates a separate process, so all you've done here is create two processes running concurrently, both of which are reading the same file and examining it in different ways.

My best guess is that your intent was for one process to clean up the input file by removing any whitespace following an end-of-line semicolon and then the other process would parse that cleaned data, but your two processes aren't communicating with each other, so changes made by the regex in one won't be seen by the other.

Personally, based on what I can see in your code, I would just set $/ to ";" instead of ";\n", so that it won't matter whether there's whitespace after the semicolon. This approach will also allow for multiple statements on a single line. Doing it this way does mean that you'd need to strip newlines out of the data that gets read in, but you're already doing that anyhow, so no big deal there.
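
To make the point about fork concrete, here is a minimal sketch (the sub name text_after_child_edit and the sample string are made up for illustration). The child applies a trimming regex to its own copy of a string, and the parent's copy stays exactly as it was:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: fork a child that edits $text, then return the
# parent's copy of $text once the child has finished.
sub text_after_child_edit {
    my ($text) = @_;

    my $pid = fork;
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: this substitution only changes the child's private copy.
        $text =~ s/;[ \t]*$/;/;
        exit 0;
    }

    waitpid($pid, 0);    # Parent: wait for the child to finish
    return $text;        # The parent's copy was never touched
}

my $before = "create table t (id int);   \n";
my $after  = text_after_child_edit($before);
print $after eq $before ? "parent sees no change\n" : "parent sees the edit\n";
```

The same applies to lines read from a shared filehandle: a substitution on $_ in one process changes nothing that the other process will read.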

#8 Sun751

Re: Fork

Posted 03 May 2009 - 06:16 PM

Hi dsherohman,

Your understanding is correct. That is what I want to do: two separate processes, one for trimming and the other for further processing.


The only reason I want to go with this approach is that I don't want to use any temporary files, which is another way to do this task.

I understand $/ = ';', but could you please clarify how I can create communication between the two forked processes, since you are saying they are not communicating?
I want to run the trimming first and then read the edited (trimmed) text for further processing.

Waiting for your response!!!

Regards

Sun

This post has been edited by Sun751: 03 May 2009 - 06:19 PM

#9 dsherohman

Re: Fork

Posted 04 May 2009 - 10:56 AM

View PostSun751, on 4 May, 2009 - 01:16 AM, said:

Your understanding is correct. That is what I want to do: two separate processes, one for trimming and the other for further processing.

The only reason I want to go with this approach is that I don't want to use any temporary files, which is another way to do this task.

I understand $/ = ';', but could you please clarify how I can create communication between the two forked processes, since you are saying they are not communicating?
I want to run the trimming first and then read the edited (trimmed) text for further processing.

The simplest way for a parent process to fork off a child and communicate with it afterward is to pass the messages through a file handle opened prior to forking, as described at http://perldoc.perl....-open()-for-IPC
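
For reference, a minimal sketch of that kind of setup, using Perl's built-in pipe and fork. The sub name statements_via_pipe and the sample text are made up for illustration, and the in-memory filehandle only stands in for the real .sql file. The child trims spaces and tabs after each end-of-line semicolon and writes into the pipe; the parent reads the pipe with ";\n" as its record separator:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the statements read by the parent after a
# child process has cleaned up the text and sent it through a pipe.
sub statements_via_pipe {
    my ($in) = @_;

    pipe(my $reader, my $writer) or die "pipe failed: $!";

    my $pid = fork;
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: trim each line and write it into the pipe.
        close $reader;
        while (my $line = <$in>) {
            $line =~ s/;[ \t\r]*$/;/;    # keeps the newline itself
            print {$writer} $line;
        }
        close $writer;
        exit 0;
    }

    # Parent: close the unused write end, or the read never sees end-of-file.
    close $writer;
    my @statements;
    {
        local $/ = ";\n";
        while (my $stmt = <$reader>) {
            push @statements, $stmt;
        }
    }
    close $reader;
    waitpid($pid, 0);
    return @statements;
}

my $sample = "create table a (id int);   \ncreate table b (\n  id int\n);\t\n";
open(my $in, '<', \$sample) or die "Can't open in-memory file: $!";
my @found = statements_via_pipe($in);
print scalar(@found), " statements\n";    # prints "2 statements"
```

Note that the trimming regex uses [ \t\r]* rather than \s*, because \s* would also swallow the newline that the parent relies on as part of its separator.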

But I'm still not seeing any benefit to using two separate processes here at all, regardless of whether they communicate using pipes, temp files, or more advanced forms of IPC. You have not given any reason for not doing this within a single process.

This code reads semicolon-separated statements and prints them out, one per line, regardless of whether they have spaces before or after their terminating semicolons, are split across multiple lines, or share a line with other statements, and it does it all in a single process without forking:
#!/usr/bin/perl

use strict;
use warnings;

$/ = ';';

while (<DATA>) {
  my $line = $_;
  $line =~ s/^\s*//;	# Trim leading whitespace
  $line =~ s/\s*;$//;   # Trim trailing whitespace and semicolon
  $line =~ s/\s+/ /g;   # Collapse all remaining whitespace into single spaces

  print $line, "\n";
} 

__DATA__
This is a simple one - on a line alone with no trailing spaces;
This one has trailing spaces;   This
one started on the last line and has spaces before the semicolon  ;
This one shares a line; with this one;



It produces the output:
This is a simple one - on a line alone with no trailing spaces
This one has trailing spaces
This one started on the last line and has spaces before the semicolon
This one shares a line
with this one


#10 Sun751

Re: Fork

Posted 04 May 2009 - 04:34 PM


Hi dsherohman,

First of all, thank you for your response. The only reason I want two different processes is this:
First - before I read the file through the filehandle, I want to trim the source so that whatever I read through the filehandle has the separator ";\n".
For example:
create table "auth".auareffw 
  (
fmt_acc char(21) not null ,
 );


I don't want any whitespace between ";" and the newline, because in the source file I have ";" in the middle of the text, which I don't want to act as a separator. For example:
create  procedure "auth".update_anmast ( prm_ani_num like auanmast.ani_num, prm_mod_dte_tme like auanmast.mod_dte_tme, prm_mod_opr like auanmast.mod_opr )  define l_status smallint;  define count_rows smallint;


This is the only reason I want ";\n" to work as the separator.

Second - after I get the formatted data through the filehandle, I run the following pattern matching to extract the data I am looking for:

$/ = ";\n"; # input separator
while (my $lines = <FILEREAD>)
{
    if ($lines =~ /(create\s+(table|view|index|unique index|trigger).*?;)/gs)
    {
        print FILEWRITE "$1\n";
    }
    elsif ($lines =~ /(^\s*create\s*procedure.*?end\s+procedure\s*;)/gs)
    {
        print FILEWRITE "$1\n";
    }
}


To accomplish this I could run two different processes: one to trim the whitespace and write to a temporary file, and another to read from the temporary file, with ";\n" as the separator, for the pattern matching.

As you mentioned pipes in your last post, could you please let me know how I can use one as the temporary data holder, so that one process writes the formatted data into the pipe and the other reads from the pipe through a filehandle and runs the pattern matching?

All suggestions are welcome.

Regards

Sun

This post has been edited by Sun751: 04 May 2009 - 04:36 PM

#11 dsherohman

Re: Fork

Posted 05 May 2009 - 05:31 AM

View PostSun751, on 4 May, 2009 - 11:34 PM, said:

First - before I read the file through the filehandle, I want to trim the source so that whatever I read through the filehandle has the separator ";\n".

What benefit do you perceive in doing the trimming separately from the post-trim processing? They can both be done in the same process, as the code in my previous post demonstrates.

View PostSun751, on 4 May, 2009 - 11:34 PM, said:

Because in the source file I have ";" in the middle of the text, which I don't want to act as a separator. For example:
create  procedure "auth".update_anmast ( prm_ani_num like auanmast.ani_num, prm_mod_dte_tme like auanmast.mod_dte_tme, prm_mod_opr like auanmast.mod_opr )  define l_status smallint;  define count_rows smallint;


This is the only reason I want ";\n" to work as the separator.


This looks like an XY Problem. Your actual objective is apparently to find a way to read in lines separated by a semicolon, optional whitespace, and a newline. You have decided that forking and IPC is the way to do this and have fixated on that solution to the exclusion of considering any other way of accomplishing your actual objective.

How large are your input files? If they're small enough to fit into memory, you could do it this way:
#!/usr/bin/perl

use strict;
use warnings;

undef $/;
my $raw = <DATA>;
my @items = split /;\s*\n/, $raw;

for (@items) { 
  $_ =~ s/\s+$//g;	  # Trim trailing whitespace
  $_ =~ s/\s+;/;/g;	 # Trim whitespace before non-terminal semicolon
  print "$_;\n---\n"; 
};

__DATA__
This is a simple one - on a line alone with no trailing spaces;
This one has trailing spaces;   This
one started on the last line and has spaces before the semicolon  ;
This one shares a line; with this one;


which outputs
This is a simple one - on a line alone with no trailing spaces;
---
This one has trailing spaces;   This
one started on the last line and has spaces before the semicolon;
---
This one shares a line; with this one;
---
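
If the file were too large to slurp, the same record boundary can be found while reading one ordinary line at a time. What follows is only a sketch under that assumption: each_statement is a made-up name, and the in-memory filehandle stands in for the real file. Only the current statement is held in memory, with no temp file and no second process:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: calls $callback once per statement, where a statement
# ends at a semicolon followed only by optional whitespace and a newline.
sub each_statement {
    my ($fh, $callback) = @_;
    my $buffer = '';

    while (my $line = <$fh>) {          # default $/ of "\n": one line per read
        $buffer .= $line;
        if ($buffer =~ s/;\s*\z/;/) {   # line ended with ";" plus optional whitespace
            $buffer =~ s/\A\s+//;       # drop blank lines left over between statements
            $callback->($buffer);
            $buffer = '';
        }
    }
    $callback->($buffer) if $buffer =~ /\S/;    # unterminated text at end of file
}

my $sample = "create table a (id int);   \n\ncreate table b (\n  id int\n)  ;\n";
open(my $in, '<', \$sample) or die "Can't open in-memory file: $!";

my @found;
each_statement($in, sub { push @found, $_[0] });
print scalar(@found), " statements\n";    # prints "2 statements"
```

A semicolon in the middle of a line, such as the "define l_status smallint;  define count_rows smallint;" case, does not end a statement here, because only a semicolon at the end of a line is treated as a terminator.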



View PostSun751, on 4 May, 2009 - 11:34 PM, said:

Second - after I get the formatted data through the filehandle, I run the following pattern matching to extract the data I am looking for:

...which can simply be dropped in to replace the print statements in my examples. I'm just printing the results of the trimmed data instead of doing your actual processing on it for the sake of keeping the examples as small and simple as possible. This also makes them more likely to be of use to others who may read this thread.
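
As a rough illustration of that drop-in, the matching can live in a small sub that is called where the examples call print. The name extract_ddl and the sample statements are made up; the regex is the create table/view/index/trigger pattern from the original post, with the inner group made non-capturing:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the DDL portion of a statement, or undef
# (an empty list in list context) if the statement is not one we want.
sub extract_ddl {
    my ($stmt) = @_;
    return $1
        if $stmt =~ /(create\s+(?:table|view|index|unique index|trigger).*?;)/s;
    return;
}

# Stand-ins for the trimmed statements produced by the reading loop.
my @statements = (
    "-- schema dump\ncreate table a (id int);",
    "grant select on a to public;",
    "create unique index ix on a (id);",
);

for my $stmt (@statements) {
    my $ddl = extract_ddl($stmt);
    print "$ddl\n" if defined $ddl;    # this is where the examples just print
}
```

The other branches from the original code (grant, update, alter table, create procedure) would slot into the same sub as further pattern checks.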

View PostSun751, on 4 May, 2009 - 11:34 PM, said:

All suggestions are welcome.

My number one suggestion is to not fixate on doing this in separate processes. It's not necessary, it's not the most efficient way to accomplish your goal of reading in the data and passing it off to your processing code, and it is nowhere close to being the easiest, most straightforward, or most maintainable way of doing it.

#12 Sun751

Re: Fork

Posted 06 May 2009 - 12:07 AM

Hi dsherohman,

First of all, thank you for your suggestion. It is impressive and your code looks good. But there is a problem with using your approach in my situation. The source file is quite long: the one I am dealing with is about 51364 lines, and I need to extract 38065 lines out of it if my code works correctly. So I don't think it is smart to hold all that data in memory. And using a temporary file is not possible for me.

Thus I need to have two processes, one for trimming and another for pattern matching, so that my separator ";\n" works correctly.

If you have any suggestions, please do let me know.

Regards

Suraj