Read file into multiple hash

 

Sun751

18 Jun, 2009 - 03:22 PM
Post #1

I am reading a file and initializing a separate hash for each section of it, using the loop below.
I am wondering whether there is a smarter way of doing this task, I mean something better than what I have done.
Here is my code:
[code]
while (my $line = <FILEREAD>)
{
  $line =~ s/\r\n//; #Substitute binary character \cM\cJ
  chomp($line);
  if ($line)
  {
    if ($line eq "[RESOURCE]"){$RFlag = "WHITE"};
    if ($line eq "[WNT]"){$WFlag = "WHITE";$RFlag = "RED"};
    if ($line eq "[UNX]"){$UFlag = "WHITE";$WFlag = "RED"};
    if ($RFlag eq "WHITE")
    {
      if ($line =~ /^Id.*?$/)
      {
        $key = $line;
      }
      elsif ($line =~ /^Source.*?$/)
      {
        $$HR_resource{$key} = "$line";
      }
    }
    if ($WFlag)
    {
      if ($WFlag eq "WHITE")
      {
        if ($line =~ /^Id.*?$/)
        {
          $key = $line;
        }
        elsif ($line =~ /^Source.*?$/)
        {
          $$HR_Window{$key} = "$line";
        }
      }
    }
    if ($UFlag)
    {
      if ($UFlag eq "WHITE")
      {
        if ($line =~ /^Id.*?$/)
        {
          $key = $line;
        }
        elsif ($line =~ /^Source.*?$/)
        {
          $$HR_Linux{$key} = "$line";
        }
      }
    }
  }
[/code]

Any suggestions are welcome.

Cheers



KevinADC

RE: Read File Into Multiple Hash

18 Jun, 2009 - 06:49 PM
Post #2

Just judging by the code and not knowing anything else, it looks like it could be polished up a bit (and the syntax errors have to be removed for it to even work).

Why are you using soft references for the hashes? That should almost certainly be switched to a regular hash instead of a soft reference to a hash. You could probably use one hash (as in a hash of hashes) instead of three hashes, but there is no way to know without further information.

This post has been edited by KevinADC: 18 Jun, 2009 - 06:49 PM

Sun751

RE: Read File Into Multiple Hash

18 Jun, 2009 - 09:24 PM
Post #3

QUOTE(KevinADC @ 18 Jun, 2009 - 06:49 PM) *

Just judging by the code and not knowing anything else, it looks like it could be polished up a bit (and the syntax errors have to be removed for it to even work).

Why are you using soft references for the hashes? That should almost certainly be switched to a regular hash instead of a soft reference to a hash. You could probably use one hash (as in a hash of hashes) instead of three hashes, but there is no way to know without further information.


Could you please guide me on how to get rid of the multiple hashes?
Actually, I am also looking for some other way so that I don't have to use those flags.
If you have any suggestions, let me know.

Cheers


KevinADC

RE: Read File Into Multiple Hash

18 Jun, 2009 - 10:17 PM
Post #4

Without further information I don't want to try to divine what your code is doing. Post some sample data and explain what you need to do with the data.

dsherohman

RE: Read File Into Multiple Hash

20 Jun, 2009 - 07:25 AM
Post #5

QUOTE(Sun751 @ 18 Jun, 2009 - 11:22 PM) *

I am reading a file and initializing a separate hash for each section of it, using the loop below.
I am wondering whether there is a smarter way of doing this task, I mean something better than what I have done.

Let's take a look, hmm?

CODE

$line =~ s/\r\n//; #Substitute binary character \cM\cJ
chomp($line);

These two lines are redundant. chomp removes the trailing newline (strictly, the current value of $/, which defaults to "\n") from a value if one is present. In theory, the chomp alone should be sufficient, but in practice it falls short when the file's line endings differ from those of the OS where the code is run: a file with \r\n endings read on a Unix-like system is left with a trailing \r. I'd therefore suggest using "$line =~ tr/\r\n//d" on its own, as this will remove any and all \r or \n characters from the line, whether they appear individually or paired (and tr/// runs faster than a full-blown regex).
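A small sketch of the difference, assuming a line with Windows-style \r\n endings and the default value of $/ (the sample string is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical input line with a Windows-style CRLF ending.
my $raw = "Id=42\r\n";

# chomp removes only the current value of $/ (by default "\n"),
# so the carriage return is left behind.
my $chomped = $raw;
chomp($chomped);

# tr with the /d modifier deletes every \r and \n in the string.
my $stripped = $raw;
$stripped =~ tr/\r\n//d;

printf "chomp left %d characters, tr left %d characters\n",
    length($chomped), length($stripped);
```

The leftover \r matters here because a later comparison such as $line eq "[WNT]" would fail on a line that still ends in a carriage return.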

CODE

if ($line eq "[RESOURCE]"){$RFlag = "WHITE"};
if ($line eq "[WNT]"){$WFlag = "WHITE";$RFlag = "RED"};
if ($line eq "[UNX]"){$UFlag = "WHITE";$WFlag = "RED"};

This looks to me like you're trying to implement a simple state machine using flags to indicate whether you're in a "RESOURCE", "WNT", or "UNX" section, but it's unclear why you're using three separate flags instead of a single state variable. I suspect that
CODE

if ($line =~ /^\[([A-Z ]+)\]/) { $section_flag = $1 };
# or, with a more Perlish accent:
# $section_flag = $1 if $line =~ /^\[([A-Z ]+)\]/;

would be an easier and cleaner way to accomplish this, plus it will automatically catch any lines containing "[ALL CAPS TEXT]" if new section types are added later, although you may need to alter the regex to allow a broader range of characters than just uppercase letters and spaces.

This method is also less fragile in the case of files with out-of-order sections. The rest of your code seems to assume that only one of $RFlag, $WFlag, or $UFlag will be "WHITE" at any given time, but an input file with the [UNX] section first, then [WNT], then [RESOURCE] would set all three to "WHITE", violating this assumption.
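To illustrate the point about ordering, here is a minimal sketch with made-up input lines (the Id values and the @seen array are hypothetical, used only for the demonstration). With one state variable, only one section can be current at a time, whatever order the headers arrive in:

```perl
use strict;
use warnings;

# Section headers in the "wrong" order, as in the scenario described above.
my @lines = ('[UNX]', 'Id=1', '[WNT]', 'Id=2', '[RESOURCE]', 'Id=3');

my $section_flag;
my @seen;    # records which section each non-header line was attributed to

for my $line (@lines) {
    if ($line =~ /^\[([A-Z ]+)\]/) {
        $section_flag = $1;    # replaces the previous section outright
    }
    else {
        push @seen, "$section_flag => $line";
    }
}

print "$_\n" for @seen;
```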

The remaining code appears to be identical for all three sections, aside from which hash the "Source" values are inserted into, so they can be stripped down to:
CODE

if ($line =~ /^Id/)
{
  $key = $line;
}
elsif ($line =~ /^Source/)
{
  $$results{$section_flag}{$key} = $line;
}

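With $results holding a reference to a hash, $$results{RESOURCE}, $$results{WNT}, and $$results{UNX} (equivalently $results->{RESOURCE} and so on) will then be hash references corresponding to your $HR_resource, $HR_Window, and $HR_Linux, respectively.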

Also note that I removed the double quotes around "$line" in the assignment to the hash and the trailing ".*?$" from the regexes, as they were superfluous.

Putting this all together, we get (syntactically valid, but completely untested):
CODE

while (my $line = <FILEREAD>)
{
  $line =~ tr/\r\n//d; #Remove binary characters \cM and \cJ
  if ($line)
  {
    if ($line =~ /^\[([A-Z ]+)\]/)
    {
      $section_flag = $1;
    }
    elsif ($line =~ /^Id/)
    {
      $key = $line;
    }
    elsif ($line =~ /^Source/)
    {
      $$results{$section_flag}{$key} = $line;
    }
  }
}

which should behave identically to the code you posted, within the assumed constraints that input files must contain a [RESOURCE] section followed by a [WNT] section and then a [UNX] section and that you don't care what happens with data found in any section(s) other than those three.
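For anyone who wants to try this, below is a self-contained sketch of the same loop. The sample data is invented, since the real file format was never posted, and the in-memory filehandle $fh stands in for FILEREAD. It also declares the variables so that it runs under strict, and skips empty lines with a length test rather than a truth test:

```perl
use strict;
use warnings;

# Made-up sample input with \r\n line endings; the real data may look different.
my $data = join "\r\n",
    '[RESOURCE]', 'Id=1', 'Source=res.txt',
    '[WNT]',      'Id=1', 'Source=win.txt',
    '[UNX]',      'Id=1', 'Source=unx.txt', '';

# Read from an in-memory filehandle instead of a real file.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $results = {};    # section name => { Id line => Source line }
my ($section_flag, $key);

while (my $line = <$fh>) {
    $line =~ tr/\r\n//d;
    next unless length $line;    # skip blank lines
    if ($line =~ /^\[([A-Z ]+)\]/) {
        $section_flag = $1;
    }
    elsif ($line =~ /^Id/) {
        $key = $line;
    }
    elsif ($line =~ /^Source/) {
        $results->{$section_flag}{$key} = $line;
    }
}
close $fh;

# Print every stored entry, grouped by section.
for my $section (sort keys %$results) {
    for my $id (sort keys %{ $results->{$section} }) {
        print "$section / $id => $results->{$section}{$id}\n";
    }
}
```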

Sun751

RE: Read File Into Multiple Hash

21 Jun, 2009 - 03:03 AM
Post #6

Your solution looks really impressive, but the important thing I am unsure about is how to extract the respective hash from the following:
CODE

$$results{$section_flag}{$key} = $line;


Any suggestions are welcome.

Cheers

dsherohman

RE: Read File Into Multiple Hash

21 Jun, 2009 - 06:00 AM
Post #7

QUOTE(Sun751 @ 21 Jun, 2009 - 11:03 AM) *

Your solution looks really impressive, but the important thing I am unsure about is how to extract the respective hash from the following:
CODE

$$results{$section_flag}{$key} = $line;



As I said in the earlier message,
QUOTE

With $results holding a reference to a hash, $$results{RESOURCE}, $$results{WNT}, and $$results{UNX} (equivalently $results->{RESOURCE} and so on) will then be hash references corresponding to your $HR_resource, $HR_Window, and $HR_Linux, respectively.

so you should (again, untested, and I'm not sure of the context your original code came from, so I can't be 100% certain) be able to just replace:

$$HR_resource{...} with $$results{RESOURCE}{...}
%$HR_resource with %{$$results{RESOURCE}}

throughout the rest of your code (along with the corresponding HR_Window -> results{WNT} and HR_Linux -> results{UNX} constructs, of course).
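As a rough sketch of what that extraction could look like, assuming $results is a hash reference shaped the way the parsing loop would build it (the contents below are invented for the example):

```perl
use strict;
use warnings;

# Hypothetical contents, standing in for what the parsing loop would produce.
my $results = {
    RESOURCE => { 'Id=1' => 'Source=res.txt' },
    WNT      => { 'Id=1' => 'Source=win.txt' },
    UNX      => { 'Id=1' => 'Source=unx.txt' },
};

# The inner hash reference plays the role of the old $HR_Window.
my $HR_Window = $results->{WNT};

# A single element, in two equivalent spellings.
my $via_arrow  = $results->{WNT}{'Id=1'};
my $via_sigils = $$results{WNT}{'Id=1'};

# The whole inner hash, dereferenced for iteration.
for my $id (sort keys %{ $results->{WNT} }) {
    print "$id => $HR_Window->{$id}\n";
}
```

Copying the inner reference into its own variable, as done with $HR_Window here, may be the least disruptive route if the rest of the program already expects three separate hash references.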