Dealing with single column multi header text file

Pulling out values to insert into database as rows

20 Replies - 1781 Views - Last Post: 22 April 2009 - 06:49 PM

#1 LoveSquid

  • New D.I.C Head

Reputation: 1
  • Posts: 47
  • Joined: 14-October 08

Posted 09 April 2009 - 09:13 PM

Here is my dilemma:

We have a service that handles uploading inventory files to different websites for us, keeping them synced. For our own website, we would like to display all available data. The file format that exports the most data is UIEE, with a .msg extension. It is a text file, not an Outlook file, with the data in a single column: each line has a header code, then a pipe delimiter, then the data. When opened in emacs or vi, every line is shown terminated by ^M, and each set of data is followed by a line containing only ^M. Here is an example, with the terminating ^M characters not shown.

User
BOOKS
2009-04-09
15:09:34

UR|221162
TI|The Prophet Unarmed; Trotsky : 1921-1929
PR|10.00
AA|Deutscher, Isaac
BD|Hardcover
NT| blue cloth boards; interior is clean and unmarked; tight binding; 490 pages
CO|0
LO|Steel Shelves
PC|2.13
SD|
CN|Fine
PP|London
DP|1959
JK|No Dust Jacket
PU|Oxford Univ Press
XA|4
XB|5
XC|BO
XD|S

UR|011157
TI|Stirling Moss
PR|12.98
AA|Robert Edwards
BN|1841882003
BD|Soft Cover with French ...
NT|1841882003 Even though this is a brand new book, there is a small vertical (but barely perceptible) crease where the front flap ends. 360 pages; profusely illustrated
CO|0
PC|6.49
SD|
CN|New
PP|United Kingdom
DP|2002
KE|AUTOMOBILE RACING BIOGRAPHY DRIVERS SPORTS
PU|Weidenfeld & Nicolson
XA|4
XB|5
XC|BO
XD|S




As you can see, the headers are only present when there is data present in that field, which changes book to book, with the exception of the CO field. For every new record the first header is always the UR field, which is our local database SKU and is always unique. This will be linked to the unique key, auto_incremented id field in our online database. The first few rows contain useless file information.

I am really at a loss as to how to extract the data on the right of the pipe while keeping it tied to the appropriate header, so that it can be inserted, deleted, or updated in the appropriate field of a MySQL database table.

I am NOT asking for someone to write the code, but a point in the correct direction would go a long way. I've handled csv & tsv, but nothing like this single column, pseudo repeating header format.

Thanks in advance for any assistance.

Jase

Replies To: Dealing with single column multi header text file

#2 CTphpnwb

  • D.I.C Lover

Reputation: 2896
  • Posts: 10,031
  • Joined: 08-August 08

Posted 10 April 2009 - 05:03 AM

Use the explode function.
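
As a minimal sketch of that suggestion, here is explode applied to one line from the sample in the first post. The limit argument of 2 is an addition for illustration, so that a pipe character inside the data would not split the value any further:

```php
<?php
// Split one UIEE line into its two-letter code and its value.
// The third argument (limit = 2) keeps any later '|' inside the value.
$line = "TI|The Prophet Unarmed; Trotsky : 1921-1929";
list($code, $value) = explode('|', $line, 2);

echo $code, "\n";   // TI
echo $value, "\n";  // The Prophet Unarmed; Trotsky : 1921-1929
```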

#3 LoveSquid

Posted 10 April 2009 - 01:25 PM

Thanks CTphpnwb, I really appreciate the assistance. But, I'm trying to find out how to use the two-digit UIEE code on the left of the pipe as the table field headers and the data on the right for the input field data much as I would use the headers in a CSV file to map data to the appropriate fields in a database table. I just cannot find a decent method for separating the records from each other and then pulling out the data and mapping it.

Again, any help is appreciated.

Here is what I have, but only echoing results:
foreach (glob("*.msg") as $filename) {

	// Read the file and drop the first five lines of file information.
	$file = file($filename) or exit('could not open file');
	foreach ($file as $key => $value) {
		if ($key <= 4) { unset($file[$key]); }
		else { break; }
	}

	$file = implode("", $file);

	// Write the stripped contents back over the original file.
	$handle = fopen($filename, 'w') or exit('could not open file for writing');
	fwrite($handle, $file) or exit('could not write to file');
	fclose($handle);

	if (fnmatch("delete*", $filename)) {

		$lines = file($filename) or die('Could not open input file');

		foreach ($lines as $line) {
			$line_values = explode('|', $line); // this just creates a long string right now
			echo $line_values[0];

			// separate data from headers here
		}
	}
}



The first part strips out the extra 5 lines at the top of the file. The second is meant to echo the data on screen.
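
One thing to watch with the code as posted is that it writes the stripped contents back over the file, so running it a second time on the same file would remove five more lines. A possible alternative, sketched here under the assumption that the header is always exactly five lines (as in the sample), is to slice the lines in memory and leave the file untouched. The function name strip_uiee_header is made up for this example:

```php
<?php
// Drop the leading file-information lines without rewriting the file.
// $headerLines = 5 matches the sample in this thread (User, BOOKS, date,
// time, blank line); it is an assumption for other exports.
function strip_uiee_header(array $lines, $headerLines = 5)
{
    // array_slice() returns the remaining lines re-indexed from 0.
    return array_slice($lines, $headerLines);
}

$lines = array("User\r\n", "BOOKS\r\n", "2009-04-09\r\n", "15:09:34\r\n", "\r\n",
               "UR|221162\r\n", "TI|The Prophet Unarmed; Trotsky : 1921-1929\r\n");
$body = strip_uiee_header($lines);
echo $body[0];  // UR|221162
```

With a real file, the call would look like strip_uiee_header(file($filename)).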

This post has been edited by LoveSquid: 10 April 2009 - 02:58 PM

#4 CTphpnwb

Posted 10 April 2009 - 02:47 PM

This works for me:
<?php
$data = "UR|221162
TI|The Prophet Unarmed; Trotsky : 1921-1929
PR|10.00
AA|Deutscher, Isaac
BD|Hardcover
NT| blue cloth boards; interior is clean and unmarked; tight binding; 490 pages
CO|0
LO|Steel Shelves
PC|2.13
SD|
CN|Fine
PP|London";


$x = explode("\n",$data);
var_dump($x);
echo "<br><br>";

foreach($x as $y) {
	$z = explode("|",$y);
	$mydata[$z[0]] = $z[1];
}
var_dump($mydata);

?>
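
Once each record is an array keyed by the UIEE code, as $mydata is above, mapping codes to table fields can be a simple lookup. The sketch below is only an illustration: the column names (sku, title, price, author) are hypothetical placeholders guessed from the sample values, not names from the actual database.

```php
<?php
// Hypothetical mapping from UIEE codes to table columns; the column names
// are placeholders, not taken from the thread.
$columnFor = array(
    'UR' => 'sku',
    'TI' => 'title',
    'PR' => 'price',
    'AA' => 'author',
);

// $mydata as built by the loop above (shortened here).
$mydata = array('UR' => '221162', 'TI' => 'The Prophet Unarmed; Trotsky : 1921-1929',
                'PR' => '10.00', 'XA' => '4');

$row = array();
foreach ($mydata as $code => $value) {
    if (isset($columnFor[$code])) {      // skip codes with no mapped column
        $row[$columnFor[$code]] = $value;
    }
}
print_r($row);
```

Codes that are missing from a record simply never appear in $row, which suits a format where headers are only present when the field has data.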

#5 LoveSquid

Posted 10 April 2009 - 02:58 PM

Thanks again CTphpnwb, I really appreciate the help!!

#6 CTphpnwb

Posted 10 April 2009 - 03:07 PM

You're welcome.

By the way, here's code that will loop through the list, assuming they're separated by consecutive newline characters:
<?php
$data = "UR|221162
TI|The Prophet Unarmed; Trotsky : 1921-1929
PR|10.00
AA|Deutscher, Isaac
BD|Hardcover
NT| blue cloth boards; interior is clean and unmarked; tight binding; 490 pages
CO|0
LO|Steel Shelves
PC|2.13
SD|
CN|Fine
PP|London
DP|1959
JK|No Dust Jacket
PU|Oxford Univ Press
XA|4
XB|5
XC|BO
XD|S

UR|011157
TI|Stirling Moss
PR|12.98
AA|Robert Edwards
BN|1841882003
BD|Soft Cover with French ...
NT|1841882003 Even though this is a brand new book, there is a small vertical (but barely perceptible) crease where the front flap ends. 360 pages; profusely illustrated
CO|0
PC|6.49
SD|
CN|New
PP|United Kingdom
DP|2002
KE|AUTOMOBILE RACING BIOGRAPHY DRIVERS SPORTS
PU|Weidenfeld & Nicolson
XA|4
XB|5
XC|BO
XD|S";


$A = explode("\n\n",$data);
var_dump($A);
echo "<br><br>";
foreach($A as $B) {
	$mydata = array(); // start each record empty so fields from the previous one do not carry over
	$C = explode("\n",$B);
	foreach($C as $D) {
		$E = explode("|",$D);
		$mydata[$E[0]] = $E[1];
	}
	var_dump($mydata);
	echo "<br><br>";
}

?>

#7 LoveSquid

Posted 13 April 2009 - 07:12 PM

Thanks CTphpnwb for all your help thus far!

I am still unable to get everything working. I am trying to pull in files whose names differ based on date and time. They all begin with delete and have the .msg extension. I have been trying to use the glob function to pull them into an array and then explode each one. The trouble is that, even when I hard-code the file name and call it directly, in the end it only displays the last entry. I think it has to do with the fact that the entire file ends in a double newline, meaning two blank lines, whereas there is only one blank line between entries. If that affects the script overall, any pointers on how to handle it would be very helpful. If there is something wrong with how this runs, please point it out, because I have rewritten it using every function I can think of (i.e., file, fopen, fgets), just to see if I can return all entries in the file. I can return the last in any number of different ways, but never all. Here is my code thus far:
foreach (glob("delete*.msg") as $filename) {

	$file = file($filename) or exit('could not open file');
	foreach ($file as $key => $value) {
		if ($key <= 4) { unset($file[$key]); }
		else { break; }
	}

	$file = implode("", $file);

	$handle = fopen($filename, 'w') or exit('could not open file for writing');
	fwrite($handle, $file) or exit('could not write to file');
	fclose($handle);

	$record = explode("\n\n", $file);
	foreach ($record as $row) {
		$line = explode("\n", $row);
		foreach ($line as $line_value) {
			$values = explode("|", $line_value);
			$mydata[$values[0]] = $values[1];
		}
		var_dump($mydata);
		echo "<br><br>";
	}

}



Thanks again CTphpnwb for all your help, I have learned quite a lot from your code so far!!

This post has been edited by LoveSquid: 14 April 2009 - 08:54 AM

#8 Hary

  • D.I.C Regular

Reputation: 44
  • Posts: 427
  • Joined: 23-September 08

Posted 13 April 2009 - 11:16 PM

You could split up the entries by using a start "UR|" and end "XD|", as these indicate the real start/end of a record. Depending on whitespace is kinda tricky.
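
A rough sketch of that idea follows. It only uses the "UR|" line as the start marker, since the first post says UR is always the first header of a record; whether "XD|" is always the last one is not confirmed in the thread. The function name parse_uiee_records and the shortened sample text are made up for this example.

```php
<?php
// Group lines into records, starting a new record at every "UR|" line
// instead of relying on blank lines between records.
function parse_uiee_records($text)
{
    $records = array();
    $current = null;
    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        if (strpos($line, '|') === false) {
            continue;                       // blank or file-information line
        }
        list($code, $value) = explode('|', $line, 2);
        if ($code === 'UR') {               // a new record begins here
            if ($current !== null) {
                $records[] = $current;
            }
            $current = array();
        }
        if ($current !== null) {
            $current[$code] = $value;
        }
    }
    if ($current !== null) {
        $records[] = $current;              // keep the final record
    }
    return $records;
}

$text = "User\r\nBOOKS\r\n\r\nUR|221162\r\nTI|The Prophet Unarmed\r\nXD|S\r\n\r\n"
      . "UR|011157\r\nTI|Stirling Moss\r\nXD|S\r\n\r\n\r\n";
$records = parse_uiee_records($text);
echo count($records), "\n";        // 2
echo $records[1]['TI'], "\n";      // Stirling Moss
```

Because lines without a pipe are skipped, the number of blank lines between records or at the end of the file does not matter with this approach.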

#9 LoveSquid

Posted 16 April 2009 - 11:24 AM

I have been trying to get the original to work using every conceivable method, but it only returns the last record. I know I am missing something, but I am lost.

Any help would be great. If there is a method for defining the start and end of a record as proposed by Hary, a pointer or 2 in that direction would be appreciated.

Thanks.

This post has been edited by LoveSquid: 16 April 2009 - 12:16 PM

#10 Hary

Posted 16 April 2009 - 12:14 PM

I will have a look at it tomorrow, now off to bed.

#11 LoveSquid

Posted 16 April 2009 - 12:16 PM

Thanks, Hary, I appreciate it!

EDIT: If it would help at all, I can post a link to an actual file, perhaps I am missing something there. I have looked at the files in Emacs and Vi and both show all lines terminating in ^M and all "empty" lines containing ^M. Not sure if that makes any difference as I am not familiar with the char ^M, though it appears to be generated by Macs(?).

This post has been edited by LoveSquid: 16 April 2009 - 12:21 PM

#12 LoveSquid

Posted 16 April 2009 - 04:28 PM

I think I have solved my original problem. I am now working on the next step, which is preparing it to insert into the database. I have experience with inserting CSV data, so hopefully this will go smoothly. I do need a decent method for removing the "dummy" record inserted in each file. It begins with and contains text, whereas the others consist only of numerical values. If you know of a quick method, feel free to chime in; otherwise I will be plugging away, and once I get this resolved I will update this thread.

Thanks for the help thus far!
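
For the insert step, one option is to build a parameterized statement from each record rather than pasting values into the SQL. The sketch below is only a starting point: the table name books and the column names are hypothetical, and it assumes the SKU column has a unique index so that MySQL's ON DUPLICATE KEY UPDATE turns a repeated SKU into an update.

```php
<?php
// Build a parameterized MySQL statement from one mapped row.
// Table and column names are hypothetical; the statement assumes a unique
// index on the SKU column so ON DUPLICATE KEY UPDATE can act as an update.
function build_upsert($table, array $row)
{
    $columns = array_keys($row);
    $placeholders = array();
    $updates = array();
    foreach ($columns as $column) {
        $placeholders[] = ':' . $column;                    // named placeholder
        $updates[] = "`$column` = VALUES(`$column`)";
    }
    $sql = "INSERT INTO `$table` (`" . implode('`, `', $columns) . "`) VALUES ("
         . implode(', ', $placeholders) . ") ON DUPLICATE KEY UPDATE "
         . implode(', ', $updates);
    return $sql;
}

$row = array('sku' => '221162', 'title' => 'The Prophet Unarmed; Trotsky : 1921-1929');
echo build_upsert('books', $row), "\n";
```

With PDO, the returned string could be passed to prepare() and the same $row array to execute(). Since the column names are placed directly into the SQL text, they should come from a fixed list in the script, never from the file itself.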

This post has been edited by LoveSquid: 16 April 2009 - 04:28 PM

#13 CTphpnwb

Posted 16 April 2009 - 06:32 PM

LoveSquid, on 16 Apr, 2009 - 02:16 PM, said:

Thanks, Hary, I appreciate it!

EDIT: If it would help at all, I can post a link to an actual file, perhaps I am missing something there. I have looked at the files in Emacs and Vi and both show all lines terminating in ^M and all "empty" lines containing ^M. Not sure if that makes any difference as I am not familiar with the char ^M, though it appears to be generated by Macs(?).

Control-M is the carriage return character (\r), which is different from the newline character (\n). You could do a preg_replace on it.
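
This would also be a likely explanation for only the last entry showing up. If every line ends in \r\n, a blank line between records is "\r\n\r\n", which never contains "\n\n", so explode("\n\n", ...) returns the whole file as one piece and each code is overwritten by the later records. A minimal sketch of the clean-up, using a shortened made-up sample and assuming the file really uses \r\n endings:

```php
<?php
// Normalise CRLF and bare CR to LF, then split on blank lines.
$raw = "UR|221162\r\nTI|The Prophet Unarmed\r\n\r\nUR|011157\r\nTI|Stirling Moss\r\n\r\n\r\n";

$text = preg_replace('/\r\n?/', "\n", $raw);

// One or more blank lines separate records; PREG_SPLIT_NO_EMPTY drops any
// empty piece, and trim() removes the extra blank lines at the end of the file.
$records = preg_split('/\n{2,}/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

echo count($records), "\n";  // 2
```

Each element of $records can then be split on "\n" and "|" as in the earlier code.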

Post the file, too.

Quote

I do need a decent method for removing the "dummy" record inserted in each file. It begins with and contains text, whereas the others consist only of numerical values.

Use the is_numeric function to test for the dummy record.
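
A possible shape for that test, assuming the records have already been parsed into arrays keyed by code and that the dummy record is recognizable by a non-numeric UR value (the thread does not show the dummy record, so the one below is invented for the example):

```php
<?php
// Keep only records whose UR value is numeric.
// trim() guards against stray whitespace such as a leftover carriage return.
function is_real_record(array $record)
{
    return isset($record['UR']) && is_numeric(trim($record['UR']));
}

$records = array(
    array('UR' => '221162', 'TI' => 'The Prophet Unarmed; Trotsky : 1921-1929'),
    array('UR' => 'DUMMY',  'TI' => 'placeholder entry'),   // made-up dummy record
    array('UR' => '011157', 'TI' => 'Stirling Moss'),
);

// array_filter() keeps the original keys, so array_values() re-indexes.
$real = array_values(array_filter($records, 'is_real_record'));
echo count($real), "\n";  // 2
```

Filtering whole records this way works wherever the dummy record sits in the file, since the check is applied to every record rather than to a fixed position.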

#14 LoveSquid

Posted 16 April 2009 - 07:34 PM

Thanks again CT! I had looked at the is_numeric, is_string and empty functions about a half-hour ago and started working on a new comparison loop. Here is a link to the file (zipped) and if you want to just view (unzipped) (my linux system opens the msg mime/type as text).

I appreciate your assistance, and I have learned enough about arrays to know that you can do almost anything with them if you know what to do and when! I had always thought of them as rigid data holders and used them only in very limited ways.

This post has been edited by LoveSquid: 16 April 2009 - 07:39 PM

#15 LoveSquid

Posted 19 April 2009 - 03:59 PM

CTphpnwb, thanks for the info on ^M. Using preg_replace on it got me past my sticking point and moving forward. I am now working on is_numeric to skip the dummy record. The dummy record is not always at the end, so I need to identify it definitively. Now I need to figure out where to do the is_numeric check.