Welcome to Dream.In.Code




Extracting mixed data from many files

 

dhfeldman
16 May, 2007 - 05:53 PM
Hi all,
First, let me apologize for not having any code yet. I do have a clear idea of what I'd like to do, though, and would appreciate suggestions to point me in the right direction. I have many binary data files (produced by a commercial package) and need to extract certain details from their headers into an easy-to-use format, such as a CSV or tab-delimited file that I can import into Excel or Access. The file format is quite complex: a header of several thousand bytes containing a mix of text and numeric data (binary floats and integers). The format is completely documented by the manufacturer, with the offset, data type, and size of each field in the header.
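Since every field's offset, type and size is documented, one option is to copy that documentation into a table and let Perl's built-in `unpack` do the decoding. This is a minimal sketch, not the only way: the field names, offsets and templates in `@LAYOUT` are invented placeholders (not the real layout of these files), and the code assumes little-endian (x86) byte order and Perl 5.10 or later for the `<` byte-order modifier.

```perl
use strict;
use warnings;

# Hypothetical layout for illustration only: substitute the field names,
# offsets and unpack() templates from the manufacturer's documentation.
#   A<n> : n bytes of text; trailing spaces and NULs are stripped
#   l<   : 32-bit signed integer, little-endian
#   f<   : 32-bit IEEE 754 float, little-endian
my @LAYOUT = (
    # [ name,           offset, template, size in bytes ]
    [ 'signature',      0,      'A4',     4  ],
    [ 'version',        4,      'f<',     4  ],
    [ 'channel_count',  12,     'l<',     4  ],
    [ 'comment',        32,     'A16',    16 ],
);

# Decode every field listed in @LAYOUT from a header buffer.
# Returns the values in layout order, ready to be joined with tabs.
sub decode_header {
    my ($header) = @_;
    my @values;
    for my $field (@LAYOUT) {
        my ($name, $offset, $template, $size) = @$field;
        die "header too short for field '$name'\n"
            if length($header) < $offset + $size;
        push @values, unpack($template, substr($header, $offset, $size));
    }
    return @values;
}
```

With the header bytes in `$header`, a record line is then just `join("\t", decode_header($header))`. Adding a field means adding a row to the table rather than writing new parsing code.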

I would like to build a utility that loops through each file matching a mask (a particular extension) in a Windows (or DOS, for that matter) directory, pulls certain values at specific offsets out of each file's header, and builds a tab-delimited string from the extracted data, terminated by CR/LF. All of these strings would then be saved to a single output file.
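Selecting the files (by mask and, as asked further down, only those from a certain date onward) could look something like this minimal sketch. It assumes the mask is a `glob` pattern such as `*.abf` and filters on the modification time (element 9 of `stat`), because what `stat` reports as a "creation" time differs between platforms. `files_since` is a hypothetical helper name.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Return the files matching $mask whose modification time is on or after
# midnight (local time) of the given calendar date.
sub files_since {
    my ($mask, $year, $month, $day) = @_;
    # timelocal() takes a 0-based month; a 4-digit year is taken literally.
    my $cutoff  = timelocal(0, 0, 0, $day, $month - 1, $year);
    my @matches = glob $mask;
    # Keep plain files only, modified at or after the cutoff.
    return grep { -f $_ && (stat $_)[9] >= $cutoff } @matches;
}
```

For example, `my @files = files_since('*.abf', 2007, 5, 1);`. Note that `glob` splits its pattern on whitespace, so for a directory path containing spaces `bsd_glob` from the standard `File::Glob` module is the safer choice.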

I could develop this with considerable effort in Pascal, the language I am (or at least was, some years ago) most familiar with, but I suspect it would be far easier in Perl or Python. Furthermore, I want to get my feet wet with Perl (or Python), and this project would be the perfect way to do that.

I've been reading "Learning Perl", by Schwartz et al, but have barely any practical experience with it. I've been hearing good things about Python, and would consider putting my effort into that too.

Here's what I'd like to do:

loop through each file in directory
    concatenate each of the following to a tab-delimited string:
        Filename
        Creation date
        Creation time
        Extract certain strings from header at offset:length (there will be several of these)
        Extract certain numerical values from header (offset:type; convert to alphanumerics)
    terminate line with CR/LF
    (done with that file)
(until done with all files)
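The steps above might map onto Perl roughly as follows. This is a sketch under several assumptions: `extract_fields`, `build_record` and `write_report` are hypothetical names, `extract_fields` is a stub to be replaced by the real header decoding, the creation date and time come from the ctime slot of `stat` (which, according to perlport, holds the creation time on Win32 but the inode change time on Unix), and the output handle is put in binary mode so that the explicit `"\r\n"` is not translated a second time on Windows.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical stub: replace with the real header decoding for one file.
sub extract_fields {
    my ($path) = @_;
    return ();
}

# Build one tab-delimited record: file name, date, time, then header fields.
sub build_record {
    my ($path) = @_;
    # stat() element 10 is ctime: creation time on Win32, inode change time on Unix.
    my $ctime = (stat $path)[10];
    die "cannot stat $path: $!\n" unless defined $ctime;
    my @lt = localtime $ctime;
    return join "\t",
        $path,
        strftime('%Y-%m-%d', @lt),
        strftime('%H:%M:%S', @lt),
        extract_fields($path);
}

# Write one CR/LF-terminated record for every plain file matching $mask.
sub write_report {
    my ($mask, $out_path) = @_;
    open my $out, '>', $out_path or die "cannot create $out_path: $!\n";
    binmode $out;    # no newline translation, so "\r\n" is written exactly once
    my @files = glob $mask;
    for my $path (sort @files) {
        next unless -f $path;
        print {$out} build_record($path), "\r\n";
    }
    close $out or die "cannot close $out_path: $!\n";
}
```

Calling `write_report('*.abf', 'headers.txt');` would then produce one line per file that Excel or Access can import as tab-delimited text.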

What's a simple way to loop through all the files matching a filename mask, processing only the files from a certain date onward?
How do I parse specific non-contiguous bytes at known offsets and lengths, some of them characters, some floats, and some integers?

Thanks for the advice.

dhfeldman



KevinADC
16 May, 2007 - 09:39 PM
The problem is that nothing is simple when you don't know the language. Here are some Perl functions to look at:

grep - to get the file names
stat - to get just the date range (or use file test operators)
sysopen - open the files
sysread - read the data you want
syswrite - write output to a new file
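Tied together, the `sysopen`/`sysseek`/`sysread` part of that list might look like this minimal helper. `read_at` is a hypothetical name, not a built-in; it uses the `O_RDONLY` and `SEEK_SET` constants exported by the standard `Fcntl` module instead of bare numbers, and treats a short read as an error.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY SEEK_SET);

# Read $length raw bytes starting at byte $offset of $path.
# Dies if the file cannot be opened or fewer bytes than requested are read.
sub read_at {
    my ($path, $offset, $length) = @_;
    sysopen(my $fh, $path, O_RDONLY) or die "cannot open $path: $!\n";
    binmode $fh;
    sysseek($fh, $offset, SEEK_SET) or die "cannot seek in $path: $!\n";
    my $buf;
    my $got = sysread($fh, $buf, $length);
    die "cannot read $path: $!\n" unless defined $got;
    die "short read from $path: wanted $length bytes, got $got\n"
        if $got != $length;
    close $fh;
    return $buf;
}
```

The returned string holds raw bytes; text fields can be used directly, while numeric fields still need `unpack` to become numbers.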

This post has been edited by KevinADC: 16 May, 2007 - 09:39 PM

dhfeldman
21 May, 2007 - 03:04 PM

Thanks to KevinADC for the suggestions. They have helped me get on the right track.
So far I've just experimented with extracting values out of single file headers.
I've successfully read strings from within the mixed header, but so far I have not managed to recover usable floating-point values. The header documentation shows that a series of 4-byte "float" values (as implemented by MS C++, I think) starts at a certain offset. When I try to read them as shown below, all I get is zeros in the printed output. I have successfully read these values using Delphi, where I read them into a variable of type Single (a 4-byte floating-point type in Delphi) and converted them with Delphi's FloatToStr function; there I see the numbers I expect.

So am I doing this wrong in Perl, or does Perl not implement floating-point numbers the same way MS C++ and Delphi do? I admit to being baffled by Perl's innate conversion between numbers and strings, its lack of strong typing, and the apparent lack of explicit conversion functions to force a certain format. (This is why I put the %e format string in the output.) It's not obvious to me how Perl would know that the 4 bytes are in fact a float that needs to be represented appropriately. Below is a snippet that shows my results.

[code]
# Reading the header of an .ABF file: a binary file with a header of over 6 KB
# that mixes numeric values and character strings, created by a commercial
# program written in C++. A particular 4-byte floating-point value resides
# at offset 4704. I try to read it by using sysseek() to set the file pointer
# to that offset, then sysread() to read 4 bytes.

use strict;
use warnings;
use Fcntl qw(O_RDONLY SEEK_SET);

sysopen(my $thefile, 'c:\testabf1.abf', O_RDONLY) or die("sysopen failed: $!");
binmode($thefile);
my $pos        = sysseek($thefile, 4704, SEEK_SET);
my $bytes_read = sysread($thefile, my $fragment, 4);

# try to force output of a formatted floating point
printf("$bytes_read bytes read from offset $pos: value extracted is %e\n", $fragment);

# or just let Perl try to figure out what I want to print
print("$bytes_read bytes read from offset $pos: value extracted is $fragment");

[resulting output]
4 bytes read from offset 4704: value extracted is 0.000000e+000
4 bytes read from offset 4704: value extracted is

# note value of 0.0 in the first case, and just a blank in the second

[/code]
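One likely explanation, hedged since the file itself isn't available here: `sysread` stores the 4 raw bytes in `$fragment` as a string, and Perl never reinterprets a string's bytes as a binary number on its own. When `printf` applies `%e`, Perl numifies the string as if it were decimal text, which for binary data almost always yields 0, and `print` simply emits the bytes, which are mostly unprintable. The conversion has to be requested explicitly with `unpack`. The sketch assumes the value is an IEEE 754 single-precision float in little-endian byte order, which is how both an MS C++ `float` and a Delphi `Single` are stored on x86; `decode_float32` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Convert the 4 raw bytes returned by sysread() into a Perl number.
# Assumes IEEE 754 single precision stored little-endian (x86 byte order).
sub decode_float32 {
    my ($bytes) = @_;
    die "need exactly 4 bytes, got ", length $bytes, "\n"
        unless length $bytes == 4;
    # 'f<' = 32-bit float, little-endian, regardless of the host machine.
    return unpack 'f<', $bytes;
}
```

In the snippet above, that would mean passing `decode_float32($fragment)` (or simply `unpack('f<', $fragment)`) to `printf` instead of `$fragment`. For a series of consecutive floats, reading 4 * n bytes and unpacking with `'f<' . n` returns all n values at once.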

KevinADC
21 May, 2007 - 07:53 PM
I'm not sure mate. I'd have to see the record/file you are trying to read.