Having trouble parsing binary file

getting extra garbage when reading specific chunks


10 Replies - 2985 Views - Last Post: 16 July 2009 - 02:24 PM

#1 manafi

Posted 15 July 2009 - 11:09 AM

Hi,
I'm trying to parse a binary terragen terrain file to retrieve vertices and height values, but I don't seem to properly understand the procedure. The file format is listed here. Supposedly, there is a definite header, consisting of 16 bytes, followed by a SIZE chunk: a 4-byte SIZE marker and then a 2-byte int representing the smallest dimension. From there, the chunks can vary, but I'm not worried about that yet.

I tried to upload my map file, but I can't upload a .ter file.
It was exported straight from TerraGen.
Because of that, I am assuming it is a valid binary file.

Here is what I have been testing so far.

NOTE: some of these includes are old; I hijacked a file from an old project I stopped working on.
#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>
#include <fstream>
#include <vector>
#include <map>
using namespace std;

int main()
{
	ifstream in;
	in.open("map.ter", ios::binary);
	if(!in.good())
	{
		cout<<"bad file name."<<endl;
		cin.get();
		return -1;
	}
	char terragen[16];
	char chunkHeader[4];
	char nPts[2];
	char xPts[2], yPts[2];
	int length = 8;
	in.read(terragen, 16);
	cout << reinterpret_cast<char*>(terragen) << endl;
	in.read(chunkHeader, 4);
	cout << chunkHeader <<endl;
	in.read(nPts, 2);
	cout << reinterpret_cast<int>(nPts);
	in.read(nPts, 2);
	in.read(chunkHeader, 4);
	cout <<endl << chunkHeader;
	if(chunkHeader == "XPTS")
		cout << endl << "Matched header to XPTS.";
	else
		cout << endl << printf("Extra garbage messed up reading, size: %d", sizeof(chunkHeader));
	cin.get();
}



This is working somewhat. I can see it printing the expected headers, like first is "TERRAGEN", followed by "TERRAIN ". However, tacked onto the end of these headers is garbage text. It happens for every read that I do, even if I have specified the correct length variable.

Like I said, I'm not that versed in parsing binary files; in fact, I failed it once before back when I was in school. But now that I want to include this in my own program, I have to actually learn it (damn you professors trying to teach me actual useful information!!!)

If anyone can help me with this problem, I would appreciate the effort.

If the .ter file is necessary, I will try to upload it somewhere else.

Thank you.


Replies To: Having trouble parsing binary file

#2 brds

Posted 15 July 2009 - 12:17 PM

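The terrain file is probably fine. I believe it is the handling of C-style strings that is killing it.

The string "TERRAGENTERRAIN " is 16 characters long, so as a C-style string it needs 17 bytes: the 16 characters plus a terminating '\0'. The file stores only the 16 characters, and read() does not append a terminator.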
char arr[] = "String";
arr[0] == 'S';
...
arr[5] == 'g';
arr[6] == '\0'; /* String termination character */


So your code just keeps printing past the end of the buffer until it happens to reach a '\0'.

You cannot compare c-style strings with ==
if(arr == "String") /* compares addresses, not contents; should cause warning during compilation */

if(strcmp(arr, "String") == 0) /* compare with strcmp, from string.h; it returns 0 when equal */



You can compare c++ string objects with ==
string mystring = "Blah!";
if(mystring == "Blah!") /* Valid */



#3 wildgoose

Posted 15 July 2009 - 01:13 PM

Yes, I concur. These are not ASCIIZ (null-terminated) strings; they are fixed-size byte blocks. Don't use a string compare unless you first copy the bytes to another buffer and null-terminate it. Otherwise, do a TIFF-tag-style comparison, which is what you should be using for the 8-byte chunks.
struct Chunk
{
	uint ChunkId;	// four-byte ASCII tag, e.g. "XPTS"
	uint nSize;	// the four bytes that follow the tag
};


Some compilers can understand
uint nId = 'STPX';					// [XPTS] in reverse!


Note the single quotes not double!

But a more cross-platform method is
#define IFFID( a, b, c, d ) ((uint32)( (a) | ((b)<<8) | ((c)<<16) | ((d)<<24)))


#define IFFID_XPTS	IFFID( 'X','P','T','S' )
#define IFFID_YPTS	IFFID( 'Y','P','T','S' )


fread( &blk, sizeof(blk), 1, fp );
if (blk.ChunkId == IFFID_XPTS )
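
The same tag comparison can be sketched without macros and without depending on the host's byte order. The names makeTag, tagFromBytes, TAG_XPTS and TAG_YPTS are made up for this illustration; the only assumption taken from the posts above is that the first character of the tag goes in the low byte.

```cpp
#include <cstdint>

// Packs four characters into a 32-bit code, first character in the low byte.
// Going through unsigned char keeps bytes above 0x7f from sign-extending.
constexpr std::uint32_t makeTag(char a, char b, char c, char d)
{
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(a))
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 16)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(d)) << 24);
}

// Builds the same code from four raw bytes read from the file.
// Shifting byte by byte gives the same result on any host byte order.
inline std::uint32_t tagFromBytes(const unsigned char* p)
{
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

const std::uint32_t TAG_XPTS = makeTag('X', 'P', 'T', 'S');
const std::uint32_t TAG_YPTS = makeTag('Y', 'P', 'T', 'S');
```

After reading four bytes into an unsigned char buffer, the check is then tagFromBytes(buffer) == TAG_XPTS.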


This post has been edited by wildgoose: 15 July 2009 - 01:14 PM


#4 manafi

Posted 15 July 2009 - 02:14 PM

wildgoose, on 15 Jul, 2009 - 12:13 PM, said:

#define IFFID( a, b, c, d ) ((uint32)( (a) | ((b)<<8) | ((c)<<16) | ((d)<<24)))


#define IFFID_XPTS	IFFID( 'X','P','T','S' )
#define IFFID_YPTS	IFFID( 'Y','P','T','S' )


read( &blk, sizeof(blk) )
if (blk.ChunkId == IFFID_XPTS )




So if I used this struct notation, the various byte 'markers' from the file would be stored in blk.ChunkId, i.e., your XPTS and YPTS examples, TERRAGEN, TERRAIN , SIZE, etc? And from there I should be able to verify the right information has been read using the comparison shown above?

I've had a bit more luck using the <cstdio> functions, but I'd really like to understand this as well.

Even in the C implementation, the values associated with XPTS and YPTS are supposed to be the width/height of my map, something like 256 or 512, but when I display them as 2-byte ints, they come out as 11. Once I read the information in properly using a C++ implementation, how could I then display the integer representation of those values?

Thanks.

This post has been edited by manafi: 15 July 2009 - 02:15 PM


#5 wildgoose

Posted 15 July 2009 - 02:26 PM

You'll have to create an 8-byte IFFID or use the 4-byte one twice to check the headers.

But yes, that should work. The good way to parse your file is with the Chunk structure, but you can also read it as individual fields.

Since you can't post the file, do this instead.
char a[64];
fread( &a, sizeof(a), 1, fp );

for (i = 0; i < 64; i++)
{
   printf( "0x%2.2x ", a[i] );
   if (i & 3 == 3)  printf( "\n" );
}



And then cut'n'paste your output here so we can see the first 64 bytes!

That spec didn't mention if the data was in Big-Endian or Little-Endian form.

You could also run debug.exe and dump the first 64 bytes of the file.

This post has been edited by wildgoose: 15 July 2009 - 02:27 PM


#6 wildgoose

Posted 15 July 2009 - 02:48 PM

I just downloaded a different Terragen file type. It is similar to the TIFF file format in that it has chunk headers with sub-headers, but it uses little-endian byte order, the same as 80x86 processors.

Here's a function I use in my logging mechanism designed to dump blocks of memory. You can read in data and dump using it. Just modify to work for you! I removed some of what you didn't need and replaced the raw (redirectable) print with a printf().

//
//  Memory Dump
//
#define xMEM_DUMP_PAD	"	  "

void CLogFile::Mem( const void * const vp, uint nSize, const char *szText, ... )
{
	uint  x, run, tail, col, nOffset, nAddr;
	char *p, *r, *s, buf[256], *pStr;
	byte *mp;
   bool flg;

   ASSERT_PTR(vp);
   ASSERT_ZERO(nSize);

	 // If provided, write out the title

   if ( NULL != szText )
   {
	  char	buff[ LOG_BUF_MAX ];
	  va_list	args;

	  va_start( args, szText ); 
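	  vsnprintf(buff, sizeof(buff), szText, args);	// bounded write, cannot overflow buff
	  va_end( args );
	  printf("%s\n", buff);   	// Print a Banner (never pass data as the format string)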
	  flg = false;
   }
   else
   {
	  flg = true;
   }

	  //Sub-Header

   sprintf(buf, "  Memory Dump: 0x%8.8x   Size: 0x%x (%u)", (uint)vp, nSize, nSize );
   printf("%s\n", buf);

   //	Memory Dump (Old DOS debug method)

	mp = (byte *) vp;
	run = 16;
   nAddr = (uint) (byte*)vp;
   strcpy(buf, xMEM_DUMP_PAD);
   pStr = buf + sizeof(xMEM_DUMP_PAD) - 1;

		//  For all lines

	for ( nOffset = 0; nOffset < nSize;  )
	{
	  col = tail = 0;
	  p = pStr;

	  p += sprintf(p, "%8.8x ", nOffset );
	  r = p + (16*3 + 3 + 2);
	 s = r + 16;
	
	  if ( nSize-nOffset < run )
	 {
		run = nSize-nOffset;   // Display only that requested!
		 tail = 16 - run;		// trailing places
	 }

	 memset(p, ' ', s-p);
	
			//  Hex values

		for ( x = 0; x < run; x++, mp++ )
		  {
			p += sprintf( p, " %2.2x", *mp );

				//	ASCII chars (0x7f is DEL, which is not printable)

			if (( 0x20 <= *mp ) && ( *mp < 0x7f ))
			  {
				*r++ = *mp;
			  }
			else
			  {
				*r++ = '.';
			 }

			if (( ++col % 4 ) == 0 )
			  {
				*p++ = ' ';
			  }
		  }

		 *p = ' ';	// Hex - ASCII fill

		  s += sprintf( s, "  {0x%8.8x}", nAddr );
		  printf( "%s\n", buf );

		nAddr += run;	// (Real) Address - next text line
		nOffset += run;	// Offset - next text line
	  }

	printf( "\r\n" );
}



#7 manafi

Posted 15 July 2009 - 07:33 PM

Okay, I ran your code, and here is what the first 64 bytes look like:

0x54 0x45
0x52 0x52
0x41 0x47
0x45 0x4e
0x54 0x45
0x52 0x52
0x41 0x49
0x4e 0x20
0x53 0x49
0x5a 0x45
0x00 0x01
0x00 0x00
0x58 0x50
0x54 0x53
0x01 0x01
0x00 0x00
0x59 0x50
0x54 0x53
0x01 0x01
0x00 0x00
0x53 0x43
0x41 0x4c
0x00 0x00
0xfffffff0 0x41
0x00 0x00
0xfffffff0 0x41
0x00 0x00
0xfffffff0 0x41
0x43 0x52
0x41 0x44
0x00 0x10
0xffffffc7 0x45


#8 wildgoose

Posted 15 July 2009 - 08:37 PM

Should have had you dump more bytes. (Next time use the 'Mem' function I posted!)

Based upon the dump and the spec, your terrain is 257 x 257 (width x height), not 256.
Your X and Y are square!

0x54 0x45 0x52 0x52 0x41 0x47 0x45 0x4e    "TERRAGEN"
0x54 0x45 0x52 0x52 0x41 0x49 0x4e 0x20    "TERRAIN "

-Block-
0x53 0x49 0x5a 0x45        "SIZE"
    0x00 0x01              0x0100 (256) is N-1, thus N is 257
    0x00 0x00              0x0000 (0)   {padding}

-SubBlock-
    0x58 0x50 0x54 0x53    "XPTS"
    0x01 0x01              0x0101 (257) {.xpts}
    0x00 0x00              0x0000 (0)   {padding}

    0x59 0x50 0x54 0x53    "YPTS"
    0x01 0x01              0x0101 (257) {.ypts}
    0x00 0x00              0x0000 (0)   {padding}

    0x53 0x43 0x41 0x4c    "SCAL"
    0x00 0x00 0xf0 0x41    (float) x
    0x00 0x00 0xf0 0x41    (float) y
    0x00 0x00 0xf0 0x41    (float) z

    0x43 0x52 0x41 0x44    "CRAD"
    0x00 0x10 0xc7 0x45    (float) radius
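
The numeric fields in this breakdown can be decoded by assembling the bytes least significant first. The sketch below is one possible way to do it; readU16LE and readF32LE are hypothetical helper names, and the float version assumes the host uses 32-bit IEEE 754 floats, which is common but not guaranteed by C++.

```cpp
#include <cstdint>
#include <cstring>

// Assembles a 16-bit value from two bytes stored least significant byte first.
inline std::uint16_t readU16LE(const unsigned char* p)
{
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Assembles a 32-bit float stored least significant byte first.
// Assumption: the host's float is a 32-bit IEEE 754 type.
inline float readF32LE(const unsigned char* p)
{
    std::uint32_t bits = static_cast<std::uint32_t>(p[0])
                       | (static_cast<std::uint32_t>(p[1]) << 8)
                       | (static_cast<std::uint32_t>(p[2]) << 16)
                       | (static_cast<std::uint32_t>(p[3]) << 24);
    float value;
    static_assert(sizeof(value) == sizeof(bits), "float must be 32 bits");
    std::memcpy(&value, &bits, sizeof(value));  // copy the bit pattern, no conversion
    return value;
}
```

Applied to the bytes in the dump, 0x00 0x01 decodes to 256, 0x01 0x01 to 257, 0x00 0x00 0xf0 0x41 to 30.0 and 0x00 0x10 0xc7 0x45 to 6370.0. Hex is only how the dump displays the bytes; once decoded, the values are ordinary integers and floats and need no further conversion.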



#9 manafi

Posted 16 July 2009 - 07:45 AM

Ok, I'm starting to connect what I see, but I'm still getting lost along the way.

First, about your Mem function: when compiling, I'm getting a few errors from your identifiers. What did you declare ASSERT_PTR, ASSERT_ZERO, and LOG_BUF_MAX as? Or are those defined in another library? Also, in your function parameter list, you put ... at the end; I've never seen that, is that valid?

Second, from your analysis, it looks like the data is in little-endian, right? But it looks as though, for example, SIZE is kept in the order 0x53 0x49 0x5a 0x45, while the value is flipped to 0x01 0x00. This is probably the part that confuses me most...does endianness only apply to number values? Or am I not seeing that it applies to characters as well? I never did pick up on Big/Little Endian, and we never focused on it for long.

And finally, the loop you gave to print out the first 64 bytes--is that the only reason the printout was in hex format? Or is the data inherently stored as hex values? I guess I mean, should I expect to extract the data and have to convert from hex to decimal to get the size, xpts, ypts, etc.

Hmm, just thinking out loud here, but 256 can't be represented in 8 decimal bits, so it has to be hex value, right?

#10 wildgoose

Posted 16 July 2009 - 11:05 AM

You need to modify the Mem function: change the function name, remove the assertions, and define LOG_BUF_MAX yourself.
#define LOG_BUF_MAX   256


void LogMem( const void * const vp, uint nSize, const char *szText, ... )
{

//   ASSERT_PTR(vp);
   //ASSERT_ZERO(nSize);
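
On the "..." in the parameter list: yes, it is valid. It declares a variadic function, the same mechanism printf uses, and the extra arguments are reached through va_list, va_start and va_end from <cstdarg>. A small sketch, where formatText and its 256-byte buffer are chosen purely for illustration:

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Formats like printf, but returns the text as a std::string.
// The "..." accepts any number of extra arguments after fmt.
std::string formatText(const char* fmt, ...)
{
    char buf[256];                                // illustrative fixed size
    va_list args;
    va_start(args, fmt);                          // begin after the last named parameter
    std::vsnprintf(buf, sizeof(buf), fmt, args);  // truncates instead of overflowing
    va_end(args);
    return std::string(buf);
}
```

For example, formatText("%d x %d", 257, 257) produces the text "257 x 257". As with printf, the compiler cannot check that the extra arguments match the format string, so mismatches are the caller's responsibility.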




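That company stores its four-byte ASCII tags as plain character sequences, so the bytes appear in the file in reading order (S, I, Z, E). Endianness only affects multi-byte numeric values, not individual characters. If you load a tag into a 32-bit integer on a little-endian machine, the first character lands in the least significant byte, so the tag runs from LSB to MSB, the same orientation as little-endian.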

The 16-bit data within the file is definitely Little-Endian, LSB to MSB.

The snippet I gave you printed two bytes per line, not the four I intended, because of operator precedence: == binds tighter than &, so i & 3 == 3 is evaluated as i & (3 == 3), i.e. i & 1, which is true for every odd i. It should have been (i & 3) == 3. The hex codes were forced to 2 characters, but the bytes with their MSB set were treated as signed and thus sign-extended, so instead of

FE they were showing up as FFFFFFFE
I printed an "0x" in front followed by "%x" for hex, but I told it to insert leading zeros, thus I used "%2.2x".
If the data provided had been unsigned, FE would have been printed as FE, not FFFFFFFE.

A byte is 8 bits: XXXX XXXX (D7, D6, D5, D4, D3, D2, D1, D0).
char is typically a signed 8-bit type, i.e. 7 value bits plus a sign bit:
0x80...0x00...0x7f  =  -128...0...127
If it were unsigned, the range would be:
0x00...0xFF  =  0...255

So let's look at the following
0x00   0x40   0x7F   0x80   0x81   0xC0   0xFF    (Hex)
   0     64    127    128    129    192    255    (Unsigned Decimal)
   0     64    127   -128   -127    -64     -1    (Signed Decimal)
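
Putting the two fixes together, a dump routine along these lines avoids both the sign extension and the two-per-line layout. It is only a sketch: hexDump is a made-up name, and it returns the text as a string so the caller can print or log it.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats n bytes as hex, four bytes per line.
// Taking the data as unsigned char means 0xf0 is formatted as "0xf0",
// not as the sign-extended "0xfffffff0".
std::string hexDump(const unsigned char* data, std::size_t n)
{
    std::string out;
    char cell[8];                        // "0x" + 2 digits + space + '\0' fits
    for (std::size_t i = 0; i < n; ++i)
    {
        std::snprintf(cell, sizeof(cell), "0x%02x ", static_cast<unsigned>(data[i]));
        out += cell;
        if ((i & 3) == 3)                // parentheses needed: == binds tighter than &
            out += '\n';
    }
    return out;
}
```

If the buffer has to stay a plain char array, casting each element with static_cast<unsigned char> before formatting has the same effect.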



#11 manafi

Posted 16 July 2009 - 02:24 PM

Thanks to you, I'm actually getting somewhere with parsing this file.

Thanks a lot for your assistance.