
Understanding and Reading binary files in C++
How to read a binary file and what is a binary file

#1 Aphex19

Posted 24 April 2010 - 05:42 PM

Hi, welcome to this tutorial on (reading) binary files.

First of all, what is a binary file?

I am referring to a file that is not meant to be read directly by humans, only by computers. In other words, not a plain ASCII text file but a machine-readable one, although some binary files contain portions of ASCII text. Binary files are basically made by computers, for computers... usually.

What do binary files contain and what are they used for?

All files just contain a sequence of bits, but we will be reading 8 bits (1 byte) at a time, as this is the unit in which most files are written, especially executable files (such as Windows exe files, ROMs etc.).

Binary executable files often contain operation codes (opcodes) which will be executed by your CPU. For example, an assembler might translate the following mnemonics into the following opcodes (on an x86 architecture):

Mnemonic: INC EAX		Opcode: 0x40 
Mnemonic: MOV EAX, [ESP+4]	Opcode: 0x8B 0x44 0x24 0x04


This is not always the case, though, as a binary file can be any file that is not plain text, such as sound or image data, which contains no opcodes at all, just data.

On a side note, it is worth mentioning that 1 byte always stores 8 bits (numbered 0-7), but its value can be treated as signed or unsigned. An unsigned byte only represents non-negative numbers (0 to 255), while a signed byte can be either positive or negative (-128 to 127). Negative numbers are represented with the most significant bit acting as the sign bit, in a scheme called "two's complement", but I'm not going to go into that here.
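
To make the signed/unsigned point concrete, here is a small sketch (not part of the program further down) that looks at the same 8 bits both ways. The helper names asSigned and asUnsigned are made up for illustration, and the signed result assumes a two's complement machine, which covers practically every current platform.

```cpp
#include <cstdint>

// Hypothetical helper: view the same 8 bits as a signed value.
// Assumes two's complement representation.
int asSigned(std::uint8_t raw)
{
	return static_cast<std::int8_t>(raw);
}

// Hypothetical helper: view the same 8 bits as an unsigned value (always 0-255).
int asUnsigned(std::uint8_t raw)
{
	return raw;
}
```

Under that assumption, asSigned(0xFF) gives -1 while asUnsigned(0xFF) gives 255, and 0x7F is 127 either way.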

What's the difference between ASCII and binary files?

Although this has been partially explained, I'll explain further. Have you ever opened up an ".exe" file in a text editor and seen a bunch of garbage, much like this?

[Image: an ".exe" file opened in a text editor]

If not, then you are missing something, try it.
What you are seeing is the ASCII representation of each byte contained within that file, which wasn't created to be viewed as text. Every character you see is a representation of some byte (a number between 0 and 255) stored in that file, and your apathetic text editor is translating those values into characters for your viewing pleasure, using ASCII (American Standard Code for Information Interchange) or an extended encoding built on it, since ASCII itself only defines the values 0-127.

You could look at this the other way round: if you view a text file with a hex editor (I advise using the free Hex Editor Neo), you will see that each character has its own value. Take a look at this chart to see what I mean.
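
The same thing can be seen from code, since every character in a string is just a number. A rough sketch, where the function name hexDump is my own invention and the values mentioned afterwards assume an ASCII-compatible character set:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: return the bytes of `text` as space-separated
// two-digit uppercase hex values, similar to what a hex editor shows.
std::string hexDump(const std::string &text)
{
	std::string out;
	char pair[3];	// Two hex digits plus the terminating '\0'
	for (std::size_t i = 0; i < text.size(); i++)
	{
		// Go through unsigned char so values above 127 do not turn negative
		unsigned value = static_cast<unsigned char>(text[i]);
		std::snprintf(pair, sizeof pair, "%02X", value);
		if (i != 0)
			out += ' ';
		out += pair;
	}
	return out;
}
```

For example, hexDump("Hi!") returns "48 69 21": 'H' is 0x48, 'i' is 0x69 and '!' is 0x21.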

So are we going to program or what?!?

Yes, yes.

There are quite a few ways to open a file in binary mode in C/C++, but my preferred method is to use fopen() and fread(), as these give you some great features that others don't, but I'm sure you believe me :no:
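
For comparison, one of those other ways is the C++ stream library. This is only a sketch of that alternative, not the method used in the rest of this tutorial. The function name readWholeFile is made up, and it returns an empty vector both for an empty file and for a file that could not be opened.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: read a whole file into a vector of bytes using
// std::ifstream opened with std::ios::binary.
std::vector<unsigned char> readWholeFile(const std::string &path)
{
	std::ifstream in(path.c_str(), std::ios::binary);
	if (!in)
		return std::vector<unsigned char>();	// Could not open the file

	// istreambuf_iterator reads raw characters without skipping whitespace
	std::vector<char> raw((std::istreambuf_iterator<char>(in)),
	                      std::istreambuf_iterator<char>());
	return std::vector<unsigned char>(raw.begin(), raw.end());
}
```

The vector frees its memory by itself, so there is no delete[] to forget, at the cost of giving up the explicit control that fread offers.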

Anyway, this code will simply open a file in binary mode and output the first 100 bytes in hexadecimal. (By the way, the reason we use hexadecimal is that 4 bits (1 nybble) = 1 hex character, so 8 bits all set to 1 = FF. It's a perfect numeral system imho.)
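
Here is the nybble idea as a sketch: a byte splits into a high and a low nybble, and each one picks exactly one hex digit. The name byteToHex is hypothetical.

```cpp
#include <string>

// Hypothetical helper: convert one byte to its two hex characters.
std::string byteToHex(unsigned char value)
{
	const char digits[] = "0123456789ABCDEF";
	std::string out;
	out += digits[value >> 4];	// High nybble (upper 4 bits)
	out += digits[value & 0x0F];	// Low nybble (lower 4 bits)
	return out;
}
```

So byteToHex(0xFF) is "FF" and byteToHex(0x40) is "40".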

#include <cstdio>
#include <iostream>

using namespace std;

// An unsigned char can store 1 byte (8 bits) of data (0-255)
typedef unsigned char BYTE;

// Get the size of a file in bytes (ftell returns -1 on failure)
long getFileSize(FILE *file)
{
	long lCurPos, lEndPos;
	lCurPos = ftell(file);		// Remember where we are
	fseek(file, 0, SEEK_END);	// Jump to the end of the file
	lEndPos = ftell(file);		// The end position is the size
	fseek(file, lCurPos, SEEK_SET);	// Go back to where we were
	return lEndPos;
}

int main()
{
	const char *filePath = "C:\\Users\\UrName\\Desktop\\testFile.bin";
	BYTE *fileBuf;			// Pointer to our buffered data
	FILE *file = NULL;		// File pointer

	// Open the file in binary mode using the "rb" mode string
	// This also checks if the file exists and/or can be opened for reading correctly
	if ((file = fopen(filePath, "rb")) == NULL)
	{
		cout << "Could not open specified file" << endl;
		return 1;		// Nothing to read, so stop here
	}
	cout << "File opened successfully" << endl;

	// Get the size of the file in bytes
	long fileSize = getFileSize(file);
	if (fileSize < 0)
	{
		cout << "Could not determine the file size" << endl;
		fclose(file);
		return 1;
	}

	// Allocate space in the buffer for the whole file
	fileBuf = new BYTE[fileSize];

	// Read the file in to the buffer
	// With an element size of 1, fread returns the number of bytes actually read
	size_t bytesRead = fread(fileBuf, 1, fileSize, file);

	// Now that we have the entire file buffered, we can take a look at some binary information
	// Let's take a look in hexadecimal (at most 100 bytes, fewer if the file is smaller)
	for (size_t i = 0; i < bytesRead && i < 100; i++)
		printf("%X ", fileBuf[i]);

	cin.get();
	delete[] fileBuf;
	fclose(file);	// Almost forgot this
	return 0;
}


NOTE: You will need to update the file path to suit your needs. Use double backslashes, because C++ treats a single backslash in a string literal as the start of an escape sequence (for example, "\n" is a newline), so "\\" is how you write one literal backslash. Use whatever file you want.

I won't go into too much detail on the code because the comments are pretty self-explanatory, although I'll mention a few things.

  • In this code, I used the C++ way of allocating/deallocating memory (new[] and delete[]) just to stay "up to date", but malloc() and free() are just as good, if not sometimes more suitable.
  • Note how I check whether the file pointer returned by fopen() is NULL before I allow the program to continue.
  • The fread function copies a number of items of a given size from the current position in the file into the buffer you point it at. If you want the data to land at an offset inside the buffer, pass a pointer to that position, e.g. fread(&fileBuf[0x100], 1, fileSize - 0x100, file); and the data will be copied into the buffer starting from index 0x100 (assuming the file is larger than 0x100 bytes). Note the reduced byte count: the buffer is only fileSize bytes long, so reading fileSize bytes at that offset would run past its end. To start reading from an offset in the file instead, call fseek() first. This flexibility is part of the reason fread is so great.
  • Always deallocate the memory that you allocate, otherwise the world might implode!! (Or you will get a memory leak, which you do not want.)
  • I used the format string "%X " in printf. The uppercase X tells printf to output each value in the buffer in uppercase hexadecimal ("%02X " would pad every byte to two digits). You could try "%c" and you will see the ASCII again.
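
To keep the two kinds of offset apart, here is a sketch under my own assumptions (the function name readAt and its parameters do not appear in the program above): fseek moves the position in the file, while the pointer passed to fread decides where in the buffer the bytes land.

```cpp
#include <cstdio>

// Hypothetical helper: read `count` bytes starting at `fileOffset` in the
// file into `buffer` starting at index `bufferOffset`. The caller must make
// sure the buffer holds at least bufferOffset + count bytes.
// Returns the number of bytes actually read.
std::size_t readAt(std::FILE *file, long fileOffset,
                   unsigned char *buffer, std::size_t bufferOffset,
                   std::size_t count)
{
	if (std::fseek(file, fileOffset, SEEK_SET) != 0)
		return 0;	// Could not move to the requested position

	// An element size of 1 makes the return value a byte count
	return std::fread(buffer + bufferOffset, 1, count, file);
}
```

If the file ends before count bytes are available, the return value is smaller than count, which is worth checking before using the buffer.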


C++ kindly gives us types whose values are shown either as character data (ASCII) or as numbers depending on the type used: cout prints a char or unsigned char as a character, but an int as a number. printf lets us override this through the format string so we can view the buffer in hexadecimal, as if you had opened the file in a hex editor.
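
A sketch of that difference, using a string stream so the result is easy to inspect. The function name describeByte is hypothetical, and the output mentioned below assumes an ASCII-compatible character set.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: show one byte three ways.
std::string describeByte(unsigned char value)
{
	std::ostringstream out;
	out << value << ' ';			// unsigned char is streamed as a character
	out << static_cast<int>(value) << ' ';	// int is streamed as a decimal number
	out << std::hex << std::uppercase
	    << static_cast<int>(value);		// The same number in uppercase hex
	return out.str();
}
```

describeByte(0x41) returns "A 65 41": the same byte as a character, as a decimal number and as hex.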


Why would I open a file in binary?

Opening files in binary mode can have many purposes, such as reading image files, audio files, video files or even ROM files. Most data files have headers which give information about the file. Try creating an image header reader or a ROM header reader (NES ROM?) :yes:
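
As a starting point for the NES idea, here is a sketch of a header check. It assumes the common iNES layout as I understand it: a 16-byte header that starts with the bytes 'N', 'E', 'S', 0x1A, followed by the PRG ROM size in 16 KB units and the CHR ROM size in 8 KB units. The struct and function names are made up, and the rest of the header (flags, mapper number) is ignored.

```cpp
#include <cstddef>

// Hypothetical summary of the first few iNES header fields
struct NesHeaderInfo
{
	bool valid;		// True if the magic bytes matched
	unsigned prgRomBytes;	// Program ROM size in bytes
	unsigned chrRomBytes;	// Character ROM size in bytes
};

// Parse the start of an already-buffered file.
NesHeaderInfo parseNesHeader(const unsigned char *data, std::size_t size)
{
	NesHeaderInfo info = { false, 0, 0 };
	if (size < 16)
		return info;	// Too short to hold a full header

	// Magic bytes: "NES" followed by 0x1A
	if (data[0] != 'N' || data[1] != 'E' || data[2] != 'S' || data[3] != 0x1A)
		return info;

	info.valid = true;
	info.prgRomBytes = data[4] * 16u * 1024u;	// Byte 4: PRG ROM in 16 KB units
	info.chrRomBytes = data[5] * 8u * 1024u;	// Byte 5: CHR ROM in 8 KB units
	return info;
}
```

With the program from earlier, a call such as parseNesHeader(fileBuf, fileSize) after the read would be one way to try it on a real file.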

Anyway, if you're still here, thanks for reading. I made this tutorial because, when I first started to write emulators, I couldn't find a simple tutorial on how to read binary files and what to do with them. I really hope this helps someone.

Peace


This post has been edited by Aphex19: 26 April 2010 - 07:49 AM



Replies To: Understanding and Reading binary files in C++

#2 Guest_cpp_guest_viewer*


Posted 27 April 2010 - 03:25 AM

Nice tutorial, thanks for giving me some ideas.

#3 athlon32

Posted 27 April 2010 - 08:07 AM

AWESOME! You're a great author too :D

#4 Aphex19

Posted 03 May 2010 - 01:17 PM

Wow, I didn't expect such a good reception, you're welcome :)

#5 X-Team

Posted 05 May 2010 - 06:16 AM

Unlike the other tutorials on the Internet this helped me a lot!
Thanks for writing this tutorial and keep up the good work! 0_0

#6 PexMech

Posted 24 February 2011 - 03:13 PM

Thanks for a great helpful guide!

For a beginner like myself it's really welcome.

I actually noticed one interesting thing right away (using backslashes), which I had never seen before.

I have heard of:
<!-- Note, bla bla bla --> in XML files.
And
; for hiding notes in other text files, like the Windows HOSTS file for example.

But what does this little symbol mean?:

{

}

#7 ApexTheCoder

Posted 30 March 2013 - 02:30 PM

Thanks for the tutorial. I know how to read binary files etc, but this tutorial has made it a lot clearer. Thank you.

#8 Fheli

Posted 29 January 2014 - 07:36 AM

Thanks for the insights into reading binary files. I am a beginner when it comes to programming in C++. I have data in which each value is 2 bytes (16 bits). When I compile this source code with my data, it only shows 8 bits at a time, but I want to see 16 bits. I tried to multiply the BYTE by 2 and it gave an error. Please help.
