This may be a very noobish question, but why write to files in binary? Is there any advantage over writing in characters?
18 Replies - 1919 Views - Last Post: 19 July 2011 - 04:33 PM
Replies To: Binary File Format
#2
Re: Binary File Format
Posted 17 July 2011 - 08:12 PM
speed: it's a lot faster to load binary files because the data doesn't have to be converted back into numbers and whatnot
size: binary files are generally much smaller. this is very important for things transferred over networks.
security: which is easier to exploit? a C++ source file or the application it produces?
#3
Re: Binary File Format
Posted 17 July 2011 - 08:25 PM
But there are also disadvantages. It's harder to see when the file is not being written correctly. The file produced may not be compatible with a different machine. Depending on how the data was written, the file may not even be compatible with a program compiled by a different compiler on the same machine.
Jim
#4
Re: Binary File Format
Posted 17 July 2011 - 08:49 PM
jimblumberg, just read one byte at a time, and have the format documentation state whether each of the file's variables is big- or little-endian.
Pro to text: easy to edit without writing a new program
Con to text: text is still binary underneath, and it has to be heavily processed (parsed) before the program can use it
Pro to bin: compresses well, and is already in the form the end program needs
Con to bin: bad documentation leads to endianness errors, and any edit beyond a hex tweak needs a new program
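To make the endianness point concrete, here is a minimal C++ sketch that assumes a hypothetical format whose documentation says 32-bit integers are stored little-endian. The helper names putU32LE and getU32LE are made up for illustration; they work on an in-memory byte array one byte at a time, so the bytes come out the same on big- and little-endian machines:

```cpp
#include <cstdint>

// Store a 32-bit value least-significant byte first (little-endian),
// independent of the byte order of the CPU running this code.
void putU32LE(unsigned char *dst, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        dst[i] = static_cast<unsigned char>((v >> (8 * i)) & 0xFFu);
    }
}

// Rebuild the value from 4 little-endian bytes, handling one byte at a time.
std::uint32_t getU32LE(const unsigned char *src) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        v |= static_cast<std::uint32_t>(src[i]) << (8 * i);
    }
    return v;
}
```

The 4-byte buffer can then be written or read in a single call with ostream::write / istream::read, so the file I/O itself still happens in chunks.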
PS to all, I think he's talking about a file written by the end program.
This post has been edited by OLH064: 17 July 2011 - 08:51 PM
#5
Re: Binary File Format
Posted 17 July 2011 - 09:19 PM
Quote
just read one byte at a time, and define whatever file's variables as big or little endian in whatever format documentation.
But by reading and writing the file a byte at a time you lose any speed benefit of binary files. Writing a single byte at a time is much slower than writing large chunks at a time. And it is not only endian issues: if you write a structure, there may also be structure packing issues to consider.
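A small sketch of the packing issue, assuming a typical desktop compiler with a 4-byte int (Sample is a hypothetical struct). The compiler may insert padding between and after members, so sizeof(Sample) can be larger than the sum of the member sizes, and dumping the struct with write() stores that particular compiler's layout, padding included, in the file:

```cpp
#include <cstddef>

// Hypothetical struct: the members themselves need 1 + sizeof(int) + 1 bytes.
struct Sample {
    char tag;
    int value;
    char flag;
};

// Bytes used by the members alone, ignoring any padding.
constexpr std::size_t kMemberBytes = sizeof(char) + sizeof(int) + sizeof(char);

// Bytes the compiler added for alignment. With a 4-byte int this is
// typically 6 (sizeof(Sample) == 12), but the exact layout is
// implementation-defined, so out.write(reinterpret_cast<const char*>(&s), sizeof s)
// writes whatever layout this compiler chose.
constexpr std::size_t kPaddingBytes = sizeof(Sample) - kMemberBytes;

// Where 'value' starts inside the struct: usually 4, not 1, because of alignment.
constexpr std::size_t kValueOffset = offsetof(Sample, value);
```

Writing each member separately, in a documented order and size, keeps the padding out of the file.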
Jim
#6
Re: Binary File Format
Posted 18 July 2011 - 11:23 AM
I don't want a flame war, but I meant "read an array of bytes and handle them one at a time".
Also, the format documentation, whatever it is, should define the structures and the endianness of each structure variable.
My file access methods read and write in chunks automatically. Err, semi-automatically: I have to pass 0x4000 as the cluster size.
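Reading "an array of bytes" and then handling them one at a time might look like the sketch below. checksumInChunks is a hypothetical helper; the 0x4000-byte (16 KiB) default just mirrors the value mentioned in the post, and the checksum stands in for whatever per-byte processing a real format needs:

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>   // std::istringstream, handy for trying this without a real file
#include <vector>

// Read the stream in fixed-size chunks, then walk each chunk one byte at a
// time in memory. The per-byte work here is a simple additive checksum.
std::uint32_t checksumInChunks(std::istream &in, std::size_t chunkSize = 0x4000) {
    if (chunkSize == 0) chunkSize = 1;  // a zero-size read would never advance
    std::vector<char> buf(chunkSize);
    std::uint32_t sum = 0;
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::streamsize got = in.gcount();  // the last chunk may be short
        for (std::streamsize i = 0; i < got; ++i) {
            sum += static_cast<unsigned char>(buf[i]);
        }
    }
    return sum;
}
```

For a real file, pass a std::ifstream opened with std::ios::binary.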
#7
Re: Binary File Format
Posted 18 July 2011 - 12:26 PM
alke4, on 18 July 2011 - 04:05 AM, said:
This may be a very noobish question, but why write to files in binary? Is there any advantage over writing in characters?
Every machine and architecture can read a byte, which makes the byte the most platform-independent unit there is. The alternatives rely on special formats that are platform-dependent.
#8
Re: Binary File Format
Posted 18 July 2011 - 02:34 PM
I just stumbled across this thread and I'm embarrassed to admit that I have absolutely no idea what you're all talking about!
#9
Re: Binary File Format
Posted 18 July 2011 - 03:05 PM
lol, let me 'splain
say I want to store the number 1693470418. As text that takes 10 characters, which comes out to 10 bytes. However, this number fits in a 4-byte integer, so a binary file holding it would be 2.5 times smaller.
another way to look at it is that you're storing the data in base 256 rather than base 10: it only takes 4 digits to store it in base 256, but it takes 10 digits in base 10. Plus, some conversion has to happen to turn text into numbers, which isn't needed with binary.
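The arithmetic can be checked with a few lines of C++. This is a sketch assuming std::int32_t is the 4-byte integer in question; textBytes, binaryBytes and base256Digits are illustrative names:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Bytes needed to write the value as decimal text.
std::size_t textBytes(std::int32_t v) {
    return std::to_string(v).size();
}

// Bytes needed to write it as raw binary: always sizeof(std::int32_t),
// no matter how many decimal digits the value has.
std::size_t binaryBytes(std::int32_t) {
    return sizeof(std::int32_t);
}

// Number of base-256 "digits" (bytes) a non-negative value actually needs.
int base256Digits(std::uint32_t v) {
    int digits = 1;
    while (v >= 256) {
        v /= 256;
        ++digits;
    }
    return digits;
}
```

For 1693470418, textBytes gives 10, binaryBytes gives 4, and base256Digits gives 4.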
This post has been edited by ishkabible: 18 July 2011 - 03:06 PM
#10
Re: Binary File Format
Posted 18 July 2011 - 04:35 PM
Thanks for the answers people, it really helped clarify things.
#11
Re: Binary File Format
Posted 18 July 2011 - 04:46 PM
Code fun.
Given this simple type:
const int ROWS = 20, COLS = 50;
typedef int Grid[ROWS][COLS];
Here's a text save and load:
void saveTxt(const string &fn, Grid g) {
    fstream out(fn.c_str(), ios::out);  // plain text mode for a text file
    for(int row=0; row<ROWS; row++) {
        for(int col=0; col<COLS; col++) {
            out << g[row][col] << ' ';
        }
        out << endl;
    }
    out.close();
}
void loadTxt(const string &fn, Grid g) {
    fstream is(fn.c_str(), fstream::in);
    for(int row=0; row<ROWS; row++) {
        for(int col=0; col<COLS; col++) {
            is >> g[row][col];
        }
    }
    is.close();
}
Here's the binary for same:
void saveBin(const string &fn, Grid g) {
    fstream fh(fn.c_str(), ios::out | ios::binary);
    fh.write((char *)g, sizeof(Grid));  // dump the whole array's bytes in one call
    fh.close();
}
void loadBin(const string &fn, Grid g) {
    fstream fh(fn.c_str(), ios::in | ios::binary);
    fh.read((char *)g, sizeof(Grid));   // copy the bytes straight back into memory
    fh.close();
}
There's more to it than just this. If I have some hellacious struct, with lots of different data types, the binary code really doesn't change. What's more, you can store an array and "randomly access" elements in the middle of the file pretty easily.
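Random access into the grid file follows directly from the saveBin() layout: cell (row, col) sits at byte (row * COLS + col) * sizeof(int). A minimal sketch with a hypothetical loadCell helper, assuming the file was written by the same program on the same kind of machine:

```cpp
#include <fstream>
#include <string>

const int ROWS = 20, COLS = 50;

// Read one cell of a grid saved as a raw row-major int[ROWS][COLS] dump,
// without loading the rest of the file. Assumes the same sizeof(int) and
// byte order as the machine that saved it.
bool loadCell(const std::string &fn, int row, int col, int &value) {
    if (row < 0 || row >= ROWS || col < 0 || col >= COLS) return false;
    std::ifstream in(fn.c_str(), std::ios::in | std::ios::binary);
    if (!in) return false;
    const std::streamoff offset =
        (static_cast<std::streamoff>(row) * COLS + col) *
        static_cast<std::streamoff>(sizeof(int));
    in.seekg(offset, std::ios::beg);  // jump straight to the cell
    return static_cast<bool>(in.read(reinterpret_cast<char *>(&value), sizeof(int)));
}
```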
The downside of a binary file is that it's far from human-readable. Also, structs in memory can be laid out a little differently on different systems, though they are consistent on the same system. Save game files are usually binary, mostly because it's so trivial to just dump the data out. It's also probably the fastest format to read, since you're doing little more than copying data from file to memory.
#12
Re: Binary File Format
Posted 19 July 2011 - 01:17 AM
Thanks for explaining that, ishkabible and baavgai. I'll have to try it out in my next C++ program. In fact, what would be a good trivial application for reading/writing binary files that a newb like myself could cook up?
#13
Re: Binary File Format
Posted 19 July 2011 - 02:22 AM
RetardedGenius, including variable-length data members (STL containers etc.) would add a little to the challenge, as you may have to include binary data in the file that isn't in the structure, for example the size of the next variable-length member. You may also want to include a version number in your binary file, so future code reading past formats has the opportunity to refuse to load it, or to convert it.
struct SocialAddressBook
{
    const char* name;
    const char* address;
    int age;
    std::list<int> friendIds;
};
So, you would want to store the length of name, address and friendIds.
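One way to do that is sketched below. It is an assumption-laden example rather than a standard format: it swaps the const char* members for std::string (a pointer value written to disk means nothing when read back), stores integers in host byte order, and uses a made-up version tag kFormatVersion that load() checks before reading anything else:

```cpp
#include <cstdint>
#include <istream>
#include <list>
#include <ostream>
#include <sstream>   // std::stringstream, for in-memory round trips
#include <string>

// Variation on the struct above, with std::string owning the text.
struct SocialAddressBook {
    std::string name;
    std::string address;
    std::int32_t age = 0;
    std::list<std::int32_t> friendIds;
};

const std::uint32_t kFormatVersion = 1;  // hypothetical format version tag

// Copy a fixed-size value's bytes to/from the stream (host byte order).
template <typename T>
void writeRaw(std::ostream &out, const T &v) {
    out.write(reinterpret_cast<const char *>(&v), sizeof v);
}

template <typename T>
bool readRaw(std::istream &in, T &v) {
    return static_cast<bool>(in.read(reinterpret_cast<char *>(&v), sizeof v));
}

// Variable-length text: write its length first, then the characters.
void writeString(std::ostream &out, const std::string &s) {
    writeRaw(out, static_cast<std::uint32_t>(s.size()));
    out.write(s.data(), static_cast<std::streamsize>(s.size()));
}

// No sanity limit on len here: this is only a sketch.
bool readString(std::istream &in, std::string &s) {
    std::uint32_t len = 0;
    if (!readRaw(in, len)) return false;
    s.resize(len);
    return len == 0 || static_cast<bool>(in.read(&s[0], len));
}

void save(std::ostream &out, const SocialAddressBook &b) {
    writeRaw(out, kFormatVersion);
    writeString(out, b.name);
    writeString(out, b.address);
    writeRaw(out, b.age);
    writeRaw(out, static_cast<std::uint32_t>(b.friendIds.size()));  // element count
    for (std::int32_t id : b.friendIds) writeRaw(out, id);
}

bool load(std::istream &in, SocialAddressBook &b) {
    std::uint32_t version = 0;
    if (!readRaw(in, version) || version != kFormatVersion) return false;  // unknown format
    if (!readString(in, b.name) || !readString(in, b.address) || !readRaw(in, b.age)) {
        return false;
    }
    std::uint32_t count = 0;
    if (!readRaw(in, count)) return false;
    b.friendIds.clear();
    for (std::uint32_t i = 0; i < count; ++i) {
        std::int32_t id = 0;
        if (!readRaw(in, id)) return false;
        b.friendIds.push_back(id);
    }
    return true;
}
```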
#14
Re: Binary File Format
Posted 19 July 2011 - 08:00 AM
The big disadvantage of text files is that they are serial in nature. That is, I generally can't read the 234th line of the file unless I have read the previous 233 lines (or at least scanned them for newline chars so I can count up to 234).
Now this is not necessarily an intrinsic property of text files: you *could* make a format that lets you jump around in the text (for example, making every line exactly 80 chars), but such formats are not terribly popular. In a binary file an integer is a fixed size (say 2 or 4 bytes), but in a text file an integer could be 1 byte ("0") or 5 ("31415") or more, i.e. it is variable in length. The result is that text files are generally very serial in nature.
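The difference between ordinary and fixed-width text can be sketched as follows. Both function names are hypothetical, and the fixed-width version assumes single-byte '\n' line endings (a file with Windows "\r\n" endings would need width + 2 per line):

```cpp
#include <istream>
#include <sstream>   // std::istringstream, handy for trying this in memory
#include <string>

// Ordinary text: lines vary in length, so reaching line n (1-based) means
// reading every line before it. Returns false if there are fewer than n lines.
bool readNthLine(std::istream &in, int n, std::string &line) {
    for (int i = 1; i <= n; ++i) {
        if (!std::getline(in, line)) return false;
    }
    return n >= 1;
}

// Fixed-width text: if every line is exactly `width` characters plus '\n',
// line n starts at (n - 1) * (width + 1) and one seek gets us there.
bool readNthFixedLine(std::istream &in, int n, int width, std::string &line) {
    if (n < 1) return false;
    in.clear();
    in.seekg(static_cast<std::streamoff>(n - 1) * (width + 1), std::ios::beg);
    return static_cast<bool>(std::getline(in, line));
}
```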
Binary files on the other hand don't tend to have nice "EOL" markers breaking the file up. So for a binary file to find information you need to be a little more tricky.
The first trick is to just describe a map and lay out byte-for-byte where the data is. This is great because I can jump anywhere I need to get at the data. The problem is: what if my data isn't terribly static?
The next trick is to use fixed-length records. That is, lay out byte-for-byte where the data will be in a record, then repeat records over and over. This is great because I can jump to the 234th record with just a little math and don't have to worry about the 233 before it.
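A minimal fixed-length record sketch, assuming a hypothetical 32-byte Record (a 4-byte id plus a 28-byte name field) stored in host byte order. Note that the 234th record is index 233 when counting from 0:

```cpp
#include <cstdint>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>   // std::stringstream, for in-memory testing
#include <string>

// Hypothetical record: 4 + 28 = 32 bytes, with no padding on common compilers.
struct Record {
    std::int32_t id;
    char name[28];  // fixed-size text field, NUL-padded
};

// Append one record. Everything is zeroed first so each record is exactly
// sizeof(Record) bytes with deterministic contents.
void writeRecord(std::ostream &out, std::int32_t id, const std::string &name) {
    Record r;
    std::memset(&r, 0, sizeof r);
    r.id = id;
    std::strncpy(r.name, name.c_str(), sizeof r.name - 1);  // truncates long names
    out.write(reinterpret_cast<const char *>(&r), sizeof r);
}

// Jump straight to record `index` (0-based): it starts at index * sizeof(Record),
// so none of the records before it need to be read.
bool readRecord(std::istream &in, std::int64_t index, Record &r) {
    in.clear();
    in.seekg(static_cast<std::streamoff>(index) *
             static_cast<std::streamoff>(sizeof(Record)), std::ios::beg);
    return static_cast<bool>(in.read(reinterpret_cast<char *>(&r), sizeof r));
}
```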
Of course, fixed-length things are sometimes a problem because our data might not be fixed-length. So you can use another trick where you create "fields" and have "field descriptor bytes/blocks" that begin each field and contain the length of the field. It is harder to skip around in this kind of data because I have to read the field descriptors, but to get to the 234th field I only have to read the previous 233 descriptors, so it's a bit like skipping a stone along the water: I only have to touch the file in a few places.
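The "skipping a stone" approach might look like this, assuming a hypothetical layout in which every field starts with a 4-byte length (host byte order) followed by that many payload bytes. readField reads only the descriptors of the earlier fields and seeks past their payloads:

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>   // std::stringstream, for in-memory testing
#include <string>

// Field = 4-byte length descriptor + payload bytes.
void writeField(std::ostream &out, const std::string &payload) {
    const std::uint32_t len = static_cast<std::uint32_t>(payload.size());
    out.write(reinterpret_cast<const char *>(&len), sizeof len);
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
}

// Fetch field `index` (0-based) by touching only the earlier descriptors.
bool readField(std::istream &in, int index, std::string &payload) {
    in.clear();
    in.seekg(0, std::ios::beg);
    std::uint32_t len = 0;
    for (int i = 0; i < index; ++i) {
        if (!in.read(reinterpret_cast<char *>(&len), sizeof len)) return false;
        in.seekg(len, std::ios::cur);  // skip the payload without reading it
    }
    if (!in.read(reinterpret_cast<char *>(&len), sizeof len)) return false;
    payload.resize(len);
    return len == 0 || static_cast<bool>(in.read(&payload[0], len));
}
```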
Or I could use a directory-based format, where I begin (or end) the file with an "index" that tells me where data can be found in the file. This generally requires a little more bookkeeping than other formats but can be very effective.
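A small sketch of the directory idea, using a hypothetical layout with the index at the beginning: a count, then one (offset, length) entry per item, then the payloads. Fetching item k takes one seek to its entry and one seek to its data, no matter how many items come before it. buildIndexedFile, readIndexed and IndexEntry are illustrative names, and integers are stored in host byte order:

```cpp
#include <cstdint>
#include <istream>
#include <sstream>   // std::istringstream, for in-memory testing
#include <string>
#include <vector>

// Layout: [count][entry 0]...[entry count-1][payload 0][payload 1]...
struct IndexEntry {
    std::uint32_t offset;  // where the payload starts, from the beginning of the file
    std::uint32_t length;  // payload size in bytes
};

std::string buildIndexedFile(const std::vector<std::string> &items) {
    const std::uint32_t count = static_cast<std::uint32_t>(items.size());
    std::string file(reinterpret_cast<const char *>(&count), sizeof count);
    std::uint32_t offset =
        static_cast<std::uint32_t>(sizeof count + items.size() * sizeof(IndexEntry));
    for (const std::string &s : items) {
        const IndexEntry e{offset, static_cast<std::uint32_t>(s.size())};
        file.append(reinterpret_cast<const char *>(&e), sizeof e);
        offset += e.length;
    }
    for (const std::string &s : items) file += s;  // payloads follow the directory
    return file;
}

// Fetch item k: seek to its directory entry, then seek to its payload.
bool readIndexed(std::istream &in, std::uint32_t k, std::string &out) {
    in.clear();
    in.seekg(0, std::ios::beg);
    std::uint32_t count = 0;
    if (!in.read(reinterpret_cast<char *>(&count), sizeof count) || k >= count) return false;
    in.seekg(static_cast<std::streamoff>(sizeof count + k * sizeof(IndexEntry)), std::ios::beg);
    IndexEntry e;
    if (!in.read(reinterpret_cast<char *>(&e), sizeof e)) return false;
    in.seekg(static_cast<std::streamoff>(e.offset), std::ios::beg);
    out.resize(e.length);
    return e.length == 0 || static_cast<bool>(in.read(&out[0], e.length));
}
```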
And of course I can use any combination of these techniques I need. For example, PDF files and TIFF images both tend to "embed" resources of other file formats inside. TIFF, for instance, is a tag (field) based format, but its individual tags may hold data in other formats, such as record-based data.
All of the "tricks" that work for binary files can technically be done for text files as well, but they tend to be more of a hassle to implement and usually drastically reduce the readability of the file, and one of the big reasons to prefer text over binary is that you can read and edit text files.
So there are various pros and cons to character-based and binary files. I for one prefer binary files, but then again, when I was a kid I really felt that looking at data in hex was "cool", and I have probably never really outgrown that.
#15
Re: Binary File Format
Posted 19 July 2011 - 02:56 PM
Wait, so when you open a binary file, does the file have to have the .bin extension?
ifstream read("text.bin", ios::binary);
Like that?
This post has been edited by alke4: 19 July 2011 - 02:56 PM