This article is intended to address the plethora of posts to Dream.In.Code's C and C++ forum where a poster has doomed their program to weird behaviour under the innocent assumption that using "EOF" is a sure way to know that there's no more data to read from their file.
- What?? E.O.F. stands for 'End Of File', surely that means there's no more data?
Take a look at this short program which works with a file storing 5 names, and more importantly, what happens when 'eof' is used as the breaking condition for reading data into a vector.
names.txt
Peter Paul Fred Tom Harry
bad.cpp
#include <fstream>
#include <iostream>
#include <algorithm>
#include <iterator> //std::ostream_iterator
#include <string>
#include <vector>
#include <cstdlib> //exit() and EXIT_FAILURE
const char* filename = "C:\\names.txt";
int main()
{
typedef std::ostream_iterator<std::string> output;
std::ifstream namefile(filename);
std::vector<std::string> names;
std::string input;
if(!namefile.is_open())
{
std::cerr << "Error opening file";
std::exit(EXIT_FAILURE);
}
while(!namefile.eof())
{
namefile >> input;
names.push_back(input);
}
std::cout << "Read " << names.size()
<< " names successfully\n";
std::copy(names.begin(), names.end(),
output(std::cout, "\n"));
std::cout.flush();
}
Output from bad.cpp
Read 6 names successfully
Peter
Paul
Fred
Tom
Harry
Harry
Uh oh! First the program tells you that it's read 6 names when there are only 5 names in the file, and then it goes on to display the last name twice.
What has happened is that the while loop has been repeated 6 times, because it is repeating until namefile has its internal "EOF" flag set. Remember that ifstream is not a file, it's a stream, which means that it cannot know whether or not there is any more data in a file until after a failed attempt to read past the end of the file. After it read "Harry" from the file, the stream had not failed, therefore the EOF flag was still unset. The 6th read then failed and left input untouched, which is why the old value "Harry" was pushed into the vector a second time.
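To watch the flags change, here is a minimal sketch which runs the same flawed loop and records the stream state after every attempted read. It assumes a std::istringstream standing in for the file so that it needs nothing on disk; read_trace and trace_eof_loop are names made up for this illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One row per pass through the loop body.
struct read_trace
{
    std::string value; // contents of the variable after the read attempt
    bool eof;          // stream.eof() after the read attempt
    bool fail;         // stream.fail() after the read attempt
};

// Hypothetical helper: runs the flawed "while not EOF" loop over some text
// and records what the stream looked like after each attempted read.
std::vector<read_trace> trace_eof_loop(const std::string& contents)
{
    std::istringstream stream(contents); // stands in for the ifstream
    std::vector<read_trace> trace;
    std::string input;
    while (!stream.eof())
    {
        stream >> input;
        read_trace row = { input, stream.eof(), stream.fail() };
        trace.push_back(row);
    }
    return trace;
}
```

Called with "Peter Paul Fred Tom Harry\n", the trace should hold 6 rows: the 5th row has "Harry" with neither flag set, and the 6th row still holds "Harry" but with both eof and fail set.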
There's an even worse (more subtle) problem with this, based on how the >> operator works; it will always stop reading at the first 'whitespace' character it encounters - a space, a newline, a carriage return or a tab.
The file used in this example has a trailing whitespace character after the last name. However, if you modify the file so that the final character is the y from "Harry", the stream will reach the end of the file during the attempt to read "Harry", since it will not encounter any whitespace. The read itself still succeeds, but the eof flag will be set during that 5th and final read, and the program will "appear" to be working OK.
Obviously this is extremely unreliable. The one thing worse than a bug which you can see is a bug which you can't see! Any user, or other program which modifies this file, or even a part of this program which appends to this file may indiscriminately add a newline or space after the final data entry some other time, which means that if you rely on the lack of a whitespace character, whether your program works depends entirely on how the file happens to end.
Not only this, but there are reasons other than EOF that a stream can fail; for example, a file may have been corrupted with some control characters, and the stream may fail while attempting to read some invalid data. This will send the program into an infinite loop, since a failed stream refuses every further read and so EOF will never be reached.
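The dependence on the last character can be made concrete with a small sketch. It assumes a std::istringstream in place of the file, and eof_loop_passes is a made-up helper which simply counts how often the body of the flawed loop runs.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: counts how many times the body of the flawed
// "while not EOF" loop runs for the given file contents.
std::size_t eof_loop_passes(const std::string& contents)
{
    std::istringstream stream(contents); // stands in for the ifstream
    std::string input;
    std::size_t passes = 0;
    while (!stream.eof())
    {
        stream >> input; // may or may not actually read anything
        ++passes;
    }
    return passes;
}
```

With a trailing newline after "Harry" the function should report 6 passes for 5 names; without it, 5 passes. An empty string still gives 1 pass, because the EOF flag cannot be set before the first read has been attempted.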
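As an illustration of the infinite loop, the sketch below swaps the names for integers (an assumption made purely so that "invalid data" is easy to produce) and caps the number of passes so that the demonstration itself terminates. The helper name and the cap parameter are hypothetical.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: runs the flawed loop over integers, but gives up
// after max_passes so that the demonstration cannot hang.
// Returns the number of passes actually made.
std::size_t eof_loop_passes_capped(const std::string& contents,
                                   std::size_t max_passes)
{
    std::istringstream stream(contents); // stands in for the ifstream
    int number = 0;
    std::size_t passes = 0;
    while (!stream.eof() && passes < max_passes)
    {
        stream >> number; // fails on "x", and on every attempt after that
        ++passes;
    }
    return passes;
}
```

For the contents "1 2 x 3\n" the loop should only stop because of the cap: once the read of "x" has failed, the stream never advances and the EOF flag is never set. For well-formed contents such as "1 2 3" it stops after 3 passes.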
The Solution
The above example has established that "Read while not EOF" is a terrible, unreliable idiom prone to breaking, and must be avoided. Luckily the solution is extremely simple; the correct idiom could be termed "Read while data exists to be read".
It simply involves moving the expression which reads data from the file into the while condition itself. This may seem a little odd, since namefile >> input doesn't really look like a true/false statement - it isn't! But that doesn't matter, because the 'return' of a >> operation is a reference to the stream from which data is extracted.
"Read while data exists to be read"
while( namefile >> input)
names.push_back(input);
Streams are deliberately allowed to be tested as a bool for exactly this purpose. They use a technique called a conversion operator, which is a form of operator overloading that allows objects to behave like other data types. In the case of std::istream, an "operator bool" exists (explicit since C++11; older standards provide operator void* to the same effect), and will yield false if a read operation fails for any reason, whether it be invalid data, a nonexistent file, EOF, or some other reason.
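A short sketch of "false for any reason" follows. The helper one_read_succeeds is made up for this illustration; it attempts a single extraction and tests the stream exactly as a while condition would.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: attempts one extraction of a T and reports whether
// the stream still tests as true afterwards.
template <typename T>
bool one_read_succeeds(std::istream& stream)
{
    T value = T();
    if (stream >> value) // the same test that while(stream >> value) makes
        return true;
    return false;
}
```

Under these assumptions, one_read_succeeds<std::string> on an empty std::istringstream is false (EOF), one_read_succeeds<int> on a stream containing "abc" is false (invalid data), and the same call on a std::ifstream constructed with a path that does not exist is also false. A stream containing just "42" yields true for int, even though that read reaches the end of the data.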
good.cpp
#include <fstream>
#include <iostream>
#include <algorithm>
#include <iterator> //std::ostream_iterator
#include <string>
#include <vector>
const char* filename = "C:\\names.txt";
int main()
{
typedef std::ostream_iterator<std::string> output;
std::ifstream namefile(filename);
std::vector<std::string> names;
std::string input;
while( namefile >> input)
names.push_back(input);
std::cout << "Read " << names.size()
<< " names successfully\n";
std::copy(names.begin(), names.end(),
output(std::cout, "\n"));
std::cout.flush();
}
Output from good.cpp
Read 5 names successfully
Peter
Paul
Fred
Tom
Harry
Using the same input file with a trailing space, or even using one without a trailing space, the output is the same - there are no 'overruns'; moreover, the new program is now hardened against other kinds of stream failures.
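The claim that the ending of the file no longer matters can be checked with a small sketch. It assumes a std::istringstream in place of the file, and read_words is a made-up wrapper around the corrected loop.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: the corrected loop, wrapped in a function so that
// different file endings can be compared.
std::vector<std::string> read_words(const std::string& contents)
{
    std::istringstream stream(contents); // stands in for the ifstream
    std::vector<std::string> words;
    std::string input;
    while (stream >> input) // the body only runs after a successful read
        words.push_back(input);
    return words;
}
```

Both "Peter Paul Fred Tom Harry\n" and "Peter Paul Fred Tom Harry" should produce exactly 5 words, and an empty string produces none.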
This solution with struct and class types
Inevitably, it's more likely that you want to be able to read text files into complicated types which the >> operator doesn't know how to handle on its own. The problem described above still applies, but so can the solution with a little extra work.
The easy remedy is to overload >> so that data can be extracted directly into a struct/class object without the need for long, ugly while statements and a list of temporary variables. The ideal interface ought to allow file reading to look exactly the same as the code above.
Take a common example of a student class (output operator included for testing, though output formatting is beyond the scope of this article):
class student
{
std::string name;
int id;
double fees;
public:
//Other member functions & constructors
friend std::ostream& operator<< (std::ostream&, const student&);
friend std::istream& operator>> (std::istream&, student&);
};
The final two lines allow the overloaded insertion << and extraction >> operators to work closely with the student class, almost as part of its interface. They declare these operators to be friends of the student class, allowing them unrestricted access to private members, although the operators are not members of the class itself.
The student input file might be a comma-separated format of Name, ID, Fees:
Peter Pan,1234,2499.99
Paul Simon,2468,3000.00
Fred Perry,1357,1250.50
Tom Jones,5678,600.00
Harry Hill,9876,2255.00
There are no obviously easy ways to input an entire student at once - the data is comma separated, with one student on each line. The function which reads the file will need to grab a line and parse the comma separated data for each of the 3 student attributes. Luckily, reading from a file is usually fairly reliable - if the format (layout) of that file is known to be consistent throughout, a fairly rigid procedure can be written to parse each line.
Parsing the student file
Looking closely at the data and the student class, the format of a student is
[string] <comma> [int] <comma> [double]
getline() is capable of retrieving data up to a delimiter, so the student's name can be easily retrieved:
std::getline(input, s.name, ',');
getline will automatically discard the comma which follows the name data.
The next instruction will retrieve 'id' but leave the next comma alone:
input >> s.id;
The next instruction must explicitly discard that comma:
input.ignore();
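The difference between getline (which swallows its delimiter) and >> (which leaves the following comma in the stream) can be seen in this sketch. It assumes a std::istringstream holding a single record; remainder_after_id is a made-up helper which returns whatever is left on the line after the id has been read.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: reads "name,id" from the start of a record and
// returns the unread remainder of that line.
std::string remainder_after_id(const std::string& record, bool discard_comma)
{
    std::istringstream input(record);
    std::string name;
    int id = 0;
    std::getline(input, name, ','); // consumes the name AND its comma
    input >> id;                    // consumes the digits only
    if (discard_comma)
        input.ignore();             // skips exactly one character
    std::string rest;
    std::getline(input, rest);      // whatever is left on the line
    return rest;
}
```

For the record "Peter Pan,1234,2499.99" the remainder should be ",2499.99" when the comma is not discarded and "2499.99" when it is.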
Finally, the fees attribute can be retrieved
input >> s.fees;
There's still one problem - there's a 'newline' character remaining. This needs to be discarded, otherwise the next call to this overloaded operator will encounter this newline while it's trying to read the student name:
input.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
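To see what the leftover newline does, the following sketch reads two records with and without that final ignore(). It assumes a std::istringstream in place of the file and plain local variables in place of the student members; second_name is a made-up helper.

```cpp
#include <ios>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: reads two records and returns the second name.
// When discard_rest_of_line is false, the final ignore() is left out.
std::string second_name(const std::string& contents, bool discard_rest_of_line)
{
    std::istringstream input(contents);
    std::string name;
    int id = 0;
    double fees = 0.0;
    for (int record = 0; record < 2; ++record)
    {
        std::getline(input, name, ','); // does not skip leading whitespace
        input >> id;
        input.ignore();
        input >> fees;
        if (discard_rest_of_line)
            input.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    return name;
}
```

Given "Peter Pan,1234,2499.99\nPaul Simon,2468,3000.00\n", the second name should come back as "\nPaul Simon" without the ignore() and as "Paul Simon" with it.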
The final operator>> looks like
std::istream& operator>>(std::istream& input, student& s)
{
std::getline(input, s.name, ','); //read name
input >> s.id; //read id
input.ignore();
input >> s.fees; //read fees
input.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
return input;
}
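As a variation (not the version used in this article), the same operator could read a whole line first and then parse that line from a std::istringstream, so that one malformed line cannot spill over into the next record. This is only a sketch: student_record is a made-up stand-in with public members so that the example is self-contained, and the decision to set failbit on a malformed line is an assumption about the desired behaviour.

```cpp
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the student class, with public members so
// that the sketch compiles on its own.
struct student_record
{
    std::string name;
    int id;
    double fees;
};

std::istream& operator>>(std::istream& input, student_record& s)
{
    std::string line;
    if (!std::getline(input, line)) // no more lines: input has already failed
        return input;

    std::istringstream fields(line);
    student_record parsed = student_record();
    char comma = '\0';
    if (std::getline(fields, parsed.name, ',')
        && (fields >> parsed.id >> comma) && comma == ','
        && (fields >> parsed.fees))
        s = parsed;                        // only overwrite s on success
    else
        input.setstate(std::ios::failbit); // report the malformed line
    return input;
}
```

The calling loop stays the same, while(file >> record), and ends either at the end of the file or at the first line which does not match the format. Note that a blank line counts as malformed in this sketch.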
The final solution
Student class with overloaded operators
#include <iomanip>
#include <string>
#include <iostream>
#include <sstream>
#include <limits> //std::numeric_limits
class student
{
std::string name;
int id;
double fees;
public:
//Other member functions & constructors
friend std::ostream& operator<< (std::ostream&, const student&);
friend std::istream& operator>> (std::istream&, student&);
};
std::ostream& operator<<(std::ostream& out, const student& s)
{
using namespace std;
out << setw(20) << left << s.name << ' '
<< setw(7) << left << s.id << ' '
<< setw(7) << setprecision(2) << fixed << s.fees;
return out;
}
std::istream& operator>>(std::istream& input, student& s)
{
std::getline(input, s.name, ','); //read name
input >> s.id; //read id
input.ignore();
input >> s.fees; //read fees
input.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
return input;
}
main.cpp
#include <fstream>
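#include <algorithm>
#include <iterator> //std::ostream_iterator
//the student class and its operators shown above must also be visible here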
#include <vector>
const char* filename = "C:\\students.txt";
int main()
{
typedef std::ostream_iterator<student> output;
std::ifstream studentfile(filename);
std::vector<student> names;
student input;
while(studentfile >> input)
names.push_back(input);
std::cout << "Read " << names.size()
<< " students successfully\n";
std::copy(names.begin(), names.end(),
output(std::cout, "\n"));
std::cout.flush();
}
Output from main.cpp
Read 5 students successfully
Peter Pan            1234    2499.99
Paul Simon           2468    3000.00
Fred Perry           1357    1250.50
Tom Jones            5678    600.00
Harry Hill           9876    2255.00
Notice that the final code in main() which populates the vector<student> is almost identical to the simple program which populated the vector<std::string>, even though the data in the file is far less trivial.
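Because the loop only depends on >> being available for the type, it can be written once as a template. The sketch below is an illustration rather than part of the programs above; read_all is a made-up name, and it assumes that T can be default-constructed and copied.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads every T which can be extracted from a stream,
// using the "read while data exists to be read" idiom.
template <typename T>
std::vector<T> read_all(std::istream& stream)
{
    std::vector<T> items;
    T item = T();
    while (stream >> item)
        items.push_back(item);
    return items;
}
```

The same function should serve read_all<std::string>(namefile) and, given the overloaded operator>>, read_all<student>(studentfile).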
Summary
This article has addressed a common 'gotcha' for learners who are toying with streams and file I/O in C++. Overloading the stream insertion and extraction operators provides the user of a class with a clean and idiomatic way to handle I/O. The mess of parsing data is wrapped in an overloaded operator, allowing multiple objects of that class to be easily retrieved from a file. In addition, handling file read errors does not need to be a concern of the code which handles data parsing.