
EOF and reading text files (C++) FAQ: Why you should never use EOF while reading a text file

#1 Bench

Posted 15 December 2009 - 02:15 PM

The Problem
This article addresses the many posts in Dream.In.Code's C and C++ forum in which a poster has doomed their program to weird behaviour under the innocent assumption that testing for "EOF" is a sure way to know that there is no more data to read from their file.
  • What?? E.O.F. stands for 'End Of File', surely that means there's no more data?
Yes, kind of. Sorry if this hurts your head, but 'end of file' in C++ does not mean the same thing as 'no more data'.


Take a look at this short program which works with a file storing 5 names, and more importantly, what happens when 'eof' is used as the breaking condition for reading data into a vector.

names.txt
Peter
Paul
Fred
Tom
Harry


bad.cpp
#include <fstream>
#include <iostream>
#include <algorithm>
#include <iterator> //std::ostream_iterator
#include <string>
#include <vector>
#include <cstdlib> //exit() and EXIT_FAILURE

const char* filename = "C:\\names.txt";

int main()
{
    typedef std::ostream_iterator<std::string> output;
    std::ifstream namefile(filename);
    std::vector<std::string> names;
    std::string input;
    if(!namefile.is_open())
    {
        std::cerr << "Error opening file";
        exit(EXIT_FAILURE);
    }
    while(!namefile.eof())
    {
        namefile >> input;
        names.push_back(input);
    }
    std::cout << "Read " << names.size()
              << " names successfully\n";
    std::copy(names.begin(), names.end(), 
              output(std::cout, "\n"));
    std::cout.flush();
}

output from bad.cpp
Read 6 names successfully
Peter
Paul
Fred
Tom
Harry
Harry
Uh oh! First the program tells you that it has read 6 names when there are only 5 names in the file, and then it goes on to display the last name twice.

What has happened is that the while loop has been repeated 6 times, because it repeats until namefile has its internal "EOF" flag set. Remember that ifstream is not a file, it's a stream, which means that it cannot know whether or not there is any more data in a file until after a failed attempt to read past the end of the file. When it read "Harry" from the file, extraction stopped at the newline which follows the name; the stream had not failed and had not yet hit the end of the file, therefore the EOF flag was still unset. The sixth read then fails and leaves input untouched, so the old value "Harry" is pushed into the vector a second time.

There's an even worse (more subtle) problem with this, based on how the >> operator works; when extracting into a std::string it skips any leading whitespace and then stops reading at the first 'whitespace' character it encounters - a space, a newline, a carriage return or a tab.

The file used in this example has a trailing whitespace character (a newline) after the last name. However, if you modify the file so that its final character is the y of "Harry", the extraction of "Harry" will run into the end of the file instead of a whitespace character. That read still succeeds, but the eof flag is set during that 5th and final read, and the program will "appear" to be working OK.
Obviously this is extremely unreliable. The one thing worse than a bug which you can see is a bug which you can't see! Any user, any other program which modifies this file, or even a part of this program which appends to it may add a newline or space after the final data entry at some later time. If you rely on the lack of a whitespace character, your program works or breaks depending on a detail of the file which nobody is checking.
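To see both behaviours side by side without touching a real file, here is a minimal sketch which runs the same flawed loop over an in-memory string. The function name read_with_eof_loop is just an illustration, and std::istringstream stands in for the ifstream used in bad.cpp.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Runs the flawed "read while not EOF" loop over in-memory text and
// returns everything that the loop pushed into the vector.
std::vector<std::string> read_with_eof_loop(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> names;
    std::string input;
    while (!in.eof())
    {
        in >> input;            // may fail, but the result is stored anyway
        names.push_back(input); // on failure this is the previous value again
    }
    return names;
}
```

Called with "Peter\nPaul\nFred\nTom\nHarry\n" it returns 6 entries with "Harry" duplicated; called with the same text minus the final newline it returns 5.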

Not only this, but there are reasons other than EOF that a stream can fail; for example, a file may have been corrupted with some control characters, or may contain text where the program expects a number, and the stream may fail while attempting to read the invalid data. Once the stream has failed, every further read fails without consuming anything, so EOF is never reached and the program is sent into an infinite loop.
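The infinite loop can be demonstrated safely with a capped loop. This is only a sketch: the function name eof_loop_iterations and the cap parameter are made up for the demonstration, and the cap exists purely so that the example itself cannot hang.

```cpp
#include <sstream>
#include <string>

// Counts how many times a "while not EOF" loop runs over the given text
// when it extracts ints. `cap` is a safety limit (an assumption of this
// sketch) which stops the loop if EOF is never reached.
int eof_loop_iterations(const std::string& text, int cap)
{
    std::istringstream in(text);
    int value = 0;
    int iterations = 0;
    while (!in.eof() && iterations < cap)
    {
        in >> value;   // fails on non-numeric text; all later reads fail too
        ++iterations;
    }
    return iterations;
}
```

With the text "12 34" the loop runs twice and stops. With "12 abc 34" the second read fails on "abc", the eof flag is never set, and the loop only stops because it reaches the cap.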


The Solution
The above example has established that "Read while not EOF" is a terrible, unreliable idiom prone to breaking, and must be avoided. Luckily the solution is extremely simple; the correct idiom could be termed "Read while data exists to be read".

It simply involves moving the expression which reads data from the file into the while condition itself. This may seem a little odd, since namefile >> input doesn't really look like a true/false statement - it isn't! But that doesn't matter, because the 'return' of a >> operation is a reference to the stream from which data is extracted.

"Read while data exists to be read"
while( namefile >> input)
    names.push_back(input); 
Streams are deliberately allowed to evaluate to a bool for exactly this purpose. They use a technique called a conversion operator, which is a form of operator overloading that allows objects to behave like other data types.
In the case of std::istream, the conversion yields false once a read operation has failed for any reason, whether it be invalid data, a nonexistent file, an attempt to read past EOF, or some other reason. (Since C++11 this is an explicit "operator bool"; older standard libraries provide a conversion to void* which has the same effect in a while condition.)
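The difference between "hit the end of the file" and "the read failed" can be inspected directly. The following is a small sketch rather than anything from the original programs: read_state and state_after_reads are hypothetical names, and an in-memory std::istringstream is used in place of a file.

```cpp
#include <sstream>
#include <string>

// Hypothetical record of the stream state after a number of extractions.
struct read_state
{
    bool converted_to_true; // what `while (in >> word)` would test
    bool eof;               // in.eof()
    bool fail;              // in.fail()
};

// Extracts `reads` words from `text` and reports the resulting state.
read_state state_after_reads(const std::string& text, int reads)
{
    std::istringstream in(text);
    std::string word;
    for (int i = 0; i < reads; ++i)
        in >> word;
    read_state state = { static_cast<bool>(in), in.eof(), in.fail() };
    return state;
}
```

After one read of "Harry" (no trailing whitespace) the eof flag is set but the stream still converts to true, because the read itself succeeded. Only after a second read does the stream convert to false. This is why the loop condition should test the read, not the eof flag.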

good.cpp
#include <fstream>
#include <iostream>
#include <algorithm>
#include <iterator> //std::ostream_iterator
#include <string>
#include <vector>

const char* filename = "C:\\names.txt";

int main()
{
    typedef std::ostream_iterator<std::string> output;
    std::ifstream namefile(filename);
    std::vector<std::string> names;
    std::string input;
    while( namefile >> input)
        names.push_back(input);
    std::cout << "Read " << names.size()
              << " names successfully\n";
    std::copy(names.begin(), names.end(), 
              output(std::cout, "\n"));
    std::cout.flush();
} 

Output from good.cpp
Read 5 names successfully
Peter
Paul
Fred
Tom
Harry 
Using the same input file with a trailing space, or even using one without a trailing space, the output is the same - there are no 'overruns'; moreover, the new program is now hardened against other kinds of stream failures.
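Because the idiom only depends on std::istream, it can be wrapped in a function and checked against both versions of the data. This is a sketch with made-up names (read_words and read_words_from_text); the second function exists only so the loop can be tried on a string instead of a file.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// "Read while data exists to be read", for any input stream.
std::vector<std::string> read_words(std::istream& in)
{
    std::vector<std::string> words;
    std::string word;
    while (in >> word)          // false as soon as an extraction fails
        words.push_back(word);
    return words;
}

// Convenience wrapper for trying the loop on in-memory text.
std::vector<std::string> read_words_from_text(const std::string& text)
{
    std::istringstream in(text);
    return read_words(in);
}
```

With or without a trailing newline after "Harry", read_words_from_text returns exactly 5 names.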


This solution with struct and class types
Inevitably, it's more likely that you want to be able to read text files into complicated types which the >> operator doesn't know how to handle on its own. The problem described above still applies, but so can the solution with a little extra work.
The easy remedy is to overload >> so that data can be extracted directly into a struct/class object without the need for long, ugly while statements and a list of temporary variables. The ideal interface ought to allow file reading to look exactly the same as the code above.

Take a common example of a student class (output operator included for testing, though output formatting is beyond the scope of this article)
class student
{
    std::string name;
    int id;
    double fees;
public:
    //Other member functions & constructors
    friend std::ostream& operator<< (std::ostream&, const student&);
    friend std::istream& operator>> (std::istream&, student&);
}; 
The final two lines allow overloaded insertion << and extraction >> operators to work closely with the student class, almost as part of its interface. They declare these operators to be friends of the student class, allowing them unrestricted access to private members, although the operators are not members of the class itself.

The student input file might be a comma-separated format of Name, ID, Fees
Peter Pan,1234,2499.99
Paul Simon,2468,3000.00
Fred Perry,1357,1250.50
Tom Jones,5678,600.00
Harry Hill,9876,2255.00

There are no obviously easy ways to input an entire student at once - the data is comma separated, with one student on each line. The function which reads the file will need to grab a line and parse the comma separated data for each of the 3 student attributes. Luckily, reading from a file is usually fairly reliable - if the format (layout) of that file is known to be consistent throughout, a fairly rigid procedure can be written to parse each line.


Parsing the student file
Looking closely at the data and the student class, the format of a student is
[string] <comma> [int] <comma> [double]

getline() is capable of retrieving data up to a delimiter, so the student's name can be easily retrieved
std::getline(input, s.name, ',');
getline will automatically discard the comma which follows the name data.

This will retrieve 'id' but leave the next comma alone.
input >> s.id;
the next instruction must explicitly discard that comma
input.ignore();


Finally, the fees attribute can be retrieved
input >> s.fees;


There's still one problem - there's a 'newline' character remaining. This needs to be discarded, otherwise the next call to this overloaded operator will encounter this newline while it's trying to read the student name, and the name will begin with it
input.ignore(std::numeric_limits<std::streamsize>::max(), '\n'); 


The final operator>> looks like
std::istream& operator>>(std::istream& input, student& s)
{
    std::getline(input, s.name, ','); //read name
    input >> s.id;                    //read id
    input.ignore();
    input >> s.fees;                  //read fees
    input.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    return input;
} 
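The operator above trusts the file layout completely. As an alternative (not the approach used in this article), each line can be read whole and then parsed from a std::istringstream, so that a malformed line is reported through the stream's fail state. In this sketch student_record is a hypothetical stand-in for the student class with public members, used only to keep the example self-contained.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the article's student class (public members).
struct student_record
{
    std::string name;
    int id = 0;
    double fees = 0.0;
};

std::istream& operator>>(std::istream& input, student_record& s)
{
    std::string line;
    if (!std::getline(input, line))
        return input;               // EOF or error: input is already failed

    std::istringstream fields(line);
    student_record parsed;
    char comma = '\0';
    if (std::getline(fields, parsed.name, ',')      // name up to first comma
        && (fields >> parsed.id >> comma) && comma == ','
        && (fields >> parsed.fees))
        s = parsed;                 // overwrite s only after a complete parse
    else
        input.setstate(std::ios_base::failbit);     // malformed line
    return input;
}
```

The calling code does not change: while(studentfile >> input) still stops at the first record which cannot be read, whether that is the end of the file or a broken line.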



The final solution
Student class with overloaded operators
#include <iomanip>
#include <string>
#include <iostream>
#include <limits>   //std::numeric_limits
#include <sstream>

class student
{
    std::string name;
    int id;
    double fees;
public:
    //Other member functions & constructors
    friend std::ostream& operator<< (std::ostream&, const student&);
    friend std::istream& operator>> (std::istream&, student&);
};
std::ostream& operator<<(std::ostream& out, const student& s)
{
    using namespace std;
    out << setw(20) << left << s.name  << ' ' 
        << setw(7)  << left << s.id    << ' ' 
        << setw(7)  << setprecision(2) << fixed << s.fees;
    return out;
}
std::istream& operator>>(std::istream& input, student& s)
{
    std::getline(input, s.name, ','); //read name
    input >> s.id;                    //read id
    input.ignore();
    input >> s.fees;                  //read fees
    input.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    return input;
} 

main.cpp
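#include <fstream>
#include <iostream>
#include <algorithm>
#include <iterator> //std::ostream_iterator
#include <vector>
//The student class and its operators (above) must also be visible here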

const char* filename = "C:\\students.txt";

int main()
{
    typedef std::ostream_iterator<student> output;
    std::ifstream studentfile(filename);
    std::vector<student> names;
    student input;
    while(studentfile >> input)
        names.push_back(input);
    std::cout << "Read " << names.size()
              << " students successfully\n";
    std::copy(names.begin(), names.end(), 
              output(std::cout, "\n"));
    std::cout.flush();
} 

output from main.cpp
Read 5 students successfully
Peter Pan            1234    2499.99
Paul Simon           2468    3000.00
Fred Perry           1357    1250.50
Tom Jones            5678    600.00
Harry Hill           9876    2255.00


Notice that the final code in main() which populates the vector<student> is almost identical to the simple program which populated the vector<std::string>, even though the data in the file is far less trivial.



Summary
This article has addressed a common 'gotcha' for learners who are toying with streams and file I/O in C++. Overloading the stream insertion and extraction operators provides the user of a class with a clean and idiomatic way to handle I/O. The mess of parsing data is wrapped in an overloaded operator, allowing multiple objects of that class to be easily retrieved from a file. In addition, handling file read errors does not need to be a concern of the code which handles data parsing.


Replies To: EOF and reading text files (C++)

#2 Anarion

Posted 13 February 2010 - 02:08 AM

Indeed a complete reference to be used in the forum. Thanks for writing it! :^:
Many people have this issue with files (including myself till a couple months ago)

#3 Delta_Echo

Posted 01 March 2010 - 05:49 AM

Thanks for writing. It will be very useful for future reference!
Now.. we need to make every new member in the C++ board read this... hmm...
Kudos to you.

#4 jcmaster2

Posted 01 March 2010 - 10:54 AM

Unfortunately, Most modern C++ texts...still use

while(!inout.eof()) for testing...

So, you will likely see people with these problems...

There are some that are using the format

while(input >> variable) format as you mention...
And some book are inconsistent with usage..

Good tutorial though...

#5 friends26

Posted 05 March 2010 - 06:13 AM

thanks about the information!.
but lucky me cuz i never experienced any error like
that about EOF.
but thanks you so much still!. :)

#6 PlasticineGuy

Posted 05 March 2010 - 09:05 PM

You will eventually.

#7 r.stiltskin

Posted 30 March 2010 - 09:09 AM

Great tutorial! I was getting tired of explaining to people why their loops controlled by if (!file.eof() ) didn't work and had it in mind to write a tutorial about it ... and here it is already.

Just one minor suggestion:

Your examples bad.cpp, good.cpp and main.cpp won't compile on Linux systems under GCC-3.x or GCC-4.x as they now stand. Apparently <iterator> (needed for ostream_iterator) is included by one of the other headers in MSDN, but not in GCC, so to be portable these programs should have
#include <iterator>
added to them.

#8 Guest_sandy*

Posted 26 May 2010 - 03:20 PM

r.stiltskin, on 30 March 2010 - 08:09 AM, said:

Great tutorial! I was getting tired of explaining to people why their loops controlled by if (!file.eof() ) didn't work and had it in mind to write a tutorial about it ... and here it is already.

Just one minor suggestion:

Your examples bad.cpp, good.cpp and main.cpp won't compile on Linux systems under GCC-3.x or GCC-4.x as they now stand. Apparently <iterator> (needed for ostream_iterator) is included by one of the other headers in MSDN, but not in GCC, so to be portable these programs should have
#include <iterator>
added to them.


thanks for editing it. but we can't use vectors

#9 Bench

Posted 12 June 2010 - 12:46 AM

r.stiltskin, on 30 March 2010 - 04:09 PM, said:

Just one minor suggestion:

Your examples bad.cpp, good.cpp and main.cpp won't compile on Linux systems under GCC-3.x or GCC-4.x as they now stand. Apparently <iterator> (needed for ostream_iterator) is included by one of the other headers in MSDN, but not in GCC, so to be portable these programs should have
#include <iterator>
added to them.

Nice find! I hope a moderator can edit the examples and add this. Annoyingly, it seems that the comeau try-it-out online compiler (which usually picks up discrepancies like this between MS and the standard) also compiles happily without including <iterator>

#10 Kutlar

Posted 03 September 2010 - 02:13 PM

Very good stuff!

Glad I saw this one, as the last section in my C++ class was on file i/o and the professor taught it
with the standard while(!filename.eof()) pitfall. Thanks!

#11 JackOfAllTrades

Posted 24 December 2010 - 07:21 AM

For similar functionality in C, please see this blog entry.

#12 eng. elaph al_abasy

Posted 12 March 2011 - 09:04 AM

hello
I was using the old way of reading a file, but I do not use operator >>. I am dealing with binary data larger than one byte (int, double, ...) and I use the read and write functions.
I am wondering how I can use your solution when writing to file.txt, because the repeated last value causes problems in the system I am trying to implement, especially when I try to plot the output of my system.
This is how I write the data to the file with write:
out.write(reinterpret_cast<char*>(&cos),sizeof (int));


to read the value from file
in.read(reinterpret_cast<char*>(&cos),sizeof (int));


#13 r.stiltskin

Posted 12 March 2011 - 09:41 AM

If I understand you correctly, you can read from the file like this:
    while(in.read(reinterpret_cast<char*>(&cos),sizeof (int))) {
        // code to process "cos"
    }


which will continue to enter the loop as long as there is data in the file.

If that isn't what you wanted, please try to clarify your question.

#14 eng. elaph al_abasy

Posted 12 March 2011 - 10:45 AM

r.stiltskin, on 12 March 2011 - 09:41 AM, said:

If I understand you correctly, you can read from the file like this:
    while(in.read(reinterpret_cast<char*>(&cos),sizeof (int))) {
        // code to process "cos"
    }


which will continue to enter the loop as long as there is data in the file.

If that isn't what you wanted, please try to clarify your question.

I am trying to implement a digital communication system. Like every such system it has a number of blocks; I write the output of each block to a file and read the input of each block from stored values. Right now I am building an NCO (numerically controlled oscillator): when I input a numerical value, it generates a cosine wave with a certain frequency according to that value.
So beforehand I created a cosine wave with 8192 points and stored the points in a file.
Now, when the numerical value is entered, the NCO calculates the new frequency and the number of points needed to generate a cosine wave with that frequency, then opens the file where the 8192 points are stored and reads the wanted points using the sequence read, seekg, write:
in.read(reinterpret_cast<char*>(&cos),sizeof (int));
in.seekg(((s-1)*4),ios::cur);
out.write(reinterpret_cast<char*>(&cos),sizeof (int));


read gets the wanted point from the 8192-point file, seekg sets the file pointer to the next wanted point, and write writes the point to the output file.
This is inside the loop:
 while (!in.eof())
{
////read ,seekg,write///
}  


As you know, the last value is written twice, and when I plot the wave I get a wrong value in my system. So how can I use the solution in this case?
I hope you now have the complete picture of my problem.
Regards

This post has been edited by eng. elaph al_abasy: 12 March 2011 - 10:50 AM


#15 r.stiltskin

Posted 12 March 2011 - 11:06 AM

My answer is still basically the same:
while(in.read(reinterpret_cast<char*>(&cos),sizeof (int))) {
    out.write(reinterpret_cast<char*>(&cos),sizeof (int));
}



But that doesn't make sense to me. You say that you are generating a new wave, so why are you reading the data of the old wave and writing the same data to the output file?

I also don't understand what you are doing with seekg. After each 4-byte read, the file pointer automatically advances 4 bytes. Isn't your data contiguous? Did you write something else in between the wave points that you want to skip over? What is s? Why are you moving the pointer?

edit: If you have a reason to skip some of the points from the input file, that's fine.
while(in.read(reinterpret_cast<char*>(&cos),sizeof (int))) will read and enter the loop only as long as there is more data in the file. It won't duplicate the last entry.

This post has been edited by r.stiltskin: 12 March 2011 - 11:25 AM

