What is the fastest way to find a value in a file c++

  • (4 Pages)
  • +
  • 1
  • 2
  • 3
  • Last »

50 Replies - 1902 Views - Last Post: 01 September 2012 - 04:49 PM Rate Topic: ***** 1 Votes

#1 jtreiner  Icon User is offline

  • D.I.C Head

Reputation: -1
  • View blog
  • Posts: 53
  • Joined: 20-February 12

What is the fastest way to find a value in a file c++

Posted 28 August 2012 - 05:43 PM

What is the fastest way to find a value in a file. For example i put a bunch of random numbers into a file using fstream. lets say those numbers are: 7, 19, 84, 33, and 76. I want to find the location of the number 84 in the file. What is the best way i could do that. I am working with BIG files and i need a fast approach so my program doesn't slow down.
Thanks guys
Is This A Good Question/Topic? 0
  • +

Replies To: What is the fastest way to find a value in a file c++

#2 Skydiver  Icon User is offline

  • Code herder
  • member icon

Reputation: 3570
  • View blog
  • Posts: 11,099
  • Joined: 05-May 12

Re: What is the fastest way to find a value in a file c++

Posted 28 August 2012 - 06:19 PM

Does your operating system support asynchronous I/O and threading?

Or even better, does your operating system support memory mapped files and threading?
Was This Post Helpful? 1
  • +
  • -

#3 jjl  Icon User is offline

  • Engineer
  • member icon

Reputation: 1074
  • View blog
  • Posts: 4,533
  • Joined: 09-June 09

Re: What is the fastest way to find a value in a file c++

Posted 28 August 2012 - 09:14 PM

How frequently are you searching the file? If you are infrequently searching, then a basic linear search may be in your best interest. If you are continuously searching the file, you may want to look into memory mapping (as stated above)

Here's a link on boost memory mapping
http://www.boost.org...ses.mapped_file
Was This Post Helpful? 1
  • +
  • -

#4 IngeniousHax  Icon User is offline

  • |>|20-514<|{3|2

Reputation: 78
  • View blog
  • Posts: 1,358
  • Joined: 28-March 09

Re: What is the fastest way to find a value in a file c++

Posted 29 August 2012 - 11:45 AM

A possible solution, once you have figured out your speed optimizing you can use tellp(), or tellg(). You'll have to read up on them to actually find out how to manipulate them and the data you need to.
Was This Post Helpful? 1
  • +
  • -

#5 jtreiner  Icon User is offline

  • D.I.C Head

Reputation: -1
  • View blog
  • Posts: 53
  • Joined: 20-February 12

Re: What is the fastest way to find a value in a file c++

Posted 29 August 2012 - 06:58 PM

First thing I want a cross platform solution. I am going to need to search around every one or two seconds. and i am going to need to find multiple values. I am also constantly going to be uploading new values to the file also.

Can someone please thumbs up me so im not negative one. thanks :bananaman:
Was This Post Helpful? 0
  • +
  • -

#6 Skydiver  Icon User is offline

  • Code herder
  • member icon

Reputation: 3570
  • View blog
  • Posts: 11,099
  • Joined: 05-May 12

Re: What is the fastest way to find a value in a file c++

Posted 29 August 2012 - 07:41 PM

Boost will give you the cross platform support for memory mapped files. And I believe that Boost will also give you cross platform threading support as well, although any compiler/runtime library that is C++11 compliant should also give you threads: http://en.cppreferen...om/w/cpp/thread

Are duplicate values allowed? And if so, do all duplicates need to be reported, or just a subset of the duplicates?

Will the thing that is adding new values to the file be running as a separate process, or will it be running within the same program that is doing the searching?

Will uploading new values always be at the end of the file? Or can new values be inserted in the middle and "shift" the contents of the file around?

Basically, I'm asking because a lot of speed increases can be gained by searching multiple sections of the file in parallel (hence the threading), and by using memory mapped files (to get rid of the overhead of two memory copies from to kernel space, and then kernel space to your program's user space). Some of those advantages will erode away if there needs to be some kind of locking happening whenever an upload inserts into the middle. I don't think locking is needed for appends to the end.

An alternative to parallel processing and memory mapped files is building a conventional B-tree or B+ tree over the values found in the file. Since the keys in the tree are the values you are looking for anyway, there's no real need to seek through the file anymore after the first pass of building the tree. Inserting new values (either to the end of the file or in the middle) is simply a matter of updating the tree.
Was This Post Helpful? 1
  • +
  • -

#7 jtreiner  Icon User is offline

  • D.I.C Head

Reputation: -1
  • View blog
  • Posts: 53
  • Joined: 20-February 12

Re: What is the fastest way to find a value in a file c++

Posted 29 August 2012 - 08:09 PM

View PostSkydiver, on 29 August 2012 - 07:41 PM, said:

Boost will give you the cross platform support for memory mapped files. And I believe that Boost will also give you cross platform threading support as well, although any compiler/runtime library that is C++11 compliant should also give you threads: http://en.cppreferen...om/w/cpp/thread

Are duplicate values allowed? And if so, do all duplicates need to be reported, or just a subset of the duplicates?

Will the thing that is adding new values to the file be running as a separate process, or will it be running within the same program that is doing the searching?

Will uploading new values always be at the end of the file? Or can new values be inserted in the middle and "shift" the contents of the file around?

Basically, I'm asking because a lot of speed increases can be gained by searching multiple sections of the file in parallel (hence the threading), and by using memory mapped files (to get rid of the overhead of two memory copies from to kernel space, and then kernel space to your program's user space). Some of those advantages will erode away if there needs to be some kind of locking happening whenever an upload inserts into the middle. I don't think locking is needed for appends to the end.

An alternative to parallel processing and memory mapped files is building a conventional B-tree or B+ tree over the values found in the file. Since the keys in the tree are the values you are looking for anyway, there's no real need to seek through the file anymore after the first pass of building the tree. Inserting new values (either to the end of the file or in the middle) is simply a matter of updating the tree.


Are duplicate values allowed? And if so, do all duplicates need to be reported, or just a subset of the duplicates?
no, unless I can can some how remove the duplicates after.

Will the thing that is adding new values to the file be running as a separate process, or will it be running within the same program that is doing the searching?
they are not a separate program but are two different functions. The searching function will run every second (or some other fixed amount of time) and the "adding value" function will only run under certain circumstances.

For now the values in the file are in no specific order but if there is a way to sort (or save) the value in some order (such as least to greatest) that would be great.
Was This Post Helpful? 0
  • +
  • -

#8 jjl  Icon User is offline

  • Engineer
  • member icon

Reputation: 1074
  • View blog
  • Posts: 4,533
  • Joined: 09-June 09

Re: What is the fastest way to find a value in a file c++

Posted 29 August 2012 - 10:28 PM

If sorting isn't an issue, this seems to be the best route to take. Once the file is sorted into a buffer, you could perform a binary search. You probably arn't gonna beat O(log(n)) time complexity unless you can load your file into a hash table ( which would be constant time complexity lookups)

This post has been edited by jjl: 29 August 2012 - 10:29 PM

Was This Post Helpful? 1
  • +
  • -

#9 Skydiver  Icon User is offline

  • Code herder
  • member icon

Reputation: 3570
  • View blog
  • Posts: 11,099
  • Joined: 05-May 12

Re: What is the fastest way to find a value in a file c++

Posted 29 August 2012 - 11:51 PM

Let's say you do sort the file and store it, unless you load the entire 2MB file into memory, binary search on file is going to be slow, specially if you are not storing fixed size records.

Assuming we do the traditional binary search, we do in pseudo code:
bool BinarySearchFile(File file, int value, long lowSeek, long highSeek)
{
    while(lowSeek < highSeek)
    {
        // Compute mid-point
        midSeek = (highSeek - lowSeek) / 2;

        // Try to find the beginning of the value
        // stored near the midpoint
        while (!IsAtStartOfAValue(file, midSeek))
        {
            --midSeek;
            file.Seek(midSeek);
        }

        // Read in the value stored near the midpoint
        midValue = file.ReadValue();

        // Quit now if we found the value
        if (value == midValue)
            return true;

        // Otherwise, search the halves before or after the midpoint
        if (value < midValue)
            highSeek = midSeek;
        else
            lowSeek = file.Position;
    }
    return false;
}

bool IsAtStartOfAValue(File file, long location)
{
    // magic code that returns true if the file cursor
    // is in the start of a value.
    // Remember to seek back to location to restore file
    // pointer to previous state.
}



We'll be doing a lot of seeking and reading to test if we are at a the middle of a value. Unless the file is buffered(an assumption that we aren't supposed to make in C/C++), then each seek and read is going to be an expensive operation. Assuming average record length of 8 bytes, I'm estimating an average of 8 seeks and 4 reads to find the start of a value to read.

And then later when we add a new value, 2MB worth of file has to be re-written. I'm assuming that we'll be doing the equivalent of an insertion sort to insert the value by scanning and copying the file to a temporary file until we get to a value greater than the new value. We append the new value to the temporary file, and then copy the rest of the original file to the temporary file. We then delete the original file, and rename the temporary file to be the original file.

2MB isn't that much to hold in memory on a modern computer unless your program is really memory constrained. I think that one simple approach is not to search the file, but to search memory. Load the file into memory, perform a sort, and then from then on do binary searches in memory. When a new value is added, append to the file, and to the memory copy. Re-sort memory, and then go along merrily.

Is the value range of the values meant to go into the file well known? e.g. never less than 0 and never greater than MAX_INT. If so, a hashtable kept in memory as suggested earlier would probably be even faster than doing a binary search. The hashtable can even be a simple bit array where each bit corresponds to the value. A bit is set to 1 if the value is present.
Was This Post Helpful? 1
  • +
  • -

#10 jtreiner  Icon User is offline

  • D.I.C Head

Reputation: -1
  • View blog
  • Posts: 53
  • Joined: 20-February 12

Re: What is the fastest way to find a value in a file c++

Posted 30 August 2012 - 06:24 PM

I don't need to just search one value but a few hundred. I will then put those values into the memory (float variables). Those float variables will constantly be changed with other values from the file.

My big problem is that I have to load a few hundred values a second out of several thousand (or a LOT more) values. I need a fast way of doing this. Boost might be a possible way of doing this but I have never used Boost before. If anyone can give an example of how I would do this with Boost that would be great.

Edit:
The values do go below 0. I was thinking that maybe i would use two different functions. one for negatives and one for positives.
Was This Post Helpful? 0
  • +
  • -

#11 Skydiver  Icon User is offline

  • Code herder
  • member icon

Reputation: 3570
  • View blog
  • Posts: 11,099
  • Joined: 05-May 12

Re: What is the fastest way to find a value in a file c++

Posted 30 August 2012 - 07:57 PM

Great... now that's even more fun... How do you consider two floating point numbers to be "equal" ? Is there a number of significant digits that must match?

#include <iostream>
#include <iomanip>

using namespace std;

int main()
{
    float x = 0.2f;

    cout << setprecision(10) << x << endl;
}


Was This Post Helpful? 1
  • +
  • -

#12 jtreiner  Icon User is offline

  • D.I.C Head

Reputation: -1
  • View blog
  • Posts: 53
  • Joined: 20-February 12

Re: What is the fastest way to find a value in a file c++

Posted 30 August 2012 - 08:10 PM

For the moment I would like the program to be able to find all float variables between number x and numbers y then upload them to the memory. I would also like the program to be able to find where the value was in the file.
Was This Post Helpful? 0
  • +
  • -

#13 Oler1s  Icon User is offline

  • D.I.C Lover
  • member icon

Reputation: 1395
  • View blog
  • Posts: 3,884
  • Joined: 04-June 09

Re: What is the fastest way to find a value in a file c++

Posted 30 August 2012 - 08:30 PM

You need to stop thinking about the code and go back to the requirements and the solution required. Your requirements have managed to vary across the course of this thread.

What is the ultimate problem you are trying to solve??
Was This Post Helpful? 0
  • +
  • -

#14 Skydiver  Icon User is offline

  • Code herder
  • member icon

Reputation: 3570
  • View blog
  • Posts: 11,099
  • Joined: 05-May 12

Re: What is the fastest way to find a value in a file c++

Posted 30 August 2012 - 08:32 PM

Yes. That makes sense.

But understand that you can't do exact equality comparisons with the built in floats and doubles in C or C++. If ran the code above, you would have notice that the computer actually stores the number 0.2 as 0.20000000298023224... (Change the setprecision to something bigger than 10.)

So two simple choices can be made:
1) Your values are never stored as floating point numbers, but actually stored as strings. Some specially logic will have to be applied to determine that 5 == 5.0 == 5.00 == 5.000 == ... .
2) Your values are stored as floating point numbers, but comparisons done with some kind of epsilon. For example 5 == 4.999 == 5.001 if epsilon is 0.001.

Using Boost is just like using the standard C++ library. You just include a different set of headers and link to a different library.
Was This Post Helpful? 1
  • +
  • -

#15 jtreiner  Icon User is offline

  • D.I.C Head

Reputation: -1
  • View blog
  • Posts: 53
  • Joined: 20-February 12

Re: What is the fastest way to find a value in a file c++

Posted 31 August 2012 - 08:11 PM

I need A fast solution to find all float variables between two numbers (x and y) in a file. The file will be big so it has to be fast.

For the float variables they will probably not be any more then one digit after the (.).
Was This Post Helpful? 0
  • +
  • -

  • (4 Pages)
  • +
  • 1
  • 2
  • 3
  • Last »