Preface
The term container is a fairly general-purpose term, used to describe a feature or mechanism in the C++ programming language which holds a collection of objects or data items of a certain type. Examples of containers include lists, deques, vectors, and even strings.
[Note - a string can be thought-of as a collection of characters]
Range is a word used to describe a series or sequence of objects/data items stored in a particular container. Often, the range that a programmer is interested in, will be every single element from the beginning to the end of a container.
- This article refers to the remove function, as found in the C++ <algorithm> standard header, within the std namespace; and assumes the reader already has a basic familiarity with at least one of the STL containers and its associated iterator(s). (Although not all STL containers are suitable for use with remove - notably, stack and queue)
An iterator is an object or variable which acts as a "place-holder", that is tasked with remembering the position of an element within a container. iterators are often used as temporary variables which help manipulate a container. They are also commonly used by containers to provide reference points to the beginning and end of their stored objects/data items.
A data set using STL vectors
The example in this article will remove all occurrances of 5, from a vector container which stores int objects.
The example container (Which will be called myvec) looks like this vector<int> myvec;
and will be given a few test values
CODE
myvec.push_back(10);
myvec.push_back(5);
myvec.push_back(-8);
myvec.push_back(5);
myvec.push_back(1);
myvec.push_back(4);
At the moment, its range is [ 10, 5, -8, 5, 1, 4 ]. Here, 5 appears in multiple places scattered through the container.myvec.push_back(5);
myvec.push_back(-8);
myvec.push_back(5);
myvec.push_back(1);
myvec.push_back(4);
In order to delete all elements containing 5 out of myvec, the most efficient way is to rearrange the data so that all the unwanted elements are overwritten, by shifting other elements back from the other end of the container. This has the side-effect of leaving a bunch of unwanted objects at the far-end, but at least they are all together and can be destroyed efficiently.
The remove algorithm (Abstract)
Given the set of data [ 10, 5, -8, 5, 1, 4 ] - removing all elements which match 5 uses the following process.
- start a counter of matched elements at zero. step to the first element
10 5 -8 5 1 4 - No match found.
- step to the next element
10 5 -8 5 1 4 - a match is found, increase the match counter by 1.
- step to the next element
10 5 -8 5 1 4 - No match found,
- perform a leftwise-shift of 1 position.
10 -8 -8 5 1 4
- step to the next element
10 -8 -8 5 1 4 - a match is found, Increase the match counter by 1.
- step to the next element
10 -8 -8 5 1 4 - No match found
- perform a leftwise-shift of 2 positions
10 -8 1 5 1 4
- step to the next element,
10 -8 1 5 1 4 - No match found
- perform a leftwise-shift of 2 positions
10 -8 1 4 1 4
- hit the end of the list. return an iterator to the first "invalid" element, ready for deletion.
10 -8 1 4 1 4
Each leftwise-shift causes an element to be copied from its original position - The "invalid" objects at the end still contain their original data - however, that data is now unwanted in the list. The returned iterator should be used for destroying the invalid objects, shortening the data set.
A closer look at remove
The visible prototype for the remove function is as follows
CODE
FwdIt remove(FwdIt first, FwdIt last, Ty val )
remove is a template function (or, a 'generic' cookie cutter for a function) which doesn't exist until provided with more information. The prototype gives some clues as to the missing information; FwdIt and Ty. (Note: these two names for the missing information are in fact template parameters - so they are likely to be named something else depending on the compiler and the implementation of the standard library used)
The remove function requires that FwdIt must be some sort of iterator (a "place holder") capable of stepping through a container, and that Ty must be able to represent the type of data that is stored by that container - in this case, the stored data is known to be int.
Using remove in the example
The STL vector is supplied with two different iterator types, iterator and const_iterator . The const_iterator does not allow any modification of the vector, so that is no use here - iterator must be used instead - hence the full type of the iterator is now known to be vector<int>::iterator
With this information, the remove prototype can be expanded to make more more sense, here is the full prototype which will be used by the real function in the program.
vector<int>::iterator remove( vector<int>::iterator first, vector<int>::iterator last, int val )
Now its clear - the first two parameters are iterators (placeholders) for the start and finish points in the data set, and the third parameter is the value to be removed.
Usefully, the STL vector has two methods which are used to obtain iterators to the begining and the end of the stored data. these methods are begin() and end() respectively. Now the call to remove looks like this
remove( myvec.begin(), myvec.end(), 5 );
The last thing to do, is to create an iterator variable, to keep track of the invalid data. This will be called invalid
CODE
vector<int>::iterator invalid;
invalid = remove( myvec.begin(), myvec.end(), 5 );
invalid = remove( myvec.begin(), myvec.end(), 5 );
Destroying invalid objects
Destroying the invalid objects at the end of the data set depends somewhat on what kind of container is being used. for a vector, the erase method is needed. erase comes in two flavours - the first simply removes one element, which is of no use, since the data set will have two invalid objects at the end. the other variation of erase removes a range of elements.
erase will insist on having a vector<int>::iterator to work with - which is handy, because that's exactly the same type as the invalid iterator left by remove. The end of the invalid elements will be the end of the container. Thus, the call to the erase method will look like this.
myvec.erase( invalid, myvec.end() );
remove in action
This complete, compilable example shows a data set before, and after the remove operation. The data set has 2 elements removed, leaving 2 'rogue' data items at the end of the data, which are eventually destroyed by the myvec.erase method.
CODE
#include <iostream>
#include <vector>
#include <algorithm>
int main()
{
using namespace std;
//Populate myvec with the data set 10, 5, -8, 5, 1, 4
vector<int> myvec;
myvec.push_back(10);
myvec.push_back(5);
myvec.push_back(-8);
myvec.push_back(5);
myvec.push_back(1);
myvec.push_back(4);
cout << "\n\n Initial data set: ";
for(size_t i(0); i!=myvec.size(); ++i)
cout << myvec.at(i) << ' ';
//Remove the data elements matching '5'
vector<int>::iterator invalid;
invalid = remove( myvec.begin(), myvec.end(), 5 );
cout << "\n\n Data set after remove: ";
for(size_t i(0); i!=myvec.size(); ++i)
cout << myvec.at(i) << ' ';
//Destroy the remaining invalid elements
myvec.erase( invalid, myvec.end() );
cout << "\n\n Data set after erase: ";
for(size_t i(0); i!=myvec.size(); ++i)
cout << myvec.at(i) << ' ';
}
#include <vector>
#include <algorithm>
int main()
{
using namespace std;
//Populate myvec with the data set 10, 5, -8, 5, 1, 4
vector<int> myvec;
myvec.push_back(10);
myvec.push_back(5);
myvec.push_back(-8);
myvec.push_back(5);
myvec.push_back(1);
myvec.push_back(4);
cout << "\n\n Initial data set: ";
for(size_t i(0); i!=myvec.size(); ++i)
cout << myvec.at(i) << ' ';
//Remove the data elements matching '5'
vector<int>::iterator invalid;
invalid = remove( myvec.begin(), myvec.end(), 5 );
cout << "\n\n Data set after remove: ";
for(size_t i(0); i!=myvec.size(); ++i)
cout << myvec.at(i) << ' ';
//Destroy the remaining invalid elements
myvec.erase( invalid, myvec.end() );
cout << "\n\n Data set after erase: ";
for(size_t i(0); i!=myvec.size(); ++i)
cout << myvec.at(i) << ' ';
}
Conclusion
The C++ remove algorithm is often the most efficient way to remove elements from an un-sorted array-based container, such as the STL vector. the remove algorithm minimises the amount of copying and resizing typically involved in manipulating such containers, and is used by other named functions in the <algorithm> header, such as remove_if and remove_copy_if, which both allow greater flexibility when selecting elements for deletion, by including a parameter for a function or functor which determines whether or not an object should be removed. Functions and functors used in this way generally return a boolean value, and are known as predicates - Predicates are used instead of a single 'match' value.
When remove is used in conjunction with arrays, pointers act as iterators, however, depending on the array and its use in context, the method of destroying invalid objects will vary - In some cases, the objects may not be destroyed, but simply left, if an index or pointer variable is used to track the active range of the array.
The remove algorithm should not be used on sorted containers such as set and map which are based on a Binary Tree structure, and depend on the sorted order of their elements to work correctly.
References
The C++ Standard Library, Nicolai M Josuttis, Addison-Wesley, 1999.
The C++ Programming Language, Bjarne Stroustrup, Addison-Wesley, 1997.