24 Replies  3505 Views  Last Post: 02 June 2012  06:25 PM
#16
Re: What is more efficient
Posted 02 June 2012  12:15 AM
Apparently your teacher's question was specifically about insertion sort, or at least the situation were you continuously insert into a list and have to keep it sorted at all time. (This is one of the possible interpretations of your question that I mentioned in my initial post, but you never replied to make clear that this is in fact what you meant.) Note that the way to go for this scenario in real life would be a balanced BST  not a linked list, but that just as an aside. When some people said that vectors were faster to sort, they were almost certainly talking about the case were you add all elements first and then just sort it once (using algorithms like e.g. quick sort, not insertion sort). That or they were just objecting to your general statement that lists are better for sorting than vectors and not to the specific question.
#17
Re: What is more efficient
Posted 02 June 2012  12:29 PM
#include <iostream> #include <algorithm> #include <ctime> #include <cstdlib> #include <vector> #include <list> void test(int n) { std::vector<int> v; std::list<int> l; std::vector<int> random_vals; //just so that we can use the same random values for each test for(int i=0; i < n; ++i) { random_vals.push_back(rand()); } clock_t begin = clock(); for(int i=0; i < n; ++i) { int r = random_vals[i]; //uses exact algorthim for inserting element into vector that prof said v.insert(std::lower_bound(v.begin(), v.end(), r), r); } clock_t end = clock(); std::cout<<"vector time: "<<double(end  begin)/CLOCKS_PER_SEC<<'\n'; begin = clock(); for(int i=0; i < n; ++i) { int r = random_vals[i]; //uses near exact algorthim for inserting element into list that prof said //this is like a binary search but it has to advance the iterator step by step //this has the same linear time complexity but uses fewer comparisons l.insert(std::lower_bound(l.begin(), l.end(), r), r); } end = clock(); std::cout<<"list time: "<<double(end  begin)/CLOCKS_PER_SEC<<'\n'; } int main() { srand(time(NULL)); test(20000); return 0; }
I get these times:
vector time: 0.049
list time: 2.492
that's 50 times slower!!
and I'm not saying that std::list doesn't have its place, it does, but it's only in some really strange cases.
edit:
and as it turns out the simplest algorithm is also better.
#include <iostream> #include <algorithm> #include <ctime> #include <cstdlib> #include <vector> #include <list> template<class FwdIter, class T> FwdIter find_insertion_point(FwdIter begin, FwdIter end, const T& val) { while(begin != end) { if(*begin > val) { break; } ++begin; } return begin; } void test(int n) { std::vector<int> v; std::list<int> l; std::vector<int> random_vals; //just so that we can use the same random values for each test for(int i=0; i < n; ++i) { random_vals.push_back(rand()); } clock_t begin = clock(); for(int i=0; i < n; ++i) { int r = random_vals[i]; //uses exact algorthim for inserting element into vector that prof said v.insert(std::lower_bound(v.begin(), v.end(), r), r); } clock_t end = clock(); std::cout<<"vector time: "<<double(end  begin)/CLOCKS_PER_SEC<<'\n'; begin = clock(); for(int i=0; i < n; ++i) { int r = random_vals[i]; //uses near exact algorthim for inserting element into list that prof said //this is like a binary search but it has to advance the iterator step by step //this has the same liniar time complexity but uses fewer comparisions l.insert(find_insertion_point(l.begin(), l.end(), r), r); } end = clock(); std::cout<<"list time: "<<double(end  begin)/CLOCKS_PER_SEC<<'\n'; } int main() { srand(time(NULL)); test(20000); return 0; }
this gives me:
vector time: 0.049
list time: 0.834
that's still 17 times slower!
This post has been edited by ishkabible: 02 June 2012  12:39 PM
#18
Re: What is more efficient
Posted 02 June 2012  01:37 PM
My gut feel though is that his numbers will look very different and the linked lists will win if:
struct HugeRecord { int Key; char BigBlockRepresentingRestOfRecord[8192  sizeof(int)]; };
and you start to compare vector<HugeRecord> vs list<HugeRecord> where the random numbers generated were in the Key fields.
This post has been edited by Skydiver: 02 June 2012  01:41 PM
#19
Re: What is more efficient
Posted 02 June 2012  01:41 PM
ishkabible, on 02 June 2012  09:29 PM, said:
It should be pointed out that both cases are O(n) (and the prof didn't say anything contrary to that), so neither is better just going by the BigOh. Though I have to say that phrasing it in terms of O(n) + O(n log n) vs. O(n) + O(1) certainly makes it sound as if one was asymptotically better than the other, which is extremely misleading. That's the fault of the teacher though, not BigOh notation.
#20
Re: What is more efficient
Posted 02 June 2012  01:57 PM
Skydiver: yes; with something sufficiently sized I would expect a vector would be worse(I'll test it to see how far). in this instance however you can get better performance by using move semantics and dynamically allocate that array(C++11 only). if you have a bunch of small static arrays that have different types or purposes or the class has a ton of members(and I mean a TON) then a linked list would be better but that is exceedingly rare(epically considering the set of algorithms this would be applicable for). also, at that point you would probably dynamically allocate the instances anyway and a vector would be faster.
edit:
pretty surprising, it only took an array of 8 integers to overthrow an std::vector with 5000 items but with 20000 items it takes arrays of 150 elements for std::list to overthrow std::vector. I'm not exactly sure why this is. it's consistent on my system from 100040000 items that as the number of items goes up you need larger and larger elements for std::list to beat std::vector. I would *expect* that the behavior would continue. in the case of small <10000 elements it would be worth testing to see if std::list will beat std::vector with larger elements(say like 16 bytes or more) for certain algorithms. anything too much bigger(>3264 bytes) dynamic memory allocation probably should've been used in the first place.
#include <iostream> #include <algorithm> #include <ctime> #include <cstdlib> #include <vector> #include <list> template<int size> struct big { int k; int data[size]; big(int x) : k(x) {} bool operator>(const big& B)/> { return k > b.k; } bool operator<(const big& B)/> { return k < b.k; } bool operator>=(const big& B)/> { return k >= b.k; } bool operator<=(const big& B)/> { return k <= b.k; } bool operator==(const big& B)/> { return k >= b.k; } bool operator!=(const big& B)/> { return k != b.k; } }; template<class FwdIter, class T> FwdIter find_insertion_point(FwdIter begin, FwdIter end, const T& val) { while(begin != end) { if(*begin > val) { break; } ++begin; } return begin; } template<int size> void test(int n) { std::vector<big<size>> v; std::list<big<size>> l; std::vector<int> random_vals; //just so that we can use the same random values for each test for(int i=0; i < n; ++i) { random_vals.push_back(rand()); } clock_t begin = clock(); for(int i=0; i < n; ++i) { int r = random_vals[i]; //uses exact algorthim for inserting element into vector that prof said v.insert(std::lower_bound(v.begin(), v.end(), r), r); } clock_t end = clock(); std::cout<<"vector time: "<<double(end  begin)/CLOCKS_PER_SEC<<'\n'; begin = clock(); for(int i=0; i < n; ++i) { int r = random_vals[i]; //uses near exact algorthim for inserting element into list that prof said //this is like a binary search but it has to advance the iterator step by step //this has the same liniar time complexity but uses fewer comparisions l.insert(find_insertion_point(l.begin(), l.end(), r), r); } end = clock(); std::cout<<"list time: "<<double(end  begin)/CLOCKS_PER_SEC<<'\n'; } int main() { srand(time(NULL)); test<8>(5000); return 0; }
This post has been edited by ishkabible: 02 June 2012  02:31 PM
#21
Re: What is more efficient
Posted 02 June 2012  02:27 PM
// ListVsVector.cpp : Defines the entry point for the console application. // #include "stdafx.h" template<int size> struct HugeRecord { int Key; char Dummy[size  sizeof(int)]; HugeRecord(int key) : Key(key) { } bool operator <(const HugeRecord & rhs) const { return Key < rhs.Key; } }; template<class T> void test(const std::vector<int> & random_vals) { std::vector<T> v; std::list<T> l; std::cout << "Record size: " << sizeof(T) << " bytes.\n"; clock_t begin = clock(); for(auto it = random_vals.begin(); it < random_vals.end(); ++it) { //uses exact algorthim for inserting element into vector that prof said T * r = new T(*it); v.insert(std::lower_bound(v.begin(), v.end(), *r), *r); } clock_t end = clock(); std::cout<<"vector time: "<<double(end  begin)/CLOCKS_PER_SEC<<'\n'; begin = clock(); for(auto it = random_vals.begin(); it < random_vals.end(); ++it) { //uses near exact algorthim for inserting element into list that prof said //this is like a binary search but it has to advance the iterator step by step //this has the same linear time complexity but uses fewer comparisons T * r = new T(*it); l.insert(std::lower_bound(l.begin(), l.end(), *r), *r); } end = clock(); std::cout<<"list time: "<<double(end  begin)/CLOCKS_PER_SEC<<'\n'; } int main() { srand(time(NULL)); //just so that we can use the same random values for each test std::vector<int> random_vals; for(int i=0; i < 5000; ++i) { random_vals.push_back(rand()); } test<int>(random_vals); test<HugeRecord<8>>(random_vals); test<HugeRecord<16>>(random_vals); test<HugeRecord<32>>(random_vals); test<HugeRecord<64>>(random_vals); test<HugeRecord<128>>(random_vals); test<HugeRecord<256>>(random_vals); test<HugeRecord<512>>(random_vals); test<HugeRecord<1024>>(random_vals); test<HugeRecord<2048>>(random_vals); test<HugeRecord<4096>>(random_vals); test<HugeRecord<8192>>(random_vals); std::cout << "Done.\n"; std::cin.get(); return 0; }
I must say you have a pretty beefy machine to do the 20000 ints in 0.049. I had to cut back to about 5000. Feel free to change the code above back to 20000 and put in the simpler search.
This post has been edited by Skydiver: 02 June 2012  02:31 PM
#22
Re: What is more efficient
Posted 02 June 2012  02:35 PM
if you turn on optimization and use the release/build version of your program you should get less of a gap.
This post has been edited by ishkabible: 02 June 2012  02:38 PM
#23
Re: What is more efficient
Posted 02 June 2012  03:01 PM
ishkabible, on 02 June 2012  02:35 PM, said:
if you turn on optimization and use the release/build version of your program you should get less of a gap.
I swapped from debug to release mode very early on. Or at least I thought I did... :) Switching it back to Release again. ($!@# VS2010. No wonder I prefer to hand write my make files and build on the command line when not doing things quick and dirty.) Looks like the break even point is even lower now that I'm really building in release mode: somewhere between 128256 bytes the list starts to win.
Yup, as I was reading elsewhere VC++ STL uses 1.5 growth factor for its vectors, while GCC uses 2 as the growth factor. And there is the issue of different memory managers. So testing different libs and array sizes should be there for completeness.
Anyway, I don't recall the source anymore, but I heard that for a typical LOB application, a "record" size is about 24K of data while game objects would be more in the 12K range. But when I heard this, this was in the days when the "style" was to keep everything in single structs because "allocating memory is slow so allocate as big a chunk as possible" was the philosophy. With people's now calling new left and right and confidence is safe pointers, I think "record" sizes are now smaller.
This post has been edited by Skydiver: 02 June 2012  03:12 PM
#24
Re: What is more efficient
Posted 02 June 2012  06:08 PM
#25
Re: What is more efficient
Posted 02 June 2012  06:25 PM
I pointed out that in implementation, std::vector is faster, just like I said it often is earlier. I showed a test to prove it.
sepp2k pointed O(n) + O(log n) == O(n) becuase you only keep the largest term with BigO becuase these 2 items will scale similarly. BigO isn't made to settle small differences in algorithms, just how well the algorithms scale.
Skydiver pointed out that if the elements are big enough that a linked list will preform the operation you described better than a vector. we then tested this to see how much bigger the items need to be and have no conclusive result(and the answer will probably differ from machine to machine, compiler to compiler, standard library to standard library, etc...). all we have shown is that linked lists will indeed preform this operation more efficiently for larger elements but how large "larger" is is very dependent on many factors.
This post has been edited by ishkabible: 02 June 2012  06:32 PM
