Everything works that way, not just arrays. If you mutate an object, those changes will be visible through all variables that refer to that object. The only ways to prevent changes from affecting other variables are to not call mutating methods (by convention Ruby suffixes many of its mutating methods with a !, so they're easy to spot and avoid) or to make sure that you copy the object, so no other variable refers to the same copy. Ruby never implicitly copies objects.
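To illustrate:

```ruby
# Two variables referring to the same array: a mutation made through
# one is visible through the other, because there is only one object.
a = [1, 2, 3]
b = a
b << 4
a        # => [1, 2, 3, 4]

# An explicit (shallow) copy with dup breaks the link.
c = a.dup
c << 5
a        # => [1, 2, 3, 4] -- unchanged
c        # => [1, 2, 3, 4, 5]
```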
19 Replies - 4463 Views - Last Post: 25 February 2012 - 08:28 PM
#17
Re: Not duplicating integers in sum
Posted 25 February 2012 - 11:52 AM
sepp2k, on 24 February 2012 - 10:10 PM, said:
Note also that your algorithm's average running time is in O(n log n), while ruby's is in O(n) (assuming a decent hashing function).
I didn't delve into the Ruby implementation to make certain. I don't know whether Ruby's uniq implementation is O(n), and I can't be certain that Ruby's sort is not O(n); it could use radix sort for integer types (I doubt it, but it's possible). Results will depend on the input size, range, and distribution. A good benchmark of uniq (or sort, for that matter) would test it with many lists of different sizes, ranges, distributions, and types. You are right to say that I shouldn't make such bold claims about performance; it was a simple observation.
That being said, you also fed it something really specific: the range 0..(10k-1) concatenated 100 times. If I feed it something more random, e.g. array = Array.new(10000000) {rand(10000000)}, I see my version beating it again.
Anyway, there's no point going on about it, I was not aiming to create a fast uniq function, I just made a uniq function.
#18
Re: Not duplicating integers in sum
Posted 25 February 2012 - 12:18 PM
Thanks for all your help everyone.
#19
Re: Not duplicating integers in sum
Posted 25 February 2012 - 12:39 PM
Sorry that we got a little off-topic.
#20
Re: Not duplicating integers in sum
Posted 25 February 2012 - 08:28 PM
Karel-Lodewijk, on 25 February 2012 - 07:52 PM, said:
I didn't delve into the ruby implementation to make certain. I don't know if ruby's uniq implementation is O(n) and I can't be certain that ruby's sort is not O(n)
Well, I'm certain, so you could just trust me ;-)
Ruby's uniq uses hash maps to find duplicates, and Ruby's sort uses quicksort, no matter the element type. Note that I only know this for a fact about MRI. Other implementations might do different things, but I don't think they do in this case.
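In sketch form, the hash-based approach looks like this (my Ruby illustration of the idea, not MRI's actual C code):

```ruby
# Hash-based uniq: one pass, O(n) expected time, at the cost of a hash
# table holding an entry per distinct element. An illustration of the
# idea, not MRI's actual implementation.
def hash_based_uniq(array)
  seen = {}
  result = []
  array.each do |x|
    unless seen.key?(x)
      seen[x] = true
      result << x
    end
  end
  result
end

hash_based_uniq([3, 1, 2, 1, 3])  # => [3, 1, 2]
```

Like the built-in Array#uniq, this keeps elements in order of first appearance.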
Also note that since ruby's integers are unbounded and arrays can be heterogeneous, it would actually be quite hard for sort to special-case integer arrays. It would first have to make certain that all elements are integers and that they're all within a range where a different algorithm actually makes sense.
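As a quick demonstration of the heterogeneity point, sorting an array with mixed element types raises at sort time:

```ruby
# Ruby arrays can freely mix types, so a type-specialized sort would
# first have to scan the whole array to rule out cases like this.
mixed = [3, "two", 1]
begin
  mixed.sort
rescue ArgumentError => e
  puts e.message  # e.g. "comparison of Integer with String failed"
end
```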
Quote
Results will depend on the input size, range, and distribution. A good benchmark of uniq (or sort, for that matter) would test it with many lists of different sizes, ranges, distributions, and types.
True, I admit I was lazy.
Quote
That being said, you also fed it something really specific: the range 0..(10k-1) concatenated 100 times. If I feed it something more random, e.g. array = Array.new(10000000) {rand(10000000)}, I see my version beating it again.
I don't think that's about it being random, but about there being fewer duplicates; i.e. I suspect that Ruby's version performs better the more duplicates there are, while yours performs better when there are few duplicates, because of the overhead of the hash table growing large. And indeed, tests seem to confirm this:
require 'benchmark'

# karels_uniq is the sort-based uniq function from earlier in the thread
array1 = Array.new(10000000) {rand(10000000)}  # few duplicates
array2 = Array.new(10000000) {rand(10000)}     # lots of duplicates
Benchmark.bm do |bm|
  bm.report("uniq on 10000000 elements with few dups")    { array1.uniq }
  bm.report("karel on 10000000 elements with few dups")   { karels_uniq array1 }
  bm.report("uniq on 10000000 elements with lotsa dups")  { array2.uniq }
  bm.report("karel on 10000000 elements with lotsa dups") { karels_uniq array2 }
end
Results:
                                                 user     system      total        real
uniq on 10000000 elements with few dups     14.890000   0.490000  15.380000 ( 15.346995)
karel on 10000000 elements with few dups     8.470000   0.010000   8.480000 (  8.604579)
uniq on 10000000 elements with lotsa dups    2.100000   0.010000   2.110000 (  2.101718)
karel on 10000000 elements with lotsa dups   7.440000   0.000000   7.440000 (  7.542186)
So, in conclusion, using a custom uniq method like yours will indeed be worthwhile if you expect there to be few duplicates, while Ruby's version is best suited for cases where there are a lot of duplicates.
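For readers skimming the thread, a sort-based uniq of the kind being benchmarked here might look roughly like this (a sketch with the same O(n log n) shape, not necessarily Karel's exact code):

```ruby
# Sort first (O(n log n)), then keep each element only when it differs
# from its predecessor. A sketch, not Karel's exact implementation.
def sort_based_uniq(array)
  result = []
  array.sort.each do |x|
    result << x if result.empty? || result.last != x
  end
  result
end

sort_based_uniq([3, 1, 2, 1, 3])  # => [1, 2, 3]
```

Note that, unlike Array#uniq, this returns the elements in sorted order rather than in order of first appearance.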
Quote
Anyway, there's no point going on about it, I was not aiming to create a fast uniq function, I just made a uniq function.
I think there's plenty of point going on about this. For one thing, I think that curiosity by itself is a perfectly sufficient reason to look at unexpected results in detail. For another, I just learned something worthwhile (namely, in which cases Ruby's uniq performs well and in which it does not). So I'm glad we had this discussion (even if we did get off-topic, for which I apologize to the OP).
