Not duplicating integers in sum

19 Replies - 3550 Views - Last Post: 25 February 2012 - 08:28 PM

#16 sepp2k

Re: Not duplicating integers in sum

Posted 24 February 2012 - 03:30 PM

Everything works that way, not just arrays. If you mutate an object, those changes will be visible through every variable that refers to that object. The only way to prevent changes from affecting other variables is to avoid calling mutating methods (that's why it's ruby convention to suffix mutating methods with a !, so they're easy to spot and avoid) or to make sure you copy the object first, so that no other variable refers to the same copy. Ruby never implicitly copies objects.
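That behavior is easy to see in a few lines (a minimal sketch; the values printed are shown in comments):

```ruby
a = [1, 2, 3]
b = a                # b refers to the very same array object as a
b << 4               # mutating through b...
p a                  # => [1, 2, 3, 4] -- ...is visible through a too

c = a.dup            # explicit (shallow) copy: a new array object
c << 5
p a                  # => [1, 2, 3, 4] -- a is unaffected by changes to c

s = "hello"
s.upcase             # non-mutating: returns a new string, s unchanged
s.upcase!            # bang variant mutates s in place
p s                  # => "HELLO"
```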

#17 Karel-Lodewijk

Re: Not duplicating integers in sum

Posted 25 February 2012 - 11:52 AM

sepp2k, on 24 February 2012 - 10:10 PM, said:

Note also that your algorithm's average running time is in O(n log n), while ruby's is in O(n) (assuming a decent hashing function).


I didn't delve into the ruby implementation to make certain. I don't know if ruby's uniq implementation is O(n), and I can't be certain that ruby's sort is not O(n); it could use radix sort for integer types (I doubt it, but it's possible). Results will depend on the input size/range/distribution. A good benchmark of uniq (or sort, for that matter) would test it with many lists of different sizes/ranges/distributions/types. You are right that I shouldn't make such bold claims about performance; it was a simple observation.

That being said, you also fed it something really specific: the range 0..(10k-1) concatenated 100 times. If I feed it something more random, e.g. array = Array.new(10000000) { rand(10000000) }, I see my version beating it again.

Anyway, there's no point going on about it; I was not aiming to create a fast uniq function, I just made a uniq function.
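For reference, the karels_uniq benchmarked later in the thread is not quoted in this excerpt; a sort-based version along these lines (a hypothetical reconstruction, O(n log n) from the sort) would behave the way the thread describes:

```ruby
# Hypothetical reconstruction of a sort-based uniq: sort the array,
# then keep only the elements that differ from their predecessor.
def karels_uniq(array)
  sorted = array.sort
  result = []
  sorted.each do |x|
    result << x if result.empty? || result.last != x
  end
  result
end

p karels_uniq([3, 1, 2, 3, 1])  # => [1, 2, 3]
```

Note that, unlike Array#uniq, this returns the elements in sorted order rather than preserving first-occurrence order.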

#18 whamp

Re: Not duplicating integers in sum

Posted 25 February 2012 - 12:18 PM

Thanks for all your help everyone.

#19 Karel-Lodewijk

Re: Not duplicating integers in sum

Posted 25 February 2012 - 12:39 PM

Sorry that we got a little off-topic.

#20 sepp2k

Re: Not duplicating integers in sum

Posted 25 February 2012 - 08:28 PM

Karel-Lodewijk, on 25 February 2012 - 07:52 PM, said:

I didn't delve into the ruby implementation to make certain. I don't know if ruby's uniq implementation is O(n) and I can't be certain that ruby's sort is not O(n)


Well, I'm certain, so you could just trust me ;-)

Ruby's uniq uses a hash table to find duplicates, and ruby's sort uses quicksort, no matter the element type. Note that I only know this for a fact about MRI. Other implementations might do different things, but I don't think they do in this case.

Also note that since ruby's integers are unbounded and arrays can be heterogeneous, it would actually be quite hard for sort to special-case integer arrays. It would first have to make certain that all elements are integers and that they're all within a range where a different algorithm actually makes sense.
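Both obstacles are easy to demonstrate (a small sketch; the exact error message may vary between Ruby versions):

```ruby
# Integers are unbounded: sort handles bignums through the same <=> path,
# so there is no fixed-width integer key a radix sort could rely on.
big = [2**100, 1, 2**64]
p big.sort.first               # => 1

# Arrays can be heterogeneous: Integer <=> String returns nil,
# so sort raises instead of falling back to an integer-only algorithm.
begin
  [1, "two", 3.0].sort
rescue ArgumentError => e
  puts e.message               # e.g. "comparison of Integer with String failed"
end
```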

Quote

Results will depend on the input size/range/distribution. A good benchmark of uniq or sort for that matter would test it with many lists of different sizes/ranges/distributions/types.


True, I admit I was lazy.

Quote

That being said, you also fed it something really specific range 0..(10k-1) concatenated 100 times. If I feed it something more random i.e.: array = Array.new(10000000) {rand(10000000)}, I see my version beating it again.


I don't think that's about it being random, but about there being fewer duplicates. That is, I suspect that ruby's version performs better the more duplicates there are, while yours performs better when there are few duplicates, because then the hash table grows large and its overhead dominates. And indeed tests seem to confirm this:

require 'benchmark'  # Benchmark.bm lives in the benchmark stdlib module

array1 = Array.new(10000000) { rand(10000000) }  # few duplicates
array2 = Array.new(10000000) { rand(10000) }     # lots of duplicates

Benchmark.bm do |bm|
  # karels_uniq is the custom uniq function from earlier in the thread
  bm.report("uniq on 10000000 elements with few dups")    { array1.uniq }
  bm.report("karel on 10000000 elements with few dups")   { karels_uniq array1 }

  bm.report("uniq on 10000000 elements with lotsa dups")  { array2.uniq }
  bm.report("karel on 10000000 elements with lotsa dups") { karels_uniq array2 }
end



Results:

                                             user        system    total        real
uniq on 10000000 elements with few dups    14.890000   0.490000  15.380000 ( 15.346995)
karel on 10000000 elements with few dups    8.470000   0.010000   8.480000 (  8.604579)
uniq on 10000000 elements with lotsa dups   2.100000   0.010000   2.110000 (  2.101718)
karel on 10000000 elements with lotsa dups  7.440000   0.000000   7.440000 (  7.542186)



So in conclusion, using a custom uniq method like yours will indeed be worthwhile if you expect there to be few duplicates, while ruby's version is best suited for cases where there are a lot of duplicates.

Quote

Anyway, there's no point going on about it, I was not aiming to create a fast uniq function, I just made a uniq function.


I think there's plenty of point in going on about this. For one thing, curiosity by itself is a perfectly sufficient reason to look at unexpected results in detail. For another, I just learned something worthwhile (namely, in which cases ruby's uniq performs well and in which it does not). So I'm glad we had this discussion (even if we did get off-topic, for which I apologize to the OP).
