8 Replies - 1181 Views - Last Post: 12 June 2016 - 04:57 AM

#1 Skydiver  Icon User is online

  • Code herder
  • member icon

Reputation: 5922
  • View blog
  • Posts: 20,246
  • Joined: 05-May 12

Finding the mode of a set of floating point numbers

Posted 11 June 2016 - 06:12 AM

If like in the statistics help thread, one decides to store the data in a set of floating point numbers, what is the correct and efficient way to get the mode?

Do straight on equality checks despite what all our CS teachers tell us about not doing equality checks with floating point numbers?

Do epsilon checks to avoid doing floating point checks? OR

Do string comparisons using the incoming strings from the source file?

I'm leaning towards the last, but wanted to get what other folks opinions are. (I understand that implies that it will be much harder to create a generic function double FindMode(double arr[], size_t count). It'll have to be std::string FindMode(std::string filename).)

Is This A Good Question/Topic? 0
  • +

Replies To: Finding the mode of a set of floating point numbers

#2 andrewsw  Icon User is offline

  • say what now
  • member icon

Reputation: 6408
  • View blog
  • Posts: 25,889
  • Joined: 12-December 12

Re: Finding the mode of a set of floating point numbers

Posted 11 June 2016 - 08:01 AM

I wouldn't call myself a Mathematician but I would consider "flooring" them. Iterate each value reading it downwards (or upwards) to the nearest acceptable multiple. I suppose this is the same as your epsilon checking solution, I am perhaps just side-stepping the process of first grouping all the values into a distinct data structure.

To describe what I mean, I would examine each value, round it down and create, or increase, a dictionary entry for it. The dictionary will have lots of missing intervals, and won't be ordered, but it will allow me to determine the mode.

On the other hand, grouping/tallying them to a more acceptable, and useful, structure, will probably be useful further down the line.

If we don't need the original values then we could instead group them as we read them, which would be more efficient than performing separate passes. (Is there a mathematical term for "boxing" items? I cannot recall just at the moment..)
Was This Post Helpful? 0
  • +
  • -

#3 jimblumberg  Icon User is online

  • member icon

Reputation: 5343
  • View blog
  • Posts: 16,676
  • Joined: 25-December 09

Re: Finding the mode of a set of floating point numbers

Posted 11 June 2016 - 08:42 AM

I really doubt that there is one answer that will solve this problem for every case. In some cases checking for equality may be acceptable, in other cases epsilon checks would be required.

The string comparison would only solve the problem of floating point errors on the read values. But depending on how the file was created there may be floating point errors present in the values written. These errors may be masked if the number of significant digits is small, but more noticeable with a large number of digits. Ie. .3 versus .3333.

Jim
Was This Post Helpful? 0
  • +
  • -

#4 Skydiver  Icon User is online

  • Code herder
  • member icon

Reputation: 5922
  • View blog
  • Posts: 20,246
  • Joined: 05-May 12

Re: Finding the mode of a set of floating point numbers

Posted 11 June 2016 - 09:21 AM

View Postandrewsw, on 11 June 2016 - 11:01 AM, said:

If we don't need the original values then we could instead group them as we read them, which would be more efficient than performing separate passes. (Is there a mathematical term for "boxing" items? I cannot recall just at the moment..)

Quantizing?
Was This Post Helpful? 0
  • +
  • -

#5 andrewsw  Icon User is offline

  • say what now
  • member icon

Reputation: 6408
  • View blog
  • Posts: 25,889
  • Joined: 12-December 12

Re: Finding the mode of a set of floating point numbers

Posted 11 June 2016 - 09:24 AM

Quantiling?
Was This Post Helpful? 0
  • +
  • -

#6 Skydiver  Icon User is online

  • Code herder
  • member icon

Reputation: 5922
  • View blog
  • Posts: 20,246
  • Joined: 05-May 12

Re: Finding the mode of a set of floating point numbers

Posted 11 June 2016 - 06:13 PM

If I recall correctly, quantiling is when you slice up the range of values from your data into particular segments. Typically, this buckets are sized to be 25%, 10%, or 5% segments. I have doubts about how accurate the mode would be after going through that process.
Was This Post Helpful? 0
  • +
  • -

#7 andrewsw  Icon User is offline

  • say what now
  • member icon

Reputation: 6408
  • View blog
  • Posts: 25,889
  • Joined: 12-December 12

Re: Finding the mode of a set of floating point numbers

Posted 11 June 2016 - 10:48 PM

There could be 100 (percentiles) or 1000 (permilles) quantiles ;)
Was This Post Helpful? 1
  • +
  • -

#8 Skydiver  Icon User is online

  • Code herder
  • member icon

Reputation: 5922
  • View blog
  • Posts: 20,246
  • Joined: 05-May 12

Re: Finding the mode of a set of floating point numbers

Posted 12 June 2016 - 04:48 AM

Thanks for that link! It reminded me that the range is not divided so that the intervals are equally sized, but rather the probabilities are equally sized. So some buckets may have bigger ranges than others if they have smaller populations. Since the mode is the most common number, that would make the population higher in its bucket, and so the range narrower.
Was This Post Helpful? 0
  • +
  • -

#9 andrewsw  Icon User is offline

  • say what now
  • member icon

Reputation: 6408
  • View blog
  • Posts: 25,889
  • Joined: 12-December 12

Re: Finding the mode of a set of floating point numbers

Posted 12 June 2016 - 04:57 AM

Ah, so I'm back to my original description for dividing into equal intervals. I think I'll have to stick with just "intervalling" or "grouping" ;). Apologies for my nonsense detour!
Was This Post Helpful? 0
  • +
  • -

Page 1 of 1