If like in the statistics help thread, one decides to store the data in a set of floating point numbers, what is the correct and efficient way to get the mode?

Do straight on equality checks despite what all our CS teachers tell us about not doing equality checks with floating point numbers?

Do epsilon checks to avoid doing floating point checks? OR

Do string comparisons using the incoming strings from the source file?

I'm leaning towards the last, but wanted to get what other folks opinions are. (I understand that implies that it will be much harder to create a generic function double FindMode(double arr[], size_t count). It'll have to be std::string FindMode(std::string filename).)

## 8 Replies - 1344 Views - Last Post: 12 June 2016 - 04:57 AM

##
**Replies To:** Finding the mode of a set of floating point numbers

### #2

## Re: Finding the mode of a set of floating point numbers

Posted 11 June 2016 - 08:01 AM

I wouldn't call myself a Mathematician but I would consider "flooring" them. Iterate each value reading it downwards (or upwards) to the nearest acceptable multiple. I suppose this is the same as your epsilon checking solution, I am perhaps just side-stepping the process of first grouping all the values into a distinct data structure.

To describe what I mean, I would examine each value, round it down and create, or increase, a dictionary entry for it. The dictionary will have lots of missing intervals, and won't be ordered, but it will allow me to determine the mode.

On the other hand, grouping/tallying them to a more acceptable, and useful, structure, will probably be useful further down the line.

If we don't need the original values then we could instead group them as we read them, which would be more efficient than performing separate passes. (Is there a mathematical term for "boxing" items? I cannot recall just at the moment..)

To describe what I mean, I would examine each value, round it down and create, or increase, a dictionary entry for it. The dictionary will have lots of missing intervals, and won't be ordered, but it will allow me to determine the mode.

On the other hand, grouping/tallying them to a more acceptable, and useful, structure, will probably be useful further down the line.

If we don't need the original values then we could instead group them as we read them, which would be more efficient than performing separate passes. (Is there a mathematical term for "boxing" items? I cannot recall just at the moment..)

### #3

## Re: Finding the mode of a set of floating point numbers

Posted 11 June 2016 - 08:42 AM

I really doubt that there is one answer that will solve this problem for every case. In some cases checking for equality may be acceptable, in other cases epsilon checks would be required.

The string comparison would only solve the problem of floating point errors on the read values. But depending on how the file was created there may be floating point errors present in the values written. These errors may be masked if the number of significant digits is small, but more noticeable with a large number of digits. Ie. .3 versus .3333.

Jim

The string comparison would only solve the problem of floating point errors on the read values. But depending on how the file was created there may be floating point errors present in the values written. These errors may be masked if the number of significant digits is small, but more noticeable with a large number of digits. Ie. .3 versus .3333.

Jim

### #4

## Re: Finding the mode of a set of floating point numbers

Posted 11 June 2016 - 09:21 AM

### #5

## Re: Finding the mode of a set of floating point numbers

Posted 11 June 2016 - 09:24 AM

Quantiling?

### #6

## Re: Finding the mode of a set of floating point numbers

Posted 11 June 2016 - 06:13 PM

If I recall correctly, quantiling is when you slice up the range of values from your data into particular segments. Typically, this buckets are sized to be 25%, 10%, or 5% segments. I have doubts about how accurate the mode would be after going through that process.

### #8

## Re: Finding the mode of a set of floating point numbers

Posted 12 June 2016 - 04:48 AM

Thanks for that link! It reminded me that the range is not divided so that the

**intervals**are equally sized, but rather the probabilities are equally sized. So some buckets may have bigger ranges than others if they have smaller populations. Since the mode is the most common number, that would make the population higher in its bucket, and so the range narrower.### #9

## Re: Finding the mode of a set of floating point numbers

Posted 12 June 2016 - 04:57 AM

Ah, so I'm back to my original description for dividing into equal intervals. I think I'll have to stick with just "intervalling" or "grouping" . Apologies for my nonsense detour!

Page 1 of 1