8 Replies - 9235 Views - Last Post: 08 January 2013 - 02:09 AM

#1 rstnsrr  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 5
  • Joined: 20-December 12

A good hashing

Posted 20 December 2012 - 10:57 PM

Hi all,
please suggest me a good hash function to hash 2^32 unsigned integers.Also suggest me the optimal
hash table size.

Thanks in advance,
rstnsrr
Is This A Good Question/Topic? 0
  • +

Replies To: A good hashing

#2 Skydiver  Icon User is online

  • Code herder
  • member icon

Reputation: 3534
  • View blog
  • Posts: 10,941
  • Joined: 05-May 12

Re: A good hashing

Posted 21 December 2012 - 08:06 PM

What are you trying to do with your incoming 232 numbers? What are your memory constraints? What are your time constraints? What is expected to happen in case of collisions?
Was This Post Helpful? 0
  • +
  • -

#3 cfoley  Icon User is offline

  • Cabbage
  • member icon

Reputation: 1949
  • View blog
  • Posts: 4,048
  • Joined: 11-December 07

Re: A good hashing

Posted 04 January 2013 - 05:33 AM

int hash(int i) {
    return i;
}

Was This Post Helpful? 0
  • +
  • -

#4 rstnsrr  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 5
  • Joined: 20-December 12

Re: A good hashing

Posted 04 January 2013 - 07:17 AM

View PostSkydiver, on 21 December 2012 - 08:06 PM, said:

What are you trying to do with your incoming 232 numbers? What are your memory constraints? What are your time constraints? What is expected to happen in case of collisions?



I am sorry for the wrong input. I need a good hash function to store 25 million entries. entries will be structures
containing atleast 10 fields mostly of unsigned 32 bit ints. My constraint is memory limitation. The inserting and searching
of a particular entry must be efficient.
Was This Post Helpful? 0
  • +
  • -

#5 cfoley  Icon User is offline

  • Cabbage
  • member icon

Reputation: 1949
  • View blog
  • Posts: 4,048
  • Joined: 11-December 07

Re: A good hashing

Posted 04 January 2013 - 07:28 AM

And what is the distribution of your ints? If they are evenly distributed then you can't do much better than add them all together. If they are all between 1 and 100 then that approach is crazy.
Was This Post Helpful? 3
  • +
  • -

#6 cfoley  Icon User is offline

  • Cabbage
  • member icon

Reputation: 1949
  • View blog
  • Posts: 4,048
  • Joined: 11-December 07

Re: A good hashing

Posted 04 January 2013 - 12:32 PM

Quote

My constraint is memory limitation.


Hang on, this makes no sense. I'm happy to believe your application as a whole is tight on memory but this hashtable is relatively small. A hashtable is just an array of pointers. 25,000,000 pointers comes to under 100 MB on a 32 bit system. You do need some extra space to keep the hashtable efficient. 150 MB is generous. A 64 bit machine will double these figures but it should have more memory to begin with.

Your primary concerns for the hash function are avoiding collisions and speed. I was only half joking when I said to simply add up the numbers. It's fast and if the numbers are randomly distributed within the allowed range it's probably as good as you will get in in terms of collision avoidance.

If the numbers are not completely random within the range, I would multiply them all by a different prime number before adding them. There are probably better algorithms that I don't know about but this should do a reasonable job of avoiding collisions and is still fast.
Was This Post Helpful? 2
  • +
  • -

#7 rstnsrr  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 5
  • Joined: 20-December 12

Re: A good hashing

Posted 07 January 2013 - 11:13 PM

View Postcfoley, on 04 January 2013 - 12:32 PM, said:

Quote

My constraint is memory limitation.


Hang on, this makes no sense. I'm happy to believe your application as a whole is tight on memory but this hashtable is relatively small. A hashtable is just an array of pointers. 25,000,000 pointers comes to under 100 MB on a 32 bit system. You do need some extra space to keep the hashtable efficient. 150 MB is generous. A 64 bit machine will double these figures but it should have more memory to begin with.

Your primary concerns for the hash function are avoiding collisions and speed. I was only half joking when I said to simply add up the numbers. It's fast and if the numbers are randomly distributed within the allowed range it's probably as good as you will get in in terms of collision avoidance.

If the numbers are not completely random within the range, I would multiply them all by a different prime number before adding them. There are probably better algorithms that I don't know about but this should do a reasonable job of avoiding collisions and is still fast.


Thank you for your input. Actually My application will receive the hash keys from an external application, whose range is
indeterminate. All I need is to hash the key and store the related details in a 2D array. I wanted to know a good hash function and
optimal table size for less collision and even distribution of hash values in the table.
Was This Post Helpful? 0
  • +
  • -

#8 cfoley  Icon User is offline

  • Cabbage
  • member icon

Reputation: 1949
  • View blog
  • Posts: 4,048
  • Joined: 11-December 07

Re: A good hashing

Posted 08 January 2013 - 02:03 AM

It's all about trade-offs. An optimum size for speed and avoiding collisions is a 4 GB array (assuming 32 bit integers). Every number will map to a unique cell so there will never be the chance of collisions and the hash function can just return the original number. The memory use is obviously unacceptable so you have to go smaller with a more complex hash function.

There are lots of tricks and heuristics involving prime numbers, load factors, etc. If you want to build an efficient hashtable then you will need to read up about these. The wikipedia page probably has more information than you want to know. At the end of the day, whatever you build only has to be fast enough to pass your specifications so spending time optimising it further is time wasted. A relatively naive implementation will probably be fast enough for most purposes. It's also more than likely that whatever hashtable is provided in your language of choice (or a good third party library) will outperform whatever you build yourself.
Was This Post Helpful? 1
  • +
  • -

#9 rstnsrr  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 5
  • Joined: 20-December 12

Re: A good hashing

Posted 08 January 2013 - 02:09 AM

Thank You again for the pointers. I will research more into it.
Was This Post Helpful? 0
  • +
  • -

Page 1 of 1