Hi! this is my first tutorial and I'm going to explain how hashing and hash table works and how to use it. OK, but what is Hash Table actually and how can it benefit us? well, a Hash Table is a type of table in which we can use each record's id to retrieve the it's data easily. So, each record must have a unique id, which is going to be an int higher than or equal to 1. And another thing to mention, we use a value of -1 for id for each empty record's id.
Hash Function
For this tutorial, I used the following simple function to hash a given id:
hash(id) = id%size;
size is the length of our table, which has to be a prime number. I used 51 in this tutorial. The reason why it needs to be prime is that, if we don't use a prime number, the resulting hashed value will be zero for each id divideable by size. But if we use a prime number for size, then we decrease the chance of id being divideable by size.
Collisions
When inserting items into a hash table, it sometimes happens that an item to be inserted in table has a hash value that points to an already existing location in table. The new item typically is inserted somewhere else, though we must be sure that there is still a way to look up the item.
One of the simplest ways to overcome collisions is to use a rehash function:
rehash(id) = (id+1)%size;
Note: When we hash an id and check and see that it is already taken, we give the hashed id to rehash function.
OK, now here's an example program I wrote to implement a hash table and function. It's quite straight forward but I commented it too:
#include <iostream>
#include <string>
using namespace std;
//data structure to hold id and data, our data-structure we want to use
struct data {
public:
int id; //to hold a unique id for each element
int data; //the data for each element, I used a simple int
};
class hasher {
data dt[51]; //the table to hold hashed data structs
int numel; //number of elements in table, to check if it's full
public:
hasher();
int hash(int &id);
int rehash(int &id);
int add(data &d);
int remove(data &d);
void output();
};
/*this is the function to give hashed id, it's a simple one...
It's better that we use a prime number for our length, thats why I used 11
and the bigger possible, it's better because we reduce our collisions*/
int hasher::hash(int &id) {
return (id%51);
}
/*in case of ay collision(a hashed value which is already occupied before)
we use rehash function instead of hash*/
int hasher::rehash(int &id) {
return ((id+1)%51);
}
hasher::hasher() {
//create an array of data structure
int i;
for(i=0;i<=50;i++) {
dt[i].id = -1; //set all ids to -1 to show they're empty
dt[i].data = 0; //set all data values to 0, which is default
}
numel = 0;
}
int hasher::add(data &d) {
if(numel < 51) {
//table has empty places...
int hashed = hash(d.id);
if(hashed >= 0 && hashed <= 50 && dt[hashed].id == -1) {
//slot is empty, assign new data
dt[hashed].id = d.id;
dt[hashed].data = d.data;
return 0;
} else {
//we need to rehash the id
int i=0;
//try every place in table to find an empty place
while(i<=50) {
hashed = rehash(hashed);
if(dt[hashed].id == -1) {
dt[hashed].id = d.id;
dt[hashed].data = d.data;
return 0;
} else if(i==50) {
//we couldn't find the empty place
return -1; //terminate function with error
}
i++; //update value of i
}
}
} else {
//table is full
return (-1);
}
}
int hasher::remove(data &d) {
int hashed = hash(d.id);
if(dt[hashed].id == d.id) {
//this is the right one to delete
dt[hashed].id = -1;
numel -= 1;
return 0;
} else {
//we need a rehash to find the one
int i=0;
while(i<=50) {
hashed = rehash(hashed);
if(dt[hashed].id == d.id) {
dt[hashed].id = -1; //set the id to -1 because it is going to be empty
numel -= 1; //decrease 1 from number of elements
return 0; //success
} else if(i==50) {
return -1; //terminate function
}
i++; //update the value of i
}
}
}
void hasher::output() {
int i;
for(i=0;i<51;i++) {
cout<<i<<" -> "<<dt[i].id<<" "<<dt[i].data<<endl;
}
}
int main() {
int id=45; //first id
int ret;
data d;
d.data = 52005; //a value for all records, simply
hasher h1;
while(ret != -1) {
d.id=id;
ret = h1.add(d); //add this record to table
id += (id/5); //update the id, I wanted to show how different ids are put in table, look at output
}
d.id = 271861; //set to one of the ids we added to table in above loop
h1.remove(d); //remove that record from table
h1.output(); //see the output, notice there is one empty record, in index number 33, that was the record with id of 271861
return 0;
}
Final Thoughts
In this tutorial, I used a simple hash function which was good enough for working with integer values, but what if we want to hash strings ? Yup we need a more advanced function to do that. There are many ways to do that which is beyond this tutorial's goals. But at least you are now familiar with the way hashing works. Hope this tutorial helped you! 






MultiQuote








|