Table doubling + hash tables = love

Hello reader,

During my quest for learning I came across this thing called “table doubling”. Boy, I was so happy about it that I wrote this article :).

The goal is to structure some data, any data, in such a way that we can insert, delete and retrieve it really fast while keeping memory usage decent. A real-life example would be the HTML code of a webpage: if we consider each HTML element (node) with its properties a piece of data, we want to get the one we care about almost instantly, not in a second or so. So, to get a sense of why this is useful, I will lay down some premises.

An array = one contiguous piece of memory, as long as the array itself, and that memory stays reserved the whole time. The keys/indexes must be integers; we can’t have a key/index of 2.3 or 0.1. Thanks to those two properties we can get to any element almost instantly, because it exists and we know exactly where it is.
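
As a tiny sketch (in TypeScript, purely my own illustration), this is why reading an array element by its integer index is a single, direct jump:

```typescript
// An array occupies one contiguous block of memory, so element i sits at a
// fixed, computable position; reading it is a direct jump, not a walk.
const tags: string[] = ["html", "head", "body", "div"];
console.log(tags[2]); // "body": found without visiting tags[0] or tags[1]
```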

A linked list = a piece of data that, besides the “internal” information we care about (for our example, let’s say the inner text), holds a link to the location in the device’s memory of the next item after it. So 1 knows where 2 is, 2 knows where 3 is, and so on. This is rather sweet, but it only goes forward; with a bit of extra work we get to the doubly linked list.

Doubly linked list = a linked list where each piece of information knows about the previous piece as well as the next. Although it has a bigger memory footprint, it solves the “forward only” problem above in a neat, fast way.
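
Here is a minimal sketch of that idea (TypeScript, with names of my own choosing): each node stores its value plus references to its previous and next neighbours, and inserting next to a known node only means updating those references.

```typescript
// A doubly linked list node: it knows both its previous and its next neighbour.
class ListNode<T> {
  value: T;
  prev: ListNode<T> | null = null;
  next: ListNode<T> | null = null;
  constructor(value: T) {
    this.value = value;
  }
}

// Insert a new value right after an existing node. Only the two neighbours
// need to learn the new node's "memory address" (reference); nothing is shifted.
function insertAfter<T>(node: ListNode<T>, value: T): ListNode<T> {
  const fresh = new ListNode(value);
  fresh.prev = node;
  fresh.next = node.next;
  if (node.next) node.next.prev = fresh;
  node.next = fresh;
  return fresh;
}
```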

Now, in order to nail down the difference, let’s look at what happens when we want to fetch, insert or delete an element close to the beginning of an array versus a list (singly or doubly linked, it doesn’t matter).

Operation: fetch an element by key (aka exact search)
Array: go to the key’s position.
List: go to the first or last element; check if it is the one we need; if not, ask it where the next (or previous) element is located; repeat the previous two steps until we find it.*

Operation: insert
Array: increase the length of the array by the number of elements we will insert; move all the other elements to the right, starting from the end; add our data where we wanted it.**
List: the new element can live anywhere in memory; we just tell the element before it (and, in a doubly linked list, the one after it) to store its memory address.

Operation: delete
Array: remove the element; move all the other elements to the left; decrease the size of the array.
List: go to the desired element, forwards or backwards; tell the element on the left to point to the element on the right and, if needed, tell the element on the right to point to the left one.

* In case it is not obvious how slow this is: take a piece of paper, write down the steps for finding the 5th element, and count the rows.
** We cannot simply put it at the end of the array, because the order matters. The premise was that we are inserting close to the beginning of the array, so don’t think of adding it at the end.
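
To put the array column into code (again a sketch of my own, not a production implementation): inserting near the beginning forces us to shift everything after that spot one slot to the right, while insertAfter from the list sketch above only rewires two references no matter where the element lands.

```typescript
// Inserting into an array at `index`: every element from `index` onwards
// has to move one slot to the right before the new value can be placed.
function arrayInsert<T>(arr: T[], index: number, value: T): void {
  arr.length = arr.length + 1;        // make room for one more element
  for (let i = arr.length - 1; i > index; i--) {
    arr[i] = arr[i - 1];              // shift right, one element at a time
  }
  arr[index] = value;                 // finally place the new value
}

const elements = ["html", "head", "body"];
arrayInsert(elements, 1, "meta");     // ["html", "meta", "head", "body"]: two elements had to move
```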

As we can see, each of these data structures has its own advantages and disadvantages.

Now, in order to find the middle ground between those two data structures, we will discover a new one, called the hash table. The not-so-official definition: an array whose values are pointers to (in the worst-case scenario) lists of data entries with all their properties.
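
Here is a rough sketch of that definition (TypeScript again; the class and method names are my own, not an official API): the backbone is an array of “buckets”, and each bucket is a small list of key/value entries that happened to hash to the same index.

```typescript
type Entry<V> = { key: string; value: V };

// A hash table, informally: an array whose slots ("buckets") hold small lists
// of entries. A hash function turns any string key into an index into that array.
class HashTable<V> {
  private buckets: Entry<V>[][];

  constructor(capacity = 8) {
    this.buckets = Array.from({ length: capacity }, () => [] as Entry<V>[]);
  }

  // Map a key to a bucket index: a simple string hash, reduced modulo the array length.
  private indexFor(key: string): number {
    let hash = 0;
    for (let i = 0; i < key.length; i++) {
      hash = (hash * 31 + key.charCodeAt(i)) >>> 0;
    }
    return hash % this.buckets.length;
  }

  set(key: string, value: V): void {
    const bucket = this.buckets[this.indexFor(key)];
    const existing = bucket.find(e => e.key === key);
    if (existing) existing.value = value;   // the key is already here: overwrite its value
    else bucket.push({ key, value });       // otherwise append a new entry to this bucket's list
  }

  get(key: string): V | undefined {
    return this.buckets[this.indexFor(key)].find(e => e.key === key)?.value;
  }

  delete(key: string): boolean {
    const bucket = this.buckets[this.indexFor(key)];
    const i = bucket.findIndex(e => e.key === key);
    if (i === -1) return false;
    bucket.splice(i, 1);                    // drop just this entry from its bucket
    return true;
  }
}
```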

The other concept of this post is table doubling. In fewer words: instead of resizing an array one element at a time, when it gets full we make it twice as long as it currently is. One resize operation gives us room for as much data again, and it eliminates the need to move all the elements on every insert. The second part is that, on delete, we don’t shrink the array right away; we shrink it only once a good chunk of it (at least 1/3) has been freed up. This means that inserting and deleting a few elements will not keep triggering resizes.
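
A minimal sketch of that rule on a plain array (my own illustration; the 1/3 threshold simply follows the value mentioned above): double the capacity when it fills up, and only shrink once usage drops below a third of it.

```typescript
// Table doubling on a plain array: double the capacity when it is full,
// halve it only once fewer than a third of the slots are actually in use.
class DynamicArray<T> {
  private slots: T[] = new Array<T>(4);
  private used = 0;

  push(value: T): void {
    if (this.used === this.slots.length) {
      this.resize(this.slots.length * 2);            // full: one resize buys room for as much data again
    }
    this.slots[this.used++] = value;
  }

  pop(): T | undefined {
    if (this.used === 0) return undefined;
    const value = this.slots[--this.used];
    if (this.used > 0 && this.used < this.slots.length / 3) {
      this.resize(Math.ceil(this.slots.length / 2)); // mostly empty: shrink, but keep some spare room
    }
    return value;
  }

  private resize(capacity: number): void {
    const next = new Array<T>(capacity);
    for (let i = 0; i < this.used; i++) next[i] = this.slots[i]; // elements move only on a resize
    this.slots = next;
  }
}
```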

Let’s look at each operation and see the advantage.

Exact search:
Because the underlying structure is an array, navigation and lookup are very fast.
Insert:
From time to time it triggers a resize and re-hash of the array, but not often. As long as we work with roughly the same amount of data, or fiddle a bit with the “1/3” value mentioned earlier, everything runs at the speed of the list.
Delete:
The shrinking happens only when there are a lot of free slots, and even after shrinking we have room for more elements than we had in the beginning. Almost as fast as the doubly linked list.
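
Putting it together, here is how the HashTable sketch from earlier would be used (in a real implementation its bucket array would grow and shrink with the same doubling rule):

```typescript
// Using the HashTable sketch from above: search, insert and delete all start
// with the same cheap jump to one bucket; only that bucket's short list is scanned.
const elementsById = new HashTable<string>();

elementsById.set("header", '<div id="header">...</div>');  // insert
elementsById.set("menu", '<nav id="menu">...</nav>');       // insert

console.log(elementsById.get("menu"));    // exact search: hash "menu", jump to its bucket
elementsById.delete("header");            // delete: remove a single entry from a single bucket
console.log(elementsById.get("header"));  // undefined: the entry is gone
```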

 

Conclusion:
I can see this saving seconds on every test case run on a page with a few hundred elements.
