5 Replies - 1058 Views - Last Post: 27 September 2017 - 06:15 AM Rate Topic: **--- 3 Votes

#1 C-Shark  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 3
  • Joined: 14-September 17

Want to make a Contains method faster but not sure how

Posted 14 September 2017 - 02:18 AM

Hi everyone, I hope this question is appropriate for this forum and not too much to ask for. I recently got into programming so I am trying a lot of things (some ideas from websites, some of my own) out. My most recent test is a program in which I manually created a linked list of 1kk elements and various methods to interact with said list and I am now working on improving the programs performance.
I have already optimized the Put method so it parses new objects efficiently after the most recently added object (without utilizing the "last" Pair it always looped through the whole list first - quite slow with 1kk elements). Now I want to optimize the Contains method as it takes up to 50ms per object since it loops through the whole 1kk elements in case the given element is not in the list.

You'll notice that I haven't used any standard implementations of LinkedList and the like, this is for practice reasons.
My source code of the whole program (202 lines, but I think only code for Put and Contains is relevant here):

Spoiler


My plan to improve the Contains method is as following:
- Add an array of size 256 that contains ASCII pointers/references (the pointers should then be used to be able to point to the first alphabetically sorted object)
- In my case I'd only need pointers for 0 through 9
- Adjust the Put method so it adds Pairs sorted alphabetically/numerically by key
- Adjust the Contains method to use the pointers/references and start looping at the first alphabetically sorted object instead of the very first object
- What I mean by this is that if the given Key is for example 954, it should start looping at the first element with a Key that starts with 9

The problem is that I am absolutely clueless as to how I'd properly construct such an array and fill it with pointers/references to efficiently use later on. As soon as I'd have such an array I could probably work along and adjust Put/Contains on my own, but this step has got me stuck and a little bit confused.
Any help would be greatly appreciated!

Is This A Good Question/Topic? 0
  • +

Replies To: Want to make a Contains method faster but not sure how

#2 C-Shark  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 3
  • Joined: 14-September 17

Re: Want to make a Contains method faster but not sure how

Posted 14 September 2017 - 05:05 AM

Hey, I've made some progress! I declared an array for the pointers and updated the Put function to sort & store the Pairs, and to dynamically add them to the array, see blow:

Pair head;
        Pair last;
        int _objectCount = 0;

        Pair[] pointers = new Pair[256];

        public void Put(string key, object value)
        {
            if (head == null)
            { 
                head = new Pair(key, value);
                var firstHeadAscii = (int)head.key[0];
                pointers[firstHeadAscii] = head;
                _objectCount++;
                last = head;
                return;
            }

            Pair currentPair = last;
            var firstCharAscii = (int)currentPair.key[0];     
            for (int i = 0; i < Count(); i++)
            {
                if (currentPair.key == key)
                {  
                    currentPair.value = value;
                    return;
                }
                if (currentPair.next == null)
                {
                    currentPair.next = new Pair(key, value);
                    pointers[firstCharAscii] = currentPair.next;
                    _objectCount++;
                    return;
                }
                currentPair = currentPair.next;
                last = currentPair;
            }
            throw new Exception("Good job, you reached what is supposed to be an unreachable error.");
        }


New problem is that whenever two keys with the same first number are created, the Pair that was first assigned to pointers[firstCharAscii] will be replaced. I've tried making the pointers[firstCharAscii] into an array, to have an arrays in the pointer array, but that didn't quite work out.
How can I properly store all the Pairs in the pointers array, and prevent them from overwriting eachother?
Was This Post Helpful? 0
  • +
  • -

#3 Skydiver  Icon User is offline

  • Code herder
  • member icon

Reputation: 5928
  • View blog
  • Posts: 20,265
  • Joined: 05-May 12

Re: Want to make a Contains method faster but not sure how

Posted 14 September 2017 - 09:00 AM

The problem is that you call your data structure a map (e.g. MyMap), but what you have actually implemented a list. If you just step back and implement your map with better data structure like a hashtable or a balanced tree you will get much better performance.
Was This Post Helpful? 0
  • +
  • -

#4 Skydiver  Icon User is offline

  • Code herder
  • member icon

Reputation: 5928
  • View blog
  • Posts: 20,265
  • Joined: 05-May 12

Re: Want to make a Contains method faster but not sure how

Posted 14 September 2017 - 08:16 PM

Now that I'm not constantly being bugged and I have time to read closer, I now understand that you deliberately wanted to implement a linked list because you are trying to exercise that skill. You've also discovered one of the "features" of a linked list which is it's inherent O(n) performance (aka linear performance) when doing searches. Your attempt to improve the performance by using an index into the linked list in theory should give you a resulting O(n/2) average performance, but realize that O(n/2) is still O(n).

Anyway to help you with this, approach, you'll need to remember to keep your linked list in sorted order.

So instead of:
head
|
v
Mercury -> Venus -> Earth -> Mars -> Jupiter -> Saturn -> Uranus -> Neptune



You'll want something like:
head
|
v
Earth -> Jupiter -> Mars -> Mercury -> Neptune -> Saturn -> Uranus -> Venus
^        ^          ^                  ^          ^         ^         ^
|        |          |                  |          |         |         |
E        J          M                  N          S         U         V



You can improve the O(n/2) to O(n/256). Even though it is still O(n), at least the absolute wait time of the user is reduced. The pseudo code for that version of Contains() would look something like:
first = searchKey[0];
node = indexOfNodes[first];
while (node != null && node.key[0] == first)
{
    if (node.key == searchKey)
        return true;
    node = node.next;
}
return false;



As a quick aside, in C#, char is actually 16 bits, so instead of 0-255 character values, you have 0-65335.

This post has been edited by Skydiver: 15 September 2017 - 06:15 AM
Reason for edit:: Fixed alphabetic ordering or linked list.

Was This Post Helpful? 1
  • +
  • -

#5 C-Shark  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 3
  • Joined: 14-September 17

Re: Want to make a Contains method faster but not sure how

Posted 27 September 2017 - 03:01 AM

Oh hey I totally forgot about this thread after reading the reply and adjusting it some time later. I basically scrapped what I had and started the Put method from scratch and worked from there on... I've made multiple changes to make this working and efficient:

I introduced a class for the head node of each sorted list which stores the last edited pair and the number of total pairs:
    class FirstPair : Pair
    {
        public FirstPair(string key, object value) : base(key, value)
        {
        }

        public int num;
        public Pair last;
    }


Then I adjusted Put to look if the first pair of the corresponding array is empty, and if it is create a new first pair. In case it's not it will work with the last edited normal pair which is always linked to the first pair:

        private const int ROOTS_SIZE = 256;
        FirstPair[] roots = new FirstPair[ROOTS_SIZE];

        public void Put(string key, object value)
        {
            var keyAscii = (int)key[0];
            if (roots[keyAscii] == null)
            {
                roots[keyAscii] = new FirstPair(key, value); 
                roots[keyAscii].num++;
                roots[keyAscii].last = roots[keyAscii];
                _objectCount++; _indexCount++;
                return;
            }
                                                                                                                 
            Pair last = roots[keyAscii].last;
            for (int i = 0; i < _objectCount; i++)
            {
                if (roots[keyAscii].key == key)
                {
                    roots[keyAscii].value = value;
                    return;
                } 
                if (last.next == null)
                {
                    last.next = new Pair(key, value);
                    roots[keyAscii].num++;
                    _objectCount++;

                    last = last.next;
                    roots[keyAscii].last = last;
                    return;
                }
            }
            throw new Exception("Good job, you reached what is supposed to be an unreachable error.");
        }


The Contains method creates a temporary copy of the original array to work with (to prevent overwriting and other changes to the original array due to the nature of the loop I use), and simply scans the corresponding array:

        public bool Contains(string key)
        {
            var keyAscii = (int)key[0];
            Pair[] containsArray = new Pair[ROOTS_SIZE]; 
            roots.CopyTo(containsArray, 0); 
            if (containsArray[keyAscii] != null)
            {
                for (int i = 0; i < roots[keyAscii].num; i++)
                { 
                    if (containsArray[keyAscii].key == key)
                    { 
                        return true;
                    }
                    if (containsArray[keyAscii].next == null)
                    { 
                        return false;
                    }
                    containsArray[keyAscii] = containsArray[keyAscii].next;
                }
            }
            return false;
        }


Of course I had to adjust every other method too (every method loops with the new and correctly sorted array and creates a copy thereof to work with), but I will not be posting all of that code since it's not really relevant to the topic.

Also: I tested a second implementation of the interface with another class that uses the standard Dictionary<key, value> implementation, and according to my benchmarks my manual implementation was actually faster! :)/>
Was This Post Helpful? 0
  • +
  • -

#6 Skydiver  Icon User is offline

  • Code herder
  • member icon

Reputation: 5928
  • View blog
  • Posts: 20,265
  • Joined: 05-May 12

Re: Want to make a Contains method faster but not sure how

Posted 27 September 2017 - 06:15 AM

Although there is more overhead to copy all the data into a single array, you are getting better use of the CPU cache when it is scanning because all the memory is now contiguous instead of scatter all over memory and thereby causing the CPU to stall while it waits for the memory to be read into the cache.
Was This Post Helpful? 0
  • +
  • -

Page 1 of 1