Subscribe to andrewsw's Blog        RSS Feed
-----

Memoization Example (Python)

Icon 2 Comments
Memoization :wikipedia

wiki said:

In computing, memoization is an optimization technique used primarily to speed up computer programs by having function calls avoid repeating the calculation of results for previously processed inputs.

..that sounds ok.

wiki said:

Memoization has also been used in other contexts .. such as in simple mutually recursive descent parsing in a general top-down parsing algorithm that accommodates ambiguity and left recursion in polynomial time and space.

What?! Although, within the bold text lies hidden the 'meaning of life'! Let's skip passed it..

wiki said:

In the context of some logic programming languages, memoization is also known as tabling; see also lookup table.

The notion of tabling is good, but this will make more sense after the example.

wiki overview said:

A memoized function "remembers" the results corresponding to some set of specific inputs. Subsequent calls with remembered inputs return the remembered result rather than recalculating it, thus eliminating the primary cost of a call with given parameters from all but the first call made to the function with those parameters.

This is much better, although the use of the word function is, in my opinion, a little misleading. I think process would be better as this technique doesn't require, specifically, a function.




If you do some searching it is difficult to find an example other than the good-ol' factorial function, or fibonnaci number. The code that follows uses memoization to count the frequency of occurrence of characters within a string.

text = """
Here is some text, 
and more on another line!
"""

counts = {}     # the dictionary

for ch in text:
    if ord(ch) != 10:       # skip newlines
        if ch in counts:
            counts[ch] += 1
        else:
            counts[ch] = 1      # create the dictionary item

for k in sorted(counts):
    print(k, counts[k])


  • We use an empty dictionary
  • Loop through the characters in the string
  • If the dictionary already has the current character as a key, then increment its value, because we've found another of this character
  • If the dictionary-key doesn't already exist then create it, storing 1 (we've already found one copy of this character).

Dictionary keys are not stored in any particular order but, if we want, we can use sorted() to provide output in sorted-order.




The first time a character is seen we, of necessity, have to create the dictionary item with this key. However, the next time the same character is encountered we already have the dictionary item available. This is memoization.

Dictionary-lookups are extremely efficient (this is the notion of tabling mentioned earlier). Even so, the difference between this lookup and creating a new dictionary item is going to be infinitesimal (except perhaps for LARGE data). That is not the point. Instead of storing the initial value of 1 there could be some complex (time-expensive) calculations performed before a value is stored.

When counting letters it is far more common to create a fixed-array to store the 26 letters. This will be more efficient although, depending on the language, it may be necessary to iterate the array once to store zero-values. Our version will be more efficient in terms of storage - only creating as many keys as necessary. For large (sparse or unknown) input our method possibly gains the upper-hand.

Memoizing (cacheing) function return values is another (but similar) use of this technique. There is an example here at ActiveState (although you have to click a button to accept terms). You could search for others.

Footnote:
Even though we are only counting those letters which are present in a string we can still output zeroes for the missing letters:
import string

for ch in string.ascii_letters:
    print(ch, counts[ch]) if ch in counts else print(ch, 0)

2 Comments On This Entry

Page 1 of 1

andrewsw Icon

30 March 2014 - 02:56 PM
I have a more complex example. It is a little hard to explain but hopefully you'll get the gist.

Basically, I'm parsing a text-file containing code. The code has scopes which define the colours that appear in the IDE (Sublime Text). I am parsing the whole file, reading the scope of the current text, and trying to match this scope in a dictionary of colours. The colour (or the nearest match for it's scope) will then be used to recreate the content as an html-document with the same colours that are seen in the editor.

This line:
    the_colour = self.guess_colour(scope_name.strip())

is used repeatedly while parsing the text/code, calling a method guess_colour:
    def guess_colour(self, the_key):
        the_colour = None
        if the_key in self.colours:
            the_colour = self.colours[the_key]
        else:
            best_match = 0
            for key in self.colours:
                if self.view.score_selector(self.pt, key) > best_match:
                    best_match = self.view.score_selector(self.pt, key)
                    the_colour = self.colours[key]
            self.colours[the_key] = the_colour or self.fground
        return the_colour or self.fground


If the scope-name is in the colours-dictionary then just retrieve the colour.

If it isn't in the dictionary, then find the best-match for the scope-name (it may match exactly). This will provide the colour we need.

But.. rather than having to repeat this same matching-process when the same scope-name is encountered again, create a new dictionary entry using the current scope-name.

Put it this way: if I'm looking for "Dave" I would find the nearest match, "David", with the colour "green". Rather than stopping there I create a new colour-dictionary entry for "Dave" : "green". So.. the next time around "Dave" will immediately retrieve the colour "green".. rather than having to find "David" again.

Phew!! That was tricky to explain!

The key point is that it uses memoization so that any scope-name will only have to be searched for once. Thereafter, it will exist in the dictionary, and the colour can be retrieved immediately.
0

andrewsw Icon

30 March 2014 - 03:08 PM
If you use Sublime Text the above code is the basis for the ExportHtml package.
0
Page 1 of 1

Trackbacks for this entry [ Trackback URL ]

There are no Trackbacks for this entry

August 2014

S M T W T F S
     12
3456789
10111213141516
17181920 21 2223
24252627282930
31      

Tags

    Recent Entries

    Recent Comments

    Search My Blog

    1 user(s) viewing

    1 Guests
    0 member(s)
    0 anonymous member(s)

    Categories