
Mimetic Storage Redefined

The (admittedly limited) reading I've been doing on key-value stores has given me an idea for restructuring my Mimesis databases. One thing I recall being asked is whether Mimesis supports column insertion and deletion. Currently it cannot do either elegantly without a complete database rewrite.

Mimesis currently stores everything as contiguous rows in the data file, but it occurred to me that I could store everything as contiguous columns instead. The major change is that, rather than having one structural file and one data file, the underlying scheme of Mimesis will be as follows:

MASTER (1 file) - contains columnar labels, the indices of relevant Mimesis files, the "primary" file index, and the next Mimesis file index.
STRUCTURE (n files) - contains the appending offset for a data file, the number of original entries to the data file, the number of modifications to the data file, the data offset, and data length.
DATA (n files) - contains key/value data.

The structure and data files work in pairs: if their indices match, the nth data file is controlled by the nth structural file. The master file records which files are relevant to a particular Mimesis instance. It also has the important task of designating one structure/data pair as the "primary" key. That primary pair would be equivalent to the row labels on a spreadsheet.
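To make the pairing concrete, here is a rough in-memory sketch of the three kinds of files. It is only an illustration: the class names, field names, and the `structure.N` / `data.N` file names are assumptions made up for this example, not the actual Mimesis layout.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StructureFile:
    """Hypothetical model of one structure file (it controls one data file)."""
    append_offset: int = 0       # where the next value would be appended
    original_entries: int = 0    # entries written for the first time
    modifications: int = 0       # entries rewritten since then
    entries: list = field(default_factory=list)  # (data offset, data length)


@dataclass
class Master:
    """Hypothetical model of the single master file."""
    labels: dict = field(default_factory=dict)   # file index -> column label
    primary: Optional[int] = None                # index of the "primary" pair
    next_index: int = 0                          # next Mimesis file index

    def add_column(self, label):
        """Register a new structure/data pair and return its file index."""
        index = self.next_index
        self.labels[index] = label
        self.next_index += 1
        if self.primary is None:
            self.primary = index   # assumption: first pair becomes the primary
        return index


def pair_names(index):
    """Matching indices tie the nth structure file to the nth data file."""
    return f"structure.{index}", f"data.{index}"
```

With this model, `Master().add_column("key")` hands out index 0, and `pair_names(0)` names the two files that belong together.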

This scheme requires a greater degree of file manipulation. The Mimesis code will likely get more complex because the number of read/write operations will increase. However, I think the benefits outweigh the added complexity.

Looking to the future, this new method of storage is more flexible than all the previous ones. If I want to insert or delete a column, it's as easy as creating or deleting a structure/data pair and updating the master. If I want to modify only one column, I only need to target that structure/data pair. If I want to update a row, I target all the structure/data pairs. If I want to change the table, I can rename the files and designate a new "primary" structure/data pair.
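The operations above might look something like the following sketch. It keeps everything in plain dictionaries and lists rather than real files, and every function name here is hypothetical. The point is only to show that column operations touch one pair while row operations touch all of them.

```python
def new_table():
    # "columns" stands in for the structure/data pairs on disk.
    return {"labels": {}, "primary": None, "next_index": 0, "columns": {}}


def insert_column(table, label, default=None):
    """Create one new pair and update the master; no other pair is touched."""
    index = table["next_index"]
    table["next_index"] += 1
    primary = table["primary"]
    rows = len(table["columns"][primary]) if primary is not None else 0
    table["labels"][index] = label
    table["columns"][index] = [default] * rows
    if primary is None:
        table["primary"] = index   # assumption: first column is the primary
    return index


def delete_column(table, index):
    """Drop one pair and update the master."""
    if index == table["primary"]:
        raise ValueError("designate a new primary pair first")
    del table["labels"][index]
    del table["columns"][index]


def update_column(table, index, row, value):
    """Modifying one column targets only that pair."""
    table["columns"][index][row] = value


def append_row(table, values):
    """Adding a row touches every pair."""
    for index, column in table["columns"].items():
        column.append(values.get(index))


def update_row(table, row, values):
    """Updating a row targets all the pairs named in `values`."""
    for index, value in values.items():
        table["columns"][index][row] = value
```

Note that `delete_column` refuses to drop the primary pair. That guard is a design guess on my part, based on the idea that a new primary has to be designated before the old one can go.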

Furthermore, the binary format of each file supports the heap method of appending data to the end. This avoids total rewrites and reduces the chance of data loss if some sort of error occurs.
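As a minimal sketch of the append-only idea, assume a modification writes the new value at the end of the data file and repoints the structure entry at it, leaving the old bytes in place. The `ColumnPair` class and its method names are invented for this example, and an in-memory buffer stands in for the data file.

```python
import io


class ColumnPair:
    """Hypothetical structure/data pair using append-only writes."""

    def __init__(self):
        self.data = io.BytesIO()    # stands in for the data file
        self.entries = []           # per row: (data offset, data length)
        self.append_offset = 0      # where the next value goes
        self.original_entries = 0
        self.modifications = 0

    def _append(self, value):
        # Always write at the current end; existing bytes are never rewritten.
        offset = self.append_offset
        self.data.seek(offset)
        self.data.write(value)
        self.append_offset += len(value)
        return offset, len(value)

    def add(self, value):
        """Store a new entry and return its row number."""
        self.entries.append(self._append(value))
        self.original_entries += 1
        return len(self.entries) - 1

    def modify(self, row, value):
        """Append the new value, then repoint the row at it."""
        self.entries[row] = self._append(value)
        self.modifications += 1

    def get(self, row):
        offset, length = self.entries[row]
        self.data.seek(offset)
        return self.data.read(length)
```

The old value stays in the file as dead space after a `modify`, so something would eventually have to compact it. In exchange, a failed write can only damage the tail of the file, not data that was already there.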

Every other class and function that assists the main Mimesis class would work just the same as before. Locking would be done by taking a lock on the master file for every operation that requires atomicity.
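One way the master-file lock could be sketched is with an advisory lock from Python's `fcntl` module. This is an assumption on several counts: `fcntl.flock` is only available on Unix-like systems, it only protects against other processes that also ask for the lock, and the `locked_master` helper is a made-up name.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def locked_master(path):
    """Hold an exclusive advisory lock on the master file for one operation."""
    # "a+b" creates the file if it is missing; writes always go to the end.
    with open(path, "a+b") as master:
        fcntl.flock(master, fcntl.LOCK_EX)   # blocks until the lock is free
        try:
            master.seek(0)
            yield master
        finally:
            master.flush()                   # push writes out before unlocking
            fcntl.flock(master, fcntl.LOCK_UN)
```

Any operation that needs atomicity would then wrap its work on the structure/data pairs in `with locked_master(path) as master:`, so only one such operation runs at a time.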

It looks as though I still had some room for growth, but it will likely be a while before this update takes place. Everything is going to change, so it's going to be a lot of coding, not to mention the new capabilities.

: sigh : It is 1% inspiration and 99% perspiration.
