mr : a tiny relational database for minneron

Table of Contents

1 overview

1.1 GOAL a simple graph database with a strong relational bias.

  • We want to store data and documents in one place.
  • We want to use the relational model when possible.
    • It's always possible.
      • Some things like source code, html might even benefit.
      • Imagine querying for all the level 2 headlines in posts by a certain author between some dates.
    • But sometimes we don't actually need the relational model and it can be more efficient to just use a raw graph.
      • Example: storing the NFA/DFA for a grammar.

1.2 project breakdown

1.2.1 storing nodes, relations, arbitrary graphs

  1. variable-length content gets stored as nodes:
    • strings / text
    • images
    • audio
    • video
    • compiled code
    • raw binary data
  2. fixed width tables are stored as relations
    1. really just a special kind of blob that grows
    2. each row is a fixed-size array of bytes
    3. these can be converted to record structures when needed
    4. booleans are stored as octets unless they can be bit packed.
  3. default 3-column table (triplestore) for arbitrary graphs
  4. semi-structured data can go either way
    • html files
    • source code

1.2.2 indexing the data

  1. use a b+ tree to store locations of table/string descriptors
  2. Individual columns can be indexed, just like strings.

1.2.3 paged model for database layout

  1. linked list of pages
  2. Links are stored in their own descriptor pages, separate from the pages.

    Mostly this just seems cleaner to me, but also:

    • Pages can use an entire power-of-2-sized block for data (hopefully making pointer arithmetic slightly more efficient).
    • The linked list can be walked and examined without actually loading it, so you can figure out which blocks contain a particular blob without actually loading and walking the pages. 1
  3. These index pages are just sequences of integers.

    These would act as a sort of parallel array to the pages themselves (the pages are contained a virtual "array" in that they're numbered sequentially within the database file). On the first index page, item[0] would contain the .next link for page[0], and so on.

  4. The link index can grow like any other table, and thus contains itself.

    Suppose the index grows large enough that it needs to be extended, but the next block is already in use.

  5. In the linked list, 0 reperesents null.

    The pages are arranged in linked lists, or trees trees, but never in a loop. Therefore, there would never be a back-link to page 0, and it's safe to use 0 to represent a link to null (and thus the end of a chain).

    Zero is a nice number to compare to because many CPUs make it eazy to branch when things are zero.

    Negative values might represent unused pages, or pages in need of cleanup.

  6. relations with fixed-width values always start on a page boundary

    this just makes it slightly simpler to do arithmetic

1.2.4 correlating data in ram : the relational algebra

1.2.5 the ACID properties

  1. For working in RAM, though, we may just want to append to entries to a temporary table. This should help with ACID.

2 implementation

2.1 implementing nodes

2.1.1 Nodes are just blobs of text or binary data.

2.1.2 NIds are assigned sequentially, starting at 1

2.1.3 NId → Str

  • Find the string entry in the descriptor table. (Easy since they're in order.)
  • Use this to find the start page of the string.
  • Given length, it's easy to figure out which pages to load, and then load the whole string.

2.1.4 Str → NId

  • This uses the string descriptor table in conjunction with a b+ tree as an index. The string is used as the comparison key for the lookup,
  • since we want the string index to be ordered for sorting, probably the first 4-8 characters should be cast as an int, and if that doesn't bring us to a leaf node, use the next set of characters.
  • the value stored in the b+ tree is just a pointer to the string table

2.2 Indexing : B+ Trees

2.3 Storing Tuples

2.3.1 Records should (could?) be stored in sequence, and updated in-place.

3 Extensions

3.1 Version Control

3.1.1 For version control, it probably makes sense to keep a running log of transactions as triples.

3.1.2 For branching, we would annotate each triple with .prev pointer, to create a linked list.

3.1.3 As an optimization, the current state of the working copy would be cached as a more traditional database.

3.1.4 The old value for each change could also be stored.



I don't know if this is useful or not yet, but it seems like it might come in handy for garbage collection/defragmentation.