Get blog posts to your inbox.
SQL? I thought CockroachDB was a key-value store?!?
In the past we described CockroachDB as a distributed, transactionally consistent, key-value store. We knew that a key-value API was not the endpoint we wanted to provide and a few months ago started work on a higher level structured data API that would support tables and indexes. Along with supporting such rich structures, we anticipated eventually supporting SQL for manipulating and accessing this structured data. After a bit of soul searching we embraced the inevitable and moved forward full-speed with SQL as the core of our structured data layer.
There are a lot of components to an SQL system: query parsing, query analysis, query planning, query execution, transactions, and persistent storage – to name a few. The CockroachDB SQL system is built on top of the internal CockroachDB key-value store and leverages the monolithic sorted key-value map to store all of the SQL table data and indexes. This post will focus on CockroachDB’s mapping of SQL data to the key-value store* and show how that mapping helps implement SQL functionality. Future posts will talk about query analysis, planning, and execution.
An SQL table is a set of rows where each row is a set of columns. Each column has an associated type (bool, int, float, string, bytes). A table also has associated indexes which allow for the efficient retrieval of a range of rows from a table. This sounds nothing at all like a key-value API where string keys map to string values. How do we map the SQL table data to KV storage?
First, a primer: CockroachDB’s internal key-value API supports a number of operations, but we only need to know about a few of them for this post:
ConditionalPut(key, value, expected-value) – Conditionally set the value at the specified key if the existing value matches the expected value.
Scan(start-key, end-key) – Retrieve all of the keys between start-key (inclusive) and end-key (exclusive).
In CockroachDB, keys and values are both strings which can contain unrestricted byte values. OK, let’s move on!
A foundational piece of the puzzle for mapping SQL table data to keys and values is encoding typed column data into strings. For example, given a tuple of values , the sentinel key would be: /test/10. Huzzah!
So far we’ve ignored secondary indexes. Let’s rectify that oversight:
CREATE INDEX foo ON test (stringVal)
This creates a secondary index on the column stringVal. We haven’t declared the index as unique, so duplicate values are allowed. Similar to the rows for a table, we’ll be storing all of the index data in keys prefixed by the table ID. But we want to separate the index data from the row data. We accomplish that by introducing an index ID which is unique for each index in the table, including the primary key index (sorry, we lied earlier!):
The keys we used as examples above get slightly longer:
And now we’ll also have a single key for the row for our index foo:
You might be wondering why we suffixed this encoding with the primary key value (/10). For a non-unique index like foo, this is necessary to allow the same value to occur in multiple rows. Since the primary key is by definition unique for the table, appending it as a suffix to a non-unique key results in a unique key. In general, for a non-unique index, CockroachDB appends the values of all columns which are contained in the primary key but not contained in the index in question.
Now let’s see what happens if we insert