In-memory SQLite – why and how

The Why

Problem

Somewhere along my career in Dreamlab, I encountered a problem of creating ultra-fast and scalable API. The nature of a problem required our API to respond to any query in under five milliseconds, what completely rules out any network calls. Additionally, we were sure that the API would receive billions of requests per week, so it needed to endure big loads too.

Solution

We all agreed that all the important data should be synchronized to the API applications upon startup and updated every time any changes occur in the system. As the data arrives into the application it needs to be indexed for further querying optimization.

Reliable synchronization can be easily achieved with queuing systems, but it is a topic for another post. Let’s concentrate on indexing.

Why SQL

In-memory indexing can be handled in many ways. For example, let us implement our own data structure class – MyDB. It should have insert and delete methods that keep track of some internal indexing and fast find method that uses the index.

If we were to implement the whole indexing logic ourselves we would spend a few days on just googling and then a few months on implementation and debugging. It is not an easy problem and we should avoid reinventing this wheel.

What if we could use good old SQL syntax to just CREATE TABLE and then CREATE INDEX in MyDB constructor? Then we could also use SELECT … WHERE … to use this index in find method. Insert and delete would automatically use this index on their corresponding SQL queries. This way MyDB would be just a convenient adapter for this SQL syntax.

In-memory

Remember when I said that we can’t make any network calls? This means no remote database connection.

So how do we use the relational database without a server? SQLite is a very lightweight database engine implementation, without a client-server architecture. It means it becomes a part of our application binary and can live as long as our application process. It normally operates on files, but for performance, we can avoid costly disk I/O by just creating the DB inside our application memory. Please note, that this solution makes the database non-persistent, meaning it will be lost when the application process stops.

It is easy to run  SQLite engine inside memory, as most popular programming languages have support for it. You can google it for your favorite language. Further below I will provide some Python examples.

So what did we gain with this approach?

  • No network or disk latency – the DB is inside your application memory.
  • Ease of use – most developers know SQL.
  • Mature and reliable dependency- SQLite has been here for a while.
  • Speed – SQLite is written in C and strongly optimized.
  • Thread safety – thanks to SQLite ACIDity.
  • Extendability – SQLite has great extensions – to name a few:
    • Spatialite for geographic data and algorithms
    • Json1 for JSON operations and indexing.

The How

I recommend to check out Python documentation about a built-in module for SQLite database. Creating a connection is as simple as:

import sqlite3
conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('CREATE TABLE ...')
c.execute('SELECT * FROM ... WHERE ...')

When you close all the connections the database is automatically garbage-collected.

Conclusion

Thanks to this approach we solved our original problem and now have fast and easily maintainable API.

There are two ways to look at what we have done:

  1. We traded the inner database persistency for performance and deal with its synchronization every time it starts or the system’s main database changes state.
  2. We used very stable and fast data structure library with convenient and well-known API

I like the latter.

Further reading and ideas