Somewhere along my career in Dreamlab, I encountered a problem of creating ultra-fast and scalable API. The nature of a problem required our API to respond to any query in under five milliseconds, what completely rules out any network calls. Additionally, we were sure that the API would receive billions of requests per week, so it needed to endure big loads too.
We all agreed that all the important data should be synchronized to the API applications upon startup and updated every time any changes occur in the system. As the data arrives into the application it needs to be indexed for further querying optimization.
Reliable synchronization can be easily achieved with queuing systems, but it is a topic for another post. Let’s concentrate on indexing.
In-memory indexing can be handled in many ways. For example, let us implement our own data structure class – MyDB. It should have insert and delete methods that keep track of some internal indexing and fast find method that uses the index.
If we were to implement the whole indexing logic ourselves we would spend a few days on just googling and then a few months on implementation and debugging. It is not an easy problem and we should avoid reinventing this wheel.
What if we could use good old SQL syntax to just CREATE TABLE and then CREATE INDEX in MyDB constructor? Then we could also use SELECT … WHERE … to use this index in find method. Insert and delete would automatically use this index on their corresponding SQL queries. This way MyDB would be just a convenient adapter for this SQL syntax.
Remember when I said that we can’t make any network calls? This means no remote database connection.
So how do we use the relational database without a server? SQLite is a very lightweight database engine implementation, without a client-server architecture. It means it becomes a part of our application binary and can live as long as our application process. It normally operates on files, but for performance, we can avoid costly disk I/O by just creating the DB inside our application memory. Please note, that this solution makes the database non-persistent, meaning it will be lost when the application process stops.
It is easy to run SQLite engine inside memory, as most popular programming languages have support for it. You can google it for your favorite language. Further below I will provide some Python examples.
So what did we gain with this approach?
- No network or disk latency – the DB is inside your application memory.
- Ease of use – most developers know SQL.
- Mature and reliable dependency- SQLite has been here for a while.
- Speed – SQLite is written in C and strongly optimized.
- Thread safety – thanks to SQLite ACIDity.
- Extendability – SQLite has great extensions – to name a few:
- Spatialite for geographic data and algorithms
- Json1 for JSON operations and indexing.
I recommend to check out Python documentation about a built-in module for SQLite database. Creating a connection is as simple as:
import sqlite3 conn = sqlite3.connect(':memory:') c = conn.cursor() c.execute('CREATE TABLE ...') c.execute('SELECT * FROM ... WHERE ...')
When you close all the connections the database is automatically garbage-collected.
Thanks to this approach we solved our original problem and now have fast and easily maintainable API.
There are two ways to look at what we have done:
- We traded the inner database persistency for performance and deal with its synchronization every time it starts or the system’s main database changes state.
- We used very stable and fast data structure library with convenient and well-known API
I like the latter.
Further reading and ideas
- Python documentation about a built-in module for SQLite database
- When to use SQLite
- Database indexes
- Planning SQLite queries
- Table constraints for data validation upon INSERTs
- Gossip protocol for synchronizing databases between multiple instances