As part of learning by doing, I am trying to implement a simple database in Go. I got this project idea from Nikhil’s blog about implementing a DB in Rust. The basic concept for the implementation is based on the paper “A Simple and Efficient Implementation for Small Databases”. The paper describes the theoretical details of implementing a simple but fully functional DB.
Implementing a database as a side project looked fascinating to me because, until now, I have mostly worked with web technologies. Writing a DB will also expose me to a lot of new systems concepts and the logic behind writing a reliable (hopefully distributed) system. It’s also a good fit for a statically typed language, Go in this case. By the end of this “Writing a Simple Database” series we will have a working key-value database with persistence and fault tolerance.
In the past, I have casually programmed in Go and I quite like it. But I forget the syntax over time as I don’t get to use it much. Working on a side project will help me learn it with a purpose this time. What I like about Go is that it’s readable, compiled, and the language behind a lot of reliable distributed systems like Kubernetes, etcd, and Docker.
High-Level Overview
Functional requirements
- The DB will have a client and server architecture
- It will store key-value data
- DB will be completely in-memory
- Backed by a write-ahead log (WAL)
- Fault tolerance and restore in case of failure using the WAL
- The latest snapshot of the in-memory DB will be backed up periodically to the file system
Non-functional requirements
- Should be fast (somewhat comparable to key-value stores like etcd/RocksDB)
- Should be highly reliable
- Should always follow ACID properties
Extension
- Distributed database
- If DB is too big to fit in memory, overflow to disk
- Log compaction
Some of the requirements are likely to change as I start implementing the individual components.
Progress
Source code: https://github.com/amitt001/moodb
DB Name: “MooDB” or “Mdb”. I got the idea for the name from the “Moo point” scene in Friends :) and it sounded like a funny but mildly apt name for a DB which won’t be used for anything serious.
I have already started the implementation. So far I have a working command line with an in-memory key-value store. It supports 4 operations: GET, SET, UPDATE, and DELETE.
Currently, the client and server run in the same process. In its current state it’s like an embedded DB library (like SQLite), since each run starts a new in-memory DB instance.
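To make the four operations concrete, here is a minimal sketch of an in-memory key-value store in Go. It is not the actual MooDB code: the `Store` type, the error values, and the rule that SET only creates new keys while UPDATE only changes existing ones are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Hypothetical error values for this sketch.
var (
	ErrKeyNotFound = errors.New("key not found")
	ErrKeyExists   = errors.New("key already exists")
)

// Store is a minimal in-memory key-value store guarded by a mutex.
type Store struct {
	mu   sync.RWMutex
	data map[string]string
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{data: make(map[string]string)}
}

// Get returns the value stored under key.
func (s *Store) Get(key string) (string, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.data[key]
	if !ok {
		return "", ErrKeyNotFound
	}
	return v, nil
}

// Set inserts a new key. Assumption: it fails if the key already exists.
func (s *Store) Set(key, value string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.data[key]; ok {
		return ErrKeyExists
	}
	s.data[key] = value
	return nil
}

// Update changes an existing key. Assumption: it fails if the key is missing.
func (s *Store) Update(key, value string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.data[key]; !ok {
		return ErrKeyNotFound
	}
	s.data[key] = value
	return nil
}

// Delete removes a key, failing if it is missing.
func (s *Store) Delete(key string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.data[key]; !ok {
		return ErrKeyNotFound
	}
	delete(s.data, key)
	return nil
}

func main() {
	s := NewStore()
	fmt.Println(s.Set("name", "moo"))
	fmt.Println(s.Update("name", "moodb"))
	fmt.Println(s.Get("name"))
	fmt.Println(s.Delete("name"))
	fmt.Println(s.Get("name"))
}
```

The mutex is not needed while everything runs in a single goroutine, but it costs little and keeps the same type usable once several clients talk to one server.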
What’s next?
Next, I plan to implement the following features:
- Separate client and server: to support multiple clients against one datastore
- Write-ahead log: to make DB fault-tolerant against power loss or crashes.
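As a rough idea of what the write-ahead log could look like, here is a minimal sketch: every mutation is appended to a file and synced to disk before it is acknowledged, and on startup the log is replayed to rebuild the in-memory map. The `Record`, `WAL`, `Replay`, and `runDemo` names and the one-JSON-object-per-line file layout are assumptions for illustration, not the planned MooDB design.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// Record is one logged mutation. The JSON layout is an assumption of this sketch.
type Record struct {
	Op    string `json:"op"` // "SET" or "DELETE"
	Key   string `json:"key"`
	Value string `json:"value,omitempty"`
}

// WAL appends records to a file and syncs after each one.
type WAL struct {
	f   *os.File
	enc *json.Encoder
}

// OpenWAL opens (or creates) the log file in append mode.
func OpenWAL(path string) (*WAL, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return nil, err
	}
	return &WAL{f: f, enc: json.NewEncoder(f)}, nil
}

// Append writes one record and flushes it to stable storage.
func (w *WAL) Append(r Record) error {
	if err := w.enc.Encode(r); err != nil {
		return err
	}
	return w.f.Sync()
}

// Close closes the underlying file.
func (w *WAL) Close() error { return w.f.Close() }

// Replay rebuilds the in-memory state by applying every record in order.
// A missing log file is treated as an empty database.
func Replay(path string) (map[string]string, error) {
	data := make(map[string]string)
	f, err := os.Open(path)
	if errors.Is(err, os.ErrNotExist) {
		return data, nil
	}
	if err != nil {
		return nil, err
	}
	defer f.Close()

	dec := json.NewDecoder(f)
	for {
		var r Record
		err := dec.Decode(&r)
		if err == io.EOF {
			return data, nil
		}
		if err != nil {
			// A torn final record ends up here; a real WAL needs a policy for it.
			return nil, err
		}
		switch r.Op {
		case "SET":
			data[r.Key] = r.Value
		case "DELETE":
			delete(data, r.Key)
		}
	}
}

// runDemo logs a few mutations to a temporary file and replays them.
func runDemo() (map[string]string, error) {
	dir, err := os.MkdirTemp("", "wal-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "wal.log")
	w, err := OpenWAL(path)
	if err != nil {
		return nil, err
	}
	recs := []Record{
		{Op: "SET", Key: "a", Value: "1"},
		{Op: "SET", Key: "b", Value: "2"},
		{Op: "DELETE", Key: "a"},
	}
	for _, r := range recs {
		if err := w.Append(r); err != nil {
			w.Close()
			return nil, err
		}
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return Replay(path)
}

func main() {
	data, err := runDemo()
	fmt.Println(data, err)
}
```

Syncing after every record is the simplest way to make each write durable, but it is also slow; batching several records per sync is a common trade-off that this sketch leaves out.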
I have decided to blog as I go to log my learnings. I am enjoying writing the code and learning a lot of new things. For implementing each part I am learning both the language and the underlying concept behind the DB component. Fun!