Building a Distributed Key-Value Store in C++ (Part 2)
- An append-only log is the simplest durable storage primitive — replay it on startup and you get crash recovery for free.
- Flushing after every write is slow but safe; optimise only after correctness is established.
- Using a dedicated temporary log file in tests prevents test runs from corrupting production data.
Part 1 left us with an in-memory store that vanishes the moment the process exits. Part 2 fixes that with a disk-backed append-only log and adds a Catch2 test suite so regressions surface immediately.
Persistence via an append-only log
Every PUT and DEL operation is appended to a flat text file before the in-memory map is updated. On startup, load_from_log() replays the file top-to-bottom to rebuild state. This is the same pattern used by Redis's AOF mode and countless write-ahead logs.
Writing to the log
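    // Append one operation to the log as a single line: "PUT key value" or "DEL key".
    void KVStore::append_log(const std::string& op, const std::string& key, const std::string& value) {
        if (log_out.is_open()) {
            log_out << op << " " << key;
            if (op == "PUT") {
                log_out << " " << value;
            }
            log_out << "\n";
            log_out.flush();  // push the buffered line to the OS before returning
        }
    }
log_out.flush() is called after every write. This is intentionally conservative: data loss is worse than latency at this stage of the project. Note that flush() only hands the data to the operating system, so it protects against a process crash; surviving a power failure would additionally need something like fsync().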
Replaying the log on startup
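    // Rebuild the in-memory map by replaying every logged operation in order.
    // Requires <fstream>, <sstream>, and <string>.
    void KVStore::load_from_log() {
        std::ifstream in(log_file_path);  // a missing file simply yields no lines
        std::string line;
        while (std::getline(in, line)) {
            std::istringstream iss(line);
            std::string op, key, value;
            if (!(iss >> op >> key)) {
                continue;  // skip blank or truncated lines
            }
            if (op == "PUT") {
                iss >> value;  // stays empty if the logged value was empty
                store[key] = value;
            } else if (op == "DEL") {
                store.erase(key);
            }
        }
    }
The constructor calls load_from_log() before accepting any commands, so the in-memory map always reflects the last persisted state.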
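To make the startup order concrete, here is a minimal self-contained sketch. The class name MiniKVStore, the constructor signature, and the append-mode open are assumptions for illustration, since the real class definition from Part 1 is not reproduced in this post; only the member names mirror the snippets above.

```cpp
#include <fstream>
#include <optional>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the article's KVStore (illustration only).
class MiniKVStore {
public:
    explicit MiniKVStore(const std::string& path) : log_file_path(path) {
        load_from_log();                    // 1. rebuild state from disk
        log_out.open(path, std::ios::app);  // 2. append mode keeps old entries
    }

    void put(const std::string& key, const std::string& value) {
        append_log("PUT", key, value);  // log first, then update the map
        store[key] = value;
    }

    bool del(const std::string& key) {
        append_log("DEL", key, "");
        return store.erase(key) > 0;
    }

    std::optional<std::string> get(const std::string& key) const {
        const auto it = store.find(key);
        if (it == store.end()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    void append_log(const std::string& op, const std::string& key, const std::string& value) {
        if (!log_out.is_open()) {
            return;
        }
        log_out << op << " " << key;
        if (op == "PUT") {
            log_out << " " << value;
        }
        log_out << "\n";
        log_out.flush();
    }

    void load_from_log() {
        std::ifstream in(log_file_path);
        std::string line;
        while (std::getline(in, line)) {
            std::istringstream iss(line);
            std::string op, key, value;
            if (!(iss >> op >> key)) {
                continue;  // skip blank or truncated lines
            }
            if (op == "PUT") {
                iss >> value;
                store[key] = value;
            } else if (op == "DEL") {
                store.erase(key);
            }
        }
    }

    std::string log_file_path;
    std::unordered_map<std::string, std::string> store;
    std::ofstream log_out;
};
```

Opening the output stream in append mode matters here: a default std::ofstream open would truncate the file and discard the log the constructor has just replayed.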
Values containing spaces are not yet supported — the parser splits on whitespace. That is a known limitation for now; proper quoting or a length-prefixed binary format can come later.
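As a rough idea of what a length-prefixed encoding could look like, the sketch below writes each field as its byte length, a colon, and then the raw bytes. The helper names encode_field and decode_field are hypothetical and not part of the store yet; this is one possible approach, not a committed format.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: encode a field as "<byte length>:<bytes>", so spaces
// and newlines inside the field cannot confuse the parser.
std::string encode_field(const std::string& s) {
    return std::to_string(s.size()) + ":" + s;
}

// Hypothetical helper: decode one field starting at `pos`. On success,
// advance `pos` past the field. Returns std::nullopt if the input is
// malformed or truncated, leaving `pos` unchanged.
std::optional<std::string> decode_field(const std::string& buf, std::size_t& pos) {
    const std::size_t colon = buf.find(':', pos);
    if (colon == std::string::npos || colon == pos || colon - pos > 18) {
        return std::nullopt;  // no length, or too many digits to be sane
    }
    std::size_t len = 0;
    for (std::size_t i = pos; i < colon; ++i) {
        if (buf[i] < '0' || buf[i] > '9') {
            return std::nullopt;
        }
        len = len * 10 + static_cast<std::size_t>(buf[i] - '0');
    }
    if (len > buf.size() - (colon + 1)) {
        return std::nullopt;  // fewer bytes available than the prefix claims
    }
    std::string out = buf.substr(colon + 1, len);
    pos = colon + 1 + len;
    return out;
}
```

With these helpers, encode_field("user name") + encode_field("hello world") yields 9:user name11:hello world, and two decode_field calls recover both strings. A replay loop built on this would read by length rather than by line, because values could then contain newlines.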
Testing with Catch2
Pulling in Catch2 via CMake FetchContent
    include(FetchContent)
    FetchContent_Declare(
      Catch2
      GIT_REPOSITORY https://github.com/catchorg/Catch2.git
      GIT_TAG v3.5.2
    )
    FetchContent_MakeAvailable(Catch2)
    enable_testing()
    add_subdirectory(tests)
The first test case
    #include <catch2/catch_test_macros.hpp>
    #include <optional>
    #include <string>
    // Also include the header that declares KVStore.
    TEST_CASE("Basic KVStore operations") {
        KVStore store("test_store.log");
        store.put("a", "1");
        REQUIRE(store.get("a").value() == "1");
        store.put("b", "2");
        REQUIRE(store.get("b") == std::make_optional(std::string("2")));
        REQUIRE(store.del("a"));
        REQUIRE_FALSE(store.get("a").has_value());
    }
Pass a dedicated path like "test_store.log" to the KVStore constructor in every test. That way tests never read stale data from the production log, and the temporary file can be deleted in a teardown fixture without side effects.
A SECTION block for crash-recovery can then open a second KVStore instance pointing at the same file and assert that state is fully restored — no mocking needed.
Roadmap
- Phase 1 — Local store — done
- Phase 2 — Persistence & testing — done (this post)
- Phase 3 — Networking — next: TCP server and CLI client
- Phase 4 — Multi-node architecture — planned
- Phase 5 — Consensus / leader election — planned
- Phase 6 — Testing & resilience — planned
What's next
Part 3 puts the store on the network. A KVServer will accept TCP connections and handle PUT, GET, and DEL from any terminal.
