smikic.com
evergreen · tended May 25, 2025

Building a Distributed Key-Value Store in C++ (Part 2)

Key takeaways
  • An append-only log is the simplest durable storage primitive — replay it on startup and you get crash recovery for free.
  • Flushing after every write is slow but keeps the log current if the process crashes; optimise only after correctness is established.
  • Using a dedicated temporary log file in tests prevents test runs from corrupting production data.

Part 1 left us with an in-memory store that vanishes the moment the process exits. Part 2 fixes that with a disk-backed append-only log and adds a Catch2 test suite so regressions surface immediately.

Persistence via an append-only log

Every PUT and DEL operation is appended to a flat text file before the in-memory map is updated. On startup, load_from_log() replays the file top-to-bottom to rebuild state. This is the same pattern used by Redis's AOF mode and countless write-ahead logs.
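
The log-first, map-second ordering can be sketched in a few lines. This is a minimal illustration rather than the series' actual KVStore: the `LoggedMap` class, its `size()` accessor, and the injected `std::ostream&` are assumptions made here so the example runs entirely in memory. Skipping the DEL record for a missing key is also a choice of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the post's KVStore: the log is any std::ostream,
// so the write ordering can be exercised without touching the filesystem.
class LoggedMap {
public:
  explicit LoggedMap(std::ostream& log) : log_(log) {}

  // Append the record first; update the map only if the append succeeded.
  bool put(const std::string& key, const std::string& value) {
    log_ << "PUT " << key << ' ' << value << '\n';
    log_.flush();
    if (!log_) return false;  // log write failed: leave the map untouched
    data_[key] = value;
    return true;
  }

  // Returns true only if the key existed and the DEL record was logged.
  bool del(const std::string& key) {
    if (data_.find(key) == data_.end()) return false;
    log_ << "DEL " << key << '\n';
    log_.flush();
    if (!log_) return false;
    data_.erase(key);
    return true;
  }

  std::size_t size() const { return data_.size(); }

private:
  std::ostream& log_;
  std::map<std::string, std::string> data_;
};

// Runs a short sequence of operations and returns the log text it produced.
std::string demo_log() {
  std::ostringstream log;
  LoggedMap m(log);
  m.put("a", "1");
  m.put("b", "2");
  m.del("a");
  assert(m.size() == 1);  // only "b" is left in memory
  return log.str();
}
```

Because the record reaches the log before the map changes, a failed write leaves memory and disk in agreement: neither has seen the operation.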

Writing to the log

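cpp
// Appends one record ("PUT key value" or "DEL key") and flushes it to the OS.
void KVStore::append_log(const std::string& op, const std::string& key, const std::string& value) {
  if (!log_out.is_open()) {
    return;  // no log file open: nothing is persisted
  }
  log_out << op << " " << key;
  if (op == "PUT") {
    log_out << " " << value;  // DEL records carry no value
  }
  log_out << "\n";
  log_out.flush();  // push the record out of the stream buffer immediately
}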

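log_out.flush() is called after every write, which hands each record to the operating system before append_log() returns. This is intentionally conservative: losing data is worse than added latency at this stage of the project. Note that flush() does not force the bytes onto the disk itself, so it protects against a process crash but not against power loss or an OS crash.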
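
If the log ever has to survive an operating-system crash as well, the bytes must also be forced out of the OS cache. Here is a minimal sketch, assuming a POSIX system and a raw file descriptor in place of the post's std::ofstream. The `durable_append` helper is hypothetical and not part of the series' code.

```cpp
#include <cstddef>
#include <string>
#include <fcntl.h>   // open
#include <unistd.h>  // write, fsync, close

// Hypothetical helper: append one record and ask the OS to commit it to
// stable storage. Returns false if any step fails.
bool durable_append(const std::string& path, const std::string& record) {
  int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_APPEND, 0644);
  if (fd < 0) return false;

  const char* p = record.data();
  std::size_t left = record.size();
  bool ok = true;
  while (left > 0) {
    ssize_t n = ::write(fd, p, left);  // may write fewer bytes than requested
    if (n < 0) {                       // EINTR retry omitted for brevity
      ok = false;
      break;
    }
    p += n;
    left -= static_cast<std::size_t>(n);
  }
  if (ok && ::fsync(fd) != 0) ok = false;  // flush OS cache to the device
  if (::close(fd) != 0) ok = false;
  return ok;
}
```

Opening and closing the file on every call keeps the sketch short; a real store would keep the descriptor open. An fsync per record is far slower than flush(), which is one reason to postpone it until correctness is settled.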

Replaying the log on startup

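cpp
#include <fstream>
#include <sstream>
#include <string>

void KVStore::load_from_log() {
  std::ifstream in(log_file_path);  // a missing file simply yields an empty store
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream iss(line);
    std::string op, key, value;
    if (!(iss >> op >> key)) {
      continue;  // skip blank lines and lines without a key
    }
    if (op == "PUT") {
      iss >> value;  // stays empty if the record has no value
      store[key] = value;
    } else if (op == "DEL") {
      store.erase(key);
    }
  }
}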

The constructor calls load_from_log() before accepting any commands, so the in-memory map always reflects the last persisted state.

NOTE

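Values containing spaces are not yet supported: the parser splits on whitespace, so a value such as "hello world" is written to the log in full but comes back as "hello" after replay. Keys containing spaces break in the same way. That is a known limitation for now; proper quoting or a length-prefixed binary format can come later.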
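
To make the length-prefixed idea concrete, here is one possible text encoding in which every field is written as `<byte count>:<bytes>`. It is a sketch under assumptions, not the format the series settles on: the `Record` struct, the `encode`/`decode` helpers, and the 1 MiB field cap are all invented for illustration.

```cpp
#include <cstddef>
#include <istream>
#include <optional>
#include <sstream>
#include <string>

struct Record {
  std::string op, key, value;
};

constexpr std::size_t kMaxField = 1 << 20;  // arbitrary cap against corrupt lengths

// Writes a field as "<byte count>:<bytes>", so spaces and newlines are safe.
void write_field(std::ostream& out, const std::string& s) {
  out << s.size() << ':' << s;
}

// Reads one length-prefixed field; returns nullopt on malformed input.
std::optional<std::string> read_field(std::istream& in) {
  std::size_t len = 0;
  if (!(in >> len) || in.get() != ':' || len > kMaxField) return std::nullopt;
  std::string s(len, '\0');
  if (len > 0 && !in.read(&s[0], static_cast<std::streamsize>(len))) {
    return std::nullopt;  // record was cut short
  }
  return s;
}

std::string encode(const Record& r) {
  std::ostringstream out;
  out << r.op << ' ';
  write_field(out, r.key);
  if (r.op == "PUT") {
    out << ' ';
    write_field(out, r.value);
  }
  out << '\n';
  return out.str();
}

// Reads the next record from the stream; nullopt at end of input or on error.
std::optional<Record> decode(std::istream& in) {
  Record r;
  if (!(in >> r.op)) return std::nullopt;
  auto key = read_field(in);
  if (!key) return std::nullopt;
  r.key = *key;
  if (r.op == "PUT") {
    auto value = read_field(in);
    if (!value) return std::nullopt;
    r.value = *value;
  }
  return r;
}
```

With this scheme a PUT of key "a" and value "hello world" is stored as `PUT 1:a 11:hello world`. The decoder reads from the stream directly instead of line by line, so a value may even contain a newline.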

Testing with Catch2

Pulling in Catch2 via CMake FetchContent

cmake
include(FetchContent)  # provides FetchContent_Declare / FetchContent_MakeAvailable
FetchContent_Declare(
  Catch2
  GIT_REPOSITORY https://github.com/catchorg/Catch2.git
  GIT_TAG        v3.5.2
)
FetchContent_MakeAvailable(Catch2)
enable_testing()
add_subdirectory(tests)

The first test case

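cpp
#include <catch2/catch_test_macros.hpp>
#include <optional>
#include <string>
#include "kvstore.h"  // assumed header name; use whichever header declares KVStore

TEST_CASE("Basic KVStore operations") {
  KVStore store("test_store.log");
  store.put("a", "1");
  REQUIRE(store.get("a").value() == "1");
  store.put("b", "2");
  REQUIRE(store.get("b") == std::make_optional(std::string("2")));
  REQUIRE(store.del("a"));
  REQUIRE_FALSE(store.get("a").has_value());
}

TIP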

Pass a dedicated path like "test_store.log" to the KVStore constructor in every test. That way tests never read or modify the production log. Delete the file before and after each test, for example in a fixture, so records left over from a previous run are not replayed into the next one.
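
One way to handle that cleanup is a small RAII guard that removes the file on construction and again on destruction. The `TempLog` class below is a hypothetical helper, not part of Catch2 or of the series' code, and it assumes C++17 for std::filesystem.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Hypothetical RAII guard: removes the log file when constructed (so each test
// starts from an empty log) and again when destroyed (so nothing is left behind).
class TempLog {
public:
  explicit TempLog(const std::string& name)
      : path_(std::filesystem::temp_directory_path() / name) {
    remove_quietly();
  }
  ~TempLog() { remove_quietly(); }

  TempLog(const TempLog&) = delete;
  TempLog& operator=(const TempLog&) = delete;

  std::string path() const { return path_.string(); }

private:
  void remove_quietly() {
    std::error_code ec;  // non-throwing overload: safe to call in a destructor
    std::filesystem::remove(path_, ec);
  }
  std::filesystem::path path_;
};

// Usage sketch: true if the file exists inside the scope and is gone after it.
bool temp_log_is_cleaned_up() {
  std::string p;
  bool existed = false;
  {
    TempLog log("kv_templog_demo.log");
    p = log.path();
    std::ofstream(p) << "PUT a 1\n";  // stands in for a store writing its log
    existed = std::filesystem::exists(p);
  }
  return existed && !std::filesystem::exists(p);
}
```

In a test this would read `TempLog log("test_store.log"); KVStore store(log.path());`, with the guard declared first so the store is destroyed before the file is removed.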

A SECTION block for crash-recovery can then open a second KVStore instance pointing at the same file and assert that state is fully restored — no mocking needed.
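
The shape of that recovery check is sketched below in plain C++ so it runs without Catch2. `MiniStore` is a hypothetical stand-in that mimics the put/get/del surface used in the tests above; it is not the series' KVStore. Letting the first instance go out of scope only approximates a crash, on the assumption that every record was already flushed.

```cpp
#include <filesystem>
#include <fstream>
#include <map>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical stand-in for KVStore: replays the log, then appends to it.
class MiniStore {
public:
  explicit MiniStore(const std::string& path) {
    std::ifstream in(path);  // replay whatever is already on disk
    std::string line;
    while (std::getline(in, line)) {
      std::istringstream iss(line);
      std::string op, key, value;
      if (!(iss >> op >> key)) continue;
      if (op == "PUT") {
        iss >> value;
        data_[key] = value;
      } else if (op == "DEL") {
        data_.erase(key);
      }
    }
    log_.open(path, std::ios::app);  // append mode keeps existing records
  }

  void put(const std::string& k, const std::string& v) {
    log_ << "PUT " << k << ' ' << v << '\n';
    log_.flush();
    data_[k] = v;
  }

  bool del(const std::string& k) {
    log_ << "DEL " << k << '\n';
    log_.flush();
    return data_.erase(k) > 0;
  }

  std::optional<std::string> get(const std::string& k) const {
    auto it = data_.find(k);
    if (it == data_.end()) return std::nullopt;
    return it->second;
  }

private:
  std::ofstream log_;
  std::map<std::string, std::string> data_;
};

// Writes through one instance, drops it, then reopens the same file.
bool state_survives_reopen(const std::string& path) {
  std::filesystem::remove(path);  // start from an empty log
  {
    MiniStore first(path);
    first.put("a", "1");
    first.put("b", "2");
    first.del("a");
  }  // first instance is gone; only the file remains
  bool ok = false;
  {
    MiniStore second(path);  // rebuilds its state purely from the log
    ok = !second.get("a").has_value() &&
         second.get("b") == std::optional<std::string>("2");
  }
  std::filesystem::remove(path);  // leave nothing behind
  return ok;
}
```

In the Catch2 suite the body of `state_survives_reopen` would sit inside a SECTION, with each condition in its own REQUIRE so a failure points at the exact key.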

Roadmap

  • Phase 1 — Local store — done
  • Phase 2 — Persistence & testing — done (this post)
  • Phase 3 — Networking — next: TCP server and CLI client
  • Phase 4 — Multi-node architecture — planned
  • Phase 5 — Consensus / leader election — planned
  • Phase 6 — Testing & resilience — planned

What's next

Part 3 puts the store on the network. A KVServer will accept TCP connections and handle PUT, GET, and DEL from any terminal.

Stefan Mikic, data eng

Data is my veggies — healthy, versatile, and sometimes hard to digest, but in the end, it always brings value.