Building a Distributed Key-Value Store in C++ (Part 4)

🧱 Phase 4: Multi-Node Mode

Our key-value store can now talk over TCP and persist its state via append-only logs. Time to take it up a notch.

In this phase, we introduced:

  • Multi-node support: Launch multiple kvstore_server instances that replicate writes to each other
  • Peer-to-peer communication: Servers send internal commands over the network
  • Command forwarding: Only one node needs to handle the client; changes get broadcast to the others

πŸ” Replication & Forwarding

πŸ”Œ ClusterManager

To support a dynamic set of peers, we created the ClusterManager class:

class ClusterManager {
public:
  explicit ClusterManager(const std::vector<std::pair<std::string, int>>& peers);
  const std::vector<std::pair<std::string, int>>& get_peers() const;

private:
  std::vector<std::pair<std::string, int>> peer_addresses;
};
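The implementation is little more than a thin wrapper around that vector; roughly:

// cluster_manager.cpp -- a sketch; the class just holds the peer list
ClusterManager::ClusterManager(
    const std::vector<std::pair<std::string, int>>& peers)
    : peer_addresses(peers) {}

const std::vector<std::pair<std::string, int>>& ClusterManager::get_peers() const {
  return peer_addresses;
}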

It stores each peer's host and port (excluding the node's own address), which are parsed at startup:

// server_main.cpp -- parse "host:port,host:port,..." from argv[2]
std::stringstream ss(argv[2]);
std::string peer;
while (std::getline(ss, peer, ',')) {
  auto colon = peer.find(':');
  std::string host = peer.substr(0, colon);
  int port = std::stoi(peer.substr(colon + 1));
  if (port != my_port)  // skip our own address
    peers.emplace_back(host, port);
}

πŸ“‘ PeerCommunicator

We also built a tiny static helper to send messages to other nodes:

class PeerCommunicator {
public:
  static void send_command(const std::string& host, int port, const std::string& cmd);
};

This uses raw TCP sockets to send internal replication commands like:

REPL_PUT key value
REPL_DEL key
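The real implementation lives in net/peer_communicator.cpp; as a rough sketch of what send_command could look like with plain POSIX sockets (the body here is an assumption, not the exact code from the repo):

// peer_communicator.cpp -- a minimal sketch using POSIX sockets.
// Error handling is deliberately thin: replication here is fire-and-forget.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <string>

void PeerCommunicator::send_command(const std::string& host, int port,
                                    const std::string& cmd) {
  int sock = socket(AF_INET, SOCK_STREAM, 0);
  if (sock < 0) return;  // drop the update if we can't even get a socket

  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_port = htons(port);
  inet_pton(AF_INET, host.c_str(), &addr.sin_addr);

  if (connect(sock, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0) {
    std::string line = cmd + "\n";
    send(sock, line.data(), line.size(), 0);
  }
  close(sock);
}

Failures are silently ignored: a peer that's down simply misses the update, which is exactly the consistency gap Part 5 will tackle.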

🧠 KVServer Enhancements

Now, when a client performs a PUT or DEL, we:

  1. Apply the command locally
  2. Send matching REPL_ commands to the peers

if (cmd == "PUT") {
  store.put(key, value);                              // 1. apply locally
  forward_to_peers("REPL_PUT " + key + " " + value);  // 2. replicate
  response << "OK\n";
}
if (cmd == "DEL") {
  bool ok = store.del(key);
  forward_to_peers("REPL_DEL " + key);
  response << (ok ? "OK\n" : "NOT_FOUND\n");
}

The forward_to_peers() method broadcasts to all peers:

for (const auto& [host, port] : cluster.get_peers()) {
  PeerCommunicator::send_command(host, port, cmd);
}

These commands are then handled by the receiving servers as internal-only operations. Note that they apply the write locally without re-forwarding it, so replication can never loop between peers:

else if (cmd == "REPL_PUT") {
  store.put(key, value);
}
else if (cmd == "REPL_DEL") {
  store.del(key);
}

Simple, but effective!

πŸ› οΈ Cleaner Structure

While expanding the project, we also did some refactoring:

🧼 Modularization

The codebase is now logically separated:

  • core/: In-memory store + persistence (append logs)
  • net/: Networking components (cluster, peer communication)
  • main/: Entry points for server, client, CLI
  • tests/: Catch2 unit tests

πŸ§ͺ Test Improvements

We added a clear_log() helper in tests to reset state:

namespace fs = std::filesystem;  // alias used throughout the tests

if (fs::exists(test_log_path)) {
  fs::remove(test_log_path);
}

This ensures clean replay testing with:

log->replay(*backend);
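To make that concrete, a replay test might look roughly like this. The class and method names besides replay() (MemoryBackend, FileAppendLog, append(), get()) are assumptions based on the file names in core/, not necessarily the real API:

// Hypothetical replay test; most names here are assumptions.
#include <catch2/catch_test_macros.hpp>  // Catch2 v3; v2 uses catch2/catch.hpp
#include <memory>

TEST_CASE("Replayed log restores the in-memory state") {
  clear_log();  // start every test from a clean log file

  auto backend = std::make_unique<MemoryBackend>();
  auto log = std::make_unique<FileAppendLog>(test_log_path);

  log->append("PUT foo bar");  // pretend a previous session wrote this
  log->replay(*backend);       // rebuild state from disk

  REQUIRE(backend->get("foo") == "bar");
}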

πŸ”§ Launching a Multi-Node Cluster

Start three nodes on different ports (each takes its own port first, then the comma-separated list of its peers):

./kvstore_server 12345 127.0.0.1:12346,127.0.0.1:12347
./kvstore_server 12346 127.0.0.1:12345,127.0.0.1:12347
./kvstore_server 12347 127.0.0.1:12345,127.0.0.1:12346

Then connect a client:

$ ./kvstore_client 127.0.0.1 12345
> PUT foo bar
OK
> GET foo
bar

Check the other nodes: the data is replicated!
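For example, pointing the client at the node on port 12346 should return the same value:

$ ./kvstore_client 127.0.0.1 12346
> GET foo
bar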

πŸ“¦ CMake: Adding Networking Libraries

We organized the build like this:

add_library(kvstore_lib
  main/kvstore.cpp
  core/memory_backend.cpp
  core/file_append_log.cpp
  net/cluster_manager.cpp
  net/peer_communicator.cpp
)

All executables link to the same shared core.
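Concretely, each entry point becomes a small add_executable target that links against the library; a sketch (the source file names here are illustrative):

add_executable(kvstore_server main/server_main.cpp)
target_link_libraries(kvstore_server PRIVATE kvstore_lib)

add_executable(kvstore_client main/client_main.cpp)
target_link_libraries(kvstore_client PRIVATE kvstore_lib)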

πŸ—ΊοΈ Updated Roadmap

  1. Phase 1: Local Store (βœ… Done)
     Basic In-Memory KV Store
     • Put/Get/Delete support

  2. Phase 2: Persistence (βœ… Done)
     Durability with Append Log
     • File-backed append log
     • Replay mechanism

  3. Phase 3: Networking (βœ… Done)
     Client-Server via TCP
     • Text-based protocol
     • CLI client

  4. Phase 4: Multi-Node Architecture (βœ… Done)
     Clustered KV Store
     • Replicate PUT/DEL to peers
     • Internal commands over sockets

  5. Phase 5: Consensus (Next)
     Coordination & Failover
     • Leader election
     • Write serialization

  6. Phase 6: Testing & Resilience (Planned)
     Fault Tolerance

πŸš€ What’s Next?

In Part 5, we’ll tackle a harder problem: consensus.

Our current replication is fire-and-forget, meaning there's no guarantee that all nodes stay consistent. We'll explore:

  • Leader election (maybe Raft?)
  • Handling failures and partitions
  • Ensuring linearizable writes
