Building a Distributed Key-Value Store in C++ (Part 4)
- Fire-and-forget replication is the simplest multi-node strategy — forward every write to peers and accept that they may temporarily diverge.
- Separating ClusterManager and PeerCommunicator keeps topology concerns out of request-handling code.
- Organising source into subdirectories (core/, net/) early prevents a flat-file mess as the codebase grows.
With a working TCP server from Part 3, the next step is running multiple instances and keeping them in sync. This part adds a ClusterManager, a PeerCommunicator, and replication forwarding to KVServer.
ClusterManager — knowing the cluster topology
#include <string>
#include <utility>
#include <vector>

class ClusterManager {
public:
    // Takes the (host, port) pairs of the peers this node replicates to.
    explicit ClusterManager(const std::vector<std::pair<std::string, int>>& peers);
    // Read-only view of the configured peer list.
    const std::vector<std::pair<std::string, int>>& get_peers() const;
private:
    std::vector<std::pair<std::string, int>> peer_addresses;
};

Each server is started with a list of peer host:port pairs. ClusterManager holds that list and exposes it to the rest of the server code.
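The post does not show how the comma-separated peer argument becomes the vector that ClusterManager receives. A minimal sketch of one way to do it follows; `parse_peers` is a hypothetical helper name, and the validation rules (skip empty entries, reject ports outside 1..65535) are assumptions rather than the project's actual behaviour.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post): turns
// "127.0.0.1:12346,127.0.0.1:12347" into (host, port) pairs.
std::vector<std::pair<std::string, int>> parse_peers(const std::string& arg) {
    std::vector<std::pair<std::string, int>> peers;
    std::size_t start = 0;
    while (start <= arg.size()) {
        std::size_t comma = arg.find(',', start);
        if (comma == std::string::npos) comma = arg.size();
        const std::string entry = arg.substr(start, comma - start);
        start = comma + 1;
        if (entry.empty()) continue;  // tolerate "" and trailing commas

        // Split on the last ':' so the host part may itself contain colons.
        const std::size_t colon = entry.rfind(':');
        if (colon == std::string::npos || colon == 0 || colon + 1 == entry.size()) {
            throw std::invalid_argument("expected host:port, got '" + entry + "'");
        }
        // std::stoi throws std::invalid_argument if the port has no leading digits.
        const int port = std::stoi(entry.substr(colon + 1));
        if (port < 1 || port > 65535) {
            throw std::out_of_range("port out of range in '" + entry + "'");
        }
        peers.emplace_back(entry.substr(0, colon), port);
    }
    return peers;
}
```

With a helper like this, the second command-line argument could be passed straight through, for example `ClusterManager cluster(parse_peers(argv[2]));`.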
PeerCommunicator — fire and forget
class PeerCommunicator {
public:
    // Fire and forget: sends one command to a peer and does not wait for a reply.
    static void send_command(const std::string& host, int port, const std::string& cmd);
};

send_command opens a short-lived TCP connection, writes the command, and closes it. There is no acknowledgement or retry; this is intentionally the simplest possible implementation.
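The body of send_command is not shown in this part. The sketch below is one plausible shape for it using POSIX sockets on Linux, not the project's actual code. `try_send_command` is a hypothetical free function that returns a bool so the outcome can be observed; the newline terminator is an assumption about the wire protocol, and `MSG_NOSIGNAL` is Linux-specific.

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <string>

// Hypothetical stand-in for PeerCommunicator::send_command.
// Returns true only if the whole command was handed to the kernel;
// a fire-and-forget caller is free to ignore the result.
bool try_send_command(const std::string& host, int port, const std::string& cmd) {
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;      // IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;  // TCP

    addrinfo* results = nullptr;
    const std::string service = std::to_string(port);
    if (getaddrinfo(host.c_str(), service.c_str(), &hints, &results) != 0) {
        return false;  // host could not be resolved
    }

    bool sent = false;
    for (addrinfo* ai = results; ai != nullptr; ai = ai->ai_next) {
        const int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0) continue;

        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
            const std::string line = cmd + "\n";  // assumed line-based protocol
            std::size_t off = 0;
            while (off < line.size()) {
                // MSG_NOSIGNAL avoids SIGPIPE if the peer closes early.
                const ssize_t n = send(fd, line.data() + off, line.size() - off, MSG_NOSIGNAL);
                if (n <= 0) break;
                off += static_cast<std::size_t>(n);
            }
            sent = (off == line.size());
            close(fd);
            break;  // connected once: no retry on other addresses
        }
        close(fd);  // connect failed, try the next resolved address
    }
    freeaddrinfo(results);
    return sent;
}
```

Note that connect blocks with the operating system's default timeout here, so an unreachable peer can stall the caller. A sketch like this keeps the code short at the cost of that latency.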
Fire-and-forget replication provides no consistency guarantee. If a peer is down when a write arrives, it misses that write and stays stale for that key until the key is written again or a full sync mechanism is added. Do not use this approach where data loss is unacceptable.
Wiring replication into KVServer
Forwarding a PUT
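if (cmd == "PUT") {
    store.put(key, value);                              // apply locally first
    forward_to_peers("REPL_PUT " + key + " " + value);  // then replicate, best effort
    response << "OK\n";
}

The store is updated locally first, then the write is forwarded. If forwarding fails silently, the local node still has the data but peers are left stale, and the client receives OK regardless of whether any peer received the write.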
Iterating over peers
// Structured bindings require C++17.
for (const auto& [host, port] : cluster.get_peers()) {
    PeerCommunicator::send_command(host, port, cmd);
}

REPL_PUT and REPL_DEL are handled as internal commands on the receiving side: they update the store without triggering another round of replication, which avoids infinite forwarding loops.
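The difference between client commands and internal replication commands can be sketched as a single dispatch function. Everything here is illustrative: `handle_command`, the `Store` and `Forward` aliases, the `DEL` command name, and the error strings are assumptions standing in for the real KVServer code, which this post only shows in part.

```cpp
#include <functional>
#include <map>
#include <sstream>
#include <string>

// Stand-ins for the real store and forward_to_peers (both assumptions).
using Store = std::map<std::string, std::string>;
using Forward = std::function<void(const std::string&)>;

std::string handle_command(Store& store, const std::string& line, const Forward& forward) {
    std::istringstream in(line);
    std::string cmd, key;
    in >> cmd >> key;
    if (key.empty()) return "ERR missing key\n";

    // The value is the rest of the line after the key (may contain spaces).
    std::string value;
    std::getline(in >> std::ws, value);

    if (cmd == "PUT") {
        store[key] = value;
        forward("REPL_PUT " + key + " " + value);  // client write: replicate
        return "OK\n";
    }
    if (cmd == "REPL_PUT") {
        store[key] = value;  // replicated write: apply only, never re-forward
        return "OK\n";
    }
    if (cmd == "DEL") {
        store.erase(key);
        forward("REPL_DEL " + key);
        return "OK\n";
    }
    if (cmd == "REPL_DEL") {
        store.erase(key);  // apply only
        return "OK\n";
    }
    return "ERR unknown command\n";
}
```

Because the REPL_ branches never call `forward`, a write travels exactly one hop from the node that accepted it, no matter how many peers list each other.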
Launching a 3-node cluster
./kvstore_server 12345 127.0.0.1:12346,127.0.0.1:12347
./kvstore_server 12346 127.0.0.1:12345,127.0.0.1:12347
./kvstore_server 12347 127.0.0.1:12345,127.0.0.1:12346

Each node lists the other two as peers. A PUT on port 12345 propagates to 12346 and 12347 within a few milliseconds on localhost.
CMake library organisation
add_library(kvstore_lib
    main/kvstore.cpp
    core/memory_backend.cpp
    core/file_append_log.cpp
    net/cluster_manager.cpp
    net/peer_communicator.cpp
)

Grouping source files by concern (core/, net/) keeps the build file readable and makes it easier to swap implementations later (e.g., replacing file_append_log with an LSM-tree backend).
Roadmap
- Phase 1 — Local store — done
- Phase 2 — Persistence & testing — done
- Phase 3 — Networking — done
- Phase 4 — Multi-node architecture — done (this post)
- Phase 5 — Consensus / leader election — next
- Phase 6 — Testing & resilience — planned
What's next
Part 5 adds leader election via the Bully algorithm so writes are coordinated through a single node instead of accepted by all.
