When a node goes down and later comes back up, the Cube database uses three mechanisms to bring its data back in sync:
Initial State:
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Node 1 │ │ Node 2 │ │ Node 3 │
│ ALIVE │ │ ALIVE │ │ ALIVE │
└─────────┘ └─────────┘ └─────────┘
✓ ✓ ✓
Node 3 crashes or loses network:
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Node 1 │ │ Node 2 │ │ Node 3 │
│ ALIVE │ │ ALIVE │ │ DOWN │
└─────────┘ └─────────┘ └─────────┘
✓ ✓ ✗
// Client writes a new key
PUT user:alice "Alice Johnson" CL=QUORUM
Flow:
1. Coordinator (Node 1) receives write
2. Determines replicas: [Node1, Node2, Node3]
3. Sends write to all 3 nodes
4. Node1: ✓ Success (local write)
5. Node2: ✓ Success (network write)
6. Node3: ✗ Timeout (node is down)
7. Success count: 2/3
8. Required for QUORUM: 2/3
9. Result: ✓ WRITE SUCCEEDS
// Store hint for Node 3
10. Coordinator stores "hint" for Node3:
- Key: user:alice
- Value: "Alice Johnson"
- Timestamp: 1234567890
- Target: Node3
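The quorum arithmetic behind steps 7-9 can be sketched as follows. This is a minimal illustration of the majority rule; the `QuorumCheck` class is not part of Cube's API:

```java
public class QuorumCheck {
    // Quorum = a majority of replicas: floor(N / 2) + 1
    static int quorumFor(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    static boolean writeSucceeds(int acks, int replicationFactor) {
        return acks >= quorumFor(replicationFactor);
    }

    public static void main(String[] args) {
        // RF=3: quorum is 2, so the write above succeeds with Node3 down
        System.out.println(quorumFor(3));        // 2
        System.out.println(writeSucceeds(2, 3)); // true
        System.out.println(writeSucceeds(1, 3)); // false
    }
}
```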
Hinted Handoff in Action:
// On Node 1 (coordinator)
hintedHandoff.storeHint(
    "node-3",        // Target node
    "user:alice",    // Key
    "Alice Johnson", // Value
    timestamp        // When it was written
);
// Hint is persisted to disk:
// /var/lib/cube/hints/node-3/user_alice-1234567890.hint
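A hint record along these lines could back the file above. This is an illustrative sketch; the `Hint` class shape and field names are assumptions, not Cube's actual implementation:

```java
// Illustrative hint record; the class shape is an assumption, not Cube's API.
public class Hint {
    final String targetNode; // e.g. "node-3"
    final String key;        // e.g. "user:alice"
    final String value;      // e.g. "Alice Johnson"
    final long timestamp;    // when the original write happened (millis)
    final long windowMillis; // how long the hint remains replayable

    public Hint(String targetNode, String key, String value,
                long timestamp, long windowMillis) {
        this.targetNode = targetNode;
        this.key = key;
        this.value = value;
        this.timestamp = timestamp;
        this.windowMillis = windowMillis;
    }

    // Expired hints are skipped during replay and eventually purged.
    public boolean isExpired(long now) {
        return now - timestamp > windowMillis;
    }
}
```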
Multiple writes happen:
Time T1: PUT user:bob "Bob Smith" CL=QUORUM
  → Writes to Node1, Node2
  → Hint stored for Node3

Time T2: PUT user:charlie "Charlie Brown" CL=QUORUM
  → Writes to Node1, Node2
  → Hint stored for Node3

Time T3: UPDATE user:alice SET value="Alice J." CL=QUORUM
  → Writes to Node1, Node2
  → Hint stored for Node3

State:
Node1: alice=Alice J., bob=Bob Smith, charlie=Charlie Brown
Node2: alice=Alice J., bob=Bob Smith, charlie=Charlie Brown
Node3: [DOWN - has old data]

Hints for Node3:
- user:alice (v2, timestamp T3)
- user:bob (timestamp T1)
- user:charlie (timestamp T2)
Node 3 recovers:
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Node 1 │ │ Node 2 │ │ Node 3 │
│ ALIVE │ │ ALIVE │ │ ALIVE │
└─────────┘ └─────────┘ └─────────┘
✓ ✓ ✓ (recovered)
// HintedHandoffManager detects Node3 is alive
// Automatically triggers replay
for (Hint hint : hintsForNode3) {
    if (!hint.isExpired()) {
        // Send the missed write to Node3
        boolean success = sendHintToNode(
            node3,
            hint.getKey(),
            hint.getValue(),
            hint.getTimestamp()
        );
        if (success) {
            // Delete hint file
            deleteHint(hint);
        }
    }
}
Replay Process:
Step 1: Node 1 detects Node 3 is alive
→ Heartbeat received from Node 3
Step 2: Trigger hint replay
→ hintedHandoff.replayHintsForNode("node-3")
Step 3: Replay each hint in order
Hint 1: user:alice = "Alice J." (timestamp T3)
→ Node3.put("user:alice", "Alice J.", T3)
→ ✓ Success
Hint 2: user:bob = "Bob Smith" (timestamp T1)
→ Node3.put("user:bob", "Bob Smith", T1)
→ ✓ Success
Hint 3: user:charlie = "Charlie Brown" (timestamp T2)
→ Node3.put("user:charlie", "Charlie Brown", T2)
→ ✓ Success
Step 4: Delete replayed hints
→ All hints successfully replayed and deleted
After Hint Replay:
Node1: alice=Alice J., bob=Bob Smith, charlie=Charlie Brown
Node2: alice=Alice J., bob=Bob Smith, charlie=Charlie Brown
Node3: alice=Alice J., bob=Bob Smith, charlie=Charlie Brown
✓ NOW IN SYNC!
Even with hinted handoff, some data might be missed. Read repair fixes this:
// Client reads user:alice with CL=QUORUM
GET user:alice CL=QUORUM

Flow:
1. Coordinator reads from Node1, Node2, Node3
2. Receives responses:
   Node1: "Alice J." (timestamp T3)
   Node2: "Alice J." (timestamp T3)
   Node3: "Alice" (timestamp T0) ← OLD DATA!
3. Read Repair detects inconsistency
   - Canonical value: "Alice J." (newest timestamp T3)
   - Stale replica: Node3
4. Async repair triggered:
   → Send "Alice J." to Node3
   → Node3 updates to latest value
5. Return to client:
   → Value: "Alice J."
   → Repair performed in background
Read Repair Implementation:
// In ReadRepairManager
List<ReadResponse> responses = new ArrayList<>();
responses.add(new ReadResponse(node1, "user:alice", "Alice J.", T3));
responses.add(new ReadResponse(node2, "user:alice", "Alice J.", T3));
responses.add(new ReadResponse(node3, "user:alice", "Alice", T0)); // Stale
// Detect conflicts
ReadRepairResult result = readRepair.performReadRepair(responses,
    (node, key, value, timestamp) -> {
        // Repair Node3
        node.put(key, value, timestamp);
        return true;
    }
);
// Result:
// - Canonical value: "Alice J."
// - Repaired nodes: 1 (Node3)
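The canonical-value selection above boils down to last-write-wins on timestamps. A minimal, self-contained sketch (the `ReadResponse` record and helper names here are illustrative, not Cube's actual classes):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class LastWriteWins {
    // One replica's answer to a read; illustrative structure.
    public record ReadResponse(String node, String value, long timestamp) {}

    // Canonical value = the response carrying the newest timestamp.
    public static ReadResponse canonical(List<ReadResponse> responses) {
        return Collections.max(responses,
                Comparator.comparingLong(ReadResponse::timestamp));
    }

    // Any replica older than the canonical timestamp needs repair.
    public static List<String> staleNodes(List<ReadResponse> responses) {
        long newest = canonical(responses).timestamp();
        List<String> stale = new ArrayList<>();
        for (ReadResponse r : responses) {
            if (r.timestamp() < newest) stale.add(r.node());
        }
        return stale;
    }
}
```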
WRITE PHASE (Node 3 is DOWN):

Client → PUT user:alice "Alice" CL=QUORUM
              │
              ▼
┌──────────┐         ┌──────────┐       ┌──────────┐
│ Node 1   │ ──────→ │ Node 2   │   ✗   │ Node 3   │
│ (Coord)  │         │          │       │ (DOWN)   │
└──────────┘         └──────────┘       └──────────┘
 ✓ Write              ✓ Write            ✗ Timeout
 alice=Alice          alice=Alice        (no data)
              │
              ▼
Store hint for Node3:
/hints/node-3/alice.hint

RECOVERY PHASE (Node 3 comes back):

Node 3 sends heartbeat
              │
              ▼
┌──────────┐                           ┌──────────┐
│ Node 1   │ ──── Replay Hints ──────→ │ Node 3   │
└──────────┘                           └──────────┘
 Load hints from disk                   ✓ Receive
 /hints/node-3/alice.hint               alice=Alice

Result: Node 3 now has alice=Alice

READ REPAIR PHASE (if hint missed or failed):

Client → GET user:alice CL=QUORUM
              │
              ▼
Read from all replicas:
┌──────────┐  ┌──────────┐  ┌──────────┐
│ Node 1   │  │ Node 2   │  │ Node 3   │
│ alice=v2 │  │ alice=v2 │  │ alice=v1 │ ← STALE!
│ t=T2     │  │ t=T2     │  │ t=T1     │
└──────────┘  └──────────┘  └──────────┘
              │
              ▼
Compare responses → newest: v2, t=T2
              │
              ▼
Async repair: Node3.put(alice, v2, T2)
              │
              ▼
✓ All nodes in sync!
// Initialize hinted handoff manager
HintedHandoffManager hintedHandoff = new HintedHandoffManager(
    "/var/lib/cube/hints", // Hints directory
    10000,                 // Max 10K hints per node
    3600000                // 1 hour hint window
);
// Hints are automatically:
// - Persisted to disk
// - Replayed when node recovers
// - Deleted after successful replay
// - Expired after hint window
// Background thread checks for node recovery
ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
scheduler.scheduleAtFixedRate(() -> {
    for (ClusterNode node : cluster.getNodes()) {
        if (node.isAlive() && hintedHandoff.getHintCount(node.getId()) > 0) {
            // Node is alive and has pending hints
            hintedHandoff.replayHintsForNode(node.getId(), hint -> {
                // Send hint to recovered node
                return sendToNode(node, hint.getKey(), hint.getValue());
            });
        }
    }
}, 10, 10, TimeUnit.SECONDS); // Check every 10 seconds
// Configure read repair probability
ReadRepairManager readRepair = new ReadRepairManager(10); // 10% chance
// On every read:
if (readRepair.shouldPerformReadRepair()) {
    // Perform read repair
    compareResponses();
    repairStaleReplicas();
}
Timeline:
T0: Node3 goes down
T1-T5: Writes happen (hints stored)
T6: Node3 comes back (5 minutes later)

Recovery:
✓ Hinted handoff replays all missed writes
✓ Node3 catches up in seconds
✓ Read repair handles any edge cases
Timeline:
T0: Node3 goes down
T1-T100: Many writes happen
T101: Node3 comes back (2 hours later)

Recovery:
✓ Hinted handoff replays accumulated hints
✓ May take a few minutes depending on hint count
✓ Read repair fixes any missed updates
Timeline:
T0: Node3 goes down
T1-T1000: Massive number of writes
T1001: Node3 comes back (3 days later)

Issues:
✗ Hints may have expired (hint window: 1 hour default)
✗ Too many hints to store (max hints: 10K per node)

Recovery:
⚠ Hinted handoff may be incomplete
✓ Read repair will fix data over time
✓ Manual repair may be needed (nodetool repair)
Initial:
DC1: Node1, Node2 (can communicate)
DC2: Node3 (isolated)

Writes in DC1:
✓ CL=QUORUM succeeds (2/3 nodes)
✓ Hints stored for Node3

When partition heals:
✓ Hints replayed to Node3
✓ Read repair fixes inconsistencies
// Short-lived outages (default)
HintedHandoffManager(hintsDir, 10000, 3600000);   // 1 hour

// Medium-lived outages
HintedHandoffManager(hintsDir, 50000, 7200000);   // 2 hours

// Long-lived outages (not recommended)
HintedHandoffManager(hintsDir, 100000, 86400000); // 24 hours
Recommendation: 1-3 hours is optimal. Longer windows risk unbounded hint storage on disk and a replay storm when the node finally recovers.
// Low traffic, strong consistency needed
ReadRepairManager(100); // 100% - all reads trigger repair

// Balanced (recommended)
ReadRepairManager(10);  // 10% - good balance

// High traffic, eventual consistency OK
ReadRepairManager(1);   // 1% - minimal overhead
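One plausible way such a percentage gate could work internally is by sampling a random number per read. This is an illustrative sketch, not the actual ReadRepairManager implementation:

```java
import java.util.Random;

public class ReadRepairSampler {
    private final int percent; // 0-100: chance a read triggers repair
    private final Random rng;

    public ReadRepairSampler(int percent, Random rng) {
        this.percent = percent;
        this.rng = rng;
    }

    // Sampling keeps repair overhead proportional to the configured rate:
    // nextInt(100) is uniform over 0-99, so the branch is taken ~percent% of reads.
    public boolean shouldPerformReadRepair() {
        return rng.nextInt(100) < percent;
    }
}
```

With `percent = 10`, roughly one read in ten pays the cost of comparing all replica responses; the rest return immediately after reaching the consistency level.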
// Via shell
cube> STATS
Output:
Pending Hints: 45
Hints for node-3: 45
// Via API
GET /api/v1/replication/hints
Response:
{
  "totalHints": 45,
  "nodeHints": {
    "node-3": 45
  }
}
// Read from all replicas
GET user:alice CL=ALL

// If all nodes return same value:
✓ Nodes are in sync

// If values differ:
⚠ Read repair will fix automatically
// Force full synchronization
POST /api/v1/replication/repair
{
  "keyspace": "users",
  "table": "profiles"
}
Hinted Handoff (Primary)
Read Repair (Secondary)
Anti-Entropy Repair (Tertiary - Phase 3)
Node Down
    ↓
Writes continue (hints stored)
    ↓
Node Recovers
    ↓
Hints replayed (seconds to minutes)
    ↓
Read repair fixes edge cases (ongoing)
    ↓
Fully Synchronized ✓
✅ Automatic - No manual intervention needed for short outages
✅ Fast - Hints replay in seconds for normal outages
✅ Reliable - Multiple layers ensure eventual consistency
✅ Configurable - Tune hint windows and read repair probability
Your data stays synchronized even when nodes fail! 🎉