# Data Synchronization: Node Down and Recovery in Cube Database

## Overview

When a node goes down and comes back up, Cube database uses **three mechanisms** to ensure data synchronization:

1. **Hinted Handoff** - Store missed writes and replay them
2. **Read Repair** - Fix inconsistencies during reads
3. **Anti-Entropy Repair** - Periodic full synchronization (Phase 3 feature)

---

## How It Works: Complete Flow

### Scenario: 3-Node Cluster, Node 3 Goes Down

```
Initial State:
┌─────────┐  ┌─────────┐  ┌─────────┐
│ Node 1  │  │ Node 2  │  │ Node 3  │
│  ALIVE  │  │  ALIVE  │  │  ALIVE  │
└─────────┘  └─────────┘  └─────────┘
     ✓            ✓            ✓
```

---

## Phase 1: Node Goes Down

```
Node 3 crashes or loses network:
┌─────────┐  ┌─────────┐  ┌─────────┐
│ Node 1  │  │ Node 2  │  │ Node 3  │
│  ALIVE  │  │  ALIVE  │  │  DOWN   │
└─────────┘  └─────────┘  └─────────┘
     ✓            ✓            ✗
```

### What Happens to New Writes?

#### Write with CL=QUORUM (2 of 3 nodes)

```java
// Client writes a new key
PUT user:alice "Alice Johnson" CL=QUORUM

Flow:
1. Coordinator (Node 1) receives write
2. Determines replicas: [Node1, Node2, Node3]
3. Sends write to all 3 nodes
4. Node1: ✓ Success (local write)
5. Node2: ✓ Success (network write)
6. Node3: ✗ Timeout (node is down)
7. Success count: 2/3
8. Required for QUORUM: 2/3
9. Result: ✓ WRITE SUCCEEDS

// Store hint for Node 3
10. Coordinator stores "hint" for Node3:
    - Key: user:alice
    - Value: "Alice Johnson"
    - Timestamp: 1234567890
    - Target: Node3
```

**Hinted Handoff in Action:**

```java
// On Node 1 (coordinator)
hintedHandoff.storeHint(
    "node-3",              // Target node
    "user:alice",          // Key
    "Alice Johnson",       // Value
    timestamp              // When it was written
);

// Hint is persisted to disk:
// /var/lib/cube/hints/node-3/user_alice-1234567890.hint
```

---

## Phase 2: While Node 3 is Down

Multiple writes happen:

```
Time T1: PUT user:bob "Bob Smith" CL=QUORUM
  → Writes to Node1, Node2
  → Hint stored for Node3

Time T2: PUT user:charlie "Charlie Brown" CL=QUORUM
  → Writes to Node1, Node2
  → Hint stored for Node3

Time T3: UPDATE user:alice SET value="Alice J." CL=QUORUM
  → Writes to Node1, Node2
  → Hint stored for Node3

State:
Node1: alice=Alice J., bob=Bob Smith, charlie=Charlie Brown
Node2: alice=Alice J., bob=Bob Smith, charlie=Charlie Brown
Node3: [DOWN - has old data]

Hints for Node3:
  - user:alice (v2, timestamp T3)
  - user:bob (timestamp T1)
  - user:charlie (timestamp T2)
```

---

## Phase 3: Node 3 Comes Back Online

```
Node 3 recovers:
┌─────────┐  ┌─────────┐  ┌─────────┐
│ Node 1  │  │ Node 2  │  │ Node 3  │
│  ALIVE  │  │  ALIVE  │  │  ALIVE  │
└─────────┘  └─────────┘  └─────────┘
     ✓            ✓            ✓ (recovered)
```

### Automatic Hint Replay

```java
// HintedHandoffManager detects Node3 is alive
// Automatically triggers replay

for (Hint hint : hintsForNode3) {
    if (!hint.isExpired()) {
        // Send hint to Node3
        boolean success = sendHintToNode(
            node3,
            hint.getKey(),
            hint.getValue(),
            hint.getTimestamp()
        );
        
        if (success) {
            // Delete hint file
            deleteHint(hint);
        }
    }
}
```

**Replay Process:**

```
Step 1: Node 1 detects Node 3 is alive
  → Heartbeat received from Node 3

Step 2: Trigger hint replay
  → hintedHandoff.replayHintsForNode("node-3")

Step 3: Replay each hint in order
  Hint 1: user:alice = "Alice J." (timestamp T3)
    → Node3.put("user:alice", "Alice J.", T3)
    → ✓ Success

  Hint 2: user:bob = "Bob Smith" (timestamp T1)
    → Node3.put("user:bob", "Bob Smith", T1)
    → ✓ Success

  Hint 3: user:charlie = "Charlie Brown" (timestamp T2)
    → Node3.put("user:charlie", "Charlie Brown", T2)
    → ✓ Success

Step 4: Delete replayed hints
  → All hints successfully replayed and deleted
```

**After Hint Replay:**

```
Node1: alice=Alice J., bob=Bob Smith, charlie=Charlie Brown
Node2: alice=Alice J., bob=Bob Smith, charlie=Charlie Brown
Node3: alice=Alice J., bob=Bob Smith, charlie=Charlie Brown
       ✓ NOW IN SYNC!
```

---

## Read Repair: Catching Missed Updates

Even with hinted handoff, some data might be missed. Read repair fixes this:

### Example: Read After Node Recovery

```java
// Client reads user:alice with CL=QUORUM
GET user:alice CL=QUORUM

Flow:
1. Coordinator reads from Node1, Node2, Node3
2. Receives responses:
   Node1: "Alice J." (timestamp T3)
   Node2: "Alice J." (timestamp T3)
   Node3: "Alice"    (timestamp T0) ← OLD DATA!

3. Read Repair detects inconsistency
   - Canonical value: "Alice J." (newest timestamp T3)
   - Stale replica: Node3

4. Async repair triggered:
   → Send "Alice J." to Node3
   → Node3 updates to latest value

5. Return to client:
   → Value: "Alice J."
   → Repair performed in background
```

**Read Repair Implementation:**

```java
// In ReadRepairManager
List<ReadResponse> responses = new ArrayList<>();
responses.add(new ReadResponse(node1, "user:alice", "Alice J.", T3));
responses.add(new ReadResponse(node2, "user:alice", "Alice J.", T3));
responses.add(new ReadResponse(node3, "user:alice", "Alice", T0)); // Stale

// Detect conflicts
ReadRepairResult result = readRepair.performReadRepair(responses, 
    (node, key, value, timestamp) -> {
        // Repair Node3
        node.put(key, value, timestamp);
        return true;
    }
);

// Result:
// - Canonical value: "Alice J."
// - Repaired nodes: 1 (Node3)
```

---

## Complete Synchronization Flow Diagram

```
┌──────────────────────────────────────────────────────────────┐
│  WRITE PHASE (Node 3 is DOWN)                               │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  Client → PUT user:alice "Alice" CL=QUORUM                  │
│     │                                                        │
│     ▼                                                        │
│  ┌──────────┐         ┌──────────┐         ┌──────────┐    │
│  │  Node 1  │ ──────→ │  Node 2  │    ✗    │  Node 3  │    │
│  │ (Coord)  │         │          │         │  (DOWN)  │    │
│  └──────────┘         └──────────┘         └──────────┘    │
│     │ ✓ Write            │ ✓ Write            │ ✗ Timeout  │
│     │                    │                    │             │
│     ▼                    ▼                    │             │
│  alice=Alice         alice=Alice              │ (no data)   │
│     │                                         │             │
│     └─────────────────────────────────────────┘             │
│                    │                                        │
│                    ▼                                        │
│            Store Hint for Node3                            │
│            /hints/node-3/alice.hint                        │
│                                                              │
└──────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────┐
│  RECOVERY PHASE (Node 3 comes back)                         │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  Node 3 sends heartbeat                                     │
│     │                                                        │
│     ▼                                                        │
│  ┌──────────┐                          ┌──────────┐         │
│  │  Node 1  │  ─── Replay Hints ────→  │  Node 3  │         │
│  │          │                          │          │         │
│  └──────────┘                          └──────────┘         │
│     │                                      │                │
│     │ Load hints from disk                │ ✓ Receive       │
│     │ /hints/node-3/alice.hint            │   alice=Alice   │
│     │                                      │                │
│     └──────────────────────────────────────┘                │
│                                                              │
│  Result: Node 3 now has alice=Alice                         │
│                                                              │
└──────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────┐
│  READ REPAIR PHASE (if hint missed or failed)               │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  Client → GET user:alice CL=QUORUM                          │
│     │                                                        │
│     ▼                                                        │
│  Read from all replicas:                                    │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐              │
│  │  Node 1  │    │  Node 2  │    │  Node 3  │              │
│  │alice=v2  │    │alice=v2  │    │alice=v1  │ ← STALE!     │
│  │t=T2      │    │t=T2      │    │t=T1      │              │
│  └──────────┘    └──────────┘    └──────────┘              │
│     │                │                │                     │
│     └────────────────┴────────────────┘                     │
│                      │                                      │
│                      ▼                                      │
│              Compare responses                              │
│              Newest: v2, t=T2                              │
│                      │                                      │
│                      ▼                                      │
│              Async repair Node3                            │
│              Node3.put(alice, v2, T2)                      │
│                      │                                      │
│                      ▼                                      │
│              ✓ All nodes in sync!                          │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```

---

## Implementation Details

### 1. Hinted Handoff Configuration

```java
// Initialize hinted handoff manager
HintedHandoffManager hintedHandoff = new HintedHandoffManager(
    "/var/lib/cube/hints",  // Hints directory
    10000,                   // Max 10K hints per node
    3600000                  // 1 hour hint window
);

// Hints are automatically:
// - Persisted to disk
// - Replayed when node recovers
// - Deleted after successful replay
// - Expired after hint window
```

### 2. Automatic Hint Replay

```java
// Background thread checks for node recovery
ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
scheduler.scheduleAtFixedRate(() -> {
    for (ClusterNode node : cluster.getNodes()) {
        if (node.isAlive() && hintedHandoff.getHintCount(node.getId()) > 0) {
            // Node is alive and has pending hints
            hintedHandoff.replayHintsForNode(node.getId(), hint -> {
                // Send hint to recovered node
                return sendToNode(node, hint.getKey(), hint.getValue());
            });
        }
    }
}, 10, 10, TimeUnit.SECONDS); // Check every 10 seconds
```

### 3. Read Repair Probability

```java
// Configure read repair probability
ReadRepairManager readRepair = new ReadRepairManager(10); // 10% chance

// On every read:
if (readRepair.shouldPerformReadRepair()) {
    // Perform read repair
    compareResponses();
    repairStaleReplicas();
}
```

---

## Handling Different Failure Scenarios

### Scenario 1: Short Outage (Minutes)

```
Timeline:
T0: Node3 goes down
T1-T5: Writes happen (hints stored)
T6: Node3 comes back (5 minutes later)

Recovery:
✓ Hinted handoff replays all missed writes
✓ Node3 catches up in seconds
✓ Read repair handles any edge cases
```

### Scenario 2: Medium Outage (Hours)

```
Timeline:
T0: Node3 goes down
T1-T100: Many writes happen
T101: Node3 comes back (2 hours later)

Recovery:
✓ Hinted handoff replays accumulated hints
✓ May take a few minutes depending on hint count
✓ Read repair fixes any missed updates
```

### Scenario 3: Long Outage (Days)

```
Timeline:
T0: Node3 goes down
T1-T1000: Massive number of writes
T1001: Node3 comes back (3 days later)

Issues:
✗ Hints may have expired (hint window: 1 hour default)
✗ Too many hints to store (max hints: 10K per node)

Recovery:
⚠ Hinted handoff may be incomplete
✓ Read repair will fix data over time
✓ Manual repair may be needed (nodetool repair)
```

### Scenario 4: Network Partition

```
Initial:
DC1: Node1, Node2 (can communicate)
DC2: Node3 (isolated)

Writes in DC1:
✓ CL=QUORUM succeeds (2/3 nodes)
✓ Hints stored for Node3

When partition heals:
✓ Hints replayed to Node3
✓ Read repair fixes inconsistencies
```

---

## Configuration Best Practices

### Hint Window Configuration

```java
// Short-lived outages (default)
HintedHandoffManager(hintsDir, 10000, 3600000);  // 1 hour

// Medium-lived outages
HintedHandoffManager(hintsDir, 50000, 7200000);  // 2 hours

// Long-lived outages (not recommended)
HintedHandoffManager(hintsDir, 100000, 86400000); // 24 hours
```

**Recommendation**: 1-3 hours is optimal. Longer windows risk:
- Too much disk space for hints
- Slower replay times
- Stale hints that may conflict

### Read Repair Probability

```java
// Low traffic, strong consistency needed
ReadRepairManager(100);  // 100% - all reads trigger repair

// Balanced (recommended)
ReadRepairManager(10);   // 10% - good balance

// High traffic, eventual consistency OK
ReadRepairManager(1);    // 1% - minimal overhead
```

---

## Monitoring and Verification

### Check Hint Status

```java
// Via shell
cube> STATS

Output:
Pending Hints:         45
Hints for node-3:      45

// Via API
GET /api/v1/replication/hints

Response:
{
  "totalHints": 45,
  "nodeHints": {
    "node-3": 45
  }
}
```

### Verify Synchronization

```java
// Read from all replicas
GET user:alice CL=ALL

// If all nodes return same value:
✓ Nodes are in sync

// If values differ:
⚠ Read repair will fix automatically
```

### Manual Repair (if needed)

```java
// Force full synchronization
POST /api/v1/replication/repair

{
  "keyspace": "users",
  "table": "profiles"
}
```

---

## Summary

### Data Synchronization Mechanisms:

1. **Hinted Handoff** (Primary)
   - Stores missed writes while node is down
   - Automatically replays when node recovers
   - Fast and efficient for short outages

2. **Read Repair** (Secondary)
   - Fixes inconsistencies during reads
   - Probabilistic (configurable %)
   - Catches anything hints missed

3. **Anti-Entropy Repair** (Tertiary - Phase 3)
   - Full table scan and comparison
   - Expensive but comprehensive
   - Used for long outages

### Recovery Timeline:

```
Node Down
   ↓
Writes continue (hints stored)
   ↓
Node Recovers
   ↓
Hints replayed (seconds to minutes)
   ↓
Read repair fixes edge cases (ongoing)
   ↓
Fully Synchronized ✓
```

### Key Points:

✅ **Automatic** - No manual intervention needed for short outages  
✅ **Fast** - Hints replay in seconds for normal outages  
✅ **Reliable** - Multiple layers ensure eventual consistency  
✅ **Configurable** - Tune hint windows and read repair probability  

**Your data stays synchronized even when nodes fail!** 🎉
