Dragonfly latency spikes during full sync replication

**Describe the bug**
When a DragonflyDB replica performs a full sync from a primary instance holding 32 million keys, we observe significant latency spikes on the primary during the sync: p99 response time jumps from 4ms to 50ms. The spikes occur even under moderate load (around 132K ops/sec) and degrade our application's performance and user experience.

**To Reproduce**
Steps to reproduce the behavior:
1. Start a DragonflyDB primary instance with the following command:
   ```
   docker run --rm --name dfly0 -d -p 16379:6379 --ulimit memlock=-1 --cpus=64 docker.dragonflydb.io/dragonflydb/dragonfly:v1.35.1 --cache_mode=true --maxmemory=64g --point_in_time_snapshot=false --background_snapshotting=false --dbfilename=
   ```
2. Populate the primary with 32 million keys using the data population script (`populate_data.py`, listed under "Reproducible Code Snippet").
3. Start a DragonflyDB replica instance with the following command:
   ```
   docker run --rm --name dfly1 -d -p 26379:6379 --ulimit memlock=-1 --cpus=64 docker.dragonflydb.io/dragonflydb/dragonfly:v1.35.1 --cache_mode=true --maxmemory=64g --point_in_time_snapshot=false --background_snapshotting=false --dbfilename=
   ```
4. Use `memtier_benchmark` to generate a load of around 132K ops/sec on the primary with the following command:
   ```
   docker run --rm -d -v /$(pwd):/home redislabs/memtier_benchmark -s <primary_ip> -p 16379 --data-size=256  --command="MGET __key__ __key__ __key__ __key__ __key__" --command-ratio=10 --command="SET __key__ __data__" --command-ratio=1 --test-time=60 --rate-limiting=660 --json-out-file=/home/memtier_out.json
   ```
5. Configure the replica to sync from the primary:
   ```
   redis-cli -h <replica_ip> -p 26379 replicaof <primary_ip> 16379
   ```
6. Monitor the p99 latency on the primary using `memtier_benchmark` output.
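
The load that the `memtier_benchmark` command in step 4 offers can be worked out from its flags. The sketch below assumes memtier's default of 4 threads and 50 clients per thread (neither is set in the command, so treat both as assumptions) and that `--rate-limiting` applies per connection. The function names are illustrative only.

```python
# Assumed memtier_benchmark defaults; the command in step 4 does not set them.
THREADS = 4
CLIENTS_PER_THREAD = 50
# Taken from the command: --rate-limiting=660 (requests/sec per connection).
RATE_PER_CONNECTION = 660
# Taken from the command: --command-ratio=10 for MGET, 1 for SET.
COMMAND_RATIOS = {"MGET": 10, "SET": 1}


def offered_load(threads, clients_per_thread, rate_per_connection):
    """Total requests/sec offered across all connections."""
    return threads * clients_per_thread * rate_per_connection


def command_mix(total_ops, ratios):
    """Split the total rate across commands according to their ratios."""
    weight = sum(ratios.values())
    return {name: total_ops * r / weight for name, r in ratios.items()}


total = offered_load(THREADS, CLIENTS_PER_THREAD, RATE_PER_CONNECTION)
mix = command_mix(total, COMMAND_RATIOS)
print(f"total: {total} ops/sec")
for name, rate in mix.items():
    print(f"{name}: {rate:.0f} ops/sec")
```

Under these assumptions the total is 132,000 ops/sec, matching the figure in the description. Each `MGET` reads five keys, so the key lookup rate is higher than the command rate.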


**Expected behavior**
The p99 latency on the primary should remain stable or show minimal increase during the replica's full sync process, ideally staying below 10ms.
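
To check this expectation independently of `memtier_benchmark`, a small probe can time requests against the primary while the replica syncs. This is a minimal sketch, not the measurement method used for the screenshots: `probe`, the `probe:key` key name, and the 1-second window are hypothetical choices, and a single-connection probe will not reproduce memtier's p99 across all of its connections.

```python
import time


def p99(samples_ms):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    ordered = sorted(samples_ms)
    rank = -(-99 * len(ordered) // 100)  # ceil(0.99 * n) in integer arithmetic
    return ordered[rank - 1]


def probe(host, port, seconds=60, threshold_ms=10.0, key=b"probe:key"):
    """Issue GETs back to back and print the p99 of each 1-second window."""
    import redis  # redis-py; imported here so p99() is usable without it

    r = redis.Redis(host=host, port=port)
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        window_end = time.monotonic() + 1.0
        samples = []
        while time.monotonic() < window_end:
            start = time.perf_counter()
            r.get(key)  # the key need not exist; only the round trip is timed
            samples.append((time.perf_counter() - start) * 1000.0)
        value = p99(samples)
        flag = "  <-- above threshold" if value > threshold_ms else ""
        print(f"{time.strftime('%H:%M:%S')} n={len(samples)} p99={value:.2f}ms{flag}")
```

Calling `probe("<primary_ip>", 16379)` just before issuing `replicaof` on the replica prints one line per second, so the windows that exceed 10ms can be lined up against the start and end of the full sync.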

**Screenshots**
<img width="718" height="352" alt="Image" src="https://github.com/user-attachments/assets/75e2fe8f-46be-4753-80c3-e5b63ff0ebab" />
<img width="718" height="350" alt="Image" src="https://github.com/user-attachments/assets/cb27a9ca-aa4b-4b38-9deb-2537053a66f7" />

**Environment (please complete the following information):**
 - OS: Ubuntu 22.04
 - Kernel: Linux as-fscache-3043 5.15.0-157-generic #167-Ubuntu SMP Wed Sep 17 21:35:53 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
 - Containerized?: Yes, Docker 28.5.1
 - Dragonfly Version: v1.35.1

**Reproducible Code Snippet**
Script to populate data (populate_data.py):
```
import redis
import os
import base64

# ---------- Config ----------
REDIS_HOST = '10.143.160.122'
REDIS_PORT = 16379
DB = 0

TOTAL_KEYS = 32_000_000
BATCH_SIZE = 10_000          # increase batch size
KEY_PREFIX = b"rand:"        # use bytes directly


# ---------- Helpers ----------
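# Fast random generator using os.urandom + base64
def random_bytes(n: int) -> bytes:
    # base64 emits 4 chars per 3 input bytes, so int(0.8 * n) random bytes
    # yield at least n chars for the lengths used here (16 and 1024); trim to n
    return base64.urlsafe_b64encode(os.urandom(int(n * 0.8)))[:n]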


def random_key_bytes(length: int = 16) -> bytes:
    return KEY_PREFIX + random_bytes(length)


def random_value_bytes(length: int = 1024) -> bytes:
    return random_bytes(length)


# ---------- Main ----------
def main():
    # decode_responses=False to avoid encoding/decoding overhead
    r = redis.Redis(
        host=REDIS_HOST,
        port=REDIS_PORT,
        db=DB,
        decode_responses=False,
    )

    # transaction=False to avoid MULTI/EXEC
    pipe = r.pipeline(transaction=False)

    for i in range(1, TOTAL_KEYS + 1):
        k = random_key_bytes()
        v = random_value_bytes()
        pipe.set(k, v)

        if i % BATCH_SIZE == 0:
            pipe.execute()
            # print(f"Inserted {i} keys...")

    # flush remaining
    pipe.execute()
    print("Done.")


if __name__ == "__main__":
    main()
```
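
For a sense of how much data the full sync has to transfer, the raw payload written by the script can be estimated from its constants. This is a rough sketch: it counts only key and value bytes, ignores Dragonfly's per-entry overhead and any compression during replication, and assumes no key collisions among the random keys.

```python
# Constants copied from populate_data.py.
KEY_PREFIX = b"rand:"
TOTAL_KEYS = 32_000_000
KEY_RANDOM_LEN = 16   # default length in random_key_bytes()
VALUE_LEN = 1024      # default length in random_value_bytes()


def payload_bytes(total_keys, key_len, value_len):
    """Raw key + value bytes, excluding any server-side overhead."""
    return total_keys * (key_len + value_len)


key_len = len(KEY_PREFIX) + KEY_RANDOM_LEN
total = payload_bytes(TOTAL_KEYS, key_len, VALUE_LEN)
print(f"key length: {key_len} bytes")
print(f"raw payload: {total / 1e9:.2f} GB")
```

This puts the raw payload at roughly 33 GB, which fits under the configured `--maxmemory=64g`.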

**Additional context**
I found another issue report that seems related: #4787, which describes DragonflyDB becoming unresponsive during full sync replication. That issue appears to be fixed already, but the latency spikes still occur in our case.

Dragonfly latency spikes during full sync replication #6131
