Description
Describe the bug
When setting up a DragonflyDB replica to sync from a primary instance with 32 million keys, we observe significant latency spikes (p99 response time jumps from 4ms to 50ms) on the primary during the full sync process. This issue persists even under moderate load conditions (around 132K ops/sec). The latency spikes are detrimental to our application's performance and user experience.
To Reproduce
Steps to reproduce the behavior:
- Start a DragonflyDB primary instance with the command below:
  docker run --rm --name dfly0 -d -p 16379:6379 --ulimit memlock=-1 --cpus=64 docker.dragonflydb.io/dragonflydb/dragonfly:v1.35.1 --cache_mode=true --maxmemory=64g --point_in_time_snapshot=false --background_snapshotting=false --dbfilename=
- Populate the primary with 32 million keys using a data population script.
- Start a DragonflyDB replica instance with the command below:
  docker run --rm --name dfly1 -d -p 26379:6379 --ulimit memlock=-1 --cpus=64 docker.dragonflydb.io/dragonflydb/dragonfly:v1.35.1 --cache_mode=true --maxmemory=64g --point_in_time_snapshot=false --background_snapshotting=false --dbfilename=
- Use memtier_benchmark to generate a load of around 132K ops/sec on the primary:
  docker run --rm -d -v /$(pwd):/home redislabs/memtier_benchmark -s <primary_ip> -p 16379 --data-size=256 --command="MGET __key__ __key__ __key__ __key__ __key__" --command-ratio=10 --command="SET __key__ __data__" --command-ratio=1 --test-time=60 --rate-limiting=660 --json-out-file=/home/memtier_out.json
- Configure the replica to sync from the primary:
  redis-cli -h <replica_ip> -p 26379 replicaof <primary_ip> 16379
- Monitor the p99 latency on the primary using the memtier_benchmark output.
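As a sanity check on the offered load, the arithmetic behind the memtier_benchmark flags can be sketched as follows. This assumes memtier_benchmark's defaults of 4 threads and 50 clients per thread (neither is set in the command above) and that --rate-limiting caps requests per second per connection; the function name is illustrative only.

```python
# Rough offered-load estimate for the memtier_benchmark command above.
# Assumptions (not stated in the command itself): memtier_benchmark defaults
# of 4 threads and 50 clients per thread, and --rate-limiting applying to
# each connection individually.
THREADS = 4
CLIENTS_PER_THREAD = 50
RATE_LIMIT_PER_CONNECTION = 660  # --rate-limiting=660
MGET_RATIO = 10                  # --command-ratio for MGET
SET_RATIO = 1                    # --command-ratio for SET


def offered_load(threads: int, clients_per_thread: int, rate_per_connection: int) -> int:
    """Total requests per second across all connections."""
    return threads * clients_per_thread * rate_per_connection


total_ops = offered_load(THREADS, CLIENTS_PER_THREAD, RATE_LIMIT_PER_CONNECTION)
mget_ops = total_ops * MGET_RATIO // (MGET_RATIO + SET_RATIO)
set_ops = total_ops * SET_RATIO // (MGET_RATIO + SET_RATIO)

print(f"total: {total_ops} ops/sec, MGET: {mget_ops} ops/sec, SET: {set_ops} ops/sec")
```

Under those assumptions the total comes to about 132K ops/sec, split roughly 10:1 between five-key MGET and SET commands.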
Expected behavior
The p99 latency on the primary should remain stable or show minimal increase during the replica's full sync process, ideally staying below 10ms.
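The 10ms expectation can be checked with a small probe along these lines. It is a minimal sketch, not the measurement used in this report: it times an arbitrary callable, computes a nearest-rank p99, and compares it to a budget. The helper names (`percentile`, `sample_latency_ms`, `p99_within_budget`) are made up for illustration.

```python
import math
import time
from typing import Callable, List


def percentile(samples: List[float], q: float) -> float:
    """Nearest-rank percentile (q in 0..100) of a non-empty sample list."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def sample_latency_ms(op: Callable[[], object], count: int, pause_s: float = 0.0) -> List[float]:
    """Call op() count times and return each call's wall-clock latency in ms."""
    latencies = []
    for _ in range(count):
        start = time.perf_counter()
        op()
        latencies.append((time.perf_counter() - start) * 1000.0)
        if pause_s:
            time.sleep(pause_s)
    return latencies


def p99_within_budget(samples_ms: List[float], budget_ms: float = 10.0) -> bool:
    """True if the p99 of the samples does not exceed the latency budget."""
    return percentile(samples_ms, 99) <= budget_ms
```

With redis-py, `op` could be something like `redis.Redis(host="<primary_ip>", port=16379).ping`, sampled while the replica performs its full sync. A single-connection probe like this will not reproduce memtier_benchmark's p99 exactly, but it should show whether the primary stalls.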
Environment (please complete the following information):
- OS: Ubuntu 22.04
- Kernel: Linux as-fscache-3043 5.15.0-157-generic #167-Ubuntu SMP Wed Sep 17 21:35:53 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
- Containerized?: Yes, Docker 28.5.1
- Dragonfly Version: v1.35.1
Reproducible Code Snippet
Script to populate data (populate_data.py):
import base64
import os

import redis

# ---------- Config ----------
REDIS_HOST = '10.143.160.122'
REDIS_PORT = 16379
DB = 0
TOTAL_KEYS = 32_000_000
BATCH_SIZE = 10_000  # commands per pipeline flush
KEY_PREFIX = b"rand:"  # use bytes directly


# ---------- Helpers ----------
# Faster random generator using os.urandom + base64
def random_bytes(n: int) -> bytes:
    # base64 expands by ~4/3; trim to exactly n
    return base64.urlsafe_b64encode(os.urandom(int(n * 0.8)))[:n]


def random_key_bytes(length: int = 16) -> bytes:
    return KEY_PREFIX + random_bytes(length)


def random_value_bytes(length: int = 1024) -> bytes:
    return random_bytes(length)


# ---------- Main ----------
def main():
    # decode_responses=False to avoid encoding/decoding overhead
    r = redis.Redis(
        host=REDIS_HOST,
        port=REDIS_PORT,
        db=DB,
        decode_responses=False,
    )
    # transaction=False to avoid MULTI/EXEC
    pipe = r.pipeline(transaction=False)
    for i in range(1, TOTAL_KEYS + 1):
        k = random_key_bytes()
        v = random_value_bytes()
        pipe.set(k, v)
        if i % BATCH_SIZE == 0:
            pipe.execute()
            # print(f"Inserted {i} keys...")
    # flush remaining
    pipe.execute()
    print("Done.")


if __name__ == "__main__":
    main()
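For a sense of how much data the full sync has to move, the raw payload written by the script can be estimated as below. This is a back-of-the-envelope sketch: it counts only key and value bytes as generated by the script (5-byte prefix plus 16 random bytes per key, 1024 bytes per value) and ignores Dragonfly's per-entry overhead, any eviction under --cache_mode, and the keys written later by memtier_benchmark.

```python
# Raw payload estimate for the dataset created by populate_data.py.
# Only key and value bytes are counted; server-side per-entry overhead is
# not included, so actual memory use will be higher.
TOTAL_KEYS = 32_000_000
KEY_PREFIX_LEN = len(b"rand:")  # 5 bytes
KEY_RANDOM_LEN = 16             # default length in random_key_bytes()
VALUE_LEN = 1024                # default length in random_value_bytes()

bytes_per_entry = KEY_PREFIX_LEN + KEY_RANDOM_LEN + VALUE_LEN
payload_bytes = TOTAL_KEYS * bytes_per_entry
payload_gib = payload_bytes / 2**30

print(f"{bytes_per_entry} bytes per entry, {payload_bytes} bytes total ({payload_gib:.1f} GiB)")
```

The result is on the order of 31 GiB of raw payload, which fits within the 64g maxmemory setting but still means the primary must serialize tens of gigabytes during a full sync.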
Additional context
I found another issue report that seems related: #4787, which describes DragonflyDB being unresponsive during full sync replication. That issue appears to be fixed already, but the latency spike problem still persists in our case.

