Status: Open
Labels: `kind/bug` (Issues or changes related to a bug), `needs-triage` (Indicates an issue or PR lacks a `triage/foo` label and requires one.)
Description
Is there an existing issue for this?
- I have searched the existing issues
Environment
- Milvus version: 25cee2bf188255d75f5d83de92ba61bb14564bfe
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:

Current Behavior
A search operation on a multi-replica Milvus cluster took over 50 s when a replica went down.
[2025/12/04 07:51:06.534 +00:00] [WARN] [grpcclient/client.go:385] ["fail to get session"] [clientRequestUnixmsec=1764834666223] [traceID=62b418b4118f551ec30fe91acdd265a8] [clientRole=querynode-29] [error="context canceled"]
I found that the shard client on the proxy keeps retrying session verification and receives an unexpected `context canceled` error.
Expected Behavior
A multi-replica cluster should retry the operation on another replica if the current replica is down, so the request should be retried and succeed once the crashed replica's session is gone.
Steps To Reproduce
Milvus Log
No response
Anything else?
No response