Open
Labels: ci-failure (Issue about an unexpected test failure in CI)
Description
Name of failing test
pip install git+https://github.com/TIGER-AI-Lab/Mantis.git; pytest -v -s models/multimodal/generation/test_common.py -m 'split(group=0) and not core_model'
Basic information
- Flaky test
- Can reproduce locally
- Caused by external libraries (e.g. bug in transformers)
🧪 Describe the failing test
Failing Tests Summary:
ovis2_5 model (test_case8-11)
- Tests: test_single_image_models, test_multi_image_models, test_video_models
- Failure type: Logprob comparison mismatch between vLLM and HuggingFace outputs
- Configuration: dtype=half, model=AIDC-AI/Ovis2.5-2B with image/multi-image/video inputs
- Likely cause: Numerical precision differences in visual encoder or attention computations between vLLM and HF implementations.
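For readers unfamiliar with this failure type, the following is a minimal sketch of what a logprob comparison between a reference (HuggingFace) run and a vLLM run can look like. It is not vLLM's actual test helper: the function name `first_logprob_mismatch`, the `Output` data layout, and the sample values are all hypothetical and chosen only for illustration. The idea is that identical tokens pass, and a diverging token is tolerated only if each side's token still appears in the other side's top-k logprobs at that position.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: (generated token ids, top-k logprobs per position).
Output = Tuple[List[int], List[Dict[int, float]]]


def first_logprob_mismatch(ref: Output, test: Output) -> Optional[int]:
    """Return the index of the first position that fails the check, or None."""
    ref_ids, ref_top = ref
    test_ids, test_top = test
    for i, (r, t) in enumerate(zip(ref_ids, test_ids)):
        if r == t:
            continue
        # Tokens diverge: accept only if each token is in the other's top-k.
        if r not in test_top[i] or t not in ref_top[i]:
            return i
        # After an accepted divergence the contexts differ, so stop comparing.
        return None
    return None


# Made-up sample data (token ids and logprobs are illustrative only).
hf = ([1, 2, 3], [{1: -0.1, 5: -2.5}, {2: -0.2, 7: -1.9}, {3: -0.3, 8: -2.0}])
vllm_ok = ([1, 7, 9], [{1: -0.1, 5: -2.4}, {7: -0.6, 2: -0.8}, {9: -0.1, 4: -3.0}])
vllm_bad = ([1, 6, 3], [{1: -0.1, 5: -2.4}, {6: -0.5, 9: -0.9}, {3: -0.3, 8: -2.0}])

print(first_logprob_mismatch(hf, vllm_ok))   # divergence at position 1 is tolerated
print(first_logprob_mismatch(hf, vllm_bad))  # position 1 fails the top-k check
```

Under this kind of check, small numerical drift that only swaps two near-tied candidates passes, while a token that falls outside the other implementation's top-k is reported as a mismatch.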
smolvlm model (test_case21-24)
- Tests: test_single_image_models, test_multi_image_models
- Failure type: Logprob comparison mismatch
- Configuration: dtype varies, model=HuggingFaceTB/SmolVLM2-2.2B-Instruct with image inputs
- Likely cause: SmolVLM2's Idefics3-based architecture may have numerical stability issues in the vision-text fusion layers when compiled
minimax_vl_01 model (test_case29-32)
- Tests: test_single_image_models, test_multi_image_models
- Failure type: Logprob comparison mismatch
- Configuration: dtype=bfloat16, model=MiniMaxAI/MiniMax-VL-01 with large GPU requirement (80GB)
- Likely cause: Large model with custom HF processor and output post-processing may have numerical divergence in the visual feature extraction or token generation paths
paddleocr_vl model (test_case45-46)
- Tests: test_single_image_models, test_multi_image_models
- Failure type: Logprob comparison mismatch
- Configuration: model=PaddlePaddle/PaddleOCR-VL with OCR-specific visual processing
- Likely cause: OCR-focused visual encoder may have implementation differences in convolution or text detection layers affecting logprob alignment
qwen2_vl model (test_case54)
- Tests: test_single_image_models
- Failure type: Logprob comparison mismatch
- Configuration: model=Qwen/Qwen2-VL-2B-Instruct, marked as cpu_model
- Likely cause: Test marked for CPU but running on ROCm may expose inconsistencies in vision encoder's dynamic resolution handling or the vision_start/vision_end token processing
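Several of the likely causes above point to numerical precision differences between implementations. As a generic illustration (not a diagnosis of any of these models), the sketch below shows how the same values summed in a different order give different results in half precision. It uses only the standard library; the helper names `to_half` and `half_sum` are hypothetical, and rounding after every addition is a simplifying assumption about how a low-precision kernel accumulates.

```python
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def half_sum(values):
    """Accumulate left to right, rounding to half precision after each add."""
    total = 0.0
    for v in values:
        total = to_half(total + to_half(v))
    return total


values = [4096.0, 1.0, 1.0, 1.0, 1.0]

# Near 4096 the spacing between half-precision values is 4, so adding 1.0
# to 4096.0 rounds back down to 4096.0 every time.
large_first = half_sum(values)            # 4096.0

# Summing the small values first gives 4.0, which then survives the final add.
small_first = half_sum(reversed(values))  # 4100.0

print(large_first, small_first)
```

Two kernels that reduce in different orders can therefore produce slightly different activations from the same weights and inputs, which may be enough to reorder near-tied tokens in a logprob comparison.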
📝 History of failing test
AMD-CI build Buildkite references:
- 1041
- 1077
- 1088
- 1109
- 1111
CC List.
No response