[CI Failure]: mi325_1: Multi-Modal Models Test (Extended) 2 #29536

@AndreasKaratzas

Name of failing test

pip install git+https://github.com/TIGER-AI-Lab/Mantis.git; pytest -v -s models/multimodal/generation/test_common.py -m 'split(group=0) and not core_model'

Basic information

  • Flaky test
  • Can reproduce locally
  • Caused by external libraries (e.g. bug in transformers)

🧪 Describe the failing test

Failing Tests Summary:

ovis2_5 model (test_case8-11)

  • Tests: test_single_image_models, test_multi_image_models, test_video_models
  • Failure type: Logprob comparison mismatch between vLLM and HuggingFace outputs
  • Configuration: dtype=half, model=AIDC-AI/Ovis2.5-2B with image/multi-image/video inputs
  • Likely cause: Numerical precision differences in the visual encoder or attention computations between the vLLM and HF implementations
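To illustrate why dtype=half runs can diverge between two implementations that are mathematically equivalent, here is a minimal sketch using only the Python standard library. It is not taken from vLLM or HF; the helper names `to_half` and `half_sum` are hypothetical. It shows that summing the same values in a different order gives different float16 results, which is the kind of effect that different kernels or reduction orders could produce.

```python
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 (half) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def half_sum(values):
    """Accumulate values left to right, rounding to half after every add."""
    acc = 0.0
    for v in values:
        acc = to_half(acc + to_half(v))
    return acc


# Same numbers, two accumulation orders (hypothetical example data).
values = [2048.0, 0.5, 0.5, 0.5, 0.5]
forward = half_sum(values)             # small terms are lost one by one
backward = half_sum(reversed(values))  # small terms add up before the large one

print(forward, backward)
```

In float16 the spacing between representable values near 2048 is 2, so each individual 0.5 is rounded away in the forward order, while the reversed order first accumulates 2.0 and keeps it. Whether this is what actually happens in the failing tests is not established by this issue.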

smolvlm model (test_case21-24)

  • Tests: test_single_image_models, test_multi_image_models
  • Failure type: Logprob comparison mismatch
  • Configuration: dtype varies, model=HuggingFaceTB/SmolVLM2-2.2B-Instruct with image inputs
  • Likely cause: SmolVLM2's Idefics3-based architecture may have numerical stability issues in the vision-text fusion layers when compiled
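For readers unfamiliar with the failure type, the following is a simplified sketch of what a logprob comparison between two runners could look like. It is an illustration only, not vLLM's actual test helper: the function name `first_logprob_mismatch`, the `Output` type, and the sample data are all hypothetical. The assumed rule is that when two greedy outputs first disagree on a token, each side's token must still appear in the other side's top-k logprobs at that position; otherwise the position is reported as a mismatch.

```python
from typing import Dict, List, Optional, Tuple

# One generated sequence: (token ids, per-position top-k {token_id: logprob}).
Output = Tuple[List[int], List[Dict[int, float]]]


def first_logprob_mismatch(out_a: Output, out_b: Output) -> Optional[int]:
    """Return the index of the first intolerable divergence, or None."""
    tokens_a, logprobs_a = out_a
    tokens_b, logprobs_b = out_b
    for idx, (tok_a, tok_b) in enumerate(zip(tokens_a, tokens_b)):
        if tok_a == tok_b:
            continue
        # Tokens differ: tolerate only if each is plausible for the other side.
        if tok_a not in logprobs_b[idx] or tok_b not in logprobs_a[idx]:
            return idx
        # After a tolerated divergence the contexts differ, so stop comparing.
        return None
    return None


# Hypothetical sample outputs with top-2 logprobs per position.
run_a = ([5, 7, 9], [{5: -0.1, 6: -2.5}, {7: -0.3, 8: -1.4}, {9: -0.2, 4: -2.0}])
run_b = ([5, 8, 3], [{5: -0.1, 6: -2.4}, {8: -0.4, 7: -1.2}, {3: -0.5, 2: -1.0}])
run_c = ([5, 1, 3], [{5: -0.1, 6: -2.4}, {1: -0.2, 2: -1.9}, {3: -0.5, 2: -1.0}])

print(first_logprob_mismatch(run_a, run_b))  # tolerated divergence
print(first_logprob_mismatch(run_a, run_c))  # reported mismatch
```

Under this assumed rule, small numerical differences that only swap two near-tied candidates pass, while a token that falls outside the other runner's top-k is flagged.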

minimax_vl_01 model (test_case29-32)

  • Tests: test_single_image_models, test_multi_image_models
  • Failure type: Logprob comparison mismatch
  • Configuration: dtype=bfloat16, model=MiniMaxAI/MiniMax-VL-01, which requires a large GPU (80GB)
  • Likely cause: This large model uses a custom HF processor and output post-processing, so numerical divergence may arise in the visual feature extraction or token generation paths

paddleocr_vl model (test_case45-46)

  • Tests: test_single_image_models, test_multi_image_models
  • Failure type: Logprob comparison mismatch
  • Configuration: model=PaddlePaddle/PaddleOCR-VL with OCR-specific visual processing
  • Likely cause: The OCR-focused visual encoder may differ between implementations in its convolution or text detection layers, affecting logprob alignment

qwen2_vl model (test_case54)

  • Tests: test_single_image_models
  • Failure type: Logprob comparison mismatch
  • Configuration: model=Qwen/Qwen2-VL-2B-Instruct, marked as cpu_model
  • Likely cause: The test is marked cpu_model but runs on ROCm, which may expose inconsistencies in the vision encoder's dynamic resolution handling or in vision_start/vision_end token processing

📝 History of failing test

AMD-CI Buildkite build references:

  • 1041
  • 1077
  • 1088
  • 1109
  • 1111

CC List.

No response

Metadata

  • Labels: ci-failure (Issue about an unexpected test failure in CI)
  • Project status: In review