Conversation


@AndreasKaratzas (Contributor) commented Dec 8, 2025

This PR addresses several ROCm-specific issues with multi-modal/vision-language models and improves attention backend dispatching for encoder-only self-attention models. It turns the following test groups green on ROCm:

  • Multi-Modal Models Test (Standard)
  • Multi-Modal Models Test (Extended) 1
  • Multi-Modal Models Test (Extended) 2
  • Multi-Modal Models Test (Extended) 3

Key Changes

Attention Backend Selection (vllm/platforms/rocm.py):

  • Added dtype validation (fp16/bf16 only) for Flash Attention backend selection
  • Added automatic FlexAttention fallback for ENCODER_ONLY attention types
  • Implemented VLLM_ATTENTION_BACKEND environment variable override mechanism for model-specific backend selection (see the sketch after this list)
  • Improved logging messages with "V1 engine" suffix for clarity
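
The dispatch rules described above might look roughly like the following. This is a hedged sketch: the function name, backend strings, and priority ordering are illustrative, not the actual vllm/platforms/rocm.py implementation.

import os

import torch


def select_rocm_attn_backend(dtype: torch.dtype, attn_type: str | None) -> str:
    # An explicit VLLM_ATTENTION_BACKEND override wins.
    override = os.environ.get("VLLM_ATTENTION_BACKEND")
    if override is not None:
        return override
    # Encoder-only self-attention models fall back to FlexAttention.
    if attn_type == "encoder_only":
        return "FLEX_ATTENTION"
    # Flash Attention is only valid for half-precision dtypes.
    if dtype in (torch.float16, torch.bfloat16):
        return "FLASH_ATTN"
    return "TORCH_SDPA"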

Qwen3-VL Support (vllm/model_executor/models/qwen3_vl.py):

  • Added ROCm-specific attention backend override to use ROCM_AITER_FA on gfx9 architectures
  • Sets VLLM_ATTENTION_BACKEND env var to propagate backend choice to platform layer

Transformers Multimodal Mixin (vllm/model_executor/models/transformers/multimodal.py):

  • Force the MATH SDP backend for the vision encoder on ROCm to avoid accuracy issues with flash_sdp and mem_efficient_sdp (see the sketch after this list)
  • Patches issue #30167
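
A minimal sketch of forcing the MATH SDP backend via the public PyTorch API (the actual mixin wiring in vLLM may differ):

import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q = k = v = torch.randn(1, 8, 16, 64, device="cuda", dtype=torch.float16)

# Restrict scaled_dot_product_attention to the MATH kernel, excluding
# flash_sdp and mem_efficient_sdp, which exhibit the accuracy issues on ROCm.
with sdpa_kernel(SDPBackend.MATH):
    out = F.scaled_dot_product_attention(q, k, v)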

SigLIP2 NaViT (vllm/model_executor/models/siglip2navit.py):

  • Disabled vllm_flash_attn rotary embedding on ROCm

Test Updates:

  • Moved conftest.py up one level to share ROCm SDP workarounds across multimodal tests
  • Use the float16 dtype instead of float (float32) for CLIP/SigLIP pooling tests on ROCm
  • Added ROCm-specific processor kwargs for Qwen2.5-VL tests
  • Skip MiniMax-VL-01 on ROCm due to GPU OOM and pickle issues with spawn + tp>1 (see the sketch after this list)
  • Adjusted CI timeouts for the multi-modal test suites so they more precisely match actual runtimes
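
A minimal sketch of the ROCm-conditional skip, assuming a torch.version.hip check and an illustrative test name (vLLM's tests may use a different guard):

import pytest
import torch

# torch.version.hip is set only on ROCm builds of PyTorch.
IS_ROCM = torch.version.hip is not None


@pytest.mark.skipif(
    IS_ROCM,
    reason="GPU OOM and pickle issues with spawn + tp>1 on ROCm",
)
def test_minimax_vl_01():
    ...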

CI Configuration (.buildkite/test-amd.yaml):

  • Extended timeouts for Multi-Modal Models Test suites (Standard: 100min, Extended 2: 120min, Accuracy Eval: 240min)

Commits:

  • …s; added automation for Qwen3-VL backend; fixes for groups
  • …dels - fixes siglip fp32 on ROCm
  • …mers test case

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
@AndreasKaratzas (Contributor, Author) commented:

cc @hmellor

@mergify bot added the ci/build, multi-modality, qwen, and rocm labels on Dec 8, 2025
@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines 329 to 332
# Priority 5: If model is Encoder-only self-attention type
if attn_type is not None and attn_type in (AttentionType.ENCODER_ONLY):
logger.info("Using FlexAttention backend on V1 engine.")
return AttentionBackendEnum.FLEX_ATTENTION.get_path()


P1: Encoder-only fallback condition raises at runtime

The new FlexAttention fallback uses attn_type in (AttentionType.ENCODER_ONLY) without a trailing comma, so the right-hand side is a single AttentionType value rather than an iterable. As soon as attn_type is provided (e.g., for encoder-only models such as CLIP/SigLIP on ROCm), evaluating this condition raises TypeError: argument of type 'AttentionType' is not iterable, preventing backend selection and crashing initialization instead of falling back to FlexAttention.
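
For illustration, either of the following forms would restore the intended check (a sketch of the two obvious fixes; the PR's actual fix landed in 9c9f225):

# A trailing comma makes the right-hand side a one-element tuple:
if attn_type is not None and attn_type in (AttentionType.ENCODER_ONLY,):
    logger.info("Using FlexAttention backend on V1 engine.")
    return AttentionBackendEnum.FLEX_ATTENTION.get_path()

# Equivalently, compare against the enum value directly:
if attn_type == AttentionType.ENCODER_ONLY:
    ...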


@AndreasKaratzas (Contributor, Author) replied:

Comment addressed in 9c9f225

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a range of fixes and improvements for multi-modal model support on ROCm. The changes include updates to CI timeouts, model-specific workarounds for attention backends and data types, and enhancements to the attention backend selection logic. While the changes are generally positive and address clear needs, I have identified two significant concerns. First, a critical issue with the use of an environment variable for configuration, which introduces global state and risks race conditions in a concurrent environment. Second, a high-severity maintainability issue arising from code duplication to resolve a circular import. Addressing these points would improve the robustness and long-term maintainability of the codebase.

attn_backend_override = AttentionBackendEnum.ROCM_AITER_FA
# Set also env variable for platform rocm to use ROCM_AITER_FA
# for `selected_backend` in attention backend getter.
os.environ["VLLM_ATTENTION_BACKEND"] = "ROCM_AITER_FA"

critical

Using os.environ to control behavior between different parts of the application introduces a global state, which is a potential source of bugs. In a concurrent environment, such as a server handling multiple model loading requests simultaneously, this can lead to race conditions where one request unintentionally affects another. A safer approach would be to pass this configuration through the VllmConfig or a similar mechanism that avoids global state. For instance, you could add an attn_backend_override to ModelConfig which can be set by model-specific code and then read by the attention backend selection logic.
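
A hypothetical sketch of that suggestion; attn_backend_override is an illustrative field here, not an existing vLLM ModelConfig attribute:

from dataclasses import dataclass


@dataclass
class ModelConfig:
    # Set by model-specific code (e.g. Qwen3-VL on gfx9) instead of
    # mutating process-global os.environ.
    attn_backend_override: str | None = None


def select_backend(model_config: ModelConfig, default: str) -> str:
    # Read by the attention backend selection logic.
    if model_config.attn_backend_override is not None:
        return model_config.attn_backend_override
    return default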

@AndreasKaratzas (Contributor, Author) replied:

There is no concurrency involving this part of the code. This behavior is ROCm-specific and, for now, critical for this model. Nonetheless, we are going to add support for the rest of the ROCm-specific attention backends and make this mini-patch obsolete in the future.

Comment on lines +245 to +258
def getattr_iter(obj: object, names: Iterable[str], default: Any) -> Any:
"""
A helper function that retrieves an attribute from an object which may
have multiple possible names. This is useful when fetching attributes from
``transformers.PretrainedConfig`` instances.
Note:
Duplicated from ``vllm.config.utils`` to avoid circular import
(vllm.transformers_utils.config <-> vllm.config.model).
"""
for name in names:
if hasattr(obj, name):
return getattr(obj, name)
return default

high

This function getattr_iter is duplicated from vllm.config.utils to avoid a circular import, as noted in the docstring. Code duplication poses a significant maintainability risk, as changes made to one copy may not be propagated to the other, leading to inconsistent behavior and potential bugs. To resolve this, please consider refactoring to break the circular dependency. A common solution is to move the shared utility function to a more fundamental module (e.g., a new file like vllm/utils/config_utils.py) that can be safely imported by both vllm.config.utils and vllm.transformers_utils.config.

@AndreasKaratzas (Contributor, Author) replied:

This was a standard function. It serves a very specific purpose and is highly unlikely to ever need maintenance or modification. Therefore, the duplication only serves to resolve the circular import bug.
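
For reference, a typical call to the duplicated helper looks like this (the attribute names are illustrative):

# Try several HF config spellings for the head count; fall back to None.
num_heads = getattr_iter(hf_config, ("num_attention_heads", "n_head"), None)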
