[NIXL] refine decoder side post process for heterogeneous BlockSize and kv_layout #30275

xuechendi · 2025-12-08T19:34:15Z

Purpose

Heterogeneous BlockSize and kv_layout are currently supported through separate post-processing methods.
This PR cleans that up so that a single post-processing method handles all cases.

What is changed in this PR:

I removed permute_device_kv and blocksize_post_process and moved their logic into post_process_device_kv_on_receive, a single post-processing function with three paths:

if enable_permute_local_kv and block_size_ratio > 1:
    _kv_postprocess_blksize_and_layout(
        cache, indices, block_size_ratio
    )
elif enable_permute_local_kv:
    _kv_postprocess_layout(cache, indices)
else:
    _kv_postprocess_blksize(cache, indices, block_size_ratio)
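
To make the three paths easier to check against the test plan, here is a minimal, standalone sketch of the same branching. It only returns the name of the helper that would be called; `select_postprocess` and the `test_plan` table are hypothetical names introduced for illustration and are not part of the PR. It also assumes that block_size_ratio is the decode block size divided by the prefill block size, and that setting DECODER_KV_LAYOUT=NHD in the test script is what turns on enable_permute_local_kv; neither assumption is stated in the description above.

```python
def select_postprocess(enable_permute_local_kv: bool, block_size_ratio: int) -> str:
    """Return the name of the helper the snippet above would dispatch to.

    Hypothetical illustration only: the real function calls the helpers
    on the KV cache instead of returning their names.
    """
    if enable_permute_local_kv and block_size_ratio > 1:
        return "_kv_postprocess_blksize_and_layout"
    elif enable_permute_local_kv:
        return "_kv_postprocess_layout"
    else:
        return "_kv_postprocess_blksize"


# Assumed mapping from the test-plan settings to the two flags:
# (layout differs?, PREFILL_BLOCK_SIZE, DECODE_BLOCK_SIZE)
test_plan = {
    "hetero layout + hetero block_size": (True, 16, 64),
    "hetero layout only": (True, 16, 16),
    "hetero block_size only": (False, 16, 64),
}

for name, (permute, prefill_bs, decode_bs) in test_plan.items():
    ratio = decode_bs // prefill_bs  # assumption: ratio = decode // prefill
    print(f"{name}: ratio={ratio} -> {select_postprocess(permute, ratio)}")
```

As written, the final branch is taken whenever enable_permute_local_kv is false, regardless of the value of block_size_ratio.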

Test Plan

Test with heterogeneous KV_layout + heterogeneous block_size

DECODER_KV_LAYOUT=NHD PREFILL_BLOCK_SIZE=16 DECODE_BLOCK_SIZE=64 bash tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh 2>&1 | tee nixl_hetero_layout_blksize.log

GPU_MEMORY_UTILIZATION=0.8 MODEL_NAMES="deepseek-ai/DeepSeek-V2-Lite-Chat" DECODER_KV_LAYOUT=NHD PREFILL_BLOCK_SIZE=16 DECODE_BLOCK_SIZE=64 bash tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh 2>&1 | tee nixl_hetero_layout_blksize_MLA.log

=> Passed accuracy test

Test with heterogeneous KV_layout

DECODER_KV_LAYOUT=NHD PREFILL_BLOCK_SIZE=16 DECODE_BLOCK_SIZE=16 bash tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh 2>&1 | tee nixl_hetero_layout_blksize.log

GPU_MEMORY_UTILIZATION=0.8 MODEL_NAMES="deepseek-ai/DeepSeek-V2-Lite-Chat" DECODER_KV_LAYOUT=NHD PREFILL_BLOCK_SIZE=16 DECODE_BLOCK_SIZE=16 bash tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh 2>&1 | tee nixl_hetero_layout_blksize_MLA.log

=> Passed accuracy test

Test with heterogeneous block_size

PREFILL_BLOCK_SIZE=16 DECODE_BLOCK_SIZE=64 bash tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh 2>&1 | tee nixl_hetero_layout_blksize.log

GPU_MEMORY_UTILIZATION=0.8 MODEL_NAMES="deepseek-ai/DeepSeek-V2-Lite-Chat" PREFILL_BLOCK_SIZE=16 DECODE_BLOCK_SIZE=64 bash tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh 2>&1 | tee nixl_hetero_layout_blksize_MLA.log

=> Passed accuracy test

Test Result

Signed-off-by: Chendi Xue <chendi.xue@intel.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

gemini-code-assist

Code Review

This pull request refactors the post-processing logic for heterogeneous BlockSize and kv_layout, which is a good direction for code cleanup. However, the implementation introduces several issues. There are critical bugs in the tensor reshape operations within the new helper functions (_kv_postprocess_layout, _kv_postprocess_blksize, and _kv_postprocess_blksize_and_layout), which will likely lead to runtime errors or corrupted KV cache data. Additionally, there's a redundant index_select operation that should be removed to improve performance. These issues need to be addressed to ensure the correctness and efficiency of the new implementation.
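
The review above does not show the reshape calls in question, so the following is only a general illustration of why a reshape is not a substitute for a permute when converting between layouts. It uses plain nested lists rather than tensors, assumes the usual reading of HND as [heads][tokens] and NHD as [tokens][heads], and the helper names `reshape_2d` and `transpose_2d` are hypothetical.

```python
def reshape_2d(flat, rows, cols):
    """Re-chunk a flat list into rows x cols without moving any element."""
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def transpose_2d(grid):
    """Swap the two axes, which is what a layout permute has to do."""
    return [list(col) for col in zip(*grid)]


# Toy block stored as [heads][tokens]: 2 heads, 3 tokens.
hnd = [
    ["h0t0", "h0t1", "h0t2"],
    ["h1t0", "h1t1", "h1t2"],
]
flat = [x for row in hnd for x in row]

# Both results have the target shape [tokens][heads] = 3 x 2 ...
nhd_by_reshape = reshape_2d(flat, rows=3, cols=2)
nhd_by_permute = transpose_2d(hnd)

# ... but only the permute puts each token's heads together.
print(nhd_by_reshape[0])  # ['h0t0', 'h0t1']  (two tokens of head 0)
print(nhd_by_permute[0])  # ['h0t0', 'h1t0']  (token 0 across both heads)
```

Because the reshaped result has the expected shape, this kind of mistake raises no error and shows up only as wrong values, which is why an end-to-end accuracy test is a useful check here.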

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

Clean up post_process when received on decoder side

29371dd

Signed-off-by: Chendi Xue <chendi.xue@intel.com>

xuechendi requested review from ApostaC and NickLucche as code owners December 8, 2025 19:34

mergify bot added v1 kv-connector labels Dec 8, 2025

chatgpt-codex-connector bot reviewed Dec 8, 2025

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py (outdated, resolved)

gemini-code-assist bot reviewed Dec 8, 2025

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py (resolved)

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py (resolved)

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py (resolved)

xuechendi mentioned this pull request Dec 8, 2025

[RFC]: Nixl Connector Heterogeneous BlockSize support #26744

Open

1 task

xuechendi added 2 commits December 8, 2025 11:52

Update test with print

227b30e

Signed-off-by: Chendi Xue <chendi.xue@intel.com>

forgot physical logical block id mapping, added here

edc6d6e

Signed-off-by: Chendi Xue <chendi.xue@intel.com>

xuechendi force-pushed the dev/decode_KV_post_process branch from b405900 to edc6d6e Compare December 8, 2025 19:56
