@xuechendi xuechendi commented Dec 8, 2025

Purpose

Heterogeneous BlockSize and kv_layout are currently supported through separate post-processing methods.
This PR cleans that up by using a single post-processing method for all cases.

What is changed in this PR:

I removed permute_device_kv and blocksize_post_process and moved their logic into post_process_device_kv_on_receive, a single post-processing function with 3 branches:

if enable_permute_local_kv and block_size_ratio > 1:
    _kv_postprocess_blksize_and_layout(
        cache, indices, block_size_ratio
    )
elif enable_permute_local_kv:
    _kv_postprocess_layout(cache, indices)
else:
    _kv_postprocess_blksize(cache, indices, block_size_ratio)
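
To make the branch order easier to check, here is a minimal sketch of the selection logic only. The function name select_postprocess and the string labels are hypothetical and not part of the PR; the real helpers operate on the KV cache tensors. The sketch also assumes the caller only invokes post-processing when a layout permute or a block-size conversion is actually needed, since the final branch is taken whenever permutation is disabled.

```python
def select_postprocess(enable_permute_local_kv: bool, block_size_ratio: int) -> str:
    """Return a label for the branch the snippet above would take.

    Hypothetical helper for illustration; it mirrors the if/elif/else
    order shown in the PR description and touches no tensors.
    """
    if enable_permute_local_kv and block_size_ratio > 1:
        # Both the layout and the block size differ between the two sides.
        return "blksize_and_layout"
    elif enable_permute_local_kv:
        # Only the layout differs.
        return "layout"
    else:
        # Permutation is disabled, so only block-size handling remains.
        return "blksize"


if __name__ == "__main__":
    for permute, ratio in [(True, 4), (True, 1), (False, 4)]:
        print(permute, ratio, select_postprocess(permute, ratio))
```

The combined case is tested first because it is the most specific condition; checking enable_permute_local_kv alone first would make the combined branch unreachable.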

Test Plan

Test with heterogeneous KV_layout + heterogeneous block_size

DECODER_KV_LAYOUT=NHD PREFILL_BLOCK_SIZE=16 DECODE_BLOCK_SIZE=64 bash tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh 2>&1 | tee nixl_hetero_layout_blksize.log

GPU_MEMORY_UTILIZATION=0.8 MODEL_NAMES="deepseek-ai/DeepSeek-V2-Lite-Chat" DECODER_KV_LAYOUT=NHD PREFILL_BLOCK_SIZE=16 DECODE_BLOCK_SIZE=64 bash tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh 2>&1 | tee nixl_hetero_layout_blksize_MLA.log

=> Passed accuracy test

Test with heterogeneous KV_layout

DECODER_KV_LAYOUT=NHD PREFILL_BLOCK_SIZE=16 DECODE_BLOCK_SIZE=16 bash tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh 2>&1 | tee nixl_hetero_layout_blksize.log

GPU_MEMORY_UTILIZATION=0.8 MODEL_NAMES="deepseek-ai/DeepSeek-V2-Lite-Chat" DECODER_KV_LAYOUT=NHD PREFILL_BLOCK_SIZE=16 DECODE_BLOCK_SIZE=16 bash tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh 2>&1 | tee nixl_hetero_layout_blksize_MLA.log

=> Passed accuracy test

Test with heterogeneous block_size

PREFILL_BLOCK_SIZE=16 DECODE_BLOCK_SIZE=64 bash tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh 2>&1 | tee nixl_hetero_layout_blksize.log

GPU_MEMORY_UTILIZATION=0.8 MODEL_NAMES="deepseek-ai/DeepSeek-V2-Lite-Chat" PREFILL_BLOCK_SIZE=16 DECODE_BLOCK_SIZE=64 bash tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh 2>&1 | tee nixl_hetero_layout_blksize_MLA.log

=> Passed accuracy test

Test Result



Signed-off-by: Chendi Xue <chendi.xue@intel.com>
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the post-processing logic for heterogeneous BlockSize and kv_layout, which is a good direction for code cleanup. However, the implementation introduces several issues. There are critical bugs in the tensor reshape operations within the new helper functions (_kv_postprocess_layout, _kv_postprocess_blksize, and _kv_postprocess_blksize_and_layout), which will likely lead to runtime errors or corrupted KV cache data. Additionally, there's a redundant index_select operation that should be removed to improve performance. These issues need to be addressed to ensure the correctness and efficiency of the new implementation.
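
The review does not quote the offending lines, so the following is not a reproduction of the reported bug. It is only a generic illustration, in plain Python with nested lists standing in for tensors, of why a reshape cannot replace a permute when converting between a heads-major and a tokens-major layout. All names here (flatten, reshape, transpose, the tiny 2x3 example) are made up for the sketch.

```python
def flatten(matrix):
    """Row-major flattening, like viewing a contiguous buffer as 1-D."""
    return [x for row in matrix for x in row]


def reshape(flat, rows, cols):
    """Re-chunk a flat buffer into rows x cols without moving any element."""
    assert len(flat) == rows * cols
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def transpose(matrix):
    """Swap the two axes, which is what a 2-D permute does."""
    return [list(col) for col in zip(*matrix)]


# Heads-major toy block: 2 heads, 3 tokens. "h1t2" means head 1, token 2.
heads_major = [
    ["h0t0", "h0t1", "h0t2"],
    ["h1t0", "h1t1", "h1t2"],
]

# Correct conversion to tokens-major: each row holds one token across heads.
tokens_major_permuted = transpose(heads_major)

# Incorrect conversion: same target shape, but elements keep their old order.
tokens_major_reshaped = reshape(flatten(heads_major), 3, 2)

if __name__ == "__main__":
    print(tokens_major_permuted)
    print(tokens_major_reshaped)
```

Both results have the same shape, so a mix-up of this kind raises no error; the rows of the reshaped version simply mix tokens and heads, which matches the kind of silent data corruption the review warns about.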

@xuechendi xuechendi force-pushed the dev/decode_KV_post_process branch from b405900 to edc6d6e Compare December 8, 2025 19:56