Must the sp_size be equal to the total_gpus in UlyssesSPAttentionHF? #7671
NiuMa-1234 asked this question in Q&A · Unanswered
Replies: 0 comments
I found that `sequence_parallel_size` in the provided Ulysses example (`test_ulysses_sp_hf.py`) is equal to `world_size` (the total number of GPUs). If `sequence_parallel_size` is less than `world_size`, training fails with an error during the backward pass.

The error is likely caused by the following: when executing the backward of `torch._AllGather`, `grad_output` only has `sp_world_size` items, but `torch.distributed.get_rank()` (the global rank) is used to have each GPU pick its own gradient from `grad_output`, which raises an index mismatch error. So is it required that `sequence_parallel_size` be equal to `world_size`?
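To make the suspected mismatch concrete, here is a minimal, non-distributed sketch (the function names and the grouping scheme are illustrative assumptions, not DeepSpeed's actual internals): indexing a list of `sp_world_size` gradient shards with the global rank fails once `world_size > sp_world_size`, whereas indexing with the rank within the sequence-parallel group works.

```python
# Sketch of the suspected bug: a per-rank gradient-shard list of length
# sp_world_size is indexed with the GLOBAL rank instead of the rank
# within the sequence-parallel (SP) group.
world_size = 8      # total GPUs
sp_world_size = 4   # sequence_parallel_size < world_size

def pick_grad_shard_buggy(global_rank, grad_output):
    # grad_output holds one shard per rank in the SP group only.
    assert len(grad_output) == sp_world_size
    # Buggy: the global rank can be >= sp_world_size -> IndexError.
    return grad_output[global_rank]

def pick_grad_shard_fixed(global_rank, grad_output):
    # Fixed: use the rank *within* the SP group. In real code this would
    # be torch.distributed.get_rank(group=sp_group); here we assume
    # contiguous SP groups, so the group-local rank is a simple modulo.
    group_rank = global_rank % sp_world_size
    return grad_output[group_rank]

grad_output = [f"shard{i}" for i in range(sp_world_size)]

# Global rank 6 sits in the second SP group; its group-local rank is 2.
print(pick_grad_shard_fixed(6, grad_output))

try:
    pick_grad_shard_buggy(6, grad_output)
except IndexError as e:
    print("buggy version fails:", e)
```

When `sp_world_size == world_size`, the two functions coincide, which would explain why the shipped example (where the two are equal) runs fine.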