In your roadmap, you mentioned planning for sequence parallelism, specifically the intention to implement ring attention. I suggest you consider Unified Sequence Parallelism (USP), which combines Ulysses and Ring attention into a 2D sequence-parallel scheme. USP offers better performance than using Ring or Ulysses alone.
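For context, the core idea is to arrange the ranks in a `ring_degree × ulysses_degree` grid: each rank performs an Ulysses-style all-to-all over attention heads within its row, and blockwise ring attention over sequence chunks within its column. Below is a minimal sketch of that 2D process-group layout, assuming PyTorch distributed is already initialized; the helper name and grid layout here are illustrative, not the actual API of the repository linked below:

```python
import torch.distributed as dist

def init_usp_groups(ring_degree: int, ulysses_degree: int):
    """Illustrative sketch (hypothetical helper, not the linked repo's API):
    split the world into a ring_degree x ulysses_degree grid. Each rank joins
    one Ulysses group (all-to-all over attention heads, row-wise) and one
    Ring group (blockwise ring attention over sequence chunks, column-wise)."""
    world_size = dist.get_world_size()
    assert ring_degree * ulysses_degree == world_size
    rank = dist.get_rank()
    ulysses_group, ring_group = None, None
    # Every rank must call dist.new_group for every group, in the same order.
    for i in range(ring_degree):  # rows: Ulysses groups
        ranks = list(range(i * ulysses_degree, (i + 1) * ulysses_degree))
        group = dist.new_group(ranks)
        if rank in ranks:
            ulysses_group = group
    for j in range(ulysses_degree):  # columns: Ring groups
        ranks = list(range(j, world_size, ulysses_degree))
        group = dist.new_group(ranks)
        if rank in ranks:
            ring_group = group
    return ulysses_group, ring_group
```

Each attention call then does an all-to-all within the Ulysses group to trade the sequence split for a head split, runs ring attention across the Ring group, and applies the inverse all-to-all afterwards. This lets the total sequence-parallel degree exceed the number of KV heads, which is the limit of Ulysses alone.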
The code we developed has been widely applied in long-sequence training and inference for large language models (LLMs) and DiT models. You can find it at the following link:
https://github.com/feifeibear/long-context-attention
For a detailed technical report, please refer to:
https://arxiv.org/abs/2405.07719
I hope this information is helpful to you, and I look forward to your team considering this suggestion.