MoBA
MoBA is an open-source research tool that introduces Mixture of Block Attention for long-context LLMs. It enables efficient processing of long sequences by allowing query tokens to attend to relevant KV blocks.
About
MoBA (Mixture of Block Attention) is an open-source attention mechanism designed to make Large Language Models (LLMs) more efficient when processing long contexts. Traditional attention has computational cost that grows quadratically with sequence length. MoBA addresses this by dividing the full context into blocks: each query token attends only to the most relevant KV blocks, which are chosen by a parameter-less top-k gating mechanism. This lets a model transition between full and sparse attention modes, gaining flexibility and efficiency without compromising performance. MoBA has been deployed to support Kimi’s long-context requests. Its acceleration benefits require continued training of existing models, so it is aimed mainly at researchers and developers working on advanced LLM architectures.
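The block selection described above can be sketched for a single query token in plain Python. This is a minimal illustration, not the MoBA implementation: the function names are hypothetical, and several details are assumptions not stated in this listing, namely that each block is scored by the dot product of the query with the mean of the block's keys, that the query's own block is always kept, and that future positions are masked.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def mean_pool(vectors):
    """Component-wise average of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def select_blocks(query, keys, query_pos, block_size, top_k):
    """Pick the KV blocks that the query at query_pos attends to.

    Assumption: the gate has no learned parameters; a block's score is
    the dot product of the query with the mean of the block's keys.
    """
    current = query_pos // block_size
    scored = []
    for b in range(current):  # only blocks strictly before the current one
        block_keys = keys[b * block_size:(b + 1) * block_size]
        scored.append((dot(query, mean_pool(block_keys)), b))
    # Highest score first; ties go to the earlier block.
    scored.sort(key=lambda s: (-s[0], s[1]))
    chosen = {b for _, b in scored[:max(top_k - 1, 0)]}
    chosen.add(current)  # assumption: the query's own block is always kept
    return sorted(chosen)

def block_sparse_attention(query, keys, values, query_pos, block_size, top_k):
    """Softmax attention restricted to the selected blocks, causally masked."""
    blocks = select_blocks(query, keys, query_pos, block_size, top_k)
    positions = [
        p
        for b in blocks
        for p in range(b * block_size, min((b + 1) * block_size, query_pos + 1))
    ]
    scale = math.sqrt(len(query))
    logits = [dot(query, keys[p]) / scale for p in positions]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * values[p][d] for w, p in zip(weights, positions)) / total
        for d in range(dim)
    ]

# Toy data: 8 tokens, 4 blocks of 2 tokens each, 2-dimensional vectors.
query = [1.0, 0.0]
keys = [[0.0, 1.0], [0.0, 1.0],   # block 0: gate score 0
        [2.0, 0.0], [2.0, 0.0],   # block 1: gate score 2
        [1.0, 0.0], [1.0, 0.0],   # block 2: gate score 1
        [0.5, 0.5], [0.5, 0.5]]   # block 3: the query's own block
values = [[float(i), 1.0] for i in range(8)]

sparse_blocks = select_blocks(query, keys, query_pos=7, block_size=2, top_k=2)
sparse_out = block_sparse_attention(query, keys, values, 7, 2, top_k=2)
full_out = block_sparse_attention(query, keys, values, 7, 2, top_k=4)
```

In this sketch, setting top_k to the total number of blocks makes the query attend to every earlier position, which is ordinary causal attention. That is one way to picture the transition between sparse and full modes.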
Pricing & Plans
Open Source
Free