Medusa
Visit ToolMedusa is an open-source framework for accelerating LLM generation using multiple decoding heads. It significantly speeds up LLM inference, offering up to 3.6x faster generation with parameter-efficient training.
At a glance
Trending