Mirage
Visit ToolMirage is a coding & development tool that compiles LLMs into high-performance megakernels. It reduces LLM inference latency by automatically fusing GPU operations.
At a glance
Trending
Mirage is a coding & development tool that compiles LLMs into high-performance megakernels. It reduces LLM inference latency by automatically fusing GPU operations.
Trending
About
Mirage Persistent Kernel (MPK) is a compiler and runtime system designed to optimize large language model (LLM) inference by transforming it into a single, high-performance megakernel. This end-to-end GPU fusion approach significantly reduces LLM inference latency, offering improvements of 1.2× to 6.7× with minimal developer effort. MPK allows users to compile LLMs from the Hugging Face model zoo into a megakernel using Python, abstracting away complex CUDA/Triton programming. It provides an API for instantiating persistent kernels, attaching existing PyTorch tensors, and defining computation graphs by chaining fused operations. The tool is open source and aims to simplify the deployment of efficient LLM inference.
Capabilities
Pricing & Plans
Open Source
Free
FAQs
Trending