Marlin
Marlin is an open-source FP16xINT4 LLM inference kernel, meaning it multiplies FP16 activations by INT4-quantized weights. It achieves near-ideal speedups of about 4x at batch sizes up to a medium range of 16-32 tokens, which makes it suitable for larger-scale serving.
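To see where the "ideal" figure of about 4x comes from, here is a rough back-of-the-envelope sketch. It assumes that small-batch inference is limited by how fast weights can be read from memory, so runtime scales with the number of weight bytes moved. The function names and the 4096x4096 layer shape are illustrative only, and the estimate ignores quantization scales, activations, and compute cost.

```python
def weight_bytes(rows: int, cols: int, bits_per_weight: int) -> float:
    """Bytes needed to store a rows x cols weight matrix at the given precision."""
    return rows * cols * bits_per_weight / 8


def ideal_speedup(rows: int, cols: int, baseline_bits: int = 16, quant_bits: int = 4) -> float:
    """Upper bound on speedup if runtime is proportional to weight bytes read.

    Assumption: the layer is memory-bound, which is typical at small batch sizes.
    """
    return weight_bytes(rows, cols, baseline_bits) / weight_bytes(rows, cols, quant_bits)


# Hypothetical layer shape, chosen only for illustration.
fp16_size = weight_bytes(4096, 4096, 16)
int4_size = weight_bytes(4096, 4096, 4)
print(f"FP16 weights: {fp16_size / 2**20:.0f} MiB, INT4 weights: {int4_size / 2**20:.0f} MiB")
print(f"Ideal speedup: {ideal_speedup(4096, 4096):.1f}x")
```

Under this simplified model the bound is just the ratio of bit widths (16 / 4), independent of layer shape. At larger batch sizes the computation becomes less memory-bound, which is why sustaining the speedup up to 16-32 tokens is the notable part of the claim.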