ExLlamaV3
ExLlamaV3 is an optimized quantization and inference library for running LLMs locally on modern consumer GPUs. It features a new EXL3 quantization format based on QTIP and supports flexible tensor-parallel inference.