SmoothQuant
SmoothQuant is an open-source tool for accurate and efficient post-training quantization of large language models. It enables INT8 model inference and W8A8 quantization with minimal loss for various LLMs.
About
SmoothQuant is an open-source project from MIT Han Lab for accurate and efficient post-training quantization of large language models (LLMs). It enables W8A8 quantization (8-bit weights and 8-bit activations) and INT8 inference, which reduces the memory footprint and compute cost of LLMs such as Llama, Falcon, Mistral, and Mixtral with minimal loss in accuracy. It is most useful to developers and researchers who need to deploy large models on resource-constrained hardware or to speed up inference. The project is hosted on GitHub and is open to community contributions.
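To give a feel for what W8A8 quantization with smoothing involves, here is a minimal pure-Python sketch of the per-channel rescaling idea described in the SmoothQuant paper: each activation channel is divided by a factor s_j and the matching weight row is multiplied by it, so the layer output is unchanged while activation outliers shrink. The function names, the toy matrices, and the choice of alpha = 0.5 are assumptions made for illustration; this is not the API of the SmoothQuant repository.

```python
# Toy illustration in pure Python. Function names are hypothetical and this
# is not the SmoothQuant repository's API.

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def smoothing_scales(X, W, alpha=0.5):
    """Per-input-channel factors s_j = max|X[:, j]|**alpha / max|W[j, :]|**(1 - alpha).

    Assumes every channel has a non-zero activation and weight maximum.
    """
    scales = []
    for j, w_row in enumerate(W):
        act_max = max(abs(row[j]) for row in X)
        w_max = max(abs(w) for w in w_row)
        scales.append(act_max ** alpha / w_max ** (1 - alpha))
    return scales

def smooth(X, W, scales):
    """Divide activation channel j by s_j and multiply weight row j by s_j."""
    X_s = [[x / s for x, s in zip(row, scales)] for row in X]
    W_s = [[w * s for w in row] for row, s in zip(W, scales)]
    return X_s, W_s

def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization of a flat list: (integers, scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [max(-127, min(127, round(v / scale))) for v in values], scale

# Two tokens, two input channels; channel 1 carries activation outliers.
X = [[0.1, 20.0], [-0.2, -16.0]]
# Two input channels (rows), two output channels (columns).
W = [[0.5, -0.3], [0.02, 0.04]]

scales = smoothing_scales(X, W)
X_s, W_s = smooth(X, W, scales)

# The product is unchanged up to floating-point rounding.
print(matmul(X, W))
print(matmul(X_s, W_s))

# The largest activation magnitude is much smaller after smoothing.
print(max(abs(x) for row in X for x in row))
print(max(abs(x) for row in X_s for x in row))

# Smoothed activations can then be mapped to 8-bit integers.
q_act, act_scale = quantize_int8([x for row in X_s for x in row])
print(q_act, act_scale)
```

With alpha = 0.5, each channel's activation maximum and weight maximum end up equal after smoothing, so neither side dominates the 8-bit range. Real deployments estimate the activation maxima from calibration data and run the matrix multiply in INT8 kernels, which this sketch does not attempt.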
Capabilities
Pricing & Plans
Open Source
Free
FAQs