GPTQ
GPTQ is an open-source implementation of accurate post-training quantization for generative pretrained transformers. It includes an efficient implementation of the GPTQ algorithm and supports compressing models from the OPT and BLOOM families to 2, 3, or 4 bits.