Gpu_poor
gpu_poor calculates GPU memory and token generation speed for any LLM and GPU/CPU. It supports various quantization methods and inference frameworks to optimize performance.
At a glance
Trending
About
gpu_poor is a tool that estimates GPU memory requirements and token generation speed for any Large Language Model (LLM) across a range of GPU and CPU configurations. It breaks memory usage down into KV cache, model size, activation memory, and overheads. It supports popular quantization methods such as GGML, bitsandbytes, and QLoRA, and inference frameworks such as vLLM, llama.cpp, and Hugging Face. It also approximates finetuning times and helps users choose a context length and batch size that fit their hardware, which is useful when planning LLM deployment and training.
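To make the memory breakdown concrete, here is a minimal back-of-the-envelope sketch of two of the components named above, model size and KV cache. It is not gpu_poor's own code or formula. The names `ModelConfig`, `weight_bytes`, `kv_cache_bytes`, and `estimate_inference_gib` are hypothetical, the configuration values are illustrative (roughly a 7B-parameter model with 32 layers), and activation memory and framework overhead are reduced to a single user-supplied number.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes per GiB


@dataclass
class ModelConfig:
    """Hypothetical container for the few numbers this estimate needs."""
    n_params: float   # total parameter count
    n_layers: int     # number of transformer layers
    n_kv_heads: int   # key/value heads per layer
    head_dim: int     # dimension of each attention head


def weight_bytes(cfg: ModelConfig, bits_per_param: float) -> float:
    """Model size: parameters times storage width (e.g. 16 for fp16, 4 for 4-bit)."""
    return cfg.n_params * bits_per_param / 8


def kv_cache_bytes(cfg: ModelConfig, context_len: int, batch_size: int,
                   bytes_per_elem: int = 2) -> int:
    """KV cache: one key and one value vector per layer, head, token, and sequence."""
    return (2 * cfg.n_layers * cfg.n_kv_heads * cfg.head_dim
            * context_len * batch_size * bytes_per_elem)


def estimate_inference_gib(cfg: ModelConfig, bits_per_param: float,
                           context_len: int, batch_size: int,
                           overhead_gib: float = 0.0) -> float:
    """Rough total in GiB. overhead_gib is an assumed lump sum that stands in
    for activation memory and framework overhead, which are not modeled here."""
    total = weight_bytes(cfg, bits_per_param) + kv_cache_bytes(cfg, context_len, batch_size)
    return total / GIB + overhead_gib


# Illustrative values only, not taken from gpu_poor.
cfg = ModelConfig(n_params=7e9, n_layers=32, n_kv_heads=32, head_dim=128)
print(f"KV cache at 4096 tokens, batch 1: {kv_cache_bytes(cfg, 4096, 1) / GIB:.1f} GiB")
print(f"fp16 total:  {estimate_inference_gib(cfg, 16, 4096, 1):.1f} GiB")
print(f"4-bit total: {estimate_inference_gib(cfg, 4, 4096, 1):.1f} GiB")
```

Because the KV cache term grows linearly with both context length and batch size, a sketch like this shows why those two settings are the main levers once the quantization level is fixed. Real figures depend on the framework and quantization format, so treat the output as an order-of-magnitude guide.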
Capabilities
Pricing & Plans
Open Source
Free
FAQs