Fastllm
fastllm is a Coding & Development tool that provides a high-performance, dependency-free inference library for large language models. It supports tensor-parallel inference for dense and Mixture-of-Experts (MoE) models on various GPUs.