Chitu
Chitu is a high-performance inference framework for large language models, focusing on efficiency, flexibility, and availability. It supports diverse hardware and scales from a single GPU to large clusters.
About
Chitu (赤兔) is a high-performance inference framework for large language models that emphasizes efficiency, flexibility, and availability. Positioned as a "production-grade large model inference engine," it targets the successive stages of enterprise AI deployment, from small-scale experiments to large-scale operations. It adapts to diverse compute hardware, supporting a range of NVIDIA products and offering optimized support for domestic (Chinese-made) chips. Deployments scale from CPU-only and single-GPU setups to large clusters. Chitu is built for long-term stable operation and can handle concurrent business traffic in production environments. It supports models such as DeepSeek, Qwen, GLM, and Kimi, and offers features such as efficient FP4-to-FP8/BF16 operators and heterogeneous CPU+GPU inference.
Pricing & Plans
Open Source
Free