LMCache
At a glance
Trending
LMCache is an open-source AI Frameworks & Infra tool that speeds up LLM serving by providing a fast KV cache layer. It reduces time to first token (TTFT) and increases throughput, especially in long-context scenarios.
About
LMCache is an open-source library that accelerates Large Language Model (LLM) serving by acting as a high-speed key-value (KV) cache layer. It reduces Time To First Token (TTFT) and increases throughput, with the largest gains in long-context scenarios. LMCache stores the KV caches of reusable text across several storage tiers (GPU, CPU, disk, and S3) and reuses them instead of recomputing them, using acceleration techniques such as zero CPU copy and GPUDirect Storage (GDS). It integrates with LLM serving engines such as vLLM and SGLang and offers features such as high-performance CPU KV cache offloading and disaggregated prefill. This lets developers cut latency and GPU cycles in LLM use cases such as multi-round QA and retrieval-augmented generation (RAG).
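The idea of reusing KV caches across storage tiers can be illustrated with a small toy model. The sketch below is not LMCache's API: the names Tier, TieredKVCache, chunk_keys, and CHUNK_SIZE are hypothetical, the tiers are plain in-memory dictionaries, and the stored values are placeholder strings rather than real KV tensors. It assumes prompts are split into fixed-size token chunks, each keyed by a hash of the whole prefix up to that chunk, so a new prompt can reuse cached entries for its longest matching prefix.

```python
import hashlib
from collections import OrderedDict

CHUNK_SIZE = 4  # tokens per chunk (toy value, assumption for this sketch)


def chunk_keys(tokens, chunk_size=CHUNK_SIZE):
    """Return one key per full chunk; each key hashes the prefix up to that chunk."""
    keys = []
    running = hashlib.sha256()
    for i in range(len(tokens) // chunk_size):
        chunk = tokens[i * chunk_size:(i + 1) * chunk_size]
        running.update(",".join(map(str, chunk)).encode() + b";")
        keys.append(running.copy().hexdigest())
    return keys


class Tier:
    """One storage tier with LRU eviction (stands in for GPU, CPU, or disk)."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as recently used
            return self.entries[key]
        return None

    def put(self, key, value):
        """Insert an entry; return the evicted (key, value) pair, or None."""
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            return self.entries.popitem(last=False)
        return None


class TieredKVCache:
    """Tiers ordered fastest first; evicted entries are demoted to the next tier."""

    def __init__(self, tiers):
        self.tiers = tiers

    def store(self, key, value, level=0):
        if level >= len(self.tiers):
            return  # fell off the slowest tier: entry is dropped
        evicted = self.tiers[level].put(key, value)
        if evicted is not None:
            self.store(evicted[0], evicted[1], level + 1)

    def lookup(self, key):
        """Return (tier_name, value) from the fastest tier holding the key."""
        for tier in self.tiers:
            value = tier.get(key)
            if value is not None:
                return tier.name, value
        return None

    def match_prefix(self, tokens):
        """Return how many leading tokens already have cached KV entries."""
        hit = 0
        for i, key in enumerate(chunk_keys(tokens)):
            if self.lookup(key) is None:
                break
            hit = (i + 1) * CHUNK_SIZE
        return hit


cache = TieredKVCache([Tier("gpu", 1), Tier("cpu", 2), Tier("disk", 8)])
first_prompt = [1, 2, 3, 4, 5, 6, 7, 8, 9]
for key in chunk_keys(first_prompt):
    cache.store(key, "kv-placeholder")

# A second prompt sharing the first 8 tokens only needs prefill for the rest.
second_prompt = [1, 2, 3, 4, 5, 6, 7, 8, 42, 43, 44, 45]
reusable = cache.match_prefix(second_prompt)
```

In this toy run, the second prompt shares two full chunks with the first, so only its remaining tokens would need fresh prefill. That skipped prefill work is the kind of saving that lowers TTFT in multi-round QA and RAG, where long prefixes recur.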
Pricing & Plans
Open Source
Free