llm-d
llm-d optimizes AI inference on Kubernetes, delivering state-of-the-art performance. It offers features like reproducible benchmarks, KV offloading, and LoRA routing for efficient and scalable AI deployments.
About
llm-d improves inference performance for AI models served on modern accelerators in a Kubernetes environment. It provides reproducible benchmark workflows, so performance can be evaluated consistently from run to run. It also incorporates hierarchical KV offloading and cache-aware LoRA routing, which target memory usage and data access during inference. In addition, llm-d supports active-active high availability (HA) for reliability and scale-to-zero autoscaling for cost efficiency.
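To make the idea of cache-aware LoRA routing more concrete, here is a minimal Python sketch. It is not llm-d's API or its actual scheduling logic. The `Replica` class, the `pick_replica` and `route` functions, and the "prefer a replica that already holds the adapter, then pick the least loaded" policy are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical model-server replica (illustrative, not an llm-d type)."""
    name: str
    loaded_adapters: set[str] = field(default_factory=set)
    in_flight: int = 0  # requests currently being served


def pick_replica(replicas: list[Replica], adapter: str) -> Replica:
    """Prefer replicas that already hold the adapter; break ties by load."""
    if not replicas:
        raise ValueError("no replicas available")
    # "Warm" replicas can serve the request without loading the adapter first.
    warm = [r for r in replicas if adapter in r.loaded_adapters]
    # Fall back to every replica when no one has the adapter loaded yet.
    candidates = warm or replicas
    return min(candidates, key=lambda r: r.in_flight)


def route(replicas: list[Replica], adapter: str) -> str:
    """Assign one request and record its effect on the chosen replica."""
    chosen = pick_replica(replicas, adapter)
    chosen.loaded_adapters.add(adapter)  # assumed: adapter stays resident
    chosen.in_flight += 1
    return chosen.name
```

In this sketch, a request for an adapter goes to a busier replica that already holds it rather than to an idle one that would have to load it. A real router would likely also weigh queue depth, adapter eviction, and memory limits.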
Pricing & Plans
Free