Moreh
Visit ToolMoreh is a DevOps & Infrastructure tool that optimizes LLM inference on various accelerators. It provides full-stack software for AMD GPUs, Tenstorrent chips, and heterogeneous GPU clusters.
At a glance
Trending
Moreh is a DevOps & Infrastructure tool that optimizes LLM inference on various accelerators. It provides full-stack software for AMD GPUs, Tenstorrent chips, and heterogeneous GPU clusters.
Trending
About
Moreh offers full-stack inference software designed to unlock peak LLM inference performance across a range of hardware, including AMD GPUs, Tenstorrent chips, and heterogeneous GPU clusters. Its MoAI Inference Framework handles routing, scheduling, auto-scaling, and SLO-driven optimization, while Moreh vLLM provides state-of-the-art model optimization, quantization, and graph execution. The platform also includes native vLLM Moreh Libraries with custom kernels for GEMM/Attention/MoE and communication. Moreh aims to unify GPUs across vendors and generations, maximize tokens per dollar through chip-level and cluster-level optimization, and significantly reduce inference costs and latency, as demonstrated by benchmarks showing substantial improvements over existing solutions.
Capabilities
Pricing & Plans
Likely Not Free
Not publicly disclosed. Check moreh.io for current pricing.
FAQs
Trending