Ktransformers


KTransformers is an open-source framework for optimizing large language model (LLM) inference and fine-tuning. It uses CPU-GPU heterogeneous computing to improve efficiency and performance.


At a glance

Pricing: Open Source
Free tier: Yes
API: Yes
Skill level: Technical

About

What is ktransformers?

KTransformers is an open-source research project focused on efficient inference and fine-tuning of large language models (LLMs) through CPU-GPU heterogeneous computing. It comprises two core modules: kt-kernel, which provides high-performance inference kernels, and kt-sft, a fine-tuning framework. kt-kernel offers CPU-optimized operations with AMX/AVX acceleration, MoE optimization, and quantization support (INT4/INT8 on CPU, GPTQ on GPU), and integrates easily through a Python API. kt-sft integrates with LLaMA-Factory for resource-efficient fine-tuning of ultra-large MoE models, supporting LoRA as well as production-ready features such as chat and batch inference. The framework is designed for researchers and engineers optimizing LLM performance across diverse hardware configurations.
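
To make the idea of CPU-GPU heterogeneous MoE inference more concrete, here is a minimal plain-Python sketch of the general technique, not kt-kernel's API: a router scores experts for each token, the top-k experts are selected with renormalized gate weights, and each assignment is grouped by the device assumed to host that expert. The function names and the choice of which experts live on the GPU are hypothetical and chosen for illustration only.

```python
import math
from collections import defaultdict


def softmax(logits):
    """Numerically stable softmax over a list of router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(logits, k):
    """Pick the k highest-probability experts and renormalize their gate weights."""
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def plan_dispatch(router_logits, k, gpu_experts):
    """Group (token, expert, weight) assignments by the device assumed to host each expert."""
    plan = defaultdict(list)
    for token, logits in enumerate(router_logits):
        for expert, weight in top_k_route(logits, k):
            # Hypothetical placement: listed experts on GPU, all others on CPU.
            device = "gpu" if expert in gpu_experts else "cpu"
            plan[device].append((token, expert, round(weight, 3)))
    return dict(plan)


# Two tokens, four experts, top-2 routing; only expert 0 is (hypothetically) kept on the GPU.
plan = plan_dispatch([[2.0, 1.0, 0.5, 3.0], [0.1, 2.5, 1.5, 0.0]], k=2, gpu_experts={0})
print(plan)
```

Because only k experts fire per token, most expert weights sit idle at any moment, which is why keeping many of them in CPU memory can be practical. A real system would batch the per-device groups and run them with optimized kernels rather than Python loops.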

Best used for

Ideal for developers and researchers who need to optimize large language model inference, fine-tune MoE models efficiently, and leverage heterogeneous computing. Especially valuable for achieving high throughput and reduced memory usage on diverse hardware configurations, including CPU and GPU.

Common actions

optimize LLM inference

fine-tune LLMs

accelerate AI models

manage heterogeneous computing


Capabilities

Key features

CPU-GPU heterogeneous computing
AMX/AVX acceleration
MoE inference optimization
INT4/INT8 quantization support
LLaMA-Factory integration
LoRA fine-tuning
Python API for integration
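
The INT4/INT8 quantization feature above rests on mapping floating-point weights to small signed integers plus a scale. The sketch below shows simple symmetric absmax quantization with one scale per row. It is a generic illustration under that assumption, not the exact scheme used by KTransformers or GPTQ, which typically use finer-grained (for example, group-wise) scales.

```python
def quantize_symmetric(row, bits=8):
    """Map floats to signed integer codes in [-qmax, qmax] using one absmax scale per row."""
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8, 7 for INT4
    absmax = max(abs(x) for x in row)
    scale = absmax / qmax if absmax > 0 else 1.0
    # round() is round-half-to-even; clamping guards against rounding past qmax.
    codes = [max(-qmax, min(qmax, round(x / scale))) for x in row]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]


row = [0.5, -1.27, 0.01, 1.0]
for bits in (8, 4):
    codes, scale = quantize_symmetric(row, bits)
    approx = dequantize(codes, scale)
    max_err = max(abs(a - b) for a, b in zip(row, approx))
    print(f"INT{bits}: codes={codes} scale={scale:.4f} max_error={max_err:.4f}")
```

Halving the bit width halves storage but roughly doubles the worst-case rounding error per weight (about scale / 2), which is the usual accuracy/memory trade-off behind choosing INT4 versus INT8.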

Target Audience

developer

Integrations

LLaMA-Factory, SGLang

Pricing & Plans

Open Source: Free

FAQs

What kind of LLMs does KTransformers support for optimization?

KTransformers supports a wide range of LLMs, including DeepSeek-R1, DeepSeek-V3, GLM-5, MiniMax-M2.5, Kimi-K2.5, Qwen3-Next, and various LLaMA models. It continuously updates to support new and emerging large language models for both inference and fine-tuning.

What hardware configurations does KTransformers optimize for?

KTransformers is designed for CPU-GPU heterogeneous computing, optimizing performance across Intel AMX/AVX CPUs, various GPUs (NVIDIA, Intel Arc, AMD ROCm), and NPUs like Ascend. It also supports multi-GPU setups and different precision levels like BF16, FP8, and INT4/INT8 quantization.
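
A quick back-of-the-envelope calculation shows why these precision levels matter for very large models. The sketch computes raw weight storage only, ignoring quantization scales, activations, and the KV cache, and uses the 671B parameter count of DeepSeek-V3 cited in the fine-tuning FAQ as an example.

```python
# Bits per stored weight for the precision levels named above (scale/zero-point overhead ignored).
BITS_PER_PARAM = {"BF16": 16, "FP8": 8, "INT8": 8, "INT4": 4}


def weight_memory_gb(num_params, precision):
    """Raw weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BITS_PER_PARAM[precision] / 8 / 1e9


# 671B parameters, the DeepSeek-V3 size cited in the FAQ on fine-tuning.
for precision in BITS_PER_PARAM:
    print(f"{precision}: {weight_memory_gb(671e9, precision):.1f} GB")
```

Even at INT4, the weights alone take roughly 335 GB, far more than a single GPU holds. This is the gap that splitting the model between CPU memory and GPU memory is meant to bridge.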

How does KTransformers help with fine-tuning large MoE models?

KTransformers integrates with LLaMA-Factory through its kt-sft module to enable resource-efficient fine-tuning of ultra-large Mixture-of-Experts (MoE) models. It supports LoRA fine-tuning and can significantly reduce GPU memory requirements; for example, it can fine-tune the 671B-parameter DeepSeek-V3 with only 70 GB of GPU memory.
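
LoRA is one reason fine-tuning can be this cheap: the original weight matrix W stays frozen, and only two small matrices A and B are trained, giving y = W x + (alpha / rank) * B (A x). The plain-Python sketch below shows the parameter count and the forward pass. The 4096 dimension, rank 16, and function names are illustrative assumptions, not kt-sft settings or API, and the sketch does not model the CPU-GPU offloading that KTransformers adds on top.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in linear layer: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha, rank):
    """y = W x + (alpha / rank) * B (A x). W is frozen; only A and B receive gradients."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scaling = alpha / rank
    return [b + scaling * d for b, d in zip(base, delta)]


full = 4096 * 4096  # parameters in one dense 4096 x 4096 projection
lora = lora_trainable_params(4096, 4096, rank=16)
print(f"full: {full:,}  LoRA r=16: {lora:,}  ({lora / full:.2%} of full)")
```

Because B is conventionally initialized to zeros, the adapted layer starts out identical to the frozen one, and optimizer state only has to be kept for the small A and B matrices.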
