CTranslate2
CTranslate2 is an AI inference engine that accelerates Transformer models. It provides fast and efficient execution on CPU and GPU for various NLP tasks while reducing memory usage.
About
CTranslate2 is a C++ and Python library for efficient inference with Transformer models. It implements a custom runtime that applies performance optimizations such as weight quantization, layer fusion, and batch reordering to speed up Transformer models and reduce their memory usage on both CPU and GPU. The library supports encoder-decoder, decoder-only, and encoder-only models, including T5, Gemma, GPT-2, Llama, and BERT. It includes converters for models trained with popular frameworks such as OpenNMT-py, Fairseq, and Transformers. The project is production-oriented and comes with backward compatibility guarantees. Key features include reduced-precision weights (FP16, BF16, INT16, INT8, AWQ INT4), support for multiple CPU architectures with automatic detection, parallel and asynchronous execution, and dynamic memory usage.
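To make the idea of weight quantization concrete, here is a minimal sketch of symmetric INT8 quantization in plain Python. It is an illustration of the general technique only, not CTranslate2's actual implementation; the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical and chosen for this example.

```python
def quantize_int8(row):
    """Symmetric quantization (illustrative, not CTranslate2's code).

    Maps a list of floats to integers in [-127, 127] plus one float scale.
    """
    max_abs = max(abs(w) for w in row)
    # Avoid dividing by zero for an all-zero row.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the INT8 range.
    q = [max(-127, min(127, round(w / scale))) for w in row]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in q]


# Hypothetical sample weights for one row of a weight matrix.
weights = [0.02, -1.27, 0.64, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each quantized weight fits in one byte instead of the four bytes of a 32-bit float, which is where the memory saving comes from; the cost is a small rounding error bounded by half the scale.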
Pricing & Plans
Open Source
Free