Megatron-LM
Megatron-LM is a GPU-optimized library for training large transformer models at scale. It provides building blocks and pre-configured scripts for distributed training of models with billions of parameters.
About
Megatron-LM is an NVIDIA-developed, GPU-optimized library for training large transformer models at scale. The project has two main components. Megatron-LM provides pre-configured training scripts aimed at research teams and quick experimentation. Megatron Core is a composable library of GPU-optimized building blocks for custom training frameworks, which suits framework developers and ML engineers building their own training pipelines. Megatron Core includes transformer building blocks, parallelism strategies (tensor, pipeline, data, expert, and context parallelism, abbreviated TP, PP, DP, EP, and CP), mixed-precision support (FP16, BF16, FP8, FP4), and a range of model architectures. The project also includes Megatron Bridge, which converts checkpoints in both directions between Hugging Face and Megatron formats for interoperability, along with production-ready recipes. It supports training models from 2B to 462B parameters across thousands of GPUs while achieving high Model FLOP Utilization (MFU).
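To make the parallelism and MFU terms above more concrete, here is a minimal Python sketch. It does not call any Megatron API. The function names are hypothetical, the MFU estimate assumes the common rule of thumb of roughly 6 FLOPs per parameter per token for a dense transformer (forward plus backward, ignoring attention FLOPs), and the peak-FLOPs figure in the example is an illustrative placeholder rather than a real hardware spec.

```python
def data_parallel_size(world_size: int, tp: int = 1, pp: int = 1, cp: int = 1) -> int:
    """GPUs left for data parallelism after TP, PP, and CP are assigned.

    Assumes the world size factors as TP * PP * CP * DP.
    """
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by TP*PP*CP={model_parallel}"
        )
    return world_size // model_parallel


def estimate_mfu(
    params: float,
    tokens_per_second: float,
    num_gpus: int,
    peak_flops_per_gpu: float,
) -> float:
    """Rough Model FLOP Utilization: achieved model FLOP/s over peak FLOP/s.

    Assumption: about 6 * params FLOPs per trained token (dense model,
    attention FLOPs ignored), so this is an approximation.
    """
    achieved_flops_per_second = 6.0 * params * tokens_per_second
    return achieved_flops_per_second / (num_gpus * peak_flops_per_gpu)


if __name__ == "__main__":
    # 1024 GPUs split as TP=8, PP=4, CP=2 leaves 16 data-parallel replicas.
    print(data_parallel_size(1024, tp=8, pp=4, cp=2))

    # Hypothetical numbers: a 2B-parameter model, 250k tokens/s on 8 GPUs,
    # each with an assumed peak of 1e15 FLOP/s.
    print(estimate_mfu(2e9, 2.5e5, 8, 1e15))
```

With these placeholder inputs, the first call returns 16 and the second returns 0.375, meaning 37.5% of the assumed peak throughput is spent on model FLOPs.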
Pricing & Plans
Open Source
Free