Safe-RLHF
safe-rlhf is an open-source framework for Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback (Safe RLHF). It provides a reproducible code pipeline for alignment research and supports three training methods: SFT, RLHF, and Safe RLHF.