MOSS-RLHF
MOSS-RLHF is an open-source research tool for studying Reinforcement Learning from Human Feedback (RLHF) in large language models. It implements the PPO algorithm and provides code for training reward models and policy models.
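For readers unfamiliar with PPO, the clipped surrogate objective at its core can be sketched in a few lines. This is a generic, dependency-free illustration of the standard PPO formulation, not code from MOSS-RLHF; the function name `ppo_clipped_objective`, its parameters, and the default `clip_eps=0.2` are assumptions chosen for the example.

```python
import math


def ppo_clipped_objective(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch of sampled actions.

    Hypothetical helper for illustration only; not part of MOSS-RLHF.
    new_logprobs / old_logprobs: log-probabilities of the sampled actions
    under the current policy and the policy that collected the data.
    advantages: advantage estimates for the same actions.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio between the current and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Training maximizes this quantity (or minimizes its negative). The clipping removes any incentive to move the policy far from the one that generated the samples in a single update.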