OpenRLHF/OpenRLHF
stable
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)
large-language-models, proximal-policy-optimization, raylib, reinforcement-learning, reinforcement-learning-from-human-feedback, transformers, visual-language-models, vllm