RLM ♥ GEPA: You can use RLMs to improve RLMs with GEPA
Reinforcement Learning: An Overview
We’re introducing HALO 😇
Hierarchal Agent Loop Optimizer
HALO is an RLM-based agent optimization technique that recursively improves agents by analyzing their execution traces and suggesting changes.
This work is inspired by the Mismanaged Genius Hypothesis.
The State of Reinforcement Learning for LLM Reasoning
Understanding GRPO and New Insights from Reasoning Model Papers
Reinforcement Learning (RL) Guide | Unsloth Documentation
Learn all about Reinforcement Learning (RL) and how to train your own DeepSeek-R1 reasoning model with Unsloth using GRPO. A complete guide from beginner to advanced.
Advanced: Reinforcement Learning, Kernels, Reasoning, Quantization & Agents AIE 2025
LangGraph Rollout: Evolving VeRL’s Multi-Turn Capabilities for Agent RL
After completing our multi-turn tokenization and masking refactoring, we eliminated a critical bottleneck that was preventing us from building a more consistent and flexible rollout system for our Agent RL research. This enabled us to implement a LangGraph-based rollout for VeRL in just a few days, and we have already deployed it in our Agent RL experiments. In this article, I’ll share our journey from VeRL’s native multi-turn implementation to our new LangGraph-based solution, explaining both the motivations behind this evolution and the technical details of our implementation.