Search Reinforcement Learning

Found 7 bookmarks

Reinforcement Learning: An Overview

·arxiv.org·May 3, 2026

We’re introducing HALO 😇

Hierarchal Agent Loop Optimizer: HALO is an RLM-based agent optimization technique that recursively self-improves agents by analyzing their execution traces and suggesting changes. This work is inspired by the Mismanaged Genius Hypothesis.

·x.com·Apr 30, 2026

RLM ♥ GEPA: You can use RLMs to improve RLMs with GEPA

·x.com·Apr 25, 2026

The State of Reinforcement Learning for LLM Reasoning

Understanding GRPO and New Insights from Reasoning Model Papers

·magazine.sebastianraschka.com·Jul 30, 2025

Reinforcement Learning (RL) Guide | Unsloth Documentation

Learn all about Reinforcement Learning (RL) and how to train your own DeepSeek-R1 reasoning model with Unsloth using GRPO. A complete guide from beginner to advanced.

·docs.unsloth.ai·Jul 22, 2025

Advanced: Reinforcement Learning, Kernels, Reasoning, Quantization & Agents AIE 2025

➤ Check out our updated Reinforcement Learning guide!

·docs.google.com·Jul 22, 2025

LangGraph Rollout: Evolving VeRL’s Multi-Turn Capabilities for Agent RL

After completing our multi-turn tokenization and masking refactoring, we eliminated a critical bottleneck that was preventing us from building a more consistent and flexible rollout system for our Agent RL research. This breakthrough enabled us to implement a LangGraph-based rollout for VeRL in just a few days, which we’ve already successfully deployed in our Agent RL experiments. In this article, I’ll share our journey from VeRL’s native multi-turn implementation to our new LangGraph-based solution, explaining both the motivations driving this evolution and the technical details of our implementation.

·jybsuper.github.io·Jul 6, 2025
