Reinforcement Learning: An Overview
We’re introducing HALO 😇
Hierarchical Agent Loop Optimizer
HALO is an RLM-based agent optimization technique capable of recursively self-improving agents by analyzing their execution traces and suggesting changes.
This work is inspired by the Mismanaged Genius Hypothesis
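The announcement above ships no code, but the trace-analyze-patch loop a recursive agent optimizer like HALO implies can be sketched in a few lines. Everything here is hypothetical — `run_agent` and `critique` stand in for the task harness and the RLM critic, which the announcement does not specify:

```python
# Hypothetical sketch of a recursive agent-improvement loop in the spirit of
# HALO: run the agent, analyze its execution trace, keep improving rewrites.
def optimize_agent(agent_prompt, run_agent, critique, rounds=3):
    """Iteratively rewrite an agent's prompt based on execution traces.

    run_agent(prompt) -> (score, trace); critique(prompt, trace) -> new prompt.
    Both callables are placeholders for the harness and the RLM critic.
    """
    best_prompt, best_score = agent_prompt, float("-inf")
    for _ in range(rounds):
        score, trace = run_agent(best_prompt)
        if score > best_score:
            best_score = score
        candidate = critique(best_prompt, trace)
        cand_score, _ = run_agent(candidate)
        # Keep the rewrite only if it actually improves the agent.
        if cand_score > best_score:
            best_prompt, best_score = candidate, cand_score
    return best_prompt, best_score
```

The greedy accept-if-better rule is the simplest possible selection strategy; a real optimizer would presumably search over many candidate rewrites per round.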
Exploring GEPA
RLM ♥ GEPA: You can use RLMs to improve RLMs with GEPA
Learning DSPy (3): Working with optimizers
A walkthrough of using the BootstrapFewShot and GEPA optimizers in DSPy
Self-Evolving Agents - A Cookbook for Autonomous Agent Retraining
Agentic systems often reach a plateau after proof-of-concept because they depend on humans to diagnose edge cases and correct failures. …
Prompt Learning Playbook
The State of Reinforcement Learning for LLM Reasoning
Understanding GRPO and New Insights from Reasoning Model Papers
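GRPO's core trick — group-relative advantages — is compact enough to show directly. A sketch of the normalization step only (not the full clipped policy-gradient objective; note implementations differ on population vs. sample standard deviation, population is used here):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages in the GRPO style: for G sampled
    completions of the same prompt, normalize each reward by the group's
    mean and standard deviation instead of learning a value function."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    if std == 0:
        return [0.0 for _ in rewards]  # identical rewards carry no signal
    return [(r - mean) / std for r in rewards]
```

This is why GRPO needs several samples per prompt: the group itself plays the role of the critic baseline.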
Understanding RAG vs Fine-Tuning
Discover the key differences between RAG and fine-tuning, what each approach can bring, and how to choose the right AI approach for your business goals.
Reinforcement Learning (RL) Guide | Unsloth Documentation
Learn all about Reinforcement Learning (RL) and how to train your own DeepSeek-R1 reasoning model with Unsloth using GRPO. A complete guide from beginner to advanced.
Advanced: Reinforcement Learning, Kernels, Reasoning, Quantization & Agents AIE 2025
➤ Check out our updated Reinforcement Learning guide!
LangGraph Rollout: Evolving VeRL’s Multi-Turn Capabilities for Agent RL
After completing our multi-turn tokenization and masking refactoring, we eliminated a critical bottleneck that was preventing us from building a more consistent and flexible rollout system for our Agent RL research. This breakthrough enabled us to implement a LangGraph-based rollout for VeRL in just a few days, which we’ve already successfully deployed in our Agent RL experiments. In this article, I’ll share our journey from VeRL’s native multi-turn implementation to our new LangGraph-based solution, explaining both the motivations driving this evolution and the technical details of our implementation.
Fine-tune ModernBERT for RAG with Synthetic Data
A Blog post by Sara Han Díaz on Hugging Face
Training and Finetuning Sparse Embedding Models with Sentence Transformers v5
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
An Engineer's Guide to Fine-Tuning LLMs, Part 2: The Execution Playbook
A deep dive into the methods, fine-tuning pipeline, and operational risks of building specialised models.
LoRA Hyperparameters Guide | Unsloth Documentation
Best practices for LoRA hyperparameters and how they affect the fine-tuning process.
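The two hyperparameters such guides spend the most time on, `r` and `alpha`, enter the math in one place: the low-rank update ΔW = (alpha / r) · B A added to the frozen weights. A dependency-free toy version of that update (plain nested lists instead of real tensors, purely to show the scaling):

```python
def lora_delta(a_matrix, b_matrix, alpha, r):
    """Compute the LoRA weight update delta_W = (alpha / r) * B @ A.

    A is r x d_in (random init), B is d_out x r (zero init), so delta_W
    starts at zero and only the low-rank factors are trained. The
    alpha / r scaling keeps update magnitude comparable across ranks.
    """
    scale = alpha / r
    d_out, d_in = len(b_matrix), len(a_matrix[0])
    delta = [[0.0] * d_in for _ in range(d_out)]
    for i in range(d_out):
        for j in range(d_in):
            delta[i][j] = scale * sum(
                b_matrix[i][k] * a_matrix[k][j] for k in range(r)
            )
    return delta
```

The zero initialization of B means training starts from the unmodified base model, which is why LoRA runs are stable from step one.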
fine-tuning-magistral.ipynb · kingabzpro/Magistral-Small-Medical-QA at main
Fine-Tuning Magistral: A Step-By-Step Guide
Step-by-step guide to fine-tuning the Mistral reasoning model on a medical MCQs dataset using the Transformers framework.
AIEBootcamp/09_Finetuning_Embeddings/Fine_tuning_Embedding_Models_for_RAG_using_RAGAS.ipynb at main · apatti/AIEBootcamp
AI Engineering bootcamp. Contribute to apatti/AIEBootcamp development by creating an account on GitHub.
"regular people don't fine-tune VLMs"
but wtf not?
- skill gap
- high fine-tuning costs
- lack of standards and unified approaches
over the past few weeks I've been working on maestro, a streamlined tool for VLM fine-tuning
link:
— SkalskiP (@skalskip92)
🚀 Getting Started — Oumi
Fine Tune DeepSeek R1 | Build a Medical Chatbot
In this video, we show you how to fine-tune DeepSeek R1, an open-source reasoning model, using LoRA (Low-Rank Adaptation). We'll also be using Kaggle, Hugging Face and Weights & Biases. We walk you through data preparation, model configuration, and optimization, including advanced techniques like four-bit quantization for efficient training on consumer GPUs.
By the end of this tutorial, you’ll be equipped with the skills to customize DeepSeek R1 for your own specialized tasks, such as medical reasoning.
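The four-bit quantization the video leans on for consumer GPUs can be illustrated with a toy symmetric absmax quantizer. This is a sketch of the idea only — bitsandbytes actually uses the NF4 scheme with per-block scales, not this per-tensor version:

```python
def quantize_4bit(weights):
    """Toy symmetric absmax 4-bit quantization: map floats to integer
    codes in [-7, 7] via one per-tensor scale, illustrating how 4-bit
    storage trades precision for memory."""
    scale = max(abs(w) for w in weights) / 7 or 1.0  # avoid div-by-zero
    codes = [round(w / scale) for w in weights]
    return codes, scale

def dequantize_4bit(codes, scale):
    """Recover approximate float weights from the 4-bit codes."""
    return [c * scale for c in codes]
```

Roughly: a 7B model's weights drop from ~14 GB in fp16 to ~3.5 GB in 4-bit, which is what makes Kaggle-class GPUs viable for the LoRA pass.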
🔗 Resources & Tutorials
Kaggle Notebook: https://www.kaggle.com/code/aan1994/fine-tuning-deepseek-r1-reasoning-model-youtube
How Transformers Work: https://www.datacamp.com/tutorial/how-transformers-work
Fine-Tuning DeepSeek R1 Reasoning Model: https://www.datacamp.com/tutorial/fine-tuning-deepseek-r1-reasoning-model
DeepSeek R1 Blog Overview: https://www.datacamp.com/blog/deepseek-r1
Understanding Janus Pro: https://www.datacamp.com/blog/janus-pro
DeepSeek R1 Project Walkthrough: https://www.datacamp.com/tutorial/deepseek-r1-project
DeepSeek vs ChatGPT: https://www.datacamp.com/blog/deepseek-vs-chatgpt
Qwen-2.5 MAX Model: https://www.datacamp.com/blog/qwen-2-5-max
DeepSeek R1 Ollama Tutorial: https://www.datacamp.com/tutorial/deepseek-r1-ollama
📕 Chapters
00:00 Introduction
00:30 Why Fine-Tuning DeepSeek Matters
02:30 LoRA Explained with a PS5 Factory Analogy
05:20 Tools & Setup Overview
09:00 Loading DeepSeek R1 Model and Tokenizer
16:10 Formatting Data for Fine-Tuning
23:00 Applying LoRA for Efficient Updates
34:00 Configuring Training Parameters
43:15 Running the Fine-Tuning Process on Kaggle
46:00 Comparing Model Performance After Fine-Tuning
47:50 Final Thoughts on Future Models
transformerlab/transformerlab-app: Open Source Application for Advanced LLM Engineering: interact, train, fine-tune, and evaluate large language models on your own computer.
Open Source Application for Advanced LLM Engineering: interact, train, fine-tune, and evaluate large language models on your own computer. - transformerlab/transformerlab-app
Alex Strick van Linschoten - How to think about creating a dataset for LLM finetuning evaluation
I summarise the kinds of evaluations that are needed for a structured data generation task.
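For a structured data generation task like the one the post describes, the cheapest evaluations are programmatic checks on the outputs. A hedged sketch of two such metrics — valid-JSON rate and field-level exact match — with all names invented for illustration:

```python
import json

def eval_structured_outputs(outputs, references, required_keys):
    """Score model outputs for a structured (JSON) generation task:
    the fraction that parse as JSON containing all required keys, and
    field-level exact-match accuracy among the parseable outputs."""
    valid = 0
    field_hits, field_total = 0, 0
    for out, ref in zip(outputs, references):
        try:
            parsed = json.loads(out)
        except json.JSONDecodeError:
            continue  # unparseable output: counts against validity only
        if all(k in parsed for k in required_keys):
            valid += 1
        for key, value in ref.items():
            field_total += 1
            if parsed.get(key) == value:
                field_hits += 1
    return {
        "valid_json_rate": valid / len(outputs),
        "field_accuracy": field_hits / field_total if field_total else 0.0,
    }
```

Checks like these complement, rather than replace, the judgment-based evaluations the post discusses: they catch format regressions cheaply after every fine-tuning run.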
Prompt Tuning With PEFT. - Hugging Face Open-Source AI Cookbook