Reinforcement Learning: An Overview
We’re introducing HALO 😇
Hierarchical Agent Loop Optimizer
HALO is an RLM-based agent optimization technique capable of recursively self-improving agents by analyzing their execution traces and suggesting changes.
This work is inspired by the Mismanaged Genius Hypothesis
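The announcement above ships no code, but the trace-analyze-patch loop a recursive agent optimizer like HALO implies can be sketched in a few lines. Everything here is hypothetical — `run_agent` and `critique` stand in for the task harness and the RLM critic, which the announcement does not specify:

```python
# Hypothetical sketch of a recursive agent-improvement loop in the spirit of
# HALO: run the agent, analyze its execution trace, keep improving rewrites.
def optimize_agent(agent_prompt, run_agent, critique, rounds=3):
    """Iteratively rewrite an agent's prompt based on execution traces.

    run_agent(prompt) -> (score, trace); critique(prompt, trace) -> new prompt.
    Both callables are placeholders for the harness and the RLM critic.
    """
    best_prompt, best_score = agent_prompt, float("-inf")
    for _ in range(rounds):
        score, trace = run_agent(best_prompt)
        if score > best_score:
            best_score = score
        candidate = critique(best_prompt, trace)
        cand_score, _ = run_agent(candidate)
        # Keep the rewrite only if it actually improves the agent.
        if cand_score > best_score:
            best_prompt, best_score = candidate, cand_score
    return best_prompt, best_score
```

The greedy accept-if-better rule is the simplest possible selection strategy; a real optimizer would presumably search over many candidate rewrites per round.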
Exploring GEPA
RLM ♥ GEPA: You can use RLMs to improve RLMs with GEPA
Learning DSPy (3): Working with optimizers
A walkthrough of using the BootstrapFewShot and GEPA optimizers in DSPy
Self-Evolving Agents - A Cookbook for Autonomous Agent Retraining
Agentic systems often reach a plateau after proof-of-concept because they depend on humans to diagnose edge cases and correct failures. …
Prompt Learning Playbook
The State of Reinforcement Learning for LLM Reasoning
Understanding GRPO and New Insights from Reasoning Model Papers
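GRPO's core trick — group-relative advantages — is compact enough to show directly. A sketch of the normalization step only (not the full clipped policy-gradient objective; note implementations differ on population vs. sample standard deviation, population is used here):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages in the GRPO style: for G sampled
    completions of the same prompt, normalize each reward by the group's
    mean and standard deviation instead of learning a value function."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    if std == 0:
        return [0.0 for _ in rewards]  # identical rewards carry no signal
    return [(r - mean) / std for r in rewards]
```

This is why GRPO needs several samples per prompt: the group itself plays the role of the critic baseline.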
Understanding RAG vs Fine-Tuning
Discover the key differences between RAG and fine-tuning, what each approach can bring, and how to choose the right AI approach for your business goals.
Reinforcement Learning (RL) Guide | Unsloth Documentation
Learn all about Reinforcement Learning (RL) and how to train your own DeepSeek-R1 reasoning model with Unsloth using GRPO. A complete guide from beginner to advanced.
Advanced: Reinforcement Learning, Kernels, Reasoning, Quantization & Agents AIE 2025
➤ Check out our updated Reinforcement Learning guide!
LangGraph Rollout: Evolving VeRL’s Multi-Turn Capabilities for Agent RL
After completing our multi-turn tokenization and masking refactoring, we eliminated a critical bottleneck that was preventing us from building a more consistent and flexible rollout system for our Agent RL research. This breakthrough enabled us to implement a LangGraph-based rollout for VeRL in just a few days, which we’ve already successfully deployed in our Agent RL experiments. In this article, I’ll share our journey from VeRL’s native multi-turn implementation to our new LangGraph-based solution, explaining both the motivations driving this evolution and the technical details of our implementation.
Fine-tune ModernBERT for RAG with Synthetic Data
A Blog post by Sara Han Díaz on Hugging Face
Training and Finetuning Sparse Embedding Models with Sentence Transformers v5
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
An Engineer's Guide to Fine-Tuning LLMs, Part 2: The Execution Playbook
A deep dive into the methods, fine-tuning pipeline, and operational risks of building specialised models.
LoRA Hyperparameters Guide | Unsloth Documentation
Best practices for LoRA hyperparameters and how they affect the fine-tuning process.
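The two hyperparameters such guides spend the most time on, `r` and `alpha`, enter the math in one place: the low-rank update ΔW = (alpha / r) · B A added to the frozen weights. A dependency-free toy version of that update (plain nested lists instead of real tensors, purely to show the scaling):

```python
def lora_delta(a_matrix, b_matrix, alpha, r):
    """Compute the LoRA weight update delta_W = (alpha / r) * B @ A.

    A is r x d_in (random init), B is d_out x r (zero init), so delta_W
    starts at zero and only the low-rank factors are trained. The
    alpha / r scaling keeps update magnitude comparable across ranks.
    """
    scale = alpha / r
    d_out, d_in = len(b_matrix), len(a_matrix[0])
    delta = [[0.0] * d_in for _ in range(d_out)]
    for i in range(d_out):
        for j in range(d_in):
            delta[i][j] = scale * sum(
                b_matrix[i][k] * a_matrix[k][j] for k in range(r)
            )
    return delta
```

The zero initialization of B means training starts from the unmodified base model, which is why LoRA runs are stable from step one.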
fine-tuning-magistral.ipynb · kingabzpro/Magistral-Small-Medical-QA at main
Fine-Tuning Magistral: A Step-By-Step Guide
Step-by-step guide to fine-tuning the Mistral reasoning model on a medical MCQs dataset using the Transformers framework.
AIEBootcamp/09_Finetuning_Embeddings/Fine_tuning_Embedding_Models_for_RAG_using_RAGAS.ipynb at main · apatti/AIEBootcamp
AI Engineering bootcamp. Contribute to apatti/AIEBootcamp development by creating an account on GitHub.
"regular people don't fine-tune VLMs"
but wtf not?
- skill gap
- high fine-tuning costs
- lack of standards and unified approaches
over the past few weeks I've been working on maestro, a streamlined tool for VLM fine-tuning
link:
— SkalskiP (@skalskip92)
🚀 Getting Started — Oumi
Fine Tune DeepSeek R1 | Build a Medical Chatbot
In this video, we show you how to fine-tune DeepSeek R1, an open-source reasoning model, using LoRA (Low-Rank Adaptation). We'll also be using Kaggle, Hugging Face and Weights & Biases. We walk you through data preparation, model configuration, and optimization, including advanced techniques like four-bit quantization for efficient training on consumer GPUs.
By the end of this tutorial, you’ll be equipped with the skills to customize DeepSeek R1 for your own specialized tasks, such as medical reasoning.
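The four-bit quantization the video leans on for consumer GPUs can be illustrated with a toy symmetric absmax quantizer. This is a sketch of the idea only — bitsandbytes actually uses the NF4 scheme with per-block scales, not this per-tensor version:

```python
def quantize_4bit(weights):
    """Toy symmetric absmax 4-bit quantization: map floats to integer
    codes in [-7, 7] via one per-tensor scale, illustrating how 4-bit
    storage trades precision for memory."""
    scale = max(abs(w) for w in weights) / 7 or 1.0  # avoid div-by-zero
    codes = [round(w / scale) for w in weights]
    return codes, scale

def dequantize_4bit(codes, scale):
    """Recover approximate float weights from the 4-bit codes."""
    return [c * scale for c in codes]
```

Roughly: a 7B model's weights drop from ~14 GB in fp16 to ~3.5 GB in 4-bit, which is what makes Kaggle-class GPUs viable for the LoRA pass.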
🔗 Resources & Tutorials
Kaggle Notebook: https://www.kaggle.com/code/aan1994/fine-tuning-deepseek-r1-reasoning-model-youtube
How Transformers Work: https://www.datacamp.com/tutorial/how-transformers-work
Fine-Tuning DeepSeek R1 Reasoning Model: https://www.datacamp.com/tutorial/fine-tuning-deepseek-r1-reasoning-model
DeepSeek R1 Blog Overview: https://www.datacamp.com/blog/deepseek-r1
Understanding Janus Pro: https://www.datacamp.com/blog/janus-pro
DeepSeek R1 Project Walkthrough: https://www.datacamp.com/tutorial/deepseek-r1-project
DeepSeek vs ChatGPT: https://www.datacamp.com/blog/deepseek-vs-chatgpt
Qwen-2.5 MAX Model: https://www.datacamp.com/blog/qwen-2-5-max
DeepSeek R1 Ollama Tutorial: https://www.datacamp.com/tutorial/deepseek-r1-ollama
📕 Chapters
00:00 Introduction
00:30 Why Fine-Tuning DeepSeek Matters
02:30 LoRA Explained with a PS5 Factory Analogy
05:20 Tools & Setup Overview
09:00 Loading DeepSeek R1 Model and Tokenizer
16:10 Formatting Data for Fine-Tuning
23:00 Applying LoRA for Efficient Updates
34:00 Configuring Training Parameters
43:15 Running the Fine-Tuning Process on Kaggle
46:00 Comparing Model Performance After Fine-Tuning
47:50 Final Thoughts on Future Models
transformerlab/transformerlab-app: Open Source Application for Advanced LLM Engineering: interact, train, fine-tune, and evaluate large language models on your own computer.
Open Source Application for Advanced LLM Engineering: interact, train, fine-tune, and evaluate large language models on your own computer. - transformerlab/transformerlab-app
Alex Strick van Linschoten - How to think about creating a dataset for LLM finetuning evaluation
I summarise the kinds of evaluations that are needed for a structured data generation task.
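For a structured data generation task like the one the post describes, the cheapest evaluations are programmatic checks on the outputs. A hedged sketch of two such metrics — valid-JSON rate and field-level exact match — with all names invented for illustration:

```python
import json

def eval_structured_outputs(outputs, references, required_keys):
    """Score model outputs for a structured (JSON) generation task:
    the fraction that parse as JSON containing all required keys, and
    field-level exact-match accuracy among the parseable outputs."""
    valid = 0
    field_hits, field_total = 0, 0
    for out, ref in zip(outputs, references):
        try:
            parsed = json.loads(out)
        except json.JSONDecodeError:
            continue  # unparseable output: counts against validity only
        if all(k in parsed for k in required_keys):
            valid += 1
        for key, value in ref.items():
            field_total += 1
            if parsed.get(key) == value:
                field_hits += 1
    return {
        "valid_json_rate": valid / len(outputs),
        "field_accuracy": field_hits / field_total if field_total else 0.0,
    }
```

Checks like these complement, rather than replace, the judgment-based evaluations the post discusses: they catch format regressions cheaply after every fine-tuning run.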
Prompt Tuning With PEFT. - Hugging Face Open-Source AI Cookbook