Multimodal RAG

16 bookmarks

Newest

Memory-enhanced Retrieval Augmentation for Long Video Understanding

Paper #video-rag

·arxiv.org·Apr 29, 2026

Memory-enhanced Retrieval Augmentation for Long Video Understanding

Video-RAG: Training-Free Retrieval for Long-Video LVLMs

Learn how Video-RAG boosts training-free and low-compute long-video understanding by pairing OCR, ASR, and open-vocabulary detection with any long-video LVLMs.

Guide #video-rag

·learnopencv.com·Apr 29, 2026

Video-RAG: Training-Free Retrieval for Long-Video LVLMs

Nexa AI

18 likes, 2 comments. "How to build local Multimodal RAG with Qwen3-VL | by NEXA Community member"

Guide

·youtube.com·Apr 29, 2026

Nexa AI

Video Understanding - Qwen3-VL

Video OCR, long video understanding, and video grounding

Guide #video-rag

·mintlify.com·Apr 29, 2026

Video Understanding - Qwen3-VL

Video-RAG: Training-Free Retrieval for Long-Video LVLMs

Learn how Video-RAG boosts training-free and low-compute long-video understanding by pairing OCR, ASR, and open-vocabulary detection with any long-video LVLMs.

Guide #video-rag

·learnopencv.com·Apr 29, 2026

Video-RAG: Training-Free Retrieval for Long-Video LVLMs

How to grep video

Notes on context engineering and agent harnesses for video libraries: designing the structured representations to make media legible to LLMs. Since we started joining meetings from our computers, video has become the default way that organizations capture what happens at work. We’re at the point now where recording things

Guide #video-rag

·blog.cloudglue.dev·Apr 24, 2026

How to grep video

Multimodal Embeddings and RAG: A Practical Guide | Weaviate

Multimodal embeddings allow AI systems to search and reason across text, images, audio, and video in their native formats. This blog covers the key intuitions behind how this all works and walks through three practical implementations using Weaviate and Gemini.

Tutorial

·weaviate.io·Apr 1, 2026

Multimodal Embeddings and RAG: A Practical Guide | Weaviate

From Text-RAG to Vision-RAG w/ VP Search @ Cohere

Visual RAG expands AI's ability to understand and utilize charts, graphs, and images, a critical skill as 65% of people are visual learners. Mastering this technology allows you to build truly multimodal AI systems that can reason about visual data, giving you a competitive edge in enterprise AI development and opening new possibilities for data-driven applications.

Concept

·maven.com·May 17, 2025

From Text-RAG to Vision-RAG w/ VP Search @ Cohere

How to pass multimodal data to models | 🦜️🔗 LangChain

Here we demonstrate how to pass multimodal input directly to models.

Tutorial

·python.langchain.com·May 5, 2025

How to pass multimodal data to models | 🦜️🔗 LangChain

recipes/weaviate-features/multi-vector/multi-vector-colipali-rag.ipynb at main · weaviate/recipes

This repository shares end-to-end notebooks on how to use various Weaviate features and integrations! - weaviate/recipes

Example #Embedding #vector-similarity

·github.com·Apr 23, 2025

recipes/weaviate-features/multi-vector/multi-vector-colipali-rag.ipynb at main · weaviate/recipes

Cohere Embed v4 - Getting started - PDF Search.ipynb - Colab

Colab notebook

Example #embedding #cohere #semantic-search #vector-similarity

·colab.research.google.com·Apr 20, 2025

Cohere Embed v4 - Getting started - PDF Search.ipynb - Colab

Multi-vector embeddings (ColBERT, ColPali, etc.) | Weaviate

Learn how to use multi-vector embeddings in Weaviate.

Concept #embedding #colpali #semantic-search #vector-similarity

·weaviate.io·Apr 19, 2025

Multi-vector embeddings (ColBERT, ColPali, etc.) | Weaviate

cohere-developer-experience/notebooks at main · cohere-ai/cohere-developer-experience

Docs, Snippets, Guides. Contribute to cohere-ai/cohere-developer-experience development by creating an account on GitHub.

Example #cohere #embedding

·github.com·Apr 17, 2025

cohere-developer-experience/notebooks at main · cohere-ai/cohere-developer-experience

An Overview of Late Interaction Retrieval Models: ColBERT, ColPali, and ColQwen

Late interaction allow for semantically rich interactions that enable a precise retrieval process across different modalities of unstructured data, including text and images.

In this context, “interaction” refers to the process of assessing how well a document matches a given search query by comparing their representations.

A dense retrieval model is a model that uses some type of neural network architecture to retrieve relevant documents for a search query.

Traditional methods for retrieval commonly use “no-interaction” retrieval models. In this case, the search query and documents are processed separately

Advantages of no-interaction retrieval models are primarily that they are fast and computationally efficient

These characteristics make full interaction models great for second-stage retrieval, like reranking a curated set of candidate documents

extremely computationally expensive

contextually rich

scalable and contextually rich

storage requirements - they require an embedding for each token, which requires a lot more storage for a complete set of vectors

Disadvantages of no-interaction retrieval models lie in the lack of interaction between the search query and the documents.

multimodal late interaction retrieval models

vision language models (VLMs) instead of text-only models

Concept

·weaviate.io·Apr 10, 2025

An Overview of Late Interaction Retrieval Models: ColBERT, ColPali, and ColQwen

Ok, I’ll bite: What’s ColPali?

(And why should anyone working with RAG over PDFs care?) ColPali makes information retrieval from complex document types - like PDFs - easier. Information retrieval from PDFs is hard because they contain various components: Text, images, tables,… — Leonie (@helloiamleonie)

Concept #colpali

·x.com·Nov 2, 2024

Ok, I’ll bite: What’s ColPali?

ColPali is changing the game for PDF retrieval by eliminating the need for OCR and chunking methods 🚀

Inspired by ColBERT’s success with text, ColPali splits an image of a document into patches, which are then processed through a vision LLM called PaliGemma. The embeddings for… — Victoria Slocum (@victorialslocum)

Concept #embedding #colpali

·x.com·Nov 2, 2024

ColPali is changing the game for PDF retrieval by eliminating the need for OCR and chunking methods 🚀