Memory-enhanced Retrieval Augmentation for Long Video Understanding
Video-RAG: Training-Free Retrieval for Long-Video LVLMs
Learn how Video-RAG boosts training-free and low-compute long-video understanding by pairing OCR, ASR, and open-vocabulary detection with any long-video LVLMs.
Nexa AI
"How to build local Multimodal RAG with Qwen3-VL | by NEXA Community member"
Video Understanding - Qwen3-VL
Video OCR, long video understanding, and video grounding
How to grep video
Notes on context engineering and agent harnesses for video libraries: designing the structured representations to make media legible to LLMs.
Since we started joining meetings from our computers, video has become the default way that organizations capture what happens at work. We’re at the point now where recording things
Multimodal Embeddings and RAG: A Practical Guide | Weaviate
Multimodal embeddings allow AI systems to search and reason across text, images, audio, and video in their native formats. This blog covers the key intuitions behind how this all works and walks through three practical implementations using Weaviate and Gemini.
From Text-RAG to Vision-RAG w/ VP Search @ Cohere
Visual RAG expands AI's ability to understand and utilize charts, graphs, and images, a critical skill as 65% of people are visual learners. Mastering this technology allows you to build truly multimodal AI systems that can reason about visual data, giving you a competitive edge in enterprise AI development and opening new possibilities for data-driven applications.
How to pass multimodal data to models | 🦜️🔗 LangChain
Here we demonstrate how to pass multimodal input directly to models.
recipes/weaviate-features/multi-vector/multi-vector-colipali-rag.ipynb at main · weaviate/recipes
This repository shares end-to-end notebooks on how to use various Weaviate features and integrations! - weaviate/recipes
Cohere Embed v4 - Getting started - PDF Search.ipynb - Colab
Colab notebook
Multi-vector embeddings (ColBERT, ColPali, etc.) | Weaviate
Learn how to use multi-vector embeddings in Weaviate.
cohere-developer-experience/notebooks at main · cohere-ai/cohere-developer-experience
Docs, Snippets, Guides. Contribute to cohere-ai/cohere-developer-experience development by creating an account on GitHub.
An Overview of Late Interaction Retrieval Models: ColBERT, ColPali, and ColQwen
Late interaction allows for semantically rich interactions that enable a precise retrieval process across different modalities of unstructured data, including text and images.
In this context, “interaction” refers to the process of assessing how well a document matches a given search query by comparing their representations.
A dense retrieval model is one that uses a neural network architecture to retrieve relevant documents for a search query.
Traditional retrieval commonly uses “no-interaction” models: the search query and the documents are processed separately, with no interaction between them.
Advantages of no-interaction retrieval models: they are fast and computationally efficient.
Disadvantage of no-interaction retrieval models: the lack of interaction between the search query and the documents limits retrieval quality.
Full interaction models are contextually rich but extremely computationally expensive. These characteristics make them great for second-stage retrieval, like reranking a curated set of candidate documents.
Late interaction models are both scalable and contextually rich. Their main disadvantage is storage: they require an embedding for each token, which requires a lot more storage for a complete set of vectors.
Multimodal late interaction retrieval models use vision language models (VLMs) instead of text-only models.
Ok, I’ll bite: What’s ColPali?
(And why should anyone working with RAG over PDFs care?)
ColPali makes information retrieval from complex document types - like PDFs - easier.
Information retrieval from PDFs is hard because they contain various components:
Text, images, tables,…
— Leonie (@helloiamleonie)
ColPali is changing the game for PDF retrieval by eliminating the need for OCR and chunking methods 🚀
Inspired by ColBERT’s success with text, ColPali splits an image of a document into patches, which are then processed through a vision LLM called PaliGemma. The embeddings for…
— Victoria Slocum (@victorialslocum)
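The retrieval step the tweet describes can be sketched end-to-end. In real ColPali the per-patch embeddings come from PaliGemma; here random arrays stand in for them, and `rank_pages` is a hypothetical helper, not a library function. The sketch assumes L2-normalized embeddings:

```python
import numpy as np

def l2n(x: np.ndarray) -> np.ndarray:
    # L2-normalize each row so dot products are cosine similarities
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def maxsim(q: np.ndarray, d: np.ndarray) -> float:
    # Sum, over query tokens, of the best-matching page-patch similarity
    return float((q @ d.T).max(axis=1).sum())

def rank_pages(query_emb, page_embs, top_k=2):
    """Rank indexed pages against one query by late-interaction score.

    query_emb: (q_tokens, dim); page_embs: list of (patches_i, dim) arrays,
    one per page (pages may have different patch counts).
    """
    scores = [maxsim(query_emb, p) for p in page_embs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:top_k]]

# Stand-in embeddings: 3 "pages" of 5 patches each; page 1 gets two extra
# patches injected that exactly match the query tokens, so it should rank first.
rng = np.random.default_rng(0)
dim = 8
query = np.eye(2, dim)                                  # 2 query-token embeddings
pages = [l2n(rng.normal(size=(5, dim))) for _ in range(3)]
pages[1] = np.vstack([np.eye(2, dim), pages[1]])
print(rank_pages(query, pages))                         # page index 1 ranks first
```

The design point this illustrates: each page is stored as a bag of patch vectors rather than one pooled vector, so a query token about a chart or table cell can match the specific patch that contains it.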