GenAI

694 bookmarks
RAG from Scratch
Contribute to labdmitriy/llm-rag development by creating an account on GitHub.
·github.com·
What Is GraphRAG?
GraphRAG is a powerful retrieval mechanism that improves Generative AI applications by taking advantage of the rich context in graph data structures.
·neo4j.com·
An Overview of Late Interaction Retrieval Models: ColBERT, ColPali, and ColQwen
Late interaction allows for semantically rich interactions that enable a precise retrieval process across different modalities of unstructured data, including text and images.
In this context, “interaction” refers to the process of assessing how well a document matches a given search query by comparing their representations.
A dense retrieval model is a model that uses some type of neural network architecture to retrieve relevant documents for a search query.
Traditional methods for retrieval commonly use “no-interaction” retrieval models. In this case, the search query and documents are processed separately.
Advantages of no-interaction retrieval models are primarily that they are fast and computationally efficient.
These characteristics make full interaction models great for second-stage retrieval, like reranking a curated set of candidate documents.
extremely computationally expensive
contextually rich
scalable and contextually rich
Storage requirements: late interaction models need an embedding for each token, so a complete set of vectors takes far more storage than one embedding per document.
Disadvantages of no-interaction retrieval models lie in the lack of interaction between the search query and the documents.
multimodal late interaction retrieval models
vision language models (VLMs) instead of text-only models
·weaviate.io·
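A minimal sketch of the late-interaction scoring idea from the highlights above, assuming the ColBERT-style MaxSim operator: each query token embedding is matched to its most similar document token embedding, and the per-token maxima are summed. The function names and the toy 2-dimensional embeddings are illustrative assumptions, not taken from the article.

```python
from math import sqrt


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens, doc_tokens):
    """Late-interaction score (assumed MaxSim form).

    query_tokens / doc_tokens: one embedding per token, not one per text.
    For every query token, keep only its best-matching document token,
    then sum those best matches.
    """
    q = [normalize(v) for v in query_tokens]
    d = [normalize(v) for v in doc_tokens]
    return sum(max(dot(qv, dv) for dv in d) for qv in q)


# Toy embeddings (hypothetical): two query tokens, two candidate documents.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # has an exact match for each query token
doc_b = [[1.0, 1.0]]                          # only a partial match for both

print(maxsim_score(query, doc_a))  # 2.0
print(maxsim_score(query, doc_b))  # about 1.41
```

Because document token embeddings can be computed and stored ahead of time, only the cheap max-and-sum step runs at query time, which is where the per-token storage cost noted in the highlights comes from.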
The "think" tool: Enabling Claude to stop and think \ Anthropic
A blog post for developers, describing a new method for complex tool-use situations
The primary evaluation metric used in τ-bench is pass^k, which measures the probability that all k independent task trials are successful for a given task, averaged across all tasks. Unlike the pass@k metric that is common for other LLM evaluations (which measures if at least one of k trials succeeds), pass^k evaluates consistency and reliability—critical qualities for customer service applications where consistent adherence to policies is essential.
·anthropic.com·
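The difference between pass^k and pass@k described above can be sketched numerically. This is a simplified illustration that assumes each task has a known per-trial success probability and that trials are independent; the function names and the example probabilities are hypothetical and do not come from τ-bench.

```python
def pass_hat_k(success_probs, k):
    """pass^k: probability that ALL k independent trials succeed, averaged over tasks."""
    return sum(p ** k for p in success_probs) / len(success_probs)


def pass_at_k(success_probs, k):
    """pass@k: probability that AT LEAST ONE of k independent trials succeeds, averaged over tasks."""
    return sum(1 - (1 - p) ** k for p in success_probs) / len(success_probs)


# Hypothetical per-task success probabilities: one reliable task, one coin-flip task.
probs = [1.0, 0.5]

print(pass_hat_k(probs, k=2))  # 0.625
print(pass_at_k(probs, k=2))   # 0.875
```

As k grows, pass^k falls for any task that is not perfectly reliable while pass@k rises, which is why pass^k is the stricter measure of consistency.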
LangChain (@LangChainAI) on X
Understanding multi-agent handoffs: Handoffs are a central concept in multi-agent systems. LangGraph swarm is built on them. But they can be hard to understand. Here, we break down the swarm handoff mechanism. 📽️: https://t.co/YkSCFeg9A8
·x.com·
Open-Source MCP servers | Glama
Enterprise-grade security, privacy, with features like agents, MCP, prompt templates, and more.
·glama.ai·
TICKing All the Boxes: Generated Checklists Improve LLM Evaluation...
Given the widespread adoption and usage of Large Language Models (LLMs), it is crucial to have flexible and interpretable evaluations of their instruction-following ability. Preference judgments between model outputs have become the de facto evaluation standard, despite distilling complex, multi-faceted preferences into a single ranking. Furthermore, as human annotation is slow and costly, LLMs are increasingly used to make these judgments, at the expense of reliability and interpretability. In this work, we propose TICK (Targeted Instruct-evaluation with ChecKlists), a fully automated, interpretable evaluation protocol that structures evaluations with LLM-generated, instruction-specific checklists. We first show that, given an instruction, LLMs can reliably produce high-quality, tailored evaluation checklists that decompose the instruction into a series of YES/NO questions. Each question asks whether a candidate response meets a specific requirement of the instruction. We demonstrate that using TICK leads to a significant increase (46.4% → 52.2%) in the frequency of exact agreements between LLM judgements and human preferences, as compared to having an LLM directly score an output. We then show that STICK (Self-TICK) can be used to improve generation quality across multiple benchmarks via self-refinement and Best-of-N selection. STICK self-refinement on LiveBench reasoning tasks leads to an absolute gain of +7.8%, whilst Best-of-N selection with STICK attains +6.3% absolute improvement on the real-world instruction dataset, WildBench. In light of this, structured, multi-faceted self-improvement is shown to be a promising way to further advance LLM capabilities. Finally, by providing LLM-generated checklists to human evaluators tasked with directly scoring LLM responses to WildBench instructions, we notably increase inter-annotator agreement (0.194 → 0.256).
·arxiv.org·
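A rough sketch of how checklist-based scoring and Best-of-N selection could fit together, based only on the abstract above. The aggregation (fraction of YES answers), the judge signature, and the keyword-matching stand-in for an LLM judge are all assumptions made for illustration; the paper's actual protocol may differ.

```python
from typing import Callable, Sequence

# Hypothetical judge signature: (question, response) -> True for YES, False for NO.
Judge = Callable[[str, str], bool]


def checklist_score(response: str, checklist: Sequence[str], judge: Judge) -> float:
    """Fraction of checklist questions answered YES (an assumed aggregation)."""
    if not checklist:
        raise ValueError("checklist must not be empty")
    return sum(judge(question, response) for question in checklist) / len(checklist)


def best_of_n(candidates: Sequence[str], checklist: Sequence[str], judge: Judge) -> str:
    """Return the candidate response with the highest checklist score."""
    return max(candidates, key=lambda r: checklist_score(r, checklist, judge))


def toy_judge(question: str, response: str) -> bool:
    # Stand-in for an LLM call: here each "question" is just a keyword to look for.
    return question.lower() in response.lower()


checklist = ["haiku", "autumn"]
candidates = ["A poem about spring.", "An autumn haiku."]

print(best_of_n(candidates, checklist, toy_judge))  # An autumn haiku.
```

In a real setup the judge would be an LLM answering each YES/NO question about the candidate response, and the checklist itself would be generated from the instruction.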
Running Dockerized Puppeteer in Claude Desktop
Discover how the Model Context Protocol (MCP) simplifies building AI applications by seamlessly integrating Anthropic Claude with Docker Desktop, enhancing developer productivity and workflow efficiency.
·docker.com·
mcp-use
Model-Agnostic MCP Library for LLMs
·pypi.org·