Report

21 bookmarks
Slidecrafting
How to create slides with Quarto.
·slidecrafting-book.com·
TICKing All the Boxes: Generated Checklists Improve LLM Evaluation...
Given the widespread adoption and usage of Large Language Models (LLMs), it is crucial to have flexible and interpretable evaluations of their instruction-following ability. Preference judgments between model outputs have become the de facto evaluation standard, despite distilling complex, multi-faceted preferences into a single ranking. Furthermore, as human annotation is slow and costly, LLMs are increasingly used to make these judgments, at the expense of reliability and interpretability. In this work, we propose TICK (Targeted Instruct-evaluation with ChecKlists), a fully automated, interpretable evaluation protocol that structures evaluations with LLM-generated, instruction-specific checklists. We first show that, given an instruction, LLMs can reliably produce high-quality, tailored evaluation checklists that decompose the instruction into a series of YES/NO questions. Each question asks whether a candidate response meets a specific requirement of the instruction. We demonstrate that using TICK leads to a significant increase (46.4% $\to$ 52.2%) in the frequency of exact agreements between LLM judgements and human preferences, as compared to having an LLM directly score an output. We then show that STICK (Self-TICK) can be used to improve generation quality across multiple benchmarks via self-refinement and Best-of-N selection. STICK self-refinement on LiveBench reasoning tasks leads to an absolute gain of $+$7.8%, whilst Best-of-N selection with STICK attains $+$6.3% absolute improvement on the real-world instruction dataset, WildBench. In light of this, structured, multi-faceted self-improvement is shown to be a promising way to further advance LLM capabilities. Finally, by providing LLM-generated checklists to human evaluators tasked with directly scoring LLM responses to WildBench instructions, we notably increase inter-annotator agreement (0.194 $\to$ 0.256).
·arxiv.org·
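The checklist protocol described in the abstract can be sketched in a few lines: decompose an instruction into YES/NO questions, apply each to a candidate response, and use the pass rate as the score. This is an illustrative toy, not the paper's code; the checks are simple string predicates standing in for LLM-generated judgments, and the instruction and checklist are hypothetical.

```python
# Toy sketch of checklist-style evaluation in the spirit of TICK:
# each YES/NO question becomes a predicate over the response,
# and the score is the fraction of checks that pass.

def evaluate_with_checklist(response, checklist):
    """Apply each YES/NO check to the response; return (score, per-check results)."""
    results = {question: check(response) for question, check in checklist.items()}
    score = sum(results.values()) / len(results)
    return score, results

# Hypothetical checklist for the instruction:
# "Summarize the paper in under 50 words and mention the benchmark used."
checklist = {
    "Is the summary under 50 words?": lambda r: len(r.split()) < 50,
    "Does it mention the benchmark?": lambda r: "WildBench" in r,
}

score, results = evaluate_with_checklist(
    "TICK structures evaluation with generated checklists and is tested on WildBench.",
    checklist,
)
print(score)  # fraction of checks passed
```

In the paper the predicates are themselves LLM calls, which is what makes the protocol interpretable: each failed check points at a specific unmet requirement rather than a single opaque rank.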
tjmlabs/ColiVara: ColiVara is a suite of services that allows you to store, search, and retrieve documents based on their visual embeddings. ColiVara has state-of-the-art retrieval performance on both text and visual documents, using vision models instead of chunking and text processing. No OCR, no text extraction, no broken tables, no missing images.
·github.com·
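The core idea behind visual-embedding retrieval can be shown in miniature: each page is stored as a vector produced by a vision model, and queries are matched by similarity rather than over OCR'd text. This sketch is not ColiVara's API; the page names and embeddings are made up, and a real system would obtain the vectors from a vision model rather than hard-coding them.

```python
import math

# Illustrative sketch of retrieval over visual document embeddings:
# rank stored page vectors by cosine similarity to a query vector.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, pages, top_k=2):
    """Return the names of the top_k pages most similar to the query embedding."""
    ranked = sorted(pages.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Hypothetical precomputed page embeddings (stand-ins for vision-model outputs).
pages = {
    "invoice_p1": [0.9, 0.1, 0.0],
    "chart_p2": [0.1, 0.9, 0.2],
    "table_p3": [0.0, 0.2, 0.9],
}

print(retrieve([0.8, 0.2, 0.1], pages, top_k=1))  # → ['invoice_p1']
```

Because the embedding is computed from the rendered page image, tables, figures, and layout survive intact; there is simply no text-extraction step to break them.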
Best Vision Language Models for Document Data Extraction
Compare performance, cost, and accuracy of leading Vision Language Models including GPT-4V, Claude 3.5, and open-source alternatives. Real-world testing on document processing tasks.
·nanonets.com·
GitHub - bytedance/pasa: PaSa -- an advanced paper search agent powered by large language models. It can autonomously make a series of decisions, including invoking search tools, reading papers, and selecting relevant references, to ultimately obtain comprehensive and accurate results for complex scholarly queries.
·github.com·
Reducto Document Ingestion API
Reducto is an API that provides high quality data ingestion for large language models (LLMs). It works with any vector database or embedding system. It can parse PDFs, Excel, PowerPoint, and more.
·reducto.ai·
DeepSeekV3, Gemini, Mixtral and many others are all Mixture of Experts (MoEs).
But what exactly are MoEs? 🤔 A Mixture of Experts (MoE) is a machine learning framework that resembles a team of specialists, each adept at handling different aspects of a complex task. It's like… — Akshay 🚀 (@akshay_pachaar)
·x.com·
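The "team of specialists" analogy in the thread maps directly onto top-k gated routing: a gate scores every expert for a token, only the k highest-scoring experts run, and their outputs are mixed by the softmax of the gate scores. The sketch below is a toy, not any named model's code; the experts are scalar functions standing in for feed-forward blocks, and the gate scores are hard-coded rather than learned.

```python
import math

# Toy sketch of Mixture-of-Experts top-k routing.

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(token, experts, gate_scores, k=2):
    """Route `token` to the top-k experts by gate score and mix their outputs."""
    # Select the k highest-scoring experts; the rest do no work at all,
    # which is why MoEs can be large without proportional compute cost.
    top = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:k]
    weights = softmax([gate_scores[i] for i in top])
    # Weighted sum of only the selected experts' outputs.
    return sum(w * experts[i](token) for w, i in zip(weights, top))

# Hypothetical experts: simple scalar functions standing in for FFN blocks.
experts = [lambda x: 2 * x, lambda x: x + 1, lambda x: -x, lambda x: x * x]
out = moe_forward(3.0, experts, gate_scores=[0.1, 2.0, 0.5, 1.5], k=2)
print(out)
```

With k=2 here, experts 1 and 3 are chosen (scores 2.0 and 1.5), so only two of the four experts ever execute for this token.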
What can VLMs bring to RAG beyond an input modality change?
For the “R”, our DSE drops the document-processing step and improves relevance modeling by preserving content integrity. Now for the “G”, we propose VISA, aiming to take a step towards more verifiable and intuitive V-RAG.… — Xueguang Ma (@xueguang_ma)
·x.com·
Predictions for the Future of RAG - jxnl.co
Explore the future of RAG in report generation, enhancing decision-making and resource allocation for businesses.
·jxnl.co·