Practical Text-to-SQL for Data Analytics
The Problem with Reasoners
From PDFs to AI-ready structured data: a deep dive · Explosion
This blog post presents a new modular workflow for converting PDFs and similar documents to structured data and shows you how to build end-to-end document understanding and information extraction pipelines for industry use cases.
How to Count Tokens - Tokenization With Tiktoken.
Counting tokens is a useful task in natural language processing (NLP) that allows us to measure the length and complexity of a text. The two important use cases for counting the tokens are: controlling the length of the prompt - models have limit …
In context scheming reasoning paper
Struggling to keep up with new RAG variants?
Here’s a cheat sheet of 7 of the most popular RAG architectures.
Which variants did we miss?
— Weaviate • vector database (@weaviate_io)
GraphRAG in Action: From Commercial Contracts to a Dynamic Q&A Agent
A question-based extraction approach
LangChain Neo4j Integration - Neo4j Labs
Awesome guide with templates
A Multi-Agent Framework for Synthetic Data Generation
Presents MAG-V, a multi-agent framework that first generates a dataset of questions that mimic customer queries. It then reverse engineers alternate questions from responses to verify agent trajectories.
Reports that the…
— elvis (@omarsar0)
Agentless is a great example of how a more constrained agent is better than a general agent for specific tasks 💡 - it achieves much higher scores on SWE-Bench Lite for bug-fixing than other agent approaches 🛠️
The whole point is to not let the agent do everything, but to do a…
— Jerry Liu (@jerryjliu0)
Pedro Domingos on X: "Calling an LLM an agent doesn’t suddenly make it more intelligent."
— Pedro Domingos (@pmddomingos)
ZenML - LLMOps Database
List of solutions
nategro (Nathanael)
User profile of Nathanael on Hugging Face
PatentSBERTa: A Deep NLP based Hybrid Model for Patent Distance...
This study provides an efficient approach for using text data to calculate patent-to-patent (p2p) technological similarity, and presents a hybrid framework for leveraging the resulting p2p...
TRIZ Technical Contradiction Extraction Method Based on Patent Semantic Space Mapping | Proceedings of the 2020 11th International Conference on E-business, Management and Economics
A Hierarchical Feature Extraction Model for Multi-Label Mechanical Patent Classification
Various studies have focused on feature extraction methods for automatic patent classification in recent years. However, most of these approaches are based on the knowledge from experts in related domains. Here we propose a hierarchical feature extraction model (HFEM) for multi-label mechanical patent classification, which is able to capture both local features of phrases as well as global and temporal semantics. First, an n-gram feature extractor based on convolutional neural networks (CNNs) is designed to extract salient local lexical-level features. Next, a long dependency feature extraction model based on the bidirectional long–short-term memory (BiLSTM) neural network model is proposed to capture sequential correlations from higher-level sequence representations. Then the HFEM algorithm and its hierarchical feature extraction architecture are detailed. We establish the training, validation and test datasets, containing 72,532, 18,133, and 2,679 mechanical patent documents, respectively, and then check the performance of HFEMs. Finally, we compared the results of the proposed HFEM and three other single neural network models, namely CNN, long–short-term memory (LSTM), and BiLSTM. The experimental results indicate that our proposed HFEM outperforms the other compared models in both precision and recall.
DAIR.AI
Learn important prompt engineering techniques to build use cases with LLMs.
INCOSE Guide to Writing Requirements V4 – Summary Sheet
LLM-based Extraction of Contradictions from Patents
Vector Similarity: Going Beyond Full-Text Search | Qdrant
Discover how vector similarity expands data exploration beyond full-text search. Explore diversity sampling and more for enhanced data discovery!
nategro/contradiction-psb · Hugging Face
Fundamental Research on Detecting Contradictions in Requirements: Taxonomy and Semi-Automated Approach
Requirements documents can contain several thousand individual requirements. They must be error-free to avoid unnecessary complications and costs in the later product development stages. An important part of this is to identify contradictions between two requirements. The first step is therefore to define what contradictions are and in what form they can occur in requirement documents. In this paper the scientific theories regarding contradictions are discussed with regard to their usefulness for the topic. In doing so, the Aristotelian Logic proved to provide the best basis for an application in the Requirements Engineering context. Based on this theory, we have created specific subtypes of contradictions to match them to the requirements engineering field. The identification of these subtypes is done by a formalization of the requirement sentences and a subsequent analysis by means of simple questions. To validate the method, industrial requirement documents were searched for contradictions. For each detected type of contradiction, we present an example of the detection process. Thereby, we show that the method is easy to apply and may also be used by non-specialists. Thus, our method provides a taxonomy as a basis for further research on automated contradiction detection as well as on automated quality analysis of requirements documents.
Finding Contradictions in Text
Check grounding with RAG | Vertex AI Agent Builder | Google Cloud
ContraDoc: Understanding Self-Contradictions in Documents with Large Language Models | AI Research Paper Details
In recent times, large language models (LLMs) have shown impressive performance on various document-level tasks such as document classification, summarization, and question-answering. However, research on understanding their capabilities on the task of self-contradictions in long documents has been very limited. In this work, we introduce ContraDoc, the first human-annotated dataset to study self-contradictions in long documents across multiple domains, varying document lengths, self-contradiction types, and scope. We then analyze the current capabilities of four state-of-the-art open-source and commercially available LLMs: GPT3.5, GPT4, PaLM2, and LLaMAv2 on this dataset. While GPT4 performs the best and can outperform humans on this task, we find that it is still unreliable and struggles with self-contradictions that require more nuance and context. We release the dataset and all the code associated with the experiments (https://github.com/ddhruvkr/CONTRADOC).
LLM-powered data classification for data entities at scale
With the advent of the Large Language Model (LLM), new possibilities dawned for metadata generation and sensitive data identification at Grab. This prompted the inception of our project aimed to integrate LLM classification into our existing data management service. Read to find out how we transformed what used to be a tedious and painstaking process to a highly efficient system and how it has empowered the teams across the organisation.
QueryGPT - Natural Language to SQL using Generative AI | Uber Blog
Discover how QueryGPT revolutionizes SQL query generation at Uber! Learn about the cutting-edge AI that turns natural language prompts into efficient SQL queries, boosting productivity at Uber. Dive into our journey of innovation and transformation.
Creating a LLM-as-a-Judge That Drives Business Results
A step-by-step guide with my learnings from 30+ AI implementations.
astriaai/headshots-starter
How to pass runtime values to tools
Could be used for Bundesflow, to add memory to it.