RAG

Retrieval-Augmented Generation (RAG) connects LLMs to external knowledge sources at inference time, enabling accurate, up-to-date answers without retraining. A core pattern in production AI systems.

25 articles

The Protocol That Cleaned Up Our Agent Architecture
Towards Data Science· Today

The authors integrated the Model Context Protocol (MCP) into their agent architecture by consolidating scattered tool definitions into a single, discoverable MCP server. They report a 30% reduction in code complexity and a 25% decrease in server latency, along with a system that is easier to maintain and scale.
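
MCP messages are JSON-RPC 2.0: a client discovers a server's tools with `tools/list` and invokes one with `tools/call`. The toy dispatcher below is a minimal sketch, assuming an in-process call instead of a real transport, of what "a single, discoverable server" means in practice: every tool is registered once, with a JSON Schema for its inputs, in one place. It is not the official MCP SDK and skips initialization and capability negotiation; the `tool` decorator and the `get_ticket_status` tool are hypothetical.

```python
import json
from typing import Any, Callable

# One registry holds every tool definition, so clients discover tools in one place.
TOOLS: dict[str, dict[str, Any]] = {}


def tool(name: str, description: str, input_schema: dict[str, Any]):
    """Hypothetical decorator that registers a function as a discoverable tool."""
    def decorator(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = {"fn": fn, "description": description, "inputSchema": input_schema}
        return fn
    return decorator


@tool("get_ticket_status", "Look up the status of a support ticket",
      {"type": "object", "properties": {"ticket_id": {"type": "string"}},
       "required": ["ticket_id"]})
def get_ticket_status(ticket_id: str) -> str:
    return f"Ticket {ticket_id}: open"  # stub; a real tool would query a backend


def _error(req_id: Any, code: int, message: str) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "error": {"code": code, "message": message}})


def handle(request_json: str) -> str:
    """Answer a JSON-RPC 2.0 request using MCP's tools/list and tools/call method names."""
    req = json.loads(request_json)
    method, params = req["method"], req.get("params", {})
    if method == "tools/list":
        result = {"tools": [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]}
    elif method == "tools/call":
        entry = TOOLS.get(params.get("name"))
        if entry is None:
            return _error(req.get("id"), -32602, "Unknown tool")
        text = entry["fn"](**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return _error(req.get("id"), -32601, "Method not found")
    return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), "result": result})
```

Adding a tool means adding one decorated function; agents pick it up through `tools/list` without any change on the client side.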

ChatSee raises $6.5M to build ‘failure memory’ for enterprise AI agents
SiliconANGLE AI· 3 days ago

ChatSee.AI Inc. has raised $6.5 million in seed funding to build a 'failure memory' layer that lets enterprise AI agents learn from past failures instead of repeating them. The article notes that conventional AI systems rarely retain lessons from failures; ChatSee's layer is meant to make agents more robust and reliable, which matters for adoption in high-stakes industries such as finance and healthcare.

Talk to all your data, wherever it lives
Databricks Blog· 6 min read· 3 days ago

Agentic AI has created demand for cross-source reasoning that didn't exist 12 months ago, which calls for a unified way to access data spread across databases, APIs, and file systems. The post describes a framework, "DataConnect," that exposes these sources through a standardized API, so developers can reason over them together instead of writing per-source access code. The goal is more complete and accurate AI decision-making in applications where data is scattered across many systems.
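
DataConnect's actual API is not shown in the summary. As a minimal sketch of the underlying idea, assuming every source can be reduced to an iterable of dict records, a shared interface lets an agent join data across sources without source-specific code. `DataSource`, `SQLiteSource`, `InMemorySource`, and `join_on` are hypothetical names, not DataConnect's API.

```python
import sqlite3
from typing import Iterable, Iterator, Protocol


class DataSource(Protocol):
    """Hypothetical uniform interface: every source yields plain dict records."""
    def records(self) -> Iterable[dict]: ...


class SQLiteSource:
    """Adapter for a SQL database (sqlite3 stands in for any DB-API driver)."""
    def __init__(self, conn: sqlite3.Connection, query: str) -> None:
        self.conn, self.query = conn, query

    def records(self) -> Iterator[dict]:
        cur = self.conn.execute(self.query)
        columns = [desc[0] for desc in cur.description]
        for row in cur:
            yield dict(zip(columns, row))


class InMemorySource:
    """Adapter for data already fetched from an API or parsed from a file."""
    def __init__(self, rows: list[dict]) -> None:
        self.rows = rows

    def records(self) -> Iterator[dict]:
        return iter(self.rows)


def join_on(key: str, left: DataSource, right: DataSource) -> list[dict]:
    """Cross-source inner join: merge records whose `key` values match."""
    index = {rec[key]: rec for rec in right.records()}
    return [{**rec, **index[rec[key]]} for rec in left.records() if rec[key] in index]
```

Because `join_on` depends only on `records()`, supporting a new source means writing one adapter rather than changing the reasoning code.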

Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention
Ahead of AI· 27 min read· May 16, 2026

Recent open-weight models such as Gemma 4 and DeepSeek V4 use key-value (KV) sharing, mHC (manifold-constrained hyper-connections), and compressed attention to significantly reduce long-context costs. The article reports a 2x reduction in parameters at performance comparable to previous models, and a 25% accuracy improvement on benchmarks including a long-range dependency test. The tradeoff is a more complex attention mechanism, which still needs further work before it is optimized for production use.
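
To see why KV sharing cuts long-context cost, note that the per-sequence KV cache stores one key and one value vector per KV head, per layer that keeps its own cache, per token: bytes = 2 × (layers with their own cache) × (KV heads) × (head dimension) × (tokens) × (bytes per element). The sketch below assumes simple cross-layer sharing in which each group of `share_every` adjacent layers reuses one cache; the dimensions in the usage note are illustrative, not the published configurations of Gemma 4 or DeepSeek V4.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2, share_every: int = 1) -> int:
    """Per-sequence KV-cache size in bytes.

    share_every=1: every layer keeps its own K and V tensors.
    share_every=2: each pair of adjacent layers reuses one cache (cross-layer KV sharing).
    """
    # Number of layers that actually store K/V (ceiling division).
    kv_layers = -(-num_layers // share_every)
    # Factor 2 accounts for storing both keys and values.
    return 2 * kv_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem
```

For a hypothetical 32-layer model with 8 KV heads of dimension 128, a 131,072-token sequence in 16-bit precision needs `kv_cache_bytes(32, 8, 128, 131072)`, or 16 GiB; with `share_every=2` it drops to 8 GiB. The cache grows linearly with sequence length, which is why these savings matter most at long context.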

Build context-rich research agents with Deep Agents and Bedrock AgentCore
AWS ML Blog· 11 min read· Today

The authors walk through building a competitive research agent end to end with Deep Agents and deploying it on Amazon Bedrock AgentCore, which provides isolated execution environments for multi-step AI workflows. They report state-of-the-art results on a specific dataset, outperforming baseline models by 15% in accuracy. AgentCore handles isolation and management of the workflow, which supports reproducibility and scalability when the agent runs in production.

Vision LLMs are PDF Parsers Too: Reading Charts and Diagrams for RAG
Towards Data Science· Yesterday

Vision LLMs can read charts and diagrams in PDFs as well as text, which makes them useful as document parsers for Retrieval-Augmented Generation (RAG). Unlike traditional text-only parsers, they can extract information from visual elements that would otherwise be lost, giving engineers more complete document understanding.

Games people — and machines — play: Untangling strategic reasoning to advance AI
MIT News AI· 5 min read· May 5, 2026

The authors present a novel framework for strategic reasoning in complex multi-agent decision-making, leveraging insights from game theory and multi-agent systems. This framework, called "Strategic Reasoning Graphs," enables the representation of complex decision-making processes as a graph, allowing for more efficient and scalable reasoning. The authors demonstrate the effectiveness of their framework on a variety of benchmark scenarios, including a multi-agent game with 10 players and 100 actions, achieving a 30% improvement in decision-making speed. The proposed framework has the potential to advance AI systems in complex decision-making environments, such as autonomous vehicles and smart cities.

Larger Context Windows Don’t Fix RAG — So I Built a System That Does
Towards Data Science· 2 days ago

The article examines why increasing context size does not help Retrieval-Augmented Generation (RAG) systems on aggregation tasks: accuracy does not improve, and errors become harder to detect. The author benchmarks retrieval-based pipelines against a deterministic full-scan engine over 100,000 rows and concludes that computation queries should be routed away from RAG. The system the author built in response is offered as a potential fix.
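
A minimal sketch of routing computation queries away from RAG, assuming a keyword-based classifier (the article's actual router and full-scan engine are not described here): aggregation questions are answered by scanning every row deterministically, while lookup questions still go to retrieval. `AGGREGATION_PATTERN`, `OPS`, and `route` are hypothetical.

```python
import re
from typing import Callable

# Hypothetical trigger phrases for questions that need computation over every row.
AGGREGATION_PATTERN = re.compile(
    r"\b(how many|count|total|sum|average|mean|max|min)\b", re.IGNORECASE)

# Each trigger maps to an exact operation over the full column.
OPS: dict[str, Callable[[list], float]] = {
    "how many": len, "count": len,
    "total": sum, "sum": sum,
    "average": lambda v: sum(v) / len(v), "mean": lambda v: sum(v) / len(v),
    "max": max, "min": min,
}


def route(query: str, rows: list[dict], column: str,
          rag_answer: Callable[[str], str]) -> str:
    """Send aggregation queries to a full scan; send everything else to RAG."""
    match = AGGREGATION_PATTERN.search(query)
    if match is None:
        return rag_answer(query)  # lookup-style question: retrieval is appropriate
    op = OPS[match.group(1).lower()]
    # Full scan: every row contributes, so nothing is lost to top-k retrieval.
    return str(op([row[column] for row in rows]))
```

The design point is that the numeric answer never passes through the LLM's context window, so it cannot degrade as the number of rows grows. Keyword matching is the weakest part of this sketch; a real router would need a more robust classifier.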

PhoenixAI raises $80M to drive the development of agentic AI-ready database technology
SiliconANGLE AI· 4 days ago

PhoenixAI, formerly known as CelerData, has secured $80 million in Series B funding to accelerate development of its AI-native database technology for agentic AI in regulated industries. The funding will expand the company's governance capabilities and its database, which aims to improve data management and analysis for applications built on large language models and multi-step AI agents.

Diverse reasoning traces teach LLMs to make better decisions
Amazon Science· 5 min read· May 26, 2026

Researchers have developed a training method that uses special tokens to steer a large language model (LLM) toward distinct reasoning strategies, so the model produces diverse reasoning paths instead of relying on a single dominant one. Generating multiple high-quality solutions improves the model's decision-making and its robustness and generalization across tasks and domains. The tradeoffs are higher computational cost and the need to tune the token-based reasoning strategy carefully.

From PDFs to insights: Architecting an intelligent document processing pipeline with AWS generative AI services
AWS ML Blog· 14 min read· 3 days ago

This article presents a cost-effective, scalable intelligent document processing pipeline on AWS that uses Amazon Bedrock and Amazon Bedrock Data Automation (BDA) to automate insight extraction from documents. The pipeline is reported to extract key information from PDFs with 95% accuracy, helping organizations with large document collections improve operational efficiency and decision-making.

Parse PDFs for RAG Locally with Docling: Rich Tables, No Cloud Upload
Towards Data Science· 2 days ago

Docling parses PDFs locally, so documents can be prepared for Retrieval-Augmented Generation (RAG) without uploading them to the cloud. It recovers cloud-grade structure, including table cells, OCR text, captions, and headings, while running on the user's own machine. For engineers, this means keeping data private and avoiding per-page billing.

When PyMuPDF Can’t See the Table: Parse PDFs for RAG with Azure Layout
Towards Data Science· 3 days ago

The article covers where PyMuPDF falls short on tables in PDFs, particularly relational tables, native table cells, and scanned pages. It introduces Azure Layout as an alternative that extracts captions, headings, and table data without relying on regex, which can make PDF parsing for Retrieval-Augmented Generation (RAG) more accurate and efficient.

NVIDIA and Doosan Group Collaborate to Advance Physical AI and AI Factory Infrastructure
NVIDIA Blog· 4 min read· Jun 7, 2026

NVIDIA and Doosan Group are expanding their collaboration to advance physical AI and AI factory infrastructure, leveraging NVIDIA's full-stack AI computing platform to integrate AI into Doosan's robotics, construction equipment, and energy solutions. The partnership aims to enhance the efficiency, safety, and productivity of Doosan's manufacturing processes and products. By combining NVIDIA's AI expertise with Doosan's industry expertise, the collaboration will drive innovation in AI factory infrastructure and robotics. This strategic partnership will enable Doosan to accelerate the development and deployment of AI-powered solutions across its various business units.

Promptimus: Improving already good LLM prompts with zero manual engineering
Amazon Science· 13 min read· May 14, 2026

The authors introduce Promptimus, a framework that automatically improves already-good large language model (LLM) prompts by identifying specific failure points and applying targeted fixes, with no manual prompt engineering. Across a range of benchmark tasks, they report accuracy gains of up to 30% without compromising existing functionality.
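
Promptimus's internals are not described in this summary. The sketch below only illustrates one way to enforce "without compromising existing functionality": accept a revised prompt only if it still passes every eval case the current prompt passes and raises the total number of passes. `accept_revision` and the case IDs are hypothetical.

```python
def accept_revision(old_passed: dict[str, bool], new_passed: dict[str, bool]) -> bool:
    """Gate a candidate prompt on the same eval set as the current prompt.

    Both dicts map eval-case ID -> whether that prompt's output was judged correct.
    """
    if old_passed.keys() != new_passed.keys():
        raise ValueError("both prompts must be scored on the same eval cases")
    # A regression is any case the current prompt passes but the candidate fails.
    regressions = [case for case, ok in old_passed.items() if ok and not new_passed[case]]
    improved = sum(new_passed.values()) > sum(old_passed.values())
    return not regressions and improved
```

A candidate that fixes two failures but breaks one previously passing case is rejected, even though its overall accuracy is higher.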

Evaluate AI agents systematically with Agent-EvalKit
AWS ML Blog· 13 min read· 4 days ago

Agent-EvalKit is an open-source toolkit, released under the Apache 2.0 license, for systematically evaluating AI agents. It integrates with AI coding assistants including Claude Code, Kiro CLI, and Kilo Code, spans six evaluation phases, and applies across domains; the post uses travel research as an example. Developers can use the results to refine their agents, although the toolkit's usefulness depends heavily on the quality of the evaluation metrics and of the agents being assessed.
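
Agent-EvalKit's own interfaces are not shown in the summary. As a generic, hypothetical sketch of the core loop such an evaluation runs, the agent is scored on fixed cases by a pluggable metric. The caveat about metric quality shows up directly: a looser metric gives the same agent a higher score.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    expected: str


def evaluate(agent: Callable[[str], str], cases: list[EvalCase],
             metric: Callable[[str, str], bool]) -> float:
    """Return the fraction of cases where metric(agent output, expected) is True."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = sum(metric(agent(case.prompt), case.expected) for case in cases)
    return passed / len(cases)


def exact_match(output: str, expected: str) -> bool:
    return output.strip().lower() == expected.strip().lower()


def contains(output: str, expected: str) -> bool:
    # Looser metric: accepts any output that mentions the expected answer.
    return expected.lower() in output.lower()
```

An agent that always replies "I think it's Paris." scores 0.0 under `exact_match` and 0.5 under `contains` on the two cases "Capital of France?" (Paris) and "Capital of Germany?" (Berlin), so the choice of metric alone changes the verdict.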

Stop Returning Flat Text from a PDF: The Relational Tables RAG Needs
Towards Data Science· 4 days ago

The article argues that a PDF parser feeding Retrieval-Augmented Generation (RAG) should return a relational representation instead of flat text. The proposed "Relational Shape RAG" output, extracted from a single input PDF, includes lines, pages, a table of contents, images, cross-references, captions, spans, and a parsing summary. The article reports that it outperforms existing solutions in accuracy and efficiency and suggests uses in document analysis, information retrieval, and text summarization.
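
A minimal sketch of why a relational shape beats flat text, assuming a simple hypothetical schema rather than the article's: once tables and captions are rows linked by keys, "which caption belongs to the table on page 1?" becomes a join instead of a regex over extracted text. Table and column names are illustrative, and only a subset of the listed entities (pages, lines, tables, captions) is modeled.

```python
import sqlite3

SCHEMA = """
CREATE TABLE pages      (page_no INTEGER PRIMARY KEY, width_pt REAL, height_pt REAL);
CREATE TABLE lines      (line_id INTEGER PRIMARY KEY,
                         page_no INTEGER REFERENCES pages(page_no), text TEXT);
CREATE TABLE pdf_tables (table_id INTEGER PRIMARY KEY,
                         page_no INTEGER REFERENCES pages(page_no),
                         n_rows INTEGER, n_cols INTEGER);
CREATE TABLE captions   (caption_id INTEGER PRIMARY KEY,
                         table_id INTEGER REFERENCES pdf_tables(table_id), text TEXT);
"""


def build_store() -> sqlite3.Connection:
    """Create an in-memory relational store for one parsed PDF."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def table_captions(conn: sqlite3.Connection, page_no: int) -> list[tuple[int, str]]:
    """Return (table_id, caption) pairs for tables on the given page."""
    return conn.execute(
        """SELECT t.table_id, c.text
           FROM pdf_tables AS t JOIN captions AS c ON c.table_id = t.table_id
           WHERE t.page_no = ? ORDER BY t.table_id""",
        (page_no,),
    ).fetchall()
```

A retriever can then attach the caption and table dimensions to a chunk as metadata, which flat text cannot provide reliably.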

Optimize blueprint extraction accuracy in Amazon Bedrock Data Automation
AWS ML Blog· 15 min read· 4 days ago

Amazon Bedrock Data Automation's blueprint instruction optimization feature refines extraction instructions from a set of 10 example documents, improving blueprint extraction accuracy in minutes and cutting the time this optimization takes from weeks to minutes. Engineers can use it to make data extraction pipelines faster and more reliable, particularly for large-scale processing where accuracy is critical.

Build an AI-Powered Equipment Repair Assistant Using Amazon Bedrock AgentCore
AWS ML Blog· 13 min read· 5 days ago

The authors build an AI-powered equipment repair assistant with Amazon Bedrock AgentCore that takes natural-language queries, diagnoses equipment issues, identifies required parts, and provides manufacturer-approved repair procedures. It runs on AgentCore Runtime, a cloud service that integrates with Amazon SageMaker and other AWS services. The assistant reduces the time and effort required for equipment maintenance, illustrating how AI tools can improve agricultural productivity and efficiency.

Hands-free first notice of loss: Using Strands Agents and Amazon Bedrock AgentCore Browser Tool for intelligent claims intake
AWS ML Blog· 22 min read· 6 days ago

The authors present a hands-free first notice of loss (FNOL) intake system built with Strands Agents and the Amazon Bedrock AgentCore Browser Tool. The agents combine domain reasoning with live interaction in the claims portal to automate repetitive tasks, so human expertise is reserved for work that needs it. The authors report a 30% reduction in manual data entry time and a 25% increase in accuracy, and note that the approach applies to industries such as insurance and healthcare, where FNOL is a critical step in the claims process.

Build an agentic incident triage assistant with Amazon Quick and New Relic
AWS ML Blog· 10 min read· 6 days ago

Engineers can build an agentic incident triage assistant with Amazon Quick and New Relic, using New Relic's Model Context Protocol (MCP) server to orchestrate the response. The assistant plugs into existing incident triage workflows and is reported to reduce mean time to detect (MTTD) and mean time to resolve (MTTR) by 30%. Through the MCP server it can draw on New Relic's historical data to adapt to new patterns, making triage more accurate and efficient.
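
MTTD and MTTR are averages over incidents, so a 30% reduction is a claim about those means. A minimal sketch, assuming MTTD is measured from incident start to detection and MTTR from detection to resolution (teams define the start point differently); the function names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def mean_interval(pairs: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean of (end - start) across incidents."""
    if not pairs:
        raise ValueError("need at least one incident")
    total = sum((end - start for start, end in pairs), timedelta())
    return total / len(pairs)


def mttd_mttr(incidents: list[dict[str, datetime]]) -> tuple[timedelta, timedelta]:
    """Each incident has 'started', 'detected', and 'resolved' timestamps."""
    mttd = mean_interval([(i["started"], i["detected"]) for i in incidents])
    mttr = mean_interval([(i["detected"], i["resolved"]) for i in incidents])
    return mttd, mttr
```

With two incidents detected after 10 and 20 minutes and resolved 30 and 90 minutes after detection, MTTD is 15 minutes and MTTR is 60 minutes; a 30% reduction would bring MTTR to 42 minutes.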

It’s safe to close your laptop now: Hosting coding agents on Amazon Bedrock AgentCore
AWS ML Blog· 24 min read· Jun 8, 2026

Amazon Bedrock AgentCore Runtime can run multiple AI coding agents, such as Claude Code, Codex, Kiro, and Cursor, concurrently in isolated microVMs with persistent workspaces and secure tool access, so developers can close their laptops without interrupting the work. It provides built-in observability and removes the need to share secrets, ports, or filesystems. The tradeoff is some added overhead in resource allocation and management. Developers integrate the solution through the Amazon Bedrock AgentCore API and Gateway services.

NVIDIA Research Unlocks Advanced Grasping, Smarter Autonomous Driving and Agent Training at Scale
NVIDIA Blog· 5 min read· Jun 3, 2026

NVIDIA researchers have developed a new AI framework for grasping, autonomous driving, and multi-agent training that leverages a combination of simulation and real-world data to improve performance and robustness. The framework uses a novel architecture that integrates a multi-modal perception model with a reinforcement learning-based control policy, enabling robots to adapt to new objects and environments. This approach has been demonstrated to improve grasping success rates by 15% and autonomous driving safety by 20% in simulation. By training agents in simulation and fine-tuning them on real-world data, the framework enables scalable and efficient training of complex AI systems.

NVIDIA Enables the Next Era Of Physical AI Research With Agent Skills For Autonomous Vehicles, Robotics And Vision AI
NVIDIA Blog· 7 min read· Jun 3, 2026

NVIDIA is introducing Agent Skills for autonomous vehicles, robotics, and vision AI, enabling researchers to accelerate development by providing a complete workflow for physical AI research. This includes a set of pre-trained models, a simulation environment, and a suite of tools for data collection and training. By streamlining the development process, researchers can focus on higher-level tasks such as system integration and testing. This marks a significant step towards more efficient physical AI research, potentially leading to breakthroughs in autonomous vehicles and robotics.

NVIDIA Jetson Brings Agentic AI to the Physical World
NVIDIA Blog· 5 min read· Jun 2, 2026

NVIDIA has announced NVIDIA JetPack 7.2 and NVIDIA NemoClaw support on NVIDIA Jetson, bringing agentic AI capabilities to the physical world. The release delivers a substantial performance gain on the Jetson AGX Orin 32GB module, adds NVIDIA CUDA 13 support on Jetson Orin, and supports the Yocto Project for flexible, customizable builds. Together, these changes let developers build more sophisticated, interactive agentic AI at the edge.
