Amazon

19 curated articles on Amazon for AI engineers

Build interactive PDF text extraction from Amazon S3
AWS ML Blog· 15 min read· Yesterday

This article presents a solution for building an interactive PDF text extraction server from Amazon S3, providing real-time access to text inside PDFs without batch pipelines or heavy infrastructure. The solution utilizes a Model Context Protocol (MCP) server approach, which sits between custom scripts and batch pipelines, offering interactive access with minimal setup. This approach is suitable for text-based PDFs in development and proof of concept settings, whereas Amazon Textract is recommended for complex document processing. The practical implication for engineers building AI systems is that they can leverage this solution to provide on-demand access to text inside PDFs, enhancing the efficiency of compliance, legal, financial services, and executive teams.

The fuel of the future is already here: Why TRISO matters
Amazon Science· 5 min read· 3 days ago

Amazon is investing in next-generation nuclear technology, specifically tristructural isotropic (TRISO) fuel particles, to meet the rising energy demands of AI infrastructure and cloud computing. TRISO particles have a ceramic shell with three layers, providing exceptional mechanical integrity and thermal resilience, with a failure fraction of ≤ 6.6 × 10⁻⁵ at 1600°C. This technology offers greater flexibility in fuel form and reactor design, enabling new operational modes and potentially reducing waste. The practical implication for engineers building AI systems is the potential for more efficient and sustainable energy sources to power their infrastructure.
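
As a rough worked illustration of what a failure fraction of that size means, the sketch below multiplies the quoted upper bound by a particle count. The particle count is a hypothetical figure chosen to keep the arithmetic readable; it does not come from the article, and the function name is likewise an assumption.

```python
# Upper bound on the TRISO particle failure fraction quoted in the summary (at 1600°C).
FAILURE_FRACTION_BOUND = 6.6e-5


def max_expected_failures(particle_count: int,
                          failure_fraction: float = FAILURE_FRACTION_BOUND) -> float:
    """Return the expected number of failed particles at the given failure fraction."""
    if particle_count < 0:
        raise ValueError("particle_count must be non-negative")
    return particle_count * failure_fraction


# Hypothetical batch of one million particles (an assumption, not a source figure).
hypothetical_particles = 1_000_000
print(max_expected_failures(hypothetical_particles))
```

Under that assumption, the bound works out to roughly 66 failed particles per million.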

Graviton5’s improved design increases speed and energy efficiency — beyond Moore’s law
Amazon Science· 5 min read· Jun 10, 2026

Graviton5 delivers a 25% performance improvement for general-purpose and agentic AI workloads through a chiplet architecture, custom die-to-die connectivity, and support for DDR5-8800 memory and PCIe Gen6 interconnects, a gain the article describes as going beyond Moore's law. The design makes AI workloads both faster and more energy efficient, which matters most for large-scale deployments where each percentage point of performance affects overall system efficiency. Gains of this kind could help accelerate AI adoption across industries.

Production-grade AI agents for financial compliance: Lessons from Stripe
AWS ML Blog· 16 min read· Yesterday

Stripe built a production-grade AI agent system on AWS using Amazon Bedrock, reducing review handling time by 26% while maintaining human oversight and achieving helpfulness ratings above 96%. The system, based on Stripe's ReAct agent framework, uses task decomposition, orchestration patterns, and cost optimization through prompt caching to scale compliance operations. Set against a $206 billion global compliance burden, the system identifies 95% of card-testing attacks in real time and reduces unnecessary customer friction by 20%. The practical implication for engineers building AI systems is the importance of designing agentic systems that balance automation with human oversight and accountability.

Optimize model training on Amazon SageMaker AI with NVIDIA Blackwell
AWS ML Blog· 13 min read· 2 days ago

NVIDIA Blackwell GPUs on Amazon SageMaker AI ease two common constraints in large-model training: batch sizes capped by GPU memory and sequence lengths cut short to avoid out-of-memory errors. With Blackwell's expanded memory and new precision formats, users can train with larger batch sizes, longer sequence lengths, and less model sharding, which improves throughput and reduces communication overhead. PyTorch Fully Sharded Data Parallel (FSDP) and strategic use of activation checkpointing can tune training configurations further. The result is faster iteration cycles, less networking overhead, and lower infrastructure costs. Properly configured Blackwell training jobs can process larger batches without aggressive sharding and achieve better results for long-range dependencies.

Amazon Research Awards recipients announced
Amazon Science· 6 min read· May 27, 2026

The Amazon Research Awards (ARA) recipients have been announced, spanning 49 universities across 11 countries. Recipients receive access to Amazon public datasets, AWS AI/ML services, and tools. The program pairs academic researchers with Amazon's resources to accelerate AI/ML research and to encourage knowledge sharing between academia and industry.

Implementing super resolution by deploying SeedVR2 on Amazon SageMaker AI
AWS ML Blog· 11 min read· 2 days ago

SeedVR2, an open-source video restoration model developed by ByteDance's Seed team, can be deployed on Amazon SageMaker AI to upscale lower-resolution video to higher resolutions. The model analyzes visual information frame by frame to restore detail and improve video quality. SageMaker's managed infrastructure lets users process video collections at scale while maintaining cost efficiency and performance. The solution uses a three-tier AWS architecture defined with AWS Cloud Development Kit (AWS CDK) as infrastructure as code. The practical implication for engineers building AI systems is the ability to implement video upscaling with SeedVR2 on SageMaker AI, enabling the restoration of historical footage and the enhancement of subscriber experiences.

Build self-service AWS Health analytics to find actionable health insights with AI agents powered by Amazon Bedrock
AWS ML Blog· 23 min read· 2 days ago

The Chaplin solution utilizes AI agents powered by Amazon Bedrock and exposed through the Model Context Protocol (MCP) to provide self-service health event analytics for AWS Health notifications. This approach enables teams to ask questions in natural language and receive precise, contextualized answers without relying on AWS Support. With Chaplin, teams can identify actionable health insights, prioritize events, and make informed decisions. The practical implication for engineers building AI systems is that they can leverage Chaplin to streamline health event management and focus on innovation rather than reactive firefighting.

Building agentic AI applications with a modern data mesh strategy on AWS
AWS ML Blog· 22 min read· 2 days ago

Building agentic AI applications on a modern data mesh strategy on AWS requires fine-grained access control enforced at every layer of the data interaction chain. The proposed architecture extends the original with three key changes: replacing Amazon OpenSearch Serverless with Amazon S3 Vectors, replacing general-purpose Amazon S3 with Amazon S3 Tables governed by AWS Lake Formation, and exposing the data mesh as Model Context Protocol (MCP) tools through AgentCore Gateway with AWS Lambda-backed interceptors. This approach provides a secure, scalable data foundation for production agentic AI, reducing vector storage and query costs by up to 90% and increasing transactions per second by up to 10 times. The practical implication for engineers building AI systems is the ability to enforce fine-grained access control and provide a governed data mesh for agentic AI applications.
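
The interceptor idea can be sketched in plain Python as a default-deny check that runs before any tool call reaches the data layer. This is a simplified illustration only: it does not use AgentCore Gateway, Lambda, or Lake Formation APIs, and the roles, tool names, and `POLICY` table are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class ToolCall:
    """A request from an agent to invoke one data-mesh tool on behalf of a caller."""
    caller_role: str
    tool_name: str


# Hypothetical policy: which roles may call which tools (not from the article).
POLICY: Dict[str, FrozenSet[str]] = {
    "analyst": frozenset({"query_sales_table"}),
    "admin": frozenset({"query_sales_table", "search_vectors"}),
}


def is_allowed(call: ToolCall, policy: Dict[str, FrozenSet[str]] = POLICY) -> bool:
    """Default-deny: unknown roles and unlisted tools are rejected."""
    return call.tool_name in policy.get(call.caller_role, frozenset())


def intercept(call: ToolCall) -> ToolCall:
    """Pass the call through unchanged if permitted, otherwise raise."""
    if not is_allowed(call):
        raise PermissionError(
            f"role {call.caller_role!r} may not call {call.tool_name!r}"
        )
    return call


# An analyst may query the table but may not search the vector index.
print(is_allowed(ToolCall("analyst", "query_sales_table")))
print(is_allowed(ToolCall("analyst", "search_vectors")))
```

Keeping the check in one place, ahead of every tool, is what lets the same rule apply regardless of which agent or prompt produced the call.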

Huntington Bank: Redacting sensitive data from 400M+ documents with AWS
AWS ML Blog· 7 min read· 3 days ago

Huntington Bank utilized Amazon Textract, Amazon SageMaker, AWS Step Functions, and AWS Lambda to design a scalable redaction workflow, reducing the timeline for processing 400 million documents from years to months. The solution ensured data encryption at rest and in transit, met strict access requirements, and achieved redaction accuracy of 95% or higher. By leveraging AWS services, Huntington was able to efficiently process large volumes of documents while maintaining compliance with PCI DSS requirements. This approach has significant implications for engineers building AI systems that require large-scale document processing and redaction.

Build a healthcare appointment agent with Amazon Nova 2 Sonic
AWS ML Blog· 13 min read· 3 days ago

This article demonstrates how to build a healthcare appointment agent using Amazon Nova 2 Sonic and Amazon Bedrock AgentCore, achieving 90% accuracy in appointment reminder conversations and reducing manual data entry by 75%. The agent leverages voice authentication, appointment management, and pre-visit health information collection. This solution enables healthcare providers to streamline patient interactions and improve operational efficiency. The tradeoff is a potential increase in upfront development costs due to the need for custom voice models and integration with existing systems.

How Loka Built a Natural, Low-Latency Voice Agent with Amazon Nova 2 Sonic
AWS ML Blog· 11 min read· 3 days ago

Loka built a conversational AI agent with Amazon Nova 2 Sonic, achieving high speech reasoning accuracy and low latency, outperforming traditional voice AI pipelines. The native speech-to-speech model processed audio end-to-end, capturing tone, emotion, and subtle cues, and scored 87.0 on the Big Bench Audio benchmark. This approach solved the common frustration of robotic, slow voice assistants, delivering natural and responsive experiences. The practical implication for engineers building AI systems is that native speech-to-speech models can provide a better solution for voice AI adoption, with lower costs and faster response times.

9 ways AI is reshaping enterprise operations: Key insights from AWS Summit NYC
SiliconANGLE AI· 4 days ago

AWS Summit NYC 2026 highlighted the evolving role of AI in enterprise operations as it shifts from experimentation to practical deployment. Key discussions centered on the use of physical robots and agentic systems to address labor shortages and reshape operations. The coverage cites no specific numbers, model names, or benchmark results. The practical implication for engineers building AI systems is the increasing focus on deployment and real-world applications.

Build a protein research copilot with Amazon Bedrock AgentCore
AWS ML Blog· 15 min read· 4 days ago

This article presents a technical guide on building a protein research copilot using Amazon Bedrock AgentCore, which enables researchers to search for structurally similar peptides across large datasets using natural language queries. The system combines natural language query parsing, vector similarity search over protein embeddings, and AI-generated scientific summaries of search results. The copilot is built using the Strands Agents SDK and deployed to Amazon Bedrock AgentCore for production serving. The practical implication for engineers building AI systems is the ability to create conversational interfaces that can handle complex research workflows and provide accurate results.
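
The vector similarity step can be illustrated with a minimal sketch: rank stored embeddings by cosine similarity to a query embedding and keep the top matches. The three-dimensional vectors and peptide names below are made up for illustration; a real system would use a protein embedding model and a vector store, neither of which is shown here, and this is not the article's implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query: Sequence[float],
                  index: Dict[str, Sequence[float]],
                  k: int = 3) -> List[Tuple[str, float]]:
    """Return the k entries most similar to the query, best match first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy index; real protein embeddings have many more dimensions.
toy_index = {
    "peptide_a": [1.0, 0.0, 0.0],
    "peptide_b": [0.9, 0.1, 0.0],
    "peptide_c": [0.0, 1.0, 0.0],
}
for name, score in top_k_similar([1.0, 0.0, 0.0], toy_index, k=2):
    print(name, round(score, 3))
```

A brute-force scan like this is only practical for small collections; at dataset scale the same ranking is usually delegated to an approximate nearest-neighbor index.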

Embed the world: Multimodal AI for searchable aerial imagery at scale
AWS ML Blog· 25 min read· 5 days ago

The AWS Generative AI Innovation Center (GenAIIC) partnered with Vexcel to develop a multimodal AI system for searchable aerial imagery at scale, leveraging Amazon Bedrock and Amazon OpenSearch Serverless. The system uses multimodal embeddings, large language model (LLM) captioning, and vector search to enable natural-language-searchable knowledge bases. The evaluation methodology, built on OpenStreetMap ground truth, compared embedding models, fusion strategies, captioning, and search methods, with Amazon Nova Multimodal Embeddings delivering the highest F1 scores. This approach removes the per-feature training step, allowing for faster and more efficient semantic search. The practical implication for engineers building AI systems is the potential to apply this architecture to other domains, enabling faster and more efficient search capabilities.
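
For readers unfamiliar with the metric, the sketch below computes F1 as the harmonic mean of precision and recall over a set of retrieved items and a ground-truth set. It uses the standard set-based definition; how the article scores results against OpenStreetMap ground truth is not specified here, and the tile IDs are hypothetical.

```python
from typing import Set


def f1_score(retrieved: Set[str], relevant: Set[str]) -> float:
    """Set-based F1: harmonic mean of precision and recall."""
    true_positives = len(retrieved & relevant)
    if true_positives == 0:
        return 0.0  # also covers an empty retrieved or relevant set
    precision = true_positives / len(retrieved)
    recall = true_positives / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Hypothetical image-tile IDs, not data from the article.
retrieved_tiles = {"tile_1", "tile_2", "tile_3", "tile_4"}
ground_truth_tiles = {"tile_2", "tile_3", "tile_5"}

# Two of four retrieved tiles are correct (precision 0.5) and
# two of three relevant tiles were found (recall 2/3).
print(round(f1_score(retrieved_tiles, ground_truth_tiles), 3))  # 0.571
```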

Running ComfyUI workflows on Amazon SageMaker AI processing jobs
AWS ML Blog· 12 min read· 5 days ago

ComfyUI workflows can be deployed on Amazon SageMaker AI processing jobs to automate content generation at scale, allowing enterprises to generate hundreds of high-quality images in a single batch. This solution utilizes AWS Cloud Development Kit (AWS CDK) for infrastructure setup, GPU-accelerated processing, and automation of image generation. By leveraging ComfyUI and SageMaker, businesses can accelerate campaigns, boost conversions through personalization, and protect brand equity. The practical implication for engineers building AI systems is the ability to scale their creative pipeline and automate repetitive tasks, freeing creative teams to focus on high-impact strategy.

Introducing Web Search on Amazon Bedrock AgentCore
AWS ML Blog· 10 min read· Jun 19, 2026

Amazon Bedrock AgentCore now offers a fully managed web search capability, allowing AI agents to access up-to-date information from the web without infrastructure overhead. This feature, compatible with the Model Context Protocol (MCP), provides a purpose-built web index spanning tens of billions of documents, updated continually to reflect new content within minutes. The privacy model ensures that queries stay within AWS, and retrieval can combine a knowledge graph with semantic snippet extraction. This development has significant implications for engineers building AI systems, as it addresses the limitation of frozen knowledge at training time and enables agents to respond to real-time queries.

Accelerate campaign workflow with insights from Adobe Marketing Agent for Amazon Quick
AWS ML Blog· 14 min read· Jun 19, 2026

The Adobe Marketing Agent for Amazon Quick integration enables marketing teams to access campaign insights within governed conversations in seconds, using natural language to ask questions about campaign performance, audiences, and journeys. The integration is configured using the Model Context Protocol (MCP) and provides capabilities such as campaign review and monitoring, campaign planning, audience insights, journey insights, and journey conflict analysis. The solution applies governance controls, including least privilege, tenant isolation, and audit logging, to ensure secure and compliant data access. This integration has practical implications for engineers building AI systems, as it demonstrates the potential for AI-powered analysis and automation in marketing workflows.

Monitor and debug generative AI inference with SageMaker detailed metrics and Insights dashboard on CloudWatch
AWS ML Blog· 14 min read· Jun 18, 2026

Amazon SageMaker AI now provides detailed inference metrics and a SageMaker Insights dashboard in Amazon CloudWatch for monitoring and debugging generative AI inference endpoints. The dashboard supports both single-model endpoints (SME) and inference component (IC) endpoints and exposes over 100 metrics, including GPU health, token-level latency, and KV cache pressure. This helps machine learning platform engineers, MLOps teams, and site reliability engineers (SREs) keep inference endpoints healthy, responsive, and cost-efficient. The practical implication for engineers building AI systems is easier monitoring and troubleshooting of generative AI inference endpoints, which can reduce downtime and improve overall performance. The dashboard is a fully managed observability solution that removes the need for custom Grafana dashboards and Prometheus configuration.
