AWS ML Blog

Amazon SageMaker AI Async Inference now supports inline request payloads

June 17, 2026 • 6 min read

Level: Intermediate

For: AI Engineers

✦ TL;DR

Amazon SageMaker AI Async Inference now accepts inline request payloads: you can send the inference payload directly in the request body of the InvokeEndpointAsync API instead of uploading input data to Amazon S3 before each invocation. Inline payloads can be up to 128,000 bytes. The new Body parameter is mutually exclusive with the InputLocation parameter, and the API rejects requests that set both. The change is designed to work with existing async endpoints, with no model or container changes expected. For engineers building AI systems, the result is simpler client-side code and a smaller operational surface area for asynchronous inference workloads.

⚡ Key Takeaways

The new Body parameter in the InvokeEndpointAsync API allows for inline payloads up to 128,000 bytes.
The Body and InputLocation parameters are mutually exclusive, and the API rejects requests that set both.
The output behavior remains unchanged, with output written to the S3 OutputLocation.
The feature is available in 31 commercial AWS Regions.
The InvokeEndpointAsync API returns synchronous ValidationError responses for size and mutual-exclusivity violations.
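
The size and mutual-exclusivity rules above can be mirrored in a client-side pre-flight check, so a bad request fails before it reaches the API. The following is a minimal sketch, not an official helper: build_async_request and MAX_INLINE_BYTES are hypothetical names, the default content type is an assumption, and the Body key assumes your SDK version exposes the new parameter under the name the announcement uses.

```python
MAX_INLINE_BYTES = 128_000  # inline payload limit stated in the announcement


def build_async_request(endpoint_name, body=None, input_location=None,
                        content_type="application/json"):
    """Return keyword arguments for InvokeEndpointAsync, or raise ValueError.

    Hypothetical helper: it only mirrors the documented rules locally.
    """
    # Exactly one input source may be set: Body and InputLocation are
    # mutually exclusive, and a request with neither has no input at all.
    if (body is None) == (input_location is None):
        raise ValueError("set exactly one of body or input_location")

    request = {"EndpointName": endpoint_name, "ContentType": content_type}
    if body is not None:
        data = body.encode("utf-8") if isinstance(body, str) else bytes(body)
        if len(data) > MAX_INLINE_BYTES:
            raise ValueError(
                f"inline payload is {len(data)} bytes; "
                f"limit is {MAX_INLINE_BYTES}"
            )
        request["Body"] = data  # assumed parameter name, per the announcement
    else:
        request["InputLocation"] = input_location
    return request
```

The returned dictionary is meant to be unpacked into the SDK call, for example runtime.invoke_endpoint_async(**request) with a boto3 sagemaker-runtime client. The service still performs its own validation, so this check is a convenience rather than a replacement for handling ValidationError responses.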

💡 Why It Matters

For workloads with small input payloads, inline requests remove the S3 upload from the async inference workflow, which reduces both the operational surface area and the latency that the upload adds to each invocation. Client code has one fewer step to run before every request.

✅ Practical Steps

Use the new Body parameter in the InvokeEndpointAsync API to send inference payloads directly in the request body.
Remove the S3 upload step from your async inference workflow for payloads up to 128,000 bytes; keep the InputLocation path for anything larger.
Update your client-side code to use the inline Body parameter instead of the InputLocation parameter.
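
One way to apply these steps is to route each request by payload size: send it inline when it fits, and stage it in S3 only when it does not. The sketch below is illustrative and rests on assumptions: invoke_async, the bucket and key arguments, and the default content type are hypothetical, and passing Body to invoke_endpoint_async assumes an SDK version that already supports the new parameter. The clients are passed in as arguments so the routing logic can be exercised without AWS credentials.

```python
MAX_INLINE_BYTES = 128_000  # inline payload limit stated in the announcement


def invoke_async(runtime, s3, endpoint_name, payload, bucket, key,
                 content_type="application/json"):
    """Send payload (bytes) inline when it fits, else stage it in S3 first.

    runtime and s3 are expected to behave like boto3 'sagemaker-runtime'
    and 's3' clients. bucket and key are only used on the fallback path.
    """
    if len(payload) <= MAX_INLINE_BYTES:
        # Inline path: no S3 upload before the invocation.
        return runtime.invoke_endpoint_async(
            EndpointName=endpoint_name,
            ContentType=content_type,
            Body=payload,  # assumed parameter name, per the announcement
        )

    # Fallback path for larger payloads: upload, then reference by URI.
    s3.put_object(Bucket=bucket, Key=key, Body=payload)
    return runtime.invoke_endpoint_async(
        EndpointName=endpoint_name,
        ContentType=content_type,
        InputLocation=f"s3://{bucket}/{key}",
    )
```

In both branches the response is handled the same way, because the output is still written to the S3 OutputLocation regardless of how the input was supplied.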
