Amazon SageMaker AI Async Inference now supports inline request payloads
Amazon SageMaker AI Async Inference now supports inline request payloads: customers can send the inference payload directly in the request body of the InvokeEndpointAsync API instead of uploading input data to Amazon S3 before each invocation. Inline payloads can be up to 128,000 bytes. The new Body parameter is mutually exclusive with the InputLocation parameter, and the API rejects requests that set both. The change is designed to work with existing async endpoints, with no model or container changes expected. For engineers building AI systems, this removes the S3 upload step from client code for small inputs and shrinks the operational surface area of asynchronous inference workloads.
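As a rough illustration of the request shape described above, the sketch below builds the keyword arguments for an InvokeEndpointAsync call and mirrors the two stated rules on the client side: the 128,000-byte limit and the mutual exclusivity of Body and InputLocation. The helper name `build_async_request` and the constant `MAX_INLINE_BYTES` are hypothetical, and the sketch assumes your installed SDK version already exposes the `Body` parameter named in the announcement (an older boto3 would likely reject it during parameter validation).

```python
from typing import Optional

# Inline limit stated in the announcement; the constant name is illustrative.
MAX_INLINE_BYTES = 128_000


def build_async_request(
    endpoint_name: str,
    body: Optional[bytes] = None,
    input_location: Optional[str] = None,
    content_type: str = "application/json",
) -> dict:
    """Return keyword arguments for an InvokeEndpointAsync call.

    Mirrors the documented rules locally so mistakes surface before
    a network round trip. This is a sketch, not an official helper.
    """
    # Exactly one of the two input sources must be provided.
    if (body is None) == (input_location is None):
        raise ValueError("Set exactly one of body or input_location")

    request = {"EndpointName": endpoint_name, "ContentType": content_type}
    if body is not None:
        if len(body) > MAX_INLINE_BYTES:
            raise ValueError(
                f"Inline payload is {len(body)} bytes; limit is {MAX_INLINE_BYTES}"
            )
        request["Body"] = body  # assumed parameter name, per the announcement
    else:
        request["InputLocation"] = input_location
    return request
```

With a `sagemaker-runtime` client created through boto3, the result would be passed as `client.invoke_endpoint_async(**build_async_request("my-endpoint", body=b'{"inputs": "hello"}'))`, where `my-endpoint` is a placeholder name.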
⚡ Key Takeaways
- The new Body parameter in the InvokeEndpointAsync API allows for inline payloads up to 128,000 bytes.
- The Body and InputLocation parameters are mutually exclusive, and the API rejects requests that set both.
- The output behavior remains unchanged, with output written to the S3 OutputLocation.
- The feature is available in 31 commercial AWS Regions.
- The InvokeEndpointAsync API returns synchronous ValidationError responses for size and mutual-exclusivity violations.
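Because size and mutual-exclusivity violations come back synchronously, a caller can detect them at the call site instead of waiting for an asynchronous failure. The sketch below shows one way to do that. It assumes the error surfaces the way boto3 service errors usually do, as an exception carrying a `response` dict whose `Error.Code` is `"ValidationError"`; the helper names are hypothetical, and the broad `except Exception` only keeps the example free of an SDK import (real code would typically catch `botocore.exceptions.ClientError`).

```python
def is_validation_error(exc: Exception) -> bool:
    """Return True if exc looks like a service-side ValidationError.

    Assumes the boto3 convention of a `response` dict on the exception.
    """
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return False
    return response.get("Error", {}).get("Code") == "ValidationError"


def try_inline_invoke(client, endpoint_name: str, body: bytes):
    """Attempt an inline invocation.

    Returns the API response on success, or None if the request was
    rejected with a ValidationError. Any other error is re-raised.
    `client` is expected to be a sagemaker-runtime client.
    """
    try:
        return client.invoke_endpoint_async(
            EndpointName=endpoint_name,
            ContentType="application/json",
            Body=body,  # assumed parameter name, per the announcement
        )
    except Exception as exc:
        if is_validation_error(exc):
            return None
        raise
```

Returning `None` is just one possible policy; a caller might prefer to log the rejection or retry through the S3 path instead.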
This feature simplifies the async inference workflow for customers with small input payloads: skipping the S3 upload removes a step that previously preceded every invocation, which can reduce per-request latency and client-side complexity. Output is still written to S3, so code that reads results is unaffected.
✅ Practical Steps
- Use the new Body parameter in the InvokeEndpointAsync API to send inference payloads directly in the request body.
- Remove the S3 upload step from your async inference workflow for payloads up to 128,000 bytes.
- Update your client-side code to use the inline Body parameter instead of the InputLocation parameter.
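The steps above could be combined into a single client-side routing function: send small payloads inline and keep the existing S3 path for anything larger. This is a minimal sketch under stated assumptions. `invoke_async` and `upload_to_s3` are hypothetical names, `upload_to_s3` stands in for whatever upload code you already have (it should return an `s3://` URI), and `Body` is the parameter name given in the announcement.

```python
from typing import Callable

# Inline limit stated in the announcement; the constant name is illustrative.
MAX_INLINE_BYTES = 128_000


def invoke_async(
    client,
    endpoint_name: str,
    payload: bytes,
    upload_to_s3: Callable[[bytes], str],
    content_type: str = "application/json",
) -> dict:
    """Invoke an async endpoint, choosing inline or S3 input by size.

    `client` is expected to be a sagemaker-runtime client. The uploader
    is only called when the payload exceeds the inline limit.
    """
    request = {"EndpointName": endpoint_name, "ContentType": content_type}
    if len(payload) <= MAX_INLINE_BYTES:
        # Small payload: no S3 upload needed.
        request["Body"] = payload
    else:
        # Large payload: fall back to the pre-existing S3 input path.
        request["InputLocation"] = upload_to_s3(payload)
    return client.invoke_endpoint_async(**request)
```

Either way, the result is still read from the S3 output location, so the code that collects inference output does not need to change.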
Read on AWS ML Blog ↗