AWS ML Blog
Efficiently serve dozens of fine-tuned models with vLLM on Amazon SageMaker AI and Amazon Bedrock
1 min read
#llm #bedrock
TL;DR
In this post, we explain how we implemented multi-LoRA inference for Mixture of Experts (MoE) models in vLLM, describe the kernel-level optimizations we performed, and show you how you can benefit from this work. We use GPT-OSS 20B as our primary example throughout this post....
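The summary itself carries no code, but as context for what multi-LoRA serving looks like, here is a minimal sketch using vLLM's offline LoRA API. The Hugging Face model ID `openai/gpt-oss-20b` matches the GPT-OSS 20B example named in the post; the adapter names, IDs, and paths are hypothetical placeholders, and serving LoRA adapters on an MoE model assumes a vLLM version that includes the kernel work the post describes.

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load the base model once with LoRA support enabled; many fine-tuned
# adapters can then be multiplexed onto the same weights at request time.
llm = LLM(
    model="openai/gpt-oss-20b",  # GPT-OSS 20B, the post's primary example
    enable_lora=True,
    max_loras=8,        # adapters resident on the GPU at once
    max_lora_rank=16,   # must cover the largest adapter rank you serve
)

sampling = SamplingParams(temperature=0.0, max_tokens=128)

# Route each request to a different adapter by name, integer ID, and path.
# These adapter names and paths are placeholders, not from the post.
for name, lora_id, path in [
    ("support-adapter", 1, "/adapters/support"),
    ("finance-adapter", 2, "/adapters/finance"),
]:
    out = llm.generate(
        "Draft a one-line product update.",
        sampling,
        lora_request=LoRARequest(name, lora_id, path),
    )
    print(name, out[0].outputs[0].text)
```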
Want the full story? Read the original article.
Read on AWS ML Blog ↗