How Shopify built an AI stack that doesn't care which models survive
Shopify has built an LLM proxy that gives engineers access to multiple AI providers with automatic failover, so workflows keep running when a model is shut down or updated. The proxy handles both reporting and failover. The company has also adopted a distillation strategy, creating smaller language models (SLMs) to improve performance and reduce costs. In some cases these SLMs have proven 2x cheaper and faster, and in more extreme cases up to 30x cheaper and faster. For engineers building AI systems, the approach offers greater flexibility and resilience as the AI landscape changes.
⚡ Key Takeaways
- Shopify's LLM proxy provides automatic failover to alternative models, such as Claude Opus or GPT 5.5, in the event of a model shutdown or update.
- The company uses distillation to create smaller language models (SLMs) that can be more beneficial than generalized, off-the-shelf models in certain circumstances.
- Compared with more generalized models, SLMs have proven 2x cheaper and faster in some cases, and up to 30x cheaper and faster in more extreme ones.
- Shopify's internal platform, Tangle, allows engineers to visualize the pipeline and deploy fine-tuned models without requiring approval.
- The company exposes engineers to different harnesses, such as Claude Code, Codex, and GitHub Copilot, to allow them to choose the best tool for their workflow.
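To make the failover idea concrete, here is a minimal sketch of a proxy that tries backends in order of preference and records what happened. It is not Shopify's implementation: the names `FailoverProxy`, `Backend`, and `ProviderError`, and the two stand-in model functions, are all hypothetical, and a real proxy would wrap actual provider SDK calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class ProviderError(Exception):
    """Hypothetical error raised when a backend is unavailable."""


@dataclass
class Backend:
    name: str                    # label used in the usage log
    call: Callable[[str], str]   # takes a prompt, returns a completion


@dataclass
class FailoverProxy:
    backends: List[Backend]                        # ordered by preference
    log: List[str] = field(default_factory=list)   # minimal reporting

    def complete(self, prompt: str) -> str:
        errors = []
        for backend in self.backends:
            try:
                result = backend.call(prompt)
            except ProviderError as exc:
                # Record the failure and fall through to the next backend.
                self.log.append(f"{backend.name}: failed ({exc})")
                errors.append(f"{backend.name}: {exc}")
                continue
            self.log.append(f"{backend.name}: ok")
            return result
        raise RuntimeError("all backends failed: " + "; ".join(errors))


# Stand-ins for real provider calls (assumptions for illustration only).
def retired_model(prompt: str) -> str:
    raise ProviderError("model retired")


def fallback_model(prompt: str) -> str:
    return f"fallback answer to: {prompt}"


proxy = FailoverProxy([
    Backend("primary", retired_model),
    Backend("fallback", fallback_model),
])
answer = proxy.complete("Summarize this order")
```

Callers only ever talk to `proxy.complete`, so retiring or swapping a model means editing the backend list rather than every workflow that depends on it.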
Shopify's approach highlights how much flexibility and resilience matter for engineers building production AI systems as the AI landscape changes. With an LLM proxy and a distillation strategy, engineers can keep workflows running through model shutdowns or updates, and can take advantage of smaller, more specialized models.
✅ Practical Steps
- Implement an LLM proxy to provide automatic failover to alternative models in the event of a model shutdown or update.
- Use distillation to create smaller language models (SLMs) that can be more beneficial than generalized, off-the-shelf models in certain circumstances.
- Build or adopt an internal platform, comparable to Shopify's Tangle, that lets engineers visualize the pipeline and deploy fine-tuned models without requiring approval.
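The distillation step above can be illustrated with the classic soft-label objective, in which a small "student" model is trained to match the temperature-softened output distribution of a larger "teacher". This summary does not say which distillation method Shopify uses, so treat the following as a generic sketch under that assumption; the function names and example logits are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Convert logits to probabilities, softened by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: List[float],
    student_logits: List[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Made-up logits over three classes, for illustration only.
teacher = [4.0, 1.0, 0.5]
student_near = [3.5, 1.2, 0.6]   # roughly agrees with the teacher
student_far = [0.5, 1.0, 4.0]    # disagrees with the teacher

loss_near = distillation_loss(teacher, student_near)
loss_far = distillation_loss(teacher, student_far)
```

The loss shrinks as the student's predictions approach the teacher's, which is the signal a training loop would minimize. In practice this term is computed by a deep learning framework over batches and is often combined with a standard loss on ground-truth labels.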
Read on VentureBeat AI ↗