VentureBeat AI

How Shopify built an AI stack that doesn't care which models survive

4 min read
#llm #enterprise #inference
Level: Advanced
For: AI Engineers
TL;DR

Shopify has built an LLM proxy that gives engineers access to multiple AI providers, with reporting and automatic failover, so workflows keep running when a model is shut down or updated. The company also uses distillation to produce smaller language models (SLMs) that improve performance and reduce costs. In some cases these SLMs are 2x cheaper and faster than more generalized models, and up to 30x cheaper and faster in more extreme cases. For engineers building AI systems, the approach offers flexibility and resilience as the AI landscape changes.

⚡ Key Takeaways

  • Shopify's LLM proxy provides automatic failover to alternative models, such as Claude Opus or GPT 5.5, in the event of a model shutdown or update.
  • The company uses distillation to create smaller language models (SLMs) that can be more beneficial than generalized, off-the-shelf models in certain circumstances.
  • In some cases SLMs are 2x cheaper and faster than more generalized models, and up to 30x cheaper and faster in more extreme cases.
  • Shopify's internal platform, Tangle, allows engineers to visualize the pipeline and deploy fine-tuned models without requiring approval.
  • The company exposes engineers to different harnesses, such as Claude Code, Codex, and GitHub Copilot, to allow them to choose the best tool for their workflow.
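
The failover behavior in the first takeaway can be illustrated with a minimal sketch. It is not Shopify's proxy, whose internals the summary does not describe: the `FailoverProxy` class, the `ProviderError` exception, and the two stand-in provider functions are all hypothetical names, and real provider calls are replaced by plain Python functions so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


class ProviderError(Exception):
    """Hypothetical error a provider callable raises when its model is unavailable."""


@dataclass
class FailoverProxy:
    # Ordered (name, callable) pairs; earlier entries are preferred.
    providers: List[Tuple[str, Callable[[str], str]]]
    # In-memory record of (provider name, outcome), standing in for reporting.
    log: List[Tuple[str, str]] = field(default_factory=list)

    def complete(self, prompt: str) -> Tuple[str, str]:
        """Return (provider name, response) from the first provider that succeeds."""
        for name, call in self.providers:
            try:
                response = call(prompt)
            except ProviderError as exc:
                # Record the failure and fall through to the next provider.
                self.log.append((name, f"failed: {exc}"))
                continue
            self.log.append((name, "ok"))
            return name, response
        raise RuntimeError("all providers failed")


# Stand-in providers (assumptions for illustration, not real API clients).
def retired_model(prompt: str) -> str:
    raise ProviderError("model retired")


def backup_model(prompt: str) -> str:
    return f"backup answer to: {prompt}"


proxy = FailoverProxy(providers=[("primary", retired_model), ("backup", backup_model)])
used, answer = proxy.complete("Summarize this order history")
print(used, "|", answer)
```

Callers only ever talk to `proxy.complete`, so swapping or reordering providers does not touch application code, and the log gives a simple basis for reporting on which provider actually served each request.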
💡 Why It Matters

Shopify's approach shows engineers building production AI systems why flexibility and resilience matter as the AI landscape changes. With an LLM proxy and a distillation strategy, engineers can keep workflows running through model shutdowns or updates, and can take advantage of smaller, more special

✅ Practical Steps

  1. Implement an LLM proxy to provide automatic failover to alternative models in the event of a model shutdown or update.
  2. Use distillation to create smaller language models (SLMs) that can be more beneficial than generalized, off-the-shelf models in certain circumstances.
  3. Provide an internal platform (Shopify's is Tangle) that lets engineers visualize the pipeline and deploy fine-tuned models without requiring approval.
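
For step 2, one common first stage of distillation is collecting a larger "teacher" model's outputs as training targets for a smaller "student" model. The sketch below covers only that data-collection stage under stated assumptions: `build_distillation_set`, `to_jsonl`, and `toy_teacher` are hypothetical names, the teacher is a stub rather than a real model call, and the prompt/completion record layout is an illustrative choice, not a format the article attributes to Shopify.

```python
import json
from typing import Callable, Iterable, List


def build_distillation_set(
    prompts: Iterable[str], teacher: Callable[[str], str]
) -> List[dict]:
    """Label each unique prompt with the teacher's output, skipping empty answers."""
    records = []
    seen = set()
    for prompt in prompts:
        if prompt in seen:
            continue  # avoid paying for the same teacher call twice
        seen.add(prompt)
        completion = teacher(prompt).strip()
        if not completion:
            continue  # empty outputs make poor training targets
        records.append({"prompt": prompt, "completion": completion})
    return records


def to_jsonl(records: List[dict]) -> str:
    """Serialize records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Stub teacher (assumption): a real pipeline would call a large model here.
def toy_teacher(prompt: str) -> str:
    return "refund" if "money back" in prompt else "other"


prompts = ["I want my money back", "Where is my parcel?", "I want my money back"]
records = build_distillation_set(prompts, toy_teacher)
jsonl = to_jsonl(records)
print(jsonl)
```

The resulting file would then feed whatever fine-tuning tooling trains the student model; that stage depends on the chosen framework and is deliberately left out here.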

