Inference

37 curated articles on Inference for AI engineers

Pragmatic Engineer· 6 min read· 3 days ago

The Pulse: a new trend, smart model routing

A new trend in AI engineering is smart model routing: an "intelligent" router picks a suitable model for each task in order to reduce AI spending. Vendors such as Factory Router, Not Diamond, and Vercel AI Gateway offer products that claim cost savings of 20-30%. These products automatically select a model for a given task, weighing factors such as cost, latency, and availability. For engineers building AI systems, the practical implication is that a routing layer is one way to cut AI infrastructure costs.
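
The selection step described above can be pictured with a minimal sketch. It is not how any of the named vendors work: the model names, prices, latencies, and the `max_difficulty` field are all invented for illustration. The sketch only assumes that a router filters out models that are unavailable, too slow, or too weak for the task, then picks the cheapest of what remains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    """One routable model. All values used below are made-up placeholders."""
    name: str
    cost_per_1k_tokens: float  # USD, hypothetical
    p50_latency_ms: float      # typical latency, hypothetical
    available: bool            # whether the endpoint is currently healthy
    max_difficulty: int        # hardest task tier (1-3) it handles acceptably


def route(options, difficulty, latency_budget_ms):
    """Return the cheapest available model that meets the task constraints."""
    candidates = [
        m for m in options
        if m.available
        and m.max_difficulty >= difficulty
        and m.p50_latency_ms <= latency_budget_ms
    ]
    if not candidates:
        raise LookupError("no model satisfies the constraints")
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)


# Hypothetical catalog, cheapest to most expensive.
MODELS = [
    ModelOption("small-model", 0.10, 200, True, 1),
    ModelOption("medium-model", 0.50, 400, True, 2),
    ModelOption("large-model", 2.00, 900, True, 3),
]

if __name__ == "__main__":
    print(route(MODELS, difficulty=1, latency_budget_ms=1000).name)
    print(route(MODELS, difficulty=3, latency_budget_ms=1000).name)
```

In this toy setup, savings come from sending easy tasks to the cheap model and reserving the expensive one for tasks that need it. A real router would also have to estimate task difficulty, which is not modeled here.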

The Pulse: a new trend, smart model routing

How Amazon Bedrock catches AI-generated phishing

Context vs. Memory Engineering in Agentic AI Systems

NVIDIA Unlocks AI Compute at Scale, Inviting Partners to Power the AI Infrastructure Buildout

Using Local Coding Agents

Trunk Tools' stack cut document review from 60 days to 10 by ditching general-purpose models

Best practices for multi-turn reinforcement learning in Amazon SageMaker AI

LLMs help robots understand vague instructions and focus on key details

Exclusive: LucidLink launches MCP server to give AI agents shared access to distributed files

Graviton5’s improved design increases speed and energy efficiency — beyond Moore’s law

Real-world grounding in agentic AI

Building Browser-Using AI Agents in Python

How NVIDIA’s Inference Software Stack Powers the Lowest Token Cost

New chip could help tiny robots traverse complex environments

Agentic infrastructure startup Seltz raises $12.5M to help AI agents search the web for answers

Bridging intent and execution in agentic systems

Long Context vs. Short Context Model: When Does a Long Context Model Win?

Memory maker SK hynix files for $29B US IPO amid AI demand

The Control Gap: Enterprise AI organizations have an ownership problem, not a technology problem — and most are governing it by hand

Qualcomm shares jump 14% on Modular acquisition, guidance upgrade

Open Models, Closed Environments: Palantir Brings Secure AI to US Agencies With NVIDIA Nemotron

Grammarly parent Superhuman buys AI detector GPTZero

Tokenminning: How to Get More from Your Chatbot for Less

NVIDIA and AWS Collaborate to Bring AI to Production at Scale

Time-Series LLMs, Explained with t0-alpha

Persistent Latent Memory for Multi-Hop LLM Agents: How a 6G Handover Paper Closes the Agent Cold-Start

What Can We Do When Memory Becomes the New Bottleneck in Data Engineering?

Implementing resilience patterns with Amazon Bedrock and LLM gateway

How Outpost VFX Uses AWS to Accelerate AI Model Training for Visual Effects

Fine-tune Amazon Nova models for accurate email data extraction

Pair Nova 2 Lite with Claude for cost-optimized document processing

Upbound open-sources Modelplane to optimize inference clusters

ClickHouse brings real-time analytics to agentic AI

Momentic raises the bar for software testing with agentic quality platform

NVIDIA Powers Over 400 of the World’s 500 Fastest Supercomputers

Startup’s nuclear-inspired cooling system could make data centers more sustainable

The consequences of relying on AI for accurate news
