NVIDIA Blog

NVIDIA Blackwell Leads on First Agentic AI Infrastructure Benchmark

4 min read
#agents #inference #nvidia
Level: Advanced
For: AI Engineers
TL;DR

The NVIDIA Blackwell Ultra NVL72 platform posted the leading results in the first round of AgentPerf, a new industry-standard benchmark for agentic AI infrastructure, running 20x more agents per megawatt than NVIDIA Hopper. AgentPerf measures how well systems handle complex, multi-step agentic workloads, which differ fundamentally from conversational AI. The results highlight the role of full-stack codesign and optimization in agentic AI performance. For engineers building AI systems, the practical implication is to design and tune for the specific demands of agentic workloads, such as many concurrent agent sessions that must each sustain a per-agent token rate.

⚡ Key Takeaways

  • The NVIDIA Blackwell Ultra NVL72 platform leads the first round of AgentPerf, running 20x more agents per megawatt than NVIDIA Hopper.
  • AgentPerf measures agentic performance using DeepSeek V4 Pro, a large mixture-of-experts (MoE) model.
  • The NVIDIA GB300 NVL72 supports far more concurrent agents per megawatt than the NVIDIA H200 at both service-level objectives (SLOs): 20 and 60 tokens per second per agent.
  • The GB300 NVL72's advantage comes from extreme codesign across the full stack, including connecting 72 GPUs into a single rack-scale system.
  • NVIDIA TensorRT LLM sustains efficiency as concurrent agent sessions scale by separating input processing from output generation.
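The headline metric, agents per megawatt at a per-agent token-rate SLO, can be illustrated with a little arithmetic. The sketch below is only one plausible reading of the metric, not AgentPerf's published methodology: the function names, the sample token rates, and the 120 kW power figure are all hypothetical values chosen for illustration.

```python
def agents_meeting_slo(per_agent_tokens_per_s: list[float],
                       slo_tokens_per_s: float) -> int:
    """Count agent sessions whose sustained token rate meets the SLO."""
    return sum(1 for rate in per_agent_tokens_per_s if rate >= slo_tokens_per_s)


def agents_per_megawatt(concurrent_agents: int, system_power_kw: float) -> float:
    """Normalize a concurrent-agent count by system power (1 MW = 1000 kW)."""
    if system_power_kw <= 0:
        raise ValueError("system_power_kw must be positive")
    return concurrent_agents * 1000.0 / system_power_kw


# Hypothetical measurements: sustained tokens/s for four agent sessions
# and an assumed system power draw. None of these numbers come from AgentPerf.
sample_rates = [25.0, 61.5, 19.0, 70.2]
sample_power_kw = 120.0

for slo in (20.0, 60.0):  # the two SLO levels named in the takeaways
    passing = agents_meeting_slo(sample_rates, slo)
    density = agents_per_megawatt(passing, sample_power_kw)
    print(f"SLO {slo:.0f} tok/s: {passing} agents, {density:.1f} agents/MW")
```

A stricter SLO can only keep or shrink the set of qualifying agents, so under this reading the agents-per-megawatt figure at 60 tokens per second is never higher than the figure at 20.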
💡 Why It Matters

AgentPerf gives companies that build and deploy agents at scale a common way to compare systems for agentic AI. Engineers can use the results to guide optimization of their own systems for agentic workloads, improving performance and efficiency.

✅ Practical Steps

  1. Use the AgentPerf benchmark to evaluate the performance of your agentic AI system.
  2. Account for the unique requirements of agentic AI workloads when designing and optimizing your system, including codesign across the full stack.
  3. Use NVIDIA TensorRT LLM to sustain efficiency as concurrent agent sessions scale.
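The idea behind step 3, separating input processing from output generation, can be sketched as a two-stage pipeline. This is a toy illustration only: it does not use TensorRT LLM or reflect its API, and the `Session` class, the worker functions, and the placeholder tokens are hypothetical names invented here. A real system would run a model in each stage and hand off a KV cache rather than a list of strings.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class Session:
    agent_id: int
    prompt: list[str]
    max_new_tokens: int
    context: list[str] = field(default_factory=list)  # stand-in for a KV cache
    output: list[str] = field(default_factory=list)


def prefill_worker(inbox: queue.Queue, handoff: queue.Queue) -> None:
    """Stage 1: process each session's input, then hand it to the decode stage."""
    while True:
        session = inbox.get()
        if session is None:        # sentinel: forward shutdown and stop
            handoff.put(None)
            return
        session.context = list(session.prompt)
        handoff.put(session)


def decode_worker(handoff: queue.Queue, done: list) -> None:
    """Stage 2: generate output tokens from the prepared context."""
    while True:
        session = handoff.get()
        if session is None:
            return
        for i in range(session.max_new_tokens):
            session.output.append(f"tok{i}")  # placeholder for a generated token
        done.append(session)


def run(sessions: list) -> list:
    """Push sessions through separate prefill and decode workers."""
    inbox, handoff, done = queue.Queue(), queue.Queue(), []
    stages = [
        threading.Thread(target=prefill_worker, args=(inbox, handoff)),
        threading.Thread(target=decode_worker, args=(handoff, done)),
    ]
    for stage in stages:
        stage.start()
    for session in sessions:
        inbox.put(session)
    inbox.put(None)
    for stage in stages:
        stage.join()
    return done
```

Because each stage has its own worker, a long prompt being processed in the first stage does not stall token generation for sessions already in the second stage, which is the intuition behind the separation.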
