VentureBeat AI

New AI optimization framework beats Claude Code and Codex by 2.5x on the same compute budget

June 18, 2026 • 9 min read

Level: Advanced

For: AI Engineers

✦ TL;DR

Researchers from Renmin University of China and Microsoft Research introduced Arbor, a framework for autonomous optimization that reportedly delivers more than 2.5 times the performance gains of Claude Code and Codex on the same compute budget. Arbor organizes hypotheses, experiments, and insights into a tree, so the system can learn cumulatively from prior failures. The aim is to automate the continuous improvement of complex engineering systems. For engineers building AI systems, the practical implication is that this kind of structured, cumulative search can improve how AI agents perform on real-world engineering tasks.

⚡ Key Takeaways

Arbor delivered more than 2.5 times the verifiable performance gains of standard AI coding agents.
Arbor organizes hypotheses, experiments, and insights into a tree to help the system learn from prior failures.
Autonomous optimization (AO) is a fundamental loop of autonomous research that requires iterative improvement of an artifact through experimental feedback.
Current agent systems lack the capacity to accumulate and act on what they've learned from each attempt.
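
The tree idea in the takeaways above can be illustrated with a minimal sketch. This is not Arbor's implementation, which this summary does not describe in detail. The `Node` class, the `optimize` function, the greedy "expand the best node so far" rule, and the numeric toy objective are all assumptions chosen to keep the example small. It shows only the general pattern: every attempt, including failed ones, stays in the tree with a note on what it taught, and later steps consult the whole tree.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One hypothesis in the tree, with the result of testing it (hypothetical structure)."""
    hypothesis: float                 # toy stand-in for a proposed change to the artifact
    score: Optional[float] = None     # experimental feedback, once measured
    insight: str = ""                 # what this attempt taught us
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def all_nodes(root: Node) -> List[Node]:
    """Flatten the tree so earlier attempts stay visible to later steps."""
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(node.children)
    return out


def optimize(root_value: float,
             evaluate: Callable[[float], float],
             steps: List[float],
             budget: int) -> Node:
    """Run `budget` experiments, each time expanding the best-scoring node so far."""
    root = Node(hypothesis=root_value, score=evaluate(root_value), insight="baseline")
    for i in range(budget):
        # Assumed selection rule: greedy on score across the whole tree.
        best = max(all_nodes(root), key=lambda n: n.score)
        # Assumed proposal rule: cycle through fixed perturbations.
        step = steps[i % len(steps)]
        child = Node(hypothesis=best.hypothesis + step, parent=best)
        child.score = evaluate(child.hypothesis)
        # Failed attempts are kept as recorded dead ends, not discarded.
        child.insight = "improved" if child.score > best.score else "no gain"
        best.children.append(child)
    return root
```

Calling `optimize(0.0, lambda x: -(x - 3.0) ** 2, [1.0, -1.0], 6)` builds a small tree in which the unsuccessful branches remain attached to their parents. In a real agent, the proposal step would read those recorded insights before choosing what to try next, which this toy version does not do.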

💡 Why It Matters

Arbor matters to engineers building AI systems because it adds cumulative learning to automated improvement loops. An agent that keeps a record of what earlier attempts tried and why they failed can make better-informed decisions on later attempts, which is how the framework is reported to achieve larger gains from the same compute budget.

✅ Practical Steps

Apply the article's core idea to your own system design: record each hypothesis, experiment, and resulting insight in a structure that later attempts can consult.
Consider whether a tree-organized approach like Arbor's fits your existing AI agent architecture, and measure any gains against your current agent on the same compute budget.
