Compute

AI infrastructure and compute: GPU availability, cloud pricing, hardware releases, and how compute constraints shape model architecture decisions.

17 articles

Satya Nadella warns that AI could hollow out entire industries, echoing the damage done by globalization
VentureBeat AI· 11 min read· Today

Microsoft CEO Satya Nadella warns that AI could hollow out entire industries by centralizing and commoditizing expertise, leaving businesses without a competitive advantage. He introduces "token capital", a firm's AI capability, as the new currency of enterprise AI strategy, and stresses that human capital is what drives its growth. His proposed remedy is a new architecture for how businesses interact with AI: a learning loop built on top of models, in which human capital and token capital compound. The key test of a company's sovereignty in this era is whether it can swap out a generalist model without losing the expertise of its veterans. For engineers building AI systems, this means weighing the long-term effects of AI on industries and developing strategies to mitigate them.

Graviton5’s improved design increases speed and energy efficiency — beyond Moore’s law
Amazon Science· 5 min read· 5 days ago

The authors report a 25% performance improvement for general-purpose and agentic AI workloads on Graviton5, which combines a chiplet architecture, custom die-to-die connectivity, and support for DDR5-8800 memory and PCIe Gen6 interconnects. They describe the gain as going beyond Moore's law. The design makes AI workloads both faster and more energy-efficient, which matters most in large-scale deployments where every percentage point of performance affects overall system efficiency.

Startup’s nuclear-inspired cooling system could make data centers more sustainable
MIT News AI· 6 min read· 6 days ago

Ferveret, a startup founded by Reza Azizian and Matteo Bucci, is developing a nuclear-inspired cooling system for data centers that uses a specialized liquid to absorb heat, reducing electricity usage and water consumption. The company's Adaptive Phase Cooling (APC) solution has shown a 15% improvement in computational power efficiency compared to state-of-the-art liquid cooling solutions. By combining APC with a power control system, Ferveret claims to enable data centers to generate 35% more tokens from their AI models with the same amount of power. This innovation has the potential to make data centers more sustainable and efficient. The practical implication for engineers building AI systems is that they can potentially reduce their energy consumption and increase their computational power efficiency by adopting Ferveret's cooling system.
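The 35% figure can be restated as energy per token. The following is a rough arithmetic sketch, not Ferveret's methodology: it assumes power draw stays fixed while token throughput rises by the claimed amount, and the baseline of 1,000,000 tokens per kWh is a hypothetical number chosen only for illustration.

```python
def energy_per_token_change(throughput_gain: float) -> float:
    """Relative change in energy per token when throughput rises at fixed power.

    At constant power, energy per token scales as 1 / throughput.
    """
    return 1.0 / (1.0 + throughput_gain) - 1.0


baseline_tokens_per_kwh = 1_000_000  # hypothetical baseline, for illustration only
claimed_gain = 0.35                  # "35% more tokens" at the same power, per the summary

new_tokens_per_kwh = baseline_tokens_per_kwh * (1 + claimed_gain)
change = energy_per_token_change(claimed_gain)

print(f"{new_tokens_per_kwh:,.0f} tokens/kWh, energy per token {change:+.1%}")
```

Under these assumptions, 35% more tokens at the same power corresponds to roughly 26% less energy per token, not 35% less.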

NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI
NVIDIA Blog· 5 min read· 5 days ago

NVIDIA has optimized Google DeepMind's experimental open model, DiffusionGemma, for exceptionally fast text generation on NVIDIA GeForce RTX GPUs, RTX PRO platform, and DGX Spark systems, achieving significant speedup across local PCs and the cloud. This optimization enables real-time text generation capabilities, with the potential to accelerate applications such as chatbots, language translation, and content creation. The optimized model can be used in various settings, from local PCs to large-scale cloud deployments. This achievement highlights the importance of hardware acceleration in AI model performance.

GPU Time-Slicing for Concurrent LLM Agents on Kubernetes
Towards Data Science· Yesterday

The article examines the microarchitectural costs of Kubernetes GPU time-slicing for concurrent large language model (LLM) agents. No specific numbers, model names, or benchmark results are mentioned; the post focuses on the systems-level implications of co-locating agentic AI workloads on Kubernetes. For engineers building AI systems, the takeaway is a clearer picture of the hidden costs of GPU time-slicing, which calls for careful resource allocation and workload management when deploying LLM agents.

NVIDIA Confidential Computing to Help Expand Apple’s Private Cloud Compute
NVIDIA Blog· 4 min read· 6 days ago

NVIDIA's Confidential Computing technology is being used by Apple to support confidential inference in their Private Cloud Compute, expanding beyond Apple's data centers to Google Cloud, with NVIDIA Blackwell GPUs providing a hardware-based security layer for accelerated AI workloads. This collaboration aims to support next-generation Apple Intelligence features, leveraging the technologies behind the Gemini family of models. The adoption of NVIDIA Confidential Computing reflects a broader shift in AI infrastructure towards high-performance, server-side inference while maintaining strong privacy and security guarantees. This has significant implications for engineers building AI systems, as they must consider the importance of privacy and security in their designs.

Using Scikit-LLM with Open-Source LLMs
Machine Learning Mastery· Jun 4, 2026

This article demonstrates the integration of Scikit-LLM with open-source LLMs, specifically Mistral, Gemma, and Llama 3, using the Ollama repository, to perform text classification tasks. The authors achieve this by leveraging Scikit-LLM's ability to handle locally hosted LLMs of manageable size, showcasing the potential for cost-effective and flexible large language model integration. However, this approach may come at the cost of model performance due to the smaller model sizes. The article highlights the use of Scikit-LLM as a viable option for developers looking to experiment with LLMs without relying on cloud-based services.

MCP solved tool calling. A2A solved coordination. What solves transport?
VentureBeat AI· 6 min read· 2 days ago

The AI agent ecosystem is currently in a phase of protocol proliferation, with four significant protocols published in the past eighteen months: Model Context Protocol (MCP), Agent2Agent (A2A), Agent Communication Protocol (ACP), and Agent Network Protocol (ANP). MCP has already won the tool-calling layer, with over 10,000 active public MCP servers and 164 million monthly Python SDK downloads by April 2026. A2A is a task coordination interface that defines how two agents delegate a task, while ACP is a message envelope format and ANP is a discovery and identity protocol. The practical implication for engineers building AI systems is that they need to understand the different layers of the stack and choose the appropriate protocol for their specific use case.
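The layering described above can be written down as a small lookup table. This is only a sketch of the summary's own mapping: the `Layer` enum and `protocol_for` helper are hypothetical names, and the `TRANSPORT` entry is left empty because the summary names no protocol for that layer.

```python
from enum import Enum
from typing import Optional


class Layer(Enum):
    TOOL_CALLING = "tool calling"
    TASK_COORDINATION = "task coordination"
    MESSAGE_ENVELOPE = "message envelope"
    DISCOVERY_IDENTITY = "discovery and identity"
    TRANSPORT = "transport"


# Mapping taken from the summary; TRANSPORT has no entry there, so it maps to None.
PROTOCOL_BY_LAYER = {
    Layer.TOOL_CALLING: "MCP",
    Layer.TASK_COORDINATION: "A2A",
    Layer.MESSAGE_ENVELOPE: "ACP",
    Layer.DISCOVERY_IDENTITY: "ANP",
    Layer.TRANSPORT: None,
}


def protocol_for(layer: Layer) -> Optional[str]:
    """Return the protocol the summary associates with a layer, or None if it names none."""
    return PROTOCOL_BY_LAYER[layer]


for layer in Layer:
    print(f"{layer.value}: {protocol_for(layer) or 'no protocol named'}")
```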

NVIDIA and LG Group Build an AI Factory to Advance Physical AI, Mobility and AI Infrastructure
NVIDIA Blog· 5 min read· Jun 8, 2026

NVIDIA and LG Group are building an AI factory to accelerate LG Group's next wave of AI-driven businesses, utilizing NVIDIA's accelerated computing infrastructure for training, simulation, and validation of AI models in robotics, autonomous driving, data center technologies, and GPU cloud services. This collaboration aims to drive innovation in physical AI, mobility, and AI infrastructure. The AI factory will enable LG Group to develop and deploy AI solutions at scale, leveraging NVIDIA's expertise in AI computing.

NVIDIA and Doosan Group Collaborate to Advance Physical AI and AI Factory Infrastructure
NVIDIA Blog· 4 min read· Jun 7, 2026

NVIDIA and Doosan Group are expanding their collaboration to advance physical AI and AI factory infrastructure, leveraging NVIDIA's full-stack AI computing platform to integrate AI into Doosan's robotics, construction equipment, and energy solutions. The partnership aims to enhance the efficiency, safety, and productivity of Doosan's manufacturing processes and products. By combining NVIDIA's AI expertise with Doosan's industry expertise, the collaboration will drive innovation in AI factory infrastructure and robotics. This strategic partnership will enable Doosan to accelerate the development and deployment of AI-powered solutions across its various business units.

When GPU Utilization Lies: The Hidden Systems Problem Slowing Modern AI
Towards Data Science· 4 days ago

A recent analysis reveals that relying on average GPU utilization metrics can lead to inaccurate assessments of system load, resulting in underutilized or overutilized resources. This issue is particularly pronounced in modern AI workflows, where GPU utilization can fluctuate significantly due to the varying computational demands of different tasks. As a result, engineers may need to adopt more nuanced monitoring strategies to ensure optimal resource allocation. A potential solution involves using a combination of metrics, such as GPU utilization, memory usage, and task queue lengths, to gain a more comprehensive understanding of system load.
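A small sketch shows why an average alone can mislead. It is not the article's method: the sample series, the `summarize_load` helper, and the 10% idle threshold are all illustrative assumptions. Two workloads with the same mean utilization look very different once percentiles, idle time, memory, and queue length are reported together.

```python
from statistics import mean, quantiles


def summarize_load(util_pct, mem_used_gb, queue_len):
    """Summarize sampled GPU load with more than a single average."""
    deciles = quantiles(util_pct, n=10)  # nine cut points: p10 through p90
    return {
        "util_mean": mean(util_pct),
        "util_p10": deciles[0],
        "util_p90": deciles[-1],
        # Fraction of samples below an arbitrary 10% "idle" threshold (assumption).
        "idle_fraction": sum(u < 10 for u in util_pct) / len(util_pct),
        "mem_peak_gb": max(mem_used_gb),
        "queue_max": max(queue_len),
    }


# Hypothetical samples: a steady workload and a bursty one with the same mean.
steady_summary = summarize_load([50] * 20, [40] * 20, [1] * 20)
bursty_summary = summarize_load([0, 100] * 10, [10, 70] * 10, [8, 0] * 10)

for name, summary in (("steady", steady_summary), ("bursty", bursty_summary)):
    print(name, summary)
```

Both series average 50% utilization, but the bursty one is idle half the time, saturated the other half, and builds a queue while the GPU waits.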

Stop hand-tuning kernels: How Neuron Agentic Development accelerates AWS Trainium optimizations
AWS ML Blog· 12 min read· 5 days ago

AWS has introduced Neuron Agentic Development, a collection of AI agents and skills that accelerates kernel development for AWS Trainium and AWS Inferentia, reducing the need for manual kernel tuning. This capability is expected to streamline the development process and improve performance on these hardware accelerators. By leveraging AI-driven optimization, developers can focus on higher-level tasks, such as model development and deployment, while the system automatically fine-tunes the kernels for optimal performance. The Neuron Agentic Development capabilities are designed to work seamlessly with the existing AWS Trainium and AWS Inferentia infrastructure.

Scale Robot Reinforcement Learning with NVIDIA Isaac Lab on Amazon SageMaker AI
AWS ML Blog· 24 min read· 6 days ago

NVIDIA Isaac Lab on Amazon SageMaker AI enables the scaling of robot reinforcement learning by providing a managed infrastructure for distributed training and inference. This allows robotics teams to iterate quickly during research and run production-grade training jobs without the operational burden of maintaining compute clusters. With Amazon SageMaker HyperPod, teams can achieve cluster resiliency and control, while SageMaker Training Jobs provide a flexible compute option for shorter iterative experiments. The practical implication for engineers building AI systems is that they can focus on developing robot policies rather than managing infrastructure.

NVIDIA Enables the Next Era Of Physical AI Research With Agent Skills For Autonomous Vehicles, Robotics And Vision AI
NVIDIA Blog· 7 min read· Jun 3, 2026

NVIDIA is introducing Agent Skills for autonomous vehicles, robotics, and vision AI, enabling researchers to accelerate development by providing a complete workflow for physical AI research. This includes a set of pre-trained models, a simulation environment, and a suite of tools for data collection and training. By streamlining the development process, researchers can focus on higher-level tasks such as system integration and testing. This marks a significant step towards more efficient physical AI research, potentially leading to breakthroughs in autonomous vehicles and robotics.

NVIDIA Partners With Microsoft on Unified Stack for Agentic AI Deployment, From Windows Devices to Cloud to Local
NVIDIA Blog· 6 min read· Jun 2, 2026

NVIDIA and Microsoft are collaborating on a unified stack for agentic AI deployment, integrating AI models with fast hardware, secure runtimes, and a responsive data layer across Windows devices, cloud, and local environments. This stack is designed to support long-running reasoning and real-time decision-making in AI applications. The partnership aims to accelerate the development and deployment of agentic AI systems, enabling developers to build more sophisticated and responsive AI experiences. The unified stack is expected to bridge the gap between model development and deployment, reducing the complexity and increasing the efficiency of AI development.

NVIDIA Jetson Brings Agentic AI to the Physical World
NVIDIA Blog· 5 min read· Jun 2, 2026

NVIDIA has announced JetPack 7.2 and NemoClaw support on NVIDIA Jetson, bringing agentic AI capabilities to the physical world. The release delivers a substantial performance gain on the Jetson AGX Orin 32GB module and adds CUDA 13 support on Jetson Orin. The Yocto Project is also supported, providing a flexible and customizable build system. Together, these changes bring agentic AI to the edge, letting developers build more sophisticated and interactive AI experiences.

NVIDIA AI Cloud Ecosystem Expands Worldwide to Meet Global AI Compute Demand
NVIDIA Blog· 7 min read· Jun 1, 2026

NVIDIA has expanded its AI Cloud ecosystem worldwide to address the increasing global demand for AI compute resources, partnering with various organizations to scale agentic AI applications. This expansion enables enterprises, startups, and governments to access AI infrastructure, accelerating the development of AI factory infrastructure. The NVIDIA AI Cloud ecosystem now spans multiple regions, supporting a wide range of AI workloads, from research to production. This expansion is expected to drive widespread adoption of AI, but may also introduce challenges related to data management and security. The increased accessibility of AI compute resources is likely to lead to new breakthroughs in fields such as healthcare, finance, and climate modeling.
