Compute

AI infrastructure and compute: GPU availability, cloud pricing, hardware releases, and how compute constraints shape model architecture decisions.

27 articles

How the English Office for Students leverages Databricks to enhance higher education standards and drive better student outcomes
Databricks Blog· 6 min read· Yesterday

The English Office for Students has cut the run time of a 300-million-record data job from 8 hours to minutes by using Databricks. Faster processing of large datasets lets the office analyze higher education data more efficiently, which it expects to support higher standards and better student outcomes. For engineers building AI systems, the case highlights the value of scalable, efficient data processing tools.

NVIDIA and AWS Collaborate to Bring AI to Production at Scale
NVIDIA Blog· 4 min read· 3 days ago

NVIDIA and AWS have collaborated to bring AI to production at scale, addressing constraints such as low-latency inference, fast vector search, and strong GPU price-performance. The NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs power new Amazon EC2 G7 instances, delivering up to 4.6x AI inference performance and up to 2.1x graphics performance compared to G6 instances. The NVIDIA cuVS library accelerates the retrieval layer by making GPU-powered vector indexing the default in OpenSearch Serverless, resulting in vector indexing up to 10x faster at a quarter of the cost. This collaboration provides enterprises with practical paths to deploy AI at production scale, enabling lower-latency inference and faster vector search.

Reliability fail: No automated zone failover for Coinbase’s global trading service
Pragmatic Engineer· 6 min read· 4 days ago

Coinbase's global trading service suffered a 10-hour outage triggered by a regional AWS outage, exposing the company's dependency on a single AWS availability zone. The service lost quorum when three of its five matching-engine nodes went down, and there was no automated failover to another zone. Coinbase's postmortem revealed that the company had deliberately chosen to run its matching engine in a single availability zone to meet latency and throughput demands. For engineers building AI systems, the lesson is to weigh the tradeoffs between latency, throughput, and availability when designing distributed systems.
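
The quorum arithmetic behind this outage can be sketched in a few lines. This is a minimal illustration that assumes a simple majority-quorum rule (more than half of the nodes must be healthy), which is common in consensus-based clusters; the summary does not say which protocol Coinbase uses, and the function names here are hypothetical.

```python
def majority_quorum(total_nodes: int) -> int:
    """Smallest number of healthy nodes that forms a strict majority."""
    return total_nodes // 2 + 1


def has_quorum(total_nodes: int, failed_nodes: int) -> bool:
    """Return True if the surviving nodes can still form a majority."""
    healthy = total_nodes - failed_nodes
    return healthy >= majority_quorum(total_nodes)


if __name__ == "__main__":
    # Assumed setup: a five-node cluster, as described in the summary.
    for failed in range(6):
        status = "quorum held" if has_quorum(5, failed) else "quorum lost"
        print(f"{failed} of 5 nodes down: {status}")
```

Under this assumption, a five-node cluster tolerates two failures but not three, which matches the failure described above.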

Improving the speed and energy-efficiency of AI agents
MIT News AI· 5 min read· 2 days ago

Researchers from MIT and Microsoft have developed a system that streamlines the design of agentic workflows by automatically optimizing their implementation to reduce the computational resources, energy, and cost they require. Developers describe the desired workflow in plain language without specifying every detail in advance, and the system adjusts configurations on the fly based on user priorities. The approach has been shown to significantly cut energy requirements and costs compared to traditional approaches without hampering performance. For engineers building AI systems, this points to a way of designing and deploying more efficient agentic workflows.

Accelerating Transformers Fine-Tuning with NVIDIA NeMo AutoModel
Hugging Face Blog· 5 min read· 3 days ago

No summary is available for this article. The title points to faster Transformers fine-tuning with NVIDIA NeMo AutoModel, but no specific details are given.

Graviton5’s improved design increases speed and energy efficiency — beyond Moore’s law
Amazon Science· 5 min read· Jun 10, 2026

Graviton5 delivers a 25% performance improvement for general-purpose and agentic AI workloads through its chiplet architecture, custom die-to-die connectivity, and support for DDR5-8800 memory and PCIe Gen6 interconnects, a gain the article describes as going beyond Moore's law. The design enables faster and more energy-efficient processing for AI workloads. It is particularly beneficial for large-scale AI applications, where every percentage point of performance gain can significantly affect overall system efficiency.

New chip could help tiny robots traverse complex environments
MIT News AI· 6 min read· 4 days ago

MIT researchers have developed a new chip that enables tiny robots to construct detailed 3D maps of their environments in real time using only about 6 milliwatts of power. The chip, called Gleanmer, combines an efficient mapping algorithm with specialized hardware to minimize memory and power consumption. This allows small autonomous robots to plan collision-free paths and navigate complex environments. For engineers building AI systems, the work points toward more power-efficient navigation systems for robots and other devices.

Ornn raises $33M to help companies buy and sell AI compute as a commodity like oil
SiliconANGLE AI· 2 days ago

Ornn AI Inc. has raised $33 million in seed funding to develop a marketplace for computing power, aiming to commodify AI compute like oil. The funding round was co-led by Galaxy Ventures and Andreessen Horowitz's crypto-focused fund, with participation from other investors. This investment will help Ornn build out its platform, enabling companies to buy and sell AI compute resources more efficiently. The practical implication for engineers building AI systems is that they may soon have access to a more fluid and dynamic market for computing resources, potentially reducing costs and increasing scalability.

NVIDIA Powers Over 400 of the World’s 500 Fastest Supercomputers
NVIDIA Blog· 4 min read· 4 days ago

NVIDIA technologies power over 400 of the world's 500 fastest supercomputers, with 81% of the TOP500 and 90% of new systems on the list utilizing NVIDIA technology. The top eight systems on the Green500 run on NVIDIA GPUs, with the No. 1 system, KAIROS, using a single NVIDIA Grace Hopper Superchip to achieve 73.3 gigaflops per watt. NVIDIA's momentum in new deployments is driven by a preference for machines built for AI, simulation, and science, with NVIDIA systems delivering more than 2x the AI training and nearly 3x the AI inference throughput of every other platform combined. This trend has significant implications for engineers building AI systems, as accelerated computing becomes the foundation for systems tackling demanding workloads.

A better way to model the behavior of metal alloys
MIT News AI· 6 min read· Jun 19, 2026

A team of MIT researchers has developed a new approach to accurately model the behavior of metals, using machine-learning models that can simulate complex chemical arrangements in chemically disordered materials. The approach involves building training datasets that capture the diversity of atomic environments in these materials, allowing for faster and more accurate simulations. This breakthrough has the potential to accelerate materials innovation, particularly in fields such as aerospace, energy, and computing. The practical implication for engineers is that they can now use this approach to develop new materials and predict their properties, reducing the need for costly and time-consuming experimentation.

Optimize model training on Amazon SageMaker AI with NVIDIA Blackwell
AWS ML Blog· 13 min read· 2 days ago

The introduction of NVIDIA Blackwell GPUs on Amazon SageMaker AI enables the optimization of model training for large AI models by reducing constraints such as batch sizes limited by GPU memory and sequence lengths cut short to avoid out-of-memory errors. With Blackwell's expanded memory and new precision formats, users can train models with larger batch sizes, longer sequence lengths, and reduced model sharding, resulting in improved throughput and reduced communication overhead. The use of PyTorch Fully Sharded Data Parallel (FSDP) and strategic application of activation checkpointing can further optimize training configurations. This leads to faster iteration cycles, less networking overhead, and lower infrastructure costs. By properly configuring Blackwell training jobs, users can process larger batch sizes without aggressive sharding and achieve better results for long-range depende

Memory maker SK hynix files for $29B US IPO amid AI demand
SiliconANGLE AI· 2 days ago

SK hynix Inc., the world's largest supplier of HBM memory, has filed for a $29.4 billion US IPO on the Nasdaq stock exchange, aiming to sell up to 17.79 million shares. This move is driven by the increasing demand for memory chips fueled by the growth of artificial intelligence. The IPO is expected to be the second-largest on record, following the recent listing of SpaceX Corp. The practical implication for engineers building AI systems is the potential increased availability of high-performance memory solutions to support demanding AI workloads.

Could AI tell you where you left your keys?
MIT News AI· 5 min read· Jun 17, 2026

MIT researchers have developed a long-term memory framework called Describe Anything, Anywhere, Anytime, at Any Moment (DAAAM) that enables robots to rapidly form and recall a detailed mental model of complicated, large-scale environments. The framework combines advanced map representations with rich descriptions of the environment, so a robot can quickly query this memory to answer complex plain-language questions about its surroundings. DAAAM runs fast enough for a mobile robot to use in real time and has potential applications in robotics, augmented reality systems, and wayfinding. The advance could allow robots to work side by side with humans and interact better with them by reasoning about time and space in the same way humans do.

Qualcomm shares jump 14% on Modular acquisition, guidance upgrade
SiliconANGLE AI· 3 days ago

Qualcomm's stock jumped 14% after announcing the acquisition of Modular Inc., an inference software startup, and unveiling plans for two upcoming AI chips. The company also raised its fiscal 2029 guidance, citing increased confidence in its artificial intelligence roadmap. This move is expected to bolster Qualcomm's position in the AI market, particularly in the area of inference software. The practical implication for engineers building AI systems is the potential for improved inference capabilities and increased investment in AI research and development.

NVIDIA Vera CPU Opens the Way for Agentic Scientific AI at Los Alamos National Laboratory
NVIDIA Blog· 4 min read· 5 days ago

NVIDIA has integrated its Vera CPU into Los Alamos National Laboratory's (LANL) new supercomputers, leveraging HPE's Cray Supercomputing GX5000 architecture to accelerate scientific discovery and unlock agentic AI for science. The Vera CPU is designed to provide a significant boost in performance and efficiency, enabling researchers to tackle complex scientific problems. This integration marks a significant step towards the development of autonomous, agentic AI systems that can drive scientific innovation. The resulting supercomputers will be capable of processing vast amounts of data and executing complex tasks, driving breakthroughs in fields such as physics, chemistry, and materials science.

From Materials Simulation to Experimental Astronomy, New NVIDIA AI Software Unlocks Scientific Discoveries
NVIDIA Blog· 5 min read· 5 days ago

NVIDIA has introduced the DAQIRI library and ALCHEMI NIM microservices to accelerate AI for scientific discovery in fields such as chemistry, materials science, and astronomy. The new software leverages NVIDIA's cuPhoton reference code and can be used for tasks such as materials simulation and experimental astronomy. The DAQIRI library and ALCHEMI NIM microservices are designed to be highly scalable and to integrate into large-scale scientific simulations, although adoption may still be limited by the complexity of fitting them into existing research pipelines.

Beyond the Straight Line: Choosing Between OLS, Interaction Terms, and Tweedie Regression
Towards Data Science· 2 days ago

The choice between ordinary least squares (OLS) regression, interaction terms, and Tweedie regression depends on how each method handles zeros and extreme outliers in the data. No specific numbers or benchmark results are given. For engineers building AI systems, the practical implication is to evaluate the characteristics of their data, particularly zeros and outliers, before selecting a regression method, so that predictions are more accurate.

Startup’s nuclear-inspired cooling system could make data centers more sustainable
MIT News AI· 6 min read· Jun 10, 2026

Ferveret, a startup founded by Reza Azizian and Matteo Bucci, is developing a nuclear-inspired cooling system for data centers that uses a specialized liquid to absorb heat, reducing electricity usage and water consumption. The company's Adaptive Phase Cooling (APC) solution has shown a 15% improvement in computational power efficiency compared to state-of-the-art liquid cooling solutions. By combining APC with a power control system, Ferveret claims to enable data centers to generate 35% more tokens from their AI models with the same amount of power. This innovation has the potential to make data centers more sustainable and efficient. The practical implication for engineers building AI systems is that they can potentially reduce their energy consumption and increase their computational power efficiency by adopting Ferveret's cooling system.

OpenAI, Broadcom debut custom Jalapeño chip for AI inference
SiliconANGLE AI· 3 days ago

OpenAI Group PBC and Broadcom Inc. have jointly developed Jalapeño, a custom AI inference chip designed to power large language models. Broadcom contributed its expertise in custom silicon design, which includes work on Google's TPU line. The chip is expected to improve the performance and efficiency of large language model inference, although the article provides no specific performance metrics. Custom silicon could enable faster and more efficient inference, but it may also introduce compatibility and scalability challenges.

Eco Wave Power Turns Waves Into Watts With NVIDIA AI Infrastructure and Digital Twins
NVIDIA Blog· 4 min read· 5 days ago

Eco Wave Power, a Swedish company, has successfully harnessed wave energy to generate electricity using NVIDIA AI infrastructure and digital twins, achieving a power output of 1.5 MW. This breakthrough demonstrates the potential of AI-driven optimization in renewable energy production. The integration of digital twins enabled real-time monitoring and simulation of wave patterns, allowing for more efficient energy harvesting. This innovation has significant implications for the future of sustainable energy production.

3 Agents. 3 LLMs. 1 Aging GPU: Engineering Parallel Inference on Bare Metal
Towards Data Science· 2 days ago

Engineers can now run three different large language models (LLMs) on a single 8GB GPU, bypassing the 8GB VRAM limit, by utilizing C++ layer multiplexing and admission control for parallel inference on bare metal. This approach enables the deployment of multiple models on aging hardware, reducing the need for expensive upgrades. The practical implication for engineers building AI systems is the ability to optimize resource utilization and extend the lifespan of existing infrastructure. By leveraging this technique, developers can efficiently manage model inference on limited hardware resources.
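
The admission-control idea mentioned above can be illustrated with a short sketch. This is not the article's C++ implementation; it is a hypothetical Python model that assumes each request declares its memory need up front and that requests are served first-in, first-out against a fixed VRAM budget. The class name, method names, and memory figures are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Tuple


@dataclass
class AdmissionController:
    """Hypothetical gate that admits work only while it fits a memory budget."""

    budget_mb: int
    in_use_mb: int = 0
    waiting: Deque[Tuple[str, int]] = field(default_factory=deque)

    def try_admit(self, request_id: str, need_mb: int) -> bool:
        """Admit the request now if it fits; otherwise queue it and return False."""
        if need_mb > self.budget_mb:
            raise ValueError("request can never fit within the budget")
        # Queue behind earlier waiters so admission order stays first-in, first-out.
        if not self.waiting and self.in_use_mb + need_mb <= self.budget_mb:
            self.in_use_mb += need_mb
            return True
        self.waiting.append((request_id, need_mb))
        return False

    def release(self, freed_mb: int) -> List[str]:
        """Free memory, then admit queued requests in order while they fit."""
        self.in_use_mb = max(0, self.in_use_mb - freed_mb)
        admitted: List[str] = []
        while self.waiting and self.in_use_mb + self.waiting[0][1] <= self.budget_mb:
            request_id, need_mb = self.waiting.popleft()
            self.in_use_mb += need_mb
            admitted.append(request_id)
        return admitted


if __name__ == "__main__":
    # Assumed figures: an 8 GB card and three models with made-up footprints.
    gate = AdmissionController(budget_mb=8192)
    print(gate.try_admit("model-a", 4000))
    print(gate.try_admit("model-b", 3000))
    print(gate.try_admit("model-c", 2000))
    print(gate.release(3000))
```

The design choice in this sketch is to queue rather than reject: a request that does not fit waits until an earlier one releases memory, which keeps total usage under the budget at the cost of added latency.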

Orderful nabs $35M to streamline supply chain data management
SiliconANGLE AI· 4 days ago

Orderful Inc. has raised $35 million in Series C funding to streamline supply chain data management using artificial intelligence. The funding round was led by Koch Disruptive Technologies and brings Orderful's total outside funding to $85 million. This investment aims to improve supply chain efficiency by leveraging AI. The practical implication for engineers building AI systems is the potential to apply AI to complex logistics and supply chain management problems.

Dell/AMD partnership: Three insights you may have missed from theCUBE’s coverage of Dell Technologies World
SiliconANGLE AI· 4 days ago

The Dell/AMD partnership focuses on supporting the AI factory in enterprise IT, with an emphasis on hybrid architectures that span on-premises, cloud, and edge workloads, which the coverage presents as key to production-scale deployment. For engineers building AI systems, the takeaway is to consider hybrid architectures for scalable, flexible AI deployments.

Nvidia and DDN target the economics of AI infrastructure
SiliconANGLE AI· 4 days ago

Nvidia and DDN have introduced a joint solution to address the economic challenges of AI infrastructure, leveraging their combined expertise in data and compute to optimize performance and reduce costs. Their partnership aims to enable enterprises to extract maximum value from their AI investments by streamlining data movement and processing. This joint solution is designed to handle massive amounts of data and scale with growing AI workloads, making it an attractive option for large-scale AI deployments. By combining Nvidia's high-performance GPUs with DDN's storage solutions, the partnership has achieved significant performance improvements and cost reductions, setting a new standard for AI infrastructure economics.

Embed the world: Multimodal AI for searchable aerial imagery at scale
AWS ML Blog· 25 min read· 5 days ago

The AWS Generative AI Innovation Center (GenAIIC) partnered with Vexcel to develop a multimodal AI system for searchable aerial imagery at scale, leveraging Amazon Bedrock and Amazon OpenSearch Serverless. The system uses multimodal embeddings, large language model (LLM) captioning, and vector search to enable natural-language-searchable knowledge bases. The evaluation methodology, built on OpenStreetMap ground truth, compared embedding models, fusion strategies, captioning, and search methods, with Amazon Nova Multimodal Embeddings delivering the highest F1 scores. This approach removes the per-feature training step, allowing for faster and more efficient semantic search. The practical implication for engineers building AI systems is the potential to apply this architecture to other domains, enabling faster and more efficient search capabilities.

Hotter Than a Hot Tub: The 45°C Breakthrough to Cool AI’s Biggest Machines
NVIDIA Blog· 7 min read· 5 days ago

NVIDIA's newest AI servers can run their cooling liquid at up to 45 degrees Celsius, making them more energy efficient and achieving 100% liquid cooling with no fans in the system. The Rubin generation of NVIDIA AI infrastructure is the first to achieve this, and it is outlined in the NVIDIA DSX AI factory reference design. This liquid cooling methodology enables data centers to reduce cooling energy consumption, making a significant difference in overall data center energy use. The practical implication for engineers building AI systems is that they can design more efficient and sustainable data centers using liquid-cooled infrastructure.

France Advances Europe’s AI Future With NVIDIA Technologies
NVIDIA Blog· 6 min read· Jun 18, 2026

France has successfully deployed AI infrastructure, leveraging NVIDIA technologies to establish national compute capacity and enable the development of open frontier models and industrial platforms, with AI agents now running in production. This marks a significant milestone in advancing Europe's AI future. The deployment combines NVIDIA's AI expertise with France's strategic investment, fostering innovation and driving economic growth. This achievement serves as a model for other European countries to follow, demonstrating the potential of collaborative efforts between governments and tech giants.
