
AI chips are getting hotter. A microfluidics breakthrough goes straight to the silicon to cool up to three times better.

By Catherine Bolgar

AI is hot – literally.

The chips that datacenters use to run the latest AI breakthroughs generate much more heat than previous generations of silicon. Anybody whose phone or laptop has overheated knows that electronics don’t like to get hot. In the face of rising demand for AI and newer chip designs, the current cooling technology will put a ceiling on progress in just a few years.

To help address this problem, Microsoft has successfully tested a new cooling system that removed heat up to three times better than cold plates, an advanced cooling technology commonly used today. It uses microfluidics, an approach that brings liquid coolant directly inside the silicon – where the heat is. Tiny channels are etched directly on the back of the silicon chip, creating grooves that allow cooling liquid to flow directly onto the chip and more efficiently remove heat. The team also used AI to identify the unique heat signatures on a chip and direct the coolant with more precision. 

Researchers say microfluidics could boost efficiency and improve sustainability for next-generation AI chips. Most GPUs in today’s datacenters are cooled with cold plates, which are separated from the heat source by several layers of material that limit how much heat they can remove.

As each new generation of AI chips becomes more powerful, they generate more heat. In as soon as five years, “if you’re still relying heavily on traditional cold plate technology, you’re stuck,” said Sashi Majety, senior technical program manager for Cloud Operations and Innovation at Microsoft.

Today, Microsoft announced that it has successfully developed an in-chip microfluidic cooling system that can effectively cool a server running core services for a simulated Teams meeting.

A microfluidics computer chip showing microchannel grooves
Microsoft has demonstrated a new way to cool silicon chips using microfluidics. Channels are etched in the silicon that allow cooling liquid to flow directly onto the chip and more efficiently remove heat. The team also used AI to identify the unique heat signatures on a chip and direct the coolant with more precision. Photo by Dan DeLong for Microsoft.

“Microfluidics would allow for more power-dense designs that will enable more features that customers care about and give better performance in a smaller amount of space,” said Judy Priest, corporate vice president and chief technical officer of Cloud Operations and Innovation at Microsoft.

“But we needed to prove the technology and the design worked, and then the very next thing I wanted to do was test reliability,” Priest said.

The company’s lab-scale tests showed microfluidics performed up to three times better than cold plates at removing heat, depending on workloads and configurations involved. Microfluidics also reduced the maximum temperature rise of the silicon inside a GPU by 65 percent, though this will vary by the type of chip. The team expects the advanced cooling technology would also improve power usage effectiveness, a key metric for measuring how energy efficient a datacenter is, and reduce operational costs.
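Power usage effectiveness (PUE) is simply total facility energy divided by the energy delivered to the IT equipment, so cutting the cooling share moves the metric toward its ideal value of 1.0. A minimal sketch, with illustrative numbers that are assumptions rather than Microsoft figures:

```python
# Power usage effectiveness (PUE) = total facility power / IT equipment power.
# All power figures below are illustrative assumptions, not measured values.

def pue(it_kw: float, cooling_kw: float, other_overhead_kw: float) -> float:
    """Ratio of total facility power to IT power; 1.0 is the theoretical ideal."""
    return (it_kw + cooling_kw + other_overhead_kw) / it_kw

baseline = pue(it_kw=1000, cooling_kw=300, other_overhead_kw=100)  # conventional cooling
improved = pue(it_kw=1000, cooling_kw=150, other_overhead_kw=100)  # halved cooling power

print(f"baseline PUE: {baseline:.2f}")  # 1.40
print(f"improved PUE: {improved:.2f}")  # 1.25
```

The same IT load is served in both cases; only the cooling overhead changes, which is why more efficient heat removal shows up directly in this metric.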

Using AI to mimic nature

Microfluidics is not a new concept, but getting it to work has been a challenge across the industry. “Systems thinking is crucial when developing a technology like microfluidics. You need to understand systems interactions across silicon, coolant, server and the datacenter to make the most of it,” said Husam Alissa, director of systems technology in Cloud Operations and Innovation at Microsoft.

Just getting the grooves right is hard. The microchannels are about the width of a human hair, meaning there’s no margin for error. As part of the prototyping effort, Microsoft collaborated with Swiss startup Corintis to use AI to help optimize a bio-inspired design that cools chips’ hot spots more efficiently than the straight up-and-down channels the team also tested. The bio-design resembles the veins in a leaf or a butterfly wing – nature has proven adept at finding the most efficient routes to distribute what’s needed.

Microfluidics requires more than innovative channel design; it is a complex engineering challenge.

The channels must be deep enough to circulate adequate cooling liquid without clogging, yet not so deep that they weaken the silicon and risk cracking it. The team produced four design iterations in the past year alone.

Microfluidics also required designing a leak-proof package for the chip, finding the best coolant formula, testing different etching methods and developing a step-by-step process for integrating the etching into chip manufacturing.

The breakthrough is just one example of how Microsoft is investing and innovating in infrastructure to meet demand for AI services and capabilities. For example, the company plans to spend over $30 billion on capital expenditures in the current quarter.

Those investments include developing its own family of Cobalt and Maia chips designed specifically to run Microsoft and customer workloads more efficiently. Since deploying its Cobalt 100 chip, for instance, Microsoft and its customers have benefited from its energy-efficient compute power, scalability and performance.

Chips are just one piece of the puzzle, though, since the silicon works within a complex system of boards, racks and servers within a datacenter. Microsoft’s systems approach means fine-tuning every part of this stack to work together and maximize performance and efficiency. An important part of that is developing next-generation cooling techniques like microfluidics.

As a next step, Microsoft continues to investigate how microfluidic cooling can be incorporated into future generations of its first-party chips. It will also continue to work with fabrication and silicon partners to bring microfluidics into production across its datacenters, the company said.

“Hardware is the foundation of our services,” said Jim Kleewein, technical fellow, Microsoft 365 Core Management. “We all have a vested interest in that foundation – how reliable it is, how cost effective, how fast, how consistent the behavior we can get from it, and how sustainable, to name just a few. Microfluidics improves each of those: cost, reliability, speed, consistency of behavior, sustainability.”

Advantages of microfluidics

A simple Microsoft Teams call, for instance, illustrates the advantages microfluidic cooling could offer. Teams isn’t a single service but a set of about 300 different services that cooperate seamlessly. One connects a customer to a meeting, another hosts the meeting, another stores the chat, another merges audio streams so that when multiple people talk everyone is heard, another records, another transcribes.

“Each service has different characteristics and stresses different parts of the server,” Kleewein said. “The more heavily utilized a server is, the more heat it generates, which makes sense.”

A microfluidics computer chip housed in a computer server
This microfluidics chip developed by Microsoft is covered and has tubing attached so the coolant can flow safely. Photo by Dan DeLong for Microsoft.

For example, most Teams calls tend to start on the hour or the half-hour. The call controller gets very busy from about five minutes before to three minutes after those times and isn’t very busy the rest of the time. There are two ways to handle peaks in demand: install a lot of expensive extra capacity that sits idle most of the time, or run the servers harder, which is called overclocking. Because overclocking makes chips even hotter, it can’t be pushed too far without damaging them.
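The spiky call-controller load described above can be sketched as a simple schedule check. The window boundaries below just restate the “five minutes before to three minutes after” figure from the article; the function itself is an illustrative sketch, not Microsoft’s scheduling logic:

```python
# Sketch of the spiky call-controller load: Teams calls cluster around the
# hour and half-hour, so the controller is busy from roughly five minutes
# before to three minutes after each of those marks.

def in_peak_window(minute_of_hour: int) -> bool:
    """True if the given minute falls in a busy window around :00 or :30."""
    for mark in (0, 30):
        # Minutes past the mark, wrapping around the hour (:57 is 3 before :00).
        delta = (minute_of_hour - mark) % 60
        if delta <= 3 or delta >= 55:
            return True
    return False

print(in_peak_window(57))  # True  (three minutes before the hour)
print(in_peak_window(15))  # False (quiet stretch between peaks)
```

A scheduler built on a check like this could reserve overclocking headroom only for the peak minutes, which is the cost-versus-capacity trade-off Kleewein describes.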

“Whenever we have spiky workloads, we want to be able to overclock. Microfluidics would allow us to overclock without worrying about melting the chip down because it’s a more efficient cooler of the chip,” Kleewein said. “There are advantages in cost and reliability. And speed, because we can overclock.”

How cooling fits into the bigger picture

Microfluidics is part of a bigger Microsoft initiative to advance next-generation cooling techniques and optimize every part of the cloud stack. Traditionally, datacenters have been cooled with air blown by large fans, but liquids conduct heat much more efficiently than air does.

One form of liquid cooling Microsoft has already deployed in its datacenters is cold plates. The plates sit on top of the chips, with cold liquid coming in, circulating through channels inside the plates to pick up heat from the chips below, and hot liquid going out to be cooled down.

Chips are packaged with layers of materials to help spread their heat away from hot spots and to protect them. But these materials also act like blankets, limiting the performance of cold plates by holding in heat and keeping out the cold. Future generations of chips that work well for AI are expected to be even more powerful – and to get too hot to be cooled by cold plates.

Cooling chips directly through microfluidic channels is far more efficient – not just at removing heat but also at running the overall system. With those insulating layers out of the way and coolant directly touching the hot silicon, the coolant doesn’t need to be anywhere near as cold to do its job. That saves the energy that would otherwise go to chilling the coolant, while still outperforming current cold plates. Microfluidics technology also enables higher-quality use of waste heat.
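The energy argument can be made concrete with the standard steady-state relation T_junction = T_coolant + R_th × P, where R_th is the thermal resistance between the silicon and the coolant: lowering R_th lets the coolant run warmer for the same chip temperature. The resistance and power values below are illustrative assumptions, not measured figures for Microsoft’s hardware:

```python
# Steady-state lumped model: T_junction = T_coolant + R_th * P.
# A lower silicon-to-coolant thermal resistance (R_th, in K/W) means the
# coolant can run warmer while holding the chip at the same temperature limit.
# Resistance and power values are illustrative assumptions.

def max_coolant_temp_c(t_junction_limit_c: float, r_th_k_per_w: float, power_w: float) -> float:
    """Warmest coolant temperature that still keeps the junction at its limit."""
    return t_junction_limit_c - r_th_k_per_w * power_w

CHIP_LIMIT_C = 90.0
POWER_W = 700.0  # roughly GPU-class power draw (assumption)

cold_plate   = max_coolant_temp_c(CHIP_LIMIT_C, r_th_k_per_w=0.08, power_w=POWER_W)
microfluidic = max_coolant_temp_c(CHIP_LIMIT_C, r_th_k_per_w=0.04, power_w=POWER_W)

print(f"cold plate coolant must stay below {cold_plate:.0f} C")      # 34 C
print(f"microfluidic coolant can run up to {microfluidic:.0f} C")    # 62 C
```

Warmer coolant is cheaper to produce, which is where the energy savings in the paragraph above come from; it also leaves the waste heat at a more useful temperature.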

Microsoft also aims to optimize datacenter operations through software and other approaches. “If microfluidic cooling can use less power to cool the datacenters, that will put less stress on energy grids in nearby communities,” said Ricardo Bianchini, Microsoft technical fellow and corporate vice president for Azure specializing in compute efficiency.

Heat also puts limits on datacenter design. One benefit of a datacenter for computing is that servers are physically close together. Distance slows communication between servers – something called latency. But today’s servers can be packed together only so tightly before heat becomes a problem. Microfluidics would allow datacenters to increase the density of servers. That means datacenters could potentially increase compute without requiring additional buildings.
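The latency point can be quantified: signals in fiber or copper propagate at roughly two-thirds the speed of light, about 5 nanoseconds per meter, so packing servers closer together shaves a fixed amount off every round trip. A quick sketch with illustrative distances:

```python
# Propagation delay between servers. Signals in fiber or copper travel at
# roughly 2/3 the speed of light in vacuum, i.e. about 5 ns per meter, so
# halving the physical distance halves this component of the latency.
# Distances are illustrative.

SPEED_OF_LIGHT_M_PER_S = 3.0e8
SIGNAL_SPEED_FRACTION = 2 / 3  # typical for fiber and copper interconnects

def propagation_delay_ns(distance_m: float) -> float:
    """One-way propagation delay in nanoseconds over the given distance."""
    return distance_m / (SPEED_OF_LIGHT_M_PER_S * SIGNAL_SPEED_FRACTION) * 1e9

print(f"{propagation_delay_ns(100):.0f} ns across a 100 m hall")
print(f"{propagation_delay_ns(10):.0f} ns across a dense 10 m row")
```

Nanoseconds sound small, but they accumulate across the many server-to-server hops in a distributed AI workload, which is why density matters once cooling no longer forces servers apart.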

The future of chip innovation

Microfluidics also has the potential to open the door to completely new chip architectures, such as 3D chips. Just as putting servers close together reduces latency, stacking chips reduces it even more. This kind of 3D architecture is challenging to build because of the heat it generates.

However, microfluidics brings coolant extremely close to where power is consumed so “we might flow liquid through the chip,” as would be the case with 3D designs, Bianchini said. That would involve a different microfluidics design using cylindrical pins between the stacked chips, a bit like pillars in a multilevel parking garage, with fluid flowing around them.

“Anytime we can do things more efficiently and simplify, this opens up the opportunity for new innovation where we could look at new chip architectures,” Priest said.

Removing the limit set by heat could also allow for more chips in a datacenter rack or more cores on a chip, which would improve speed and allow for smaller but more powerful datacenters.

By demonstrating how new cooling techniques such as microfluidics can be made to work, Microsoft hopes to help pave the way for more efficient and sustainable next-generation chips across the industry, the company said.

“We want microfluidics to become something everybody does, not just something we do,” Kleewein said. “The more people that adopt it the better, the faster the technology is going to develop, the better it’s going to be for us, for our customers, for everybody.”
