The focus of this new AI accelerator is inference, the production deployment of AI models in applications. Its architecture combines high compute performance with a newly designed memory system and a scalable networking architecture. With these new chips, Microsoft is equipping its global cloud infrastructure for the next generation of AI workloads. Maia 200 is specifically designed for compute-intensive AI inference and integrates seamlessly into Microsoft Azure.
Key highlights at a glance
- Maia 200 delivers over 10 petaFLOPS at 4‑bit precision (FP4) and more than 5 petaFLOPS at 8‑bit precision (FP8), based on cutting‑edge 3‑nanometer technology. The networking design scales over standard Ethernet to clusters of up to 6,144 AI accelerators.
- This means that a single Maia 200 can effortlessly run today’s largest AI models while still leaving ample headroom for even larger models in the future.
- Maia 200 will initially be deployed in U.S. regions of the Microsoft Azure cloud and will be used for AI models from the Microsoft Superintelligence team. It will accelerate projects such as Azure AI Foundry (Microsoft’s integrated and interoperable AI platform for developing AI applications and agents) and support Microsoft 365 Copilot.
- Microsoft’s integrated model, which combines chips, AI models, and applications, creates a unique competitive advantage. Because Microsoft operates some of the world’s most demanding AI workloads, it can tightly align chip design, model development, and application-level optimization.
- Alongside the introduction of Maia 200, Microsoft is previewing the Maia Software Development Kit (SDK). The SDK supports common AI frameworks and helps developers optimize their models specifically for deployment on Maia systems. The new Maia SDK includes a Triton compiler, PyTorch support, NPL programming, as well as a simulator and cost calculator; a brief illustrative sketch of the Triton programming model follows below.
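To give a sense of the Triton programming model that an SDK-provided Triton compiler consumes, here is a minimal, generic vector-addition kernel in Python. This is not Maia-specific code: the CUDA device used in the demo launch and the tensor sizes are assumptions standing in for whatever backend the Maia SDK ultimately registers.

```python
# Minimal, generic Triton kernel sketch (vector addition).
# Nothing here is Maia-specific; the "cuda" device below is an assumption
# used only so the sketch runs on commodity hardware.
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against out-of-bounds accesses
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n_elements = out.numel()
    # Launch grid: one program per BLOCK_SIZE-sized chunk of the input.
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
    return out


if __name__ == "__main__":
    x = torch.randn(4096, device="cuda")
    y = torch.randn(4096, device="cuda")
    print(torch.allclose(add(x, y), x + y))
```

Because Triton kernels are written against a hardware-neutral tile abstraction rather than a specific GPU ISA, a vendor compiler can in principle lower the same kernel source to its own accelerator, which is presumably the role the Triton compiler plays within the Maia SDK.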
Read the full English blog post by Scott Guthrie, Executive Vice President – Cloud and AI at Microsoft, on the introduction of Maia 200 here.