Microsoft MAI Model Specification Sheet

Overview

Developer: Microsoft AI (MAI)
Purpose: Build Microsoft’s own suite of foundation AI models to power Copilot and other products
Strategy: Reduce reliance on external models by developing in-house models for text, speech, and image AI tasks
Leadership: Headed by Mustafa Suleyman, Microsoft AI chief

Core MAI Models

MAI-1-preview: First fully in-house foundation text model undergoing testing and early Copilot integration
MAI-Voice-1: High-fidelity, expressive speech generation for Copilot features and real-time voice applications
MAI-Image-1: Microsoft’s first in-house text-to-image AI model now available in Bing Image Creator and Copilot

Technical Highlights

Architecture: Transformer-based models trained in-house on optimized datasets; MAI-1-preview uses a mixture-of-experts (MoE) design
Training Infrastructure: MAI-1-preview was pre-trained and post-trained on a cluster of roughly 15,000 NVIDIA H100 GPUs; Microsoft's next-generation NVIDIA GB200 cluster is now operational for future training
Efficiency: Microsoft reports that MAI-Voice-1 can generate a full minute of audio in under a second on a single GPU, making it suitable for real-time use
Photorealistic Outputs: MAI-Image-1 is designed for photorealistic imagery, with particular strength in lighting effects, landscapes, and food
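The mixture-of-experts design mentioned above means that each token activates only a few "expert" sub-networks chosen by a learned router, rather than the whole model. Microsoft has not published MAI's routing details, so the following is a generic top-k gating sketch in plain Python, not MAI's implementation. The function names, the toy scaling experts, the router weights, and k=2 are all hypothetical illustration choices.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_moe(x: Vector,
              gate_weights: Sequence[Vector],
              experts: Sequence[Callable[[Vector], Vector]],
              k: int = 2) -> Vector:
    """Route one token vector x through the k highest-scoring experts.

    gate_weights[i] is a hypothetical router row for expert i; each expert
    maps a vector to a vector of the same length.
    """
    # Router logits: one dot product per expert.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    # Keep only the k best-scoring experts (sparse activation).
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize the gate probabilities over the selected experts only.
    probs = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for p, i in zip(probs, top):
        y = experts[i](x)  # only the selected experts are evaluated
        out = [o + p * yi for o, yi in zip(out, y)]
    return out


# Toy usage: four hypothetical experts that scale the input by 1, 2, 3, 4.
experts = [lambda v, s=s: [s * a for a in v] for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
result = top_k_moe([1.0, 0.0], gate_weights, experts, k=2)
```

With these toy weights the router picks experts 3 and 2, so only half of the experts do any work for this token; that sparsity is the general reason MoE models can grow total parameter count without a proportional increase in per-token compute.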

Performance Indicators

Benchmark Testing: MAI-1-preview has been made available for public community testing and ranking on LMArena
Speech Quality: MAI-Voice-1 offers highly expressive, low-latency speech generation for voice use cases
Image Quality: MAI-Image-1 debuted among the top 10 text-to-image models on LMArena
Real-World Use: Integrated directly into consumer and productivity tools for practical tasks
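Low latency for speech is commonly expressed as a real-time factor (RTF): synthesis time divided by the duration of the audio produced, where a value below 1.0 means faster than real time. The sketch below, with a hypothetical helper name, turns Microsoft's stated "one minute of audio in under a second on a single GPU" into an RTF bound. The GPU model and workload behind that claim are not specified here, so treat the number as indicative only.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Ratio of compute time to audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Upper bound implied by "one minute of audio in under one second".
rtf_bound = real_time_factor(1.0, 60.0)
print(f"RTF < {rtf_bound:.4f}")  # prints "RTF < 0.0167"
```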

Availability & Integration

MAI models are being integrated into Microsoft Copilot experiences across Bing, Office, and other products
MAI-Image-1 is available in Bing Image Creator and Copilot Audio Expressions as of late 2025
MAI-Voice-1 powers expressive voice features in Copilot Daily and Podcasts
MAI-1-preview remains in testing, with a phased rollout to Copilot for certain text use cases

Use Cases

Text AI: Conversational assistants, productivity workflows, document generation
Voice Generation: Natural speech synthesis for interactive Copilot voice features
Image Creation: Creative content generation, photorealistic imagery for documents and creative tools
Copilot Integration: Enhancing Microsoft 365 and Bing experiences with proprietary AI backends

Technical Goals

Develop a family of purpose-built AI models that serve specific user needs efficiently
Ensure high performance at lower operational cost compared with larger models
Maintain Microsoft’s strategic balance between in-house models and partner technologies
Continue evolving the MAI portfolio with future specialized models

Limitations & Notes

MAI models are newer and may lag some frontier models in absolute benchmark scores
Not all MAI models are publicly available or offered through broad API access yet
Microsoft continues to use and support OpenAI and other models alongside MAI

Recent Highlights

MAI-Voice-1 and MAI-1-preview announced and opened for testing in August 2025
MAI-Image-1 launched into Bing Image Creator and Copilot late 2025
Microsoft establishing a more independent AI roadmap with in-house foundational technologies