Overview
- Developer: Meta AI (formerly Facebook AI)
- Model Family: Llama 4 Series
- Release Date: April 5, 2025
- Nature: Multimodal large language models with industry-leading context capabilities
- Primary Variants: Llama 4 Scout, Llama 4 Maverick, and Llama 4 Behemoth (preview)
Core Technologies
- Architecture: Auto-regressive Transformer with Mixture-of-Experts (MoE)
- Multimodal Support: Native handling of combined text and image inputs (early-fusion multimodality)
- Mixture-of-Experts: Routes tokens through subsets of expert modules for efficiency and scalability
- Training Data: Trained on a large multilingual corpus across many domains
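The MoE routing described above can be sketched in a few lines. This is a generic top-k gating example in plain NumPy, not Llama 4's actual router (Llama 4 reportedly pairs a shared expert with routed experts); the dimensions, expert count, and linear experts are illustrative assumptions.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route one token vector x through the top-k of n experts.

    x: (d,) token hidden state
    gate_w: (d, n_experts) router weight matrix (hypothetical)
    experts: list of callables, each mapping (d,) -> (d,)
    """
    logits = x @ gate_w                       # router score per expert
    topk = np.argsort(logits)[-k:]            # indices of the k highest-scoring experts
    # softmax over only the selected experts' logits
    w = np.exp(logits[topk] - logits[topk].max())
    w /= w.sum()
    # weighted sum of the chosen experts' outputs; the other experts are skipped entirely,
    # which is the source of the MoE efficiency win (few active params per token)
    return sum(wi * experts[i](x) for wi, i in zip(w, topk))

rng = np.random.default_rng(0)
d, n = 8, 16                                  # toy sizes, not Llama 4's
gate_w = rng.normal(size=(d, n))
experts = [(lambda W: (lambda x: x @ W))(rng.normal(size=(d, d))) for _ in range(n)]
y = moe_forward(rng.normal(size=d), gate_w, experts, k=2)
print(y.shape)  # (8,)
```

Only k of the n expert matrices are ever multiplied per token, which is why a model can have ~400B total parameters while activating only ~17B per token.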
Model Variants & Specs
- Llama 4 Scout:
- 17B active parameters with 16 experts
- ~109B total parameters
- Industry-leading context window — up to 10 million tokens
- Designed to fit and run on a single NVIDIA H100 GPU (with Int4 quantization)
- Fast and efficient for document-scale applications
- Llama 4 Maverick:
- 17B active parameters with 128 experts
- ~400B total parameters
- Context window up to ~1 million tokens
- General-purpose high-performance multimodal model
- Competitive with top-tier models on reasoning, coding, and multimodal tasks
- Llama 4 Behemoth:
- Massive model: ~288B active parameters with 16 experts, ~2 trillion total parameters (preview)
- Designed for high-end STEM and complex reasoning benchmarks
- Still in training and not fully released
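The active-vs-total parameter split above is what makes these sizes deployable. As a rough sketch (my own back-of-the-envelope arithmetic, not official figures), weight-only memory is total parameters times bytes per parameter, which shows why Scout's ~109B weights can fit an 80 GB H100 once quantized to 4-bit; the quantization choices below are assumptions.

```python
def weight_memory_gb(total_params_billions, bits_per_param):
    """Approximate weight-only memory in GB.

    Ignores KV cache, activations, and runtime overhead, all of which
    add substantially on top, especially at long context lengths.
    """
    return total_params_billions * 1e9 * bits_per_param / 8 / 1e9

# Parameter counts from the spec list above; bit widths are illustrative assumptions.
for name, params_b, bits in [("Scout", 109, 4), ("Maverick", 400, 8)]:
    print(f"{name}: ~{weight_memory_gb(params_b, bits):.0f} GB of weights")
```

At 4-bit, Scout's weights come to roughly 55 GB, leaving headroom on an 80 GB H100; Maverick at higher precision needs a multi-GPU host.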
Performance & Capabilities
- Long Context Handling: Scout’s 10M token context supports whole documents, books, and extended dialogues in one session
- Multimodal Reasoning: Native support for joint text and image understanding
- Maverick performs strongly on reasoning, coding, and multimodal benchmarks
- Llama 4 family prioritizes efficient compute via MoE for practical deployment
Use Cases
- Long-form document summarization and analysis
- Large codebase understanding and generation
- Multimodal chat and assistant applications
- Knowledge-intensive workflows and data extraction
Deployment & Availability
- Llama 4 models are available for download and use on platforms such as Hugging Face and Meta AI services
- Deployable via cloud services like AWS SageMaker JumpStart and Amazon Bedrock
- Scout and Maverick readily accessible for developers; Behemoth remains in preview
Limitations & Notes
- The full 10M-token context may not be available in practice; hosted APIs and serving implementations often cap context length for cost or performance reasons
- Llama 4 Behemoth's release timeline may be postponed depending on performance progression
- Real-world results vary, and long-context quality may degrade before the advertised maximum is reached
Recent Highlights
- Llama 4 Scout set a new industry benchmark for context length with support for multi-million token sequences
- Llama 4 Maverick positions Meta’s open-weight models as strong competitors to leading AI systems
- Meta continues to enhance the Llama ecosystem with future models such as Behemoth and advanced reasoning variants