Overview
- Developer: Meta AI (formerly Facebook AI)
- Model Family: Llama 4 Series
- Release Date: April 5, 2025
- Nature: Multimodal large language models with industry-leading context capabilities
- Primary Variants: Llama 4 Scout, Llama 4 Maverick, and Llama 4 Behemoth (preview)
Core Technologies
- Architecture: Auto-regressive Transformer with Mixture-of-Experts (MoE)
- Multimodal Support: Native handling of combined text and image inputs (early-fusion multimodality)
- Mixture-of-Experts: Routes tokens through subsets of expert modules for efficiency and scalability
- Training Data: Trained on a large multilingual corpus across many domains
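The MoE routing described above can be sketched in a few lines. This is a generic top-k gating example in plain NumPy, not Llama 4's actual router (Llama 4 reportedly pairs a shared expert with routed experts); the dimensions, expert count, and linear experts are illustrative assumptions.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route one token vector x through the top-k of n experts.

    x: (d,) token hidden state
    gate_w: (d, n_experts) router weight matrix (hypothetical)
    experts: list of callables, each mapping (d,) -> (d,)
    """
    logits = x @ gate_w                       # router score per expert
    topk = np.argsort(logits)[-k:]            # indices of the k highest-scoring experts
    # softmax over only the selected experts' logits
    w = np.exp(logits[topk] - logits[topk].max())
    w /= w.sum()
    # weighted sum of the chosen experts' outputs; the other experts are skipped entirely,
    # which is the source of the MoE efficiency win (few active params per token)
    return sum(wi * experts[i](x) for wi, i in zip(w, topk))

rng = np.random.default_rng(0)
d, n = 8, 16                                  # toy sizes, not Llama 4's
gate_w = rng.normal(size=(d, n))
experts = [(lambda W: (lambda x: x @ W))(rng.normal(size=(d, d))) for _ in range(n)]
y = moe_forward(rng.normal(size=d), gate_w, experts, k=2)
print(y.shape)  # (8,)
```

Only k of the n expert matrices are ever multiplied per token, which is why a model can have ~400B total parameters while activating only ~17B per token.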
Model Variants & Specs
- Llama 4 Scout:
- 17B active parameters with 16 experts
- ~109B total parameters
- Industry-leading context window — up to 10 million tokens
- Designed to fit and run on a single NVIDIA H100 GPU (with Int4 quantization)
- Fast and efficient for document-scale applications
- Llama 4 Maverick:
- 17B active parameters with 128 experts
- ~400B total parameters
- Context window up to ~1 million tokens
- General-purpose high-performance multimodal model
- Competitive with top-tier models on reasoning, coding, and multimodal tasks
- Llama 4 Behemoth:
- Massive model: ~288B active parameters with 16 experts, ~2 trillion total parameters (preview)
- Designed for high-end STEM and complex reasoning benchmarks
- Still in training and not fully released
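The active-vs-total parameter split above is what makes these sizes deployable. As a rough sketch (my own back-of-the-envelope arithmetic, not official figures), weight-only memory is total parameters times bytes per parameter, which shows why Scout's ~109B weights can fit an 80 GB H100 once quantized to 4-bit; the quantization choices below are assumptions.

```python
def weight_memory_gb(total_params_billions, bits_per_param):
    """Approximate weight-only memory in GB.

    Ignores KV cache, activations, and runtime overhead, all of which
    add substantially on top, especially at long context lengths.
    """
    return total_params_billions * 1e9 * bits_per_param / 8 / 1e9

# Parameter counts from the spec list above; bit widths are illustrative assumptions.
for name, params_b, bits in [("Scout", 109, 4), ("Maverick", 400, 8)]:
    print(f"{name}: ~{weight_memory_gb(params_b, bits):.0f} GB of weights")
```

At 4-bit, Scout's weights come to roughly 55 GB, leaving headroom on an 80 GB H100; Maverick at higher precision needs a multi-GPU host.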
Performance & Capabilities
- Long Context Handling: Scout’s 10M token context supports whole documents, books, and extended dialogues in one session
- Multimodal Reasoning: Native support for joint text and image understanding
- Maverick performs strongly on reasoning, coding, and multimodal benchmarks
- Llama 4 family prioritizes efficient compute via MoE for practical deployment
Use Cases
- Long-form document summarization and analysis
- Large codebase understanding and generation
- Multimodal chat and assistant applications
- Knowledge-intensive workflows and data extraction
Deployment & Availability
- Llama 4 models are available for download and use on platforms such as Hugging Face and Meta AI services
- Deployable via cloud services like AWS SageMaker JumpStart and Amazon Bedrock
- Scout and Maverick readily accessible for developers; Behemoth remains in preview
Limitations & Notes
- The full 10M-token context may not be available in practice; hosted APIs and serving implementations often cap context length for cost or performance reasons
- Llama 4 Behemoth's release timeline may be postponed depending on performance progression
- Real-world results vary, and long-context quality may degrade before the advertised maximum is reached
Recent Highlights
- Llama 4 Scout set a new industry benchmark for context length with support for multi-million token sequences
- Llama 4 Maverick positions Meta’s open-weight models as strong competitors to leading AI systems
- Meta continues to enhance the Llama ecosystem with future models such as Behemoth and advanced reasoning variants