
Llama 4 Scout

17B x 16 Experts Instruct

Meta's efficient MoE model, optimized for speed and cost. Sixteen experts deliver competitive quality at half the cost of Maverick, with the same 128K context length for versatile applications.

Model Specifications

Parameters: 17B x 16 Experts
Architecture: Mixture of Experts (MoE)
Context Length: 128K tokens
Active Parameters: ~17B per token
Developer: Meta AI
License: Llama License
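To make the "~17B active parameters per token" figure concrete: in a Mixture of Experts layer, a router picks a small subset of the 16 experts for each token, so only that subset's weights run per forward pass. Below is a minimal, illustrative top-k routing sketch in NumPy; the actual Llama 4 routing implementation differs in detail.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=1):
    """Route a token embedding to its top-k experts and mix their outputs.

    x: (d,) token embedding; gate_w: (d, n_experts) router weights;
    experts: list of callables, one per expert. Illustrative only.
    """
    logits = x @ gate_w                       # router score for each expert
    top = np.argsort(logits)[-k:]             # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                  # softmax over selected experts
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# 16 tiny stand-in "experts": each a fixed random linear map
rng = np.random.default_rng(0)
d, n = 8, 16
experts = [lambda x, W=rng.normal(size=(d, d)): W @ x for _ in range(n)]
gate_w = rng.normal(size=(d, n))

out = moe_forward(rng.normal(size=d), gate_w, experts, k=1)
print(out.shape)  # (8,)
```

Only 1 of the 16 experts runs per token here, which is why a model with many total parameters can have a much smaller active-parameter count and cost.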

Why Choose Llama 4 Scout

Efficient Architecture

16 experts provide excellent quality with lower resource needs.

128K Context

Same extended context as Maverick for long document processing.

Fast Inference

Optimized for speed with smaller expert count.

Cost Effective

Half the price of Maverick with competitive quality.

Pricing Options

Serverless API

Pay per token with auto-scaling

₹15 / 1M input tokens
₹30 / 1M output tokens
  • Auto-scaling
  • No minimum
  • 99.9% uptime
  • Rate limits apply
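To estimate a serverless bill from the rates above, multiply each token count by its per-million rate. A quick sketch (rates taken from this page; your actual bill may include other factors):

```python
def serverless_cost(input_tokens, output_tokens,
                    in_rate=15.0, out_rate=30.0):
    """Estimated serverless cost in INR at ₹15 / ₹30 per 1M input/output tokens."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: 10M input tokens + 2M output tokens
print(serverless_cost(10_000_000, 2_000_000))  # 210.0 (₹150 input + ₹60 output)
```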

Dedicated Instance

Reserved GPU for consistent performance

₹200/hour
  • 2x H100 GPUs
  • No rate limits
  • Fine-tuning support
  • Private deployment
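Choosing between the two plans comes down to sustained throughput: above some tokens-per-hour volume, the ₹200/hour dedicated instance is cheaper than paying per token. A rough break-even sketch, assuming a blended serverless rate at a given output-token fraction (the 20% default is an assumption, not a figure from this page):

```python
def breakeven_tokens_per_hour(hourly_rate=200.0, in_rate=15.0, out_rate=30.0,
                              output_fraction=0.2):
    """Tokens/hour above which a dedicated instance beats serverless pricing.

    Blends the ₹15/1M input and ₹30/1M output rates by the assumed
    output-token fraction; illustrative only.
    """
    blended = (1 - output_fraction) * in_rate + output_fraction * out_rate
    return hourly_rate / blended * 1_000_000

# At an 80/20 input/output mix the blended rate is ₹18 per 1M tokens,
# so dedicated wins above roughly 11.1M tokens per sustained hour.
print(round(breakeven_tokens_per_hour()))
```

Below that volume, or for bursty traffic, the serverless plan's pay-per-token model avoids paying for idle GPU time.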

Use Cases

High-Volume Tasks

Cost-effective solution for high-throughput applications.

Real-time Chat

Fast response times for interactive applications.

Content Moderation

Quick content analysis at scale.

Text Classification

Efficient categorization of documents and messages.
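A classification-style request to the model might look like the following. The endpoint URL, model identifier, and auth scheme here are placeholders, not the provider's actual API; this assumes an OpenAI-compatible chat-completions interface, which many serving platforms expose, so check the provider's API reference before use.

```python
import json
import urllib.request

# Hypothetical endpoint and model id -- replace with the provider's real values.
payload = {
    "model": "llama-4-scout-17b-16e-instruct",
    "messages": [{"role": "user",
                  "content": "Classify this ticket: 'My invoice is wrong.'"}],
    "max_tokens": 16,
}
req = urllib.request.Request(
    "https://api.example.com/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Authorization": "Bearer YOUR_API_KEY",
             "Content-Type": "application/json"},
)

# Uncomment to send the request against a real endpoint:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Keeping `max_tokens` small, as in this sketch, suits classification and moderation workloads: output tokens cost twice as much as input tokens on the serverless plan.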

Ready to Deploy Llama 4 Scout?

Get excellent performance at half the cost of larger models.