MiniMax-01 pairs MiniMax-Text-01 for text generation with MiniMax-VL-01 for image understanding, combining text and vision capabilities in one model family. It has 456B total parameters, of which 45.9B are activated per token, and supports context lengths of up to 4 million tokens at inference.
The text component uses a hybrid architecture that blends Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE). The vision component follows a “ViT-MLP-LLM” framework, trained on top of the text model to enable advanced multimodal reasoning.
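To make the hybrid text stack concrete, the sketch below interleaves lightning-attention (linear-time) blocks with periodic softmax-attention blocks, each paired with an MoE feed-forward layer. This is a minimal illustration, not the official implementation: the 80-layer depth and the roughly 7:1 lightning-to-softmax interleaving follow the published MiniMax-01 description, but the names and structure here (LayerSpec, build_hybrid_stack) are illustrative assumptions.

```python
# Illustrative sketch of the hybrid layer stack (assumed structure, not official code).
from dataclasses import dataclass


@dataclass
class LayerSpec:
    attention: str  # "lightning" (linear-time) or "softmax" (full attention)
    ffn: str        # every layer uses a Mixture-of-Experts feed-forward block


def build_hybrid_stack(num_layers: int = 80, softmax_every: int = 8) -> list[LayerSpec]:
    """Interleave lightning-attention blocks with a softmax-attention block every Nth layer."""
    layers = []
    for i in range(1, num_layers + 1):
        attn = "softmax" if i % softmax_every == 0 else "lightning"
        layers.append(LayerSpec(attention=attn, ffn="moe"))
    return layers


if __name__ == "__main__":
    stack = build_hybrid_stack()
    # With the assumed 7:1 ratio, 10 of the 80 layers use full softmax attention.
    print(sum(layer.attention == "softmax" for layer in stack), "softmax layers of", len(stack))
```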
Creator: MiniMax
Release Date: January 2025
License: MiniMax Model License Agreement
Context Window: 1,000,192 tokens
Image Input Support: No
Open Source (Weights): Yes
Parameters: 456B total, 45.9B active at inference time
Model Weights: Click here
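Because the weights are openly released, a minimal loading sketch is shown below. It assumes the checkpoint is hosted on Hugging Face as MiniMaxAI/MiniMax-Text-01 and ships custom modeling code (hence trust_remote_code=True); at 456B parameters, running it in practice requires a multi-GPU or quantized setup.

```python
# Minimal sketch: loading the open weights with Hugging Face transformers.
# The repo id and the need for trust_remote_code are assumptions about the release.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniMaxAI/MiniMax-Text-01"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",   # shard across available GPUs
    torch_dtype="auto",  # use the dtype stored in the checkpoint
)

prompt = "Summarize the MiniMax-01 architecture in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```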
Performance Benchmarks
Core Academic Benchmarks
| Category | Task | GPT-4o (11-20) | Claude-3.5-Sonnet (10-22) | Gemini-1.5-Pro (002) | Gemini-2.0-Flash (exp) | Qwen2.5-72B-Inst. | DeepSeek-V3 | Llama-3.1-405B-Inst. | MiniMax-Text-01 |
|---|---|---|---|---|---|---|---|---|---|
| General | MMLU* | 85.7 | 88.3 | 86.8 | 86.5 | 86.1 | 88.5 | 88.6 | 88.5 |
| General | MMLU-Pro* | 74.4 | 78.0 | 75.8 | 76.4 | 71.1 | 75.9 | 73.3 | 75.7 |
| General | SimpleQA | 39.0 | 28.1 | 23.4 | 26.6 | 10.3 | 24.9 | 23.2 | 23.7 |
| General | C-SimpleQA | 64.6 | 56.8 | 59.4 | 63.3 | 52.2 | 64.8 | 54.7 | 67.4 |
| General | IFEval (avg) | 84.1 | 90.1 | 89.4 | 88.4 | 87.2 | 87.3 | 86.4 | 89.1 |
| General | Arena-Hard | 92.4 | 87.6 | 85.3 | 72.7 | 81.2 | 91.4 | 63.5 | 89.1 |
| Reasoning | GPQA* (diamond) | 46.0 | 65.0 | 59.1 | 62.1 | 49.0 | 59.1 | 50.7 | 54.4 |
| Reasoning | DROP* (F1) | 89.2 | 88.8 | 89.2 | 89.3 | 85.0 | 91.0 | 92.5 | 87.8 |
| Mathematics | GSM8k* | 95.6 | 96.9 | 95.2 | 95.4 | 95.8 | 96.7 | 96.7 | 94.8 |
| Mathematics | MATH* | 76.6 | 74.1 | 84.6 | 83.9 | 81.8 | 84.6 | 73.8 | 77.4 |
| Coding | MBPP+ | 76.2 | 75.1 | 75.4 | 75.9 | 77.0 | 78.8 | 73.0 | 71.7 |
| Coding | HumanEval | 90.2 | 93.7 | 86.6 | 89.6 | 86.6 | 92.1 | 89.0 | 86.9 |
Ruler
| Model | 4k | 8k | 16k | 32k | 64k | 128k | 256k | 512k | 1M |
|---|---|---|---|---|---|---|---|---|---|
| GPT-4o (11-20) | 0.970 | 0.921 | 0.890 | 0.888 | 0.884 | – | – | – | – |
| Claude-3.5-Sonnet (10-22) | 0.965 | 0.960 | 0.957 | 0.950 | 0.952 | 0.938 | – | – | – |
| Gemini-1.5-Pro (002) | 0.962 | 0.960 | 0.960 | 0.958 | 0.938 | 0.917 | 0.916 | 0.861 | 0.850 |
| Gemini-2.0-Flash (exp) | 0.960 | 0.960 | 0.951 | 0.957 | 0.937 | 0.860 | 0.797 | 0.709 | – |
| MiniMax-Text-01 | 0.963 | 0.961 | 0.953 | 0.954 | 0.943 | 0.947 | 0.945 | 0.928 | 0.910 |
LongBench V2
| Setting | Model | Overall | Easy | Hard | Short | Medium | Long |
|---|---|---|---|---|---|---|---|
| – | Human | 53.7 | 100.0 | 25.1 | 47.2 | 59.1 | 53.7 |
| w/ CoT | GPT-4o (11-20) | 51.4 | 54.2 | 49.7 | 59.6 | 48.6 | 43.5 |
| w/ CoT | Claude-3.5-Sonnet (10-22) | 46.7 | 55.2 | 41.5 | 53.9 | 41.9 | 44.4 |
| w/ CoT | DeepSeek-V3 | – | – | – | – | – | – |
| w/ CoT | Qwen2.5-72B-Inst. | 43.5 | 47.9 | 40.8 | 48.9 | 40.9 | 39.8 |
| w/ CoT | MiniMax-Text-01 | 56.5 | 66.1 | 50.5 | 61.7 | 56.7 | 47.2 |
| w/o CoT | GPT-4o (11-20) | 50.1 | 57.4 | 45.6 | 53.3 | 52.4 | 40.2 |
| w/o CoT | Claude-3.5-Sonnet (10-22) | 41.0 | 46.9 | 37.3 | 46.1 | 38.6 | 37.0 |
| w/o CoT | DeepSeek-V3 | 48.7 | – | – | – | – | – |
| w/o CoT | Qwen2.5-72B-Inst. | 42.1 | 42.7 | 41.8 | 45.6 | 38.1 | 44.4 |
| w/o CoT | MiniMax-Text-01 | 52.9 | 60.9 | 47.9 | 58.9 | 52.6 | 43.5 |
MTOB
eng → kalam (ChrF)

| Model | No context | Half book | Full book | Δ half book | Δ full book |
|---|---|---|---|---|---|
| GPT-4o (11-20) | 9.90 | 54.30 | – | 44.40 | – |
| Claude-3.5-Sonnet (10-22) | 20.22 | 53.62 | 55.65 | 33.39 | 35.42 |
| Gemini-1.5-Pro (002) | 16.79 | 53.68 | 57.90 | 36.89 | 41.11 |
| Gemini-2.0-Flash (exp) | 12.20 | 49.50 | 53.30 | 37.30 | 41.10 |
| Qwen-Long | 16.55 | 48.48 | 45.94 | 31.92 | 29.39 |
| MiniMax-Text-01 | 6.0 | 51.74 | 51.60 | 45.7 | 45.6 |

kalam → eng (BLEURT)

| Model | No context | Half book | Full book | Δ half book | Δ full book |
|---|---|---|---|---|---|
| GPT-4o (11-20) | 33.20 | 58.30 | – | 25.10 | – |
| Claude-3.5-Sonnet (10-22) | 31.42 | 59.70 | 62.30 | 28.28 | 30.88 |
| Gemini-1.5-Pro (002) | 32.02 | 61.52 | 63.09 | 29.50 | 31.07 |
| Gemini-2.0-Flash (exp) | 33.80 | 57.50 | 57.00 | 23.70 | 23.20 |
| Qwen-Long | 30.13 | 53.14 | 32.15 | 23.01 | 2.02 |
| MiniMax-Text-01 | 33.65 | 57.10 | 58.00 | 23.45 | 24.35 |