Frontier coding
Strong SWE-Bench Pro and Terminal-Bench results for real coding and debugging.
Use MiniMax M3 in a clean web app — an open-weight frontier model for coding, agentic workflows, and 1M-token context. Keep Thinking on for hard problems, and turn on web search when you need current sources.
Starter prompts
Community Reviews
Hands-on MiniMax M3 reviews and tests from the developer community.
A deep hands-on test of MiniMax M3 across coding and agentic tasks.
First real-world tests of M3's agentic coding and long context.
Using MiniMax M3 as the coding model inside Claude Code.
Full walkthrough plus how to try M3 through a free API.
Cost-and-quality comparison of MiniMax M3 against Opus 4.7.
Overview of what MiniMax M3 brings in a single model.
MiniMax M3 is an open-weight frontier model built for coding, agentic tool use, and long-context work. It accepts native text, image, and video input, and its MSA architecture cuts per-token compute at 1M context to about 1/20 of the previous generation.
Released with open weights, model ID minimax/minimax-m3, focused on coding and agentic tasks.
Multi-Sparse-Attention keeps 1M-token context affordable — about 1/20 the per-token compute of the prior generation.
Accepts text, image, and video input with text output.
Where MiniMax M3 stands out for day-to-day engineering and agent work.
Strong SWE-Bench Pro and Terminal-Bench results for real coding and debugging.
$0.30 / 1M input and $1.20 / 1M output — frontier quality at a fraction of typical cost.
Hold whole repos, long documents, and multi-step agent traces in one context.
Built for multi-step tool calling, MCP, and autonomous workflows.
Open-weight release you can self-host and audit.
MSA architecture keeps 1M-context responses fast and affordable.
The core capabilities you get when you chat with MiniMax M3 on M-Chat.
Toggle extended reasoning for harder coding, planning, and analysis.
Server-side Tavily search injects current evidence before the model answers.
Send text, images, and video as described by the model's capabilities.
Write, refactor, and debug code, or drive multi-step agent tasks.
Work across large codebases and long documents without losing context.
Use model ID minimax/minimax-m3 in your own apps.
Full MiniMax M3 benchmark table from the official release, covering coding, cowork agent, GUI, multimodal, and reasoning evaluations.
| Benchmark | MiniMax M3 | MiniMax M2.7 | Claude Opus 4.7 | GPT 5.5 | Gemini 3.1 Pro | Claude Sonnet 4.6 | DeepSeek V4 Pro | GLM 5.1 Thinking | Kimi K2.6 Thinking |
|---|---|---|---|---|---|---|---|---|---|
| Coding | |||||||||
| SWE-Bench Verified | 80.5 | 79.9 | 87.6 | 82.9 | 80.6 | 79.6 | 80.6 | - | 80.2 |
| SWE-Bench Pro | 59.0 | 56.2 | 64.3 | 58.6 | 54.2 | - | 55.4 | 58.4 | 58.6 |
| Terminal Bench 2.1 | 66.0 | 51.1 | 66.1 | 78.2 | 70.3 | - | - | - | - |
| SWE Atlas-QnA | 37.9 | 11.29 | 45.16 | 45.43 | 13.5 | 31.20 | - | - | - |
| nl2repo | 42.13 | 34.99 | 56.28 | 52.9 | 21.62 | - | 35.5 | 41 | 42.8 |
| SWE Atlas-Test Writing | 30.83 | 18.89 | 38.21 | 42.59 | 29.84 | 31.76 | - | - | - |
| SWE-fficiency | 34.8 | 13.98 | 42.2 | 46.6 | 19.7 | - | - | - | - |
| LiveSQLBench | 40.17 | 33.17 | 41.00 | 40.17 | 39.83 | - | - | - | - |
| CL-bench | 20.48 | 15.38 | 22.92 | 25.38 | 21.06 | - | - | - | - |
| VIBE-V2 | 50.12 | 37.89 | 55.87 | 50.50 | 28.00 | - | - | - | - |
| SVG-Bench | 63.7 | 48.0 | 62.3 | 58.2 | 59.2 | - | - | - | - |
| PostTrainBench | 37.1 | 13.1 | 42.4 | 39.3 | 15.2 | - | - | - | - |
| KernelBench Hard | 28.8 | 10.5 | 30.7 | 20.9 | 18.6 | - | - | - | - |
| PaperBench | 52.6 | 30.6 | 58.5 | 57.5 | 46.7 | - | - | - | - |
| Cowork (Agent) | |||||||||
| BrowseComp | 83.52 | 76.3 | 79.3 | 84.4 | 85.9 | 74.7 | 83.4 | 79.3 | 83.2 |
| DRACO | 73.23 | 66.77 | 77.7 | - | - | 75.8 | - | - | - |
| GDPval rubrics | 74.78 | 66.44 | 79.8 | 80.66 | 57.82 | 75.65 | 70.32 | 68.26 | 65.12 |
| BankerToolBench | 76.12 | 63.89 | 81.34 | 70.04 | 67.03 | - | - | - | - |
| OfficeQA Pro | 45.1 | - | 43.6 | 52.6 | 18.1 | - | - | - | - |
| SpreadSheetBench-v1 | 89.35 | 84.92 | 88.49 | 88.11 | 56.06 | - | 84.9 | 85.2 | 84.5 |
| YC-Bench | 2.10M | 0 | 2.19M | 1.28M | 1.05M | - | - | - | - |
| LOCA-Bench (256k) | 49.3 | 0 | 57 | - | - | - | - | - | - |
| MCP Atlas | 74.2 | 49.4 | 77 | 75.3 | 69.2 | 61.3 | 73.6 | 71.8 | 66.6 |
| Apex-Agents | 27.7 | 5.6 | 37.2 | 41.7 | 33.4 | 26.2 | - | - | - |
| Claw-Eval | 74.5 | 49.7 | 71.6 | - | 57.8 | 68.3 | 58.4 | 62.7 | 61.5 |
| GUI | |||||||||
| OSWorld-Verified | 70.06 | - | 82.8 | 78.7 | 76.2 | 72.5 | 80.6 | - | 73.1 |
| MultiModal | |||||||||
| OmniDocBench | 91.6 | - | 89.3 | 87.5 | 88.1 | 86.9 | - | - | - |
| MMMU-Pro | 78.1 | - | 77 | 81.2 | 80.5 | 74.5 | - | - | 79.4 |
| Video-MMMU | 84.6 | - | 83 | 86.4 | 87.9 | - | - | - | - |
| VideoMME (w/ sub) | 85.4 | - | - | 89.4 | 87.9 | - | - | - | - |
| Reasoning | |||||||||
| IMO 2025 | 35 / 42 | - | - | - | - | - | - | - | - |
| USAMO 2026 | 36 / 42 | - | 52.8% | 98.21% | 74.40% | - | - | - | - |
Source: MiniMax M3 official release full benchmark table. Dashes match empty cells in the source table.
Updated 2026-06-01Choose an M-Chat plan for MiniMax M3 access.
Includes
Includes
Includes
Includes
Common questions about M-Chat, MiniMax M3, Thinking, web search, and local validation.
M-Chat is an independent web chat and research site for MiniMax M3. It is not MiniMax official infrastructure.
The chat backend uses MiniMax M3, model ID minimax/minimax-m3.
MiniMax M3 is described as supporting text, image, and video inputs with text output. M-Chat is validated for text chat.
Thinking enables extended reasoning for prompts that need deeper analysis.
The server calls Tavily search and provides the search result context to MiniMax M3 before the answer is generated.
MiniMax M3 is released as an open-weight model, and M-Chat gives every signed-in user 10 free credits to try it. Beyond that, access is paid: OpenRouter and the official MiniMax API bill per token, and M-Chat sells credit plans.
On OpenRouter, MiniMax M3 lists at $0.30 per 1M input tokens and $1.20 per 1M output tokens during its launch discount. The official MiniMax API is tiered by input length, with higher rates above 512K tokens. Verify live pricing before budgeting.
MiniMax M3 ships as an open-weight model you can download and self-host, but the release stops short of a full open-source license. Check MiniMax's terms before building a commercial product on the weights.
MiniMax M3 targets coding and agentic work, scoring 59.0% on SWE-Bench Pro and 66.0% on Terminal-Bench 2.1 in MiniMax's release. It narrowly leads the open-access group on SWE-Bench Pro, though closed models like Claude Opus 4.8 score higher.
MiniMax reports M3 just ahead of GPT-5.5 on SWE-Bench Pro and stronger on BrowseComp. Claude Opus 4.8 leads on raw coding (about 69.2% on SWE-Bench Pro) but costs far more. M3's pitch is near-frontier quality at a fraction of the price.
MiniMax M3 has a 1M-token context window, larger than many peers such as Kimi K2.6 at 256K. That makes it practical for full-repository analysis, long transcripts, and tool-heavy agent sessions inside a single prompt.
MiniMax M3 uses the model ID minimax/minimax-m3. M-Chat routes chat through OpenRouter's unified API, so you can reach the same model with an OpenRouter key and that model ID.
Report notes, benchmark, pricing, and model comparison guides for MiniMax M3.

MiniMax M3 benchmark results on SWE-Bench Pro, Terminal-Bench, BrowseComp, and MCP Atlas, how the model compares to GPT-5.5 and Claude Opus 4.8, and what early developers are saying.

MiniMax M3 price across OpenRouter and the official MiniMax API, how it compares to Claude Opus 4.8 and GPT-5.5 on cost, M-Chat credits, and what the community says.

MiniMax M3 vs DeepSeek V4 Pro on SWE-Bench Pro, Terminal-Bench, BrowseComp, and MCP Atlas, where each model leads, community reception, and how to choose for production.

MiniMax M3 vs GLM 5.1 on SWE-Bench Pro, Terminal-Bench, BrowseComp, and MCP Atlas, where the gaps show up, community reception, and how to evaluate both for real work.

MiniMax M3 vs Kimi K2.6 on coding and terminal benchmarks, the 1M vs 256K context gap, community reception, and how to choose for chat and agent workflows.