MiniMax formally launched MiniMax M3 on June 1, 2026. The model introduces MSA (MiniMax Sparse Attention), a new sparse attention architecture that gives M3 a 1M-token context window. M3 also natively supports image and video input and desktop computer operation. The API is live now.
MiniMax M3 is available today via MiniMax Code, the MiniMax Token Plan, and the MiniMax API. It is the next model in the M-series line after M2.7. MiniMax positions M3 as an open-weight model that combines frontier-level coding performance, a 1M-token context window, and native multimodal input in a single architecture, the first to do so according to MiniMax. The corresponding model weights and technical report are scheduled for release within 10 days of launch.
MSA: MiniMax Sparse Attention
The central architectural change in MiniMax M3 is MSA (MiniMax Sparse Attention). Standard full attention has quadratic computational complexity: as context length grows, compute cost grows as the square of the sequence length. MSA is designed to address this.
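To make the quadratic-growth point concrete, the following sketch counts query-key pairs for full attention and for a scheme in which each query attends to a bounded number of keys. It is an illustration only: `key_budget` is a hypothetical parameter chosen arbitrarily, it is not an MSA setting, and the count ignores the cost of whatever pre-filtering step picks the keys.

```python
def full_attention_pairs(seq_len: int) -> int:
    """Query-key pairs scored by (non-causal) full attention: every query sees every key."""
    return seq_len * seq_len


def budgeted_attention_pairs(seq_len: int, key_budget: int) -> int:
    """Pairs scored when each query attends to at most `key_budget` keys.

    `key_budget` is a hypothetical knob for illustration, not an MSA parameter.
    """
    return seq_len * min(seq_len, key_budget)


# Doubling the context quadruples the full-attention count but only doubles
# the budgeted count once the context is longer than the budget.
for n in (125_000, 250_000, 500_000, 1_000_000):
    full = full_attention_pairs(n)
    sparse = budgeted_attention_pairs(n, key_budget=32_768)  # arbitrary budget
    print(f"{n:>9} tokens: full={full:.2e}  budgeted={sparse:.2e}  ratio={full / sparse:.1f}x")
```

With a fixed budget, the gap between the two counts widens linearly with context length, which is why sparse schemes matter most at very long contexts.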
Sparse attention mechanisms typically add a pre-filtering stage before computing attention, which avoids the full quadratic cost. The MiniMax team states that, compared to approaches like DSA and MoBA, MSA partitions the KV cache into blocks more precisely, achieving higher effective context coverage.
At the operator level, MSA uses a “KV outer gather Q” approach: KV blocks serve as the outer loop and aggregate the queries that hit them, so each block is read only once and memory access is contiguous. The MiniMax team reports that this is more than 4× faster than open-source implementations such as Flash-Sparse-Attention and flash-moba under MiniMax M3’s head configuration.
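The loop ordering described above can be sketched in plain Python. This is a toy under stated assumptions, not MiniMax's kernel: the block-selection rule (score each block by the query's dot product with the block's mean key and keep the top `top_k`) is a stand-in, since the article does not describe MSA's actual selection, and the function names, the non-causal setup, and the single-head layout are all hypothetical. What it does show is the inversion from "each query reads its blocks" to "each KV block is visited once and serves every query that selected it", using a running softmax so partial results can be merged block by block.

```python
import math
from collections import defaultdict


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def select_blocks(q, keys, block_size, top_k):
    """Stand-in pre-filter (an assumption): rank blocks by q · mean(key), keep top_k ids."""
    n_blocks = (len(keys) + block_size - 1) // block_size
    scored = []
    for b in range(n_blocks):
        blk = keys[b * block_size:(b + 1) * block_size]
        mean = [sum(k[d] for k in blk) / len(blk) for d in range(len(q))]
        scored.append((dot(q, mean), b))
    scored.sort(reverse=True)
    return sorted(b for _, b in scored[:top_k])


def sparse_attention_q_outer(queries, keys, values, block_size, top_k):
    """Reference ordering: outer loop over queries, each re-reading its selected blocks."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        idx = [i for b in select_blocks(q, keys, block_size, top_k)
               for i in range(b * block_size, min((b + 1) * block_size, len(keys)))]
        logits = [dot(q, keys[i]) * scale for i in idx]
        top = max(logits)
        weights = [math.exp(s - top) for s in logits]
        total = sum(weights)
        out.append([sum(w * values[i][d] for w, i in zip(weights, idx)) / total
                    for d in range(len(values[0]))])
    return out


def sparse_attention_kv_outer(queries, keys, values, block_size, top_k):
    """KV-outer ordering: visit each selected KV block once, gather the queries that hit it."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    hits = defaultdict(list)  # block id -> indices of queries that selected it
    for qi, q in enumerate(queries):
        for b in select_blocks(q, keys, block_size, top_k):
            hits[b].append(qi)
    # Running softmax state per query: max logit, normaliser, weighted value sum.
    run_max = [-math.inf] * len(queries)
    norm = [0.0] * len(queries)
    acc = [[0.0] * len(values[0]) for _ in queries]
    for b in sorted(hits):  # outer loop over KV blocks; each block slice is taken once
        k_blk = keys[b * block_size:(b + 1) * block_size]
        v_blk = values[b * block_size:(b + 1) * block_size]
        for qi in hits[b]:  # inner loop over the gathered queries
            for k, v in zip(k_blk, v_blk):
                s = dot(queries[qi], k) * scale
                new_max = max(run_max[qi], s)
                rescale = math.exp(run_max[qi] - new_max)  # shrink earlier partial sums
                w = math.exp(s - new_max)
                norm[qi] = norm[qi] * rescale + w
                acc[qi] = [a * rescale + w * x for a, x in zip(acc[qi], v)]
                run_max[qi] = new_max
    return [[a / norm[qi] for a in acc[qi]] for qi in range(len(queries))]
```

Both functions compute the same output for the same selection; only the traversal order differs. On real hardware the benefit of the KV-outer order comes from reading each block from memory once and contiguously, which a pure-Python sketch cannot demonstrate.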
The result: at a context length of 1 million tokens, MiniMax M3’s per-token compute is 1/20th that of the previous-generation M2 models. The MiniMax team reports a speedup of more than 9× in the prefill stage and more than 15× in the decoding stage at 1M-token context. Across multiple ablation studies, MSA matched full attention on the vast majority of capabilities.
Coding and Agentic Benchmarks
Coding and agentic capabilities are key areas of improvement for M3. The benchmark results below are reported by the MiniMax team. Several evaluations were run on MiniMax internal infrastructure, while some comparison scores were taken from official leaderboards or external benchmark sources, as noted in MiniMax’s methodology. SWE-Bench Verified was tested on internal infrastructure using Claude Code scaffolding and averaged over 4 runs. SWE-Bench Pro was also tested on internal infrastructure using Claude Code scaffolding, with testing logic aligned to the official evaluation.
- SWE-Bench Pro: 59.0% (surpasses GPT-5.5 and Gemini 3.1 Pro; approaches Opus 4.7)
- Terminal-Bench 2.1: 66.0%
- SWE-fficiency: 34.8%
- KernelBench Hard: 28.8% (evaluated on NVIDIA Blackwell GPUs, CUDA capability sm_120)
- MCP Atlas: 74.2%
- Claw-Eval: highest score among models evaluated (General Task Group, 161 tasks)
- SVG-Bench: surpasses Opus 4.7
On OmniDocBench, a multimodal document understanding benchmark, M3 scores above Gemini 3.1 Pro. On OSWorld-Verified (361 samples), M3 achieves a 70.06% task completion rate for computer use (Max Steps = 200).
MiniMax also built an interactive user simulator framework for training and evaluation. It simulates multi-turn developer collaboration: requirement elaboration, solution discussion, feedback-based correction, continuous task switching, and multi-round project iteration. This is intended to reduce the gap between single-turn benchmark performance and real-world, multi-turn developer workflows.
Native Multimodality
MiniMax M3 underwent mixed-modality training from step 0. Text, images, and video are trained together from the start rather than added post-training. The MiniMax team reports that interleaved data (sequences in which text and images are naturally intermixed) is more critical to model performance than commonly assumed. After rebuilding the entire data pipeline for interleaved formats, training data was scaled to the order of 100 trillion tokens.
MiniMax M3 supports image and video input and can operate a desktop computer.
Real-World Task Examples from MiniMax
MiniMax documents three internal tasks in the launch post:
Paper reproduction: MiniMax gave MiniMax M3 the ICLR 2025 Outstanding Paper Award-winning paper Learning Dynamics of LLM Finetuning and asked it to reproduce the experiments independently. M3 ran autonomously for nearly 12 hours, produced 18 commits and 23 experimental figures, and completed the core experiments without human intervention. The task required multimodal capability to read curves and formulas, long context to hold the paper and experiment logs simultaneously, and coding capability to execute the reproduction across a long thread.
CUDA kernel optimization: MiniMax asked MiniMax M3 to optimize an FP8 matrix multiplication (GEMM) kernel on NVIDIA Hopper architecture GPUs. The model started with only a task description, a benchmark evaluation script, and a non-functional Triton skeleton; no reference implementation was provided. Over roughly 24 hours, MiniMax M3 made 147 benchmark submissions and 1,959 tool calls. It progressed through baseline implementation, autotune configuration generation, performance bottleneck diagnosis, CUDA Graph integration, persistent kernel rewriting, and host-side scheduling optimization. After six landmark rounds of optimization, MiniMax M3 improved Hopper FP8 hardware peak utilization from 7.6% to 71.3%, a 9.4× speedup. The best solution appeared on the 145th submission. MiniMax notes that most other models stopped making new progress within the first 30 submissions; only Opus 4.7 and M3 continued beyond that point.
PostTrainBench (autonomous model training): MiniMax gave MiniMax M3 four base models that had completed pretraining only. MiniMax M3 autonomously ran the full data synthesis → training → evaluation → iteration cycle over 12 hours with no human intervention. The goal was for the base models to acquire capabilities across mathematical reasoning (AIME2025), tool calling (BFCL), scientific knowledge reasoning (GPQA Main), arithmetic reasoning (GSM8K), and code generation (HumanEval). MiniMax M3 scored 0.37, below Opus 4.7 (0.42) and GPT-5.5 (0.39), but ahead of the other models tested.
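The 9.4× figure in the CUDA kernel example follows directly from the two utilization numbers. The short check below assumes both percentages are measured against the same hardware peak, so that throughput scales in proportion to utilization; the function name is illustrative.

```python
def speedup_from_utilization(before_pct: float, after_pct: float) -> float:
    """Throughput ratio implied by two utilization figures against the same peak (assumed)."""
    return after_pct / before_pct


ratio = speedup_from_utilization(7.6, 71.3)
print(f"{ratio:.1f}x")  # prints "9.4x"
```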
Key Takeaways
- MiniMax M3 launched June 1, 2026; the API is live now. MiniMax has committed to releasing open model weights and a technical report within 10 days.
- MSA (MiniMax Sparse Attention) delivers more than 9× prefill and more than 15× decoding speedup at 1M-token context versus M2, at 1/20th the per-token compute.
- M3 scores 59.0% on SWE-Bench Pro, surpassing GPT-5.5 and Gemini 3.1 Pro.
- M3 is natively multimodal from step 0, supporting image and video input, and achieves 70.06% on OSWorld-Verified for computer use.

