MiniMax Releases MiniMax M3 with MSA Structure Supporting 1M-Token Context, Native Multimodality, and Agentic Coding

by root June 1, 2026

written by root June 1, 2026 0 comment 77 views

MiniMax formally launched MiniMax M3 on June 1, 2026. The mannequin introduces MSA (MiniMax Sparse Consideration), a brand new sparse consideration structure that offers M3 a 1M-token context window. M3 additionally helps picture and video enter and desktop laptop operation natively. The API is reside now.

MiniMax M3 is accessible at this time through MiniMax Code, the MiniMax Token Plan, and the MiniMax API. It’s the subsequent mannequin within the M-series line after M2.7. MiniMax positions M3 as an open-weight mannequin combining frontier-level coding efficiency, a 1M-token context window, and native multimodal enter in a single structure — the primary to take action, per MiniMax. The corresponding mannequin weights and technical report are scheduled for launch inside 10 days of launch.

MSA: MiniMax Sparse Consideration

The central architectural change in MiniMax M3 is MSA (MiniMax Sparse Consideration). Commonplace full consideration has quadratic computational complexity: as context size grows, compute value grows because the sq. of the sequence size. MSA is designed to handle this.

Sparse consideration mechanisms typically add a pre-filtering stage earlier than computing consideration, avoiding full quadratic value. MiniMax staff states that in comparison with approaches like DSA and MoBA, MSA partitions the KV cache into blocks extra exactly, attaining larger efficient context protection.

On the operator degree, MSA makes use of a “KV outer collect Q” method. KV blocks function the outer loop to combination the queries that hit them. Every block is learn solely as soon as and reminiscence entry is contiguous. MiniMax staff studies that is greater than 4× quicker than open-source implementations similar to Flash-Sparse-Consideration and flash-moba underneath MiniMax M3’s head configuration.

The outcome: at a context size of 1 million tokens, MiniMax M3’s per-token compute is 1/twentieth that of the previous-generation M2 fashions. MiniMax staff studies a speedup of greater than 9× within the prefill stage and greater than 15× within the decoding stage at 1M-token context. Throughout a number of ablation research, MSA matched full consideration on nearly all of capabilities.

Coding and Agentic Benchmarks

Coding and agentic capabilities are key areas of enchancment for M3. The benchmark outcomes beneath are reported by MiniMax staff. A number of evaluations had been run on MiniMax inside infrastructure, whereas some comparability scores had been taken from official leaderboards or exterior benchmark sources, as famous in MiniMax’s methodology. SWE-Bench Verified was examined on inside infrastructure utilizing Claude Code scaffolding and averaged over 4 runs. SWE-Bench Professional was additionally examined on inside infrastructure utilizing Claude Code scaffolding, with testing logic aligned to the official analysis.

SWE-Bench Professional: 59.0% (surpasses GPT-5.5 and Gemini 3.1 Professional; approaches Opus 4.7)
Terminal-Bench 2.1: 66.0%
SWE-fficiency: 34.8%
KernelBench Exhausting: 28.8% (evaluated on NVIDIA Blackwell GPUs, CUDA functionality sm_120)
MCP Atlas: 74.2%
Claw-Eval: highest rating amongst fashions evaluated (Normal Activity Group, 161 duties)
SVG-Bench: surpasses Opus 4.7

On OmniDocBench, a multimodal doc understanding benchmark, M3 scores above Gemini 3.1 Professional. On OSWorld-Verified (361 samples), M3 achieves a 70.06% activity completion charge for laptop use (Max Steps = 200).

MiniMax additionally constructed an interactive consumer simulator framework for coaching and analysis. It simulates multi-turn developer collaboration: requirement elaboration, answer dialogue, feedback-based correction, steady activity switching, and multi-round undertaking iteration. That is meant to cut back the hole between single-turn benchmark efficiency and real-world, multi-turn developer workflows.

Native Multimodality

MiniMax M3 underwent mixed-modality coaching from step 0. Textual content, photographs, and video are skilled collectively from the start fairly than added post-training. MiniMax staff studies that interleaved knowledge — sequences the place textual content and pictures are naturally intermixed — is extra important to mannequin efficiency than generally assumed. After rebuilding your entire knowledge pipeline for interleaved codecs, coaching knowledge was scaled to the order of 100 trillion tokens.

MiniMax M3 helps picture and video enter and might function a desktop laptop.

Actual-World Activity Examples from MiniMax

MiniMax paperwork three inside duties within the launch publish:

Paper copy: MiniMax gave MiniMax M3 the ICLR 2025 Excellent Paper Award-winning paper Studying Dynamics of LLM Finetuning and requested it to breed the experiments independently. M3 ran autonomously for practically 12 hours, produced 18 commits and 23 experimental figures, and accomplished the core experiments with out human intervention. It required multimodal functionality to learn curves and formulation, lengthy context to carry the paper and experiment logs concurrently, and coding functionality to execute the copy throughout a protracted thread.

CUDA kernel optimization: MiniMax requested MiniMax M3 to optimize an FP8 matrix multiplication (GEMM) kernel on NVIDIA Hopper structure GPUs. The mannequin began with solely a activity description, a benchmark analysis script, and a non-functional Triton skeleton — no reference implementation was supplied. Over roughly 24 hours, MiniMax M3 made 147 benchmark submissions and 1,959 instrument calls. It progressed by baseline implementation, autotune configuration era, efficiency bottleneck prognosis, CUDA Graph integration, persistent kernel rewriting, and host-side scheduling optimization. After six landmark rounds of optimization, MiniMax M3 improved Hopper FP8 {hardware} peak utilization from 7.6% to 71.3%, a 9.4× speedup. The most effective answer appeared on the 145th submission. MiniMax notes that the majority different fashions stopped making new progress inside the first 30 submissions; solely Opus 4.7 and M3 continued past that time.

PostTrainBench (autonomous mannequin coaching): MiniMax gave MiniMax M3 4 base fashions that had accomplished pretraining solely. MiniMax M3 autonomously ran the complete knowledge synthesis → coaching → analysis → iteration cycle over 12 hours with no human intervention. The goal was for the bottom fashions to amass capabilities throughout mathematical reasoning (AIME2025), instrument calling (BFCL), scientific information reasoning (GPQA Most important), arithmetic reasoning (GSM8K), and code era (HumanEval). MiniMax M3 scored 0.37, beneath Opus 4.7 (0.42) and GPT-5.5 (0.39), however forward of the opposite fashions examined.

Marktechpost’s Visible Explainer

Overview

MiniMax M3: Frontier Coding, 1M-Token Context, Native Multimodality

MiniMax formally launched M3 on June 1, 2026. The API is reside now. Mannequin weights and technical report might be open-sourced inside 10 days.

M3 is the following mannequin within the M-series line after M2.7. MiniMax positions it as the primary open-weight mannequin to mix all three of the next in a single structure:

1M
Token Context Window

59.0%
SWE-Bench Professional Rating

MSA
Sparse Consideration Structure

70.06%
OSWorld-Verified (Pc Use)

Structure

MSA: MiniMax Sparse Consideration

Commonplace full consideration has quadratic computational complexity. As context size grows, compute value grows because the sq. of the sequence size. MSA is designed to resolve this on the operator degree.

In comparison with approaches like DSA and MoBA, MSA partitions the KV cache into blocks extra exactly, attaining larger efficient context protection.

MSA makes use of a “KV outer collect Q” method — every KV block is learn solely as soon as, reminiscence entry is contiguous, and arithmetic depth is considerably higher than frequent strategies.

>9×
Prefill Speedup at 1M ctx

>15×
Decoding Speedup at 1M ctx

1/20
Per-token compute vs M2 at 1M

>4×
Sooner than Flash-Sparse-Attn

Benchmarks

Coding and Agentic Efficiency

Outcomes reported by MiniMax. SWE-Bench Verified used Claude Code scaffolding, averaged over 4 runs. SWE-Bench Professional used Claude Code scaffolding, aligned to official analysis.

SWE-Bench Professional: 59.0% — surpasses GPT-5.5 and Gemini 3.1 Professional; approaches Opus 4.7
Terminal-Bench 2.1: 66.0%
SWE-fficiency: 34.8%
KernelBench Exhausting: 28.8% — evaluated on NVIDIA Blackwell GPUs (sm_120)
MCP Atlas: 74.2%
Claw-Eval: Highest rating amongst fashions evaluated (161 duties)
SVG-Bench: Surpasses Opus 4.7
OmniDocBench: Above Gemini 3.1 Professional
OSWorld-Verified: 70.06% — 361 samples, Max Steps = 200

Multimodality

Native Multimodal Coaching from Step 0

M3 underwent mixed-modality coaching from step 0. Textual content, photographs, and video are skilled collectively from the beginning — not added as a post-training functionality.

MiniMax studies that interleaved knowledge — sequences the place textual content and pictures are naturally intermixed — is extra important to mannequin efficiency than generally assumed.

After rebuilding your entire knowledge pipeline for interleaved codecs, coaching knowledge was scaled to the order of 100 trillion tokens.

Picture enter
Video enter
Desktop laptop operation (laptop use)

Actual-World Duties

Three Inside Duties Documented by MiniMax

Paper Copy — M3 reproduced the ICLR 2025 paper Studying Dynamics of LLM Finetuning autonomously over ~12 hours, producing 18 commits and 23 experimental figures with no human intervention.
CUDA Kernel Optimization — M3 optimized an FP8 GEMM kernel on NVIDIA Hopper GPUs over ~24 hours: 147 benchmark submissions, 1,959 instrument calls, 6 landmark optimization rounds. Improved Hopper FP8 peak utilization from 7.6% → 71.3% (9.4× speedup). Greatest answer appeared on submission 145.
PostTrainBench — M3 autonomously ran knowledge synthesis → coaching → analysis → iteration for 4 base fashions over 12 hours. Scored 0.37, beneath Opus 4.7 (0.42) and GPT-5.5 (0.39), however forward of different evaluated fashions. Targets: AIME2025, BFCL, GPQA Most important, GSM8K, HumanEval.

MiniMax Code

MiniMax Code: Agent Product Constructed and Skilled with M3

MiniMax Code is an agent product constructed and skilled along with M3. Out there at agent.minimaxi.com/obtain. Works with MiniMax Token Plans.

Agent Groups — a number of brokers run concurrent, multi-stage, dynamically adjustable workflows
Producer + Verifier loop — adversarial harness allows steady self-correction throughout execution
Pc use — M3’s native multimodal functionality allows cross-application desktop automation
Constructed on OpenCode and Pi — MiniMax states it plans to open-source MiniMax Code sooner or later

// Instance use case
Person (on cellphone): “Open the native ERP shopper
and batch-enter bill knowledge from this Excel file.”
→ MiniMax Code handles operations throughout
functions, information, and methods on desktop.

API & Pricing

API Particulars and Token Plan Tiers

The M3 API is reside at platform.minimax.io.

Pricing by enter size: Calls ≤512K tokens → normal charge. Calls >512K → larger long-context charge.

Pondering mode: Toggle on/off at request time. Each modes share the identical pricing.

Service tiers: normal (default) and precedence (service_tier=precedence) — precedence out there through gross sales, opening to all customers quickly.

Plus
~1.7B tokens/mo
$20/mo

Max
~5.1B tokens/mo
$50/mo

Extremely
~9.8B tokens/mo
$120/mo

Textual content, picture, speech, and music utilization all draw from the identical token pool.

Key Takeaways

What Engineers and Researchers Have to Know

MiniMax M3 launched June 1, 2026. API is reside. Open mannequin weights and technical report dedicated inside 10 days.
MSA delivers >9× prefill and >15× decoding speedup at 1M-token context vs M2, at 1/twentieth the per-token compute.
M3 scores 59.0% on SWE-Bench Professional, surpassing GPT-5.5 and Gemini 3.1 Professional.
Natively multimodal from step 0 — helps picture, video enter, and 70.06% on OSWorld-Verified for laptop use.
Pondering mode toggleable at request time. Token Plan begins at $20/month (~1.7B M3 tokens).

Key Takeaways

MiniMax M3 launched June 1, 2026; API is reside now. MiniMax has dedicated to releasing open mannequin weights and a technical report inside 10 days.
MSA (MiniMax Sparse Consideration) delivers greater than 9× prefill and greater than 15× decoding speedup at 1M-token context versus M2, at 1/twentieth the per-token compute.
M3 scores 59.0% on SWE-Bench Professional, surpassing GPT-5.5 and Gemini 3.1 Professional.
M3 is natively multimodal from step 0, supporting picture and video enter, and achieves 70.06% on OSWorld-Verified for laptop use.

Introducing MiniMax M3: The First Open-Weights Mannequin to Mix Three Frontier Capabilities

– Coding & Agentic Frontier: 59.0% SWE-Bench Professional, 66.0% Terminal Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench Exhausting, 74.2% MCP Atlas
– MiniMax Sparse Consideration scales context to 1M
-… pic.twitter.com/TF891iJukF

— MiniMax (official) (@MiniMax_AI) June 1, 2026

Try the Technical details. Additionally, be happy to comply with us on Twitter and don’t neglect to hitch our 150k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.

Have to accomplice with us for selling your GitHub Repo OR Hugging Face Web page OR Product Launch OR Webinar and so on.? Connect with us

Welcome to Ivugangingo!

At Ivugangingo, we're passionate about delivering insightful content that empowers and informs our readers across a spectrum of crucial topics. Whether you're delving into the world of insurance, navigating the complexities of cryptocurrency, or seeking wellness tips in health and fitness, we've got you covered.

MiniMax Releases MiniMax M3 with MSA Structure Supporting 1M-Token Context, Native Multimodality, and Agentic Coding

MSA: MiniMax Sparse Consideration

Coding and Agentic Benchmarks

Native Multimodality

Actual-World Activity Examples from MiniMax

Marktechpost’s Visible Explainer

MiniMax M3: Frontier Coding, 1M-Token Context, Native Multimodality

MSA: MiniMax Sparse Consideration

Coding and Agentic Efficiency

Native Multimodal Coaching from Step 0

Three Inside Duties Documented by MiniMax

MiniMax Code: Agent Product Constructed and Skilled with M3

API Particulars and Token Plan Tiers

What Engineers and Researchers Have to Know

Key Takeaways

Why the price of cyber legal responsibility insurance coverage is rising quickly

Do turmeric and curcumin have any actual well being advantages?

Converter

Editors Pick

Newsletter

Categories

Related Posts