(FM) MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

About this listen

Join us to explore MiniMax-M1, a new release from MiniMax described as the world's first open-weight, large-scale hybrid-attention reasoning model. At its core, MiniMax-M1 combines a hybrid Mixture-of-Experts (MoE) architecture with a novel lightning attention mechanism, which together enable efficient scaling of test-time compute. The model natively supports a 1 million token context length, eight times the context window of DeepSeek R1, making it well suited to complex tasks that require processing extensive inputs and sustaining prolonged reasoning.
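
To make the efficiency point concrete, here is a minimal sketch of the linear (kernelized) attention idea that lightning attention builds on: instead of computing softmax(QKᵀ)V, which grows quadratically with sequence length, the computation is reordered as φ(Q)(φ(K)ᵀV), which grows linearly. The feature map, shapes, and the omission of causal masking and blockwise tiling are simplifying assumptions for illustration, not MiniMax's implementation.

```python
import torch

def linear_attention(q, k, v):
    """Illustrative linear attention: O(n) in sequence length instead of
    the O(n^2) cost of softmax attention. q, k, v: (batch, seq_len, dim).
    This is a sketch of the general kernel-attention idea, not the
    lightning attention kernel used in MiniMax-M1."""
    phi = torch.nn.functional.elu           # a common positive feature map (assumption)
    q, k = phi(q) + 1, phi(k) + 1
    kv = torch.einsum("bnd,bne->bde", k, v)                  # (dim, dim) summary of keys and values
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + 1e-6)  # per-query normalizer
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)

if __name__ == "__main__":
    q = torch.randn(2, 1024, 64)
    k = torch.randn(2, 1024, 64)
    v = torch.randn(2, 1024, 64)
    print(linear_attention(q, k, v).shape)  # torch.Size([2, 1024, 64])
```

Because the key-value summary has a fixed size regardless of sequence length, this style of attention is what makes very long contexts and long reasoning traces computationally tractable.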

MiniMax-M1 was trained with CISPO, a novel reinforcement learning algorithm that clips importance sampling weights rather than token updates, markedly improving RL efficiency: the model's full RL run completed in just three weeks on 512 H800 GPUs at a cost of $534,700. The model shows particular strength in complex software engineering, tool use, and long-context tasks, having been trained in diverse real-world software engineering environments. While its design and performance are described in detail, the provided sources do not explicitly discuss the model's limitations.
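
The distinction between clipping importance sampling weights and clipping token updates can be illustrated with a short sketch of a CISPO-style loss. The function name, epsilon values, and masking convention below are assumptions for illustration; the key idea reflected from the report is that the clipped ratio is detached (stop-gradient), so every token still contributes a gradient through its log-probability instead of being dropped by PPO-style update clipping.

```python
import torch

def cispo_style_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.2, mask=None):
    """Sketch of a CISPO-style objective (assumed form, not the exact
    implementation from the report). logp_new requires grad; logp_old
    and advantages are detached tensors of the same shape."""
    ratio = torch.exp(logp_new - logp_old)                               # importance sampling weight
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high).detach()  # clip the weight, stop gradient
    per_token = clipped * advantages * logp_new                          # gradient flows through log pi_theta
    if mask is not None:
        return -(per_token * mask).sum() / mask.sum()
    return -per_token.mean()
```

The intended effect is that tokens with large importance ratios are down-weighted rather than zeroed out, preserving gradient signal from rare but informative tokens during RL training.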

To learn more, explore the full technical report: https://arxiv.org/abs/2506.13585.
