GLM-4.7 vs MiniMax M2.1: A Developer Comparison
Zhipu AI (creators of GLM) and MiniMax occupy much the same niche in the LLM space: both build models for agentic use cases, where what the model does matters more than what it says. That makes it interesting that their latest releases landed so close together. As soon as they appeared on OpenRouter, I did some research and ran a few tests of my own. This post covers my findings, which should be useful for anyone deciding which one to use.
How I tested them:
I tasked both models with building a blackboard application. This wasn't a "one-shot", though. Testing them that way would be unfair, since most real-world developer use isn't one-shot applications; it's iterating toward something actually useful.
I looked for the following:
- The quality of their work: how well the application functioned and how closely it followed the plan I outlined
- The cost
- The speed: the real speed, measured by racing them against each other in real time via git worktrees and concurrent Kilo Code agents (a rough sketch of that setup follows this list)
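For anyone curious about the racing setup, here is a rough sketch of the idea in Python. The branch names, worktree paths, and the `my-agent` command are placeholders (in my case, a Kilo Code agent was attached to each worktree by hand), so treat this as an illustration of the structure rather than the exact commands I ran.

```python
import subprocess

# Hypothetical sketch of the race setup: one git worktree per model so two
# agents can work on the same repo in parallel without touching each other's
# files. Branch names, paths, and the agent command are placeholders.
WORKTREES = {
    "race/glm-4-7": "../race-glm",
    "race/minimax-m2-1": "../race-minimax",
}

procs = []
for branch, path in WORKTREES.items():
    # Create a fresh branch and working directory for this model.
    subprocess.run(["git", "worktree", "add", "-b", branch, path], check=True)
    # Launch the coding agent in that worktree; "my-agent" stands in for
    # however you start your agent of choice against that directory.
    procs.append(subprocess.Popen(["my-agent", "--model", branch], cwd=path))

# Wait for both so wall-clock completion times can be compared side by side.
for proc in procs:
    proc.wait()
```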
My findings:
Cost:
GLM-4.7 came in about 20% cheaper than MiniMax M2.1. That's not cost per million tokens, but cost for a similar outcome: building the entire application to my standards over multiple iterations.
Both were cheap; each finished the entire application for less than $1.00.
Speed:
GLM-4.7 was significantly faster on larger tasks, but once tasks were broken down into smaller steps, MiniMax M2.1 took the lead. Both models were very fast. In my experience, the only coding setup that was sometimes faster used Claude models, with an honorable mention for Composer 1.
Quality:
On quality, GLM-4.7 took the lead. The code it wrote was more modular and, as I had requested, structured for easy expansion. M2.1 seemed to disregard that part of my request and wrote whatever it wanted. GLM-4.7 also hit fewer bugs, with only two failed builds, while M2.1 failed about 15 times before the first prototype was functional.
In the planning process, I preferred M2.1. It was much nicer to talk to and understood my requirements much better. GLM-4.7 had such strong AI vibes that it was genuinely hard to talk to; I'll never use it outside of writing code. For asking questions and discussing plans, M2.1 is far better. It doesn't quite have the feel of Kimi K2, but it's usable, unlike GLM-4.7.
On debugging tasks, GLM-4.7 took the lead. It followed best practices throughout the debugging process, rarely getting sidetracked or going off the deep end. The same can't be said of M2.1; it got lost a few times, though on tasks where there was little room to overthink, it did well.
Conclusion:
Both models have their uses. As a coding agent, I'd use GLM-4.7. For small alterations to a codebase, I might use M2.1, but not if the codebase is poorly architected; GLM-4.7 is still the better choice there.
If you need the model to interact with users via text, GLM-4.7 is a huge mistake. In the context of a coding agent that occasionally asks a few questions, it's fine, but in something like a chatbot it's horrendous. M2.1 was decent at that, which gives it a wider range of applications. Both models call tools very well, so for a task where natural-language interaction with a user gets converted into some action in an application, M2.1 is a great choice: it's token-efficient and fast while being comfortable to talk to and reliable at calling tools. A sketch of what that kind of tool-calling setup looks like is below.
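To make that last point concrete, here is a minimal sketch of such a tool-calling flow against OpenRouter's OpenAI-compatible API. The `create_board` tool and the model slug are illustrative assumptions, not the real API of my blackboard app or a guaranteed model ID; the point is only that the model's reply carries a structured tool call your application can act on.

```python
import json
from openai import OpenAI

# Rough sketch only: assumes OpenRouter's OpenAI-compatible endpoint. The
# model slug and the create_board tool are illustrative; check the actual
# slug on openrouter.ai before using this.
client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-or-...")

tools = [{
    "type": "function",
    "function": {
        "name": "create_board",  # hypothetical application action
        "description": "Create a new blackboard with the given title.",
        "parameters": {
            "type": "object",
            "properties": {"title": {"type": "string"}},
            "required": ["title"],
        },
    },
}]

resp = client.chat.completions.create(
    model="minimax/minimax-m2.1",  # assumed slug, verify before use
    messages=[{"role": "user", "content": "Set up a board called Sprint 12."}],
    tools=tools,
)

# If the model decided to call the tool, the arguments arrive as a JSON string
# that the application can parse and act on.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```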