Guolian Minsheng Securities: Rising Significance of Model Unit Costs, Multimodal and "Visual Execution" Take Center Stage

Stock News 02-04 14:27

In traditional conversational paradigms, a single interaction typically requires only a few model calls; under the workflow paradigm, however, a single task often spans multiple stages, including planning, retrieval, tool invocation, validation, error correction, and writing to external systems. Compared with basic chat functions, agent services designed for complex tasks may consume dozens of times more tokens, which makes model unit costs increasingly important. In the Agent era, large language models are evolving from "chat tools" into "autonomous employees." Large model vendors that master core algorithms and industry interfaces are poised to benefit substantially from the dividends of general-purpose intelligence, and the firm recommends focusing on the "large model twin stars" MiniMax-WP (00100) and Knowledge Atlas (02513). Guolian Minsheng Securities' primary views are as follows.

Event: As of February 2, 2026, Clawdbot had garnered over 130,000 stars on the code hosting platform GitHub, and its official website had accumulated more than 2 million visits, making it one of the fastest-growing open-source technology projects in recent months. In addition, "AI-only communities" such as Moltbook have emerged and rapidly amassed millions of agent accounts, which naturally means higher request density and more frequent API triggers. The most direct external effect is a step-change increase in API call frequency and token throughput. Strongly recommended by Clawdbot founder Peter Steinberger, the M2.1 model from the domestic AI unicorn MiniMax, known for its proficiency in long-text processing and logical reasoning, has gained significant traction.

The importance of model unit cost is rising. Under the traditional dialogue paradigm, a single interaction requires only a few model calls; in the workflow paradigm, a task often spans planning, retrieval, tool invocation, validation, error correction, and writing to external systems. This leads to a multiplicative increase in model call frequency, context length, and the complexity of intermediate information. Multi-step reasoning and multi-round tool invocation inherently create "multi-turn contexts," while retries and self-correction generate additional wasted tokens. Compared with basic chat, agent services for complex tasks might consume dozens of times more tokens. Consequently, "model unit cost × unit output" becomes the critical "life-or-death line" for deploying Agent-type products at scale, because multi-round reasoning and tool collaboration amplify costs throughout task execution. This is why Clawdbot's founder explicitly recommended MiniMax: the M2.1 model's characteristics of "combining efficiency and cost advantages, strong long-text capability, and reasoning and programming abilities" align with the current needs of many users.
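As a rough illustration of the token arithmetic described above, the following Python sketch compares a single chat call with a multi-step agent task. Every number and name here is a hypothetical assumption chosen for illustration (the function names, the 1,000-token starting context, the 500-token step output, the 20% retry overhead); none of them comes from the report or from any vendor's pricing.

```python
def chat_tokens(prompt_tokens: int, reply_tokens: int) -> int:
    """Tokens processed in a single-call chat turn (prompt plus reply)."""
    return prompt_tokens + reply_tokens


def agent_tokens(base_context: int, step_output: int, steps: int,
                 retry_rate: float = 0.0) -> float:
    """Tokens processed in a multi-step agent task.

    Assumption: each step re-reads the whole accumulated context and then
    appends its own output to it. retry_rate is the assumed share of extra
    work spent on retries and self-correction.
    """
    total = 0
    context = base_context
    for _ in range(steps):
        total += context + step_output   # read the context, write this step's output
        context += step_output           # the output becomes context for later steps
    return total * (1 + retry_rate)


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    chat = chat_tokens(prompt_tokens=1_000, reply_tokens=500)
    agent = agent_tokens(base_context=1_000, step_output=500, steps=10, retry_rate=0.2)
    print(f"chat: {chat} tokens, agent: {agent:.0f} tokens, ratio: {agent / chat:.0f}x")
```

With these made-up inputs, a ten-step task processes roughly 30 times the tokens of a single chat turn, which is in line with the "dozens of times" order of magnitude cited in the report. Because every step re-reads the accumulated context, the total in this sketch grows faster than the step count alone, so the unit price per token weighs heavily on the economics of an agent product.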

Combining efficiency and cost: The M2.1 model targets the high token costs that developers face in automated programming by offering an aggressive price advantage: it is priced at approximately 8% of Claude Sonnet. In addition, its Coding Plan introduced a high-frequency refresh mechanism in which the quota resets every 5 hours, breaking with the industry-standard daily or monthly quota models and unlocking productivity in high-frequency, intensive development scenarios. On billing, unlike the per-token pay-as-you-go model common among foundation model vendors, the company adopts a tiered monthly subscription system.

Strong long-text capability: In real workflows, continuously evolving context usually includes tool calls, historical information, retrieved snippets, constraints, and more. The M2.1's long-text capability makes it more suitable for achieving "continuous memory," meaning it can read longer documents, accommodate more intermediate results, and reduce logical breaks caused by truncation.

Reasoning and programming ability: In products like Clawdbot that emphasize automated execution and closed-loop error correction, the model is used for writing code, modifying code, making judgments, and performing validations. The M2.1's "sufficient and highly cost-effective" reasoning and programming capabilities make it the most suitable choice for integration into production systems and for high-frequency invocation.

Guolian Minsheng Securities points out that in the Agent era, "which is smarter" is certainly important, but more critical is "who can transform strong capabilities into high-frequency, usable productivity at a lower cost," which is MiniMax's advantage.

Multimodal and "visual execution" come to the forefront. As Agents enter office and production scenarios, input is no longer primarily from pure text but largely comes from visual information such as screenshots, PDFs, tables, charts, and interface elements. In executable workflows like Clawdbot, users input not only structured text but also accompanying screenshots, web interfaces, error pop-ups, tables/charts, or PDF pages. MiniMax's multimodal capabilities assist Agents in better understanding interfaces, extracting key information, outputting executable steps/code, and then using screenshot re-reading for validation and error correction. This enables Clawdbot to perform "visually-driven automation": for example, automatically filling out forms after recognizing table fields, locating causes and modifying scripts after reading error screenshots, extracting data from charts and writing it into reports, and comparing before-and-after screenshots to confirm task completion. Leveraging its own multimodal capabilities, MiniMax can better complete service closed loops, reduce manual paraphrasing, enable rapid error correction, and achieve stronger deliverability.

Risk warning: Uncertainty in technological development routes; intensifying industry competition.

Disclaimer: Investing carries risk. This is not financial advice. The above content should not be regarded as an offer, recommendation, or solicitation on acquiring or disposing of any financial products, any associated discussions, comments, or posts by author or other users should not be considered as such either. It is solely for general information purpose only, which does not consider your own investment objectives, financial situations or needs. TTM assumes no responsibility or warranty for the accuracy and completeness of the information, investors should do their own research and may seek professional advice before investing.
