Morgan Stanley’s View on DeepSeek: Substituting Memory for Compute, Achieving More with Less!


DeepSeek is rewriting the rules of AI scaling: the decisive factor for next-generation AI is no longer simply amassing larger GPU clusters, but rather employing smarter hybrid architectures that use more cost-effective DRAM to replace scarce HBM resources.

According to information from Zhui Feng Trading Desk, a research report released by Morgan Stanley on January 21 argues that DeepSeek is changing how large language models are built through an innovative module named "Engram." The core breakthrough lies in separating storage from computation: by introducing a "Conditional Memory" mechanism, the architecture significantly reduces demand for expensive and scarce High Bandwidth Memory (HBM), instead using lower-cost general system memory (DRAM) to handle complex inference tasks.

Analyst Shawn Kim and his team at Morgan Stanley believe that DeepSeek demonstrates the philosophy of "Doing More With Less." This technical path of decoupling storage and computation not only alleviates the AI computing constraints faced by China but also proves to the market that efficient hybrid architectures are the next frontier for AI.

This architecture, a key focus for Morgan Stanley, originates from a paper titled "Conditional Memory via Scalable Lookup," released on January 13 by a team including DeepSeek founder Liang Wenfeng and collaborators from Peking University. In this paper, the team proposed the "Engram" module for the first time.

Farewell to Brute-Force Computing: The Engram Module and "Conditional Memory"

Morgan Stanley's report points out that current Transformer models are highly inefficient at memorizing and recalling simple, static facts. In a traditional model, processing even a trivial query such as "London is in the UK" requires expensive computation through multiple attention and feedforward layers just to reconstruct the fact, wasting precious GPU resources on information that never changes.

DeepSeek's solution is the "Conditional Memory" principle, embodied in the Engram module. The core of the architecture is separating the storage of static patterns from dynamic reasoning: rather than loading all information into expensive HBM at once, the model offloads its "library" or "dictionary" of static knowledge to CPU-attached system memory (DRAM) and retrieves entries only when needed.
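The article does not reproduce the paper's implementation, but the lookup pattern it describes can be illustrated with a minimal PyTorch sketch. Everything below (the EngramTable name, the bigram-hash keying, the tensor sizes) is an illustrative assumption, not DeepSeek's actual code: the static table lives in host DRAM, and only the rows a query touches are copied into GPU HBM.

```python
# Minimal sketch of "conditional memory": a large static lookup table
# lives in ordinary CPU DRAM; only the few rows a query actually needs
# are copied to GPU HBM. All names and sizes here are illustrative
# assumptions, not the paper's actual design.
import torch

class EngramTable:
    def __init__(self, num_slots: int, dim: int):
        # The "library": kept in host DRAM (pinned if CUDA is present),
        # never fully resident in HBM. At scale, 100B parameters at
        # 2 bytes each would make this table roughly 200 GB.
        self.table = torch.empty(num_slots, dim, dtype=torch.bfloat16,
                                 pin_memory=torch.cuda.is_available())
        self.num_slots = num_slots

    def keys_for(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Hash adjacent token pairs (bigrams) into table slots.
        # A real system would use a learned or collision-managed scheme.
        a, b = token_ids[:-1].long(), token_ids[1:].long()
        return (a * 1_000_003 + b) % self.num_slots

    def lookup(self, token_ids: torch.Tensor, device: str) -> torch.Tensor:
        rows = self.table[self.keys_for(token_ids)]   # gather in DRAM
        return rows.to(device, non_blocking=True)     # ship only hits to HBM

# Toy usage: a tiny table standing in for a multi-hundred-GB one.
mem = EngramTable(num_slots=1_000_000, dim=64)
ids = torch.tensor([101, 2414, 2003, 1999, 1996, 2866])  # e.g. "London is in the UK"
device = "cuda" if torch.cuda.is_available() else "cpu"
retrieved = mem.lookup(ids, device)   # shape (5, 64), now on the GPU
```

The point of the pattern is that HBM holds only the handful of retrieved rows per query, while the multi-hundred-gigabyte table sits in commodity DRAM.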

Analysts at Morgan Stanley emphasized in the report: "DeepSeek's separation of 'conditional memory' from computation unlocks new levels of efficiency for Large Language Models (LLMs). Engram is a method to efficiently 'look up' basic information without overloading HBM, thereby freeing up capacity for more complex reasoning tasks." This design directly addresses the most expensive bottleneck in current AI infrastructure – HBM. By reducing the occupancy of HBM, DeepSeek demonstrates that improving efficiency within existing GPU and system memory architectures can effectively reduce the need for costly hardware upgrades.

Infrastructure Economics: Reducing HBM Reliance, Amplifying DRAM Value

The most direct impact of this technological shift is the reshaping of hardware cost structures. Morgan Stanley notes that the Engram architecture minimizes the demand for high-speed memory (HBM) by separating static pattern storage from dynamic computation. This implies that infrastructure costs may shift from expensive GPUs towards more cost-effective memory (DRAM).

The report provides a detailed breakdown of the data implications: "Although the paper does not explicitly state it, a 100-billion-parameter (100B) Engram (assuming 2 bytes per parameter under FP16/BF16) implies a minimum requirement of approximately 200GB of system DRAM." By comparison, NVIDIA's Vera Rubin systems are equipped with 1.5TB of DRAM per CPU, from which the analysts calculate that DeepSeek's architecture suggests "the usage of commoditized DRAM per system will increase by approximately 13%." The investment logic behind this shift is clear (a quick sanity check of the arithmetic follows the list below):

Cost Structure Shift: Infrastructure costs may shift from GPUs towards memory.

Value for Money is King: Configurations with moderate compute but massive memory might offer higher "performance per dollar" than pure GPU scaling.

Memory Value Reassessment: The improvement in inference capability surpasses the gains in knowledge acquisition, indicating that the value of memory has extended beyond computation itself.
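The report's figures can be sanity-checked in a few lines. The numbers below are exactly the ones quoted above (100B parameters, 2 bytes each, 1.5TB of DRAM per Vera Rubin CPU); nothing else is assumed.

```python
# Sanity check of the report's DRAM arithmetic, using its own figures.
params = 100e9            # 100B-parameter Engram table
bytes_per_param = 2       # FP16/BF16
engram_dram_gb = params * bytes_per_param / 1e9
print(engram_dram_gb)     # 200.0 -> ~200 GB of system DRAM, minimum

system_dram_gb = 1.5e3    # 1.5 TB of DRAM per CPU (Vera Rubin, per the report)
uplift = engram_dram_gb / system_dram_gb
print(f"{uplift:.0%}")    # 13% -> ~13% more commodity DRAM per system
```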

"Innovation Induced by Constraints": The Path for Chinese AI's Breakthrough Data from Morgan Stanley shows that, despite constraints in advanced computing power, hardware access, and training scale, leading Chinese AI models have rapidly narrowed the performance gap with global frontier models (such as ChatGPT 5.2) over the past two years. DeepSeek V3.2 performed excellently in standardized benchmarks, with an MMLU score of approximately 88.5% and coding capability (SWE-Bench) around 72%, demonstrating strong competitiveness in reasoning and efficiency.

The report attributes this phenomenon to "constraint-induced innovation." The development of Chinese AI no longer relies solely on the brute-force growth of parameters but is turning towards algorithmic efficiency, system design, and deployment pragmatism. Morgan Stanley points out: "DeepSeek is proving that the next leap in AI capability might not come from more GPUs, but from learning to think within constraints."

Morgan Stanley analysts stated: "Strategically, this suggests that the progress of Chinese AI may increasingly depend not on directly narrowing the hardware gap, but on algorithmic and system-level innovations that bypass hardware bottlenecks."

Looking Ahead: Running Large Models on Consumer-Grade Graphics Cards?

The report offers eye-catching predictions for DeepSeek's next-generation model, V4. Morgan Stanley anticipates that, leveraging the Engram memory architecture, V4 will deliver a significant leap upon release, particularly in coding and reasoning capabilities.

What captures even more market attention is its potential to lower hardware barriers. Morgan Stanley wrote in the report: "Like its predecessor, the model is highly likely to run on consumer-grade hardware; a card such as the RTX 5090 might be sufficient." This implies that the marginal cost of high-end AI inference will fall further, enabling AI applications to be deployed more widely without complete reliance on expensive data-center-grade GPU clusters.
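For a sense of why offloading matters here, consider a rough back-of-envelope. The split below (20B "active" parameters kept on the GPU, a 32GB VRAM budget for an RTX 5090, 4GB of runtime overhead) is an illustrative assumption, not a figure from the report or the paper.

```python
# Back-of-envelope: does the GPU-resident part of a model fit on a
# consumer card once the static Engram table is offloaded to DRAM?
# All sizes below are illustrative assumptions.
def fits_on_gpu(active_params_b: float, bytes_per_param: int,
                vram_gb: float, overhead_gb: float = 4.0) -> bool:
    """overhead_gb loosely covers KV cache, activations, and CUDA runtime."""
    weights_gb = active_params_b * bytes_per_param
    return weights_gb + overhead_gb <= vram_gb

# Hypothetical split: ~20B "active" parameters stay in HBM, while a
# 100B Engram lookup table lives in ~200 GB of host DRAM.
print(fits_on_gpu(active_params_b=20, bytes_per_param=1, vram_gb=32))  # True  (INT8)
print(fits_on_gpu(active_params_b=20, bytes_per_param=2, vram_gb=32))  # False (BF16)
```

Under these assumptions, the GPU only needs to hold the dynamic, compute-heavy portion of the model, which is what makes a consumer card plausible at all.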

Based on the above technical trends, Morgan Stanley reiterated its positive outlook on the themes of memory and semiconductor equipment localization in China. The report clearly states: "By decoupling memory from computation, China is building LLMs that are not only smarter but also structurally more efficient." Although the scale of the Chinese AI market remains only a fraction of that in the US, its spending and adoption momentum suggest that the upside potential might be underestimated.

Disclaimer: Investing carries risk. This is not financial advice. The above content should not be regarded as an offer, recommendation, or solicitation to acquire or dispose of any financial products; nor should any associated discussions, comments, or posts by the author or other users be considered as such. It is provided for general information purposes only and does not take into account your investment objectives, financial situation, or needs. TTM assumes no responsibility or warranty for the accuracy or completeness of the information; investors should do their own research and may seek professional advice before investing.
