NVIDIA announced today that its Blackwell platform now supports the DeepSeek-V4-Pro and DeepSeek-V4-Flash AI models. Developers can deploy them through NVIDIA NIM microservices or use the SGLang and vLLM frameworks for customized inference.
The DeepSeek-V4-Pro model has 1.6 trillion total parameters, 49 billion of which are activated, and targets advanced reasoning tasks. DeepSeek-V4-Flash has 284 billion total parameters (13 billion activated) and is designed for high-speed, efficient applications.
Both models support a 1-million-token context window and a maximum output length of 384,000 tokens, covering core applications such as long-context coding and document analysis. Both models are released under the MIT open-source license.
Performance testing shows DeepSeek-V4-Pro exceeding 150 tokens per second per user out of the box on NVIDIA GB200 NVL72 systems. Using vLLM's Day 0 recipes, developers can quickly deploy the models on Blackwell B300 systems. Further gains are expected from deeper optimization of NVIDIA Dynamo, NVFP4, and CUDA kernels.
On the deployment side, SGLang offers three recipe types: low latency, balanced, and maximum throughput. vLLM supports multi-node scaling to more than 100 GPUs, with tool calling and speculative decoding.
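For illustration, a minimal sketch of the two framework paths described above. The Hugging Face model ID, GPU count, and ports below are assumptions for the sake of the example, not values published in the announcement:

```shell
# Hypothetical model ID -- substitute the actual DeepSeek-V4 repo name.
MODEL=deepseek-ai/DeepSeek-V4-Flash

# Option 1: vLLM, serving an OpenAI-compatible API on port 8000 (default),
# with tensor parallelism across 8 GPUs.
vllm serve "$MODEL" --tensor-parallel-size 8

# Option 2: SGLang, same model, tensor parallelism via --tp.
python -m sglang.launch_server --model-path "$MODEL" --tp 8 --port 30000

# Either server exposes an OpenAI-compatible endpoint; e.g. for vLLM:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"$MODEL\",
       \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}]}"
```

Which option fits depends on the workload: the SGLang recipes trade latency against throughput, while vLLM's multi-node scaling targets large serving fleets.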