NVIDIA announced today that its Blackwell platform now supports the DeepSeek-V4-Pro and DeepSeek-V4-Flash AI models. Developers can deploy them through NVIDIA NIM microservices or use the SGLang and vLLM frameworks for customized inference.
The DeepSeek-V4-Pro model has 1.6 trillion total parameters, 49 billion of which are activated, and targets advanced reasoning tasks. DeepSeek-V4-Flash has 284 billion total parameters (13 billion activated) and is designed for high-speed, efficient applications.
Both models support a 1-million-token context window and a maximum output length of 384,000 tokens, covering core applications such as long-context coding and document analysis. Both models are released under the MIT open-source license.
Performance testing shows DeepSeek-V4-Pro exceeding 150 tokens per second per user out of the box on NVIDIA GB200 NVL72 systems. Using vLLM's Day 0 recipes, developers can quickly deploy the models on Blackwell B300 systems. Further gains are expected from deeper optimization of NVIDIA Dynamo, NVFP4, and CUDA kernels.
On the deployment side, SGLang offers three recipe types: low latency, balanced, and maximum throughput. vLLM supports multi-node scaling to more than 100 GPUs, with tool calling and speculative decoding.
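For illustration, a minimal sketch of the two framework paths described above. The Hugging Face model ID, GPU count, and ports below are assumptions for the sake of the example, not values published in the announcement:

```shell
# Hypothetical model ID -- substitute the actual DeepSeek-V4 repo name.
MODEL=deepseek-ai/DeepSeek-V4-Flash

# Option 1: vLLM, serving an OpenAI-compatible API on port 8000 (default),
# with tensor parallelism across 8 GPUs.
vllm serve "$MODEL" --tensor-parallel-size 8

# Option 2: SGLang, same model, tensor parallelism via --tp.
python -m sglang.launch_server --model-path "$MODEL" --tp 8 --port 30000

# Either server exposes an OpenAI-compatible endpoint; e.g. for vLLM:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"$MODEL\",
       \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}]}"
```

Which option fits depends on the workload: the SGLang recipes trade latency against throughput, while vLLM's multi-node scaling targets large serving fleets.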