On April 28, SenseTime-W (00020) officially launched and open-sourced its SenseNova U1 series of native unified models for understanding and generation. The series is built on the company's self-developed NEO-unify architecture, introduced in March, which integrates multimodal comprehension, reasoning, and generation within a single model framework.

The NEO-unify architecture departs from mainstream "stitched-together" designs by eliminating visual encoders and variational autoencoders, instead constructing a unified representational space deeply embedded across all computational layers. This shift enables a paradigm leap from modal integration to native unification. The SenseNova U1 models directly model linguistic and visual information as a unified whole, facilitating efficient synergy between language and vision. This approach enhances both comprehension and generation capabilities while preserving semantic richness and maintaining pixel-level visual fidelity. In areas such as logical reasoning and spatial intelligence, the models demonstrate a deep understanding of complex physical-world layouts and intricate relationships.

Looking ahead, the technology is expected to serve as an embodied brain for robotics, enabling full-cycle operations within a single closed-loop model: from complex environmental perception and logical deduction to precise task execution. This advancement provides a critical foundation and key engine for driving technological and industrial development.