Giant Network AI Lab Unveils Multimodal Generation Breakthroughs, Plans to Open-Source Technologies

Deep News 2025-11-27

On November 27, the AI Lab of Giant Network Group Co., Ltd., in collaboration with Tsinghua University's SATLab and Northwestern Polytechnical University, announced three new multimodal generation technologies in the audio-visual domain. The research outcomes will be open-sourced progressively on platforms such as GitHub and Hugging Face.

The three technologies are:

1. **YingVideo-MV**: A music-driven video generation model that produces synchronized music video clips from just "one music track plus one character image." The model performs multimodal analysis of the music's rhythm, emotion, and structure, which lets it align camera movements (e.g., pans, zooms, tilts) with the audio. Its long-sequence consistency mechanism mitigates common problems in long videos, such as character distortion and frame jumps.

2. **YingMusic-SVC**: A zero-shot singing voice conversion model optimized for real-world music scenarios. It minimizes interference from accompaniments, harmonies, and reverb, reducing the risk of vocal breakage and pitch distortion while providing stable support for high-quality music reproduction.

3. **YingMusic-Singer**: A singing synthesis model that generates natural vocals with clear pronunciation and a stable melody from arbitrary input lyrics. It adapts flexibly to lyrics of varying length and supports zero-shot timbre cloning, which aids AI-assisted music creation and lowers the barrier to artistic production.

These advancements underscore the team's progress in multimodal audio-visual generation technologies.

