On the afternoon of October 15th, Giant Network Group Co., Ltd.'s AI Lab and the SATLab research team from Tsinghua University's Department of Electronic Engineering jointly released DiaMoE-TTS, a multi-dialect speech synthesis large-model framework, with its data, code, and methods fully open-sourced to promote fairness and accessibility in dialect speech synthesis.
In today's era of large model-driven speech synthesis, general TTS systems have demonstrated remarkable capabilities, but dialect TTS remains a "gray area" that is difficult for practitioners to enter. Existing industrial-grade models often rely on massive proprietary datasets, leaving dialect TTS practitioners and researchers with little recourse: they lack unified corpus construction methods and, more critically, an end-to-end open-source framework capable of handling multiple languages.
DiaMoE-TTS is an open-source, end-to-end solution that approaches the quality of industrial-grade dialect TTS models. Drawing on linguists' professional expertise, the team constructed a unified IPA (International Phonetic Alphabet) representation system and built the solution using only open-source dialect ASR data.
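The article does not detail how the unified IPA representation works internally. As a rough illustration only, a front-end of this kind might map each dialect's text into a shared IPA symbol inventory via per-dialect pronunciation lexicons, so one acoustic model can consume any dialect. All names below (`LEXICONS`, `to_ipa_tokens`) and the toy entries are hypothetical, not taken from the DiaMoE-TTS codebase:

```python
# Hypothetical sketch of a unified IPA front-end for multi-dialect TTS.
# Per-dialect lexicons map words to IPA-style token strings; because all
# dialects share one symbol inventory, downstream models see a single
# phoneme vocabulary regardless of the input dialect.

# Toy per-dialect lexicons (illustrative entries, not real project data).
LEXICONS = {
    "cantonese": {"你好": "n ei5 h ou2"},
    "sichuanese": {"你好": "n i3 h au3"},
}

def to_ipa_tokens(text: str, dialect: str) -> list[str]:
    """Look up each whitespace-separated word in the dialect's lexicon
    and return the flattened sequence of IPA tokens."""
    lexicon = LEXICONS[dialect]
    tokens: list[str] = []
    for word in text.split():
        tokens.extend(lexicon[word].split())
    return tokens
```

The same surface text yields dialect-specific token sequences drawn from one shared inventory, which is the property that lets a single model cover many dialects.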
Before launching Chinese dialect versions including Cantonese, Sichuanese, and Shanghainese, the research team had already validated the approach on English, French, German, and the Dutch Bilzen dialect, demonstrating the method's scalability and robustness across languages.
Giant Network Group Co., Ltd.'s AI Lab and Tsinghua University's SATLab hope this initiative will promote fairness and accessibility in dialect speech synthesis, enabling any researcher, developer, or language and cultural preservation worker to freely use, improve, and extend the framework. Their goal is to keep minority languages and dialects from being drowned out in the flood of general large models, allowing them to be more widely heard and preserved through the power of open source.