About me
Ziqian Ning is a master's student in the Audio, Speech and Language Processing Laboratory at Northwestern Polytechnical University (ASLP@NWPU), Xi'an, China, supervised by Prof. Lei Xie. He is currently conducting research at the Netease Fuxi AI Lab. His research interests include voice conversion, text-to-speech, and audio/music generation.
Internships
- 2024.03 - 2024.09, Azure Speech, Microsoft, China.
- 2022.06 - 2024.03, Fuxi AI Lab, Netease, China.
- 2021.07 - 2021.09, TEG, Tencent, China.
Publications
Singing Voice Generation
- DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion, Ziqian Ning, Huakang Chen, Yuepeng Jiang, Chunbo Hao, Guobin Ma, Shuai Wang, Jixun Yao, Lei Xie. Under review.
- Drop the beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation, Ziqian Ning, Shuai Wang, Yuepeng Jiang, Jixun Yao, Lei He, Shifeng Pan, Jie Ding, Lei Xie. AAAI, 2025. Demo
- VITS-Based Singing Voice Conversion Leveraging Whisper and multi-scale F0 Modeling, Ziqian Ning, Yuepeng Jiang, Zhichao Wang, Bin Zhang, Lei Xie. ASRU, 2023. Demo
Voice Conversion (VC)
- Noise-Robust Expressive Zero-Shot Voice Conversion with Shortcut Models. Under review.
- StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching, Jixun Yao, Yuguang Yang, Yu Pan, Ziqian Ning, Jiaohao Ye, Hongbin Zhou, Lei Xie. AAAI, 2025.
- PromptVC: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts, Jixun Yao, Yuguang Yang, Yi Lei, Ziqian Ning, Yanni Hu, Yu Pan, Jingjing Yin, Hongbin Zhou, Heng Lu, Lei Xie. ICASSP, 2024. Demo
- Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features, Ziqian Ning, Qicong Xie, Pengcheng Zhu, Zhichao Wang, Liumeng Xue, Jixun Yao, Lei Xie, Mengxiao Bi. ICASSP, 2023. Demo
- Preserving Background Sound in Noise-Robust Voice Conversion via Multi-Task Learning, Jixun Yao, Yi Lei, Qing Wang, Pengcheng Guo, Ziqian Ning, Lei Xie, Hai Li, Junhui Liu, Danming Xie. ICASSP, 2023. Demo
Streaming Voice Conversion
- DistilVC: Leveraging Synthetic Data for End-to-end Low Latency Streaming Voice Conversion. Under review.
- DualVC 3: Leveraging Language Model Generated Pseudo Context for End-to-end Low Latency Streaming Voice Conversion, Ziqian Ning, Shuai Wang, Pengcheng Zhu, Zhichao Wang, Jixun Yao, Lei Xie, Mengxiao Bi. INTERSPEECH, 2024. Demo
- DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion, Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu, Shuai Wang, Jixun Yao, Lei Xie, Mengxiao Bi. ICASSP, 2024. Demo
- DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding, Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu, Jixun Yao, Shuai Wang, Lei Xie, Mengxiao Bi. INTERSPEECH, 2023. Demo
Speaker Anonymization
- NPU-NTU System for Voice Privacy 2024 Challenge, Jixun Yao, Nikita Kuzmin, Qing Wang, Pengcheng Guo, Ziqian Ning, Dake Guo, Kong Aik Lee, Eng-Siong Chng, Lei Xie. INTERSPEECH, 2025.
- Distinctive and Natural Speaker Anonymization via Singular Value Transformation-assisted Matrix, Jixun Yao, Qing Wang, Pengcheng Guo, Ziqian Ning, Lei Xie. TASLP.
- MUSA: Multi-lingual Speaker Anonymization via Serial Disentanglement, Jixun Yao, Qing Wang, Pengcheng Guo, Ziqian Ning, Yuguang Yang, Yu Pan, Lei Xie. Submitted to TASLP (under review).
Text-to-Speech
- FPO: Fine-grained Preference Optimization Improves Zero-shot Text-to-Speech, Jixun Yao, Yuguang Yang, Yuan Feng, Yu Pan, Ziqian Ning, Jianhao Ye, Hongbin Zhou, Lei Xie. Under review.
- Accent-VITS: accent transfer for end-to-end TTS, Linhan Ma, Yongmao Zhang, Xinfa Zhu, Yi Lei, Ziqian Ning, Pengcheng Zhu, Lei Xie. NCMMSC, 2023. Demo
Project Experience
- Singing Voice Conversion Challenge 2023
- Proposed a VITS-based singing voice conversion model that leverages Whisper bottleneck features as linguistic information and uses a PBTC module to extract multi-scale F0, better capturing pitch variation. The official evaluation results show that our system achieves human-level naturalness, ranking first in Task 1 and second in Task 2. Demo (an illustrative sketch of the multi-scale F0 conditioning is given after this list).
- Online text-to-speech synthesis system
- Developed a text-to-speech system providing high availability and scalability for online services. Models are encapsulated in separate microservices managed with Kubernetes, and Kafka handles inter-model messaging; the message queue makes it possible to run a large number of microservice replicas in parallel (see the worker sketch below).
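To make the multi-scale F0 modeling concrete, here is a minimal, illustrative PyTorch module in the spirit of a PBTC (parallel bank of transposed convolutions): quantized F0 is embedded, passed through several transposed-convolution branches with different strides, resampled back to the frame rate, and summed into one conditioning sequence. This is not the code of the challenge system; the bin count, hidden size, strides, and module name are assumptions made for illustration.

```python
# Illustrative sketch only, not the challenge-system code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleF0Encoder(nn.Module):
    def __init__(self, n_bins: int = 256, hidden: int = 192, strides=(1, 2, 4, 8)):
        super().__init__()
        self.embed = nn.Embedding(n_bins, hidden)  # quantized F0 -> embedding
        # one transposed-conv branch per temporal scale
        self.branches = nn.ModuleList(
            nn.ConvTranspose1d(hidden, hidden, kernel_size=2 * s, stride=s, padding=s // 2)
            for s in strides
        )
        self.proj = nn.Conv1d(hidden, hidden, kernel_size=1)

    def forward(self, f0_bins: torch.Tensor) -> torch.Tensor:
        # f0_bins: (batch, frames) integer pitch bins
        x = self.embed(f0_bins).transpose(1, 2)  # (batch, hidden, frames)
        frames = x.size(-1)
        out = 0
        for branch in self.branches:
            y = branch(x)  # stride s stretches the sequence to a coarser/finer view
            y = F.interpolate(y, size=frames, mode="linear", align_corners=False)
            out = out + y  # sum the multi-scale views
        return self.proj(out)  # conditioning sequence for the decoder

# toy usage: 100 frames of quantized F0 for one utterance
cond = MultiScaleF0Encoder()(torch.randint(0, 256, (1, 100)))
print(cond.shape)  # torch.Size([1, 192, 100])
```

In a VITS-style pipeline this conditioning sequence would typically be added to or concatenated with the linguistic (bottleneck) features before decoding; the exact fusion used in the actual system is not shown here.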
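The message-queue design of the online TTS service can likewise be sketched. Below is a minimal worker loop using the kafka-python client: each replica joins the same consumer group, so Kafka load-balances synthesis requests across replicas, and results are published to a reply topic. The topic names, broker address, JSON schema, and synthesize() function are hypothetical placeholders, not the production service.

```python
# Hypothetical worker sketch; topic names, schema and synthesize() are placeholders.
import json
from kafka import KafkaConsumer, KafkaProducer

REQUEST_TOPIC = "tts-requests"   # assumed topic name
RESULT_TOPIC = "tts-results"     # assumed topic name

def synthesize(text: str) -> bytes:
    """Placeholder for acoustic model + vocoder inference."""
    return text.encode("utf-8")  # stand-in for waveform bytes

consumer = KafkaConsumer(
    REQUEST_TOPIC,
    bootstrap_servers="kafka:9092",
    group_id="tts-workers",      # shared group: requests are load-balanced across replicas
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda m: json.dumps(m).encode("utf-8"),
)

for message in consumer:
    request = message.value                       # e.g. {"id": "...", "text": "..."}
    audio = synthesize(request["text"])
    producer.send(RESULT_TOPIC, {"id": request["id"], "audio_len": len(audio)})
```

Because every replica runs this same loop under one consumer group, scaling out is just a matter of increasing the Kubernetes replica count; Kafka handles the request distribution.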
Patents
- CN115910083A Real-time voice conversion method, device, electronic equipment and medium.
- CN116013336A Voice conversion method, device, electronic equipment and storage medium.
- CN116364099A Voice conversion method, device, electronic apparatus, storage medium, and program product.
- CN118136033A Method, device, electronic equipment and storage medium for converting drama voice.