NHV-Sing Demo

Kiritan & Natsume

A comparison of speaker-specific training and M4Singer fine-tuning on Japanese singing voices. "Trained (speaker)": model trained exclusively on each speaker's data. "M4Singer fine-tuned": model fine-tuned on M4Singer (20 Mandarin singers), then applied to these speakers.

Speaker	Ground Truth	Trained (speaker)	M4Singer fine-tuned
Tohoku Kiritan (unseen)
Natsume Yuri

M4Singer Fine-tuning Samples

Samples from the M4Singer dataset (Zhang et al., NeurIPS 2022 · CC BY-NC-SA 4.0). Ground truth audio is included for research demonstration purposes with full attribution.
Note: The M4Singer fine-tuned model is designed as a general-purpose vocoder that can synthesize a wide pitch range across diverse speakers. However, as these samples suggest, better quality can be achieved by focusing on a specific speaker and a narrower pitch range.