Comparing speaker-specific training vs. M4Singer fine-tuning on Japanese singing voice.
"Trained (speaker)": model trained exclusively on each speaker's data.
"M4Singer fine-tuned": model fine-tuned on M4Singer (20 Mandarin singers), then applied to these speakers.
Columns: Speaker, Ground Truth, Trained (speaker), M4Singer fine-tuned
Speakers: Tohoku Kiritan (unseen), Natsume Yuri
M4Singer Fine-tuning Samples
Samples from the M4Singer dataset (Zhang et al., NeurIPS 2022 · CC BY-NC-SA 4.0).
Ground truth audio is included for research demonstration purposes with full attribution.
Note: The M4Singer fine-tuned model is designed as a general-purpose vocoder capable of synthesizing a wide pitch range across diverse speakers. However, as these samples suggest, better quality can be achieved by focusing on a specific speaker and narrower pitch range.