https://github.com/open-mmlab/Amphion
Tip revision: 6e9d34f498b41f12f889923566ddaafd6f50e8cc authored by zyingt on 23 February 2024, 15:03:35 UTC
Support Multi-speaker VITS (#131)
Support Multi-speaker VITS (#131)
Tip revision: 6e9d34f
README.md
# Amphion Text-to-Speech (TTS) Recipe
## Quick Start
We provide a **[beginner recipe](VALLE/)** to demonstrate how to train a cutting edge TTS model. Specifically, it is Amphion's re-implementation for [Vall-E](https://arxiv.org/abs/2301.02111), which is a zero-shot TTS architecture that uses a neural codec language model with discrete codes.
## Supported Model Architectures
Until now, Amphion TTS supports the following models or architectures,
- **[FastSpeech2](FastSpeech2)**: A non-autoregressive TTS architecture that utilizes feed-forward Transformer blocks.
- **[VITS](VITS)**: An end-to-end TTS architecture that utilizes conditional variational autoencoder with adversarial learning
- **[Vall-E](VALLE)**: A zero-shot TTS architecture that uses a neural codec language model with discrete codes.
- **[NaturalSpeech2](NaturalSpeech2)** (👨💻 developing): An architecture for TTS that utilizes a latent diffusion model to generate natural-sounding voices.
## Amphion TTS Demo
Here are some [TTS samples](https://openhlt.github.io/Amphion_TTS_Demo/) from Amphion.
