VALL-E Wows with 3-Second Voice Cloning

6 January 2023 · 45 words · 1 min

I was just about to wrap up my day when I saw VALL-E. Wow!! This model takes a 3-second speech sample from a person and can synthesize speech from text in the same voice with unbelievable fidelity. It can even preserve the emotion and acoustic environment of the sample.

https://valle-demo.github.io
