VALL-E Wows with 3-Second Voice Cloning

I was just about to wrap up my day when I saw VALL-E. Wow!! Given a 3-second speech sample of a person, this model can synthesize speech from text in that same voice with unbelievable fidelity. It even preserves the speaker's emotion and the acoustic environment of the sample.

https://valle-demo.github.io

Discussion