VALL-E Wows with 3-Second Voice Cloning
Just about to wrap up my day and saw VALL-E! Wow!! This model takes a 3-second speech sample from a person and can synthesize text-to-speech in the same voice with unbelievable fidelity. It can even preserve the emotion and acoustic environment of the sample.