FWIW, it's not quite a model of your voice.

The way it works is, a model is trained on a huge range of voices. Then your specific voice is projected into that model's latent space.

That's why it can mimic your voice with only a few seconds of audio. It's not building a new model of you, but rather reusing an existing one.
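To make that concrete, here's a toy sketch of the idea, not any real system's code. Everything in it is made up for illustration: `make_pretrained_encoder`, `embed_voice`, `fake_clip`, the dimensions, and the random "features" standing in for real audio. The fixed weight matrix plays the role of the model that was already trained on many voices; your clip only gets turned into a point in its latent space.

```python
import math
import random

EMBED_DIM = 8     # hypothetical size of the latent (speaker) space
FEATURE_DIM = 16  # hypothetical size of one audio feature frame


def make_pretrained_encoder(seed=0):
    """Stand-in for a model already trained on many voices: a fixed weight matrix."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(FEATURE_DIM)] for _ in range(EMBED_DIM)]


def embed_voice(encoder, frames):
    """Project a short clip (a list of feature frames) to one unit-length latent vector."""
    # Average the frames over time, then apply the fixed projection.
    mean_frame = [sum(col) / len(frames) for col in zip(*frames)]
    vec = [sum(w * x for w, x in zip(row, mean_frame)) for row in encoder]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def fake_clip(speaker_seed, n_frames=50):
    """Made-up feature frames for one speaker: a per-speaker offset plus noise."""
    rng = random.Random(speaker_seed)
    base = [rng.gauss(0, 1) for _ in range(FEATURE_DIM)]
    return [[b + rng.gauss(0, 0.1) for b in base] for _ in range(n_frames)]


encoder = make_pretrained_encoder()
weights_before = [row[:] for row in encoder]

# A few seconds of "your" audio becomes a single point in latent space.
your_embedding = embed_voice(encoder, fake_clip(speaker_seed=42))

# No training happened: the existing model's weights are untouched.
assert encoder == weights_before
print(len(your_embedding))
```

The point of the sketch is just that the only per-person artifact is the small embedding vector; the model's weights are the same before and after it hears you.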

It may seem like a pedantic distinction, but it's why the model isn't as worrisome as it seems. It can't target you specifically, just the average voice near yours.

It's closer to a really talented parrot than a model that can impersonate you on command. I suspect if you try it out, you'll be surprised by how far off it is from your actual voice.