Voice Conversion With Just Nearest Neighbors

TL;DR: want to convert your voice to another person’s voice? Or even to a whisper? Or a dog barking? Or to any other random speech clip? Give our new method a try: https://bshall.github.io/knn-vc

Longer version: our research team kept seeing new voice conversion methods become more complex and harder to reproduce, so we wanted to see whether a top-tier voice conversion model could be extremely simple. The result is kNN-VC, where the entire conversion model is just k-nearest neighbors on WavLM features. It turns out that this performs as well as, if not better than, much more complex any-to-any voice conversion methods. What’s more, since k-nearest neighbors has no trained parameters, we can use anything as the reference, even clips of dogs barking, music, or speech in other languages.
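For anyone who wants to see the core idea in code, here is a rough sketch of nearest-neighbor matching on frame-level features. It is only an illustration under stated assumptions, not our released implementation: random vectors stand in for WavLM features, and the function names, the choice of cosine similarity, k=4, and plain averaging of the neighbors are all picked for the example. Turning the matched features back into audio would still need a vocoder, which is not shown.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_convert(source, reference, k=4):
    """Replace each source frame with the mean of its k most similar reference frames.

    Hypothetical helper: `source` and `reference` are lists of feature
    vectors (one vector per frame). The similarity measure and the
    uniform averaging are assumptions made for this sketch.
    """
    converted = []
    for frame in source:
        # Rank all reference frames by similarity to this source frame.
        nearest = sorted(
            reference,
            key=lambda ref: cosine_similarity(frame, ref),
            reverse=True,
        )[:k]
        # Average the selected reference frames dimension by dimension.
        converted.append([sum(vals) / len(nearest) for vals in zip(*nearest)])
    return converted


# Toy stand-ins for real features (assumed sizes, chosen small for speed).
random.seed(0)
source_feats = [[random.gauss(0, 1) for _ in range(8)] for _ in range(6)]
reference_feats = [[random.gauss(0, 1) for _ in range(8)] for _ in range(20)]

converted = knn_convert(source_feats, reference_feats, k=4)
print(len(converted), len(converted[0]))
```

Every output frame is built only from reference frames, which is why the reference can be any audio at all: nothing in this step is trained.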

I hope you enjoy our research! We provide a quick-start notebook, code, audio samples, and vocoder checkpoints: https://bshall.github.io/knn-vc/

Voice Conversion With Just Nearest Neighbors (arxiv.org)
