The AI that speaks like Bill Gates

Facebook AI Research engineers have developed a machine-learning system capable of mimicking Bill Gates' voice and intonation, with the aim of outperforming today's voice assistants.

Do you find the voices of virtual assistants somewhat disconcerting? Do they fail to inspire confidence? This could change. A Facebook research team has managed to overcome the limitations of today's computer voice systems by creating a technology that can imitate the voices of real people, according to MIT Technology Review.

Until artificial intelligence arrived in the field of voice generation, speech synthesizers did not create audio as such; they were limited to stitching together previously recorded phonemes. That changed in 2016 with the introduction of WaveNet, the machine-learning model that gives the Google Assistant its voice and that revolutionized text-to-speech systems.

One more step in converting text to speech

MelNet, the AI created by Sean Vasquez and Mike Lewis, could represent another qualitative leap in this area. What distinguishes this technology is that it uses a neural network trained on high-resolution spectrograms rather than the waveforms used so far.

While waveforms capture how a single parameter, the signal's amplitude, changes over time, spectrograms capture how that signal is distributed across a wide range of frequencies as time passes. This yields a data representation that carries much more information about the audio. The AI analyzes this information and tries to reproduce it, according to MIT Technology Review.
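
To make the difference between the two representations concrete, here is a minimal Python sketch that turns a waveform into a spectrogram using a short-time Fourier transform. It is only an illustration under stated assumptions: the function name `spectrogram`, the frame and hop sizes, and the 440 Hz test tone are all hypothetical choices, and the code produces a plain linear-frequency spectrogram, not the representation or pipeline MelNet actually uses.

```python
import numpy as np


def spectrogram(waveform, frame_len=256, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform (illustrative only)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(waveform) - frame_len) // hop
    # Slice the waveform into overlapping, windowed frames.
    frames = np.stack([
        waveform[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps the non-negative frequencies: frame_len // 2 + 1 bins per frame.
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq_bins, n_frames)


# Hypothetical input: one second of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
waveform = np.sin(2 * np.pi * 440 * t)  # waveform: amplitude over time

spec = spectrogram(waveform)  # spectrogram: energy per frequency bin over time

# The strongest frequency bin should sit close to the 440 Hz of the tone.
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sample_rate / 256
```

The waveform is a one-dimensional array of amplitudes, whereas the spectrogram is a two-dimensional array with frequency on one axis and time on the other, which is the kind of image-like structure a neural network can be trained on.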

The Facebook team has managed to train this technology to mimic the voice of Microsoft co-founder Bill Gates. MelNet was trained using around 425 hours of TED talks and a large number of audiobooks. The system has some limitations: it is not yet able to replicate how a voice's intonation varies over the course of a speech.

Voice assistants: a double-edged sword

This breakthrough, though revolutionary, poses some dangers. If this technology is capable of imitating the human voice, how will we tell a genuine speech from a fake one?

Fake news could be the big beneficiary of this powerful technology, through the dissemination of political speeches or news reports that do not correspond to what actually happened. From now on, we must pay attention to the veracity of both what we see and what we hear.

If you want to keep reading about artificial intelligence and virtual assistants, do not miss this post about Aura, Telefónica's artificial intelligence.