The Silicon Trend Tech Bulletin


Voice Cloning: Next-generation AI Speech Synthesis

Published Thu, Sep 16 2021 05:10 am
by The Silicon Trend



Artificial intelligence (AI) speech synthesis is now swift and straightforward, and the result can, at least for a moment, fool anyone the audio is sent to. All we have to do is speak into a microphone and send the audio files for processing. Once the voice copy is ready, we receive a notification, after which we can type anything into the chat box and our AI clone will read it back as audio.


Voice Clone Enhancements & Controversies

With advances in machine learning (ML), voice cloning has improved significantly over the years. Nowadays, neural networks are trained on unsorted data of the target voice to generate raw audio. The quality is not yet top-notch, but it is expected to get better in the coming years. These deepfake technologies have become common enough that they are offered by specialist speech synthesis companies such as Respeecher and Resemble.AI, and they are also built into platforms such as Descript and Veritone.

A documentary about Anthony Bourdain prompted controversy in July when its creators revealed that they had used AI to clone Bourdain's voice. Many considered this an exploitative use of the technology. Last month, the startup Sonantic announced that it had created an AI voice clone of the actor Val Kilmer. Veritone launched a similar service this year, saying it will allow actors, influencers and athletes to license their AI voices.


Voice Rental with Minor Effort

The American actor Walter Bruce Willis has licensed his likeness for use in visual deepfakes in mobile adverts in Russia. Such deals aren't widespread yet, but they look like a good way for celebrities to make money. Audio and visual deepfakes offer economies of scale, letting celebrities capitalize on their fame with little effort.

The Overdub feature from the American company Descript lets podcasters create an AI deepfake of their own voice so that producers can make swift changes to the audio. The company's CEO, Andrew Mason, said, "You can not only delete words in Descript and have it delete the audio, but you can also type words, and it will generate audio in your voice." However, its voice deepfakes are certainly not perfect, as they cannot charge a line with emphasis and emotion.
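
To make the first half of Mason's description concrete, here is a minimal Python sketch of text-based audio editing, where deleting a word from the transcript also removes the matching stretch of audio. It is an illustration only and not Descript's implementation: the names `Word` and `delete_words` are hypothetical, and it assumes that some speech aligner has already produced a start and end time for every word. Generating new audio from typed words requires a trained voice model and is not covered here.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcript word with its position in the recording (hypothetical type)."""
    text: str
    start: float  # seconds from the beginning of the recording
    end: float    # seconds from the beginning of the recording


def delete_words(words, samples, sample_rate, indices):
    """Remove the words at `indices` from both the transcript and the audio.

    `words` must be sorted by time and must not overlap. Returns the
    remaining words (with timestamps shifted earlier to close the gaps)
    and the remaining audio samples.
    """
    drop = set(indices)
    kept_words = []
    kept_samples = []
    cursor = 0    # index of the first sample not yet copied or skipped
    shift = 0.0   # total duration removed so far, in seconds

    for i, word in enumerate(words):
        first = int(round(word.start * sample_rate))
        last = int(round(word.end * sample_rate))
        if i in drop:
            # Copy everything up to the deleted word, then skip over it.
            kept_samples.extend(samples[cursor:first])
            cursor = last
            shift += word.end - word.start
        else:
            kept_words.append(Word(word.text, word.start - shift, word.end - shift))

    # Copy whatever audio follows the last deleted word.
    kept_samples.extend(samples[cursor:])
    return kept_words, kept_samples


if __name__ == "__main__":
    # Toy data: ten "samples" at one sample per second, three words.
    audio = list(range(10))
    transcript = [Word("hello", 0, 3), Word("um", 3, 6), Word("world", 6, 10)]
    new_words, new_audio = delete_words(transcript, audio, 1, [1])
    print([w.text for w in new_words], new_audio)
```

In this toy run, deleting "um" from the transcript also drops the three samples it occupied and moves "world" three seconds earlier. A real editor would also need to smooth the cut so the join is not audible.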


Potential Risks

Amid the pandemic, much of life has moved online, and hacking risks are at a peak. Voice cloning technology carries similar risks. Fraudsters have started using these deepfakes to trick firms into transferring money to their accounts and for other malicious activities. If the way deepfakes have been used so far is anything to go by, fears of political misinformation have proven to be misplaced, but the technology has done severe damage through non-consensual pornography, and situations of this sort can pose enormous risks.




