For years, developers and tech enthusiasts alike have been trying to recreate the human voice digitally. Google’s DeepMind has now reported a major breakthrough, claiming its new system closes the gap between existing text-to-speech technology and human speech by more than 50%. The British company is renowned for its relentless pursuit of ‘super’ artificial intelligence (AI) capabilities, paving the way for innovative strides in this field.
In a recent [post on their website](https://deepmind.com/blog/wavenet-generative-model-raw-audio/), DeepMind revealed their latest creation: an AI system that sounds almost indistinguishable from the human voice. Named WaveNet, this system models the raw audio waveform of speech directly, one sample at a time. The company also compared WaveNet’s performance with existing systems, including Google’s own. The results were striking: WaveNet reduced the gap between the best existing systems and human speech by more than 50%, bringing us one step closer to a hyper-realistic [text-to-speech](https://murf.ai/text-to-speech) future.
So, what does this mean for the future of technology? Let’s break it down. At its core, WaveNet’s aim isn’t simply to mimic the human voice. Instead, it learns how humans pronounce words in different languages and, from that learning, generates new speech of its own. As we inch closer each day to perfecting this AI technology, we can only imagine the advances it will bring to human and machine interaction.
To achieve its uncanny realism, WaveNet leans on massive sets of short human voice recordings. By learning from these recordings, the system develops the capability to form entirely new words. This breakthrough puts Google at the forefront of the AI industry and sets a new benchmark for rivals like Apple, whose plans for its digital assistant, Siri, remain relatively undisclosed.
But how does WaveNet stack up against other digital assistants like Apple’s Siri, Microsoft’s Cortana, or Amazon’s Alexa?
While these assistants are powered by artificial intelligence and can effectively handle human queries, they all use a process called ‘concatenative text-to-speech’ (TTS). In layman’s terms, this method relies on a large database of short speech fragments recorded from a single speaker, which are then recombined to form complete responses.
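The idea of recombining recorded fragments can be sketched in a few lines of Python. This is only a toy illustration, not how Siri, Cortana, or Alexa are actually implemented: the `UNIT_DB` table, its unit names, and the `synthesize` function are hypothetical, and short lists of numbers stand in for real audio recordings.

```python
from typing import Dict, List

# Hypothetical fragment database for ONE speaker: each key is a speech unit,
# each value stands in for a recorded audio fragment (a few sample values).
UNIT_DB: Dict[str, List[float]] = {
    "hel": [0.1, 0.3, 0.2],
    "lo": [0.4, 0.1],
    "world": [0.2, 0.5, 0.3, 0.1],
}


def synthesize(units: List[str], db: Dict[str, List[float]]) -> List[float]:
    """Build an utterance by joining the stored fragment for each unit."""
    audio: List[float] = []
    for unit in units:
        if unit not in db:
            # Nothing can be spoken unless it was recorded beforehand.
            raise KeyError(f"no recording for unit {unit!r}")
        audio.extend(db[unit])
    return audio


if __name__ == "__main__":
    print(synthesize(["hel", "lo", "world"], UNIT_DB))
```

Because the output is nothing more than stored fragments placed end to end, a different speaker or a different emphasis would need a different database.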
The current downside of these existing systems is their inability to express emotion or switch to a different speaker without recording a whole new database. This means that systems like Siri and Cortana can only convey what they have been programmed to say and can’t express human emotional tones. For example, to make a concatenative TTS system stress a particular word, a human would have to record every possible sound in several different ways, a tremendously daunting task.
This issue led to the creation of parametric TTS, which sits at the other extreme of the text-to-speech spectrum. The parametric model is a purely computer-generated method: rather than stitching together recorded fragments, it keeps everything needed to generate speech in the parameters of the model. DeepMind describes this as a model where the “contents and characteristics of the speech can be controlled via the inputs to the model.”
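To make "controlled via the inputs" concrete, here is a deliberately simplified Python sketch. It is not a real parametric TTS system (those drive a vocoder with many more parameters); it only shows audio being computed from input values instead of looked up in a recording database. The function name `synthesize_tone` and its parameters are assumptions made for this illustration.

```python
import math
from typing import List


def synthesize_tone(
    pitch_hz: float,
    duration_s: float,
    amplitude: float = 0.5,
    sample_rate: int = 16000,
) -> List[float]:
    """Generate a sine tone whose pitch, length, and loudness are inputs."""
    n_samples = round(duration_s * sample_rate)
    return [
        amplitude * math.sin(2 * math.pi * pitch_hz * n / sample_rate)
        for n in range(n_samples)
    ]


if __name__ == "__main__":
    # Changing an input changes the sound; no new recording is needed.
    low = synthesize_tone(pitch_hz=110.0, duration_s=0.5)
    high = synthesize_tone(pitch_hz=220.0, duration_s=0.5, amplitude=0.8)
    print(len(low), len(high))
```

The trade-off hinted at in the article follows from this: everything is adjustable, but the result is only as natural as the generating model.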
The novelty of WaveNet lies in its ability to learn from human recordings independently and create its own range of voices and words. WaveNet takes it a step further by learning realistic human aspects of speech such as pausing and taking a breath. It’s also capable of developing entirely new content that fits a different context from the original. This innovative approach heralds a whole new dimension to the future of AI, making interaction with machines more realistic and ‘human.’
Undoubtedly, WaveNet’s wide application potential is already causing a stir within the industry. Its realistic AI voice technology could eventually enhance digital assistant services, making our interactions with Siri, Alexa, and other devices more engaging and immersive.
Nevertheless, as with any innovation, it comes with its own set of challenges. At present, the main issue hindering WaveNet’s commercial debut is its high computational requirement, which can make real-time applications burdensome. However, with the rapid advancements in AI and computing, this hiccup is likely to be overcome sooner rather than later.
In an era when ‘Alexa’ and ‘Hey Siri’ have become commonplace, DeepMind’s achievements hint at an exciting future where AI systems could blend in seamlessly with human intelligence. Echoing the space and arms races of the ’60s and ’70s, we are on the cusp of an AI race, and with companies like DeepMind leading the charge, the future certainly looks promising.
So, as we march forward to an AI-integrated future, keep an ear open for the human-like voices of our machines – you might just get a surprise!
[DeepMind, the British AI firm responsible for the cutting-edge AlphaGo program, was acquired by Google in 2014.](https://en.wikipedia.org/wiki/DeepMind)