This technology may be helpful for people attempting to communicate at noisy locations or events such as factories, conferences, or busy restaurants.
We’ve all tried to hold conversations in noisy settings. For some occupations, it comes with the territory: working on an airport tarmac, in an industrial machine shop, or on a trade show floor. Now, it appears artificial intelligence (AI) may have a fix. Researchers from the University of Washington have devised an AI system that can sift through a noisy soundscape and amplify the single voice a listener wants to hear.
The prototype AI system, designed to be loaded onto headphones or earbuds, lets users select the person they want to hear and amplifies that voice while canceling out all other sound. Called Target Speech Hearing, the software “could help wearers focus on specific voices in noisy environments, such as a friend in a crowd or a tour guide amid the urban hubbub.”
Essentially, the embedded AI can quickly learn the vocal range and patterns of the selected speaker. The software is designed so “the wearer looks at the target speaker for a few seconds to capture a single, short, highly noisy, binaural example of the target speaker,” according to an introductory paper prepared by its creators, led by Bandhav Veluri of the University of Washington.
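To make that two-step flow concrete, here is a minimal Python sketch of the enroll-then-extract idea. Everything in it is illustrative: the function names, the 16 kHz sample rate, and the trivial spectral “embedding” are assumptions standing in for the neural networks the team actually uses, which the article does not detail.

```python
import numpy as np

# Hypothetical sketch of the enroll-then-extract flow described above.
# Names and internals are illustrative placeholders, not the authors'
# actual models.

SAMPLE_RATE = 16_000  # assumed; the article does not state a sample rate

def enroll_speaker(binaural_clip: np.ndarray) -> np.ndarray:
    """Distill a short, noisy two-channel clip (captured while the wearer
    faces the target speaker) into a fixed-size speaker embedding.
    Stand-in: a normalized spectrum instead of a neural encoder."""
    mono = binaural_clip.mean(axis=0)        # collapse left/right channels
    spectrum = np.abs(np.fft.rfft(mono))
    return spectrum / (np.linalg.norm(spectrum) + 1e-8)

def extract_target(chunk: np.ndarray, embedding: np.ndarray) -> np.ndarray:
    """Keep the enrolled voice, suppress everything else. Placeholder
    pass-through; the real system runs a separation network conditioned
    on the embedding for every incoming chunk."""
    return chunk

# Enrollment: "a few seconds" of noisy binaural audio captured while the
# wearer looks at the target speaker (random noise stands in for a mic feed).
enrollment = np.random.randn(2, 3 * SAMPLE_RATE)
embedding = enroll_speaker(enrollment)
```

Once the embedding is captured, every subsequent chunk of incoming audio can be conditioned on it, which is what lets the system keep tracking the same voice without further input from the wearer.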
“Our system achieves a signal quality improvement of 7.01 dB using less than 5 seconds of noisy enrollment audio and can process 8 milliseconds of audio chunks in 6.24 milliseconds on an embedded CPU,” Veluri and his co-researchers state. The system works both indoors and outdoors, they add, and runs in real time on an embedded IoT CPU.
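Those numbers imply a strict real-time budget: at an assumed 16 kHz sample rate, an 8-millisecond chunk is just 128 samples, and each one must be fully processed before the next arrives. The hypothetical loop below illustrates that constraint; `process_chunk` is a placeholder, not the team’s model.

```python
import time
import numpy as np

SAMPLE_RATE = 16_000                    # assumed sample rate
CHUNK_MS = 8                            # chunk length reported by the team
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_MS // 1000  # 128 samples per 8 ms chunk

def process_chunk(chunk: np.ndarray) -> np.ndarray:
    """Stand-in for the separation model, which the team reports runs
    in 6.24 ms per 8 ms chunk on an embedded CPU."""
    return chunk                        # placeholder pass-through

def stream(chunks):
    """Process chunks one by one, enforcing the real-time budget: if a
    chunk takes longer to process than it lasts (8 ms), output falls
    behind input and latency grows without bound."""
    for chunk in chunks:
        start = time.perf_counter()
        out = process_chunk(chunk)
        elapsed_ms = (time.perf_counter() - start) * 1_000
        if elapsed_ms >= CHUNK_MS:
            raise RuntimeError("fell behind real time")
        yield out

# Simulated binaural input: one second of audio split into 8 ms chunks.
audio = np.random.randn(2, SAMPLE_RATE)
chunks = np.split(audio, SAMPLE_RATE // CHUNK_SAMPLES, axis=1)
for _ in stream(chunks):
    pass
```

At the reported 6.24 milliseconds of compute per 8-millisecond chunk, the system keeps roughly 1.76 milliseconds of headroom per chunk, which is what makes real-time operation on a modest embedded processor plausible.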
Target Speech Hearing’s AI can follow speakers as they move around the environment. And although the system was trained on synthetic data, the researchers write that this nonetheless “allows our system to generalize to real-world unseen target and interfering speakers and their head-related transfer functions.” They also introduce “a fine-tuning mechanism that addresses moving sources and sudden changes in the listener’s head orientation.”