Neural engineering

Decoder translates brain activity into speech

24 Apr 2019 Tami Freeman
First author Gopala Anumanchipalli holds an array of intracranial electrodes of the type used to record brain activity in the study. (Courtesy: UCSF)

Neurological conditions or injuries that result in the inability to communicate can be devastating. Patients with such speech loss often rely on alternative communication devices that use brain–computer interfaces (BCIs) or nonverbal head or eye movements to control a cursor to spell out words. While these systems can enhance quality of life, they produce only around 5–10 words per minute, far slower than the natural rate of human speech.

Researchers from the University of California San Francisco today published details of a neural decoder that can transform brain activity into intelligible synthesized speech at the rate of a fluent speaker (Nature 10.1038/s41586-019-1119-1).

“It has been a longstanding goal of our lab to create technology to restore communication for patients with severe speech disabilities,” explains neurosurgeon Edward Chang. “We want to create technologies that can generate synthesized speech directly from human brain activity. This study provides a proof-of-principle that this is possible.”

Chang and colleagues Gopala Anumanchipalli and Josh Chartier developed a method to synthesize speech using brain signals related to the movements of a patient’s jaw, larynx, lips and tongue. To achieve this, they recorded high-density electrocorticography signals from five participants undergoing intracranial monitoring for epilepsy treatment. They tracked the activity of areas of the brain that control speech and articulator movement as the volunteers spoke several hundred sentences.

To reconstruct speech, rather than transforming brain signals directly into audio signals, the researchers used a two-stage approach. First, they designed a recurrent neural network that decoded the neural signals into movements of the vocal tract. Next, these movements were used to synthesize speech.
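In outline, such a two-stage pipeline can be pictured as one recurrent network mapping recorded neural features to articulatory movements, and a second mapping those movements to acoustic features for a speech synthesizer. The sketch below is purely illustrative: the framework (PyTorch), layer sizes, electrode count and feature dimensions are assumptions, not details of the study's implementation.

```python
# Minimal sketch of a two-stage neural-to-speech decoder (illustrative only;
# dimensions and architecture details are assumptions, not the authors' code).
import torch
import torch.nn as nn

class NeuralToArticulation(nn.Module):
    """Stage 1: map ECoG features to vocal-tract (articulatory) kinematics."""
    def __init__(self, n_electrodes=256, n_articulatory=33, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(n_electrodes, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_articulatory)

    def forward(self, ecog):              # ecog: (batch, time, n_electrodes)
        h, _ = self.rnn(ecog)
        return self.out(h)                # (batch, time, n_articulatory)

class ArticulationToAcoustics(nn.Module):
    """Stage 2: map articulatory kinematics to acoustic features that a
    vocoder or synthesizer can turn into audible speech."""
    def __init__(self, n_articulatory=33, n_acoustic=32, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(n_articulatory, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_acoustic)

    def forward(self, kinematics):
        h, _ = self.rnn(kinematics)
        return self.out(h)

# Example forward pass with random data standing in for recorded signals.
ecog = torch.randn(1, 500, 256)           # 500 time steps of ECoG features
kinematics = NeuralToArticulation()(ecog)
acoustics = ArticulationToAcoustics()(kinematics)
print(acoustics.shape)                    # torch.Size([1, 500, 32])
```

The appeal of splitting the problem this way is that the intermediate articulatory representation is much lower-dimensional and more directly tied to the motor signals the brain actually produces than raw audio is.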

Synthesizing speech

“We showed that using brain activity to control a computer simulated version of the participant’s vocal tract allowed us to generate more accurate, natural sounding synthetic speech than attempting to directly extract speech sounds from the brain,” says Chang.

Clearly spoken

To assess the intelligibility of the synthesized speech, the researchers conducted listening tasks based on single-word identification and sentence-level transcription. In the first task, which evaluated 325 words, they found that listeners were better at identifying words as syllable length increased and the number of word choices (10, 25 or 50) decreased, consistent with natural speech perception.

Speech synthesis (Credit: Chang lab/UCSF)

For the sentence-level tests, the listeners heard synthesized sentences and transcribed what they heard by selecting words from a defined pool (of either 25 or 50 words) including target and random words. In trials of 101 sentences, at least one listener was able to provide a perfect transcription for 82 sentences with a 25-word pool and 60 sentences with a 50-word pool. The transcribed sentences had a median word error rate of 31% with a 25-word pool size and 53% with a 50-word pool.
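The word error rate used to score these transcriptions is the standard speech-recognition metric: the number of word substitutions, insertions and deletions needed to turn the listener's transcription into the target sentence, divided by the length of the target. A minimal illustration of the calculation (not the authors' evaluation code) is below.

```python
# Word error rate: word-level edit distance (substitutions + insertions +
# deletions) divided by the number of words in the reference sentence.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Example: one wrong word in a six-word sentence gives an error rate of ~17%.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```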

“This level of intelligibility for neurally synthesized speech would already be immediately meaningful and practical for real world application,” the authors write.

Restoring communication

While the above tests were conducted in subjects with normal speech, the team’s main goal is to create a device for people with communication disabilities. To simulate a setting where the subject cannot vocalize, the researchers tested their decoder on silently mimed speech.

For this, participants were asked to speak sentences and then mime them, making the same articulatory movements but without sound. “Afterwards, we ran our speech decoder to decode these neural recordings, and we were able to generate speech,” explains Chartier. “It was really remarkable that we could still generate audio signals from an act that did not create audio at all.”

So how can a person who cannot speak be trained to use the device? “If someone can’t speak, then we don’t have a speech synthesizer for that person,” says Anumanchipalli. “We have used a speech synthesizer trained on one subject and driven that by the neural activity of another subject. We have shown that this may be possible.”

“The second stage could be trained on a healthy speaker, but the question remains: how do we train decoder 1?” adds Chartier. “We’re envisioning that someone could learn by attempting to move their mouth to speak — although they cannot — and then via a feedback approach learn to speak using our device.”

The team now has two aims. “First, we want to make the technology better, make it more natural, more intelligible,” says Chang. “There’s a lot of engineering going on in our group to figure out how to improve it.” The other challenge is to determine whether the same algorithms used for people with normal speech will work in a population that cannot speak — a question that may require a clinical trial to answer.
