A woman wearing headphones and talking into a microphone.

Text-to-speech technology helps produce more audiobooks for people who are blind or have low vision

AI and the cloud generate a synthetic voice that sounds a lot like a Beijing broadcaster.

When Lina Dong lost her sight at age 10, she was shut out from the visual world around her and also from the imaginary one she had enjoyed through reading.

Undeterred, she kept up her schoolwork with the help of others who read textbooks aloud to her. Over the years, she gained self-confidence, graduated from college and became a certified broadcaster, a first for a blind person in China.

Dong now teaches language arts at the Beijing Hongdandan Visually Impaired Service Center — a nonprofit educational institution where she once took classes. Knowing firsthand the importance of the spoken word to learning, she also makes audiobooks for her students and others who are blind or have low vision.

But production can be slow and limited. She must train volunteers in studio techniques and how best to read printed text so listeners can readily understand. Recording sessions and editing can go on for hours.

Now Hongdandan and Microsoft have developed a new way — using artificial intelligence (AI) and the cloud to create a synthetic version of Dong’s voice.

Recently, she happily offered up samples of her speech so that Custom Neural Voice, a new text-to-speech capability in Microsoft Azure Cognitive Services, could generate a true-to-life voice that comes close to hers. From there, the Audio Content Creation platform makes high-quality audiobooks that sound almost as if they are being read by Dong herself.

This process augments, and is much quicker than, the standard way Hongdandan and Dong have been making audiobooks. And that means people who are blind or have low vision can now access a much wider range of books faster than before.

A woman holds a microphone in a group setting.

Dong says that having more audiobooks available helps the center’s students gain higher grades and valuable skills that will boost their future job prospects.

“Hongdandan and I share the same goal: to help people who are blind or have low vision better fit into society. So, when someone has a dream, we are able to open a path for them; for example, to help a teenager … to learn and gain employment.”

In China, employment opportunities have long been severely limited for people who are blind or have low vision. Traditionally, many were only able to find work in therapeutic massage centers popular across the country and other parts of Asia. In fact, for many years, most Chinese books in braille were about professional massage techniques.

Hongdandan’s founder, Zheng Xiaojie, decided to change that. In 2006, she set up the Eyes of the Soul Library – a project she describes as her “lifetime’s dream.” The idea of producing a wide range of easily accessible audiobooks came from young people who were blind or who had low vision.

“They knew we did movie commentaries and job training for blind people,” Zheng recalls. “And they wanted help with recording audiobooks on topics such as law and early childhood education, so they could study and pass exams.

“At that time, we didn’t have specialized recording equipment. We set up a computer and used microphones from our children’s program. After recording, we just gave the young people the audio files. So, you can imagine that it was a very simple and basic process.”

A woman shows objects to a group of children.
Hongdandan’s founder Zheng Xiaojie shares some audiobooks with a group of school children.

Nowadays the library rolls out content via Microsoft Azure to 105 schools across China for students who are blind or have low vision. They can also access 1,000-plus titles on the library’s own app and a mini-program on WeChat, China’s popular social media platform.

Microsoft has been Hongdandan’s partner for around 15 years. And the center produces its audiobooks in line with Microsoft’s commitment to responsible AI, which safeguards against the misuse of the technology and prioritizes transparency, fairness, accountability, privacy and security.

“Microsoft has been in contact with us all the time,” says Zheng, “supporting all aspects of the Eyes of the Soul Library, including the AI voice service we are using now, which was unimaginable for us before. In front-line jobs, we knew the needs of blind people, but we didn’t know how to use high-tech methods to solve their needs. In fact, technology is a particularly good method for the education of people who are blind or who have low vision. It brings us closer together.”

As well as teaching and volunteering, Dong is currently in a graduate program at the Communication University of China where she is researching the creation and use of synthetic voices. “As a blind person, the development of technology has changed my life,” she says.

So, with her experience and well-tuned ear for voices, how does she rate Microsoft’s AI creations, including her own?

“Microsoft’s Custom Neural Voice actually simulates a real voice much better than more general synthetic voices,” she says. “For example, there are some tone changes and more details to the voices—these details are really good.”

Dong says that whether real or synthetic, an ideal audio voice needs to sound warm and clear, with a sense of confidence and even a feeling of love and affection. “The most similar point between a human voice and Microsoft’s Custom Neural Voice is the timbre—the timbre of the Custom Neural Voice is really vivid.”

Both Dong and Zheng emphasize the importance of the Eyes of the Soul Library for improving education and employment prospects for people who are blind or have low vision. But they also see another crucial benefit: a sense of connection that instills confidence and self-reliance.

Zheng says many people who are blind or have low vision can now “seize opportunities in the internet era and find the professions and positions they are good at.

“We give them a channel to acquire knowledge and know the world. Having the companionship of a voice has eliminated the distance between them and the world, so many have become more positive and confident. They no longer have a sense of isolation or fear of the world. They believe that they can do a lot of things all by themselves.”

All images are courtesy of the Hongdandan Visually Impaired Service Center. TOP: Lina Dong in a recording booth. CENTER: Lina Dong (center) conducts a lesson with students.