Saqib Shaikh says people who are blind, like himself, typically develop highly organized routines to keep track of their things — putting keys, wallets, canes and other essentials in the same places each time.
But sometimes life gets messy: A child needs help finding a lost stuffed animal, identical garbage bins get moved around on the curb or coats get jumbled together at a party.
Today, a person using Microsoft’s Seeing AI app can point a phone camera at a scene, such as a conference room table, and hear a description of what’s in the frame: laptops, water bottles, power cords, phones. But it would sometimes also be useful for the machine learning algorithms powering the app to recognize objects that are specific to that individual person, said Shaikh, a Microsoft engineer whose team invented Seeing AI.
Until recently, there hasn’t been enough relevant data to train machine learning algorithms to tackle this kind of personalized object recognition for people with vision disabilities. That’s why City, University of London, a Microsoft AI for Accessibility grantee, has launched the Object Recognition for Blind Image Training (ORBIT) research project to create a public dataset from scratch, using videos submitted by people who are blind or have low vision.
The data will be used to train and test new algorithms to recognize and locate important personal objects, which can range from cell phones to face coverings to kitchen tools.
“Without data, there is no machine learning,” said Simone Stumpf, senior lecturer at the Centre for Human-Computer Interaction Design at City, University of London, who leads ORBIT. “And there’s really been no dataset of a size that anyone could use to introduce a step change in this relatively new area of AI.” The lack of machine learning datasets that represent or include people with disabilities is a common roadblock for researchers and developers working with those communities, whether they are building intelligent tools that assist with everyday tasks or trying to create AI systems that are less likely to magnify prejudices that skew decision making.
“We are in a data desert,” said Mary Bellard, principal innovation architect lead at Microsoft who also oversees the AI for Accessibility program. “There’s a lot of passion and energy around doing really cool things with AI and people with disabilities, but we don’t have enough data.”
“It’s like we have the car and the car is packed and ready to go, but there’s no gas in it. We don’t have enough data to power these ideas.”
To begin to shrink that data desert, Microsoft researchers have been working for the past year and a half to investigate and suggest ways to make AI systems more inclusive of people with disabilities. The company is also funding and collaborating with AI for Accessibility grantees to create or use more representative training datasets, such as ORBIT and the Microsoft Ability Initiative with University of Texas at Austin researchers.
Today, Team Gleason announced it is partnering with Microsoft on Project Insight, which will create an open dataset of facial imagery of people living with ALS to help advance innovation in computer vision and train those AI models more inclusively.
It’s an industry-wide problem that won’t be solved by one project or organization alone, Microsoft says. But new collaborations are beginning to address the issue.
A research roadmap on AI Fairness and Disability published by Microsoft Research and a workshop on Disability, Bias and AI hosted last year with the AI Now Institute at New York University found a host of potential areas in which mainstream AI algorithms that aren’t trained on inclusive data either don’t work well for people with disabilities or can actively harm them.
If a self-driving car’s pedestrian detection algorithms haven’t been shown examples of people who use wheelchairs or whose posture or gait is different due to advanced age, for example, they may fail to identify those people as pedestrians to avoid, or to estimate how much extra time they need to safely cross a street, researchers noted.
AI models used in hiring processes that try to read personalities or interpret sentiment from potential job candidates can misread cues and screen out qualified candidates who have autism or who emote differently. Algorithms that read handwriting may not be able to cope with examples from people who have Parkinson’s disease or tremors. Gesture recognition systems may be confused by people with amputated limbs or different body shapes.
It’s fairly common for some people with disabilities to be early adopters of intelligent technologies, yet they’ve often not been adequately represented in the data that informs how those systems work, researchers say.
“When technologies are so desired by a community, they’re often willing to tolerate a higher rate of errors,” said Meredith Ringel Morris, senior principal researcher who manages the Microsoft Research Ability Team. “So imperfect AI systems still have value, but they could provide so much more and work so much better if they were trained on more inclusive data.”
‘Pushing the state of the art’
Danna Gurari, an AI for Accessibility grantee and assistant professor at the University of Texas at Austin, had that goal in mind when she began developing the VizWiz datasets. They include tens of thousands of photographs and questions submitted by people who are blind or have low vision to an app originally developed by researchers at Carnegie Mellon University.
The questions run the gamut: What is the expiration date on this milk? What does this shirt say? Do my fingertips look blue? Do these clouds look stormy? Do the charcoal briquettes in this grill look ready? What does the picture on this birthday card look like?
The app originally crowdsourced answers from people across the internet, but Gurari wondered if she could use the data to improve how computer vision algorithms interpret photos taken by people who are blind.
Many of those questions require reading text, such as determining how much of an over-the-counter medicine is safe to take. Computer vision research has often treated reading text as a separate problem from, for example, recognizing objects or interpreting low-quality photos. But successfully describing real-world photos requires an integrated approach, Gurari said.
Moreover, computer vision algorithms typically learn from large image datasets of pictures downloaded from the internet. Most are taken by sighted people and reflect the photographer’s interest, with items that are centered and in focus.
But an algorithm that’s only been trained on perfect images is likely to perform poorly in describing what’s in a photo taken by a person who is blind, which may be blurry, off-center or backlit. And sometimes the thing that person wants to know hinges on a detail that a person who is sighted might not think to label, such as whether a shirt is clean or dirty.
“Often it’s not obvious what is meaningful to people, and that’s why it’s so important not just to design for — but design these technologies with — people who are in the blind and low vision community,” said Gurari, who also directs the School of Information’s Image and Video Computing Group at the University of Texas at Austin.
Her team undertook the massive task of cleaning up the original VizWiz dataset to make it usable for training machine learning algorithms — removing inappropriate images, sourcing new labels, scrubbing personal information and even translating audio questions into text to remove the possibility that someone’s voice could be recognized.
Working with Microsoft funding and researchers, Gurari’s team has developed a new public dataset to train, validate and test image captioning algorithms. It includes more than 39,000 images taken by blind and low vision participants and five possible captions for each. Her team is also working on algorithms that can recognize right off the bat when an image someone has submitted is too blurry, obscured or poorly lit and suggest how to try again.
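As a rough illustration of how such an upfront quality check could work, a system might compute a cheap sharpness and exposure estimate on the incoming photo and turn it into a retake suggestion. The Python sketch below uses OpenCV’s variance-of-Laplacian heuristic and mean brightness; the thresholds, function name and suggestion text are hypothetical and are not drawn from the VizWiz team’s actual algorithms.

```python
import cv2

# Illustrative thresholds; a real system would tune these on labeled data.
BLUR_THRESHOLD = 100.0    # variance of Laplacian below this suggests blur
DARK_THRESHOLD = 40       # mean brightness below this suggests underexposure
BRIGHT_THRESHOLD = 220    # mean brightness above this suggests overexposure or backlighting

def quick_quality_check(image_path: str) -> list[str]:
    """Return simple retake suggestions for an obviously unusable photo."""
    image = cv2.imread(image_path)
    if image is None:
        return ["Could not read the image; please try taking the photo again."]

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    suggestions = []

    # Variance of the Laplacian is a common, cheap sharpness estimate:
    # blurry images have few strong edges, so the variance is low.
    if cv2.Laplacian(gray, cv2.CV_64F).var() < BLUR_THRESHOLD:
        suggestions.append("The photo looks blurry; hold the phone steady and try again.")

    mean_brightness = gray.mean()
    if mean_brightness < DARK_THRESHOLD:
        suggestions.append("The photo looks too dark; try turning on a light.")
    elif mean_brightness > BRIGHT_THRESHOLD:
        suggestions.append("The photo looks washed out; try facing away from the light source.")

    return suggestions
```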
Earlier this year, Microsoft sponsored an open challenge to other industry and academic researchers to test their image captioning algorithms on the VizWiz dataset. In one common evaluation metric, the top performing algorithm posted a 33% improvement over the prior state of the art.
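In challenge evaluations like this, a model’s generated caption is typically scored against all five human-written reference captions using overlap metrics such as BLEU or CIDEr, so a caption only needs to agree with the way some person described the image. The snippet below is a minimal illustration using NLTK’s BLEU implementation with made-up captions; it is not the official VizWiz evaluation code.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Five hypothetical human-written reference captions for one image.
references = [
    "a stuffed animal and a book on a bedspread".split(),
    "a plush toy lying next to a book on a bed".split(),
    "a teddy bear and a paperback book on a blanket".split(),
    "a stuffed toy and a book resting on a bed".split(),
    "a bed with a stuffed animal and a book on it".split(),
]

# A caption produced by a model being evaluated.
candidate = "a stuffed animal and a book on a bed".split()

# BLEU compares the candidate against all references at once.
score = sentence_bleu(references, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU score: {score:.3f}")
```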
“This is really pushing the state of the art in captioning for the blind community forward,” said Seeing AI lead engineer Shaikh, who is working with AI for Accessibility grantees and their datasets to develop potential improvements for the app.
Example captions from the VizWiz dataset: a 5 euro bill on a red table; a black oven temperature knob that is currently in the off position; a brown window planter with white flowers and yellow flowers that have died; a person holding a plush toy of a cartoon dinosaur in their hand; a fresh banana that is a little green and mostly yellow; and a crayon-colored drawing of a vase with flowers.
Making inclusive datasets available to all
Because AI systems model the world based on the data they’re given, people who don’t mirror patterns in the data can be overlooked or actively discriminated against. While the AI community has increasingly acknowledged and worked to improve the fairness of these systems when it comes to gender and race, conversations around being inclusive of people with disabilities are much more nascent, researchers say.
Microsoft Research has launched a multi-pronged effort to define the extent of the problem and avenues for improvement — including the workshop hosted with NYU’s AI Now Institute last year. The workshop convened disability scholars and activists, machine learning practitioners and computer science researchers to begin to discuss how to create AI systems that avoid treating people with disabilities as edge cases or outliers.
“This really points to the question of how ‘normal’ is defined by AI systems and who gets to decide that,” said Kate Crawford, senior principal researcher at Microsoft Research New York and co-founder of the company’s Fairness, Accountability, Transparency and Ethics (FATE) in AI group.
Take the example of a predictive hiring system that assesses video interviews from job candidates and suggests what a “successful” employee will sound and look like, Crawford said.
“Has it been trained on data that suggests that certain abilities or ways of being are standard and therefore desirable? Are people with disabilities or those who are in any way different ranked lower for potential hiring because they differ from the data in the training set? That’s what we really need to be aware of and work against,” Crawford said.
To advance that goal, one area Microsoft researchers are investigating is how often public datasets commonly used to train AI systems include data from people older than 80, because age correlates strongly with disability. Morris and her colleagues have also been exploring how search algorithms might be tweaked to improve results for people with dyslexia.
Last summer, Microsoft hosted disability technologies expert Shaun Kane, an associate professor of computer science at University of Colorado Boulder, as a visiting researcher to jointly investigate how intelligent sensing systems can fail to recognize or respond properly to people who use wheelchairs or have amputated limbs, motor disabilities or body morphology that falls outside of the examples those algorithms have been trained on.
Microsoft and its grantees are also exploring how to navigate practical challenges and are developing ethical approaches for soliciting AI training data from people with disabilities. Some people who worry about stigma or liabilities don’t want to disclose their disability status, for instance, so maintaining privacy is paramount.
Stumpf’s team reviews each video submitted to the ORBIT dataset to ensure it doesn’t inadvertently include identifying information. They also had to create detailed instructions on how to shoot videos of each item, because they need footage from multiple angles and also want people to be able to collect the data without the help of a sighted person.
In the project’s first phase within the United Kingdom, the team collected several thousand videos, making it by far the largest dataset of its kind. The team plans to open up the second phase of data collection globally in mid-October.
“We are really still working out how to balance getting good data that we can innovate with as researchers and enabling people to be the drivers of the technologies they’re going to use in a way that’s not too difficult or has too many rules,” said Cecily Morrison, principal researcher at Microsoft Research Cambridge in the UK. “If people find the process hard or boring, they’re going to think, ‘AI is not for me.’”
Morrison co-leads Project Tokyo, which focuses on how AI can help people who are blind or have low vision make sense of their environments. To that end, she’s collaborating with Stumpf’s team on algorithms that are able to learn from fewer examples, which could have wide-ranging applications.
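One common way to learn from only a handful of examples is to average a few embeddings of each personal object into a “prototype” and match new frames to the nearest prototype. The sketch below illustrates that general idea with NumPy and random stand-in embeddings; a real system would use features from a pretrained image encoder, and none of this is the ORBIT or Project Tokyo implementation.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length so dot products behave like cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def build_prototypes(support: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
    """Average a handful of example embeddings per object into one prototype each."""
    return {name: l2_normalize(embeddings.mean(axis=0)) for name, embeddings in support.items()}

def classify(query_embedding: np.ndarray, prototypes: dict[str, np.ndarray]) -> str:
    """Label a new frame with the name of the most similar prototype."""
    query = l2_normalize(query_embedding)
    return max(prototypes, key=lambda name: float(query @ prototypes[name]))

# Hypothetical usage with random stand-in embeddings; a real system would
# get these vectors from a pretrained image encoder run on video frames.
rng = np.random.default_rng(0)
support = {
    "my keys": rng.normal(size=(5, 128)),    # five example frames of the user's keys
    "my wallet": rng.normal(size=(5, 128)),  # five example frames of the user's wallet
}
prototypes = build_prototypes(support)
print(classify(rng.normal(size=128), prototypes))
```

The appeal of this style of approach for personalization is that only the lightweight prototypes change when a user adds a new object; the underlying image encoder does not need to be retrained.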
The goal is to make the ORBIT dataset publicly available, Stumpf said, to help make everyday life better in as many situations as possible. For example, if a person who is blind is visiting a friend’s house for the first time, a navigation app that relies on a GPS system can only get them so close.
“When you’re standing in front of an address, you still need to know if this is actually my friend’s house or someone else’s house,” Stumpf said. “With pictures of that friend’s front door or other places of interest, you could use personalized object recognition to identify locations that are particularly important to you.”
Top image: The VizWiz dataset includes photographs taken by people who are blind or have low vision, such as this image of a stuffed animal and book on a bedspread, to train computer vision algorithms to provide more accurate information about them. Photo available via a Creative Commons 4.0 license.
Related:
- Learn more: The Object Recognition for Blind Image Training (ORBIT) dataset
- Learn more: 2020 VizWiz Grand Challenge Workshop
- Learn more: Seeing AI app
- Learn more: Microsoft Research Ability Group
- Learn more: Microsoft AI for Accessibility
- Read more: Disability, Bias and AI
- Read more: Microsoft Ability Initiative: A collaborative quest to innovate in image captioning for people who are blind or with low vision
- Read more: Where’s my stuff? Developing AI with help from people who are blind or low vision to meet their needs
- Watch: Designing computer algorithms to describe the visual world to people who are blind or low vision (webinar)
Jennifer Langston writes about Microsoft research and innovation. Follow her on Twitter.