Open data: The name alone can cause some confusion, and a number of myths and misconceptions surround it.
Organizations have questions about how and why they should make data freely available, or open.
We talked to Jule Sigall, Associate General Counsel, Open Innovation, in Microsoft Corporate, External and Legal Affairs, to explore this topic. Here are five of the most commonly encountered open data misconceptions, and responses to them.
I can’t ‘open’ my data because it’s associated with privacy concerns
“Possibly,” says Sigall. “But sometimes it’s important to work through those issues.”
In health care, for example, privacy rightly receives a lot of attention. After all, the details of a person’s health are among the most personal of all personal data. But sharing data in this sector could also help accelerate data-driven decision-making in tackling disease and, in turn, inform future health policy.
This is where differential privacy tools can help.
“Let’s say you can have a set of patient information that contains very sensitive, personal, private data,” says Sigall, “but you want to run pattern detection machine learning technologies across that data to understand what kind of patients have particular conditions.
“Differential privacy effectively injects white noise into the data, so that you can still draw statistically significant inferences from it, but you can’t see the individual’s private information.”
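To make the idea a little more concrete, here is a minimal Python sketch of the Laplace mechanism, one standard way to achieve differential privacy. In such schemes the noise is not arbitrary: it is calibrated to how much a single person's record can change a query's answer (the sensitivity) and to a privacy parameter, epsilon. This sketch adds noise to a query result; other designs add noise to individual records instead. The function names and the toy patient records are hypothetical, the code does not use the API of any particular differential privacy library, and Python's `random` module is not suitable for production use.

```python
import random
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")

def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent Exp(1) draws follows Laplace(0, 1);
    # multiplying by `scale` gives Laplace(0, scale).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def private_count(records: Iterable[T], predicate: Callable[[T], bool],
                  epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of records matching `predicate` (Laplace mechanism).

    Adding or removing one person's record changes the count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon gives
    epsilon-differential privacy for this single query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical toy data: 30 of 100 patients have the condition of interest.
patients = [{"condition": "diabetes"}] * 30 + [{"condition": "asthma"}] * 70
rng = random.Random(42)  # seeded only so the example is repeatable
noisy = private_count(patients, lambda p: p["condition"] == "diabetes", 0.5, rng)
print(f"noisy count: {noisy:.1f}")  # the true count plus Laplace noise
```

Smaller values of epsilon add more noise and give stronger privacy; larger values give more accurate answers. Averaged over many releases the noise cancels out, which is why real systems also track a total privacy budget across queries.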
A new open source differential privacy platform, developed by Microsoft and Harvard’s Institute for Quantitative Social Science and School of Engineering and Applied Sciences, was recently released as part of the OpenDP initiative. It gives developers around the world access to expert differential privacy implementations and an opportunity to join the community.
Open data means I have to share my data with everyone for any use
“Yes, that’s one way you can improve collaboration,” says Sigall. “But it’s not the only way. You can make your data available to a limited number of people and only for specific uses.”
The parameters of acceptable use would be established within a data-sharing agreement. Anyone wanting to use your data would then be bound by the agreement’s terms and conditions, which could cover multiple scenarios, for example prohibiting the use of mapping data for oil and gas exploration.
There is also an important role for automation here.
“Azure is developing a technology called Azure Confidential Computing, where the data sits in hardware-backed secure enclaves, allowing you to conduct machine learning or analysis with the data without actually accessing the data itself,” explains Sigall.
So, potentially, a collection of suppliers in an industry could use Azure Confidential Computing to share their data securely with each other and draw insights from it. That way, each supplier would only have access to their own data.
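The data flow described here can be modelled, very loosely, in a few lines of Python. The sketch below is a conceptual simulation only: `SimulatedEnclave` and its methods are hypothetical names, not part of any Azure SDK, and ordinary Python objects provide no isolation at all. Real confidential computing relies on hardware-backed enclaves and remote attestation. What the sketch illustrates is the intended contract: each party submits its own records, only analyses agreed in advance can run, and only aggregate results leave the shared environment.

```python
from statistics import mean
from typing import Callable, Dict, List

class SimulatedEnclave:
    """Conceptual stand-in for a hardware-backed enclave (hypothetical class).

    Python offers no real isolation: this only models the data flow, in which
    raw records go in and only approved aggregate results come out.
    """

    def __init__(self, approved: Dict[str, Callable[[List[float]], float]]):
        self._approved = approved                   # analyses all parties agreed on
        self._pooled: Dict[str, List[float]] = {}   # raw data, never returned by any method

    def submit(self, party: str, values: List[float]) -> None:
        # Each supplier contributes its own records to the shared pool.
        self._pooled.setdefault(party, []).extend(values)

    def run(self, analysis: str) -> float:
        # Only analyses named in the agreement may run, and only their
        # aggregate output is handed back to the caller.
        if analysis not in self._approved:
            raise PermissionError(f"analysis {analysis!r} is not approved")
        combined = [v for vals in self._pooled.values() for v in vals]
        if not combined:
            raise ValueError("no data submitted")
        return self._approved[analysis](combined)

# Hypothetical example: two suppliers pool delivery lead times (in days).
enclave = SimulatedEnclave({"mean_lead_time": mean})
enclave.submit("supplier_a", [4.0, 6.0])
enclave.submit("supplier_b", [5.0, 9.0])
print(enclave.run("mean_lead_time"))  # prints 6.0
```

Neither supplier sees the other's individual figures, only the agreed aggregate. Restricting callers to a list of approved analyses also mirrors the idea of a data-sharing agreement that limits how shared data may be used.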
I can’t ‘open’ my data because my competitors will use it against me
“You may be missing an opportunity that you can’t see, but someone outside your organization can,” says Sigall. “There’s almost a hidden cost to being selective or protective around your data – you might be preventing it from being reviewed or understood in a way that you’re currently not looking at it.” Azure Confidential Computing technology also applies in this context.
There is also a parallel with the open source software movement, which took the idea that a programmer’s code was valuable and must be kept secret, and turned it on its head. Proponents of open source software demonstrated that when code is made freely available to others, new applications can be developed that help solve customers’ problems in new ways.
“We’re hopeful these opportunities can be captured through voluntary efforts to make data more open,” says Sigall. “Ideally organizations should make their own choices around their data and how it’s used.”
Understanding the opportunity costs of being protective of your data is part of assessing whether the open data route is a good fit for any given organization.
Data sharing is too difficult
“Certainly, I think people feel like we don’t have the right tools to make data available in a secure, trusted way. Or even in a way that you can actually pull the data down and use it effectively,” says Sigall.
But most things look difficult until a solution becomes available. The open source community faced a similar situation for a long time, until tools such as Git and container technology made it much easier to share and collaborate on code. Sometimes answers like that come from the ground up, but not always.
“I think a lot of this development comes from the platforms used to either host or analyze the data. And that’s why we’re building, for example, confidential computing into the Azure platform,” says Sigall.
“But I also think researchers, developers and businesspeople will create their own applications and ways of working with data that could also be made available for others to develop on.”
Data sharing will put too much power in the hands of a small number of very large tech businesses
If you open your data, you may fear it’s going to be hoovered up by one of the very large tech companies currently dominating the data economy. But it could just as easily be argued that this scenario is already close to reality, with a handful of large companies the only ones able to offer effective data analysis services to the rest of the market.
“What we want to do is allow everyone – small businesses, large businesses, individuals – to be able to access the data they need, and then draw insights from it,” says Sigall.
“And that’s why we should work to help address things like security, privacy and technology, so everyone is able to access and analyze data in an effective way.
“If you’re more open with your data and therefore increase the opportunity for people to use it, you’ll help generate more independent development techniques and technologies. That helps ensure that openness in the data leads to benefits for everyone as opposed to the few.”
“Open data works in a lot of cases, probably in more cases than you might think,” he concludes. “But there are cases where it’s not going to work – these choices don’t need to be made in a binary way. The key is to find the right way to open your data to get the value for you and for the others that you want to collaborate with.”