What do we mean by open data and data collaboration?

Seán Fleming, Apr 22, 2020

Open data is data published for anyone to use without restriction. Publishing data this way is often the most effective way to enable collaboration between people and organizations. Data collaboratives and data trusts are also useful, especially where privacy or security concerns need to be taken into account. One guiding principle is to be as open as possible, while recognizing that, in some instances, effective sharing may require limits.

Open data and data collaboration allow organizations to share and access data, which in turn enables them to build products and find solutions that lead to social, economic or environmental benefits. Data needs to be stored in common formats that can be read and understood by different systems and, where necessary, covered by a license that permits its unrestricted reuse by others. One example might be using aggregated data from a mobile phone operator to analyze metropolitan population movements during rush hour.

Knowing where people commute from and to, and which are the busiest times, allows for better city transit planning. If the metro stations are near the places where most people want to get on and off, the convenience will encourage more people to use public transport. Similarly, if the number of buses and trains matches the volume of people, you can score high in terms of customer convenience as well as efficiency for the service provider.
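As a rough illustration of the kind of rush-hour analysis described above, here is a minimal Python sketch. The trip counts, place names and record layout are all invented for this example; they do not come from any real operator, and real data would be aggregated and anonymized before release.

```python
from collections import Counter

# Hypothetical, already-aggregated trip counts:
# (hour_of_day, origin, destination, trips). All values are invented.
TRIPS = [
    (7, "Northgate", "Central", 120),
    (8, "Northgate", "Central", 340),
    (8, "Riverside", "Central", 210),
    (9, "Riverside", "Harbor", 90),
    (17, "Central", "Northgate", 310),
    (18, "Central", "Riverside", 260),
]


def trips_by_hour(records):
    """Total trips for each hour of the day."""
    totals = Counter()
    for hour, _origin, _destination, trips in records:
        totals[hour] += trips
    return totals


def busiest_hour(records):
    """Hour with the highest total trip count."""
    totals = trips_by_hour(records)
    return max(totals, key=totals.get)


def busiest_route(records):
    """Origin-destination pair with the most trips across all hours."""
    totals = Counter()
    for _hour, origin, destination, trips in records:
        totals[(origin, destination)] += trips
    return max(totals, key=totals.get)


if __name__ == "__main__":
    print("Busiest hour:", busiest_hour(TRIPS))
    print("Busiest route:", busiest_route(TRIPS))
```

A transit planner could use totals like these to decide when to add services and which routes most need capacity. Note that the sketch only ever sees counts, never individual journeys.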

These large-scale projects are very visible uses of open data. But there are also advantages to organizations making data more accessible internally. Jule Sigall, Associate General Counsel, Intellectual Property Group at Microsoft Corporate External and Legal Affairs, explains: “It is also important for organizations to adopt new outlooks with respect to their own data. Data can sit within silos inside private businesses and public-sector bodies. Opening it up so that people in different teams and departments can collaborate on solutions can unleash value from that data that might otherwise never have been realized.”

But what about any potential downsides?

Careful planning and consideration will, of course, be of vital importance. Furthering the use and acceptance of open data will call for the right technical, legal and social frameworks to enable easier sharing. Uncertainties around any of the legal requirements of data sharing, a lack of user-friendly tools and a shortage of people with the skills to open, share and use data can delay the adoption of open data.

But countering the narrative of data as the “new oil” means explaining and demonstrating that data that is locked away is being prevented from offering up its maximum potential value. Data is like an idea: If someone helps you nurture and develop it, the idea can come to life. To paraphrase Thomas Jefferson, ideas are like a candle: one candle may light another while leaving the original flame undimmed.

This is an important reframing of the way in which organizations have typically valued their data. “Even when firms are notionally willing to share data, they can face a ‘prisoner’s dilemma,’ believing that a first-mover advantage in generating data means a first-mover disadvantage in sharing data if other market participants do not reciprocate,” says Sigall.

So why is open data so important?

Machine learning and AI systems need data in order to learn. Large amounts of digital data, such as images of medical scans, are fed into an AI tool so it can identify the signs of potential health problems.

AI will be one of the most transformational technologies of our time, thanks to its ability to identify complex patterns and empower people to make better decisions. From monitoring traffic flows around a busy city to life-or-death medical diagnoses, AI needs a large dataset to establish a baseline of normal, expected results before it can successfully identify variations that require further investigation.
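The idea of a baseline can be shown with a very small Python sketch. It uses a plain statistical threshold rather than a machine learning model, so it is an analogy for the principle and not a description of how any particular AI system works. The traffic counts, function names and the threshold of three standard deviations are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Summarize past observations as a mean and sample standard deviation."""
    return mean(history), stdev(history)


def flag_unusual(readings, baseline, threshold=3.0):
    """Return readings more than `threshold` standard deviations from the mean."""
    center, spread = baseline
    return [r for r in readings if abs(r - center) > threshold * spread]


# Hypothetical vehicles-per-minute counts at one junction (invented values).
HISTORY = [48, 52, 50, 49, 51, 50, 47, 53, 50, 50]
NEW_READINGS = [51, 49, 95, 50, 12]

if __name__ == "__main__":
    baseline = build_baseline(HISTORY)
    print("Readings needing a closer look:", flag_unusual(NEW_READINGS, baseline))
```

With only a handful of historical values, the estimate of what counts as normal is shaky, which is one reason a larger dataset tends to give more reliable results.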

The more data fed into a system, the more effective and reliable it is likely to become. Conversely, without access to enough usable data, an AI tool’s effectiveness will be compromised.

But if I’m open with my data, won’t it lose value?

It’s important to note that making the data freely available for others to work on – i.e. making it open – is not the same as making it free.

The spirit of openness with which this is associated first gained prominence in the 1980s and 1990s with the rise of the open source movement.

It grew out of the GNU Project and the free software movement, which held that software should be free of restrictive license terms. In effect, that meant people should be allowed to make their own copies of software, share them, take the code apart and use it as the basis for new applications.

Open source meant people sharing the building blocks of applications so they could collaborate on creating new solutions to the challenges of the day, and it has proven to be a very effective way to build software across many domains. Open data takes a similar view. Being open with data can bring together new perspectives on challenges and help produce brand-new solutions to them.

For more on open data, visit Microsoft’s Open Data Campaign website.