Maps data and computer vision

Nikola Trifunović, Senior Software Engineer,
Maps team, Microsoft Development Center Serbia

Data is the cornerstone of great geospatial products such as maps and routing, and it is the key ingredient in creating the most complete and up-to-date map service. The fundamental challenge is maintaining such data efficiently on a worldwide scale. The most reliable approach is to employ map editors to keep the data fresh, but it is very hard for them to keep up with the volume of data and how often it changes. To overcome this obstacle, we use computer vision and machine learning to automatically extract specific map features from imagery captured by satellites, planes, cars, and other sources. In this post, I would like to share the progress our team has made in detecting buildings from aerial imagery.

Achievements in building extraction

So far, we have extracted around 170 million buildings in the United States, Canada, Australia, Tanzania, and Uganda. These regions were selected to align with core Bing priorities, such as improving geocoding and map rendering services on Bing Maps. Our work in Africa is the result of partnering with Microsoft Philanthropies and the Humanitarian OpenStreetMap Team to provide essential building data for efficient disaster response. We are also strong supporters of, and contributors to, Open Data and the Daylight Map Distribution. All detected building footprints are shared on our public GitHub pages linked below, under the Open Data Commons Open Database License (ODbL). I recommend checking out the New York Times article "A Map of Every Building in America", which used our data to analyze US settlements with beautiful visuals.

Goals and challenges

Our goal is to extract buildings worldwide at a regular cadence, which brings several challenges. The most complex one is creating deep neural network (DNN) models that can adapt to the extreme variety of Earth landscape imagery. Some notable dimensions of this variety are:

  • Different cameras with different properties, e.g., resolution and color balance.
  • Variety of natural landscapes. Terrain in the US and in Uganda is very different.
  • Variety of building architectures and urban planning worldwide.

When creating our training sets, we try our best to capture this variety by building large, representative samples. This is particularly important because DNNs do not generalize well across diverse environments. At the moment, we are working with more than 10 million building labels, including true negative regions such as glaciers, deserts, forests, and water bodies. Still, it was not possible to train a single model that works well for the whole world. Training specific models for each region and imagery source resulted in much higher extraction quality. The problem with this approach is that such a development process is very hard to scale, given that aerial imagery is updated regularly, potentially with different imagery properties.

Our training solution

The following solution worked well for us in dealing with the challenges described above.

First, we train a robust model with heavy data augmentations, using all available training data, so that it works accurately under a wide range of circumstances. This model is a jack of all trades and master of none, meaning there is room for additional gains in specific regions.
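
As a rough illustration of what heavy augmentation can look like, the sketch below applies random 90-degree rotations, a horizontal flip, and a brightness jitter to a grayscale image tile and its building mask. The specific transforms, the function names (`augment`, `rotate90`, `hflip`, `jitter_brightness`), and the jitter range are assumptions made for illustration; the post does not state which augmentations are actually used.

```python
import random

def rotate90(grid):
    """Rotate a 2D list 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def hflip(grid):
    """Mirror a 2D list left to right."""
    return [row[::-1] for row in grid]

def jitter_brightness(image, factor):
    """Scale pixel intensities and clamp them to the 0..255 range."""
    return [[min(255, max(0, round(v * factor))) for v in row] for row in image]

def augment(image, mask, rng):
    """Apply the same random geometric transform to image and mask.

    Geometric transforms must be applied to both so the labels stay aligned;
    the brightness jitter (hypothetical range 0.8..1.2) touches the image only.
    """
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
        image, mask = rotate90(image), rotate90(mask)
    if rng.random() < 0.5:
        image, mask = hflip(image), hflip(mask)
    image = jitter_brightness(image, rng.uniform(0.8, 1.2))
    return image, mask

# Example: a 2x3 grayscale tile and its building mask (1 = building).
rng = random.Random(0)
tile = [[10, 20, 30], [40, 50, 60]]
labels = [[0, 1, 0], [1, 1, 0]]
aug_tile, aug_labels = augment(tile, labels, rng)
```

Rotations and flips leave the number of building pixels unchanged, while the color jitter loosely mimics the camera differences listed earlier.
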
As the next step, we use transfer learning to fine-tune the model for specific target regions or imagery sources, referred to as target domains. The standard way to do this is supervised training with target domain labels. Unfortunately, there is a limit to how far supervised learning can scale. Practically speaking, it is impossible to label everything in the world, especially when costly pixel-precise image annotations are required. Therefore, we rely heavily on recent research in the fields of unsupervised and semi-supervised learning. These techniques can help train models without additional labels, requiring only samples of available target domain imagery. Techniques like style transfer and output entropy minimization resulted in exceptional performance in our Australia extraction output. Our experiments showed that these techniques complement supervised training in cases where there is enough additional data. We believe we have reached a state where it is possible to automate the fine-tuning process and produce models for worldwide extraction in a scalable manner.
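
To make output entropy minimization more concrete: on unlabeled target domain imagery, the model is penalized for uncertain per-pixel predictions, which pushes it toward confident building or background decisions. The sketch below computes only that loss, in plain Python, for a small map of building probabilities. The function names and the binary (building vs. background) formulation are assumptions for illustration, not the team's actual training code.

```python
import math

def binary_entropy(p, eps=1e-7):
    """Shannon entropy (in nats) of a Bernoulli prediction with probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def entropy_loss(prob_map):
    """Mean per-pixel entropy of a 2D map of predicted building probabilities.

    Needs no labels: it only measures how undecided the model is.
    """
    values = [binary_entropy(p) for row in prob_map for p in row]
    return sum(values) / len(values)

# Hypothetical model outputs on two unlabeled tiles.
confident = [[0.01, 0.99], [0.98, 0.02]]
uncertain = [[0.50, 0.60], [0.40, 0.50]]

print(entropy_loss(confident) < entropy_loss(uncertain))  # prints True
```

In a real training loop this term would typically be added, with some weight, to the supervised loss computed on labeled source data.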

Finally, this whole field is a very active research area that is evolving rapidly, so it is essential to stay up to date and remain adaptable to future discoveries.

Model inference

Once the models are trained and evaluated, the next challenge is applying them to terabytes of imagery as efficiently as possible. Fortunately, Microsoft provides multiple cloud services with GPU-enabled VMs suitable for this task. We use Azure Machine Learning and Azure Batch.
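
One common way to run a model over very large rasters is to split them into fixed-size tiles that can be distributed across GPU workers. The sketch below generates such tile windows with an optional overlap. The tiling scheme, function names, and parameters are illustrative assumptions rather than a description of our pipeline, and no Azure APIs are involved.

```python
def axis_starts(length, tile, step):
    """Yield start offsets along one axis until the axis is fully covered."""
    pos = 0
    while True:
        yield pos
        if pos + tile >= length:
            break
        pos += step

def tile_windows(width, height, tile, overlap=0):
    """Yield (x, y, w, h) windows covering a width x height raster.

    Assumes positive dimensions. Edge windows are clipped to the raster,
    so they may be smaller than `tile`.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    step = tile - overlap
    for y in axis_starts(height, tile, step):
        for x in axis_starts(width, tile, step):
            yield x, y, min(tile, width - x), min(tile, height - y)

# Example: a 10x6 raster cut into 4x4 tiles without overlap.
for window in tile_windows(10, 6, tile=4):
    print(window)
```

An overlap between neighboring tiles is often used so that buildings cut by a tile border are fully visible in at least one tile.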

Final thoughts

Computer vision, combined with the ever-increasing amount of available imagery, will eventually become one of the key information sources for map creation. The results so far are great and will get even better as technology advances. However, we are still far from perfect results, and that matters in some critical navigation scenarios. The focus of mappers around the world is slowly shifting from creating map features to verifying and curating data produced by machines running solutions similar to the one described here.

For more information about open positions on the Maps team, click here.


GitHub pages linking building footprints data:
microsoft/AustraliaBuildingFootprints
microsoft/Uganda-Tanzania-Building-Footprints: Open dataset of machine extracted buildings in Uganda and Tanzania
microsoft/CanadianBuildingFootprints: Computer generated building footprints for Canada
microsoft/USBuildingFootprints: Computer generated building footprints for the United States
New York Times article analyzing settlements in the USA using our data:
A Map of Every Building in America – The New York Times
Article about our collaboration with the Humanitarian OpenStreetMap Team:
Bing Maps team contributes Uganda and Tanzania building data sets to OpenStreetMap | VentureBeat

