Dummy-heads and ear simulators: the audio team at Microsoft Development Center Estonia uses tools many haven’t heard of

Have you ever been in a video meeting and heard background noise or an annoying echo, or fallen into a black hole where you can’t hear anything? Or perhaps your Bluetooth headphones won’t connect to your computer? Ergo Esken, an audio engineer at Microsoft Development Center Estonia, can help you find solutions to these and many other problems with Microsoft Teams.

Ergo’s job is to develop technical requirements for audio devices. To do this, he must test everything from headphones and microphones to soundbars and videophones, and make sure they meet the requirements of Microsoft Teams. Predicting use cases for different devices requires a creative approach and special tools, and Ergo’s work is making both Teams and Skype more user-friendly.

Your team’s focus is primarily on Teams, but to what extent are you still improving Skype’s audio quality in Estonia?

Over the last few years, Teams has really become our focus. The user interfaces of Skype and Teams look different, but the technologies used to make the calls, to operate audio and video equipment, and to handle noise suppression and echo cancellation run on the same code. Much of what we do is useful for both services. Now we’ve added Azure Communication Services, a development platform for audio-video calls on which Microsoft partners can build their own client solutions. Its ‘cogs’ are also the same as those of the Teams and Skype web client technology.

What kind of problems does the audio certification team deal with?

Broadly speaking, we develop the tests for equipment built and optimized for Teams. Devices that pass these tests become certified for Teams. For headsets, for example, we have issued the Premium Microphone for Open Office certification. When we test these headsets in the lab, we don’t just test how the voice comes through the microphone in a quiet environment; we also use a mixer that creates the effect of another person constantly talking nearby. Headsets with this certification strongly suppress this chatter by also using active noise cancellation in the microphone signal path. Such devices are useful, for example, in an open office where someone is sitting opposite or next to you.

We are currently working on the problems with Bluetooth and on making the Bluetooth functionality built into the computer more widely usable. To date, devices certified for Teams have been subject to a requirement that every Bluetooth headset must come with a Bluetooth USB dongle, because the Bluetooth chipsets built into laptops vary widely in quality. A given chipset may not work with one headset or another, or may not pair with devices at all. People tend to blame headphones or microphones for poor sound quality, but the problem may be the Bluetooth chip built into the computer, which does not support the necessary audio codec. We want to make sure that at least all Windows laptops can pair with a Bluetooth headset without a dongle and work without problems. In the world of phones, things are already pretty good in the audio department, but there are still problems with laptops.

You’re using tools that many people have never heard of. One of them is a head and ear simulator that looks like a dummy’s head. What do you use such devices for and what do they do?

The head and ear simulator allows repeatable testing, which is important in the world of audio testing and certification. We can prepare some of the specifications in Tallinn, but there are also labs in Florida, Taiwan, and other parts of the world with testing equipment for Teams. When we test a headset, for example, the results from all of the labs must be identical. A head and ear simulator ensures that the tests are repeatable and that everyone gets the same results.

The head and ear simulator has an opening at the mouth with a built-in speaker. It can be made to speak in any language, with a male or a female voice. The aim is to simulate real life, so the simulator sounds like a real person. The dummy’s ears are fitted with microphones, which means that it can hear and record speech coming through the headset under test. This makes it possible to measure the sensitivity of both the microphone and the earpiece, the signal-to-noise ratio, the frequency response, and many other characteristics that indicate the quality of the headset and whether the device meets the requirements for Teams certification.
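To make one of the measurements mentioned above concrete, here is a minimal Python sketch of how a signal-to-noise ratio could be computed from two recordings. It is only an illustration and not the certification procedure: the function names, the 48 kHz sample rate, and the synthetic test tone and hum are all assumptions made for the example.

```python
import math

def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels, from a recording of the
    test signal and a recording of the noise alone."""
    return 20.0 * math.log10(rms(signal) / rms(noise))

# Hypothetical data: one second of a 1 kHz test tone, plus a much
# quieter 50 Hz hum standing in for the device's residual noise.
SAMPLE_RATE = 48_000  # assumed sample rate in Hz
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)]
noise = [0.005 * math.sin(2 * math.pi * 50 * n / SAMPLE_RATE)
         for n in range(SAMPLE_RATE)]

print(f"SNR: {snr_db(tone, noise):.1f} dB")
```

Because the tone’s amplitude is 100 times that of the hum, the ratio comes out at 40 dB. The same input always gives the same number, which is the kind of repeatability the simulator is meant to provide on the acoustic side.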

If the mannequin is designed to simulate real life, why not use a real person to test it?

A real person will never speak identically on two consecutive days; it depends on mood or health, for example. If my nose is stuffed up or my voice is off, the frequency spectrum of my speech is completely different. The aim of testing is to get reproducible results, and that is not possible with humans.

In addition to the head and ear simulator, you also have an anechoic chamber in the office. What is that used for?

In the anechoic chamber, you can isolate the device under test from the world around it. For example, if I want to find out the self-noise level of a microphone in decibels, I can’t test it in a normal everyday environment with people and chatter around. It all gets into the microphone and spoils the measurement. The chamber is quiet enough to pick up the noise level of the device under test. Any microphone will make noise, but in an anechoic chamber you can be sure that the room is quieter than the microphone.

Acoustic standing waves will occur in any normal physical room. For example, there may be a peak in the frequency response at 400 hertz and a dip at 500 hertz. If you change the position of the device in the room, these frequencies shift. This is not a problem in an anechoic chamber, where the sound is absorbed by the walls rather than reflected. This makes it possible to measure what the device itself is doing without the influence of the room, which is again important for repeatability. If one of our partners develops a new device, gets some results in their lab and sends the device to us for certification, we should get the same results. Without an anechoic chamber, this will not happen because the room’s influence on the measurement is too great.
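Both effects described above can be illustrated with textbook acoustics. The Python sketch below is an illustration only, not the team’s tooling: the function names, the 4-metre wall spacing, and the decibel figures are assumptions chosen for the example. It computes the standing-wave frequencies between two parallel reflecting walls, f = n·c / (2·L), and shows how uncorrelated noise levels add up, which is why the room has to be quieter than the microphone being measured.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C

def axial_modes(length_m, count=4):
    """First `count` standing-wave frequencies (Hz) between two
    parallel reflecting walls `length_m` metres apart."""
    return [n * SPEED_OF_SOUND / (2.0 * length_m)
            for n in range(1, count + 1)]

def combined_level_db(*levels_db):
    """Level in dB of several uncorrelated noise sources measured
    together (their powers add, not their decibel values)."""
    return 10.0 * math.log10(sum(10 ** (lv / 10.0) for lv in levels_db))

# Hypothetical ordinary room with walls 4 m apart.
print([round(f, 1) for f in axial_modes(4.0)])

# Hypothetical microphone self-noise of 20 dB, measured in a room
# whose background noise is 30 dB (office) or 0 dB (chamber).
print(round(combined_level_db(20.0, 30.0), 2))  # room noise dominates
print(round(combined_level_db(20.0, 0.0), 2))   # close to the true 20 dB
```

With the assumed figures, the reading in the noisier room reflects the room rather than the microphone, while the reading in the quiet room is within a few hundredths of a decibel of the microphone’s own level.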

What have been some of the most difficult problems for your team to solve?

One of the things that makes our job challenging is the variety of equipment and use cases, and the need to develop test instructions for each use case. For example, in meeting rooms, equipment is used in quite different conditions. For a small meeting room, you need a camera with a wide field of view to capture all 4-5 people in the room. As you move into a larger room, the conditions change: people sit both nearer to and further from the camera, so the camera’s field of view may be narrower. Also, in a larger meeting room, a single speakerphone may no longer cover the whole room, so multiple speakerphones or microphones should be used. Our test specifications try to cover all these cases with different test scenarios.

We need to find many potential use cases and test whether the equipment works in these situations. For example, when large numbers of people moved into home offices, there was a huge boom in the use of USB microphones, which meant that we also started to certify them. We had to come up with scenarios for testing them in the lab so that all labs worked in the same way and the tests correlated with real-life use cases. A similar example is soundbars that go on top of or underneath a monitor. A couple of years ago, such things did not exist, but during the lockdown this solution became popular, which meant that it had to be tested.

What are the most important innovations and changes coming to Microsoft services and the wider audio world in the future?

More technology will become wireless in the future. As an audio engineer, I don’t really like this, as the quality is always better with a wired connection. If you take the network cable out of the back of the computer and replace it with Wi-Fi, the quality of the connection will be worse. The same logic applies to audio. But you can’t argue with convenience. Headsets are starting to lose the boom microphones that reach close to the mouth. It looks nicer with just headphones on. But if you move the microphone from the mouth to the ears, and then into earbuds, it gets messy in many ways. Even with powerful audio processing, earbuds pick up more ambient noise; digital processing suppresses it, but this still tends to degrade the quality of the wearer’s own speech.

A Bluetooth Low Energy Audio specification has recently been released that can significantly improve the quality of calls made using Bluetooth headsets. For a long time, headsets with a USB connection have had significantly better speech quality than Bluetooth headsets, especially in terms of the microphone signal. This may change in a few years. Bluetooth LE Audio will standardize a new audio codec that will allow calls to be transmitted over a wider audio band and in better quality.

Another trend is machine learning, which is starting to be introduced in all kinds of devices. For example, cameras can record a wide-angle image, but the technology will learn to zoom in at the right time and focus on a specific person. In the world of audio, machine learning is increasingly being used for noise suppression. For example, noise suppression is already built into some devices and applications, including Teams, where it eliminates the sound of a keyboard clicking or of someone eating crisps. You can crunch a packet of crisps during a call, but none of it will reach the other party.

The next step from this will be to personalize the digital signal processing of devices. You teach the device or software and its machine learning to recognize your voice, and after that, only your speech will be transmitted through the call. For example, if you’re on a call from your home office and your spouse starts talking next to you, his or her conversation won’t be heard by the others on the call. It will be interesting to see how the teaching process will turn out and how friendly the technology will be to less common languages like Estonian.

Source: Kajavaba ruum ning pea ja kõrva simulaator: Microsofti Eesti arenduskeskuse audiotiim kasutab töövahendeid, millest paljud ei ole kuulnudki – DigiPRO (geenius.ee)
