Jonathan Foster and Deborah Harrison are two of the most unlikely people you’d expect to meet at Microsoft.
The duo and their team are responsible for leading the creative direction for one of the most important technological shifts of the decade – voice-based personal assistants. They are the people who not only defined the personality of Cortana, Microsoft’s personal digital assistant, but are now also looking at how such AI-powered assistants will become omnipresent and how they will behave in the future.
Yet, neither of them has any formal background in technology or computer science.
Foster, who has a graduate degree in history and a master’s in playwriting and screenwriting, has spent years writing for film, television, and theatre. In 2008, he landed a gig with the Office team writing help articles, and he has also led the team behind the written UI for Xbox. Today he leads a team that focuses on personality for AI-powered conversational bots and assistants across Windows and Microsoft experiences.
Harrison, who works in Foster’s team, is an English major who worked in a bookstore, co-owned a coffee shop, and worked as a writing tutor before landing in the tech industry by accident two decades ago. “A friend told me they paid good money for writing,” she laughs. Like Foster, she too crafted UI for MSN Money and Windows Phone, before becoming the first author to write for Cortana’s personality.
We caught up with Foster and Harrison in Hyderabad, where we discussed the process of creating Cortana’s personality, the blurring of lines in human-AI interactions and the ethical questions it raises, and their vision of the future of voice-based assistants. Here are some edited excerpts from our conversation.
Q: Not many people realize that there is a team of humans behind the responses that AI-powered personal assistants like Cortana give. So tell us a bit about how it started and what it takes to create a personality that people can relate to.
Deborah Harrison: For quite some time in the beginning, I was the only person who was on the writing team for Cortana. It was a pretty forward-thinking feature. There is no relationship between writing for a digital agent and writing for any other user interface except for the fact that it’s all words and I’m trying to create a connection. Initially, we were looking at straightforward strings because we started with some scenarios like setting an alarm or checking the calendar. But while writing those strings, we started thinking about what it would sound like and we realized that the agent should have a more concrete identity so that we could tell what to say when and under what circumstances it should sound apologetic versus more confident and so on.
I poked around and realized no one had defined principles for this yet, and I offered to write up a few ideas and compile what eventually became the foundation of Cortana as a personal assistant. I then wrote the principles and settled things like whether it uses feminine or neutral pronouns, and which questions it would and would not answer. We were clear from the beginning that Cortana would be very positive. We created Cortana as a loyal, seasoned personal assistant and imbued its voice with certain qualities of confidence and patience. Cortana doesn’t think it’s human and it knows it isn’t a girl and it has a team of writers that’s writing for what it’s engineered to do.
Jonathan Foster: What fascinated me the most was it’s like raising a child – you do your best to keep them on their best behavior and ensure they have manners and dress well and study, and then you let them go. In Cortana’s case, what we couldn’t control was people’s interpretation of that, and all we could do was try to keep it on the rails we think are healthy for putting a product out into the world.
For example, if somebody wanted a relationship with Cortana which was flirtatious or in any way sexual, we designed a firm “no” response. Firstly, because it didn’t align with the value proposition. And secondly, it’s a slippery slope into creating an opportunity for bad behavior. So we drew lines like that – we create a definition, and when people try to get a response around that definition, our bot’s just not going to go there.
One of the key Cortana principles that extends into our current work is really kind of a North Star: that the experience is always positive. And by that what we mean is not that Cortana is happy all the time and she’s not always optimistic. It’s just that people walk away feeling good from interacting with the product.
Q: There seems to be a race out there to make digital voice assistants sound as close to human as possible and some feel that we’ve already reached a stage where humans can’t distinguish whether they are interacting with another human or an AI-powered assistant. How do you feel about it?
Deborah Harrison: It’s critical to us and any product we work on that there’s never any ambiguity about whether you are talking to a person or a device. We don’t want there to be any confusion on that point and everything we build is transparent about the fact that you are not interacting with an actual person. This is something we think about a lot and it’s at the top of our mind when we are authoring for any personality agent.
Jonathan Foster: It’s up to us really to determine that line and that’s why we have to keep our eyes on the fact that we are touching human lives. Because it is true, like when people say they’re depressed or sad to a system like Cortana they’re invoking a need and they want a response. It is our imperative to push our systems so they are interacting with individuals more on human terms.
Our legacy is historically engineer- and code-driven experiences, and they feel like that, but we’re creeping closer and closer to friendlier experiences that people are more comfortable with. We want them to feel the voice is more familiar and it is what they want. They want to be able to interact on their own terms, and more and more they are able to do that. But we have had a clearly written-out articulation about transparency, so that people can intuitively tell that they are always talking to a digital agent.
I would always say, you know, I have a dog and I might talk to my dog. I never do that thinking it’s an actual human. And I know it’s only capable of giving me the response that I want out of my emotional needs with my dog, but still I appreciate that.
So we just don’t believe in the Turing Test, the idea that artificial intelligence will be achieved when humans can’t distinguish whether they are interacting with another human or a machine.
Q: On the flip side, what about when people know they are interacting with an AI agent but can still get the same emotional gratification as they would from interacting with a human? How do you prepare for that?
Jonathan Foster: We have pretty firm boundaries there. People are going to say some things they wouldn’t say to another human being, because with a human they would be judged. Let’s say somebody is using abusive language with their device. Microsoft can’t ever be in the business of telling people what they can or can’t say. However, it’s incumbent upon us to build devices that don’t perpetuate bad behavior. So we put hard boundaries and we basically just tell people there’s nothing there.
Deborah Harrison: We craft language where we’re careful not to make it sound shaming or judgmental. While in other areas we might add variation, you know, for the sake of making it more engaging over time, in these cases we have one answer, because no matter how many times you say it, you are still going to get the same answer, one that is clearly not engaging.
Q: We have AI-powered assistants that are becoming more engaging because they sound like humans and are getting better at responding like humans. As creators of these personalities, are you afraid that they might become addictive, and of the societal impact that could have?
Deborah Harrison: I think it is arguable whether humanness is the thing that drives engagement the most. It is more likely related to what the person is in a position to accomplish with these agents. If the purpose is simply an emotional connection with an entity, then yes. These agents are being developed for a purpose or set of purposes and what people really look for is how well they are able to complete those tasks. What we are chasing is the ability for people to say what they want to say and get a result in natural language.
A lot of our job is figuring out how to get natural language out that makes people confident that their natural language is going to get the task done. That people can get to the point where they don’t have to have a mental model beyond how they talk to other people in order to get their work done.
Jonathan Foster: We don’t want to get into a situation where we’re creating life-like interaction models that are addictive. Tech can move in that direction when you are so excited about the potential of what you can build that you’re not thinking about its impact. Thankfully, Microsoft has been a leader in ethics all up and we are a mature company that can pause and think about this stuff.