A SwiftKey employee has made it his mission to upload obscure languages

According to the most recent census, there are more than 104 different languages in England and Wales.

The survey of 56.1 million people also revealed that you are most likely to hear Gujarati in Leicester, while Manchester is the place to go if you want to speak to someone in Cantonese and Mandarin; it’s Lithuanian in Boston, Lincolnshire; Punjabi in Slough, Berkshire; and Somali in Brent, north-west London.

One person in Barnet and one in Bexley said they spoke Caribbean creole, the Office for National Statistics revealed. Across England and Wales, 629 people speak Romany, 58 use Scottish Gaelic and 33 converse in Manx Gaelic. The 2011 Scottish census found 58,000 speakers of Scottish Gaelic and the Isle of Man survey revealed 1,662 of Manx.

Julien Baley, from SwiftKey

The England and Wales census also proved that we love languages, with 4.2 million people (8%) speaking more than one. No company knows this more than SwiftKey.

The British mobile phone keyboard maker announced this week that it now supports more than 150 different languages, from the widely-used to the obscure. On Tuesday alone, SwiftKey added Friulian, Lingala, Fijian, Rwanda, Oromo, Tsonga, Tswana, Swazi, Venda, Sesotho, Hiligaynon and Southern Ndebele.

While SwiftKey’s love of dialects is common knowledge, what isn’t so widely known is that more than a third of those 150+ language keyboards were created by one person in his spare time. Although the firm has a team of people to research, program and upload words and phrases to its app, which is used by millions of people across the world, Julien Baley made it his mission to do more.

“It’s a bit addictive. When you dip your toe in [a language], it grabs you entirely,” he said.

Dhivehi and Bashkir in SwiftKey

The 29-year-old Frenchman’s day job is as a software engineer on SwiftKey’s analytics team, but when he’s not doing that, he’s uploading obscure languages into the app.

Baley speaks three languages fluently – English, French and Mandarin – but he has varying degrees of knowledge of German, Icelandic, Norwegian, Danish, Swedish, Italian, Spanish, Polish, Breton, Armenian, Greek, Hungarian, Taiwanese, Classical Chinese, Indonesian, Japanese, Korean, Vietnamese, Swahili, Scottish Gaelic, Yiddish and Yakut. With more than 50 SwiftKey languages to his name, which one is his favourite?

“I’m happy to have done Kurdish. There are two languages in Kurdistan and both are spoken in warzones. One part of the region is in Turkey and one part is in Syria and Iraq. The people can’t even go to a school that uses their language, so I’m happy they have a keyboard to at least write it now.”

Other highlights include the soon-to-be-released Amharic, which is spoken by more than 36 million people in Ethiopia. Despite being the second-most widely-used Semitic language in the world after Arabic, it was a tough test for Baley: “The writing system is a pain to process, but most of the difficulties were to do with finding resources, and finding people to test the language in the app.”

Reaching its language milestone this week is a sign of just how far SwiftKey has come since it was founded by Cambridge University graduates Jon Reynolds and Ben Medlock in 2008. The SwiftKey app is now used on 300 million devices, and last year it was bought by Microsoft. Since then it has released a new system based on neural networks, a type of artificial intelligence that makes very accurate word predictions as people type messages on their phones.

Baley has been at SwiftKey for more than half its existence, having joined in 2011. Prior to that he studied computer engineering at Shanghai University and the Université de Technologie de Compiègne, and speech and language processing at the University of Edinburgh.

Once at SwiftKey he started adding languages that he knew, spurred on by a love of foreign words that he has had for as long as he can remember.

“It’s always been there. When I started learning English, I thought: ‘This is something I can do quite easily’, and then: ‘Oh, this one is interesting’. The more you learn, the more you feel like: ‘Yeah, I can do that’. It’s a bit like people who learn a musical instrument. Some people keep playing the guitar all their life, some people explore a range of things. Once you’ve learned 15 instruments … it gets easier and easier.”

N'Ko and Uyghur in SwiftKey

Sticking with the musical metaphor, Baley’s knowledge of languages is now about the size of an orchestra, but his initial experiments at SwiftKey were more like the triangle.

“There were languages that I studied or spoke that we didn’t have yet, so I took it upon myself to just do it. Once I had done that, I realised there were other languages that I did not study or speak but people were interested in it. We receive lots of requests for languages that we’ve never heard of but are spoken by tens of millions of people. I decided to do it on a more regular basis.”

The first language Baley tackled took four months – from deciding to do it through to releasing it to the world. He’s now got the process running so smoothly that “in two weeks I can have finished 10”.

“If the language is nothing specific, I just provide the models that will make the prediction for that language. Then it’s about having the product people go through and validate it.”

After exhausting his personal knowledge of languages, Baley went to SwiftKey’s online forum, where people across the world were asking the company to create a keyboard in their native tongue.

“We have this forum where people can make requests,” Baley explained. “I have contacted a lot of these people because they are prime candidates to help us and test the product. If you reply to them, even if it’s been two years since they asked, they are like: ‘Oh wow, I never believed it would happen’. Then, once you have a keyboard you can make them test it.

“When I started we didn’t have languages like Pashto, which has 60 million speakers in Afghanistan and Pakistan, or Sindhi, which is spoken by 80 million – that’s more than the population of the UK!”


However, agreeing to build a language keyboard after a few people request it is one thing; building it to a level whereby millions of users can communicate with each other easily and fluently is quite another. Languages obviously vary massively in complexity. The Foreign Service Institute, the US government’s training centre for diplomats serving overseas, claims the most difficult languages for English speakers to learn are Arabic, Cantonese (Chinese), Mandarin (Chinese), Japanese and Korean.

Baley needs at least 5,000 words in a language to be able to build a keyboard for it, using online news reports and other content to get things started before users take over and build up the vocabulary.

“In some languages there are more word forms just because of the simple fact that the grammar is more complex. Some have millions and millions of word forms; some of them are theoretical and it doesn’t mean a person has actually said that thing. But because SwiftKey learns from what users type, if we provide a basic keyboard for them, within a couple of days it has improved enough that it’s actually great for them.”
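For readers curious about the mechanics, the idea of seeding a keyboard from a small corpus and letting it improve as people type can be illustrated with a toy example. The sketch below is not SwiftKey’s method, which the article does not describe; the class name `ToyPredictor`, its methods and the whitespace-based word splitting are all assumptions made for illustration, and splitting on spaces would not suit many of the languages mentioned here.

```python
from collections import Counter, defaultdict


class ToyPredictor:
    """Toy next-word predictor (hypothetical; not SwiftKey's actual model)."""

    def __init__(self):
        self.vocab = Counter()               # word -> how often it was seen
        self.bigrams = defaultdict(Counter)  # previous word -> next-word counts

    def learn(self, text):
        # Assumption: words are separated by whitespace.
        words = text.lower().split()
        self.vocab.update(words)
        for prev, nxt in zip(words, words[1:]):
            self.bigrams[prev][nxt] += 1

    def predict(self, prev_word, k=3):
        # Suggest the words seen most often after prev_word;
        # fall back to the most frequent words overall if it is unknown.
        followers = self.bigrams.get(prev_word.lower())
        source = followers if followers else self.vocab
        return [word for word, _ in source.most_common(k)]


predictor = ToyPredictor()
predictor.learn("the cat sat on the mat")  # seed text, e.g. from news reports
predictor.learn("the mat the mat")         # later: what a user actually types
print(predictor.predict("the", k=1))
```

The same `learn` call handles both the seed text and later user input, which is the point of the illustration: suggestions shift towards whatever people actually type.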

The feedback is always positive because “there is a real sense of pride when you make their language”. In fact, some have been so grateful that Baley has taken the time to help them that they are now friends.

“We chat from time to time, and it’s important to keep up a relationship because at some point there will be a new version and they can help test it.”

That army of testers is building a strong foundation for future SwiftKey keyboards. On a personal level, Baley has completed all the languages he set out to do, but is always open to being surprised, as was the case with Esperanto.

“I thought there would be so many people complaining: ‘Why are you doing Esperanto? Nobody speaks it’. But it’s one of the most popular ones I’ve made. Esperanto has an amazing online community and people are really excited about it. It was the most requested language that we hadn’t done yet.”

So, if there are any speakers of little-known languages who want to stay in touch with their friends, you know who to contact.