“Sanskrit is the number one, most requested language at Google Translate, and we are finally adding it,” Isaac Caswell, senior software engineer, Google Research, told ET in an exclusive interview. “We are also adding the first languages from northeast India, which is another rather underrepresented place.”
Apart from Sanskrit, the other Indian languages in the latest iteration of Google Translate are Assamese, Bhojpuri, Dogri, Konkani, Maithili, Mizo and Meiteilon (Manipuri), taking the total number of Indian languages supported by the service to 19.
The announcement was made at Google I/O, the company's annual conference, which began late on Wednesday night.
The latest update does not cover all 22 scheduled languages of India, as the company had hoped, but Caswell said, “We have significantly closed the gap for at least the scheduled languages.”
For now, the languages added in the update will be supported only in the text translation feature, but the company is working on rolling out voice-to-text, camera mode and other features soon. “We are working on them, but they are not yet supported for all of these languages,” said Caswell.
Google is also working to iron out glitches with regard to translations of Indian languages. “We have this impression that frequently translations that our models produce for Indian languages, when they make mistakes, are often archaic,” said Caswell.
Often the translations use words that people don’t know or don’t use on a regular basis, he said. “We are trying to understand (the problems) better, and hopefully get our model to shift towards more colloquial output rather than this old fashioned or stilted type of thing. But we know there are other issues as well that we are trying to get our fingers on more closely,” he said.
These are the first languages to be added using zero-shot machine translation, in which a machine learning model sees only monolingual text. In other words, it learns to translate into another language without ever seeing an example translation.
“While this technology is impressive, it isn’t perfect. And we’ll keep improving these models to deliver the same experience you’re used to with a Spanish or German translation, for example,” Caswell said in a blog post announcing the update.
The addition of the eight Indian languages is part of a larger update wherein 24 languages have been added to Google Translate, which now supports a total of 133 languages used around the globe.
More than 300 million people use the newly added languages – for instance, Mizo is spoken by about 800,000 people in northeast India, and Lingala is spoken by more than 45 million people across Central Africa. As part of the update, indigenous languages of the Americas (Quechua, Guarani and Aymara) and an English dialect (Sierra Leonean Krio) have also been added to Google Translate.