Beginning today, you can explore the linguistic diversity of the Indian sub-continent with Google Translate, which now supports five new experimental alpha languages: Bengali, Gujarati, Kannada, Tamil and Telugu. In India and Bangladesh alone, more than 500 million people speak these five languages. Since 2009, we have launched a total of 11 alpha languages, bringing the current number of languages supported by Google Translate to 63.
Indic languages differ from English in many ways, presenting several exciting challenges when developing their respective translation systems. Indian languages often use Subject Object Verb (SOV) ordering to form sentences, unlike English, which uses Subject Verb Object (SVO) ordering. This difference in sentence structure makes it harder to produce fluent translations; the more words that need to be reordered, the greater the chance of making mistakes when moving them. Tamil, Telugu and Kannada are also highly agglutinative, meaning a single word often includes affixes that carry additional meaning, like tense or number. Fortunately, our research to improve Japanese (an SOV language) translation helped us with the word order challenge, while our work translating languages like German, Turkish and Russian provided insight into the agglutination problem.
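To make the word order point concrete, here is a toy sketch. It is not how Google Translate's statistical system works; it assumes a clause has already been split by hand into subject, verb and object, and the names `Clause`, `to_sov` and `words_moved` are made up for this illustration. It reorders an SVO clause into SOV order and counts how many words change position, which grows as the object gets longer.

```python
from typing import List, NamedTuple


class Clause(NamedTuple):
    """A hand-segmented clause (hypothetical structure for illustration)."""
    subject: List[str]
    verb: List[str]
    obj: List[str]


def to_sov(clause: Clause) -> List[str]:
    """Return the clause's words in Subject Object Verb order."""
    return clause.subject + clause.obj + clause.verb


def words_moved(clause: Clause) -> int:
    """Count positions whose word differs between SVO and SOV order."""
    svo = clause.subject + clause.verb + clause.obj
    sov = to_sov(clause)
    return sum(1 for a, b in zip(svo, sov) if a != b)


short = Clause(["the", "boy"], ["ate"], ["an", "apple"])
long = Clause(["the", "boy"], ["ate"], ["a", "very", "ripe", "red", "apple"])

print(" ".join(to_sov(short)))  # the boy an apple ate
print(words_moved(short), words_moved(long))
```

Even in this simplified setting, the verb has to travel past every word of the object, so a longer object means more positions change and more opportunities for a real system to misplace something.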
You can expect translations for these new alpha languages to be less fluent and include many more untranslated words than some of the more mature languages, like Spanish or Chinese, which have much more of the web content that powers our statistical machine translation approach. Despite these challenges, we release alpha languages when we believe that they help people better access the multilingual web. If you notice incorrect or missing translations for any of the languages, please correct them; we enjoy learning from our mistakes, and your feedback helps us graduate new languages from alpha status. If you’re a translator, you’ll also be able to take advantage of our machine-translated output when using the Google Translator Toolkit.
Since each of these languages has its own unique script, we’ve enabled a transliterated input method for those of you without Indian language keyboards. For example, if you type in the word “nandri,” it will generate the Tamil word நன்றி (see what it means). To see all these beautiful scripts in action, you’ll need to install fonts* for each language.
We hope that the launch of these new alpha languages will help you better understand the Indic web and encourage the publication of new content in Indic languages, taking five alpha steps closer to a web without language barriers.
*Download the fonts for each language: Tamil, Telugu, Bengali, Gujarati and Kannada.