Computational Linguistics & Machine Translation Tools in India

Published on: 13 June 2018

Pranjal Koranne

Pranjal graduated with an Integrated MA in English Studies from IIT Madras and his research interests include Linguistics and AI. In his free time, he reads a lot and dreams of writing novels, and building games and apps.

Here is a short list of departments, research centres or cells from institutes in India that work in Computational Linguistics or Natural Language Processing for Indian Languages and Sanskrit. These institutes work in the broad array of areas discussed before, from creating resources to teach Sanskrit and other Indian languages, to projects that create Machine Translation tools.

 

      1. Language Technologies Research Center, IIIT Hyderabad

Anusaaraka is a machine translation tool that combines insights from traditional Indian knowledge of language with modern technologies. The project is undertaken by the LTRC, IIIT Hyderabad, in collaboration with institutes such as the Chinmaya International Foundation and the University of Hyderabad (Department of Sanskrit Studies). LTRC also runs other research projects, such as building corpora, lexicons, and speech processing tools.

 

 

      2. Center for Indian Language Technology, IIT Bombay

CFILT has undertaken the project of building resources for Indian languages like Hindi and Marathi in the internationally accepted interlingua called Universal Networking Language (UNL). CFILT has also undertaken the project of building WordNet, an international standard for online thesauri and lexicons, for Indian languages. UNL and WordNet are both useful resources for machine translation projects. IIT Bombay also has many other projects in the field of NLP for Sanskrit and other Indian languages.
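To illustrate why a WordNet-style resource is useful for machine translation, here is a minimal sketch. It assumes that entries for different languages are linked through a shared concept identifier. The identifiers, the tiny dictionary, and the function name `translation_candidates` are all hypothetical and are not taken from CFILT's actual data or software.

```python
# Toy multilingual lexicon: each concept (synset) groups words with the same
# meaning in several languages. IDs and entries are illustrative only.
SYNSETS = {
    "water.n.01": {"en": ["water"], "hi": ["पानी", "जल"], "mr": ["पाणी", "जल"]},
    "book.n.01": {"en": ["book"], "hi": ["किताब", "पुस्तक"], "mr": ["पुस्तक"]},
}


def translation_candidates(word: str, source: str, target: str) -> list[str]:
    """Return target-language words that share a concept with `word`."""
    candidates: list[str] = []
    for members in SYNSETS.values():
        if word in members.get(source, []):
            for candidate in members.get(target, []):
                if candidate not in candidates:
                    candidates.append(candidate)
    return candidates


print(translation_candidates("water", "en", "hi"))
print(translation_candidates("पुस्तक", "hi", "mr"))
```

Because the lookup goes through the shared concept rather than a bilingual word list, the same table serves any pair of languages it covers. A real system would still need word sense disambiguation to choose the right concept for an ambiguous word.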

 

 

      3. Computational Linguistics R&D at the School of Sanskrit and Indic Studies, JNU

This R&D centre at JNU has produced multiple tools like the Multilingual Online Amarkosh, an online thesaurus for English, Sanskrit, Hindi, and many other Indian languages. It is also a leading participant in the Indian Language Corpora Initiative, which has developed annotated corpora for many Indian languages using Indian standards. The research centre undertakes many projects, ranging from language processing tools like morphological analyzers and POS taggers and projects on machine translation to building online, searchable versions of ancient Indian texts.

 

 

      4. Center for Development of Advanced Computing (C-DAC)

C-DAC has undertaken an important project in Machine Assisted Translation from English to Hindi in the specific domain of administrative documents like government notifications, office orders, etc. Other projects in natural language processing, from building word processors in Indian languages to creating language teaching resources, have also been carried out at C-DAC.

 

 

      5. AU-KBC Research Centre, Anna University, Chennai

This centre represents a collaboration between Anna University and KBC Research Foundation Pvt. Ltd. It works on the development of the machine-aided translation system Anusaaraka for Tamil. The centre also has projects like Word Sense Disambiguation in Tamil, Biological Named Entity Recognition for Tamil, WordNet for Tamil, etc.

 

      6. University of Hyderabad

They have developed a machine-aided translation system for English texts to Kannada using Universal Clause Structure Grammar. They were also involved in the development of the machine-aided translation system Anusaaraka.

 

 

      7. IIT Kanpur

A machine-aided translation system for the specific domain of Public Health Campaigns, called Anglabharti, has been developed here. The system analyses English sentences and converts them into an almost disambiguated intermediate structure, which can then potentially be used to translate English into any Indian language. IIT Kanpur is also a collaborative partner in the development of the Anusaaraka system. They are also involved in developing other projects, from lexical resources to spell-checkers.
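The design described for Anglabharti, analysing the English sentence once into an intermediate structure and then generating each target language from that structure, can be illustrated with a toy sketch. This is not the actual Anglabharti implementation. The `Intermediate` class, the three-word analyser, and the small Hindi and Marathi lexicons are all hypothetical and exist only to show the shape of the approach.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intermediate:
    """Hypothetical language-neutral structure: who does what to what."""
    agent: str
    action: str
    patient: str


def analyse_english(sentence: str) -> Intermediate:
    """Toy analysis step: handles only 'subject verb object' with three words."""
    words = sentence.strip().rstrip(".").lower().split()
    if len(words) != 3:
        raise ValueError("toy analyser only handles three-word sentences")
    agent, action, patient = words
    return Intermediate(agent, action, patient)


# Illustrative per-language lexicons; a real system would need far richer data.
LEXICONS = {
    "hi": {"children": "बच्चे", "drink": "पीते हैं", "water": "पानी"},
    "mr": {"children": "मुले", "drink": "पितात", "water": "पाणी"},
}


def generate(structure: Intermediate, lang: str) -> str:
    """Toy generation step: both target languages here use subject-object-verb order."""
    lexicon = LEXICONS[lang]
    ordered = (structure.agent, structure.patient, structure.action)
    return " ".join(lexicon[word] for word in ordered)


# The English sentence is analysed once and reused for every target language.
structure = analyse_english("Children drink water.")
outputs = {lang: generate(structure, lang) for lang in LEXICONS}
print(outputs)
```

The point of the design is that supporting another target language only requires a new generator and lexicon, while the English analysis stays unchanged.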

 

 

      8. Jadavpur University

Jadavpur University has developed a rule-based machine-aided translation system for English to Hindi translation for ‘news sentences’.

 

 

      9. Utkal University

The Oriya Machine Translation System is a project undertaken by Utkal University that develops modules for Oriya, such as a parser, a morphological analyzer, a grammar and spell checker, and a word processor. Utkal University is also involved in building WordNet resources for Sanskrit and Oriya.

 

 

      10. Technology Development for Indian Languages (TDIL)

TDIL is a program of the Ministry of Electronics and Information Technology, Government of India, that works in the field of NLP, machine translation, and other related areas for Indian languages. It works with several institutes across India in research areas ranging from building OCR and handwriting recognition tools to building MT systems.