On development of an Open Source Machine Translation System for English to Indian Languages.

Written by Jaganadh Gopinadhan

As a continuation of discussions held in fsug-tvm and smc-discuss mailing I am writing the post. At the end of discussions I promised that I will prepare a note on how to start working in English to Indian Language Machine Translation(MT). I am not going to the history or theory of Machine Translation in this blog post.

There is a Free and Open Source Machine Translation system for English to Bengali translation called “Anubadok”. It is available for download as well as online use. The system id developed by Golam Mortuza Hossain. It is a small system with handful of rules and dictionary and other resources. I don’t know to read Bengala so I am not able to judge the capability of this system. Any how this system can be adopted for developing English to other Indian Language Machine Translation system.

The system is developed in Perl programming language. The developer of this system is a Post Doctoral Fellow in University of New Brunswick, Canada. A good documentation is available for this system. The project is hosted in the Sourceforgent.net as well as in Bengalinux site.

Let’s see how to adopt this system for other Indian languages.

First go through the documentation of the system. The current version can be downloaded from svn with the following command. svn checkout https://anubadok.svn.sourceforge.net/svnroot/anubadok/trunk anubadok

The go to the /anubadok/data/ directory. There you can find bdict.db file. Create a backup copy of the file. Now you can replace the Bengali words with your language words and we can run it. So we will get a feel how this will working our language. While writing this post I replaced some two words and tested. It is working. Once our dictionary work is over we can run our system.(Dont hink that it will give extact result. Because we are replacing meaning only. We have to add riles for generating our language sentences. I will write on this topic after some times.) For the initial testing we can use the sentences available in /anubadok/TestSuites/ directory of the system. First let’s workout this much thins and see what will be the output. Based on our result I think we can proceed.

You can see the performance status of this system given in a blog. Status Table (Version: Anubadok-0.2.0 )

                   Declar.     Imper.     Interro.     Exclam. Simple             W                   W           W                M Compound     M                 M                 M            M Complex          N                  N               N           N Compound      N                    N            N              N – Complex

W: Well implemented M: Moderately implemented N: Not/Not-well implemented

Migrated from my old blog jaganadhg.freeflux.net

Written on August 29, 2009

The Opinions Expressed In This Post Are My Own And Not Necessarily Those Of My Employer.

[ Machine Translation Natural Language Processing ]