
Please use this identifier to cite or link to this item: http://purl.org/purl/4187

Title: Techniques to Improve the word alignments in Statistical Machine Translation from English to Malayalam
Authors: Santhosh Kumar, G
Mary, Priya Sebastian
Sheena Kurian, K
Keywords: alignment
training
machine translation
English Malayalam translation
Issue Date: 2010
Abstract: In Statistical Machine Translation (SMT) from English to Malayalam, an unseen English sentence is translated into its Malayalam equivalent using statistical models such as a translation model and a language model, together with a decoder. An English-Malayalam parallel corpus is used in the training phase. Word-to-word alignments have to be set up among the sentence pairs of the source and target languages before they are subjected to training. This paper deals with techniques that can be adopted to improve the alignment model of SMT. Incorporating part-of-speech information into the bilingual corpus has eliminated many of the insignificant alignments. Identifying the named entities and cognates present in the sentence pairs has also proved advantageous when setting up the alignments. Moreover, reducing the unwanted alignments has yielded better training results. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F-measure, BLEU and WER evaluation metrics.
URI: http://dyuthi.cusat.ac.in/purl/4187
Appears in Collections: Dr.Santhosh Kumar G

Files in This Item:

File: Techniques to Improve the word alignments in Statistical Machine Translation from English to Malayalam.pdf
Size: 359.89 kB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

 
