Title:
|
Extension schemes for the Alignment Model of English-Malayalam Statistical Machine Translator |
Author:
|
Santhosh Kumar, G; Mary, Priya Sebastian; Sheena Kurian, K
|
Abstract:
|
In Statistical Machine Translation from English
to Malayalam, an unseen English sentence is translated
into its equivalent Malayalam sentence using statistical
models. An English-Malayalam parallel corpus is used in
the training phase. Word-to-word alignments have to be set
among the sentence pairs of the source and target
languages before subjecting them to training. This paper
deals with certain techniques which can be adopted for
improving the alignment model of SMT. Incorporating
part-of-speech information into the
bilingual corpus has eliminated many of the
insignificant alignments. Identifying the named entities
and cognates present in the sentence pairs has also proved
advantageous when setting up the alignments. The presence
of Malayalam words with predictable translations has further
helped to reduce the insignificant alignments.
Moreover, reducing the unwanted alignments has
produced better training results. Experiments conducted
on a sample corpus have generated reasonably good
Malayalam translations, and the results are verified with the
F-measure, BLEU and WER evaluation metrics. |
Description:
|
2012 International Conference on Advances in Computing and Communications |
URI:
|
http://dyuthi.cusat.ac.in/purl/4160
|
Date:
|
2012 |