DSpace About DSpace Software
 

Dyuthi @ CUSAT >
e-SCHOLARSHIP >
Computer Science >
Faculty >
Dr.Santhosh Kumar G >

Please use this identifier to cite or link to this item: http://purl.org/purl/4157

Title: Handling OOV Words in Phrase-Based Statistical Machine Translation for Malayalam
Authors: Santhosh Kumar, G
Mary, Priya Sebastian
Keywords: SMT
OOV words
out-of-vocabulary
unknown words
phrase translation
Machine Translation
Malayalam Translation
Issue Date: 9-Feb-2013
Abstract: Statistical Machine Translation (SMT) is one of the potential applications in the field of Natural Language Processing. The translation process in SMT is carried out by acquiring translation rules automatically from the parallel corpora. However, for many language pairs (e.g. Malayalam- English), they are available only in very limited quantities. Therefore, for these language pairs a huge portion of phrases encountered at run-time will be unknown. This paper focuses on methods for handling such out-of-vocabulary (OOV) words in Malayalam that cannot be translated to English using conventional phrase-based statistical machine translation systems. The OOV words in the source sentence are pre-processed to obtain the root word and its suffix. Different inflected forms of the OOV root are generated and a match is looked up for the word variants in the phrase translation table of the translation model. A Vocabulary filter is used to choose the best among the translations of these word variants by finding the unigram count. A match for the OOV suffix is also looked up in the phrase entries and the target translations are filtered out. Structuring of the filtered phrases is done and SMT translation model is extended by adding OOV with its new phrase translations. By the results of the manual evaluation done it is observed that amount of OOV words in the input has been reduced considerably
URI: http://dyuthi.cusat.ac.in/purl/4157
Appears in Collections:Dr.Santhosh Kumar G

Files in This Item:

File Description SizeFormat
Handling OOV Words in Phrase-Based Statistical Machine Translation for Malayalam.pdfpdf255.24 kBAdobe PDFView/Open
View Statistics

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

 

Valid XHTML 1.0! DSpace Software Copyright © 2002-2010  Duraspace - Feedback