Handling OOV Words in Phrase-Based Statistical Machine Translation for Malayalam

dc.contributor.author Santhosh Kumar, G
dc.contributor.author Mary, Priya Sebastian
dc.date.accessioned 2014-07-21T05:00:52Z
dc.date.available 2014-07-21T05:00:52Z
dc.date.issued 2013-02-09
dc.identifier.uri http://dyuthi.cusat.ac.in/purl/4157
dc.description.abstract Statistical Machine Translation (SMT) is one of the potential applications in the field of Natural Language Processing. In SMT, translation rules are acquired automatically from parallel corpora. However, for many language pairs (e.g. Malayalam-English), such corpora are available only in very limited quantities. Therefore, for these language pairs, a large portion of the phrases encountered at run-time will be unknown. This paper focuses on methods for handling such out-of-vocabulary (OOV) words in Malayalam that cannot be translated to English using conventional phrase-based statistical machine translation systems. The OOV words in the source sentence are pre-processed to obtain the root word and its suffix. Different inflected forms of the OOV root are generated, and a match for these word variants is looked up in the phrase translation table of the translation model. A vocabulary filter is used to choose the best among the translations of these word variants by finding the unigram count. A match for the OOV suffix is also looked up in the phrase entries, and the corresponding target translations are filtered. The filtered phrases are then structured, and the SMT translation model is extended by adding the OOV word with its new phrase translations. Manual evaluation shows that the number of OOV words in the input is reduced considerably. en_US
dc.description.sponsorship Cochin University of Science and Technology en_US
dc.language.iso en en_US
dc.subject SMT en_US
dc.subject OOV words en_US
dc.subject out-of-vocabulary en_US
dc.subject unknown words en_US
dc.subject phrase translation en_US
dc.subject Machine Translation en_US
dc.subject Malayalam Translation en_US
dc.title Handling OOV Words in Phrase-Based Statistical Machine Translation for Malayalam en_US
dc.type Article en_US
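
The root-variant lookup and vocabulary filter described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The suffix list, the romanized toy words, the phrase table, and the unigram counts are all hypothetical placeholders and are not vetted linguistic data. The names split_root_suffix, generate_variants, and best_translation are invented for this sketch. It also assumes that "best" means the candidate with the highest unigram count in a target-side corpus, which the abstract does not spell out. The suffix matching, phrase structuring, and model extension steps are not covered.

```python
from collections import Counter

# Hypothetical suffix inventory (romanized, illustrative only).
SUFFIXES = ["kalude", "yude", "kal", "kku", "il"]


def split_root_suffix(word, suffixes=SUFFIXES):
    """Split a word into (root, suffix) by stripping the longest matching suffix."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix
    return word, ""  # no known suffix: treat the whole word as the root


def generate_variants(root, suffixes=SUFFIXES):
    """Return the bare root plus one inflected form per known suffix."""
    return [root] + [root + s for s in suffixes]


def best_translation(oov_word, phrase_table, target_unigrams):
    """Collect translations of all root variants found in the phrase table
    and keep the one with the highest target-side unigram count (assumed criterion)."""
    root, _suffix = split_root_suffix(oov_word)
    candidates = []
    for variant in generate_variants(root):
        candidates.extend(phrase_table.get(variant, []))
    if not candidates:
        return None  # the word stays OOV
    return max(candidates, key=lambda t: target_unigrams[t])


# Toy data (hypothetical): source phrase -> candidate target translations.
phrase_table = {"kutti": ["child", "kid"], "kuttikal": ["children"]}
target_unigrams = Counter({"child": 40, "kid": 12, "children": 25})

print(best_translation("kuttiyude", phrase_table, target_unigrams))
```

In this toy setup the word "kuttiyude" is absent from the phrase table, but its root variants are present, so a translation can still be proposed. A real system would take the variants from a morphological analyzer and the counts from the language model's training data.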


Files in this item

Files Size Format View Description
Handling OOV Wo ... nslation for Malayalam.pdf 255.2Kb PDF View/Open pdf
