Browsing Faculty by Author "Mary, Priya Sebastian"

Dyuthi/Manakin Repository

Now showing items 1-8 of 8

Alignment Model and Training Technique in SMT from English to Malayalam

Santhosh Kumar, G; Sheena Kurian, K; Mary, Priya Sebastian (August 30, 2010)

Abstract:

This paper investigates certain training methods adopted in the Statistical Machine Translator (SMT) from English to Malayalam. In English-Malayalam SMT, word-to-word translation is determined by training on the parallel corpus. Our primary goal is to improve the alignment model by reducing the number of possible alignments of all sentence pairs present in the bilingual corpus. Incorporating morphological information into the parallel corpus with the help of a parts-of-speech tagger has brought about better training results with improved accuracy.

URI:

http://dyuthi.cusat.ac.in/purl/4140

Files in this item: 1

Files	Size
Alignment Model ... m English to Malayalam.pdf	(388.8Kb)
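
The idea in the abstract above, reducing the number of possible word alignments by using parts-of-speech information, can be illustrated with a minimal sketch. It is not the authors' implementation: the tag names, the romanized Malayalam tokens, the rule that a link is allowed only when both words carry the same tag, and the function name `candidate_links` are all assumptions made for illustration.

```python
from itertools import product


def candidate_links(src_tagged, tgt_tagged, use_pos=True):
    """Return the candidate word links for one sentence pair.

    Each sentence is a list of (word, tag) tuples. When use_pos is True,
    a link is kept only if both words carry the same tag (an assumed,
    simplified compatibility rule).
    """
    links = []
    for (s_word, s_tag), (t_word, t_tag) in product(src_tagged, tgt_tagged):
        if not use_pos or s_tag == t_tag:
            links.append((s_word, t_word))
    return links


# Toy sentence pair; the Malayalam side is romanized and hand-tagged (illustrative only).
english = [("he", "PRON"), ("reads", "VERB"), ("books", "NOUN")]
malayalam = [("avan", "PRON"), ("pusthakangal", "NOUN"), ("vaayikkunnu", "VERB")]

all_links = candidate_links(english, malayalam, use_pos=False)
pos_links = candidate_links(english, malayalam, use_pos=True)
```

In this toy pair the unconstrained candidate set contains every source-target word combination, while the tag constraint keeps only the links between words of the same category, which is the kind of reduction the abstract describes.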

A Classification of Sandhi Rules for Suffix Separation in Malayalam

Santhosh Kumar, G; Sheena Kurian, K; Mary, Priya Sebastian (Cochin University of Science And Technology, 2009)

Abstract:

Suffix separation plays a vital role in improving the quality of training in Statistical Machine Translation from English into Malayalam. The morphological richness and agglutinative nature of Malayalam make it necessary to retrieve the root word from its inflected form during training. The suffix separation process accomplishes this by scrutinizing Malayalam words and applying sandhi rules. This paper presents various handcrafted rules designed for the suffix separation process in English-Malayalam SMT. The rules are classified according to the Malayalam syllable that precedes the suffix in the inflected form of the word (check_letter). Suffixes beginning with vowel sounds, such as ആൽ, ഉടെ, ഇൽ, etc., are mainly considered in this process. By examining the check_letter in a word, the suffix separation rules can be applied directly to extract the root word. The quick look-up table provided in this paper can be used as a guideline for implementing suffix separation in the Malayalam language.

URI:

http://dyuthi.cusat.ac.in/purl/4185

Files in this item: 1

Files	Size
A Classificatio ... eparation in Malayalam.pdf	(420.0Kb)
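
The look-up approach described in the abstract above, where the check_letter preceding a suffix selects the rule that recovers the root, might be sketched as follows. This is a hedged illustration, not the paper's table: the words are romanized, the three rules in `RULES` are hypothetical examples chosen for the sketch, and `separate_suffix` is an invented helper name.

```python
# Hypothetical rule table (romanized, illustrative only):
# (suffix, check_letter) -> text that replaces the check_letter in the root.
RULES = {
    ("il", "tt"): "m",   # e.g. marattil -> maram + il
    ("ude", "y"): "",    # e.g. kuttiyude -> kutti + ude
    ("aal", "y"): "",    # e.g. vadiyaal -> vadi + aal
}


def separate_suffix(word):
    """Split an inflected word into (root, suffix) using the rule table.

    Longer endings are tried first so that the most specific rule wins.
    Returns (word, None) when no rule applies.
    """
    ordered = sorted(RULES.items(), key=lambda item: -len(item[0][1] + item[0][0]))
    for (suffix, check_letter), replacement in ordered:
        ending = check_letter + suffix
        if word.endswith(ending):
            stem = word[: len(word) - len(ending)]
            return stem + replacement, suffix
    return word, None
```

For example, `separate_suffix("marattil")` yields the root `maram` with the suffix `il`, and a word that matches no rule is returned unchanged with `None` as its suffix.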

English to Malayalam Translation: A Statistical Approach

Santhosh Kumar, G; Mary, Priya Sebastian; Sheena Kurian, K (ACM, September 16, 2010)

Abstract:	This paper outlines a methodology for translating text from English into the Dravidian language Malayalam using statistical models. By using a monolingual Malayalam corpus and a bilingual English/Malayalam corpus in the training phase, the machine automatically generates Malayalam translations of English sentences. This paper also discusses a technique to improve the alignment model by incorporating parts-of-speech information into the bilingual corpus. Removing the insignificant alignments from the sentence pairs by this approach has ensured better training results. Pre-processing techniques such as suffix separation in the Malayalam corpus and stop word elimination in the bilingual corpus also proved to be effective in training. Various handcrafted rules designed for the suffix separation process, which can be used as a guideline for implementing suffix separation in the Malayalam language, are also presented in this paper. The structural difference between the English-Malayalam pair is resolved in the decoder by applying order conversion rules. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.
Description:	Proceedings of the 1st Amrita ACM-W Celebration on Women in Computing in India
URI:	http://dyuthi.cusat.ac.in/purl/4139

Files in this item: 1

Files	Size
English to Mala ... A Statistical Approach.pdf	(646.6Kb)
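
The abstract above lists WER (word error rate) among its evaluation metrics. As a rough illustration of what that metric measures, the sketch below uses the standard definition: the word-level edit distance between a reference and a hypothesis translation, divided by the number of reference words. It is not the authors' evaluation script, and the function name `word_error_rate` is an assumption.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

A hypothesis identical to its reference scores 0.0; a hypothesis that substitutes one word and drops another from a four-word reference scores 0.5.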

Extension schemes for the Alignment Model of English-Malayalam Statistical Machine Translator

Santhosh Kumar, G; Mary, Priya Sebastian; Sheena Kurian, K (IEEE, 2012)

Abstract:	In Statistical Machine Translation from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam sentence using statistical models. A parallel English-Malayalam corpus is used in the training phase. Word-to-word alignments have to be set among the sentence pairs of the source and target language before subjecting them to training. This paper deals with certain techniques that can be adopted to improve the alignment model of SMT. Methods to incorporate parts-of-speech information into the bilingual corpus have eliminated many of the insignificant alignments. Identifying the named entities and cognates present in the sentence pairs has also proved advantageous when setting up the alignments. The presence of Malayalam words with predictable translations has also contributed to reducing the insignificant alignments. Moreover, reducing the unwanted alignments has brought better training results. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.
Description:	2012 International Conference on Advances in Computing and Communications
URI:	http://dyuthi.cusat.ac.in/purl/4160

Files in this item: 1

Files	Size
Extension schem ... cal Machine Translator.pdf	(219.2Kb)

A framework for translating English text into Malayalam using statistical models

Santhosh Kumar, G; Mary, Priya Sebastian; Sheena Kurian, K (Elsevier, 2011)

Abstract:	A methodology for translating text from English into the Dravidian language Malayalam using statistical models is discussed in this paper. The translator utilizes a monolingual Malayalam corpus and a bilingual English/Malayalam corpus in the training phase and automatically generates the Malayalam translation of an unseen English sentence. Various techniques to improve the alignment model by incorporating morphological inputs into the bilingual corpus are discussed. Removing the insignificant alignments from the sentence pairs by this approach has ensured better training results. Pre-processing techniques such as suffix separation in the Malayalam corpus and stop word elimination in the bilingual corpus also proved to be effective in producing better alignments. Difficulties in the translation process that arise from the structural difference between the English-Malayalam pair are resolved in the decoding phase by applying order conversion rules. The handcrafted rules designed for the suffix separation process, which can be used as a guideline for implementing suffix separation in the Malayalam language, are also presented in this paper. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.
Description:	Procedia Technology 00 (2011) 000–000, 2nd International Conference on Communication, Computing & Security
URI:	http://dyuthi.cusat.ac.in/purl/4150

Files in this item: 1

Files	Size
A framework for ... ing statistical models.pdf	(592.6Kb)

A Framework of Statistical Machine Translator from English to Malayalam

Santhosh Kumar, G; Mary, Priya Sebastian; Sheena Kurian, K (2010)

Abstract:	In this paper we describe the methodology and the structural design of a system that translates English into Malayalam using statistical models. A monolingual Malayalam corpus and a bilingual English/Malayalam corpus are the main resources in building this Statistical Machine Translator. The training strategy adopted has been enhanced by PoS tagging, which helps to get rid of the insignificant alignments. Moreover, incorporating units such as a suffix separator and a stop word eliminator has proven effective in bringing about better training results. In the decoder, order conversion rules are applied to reduce the structural difference between the language pair. The quality of the statistical outcome of the decoder is further improved by applying mending rules. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.
Description:	Proceedings of Fourth International Conference on Information Processing, Bangalore, India
URI:	http://dyuthi.cusat.ac.in/purl/4138

Files in this item: 1

Files	Size
A Framework of ... m English to Malayalam.pdf	(363.2Kb)

Handling OOV Words in Phrase-Based Statistical Machine Translation for Malayalam

Santhosh Kumar, G; Mary, Priya Sebastian (February 9, 2013)

Abstract:

Statistical Machine Translation (SMT) is one of the potential applications in the field of Natural Language Processing. The translation process in SMT is carried out by acquiring translation rules automatically from parallel corpora. However, for many language pairs (e.g. Malayalam-English), parallel corpora are available only in very limited quantities. Therefore, for these language pairs a large portion of the phrases encountered at run-time will be unknown. This paper focuses on methods for handling such out-of-vocabulary (OOV) words in Malayalam that cannot be translated into English using conventional phrase-based statistical machine translation systems. The OOV words in the source sentence are pre-processed to obtain the root word and its suffix. Different inflected forms of the OOV root are generated, and a match for these word variants is looked up in the phrase translation table of the translation model. A vocabulary filter is used to choose the best among the translations of these word variants by finding the unigram count. A match for the OOV suffix is also looked up in the phrase entries, and the target translations are filtered out. The filtered phrases are then structured, and the SMT translation model is extended by adding the OOV word with its new phrase translations. Manual evaluation shows that the number of OOV words in the input is reduced considerably.

URI:

http://dyuthi.cusat.ac.in/purl/4157

Files in this item: 1

Files	Size
Handling OOV Wo ... nslation for Malayalam.pdf	(261.3Kb)
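
The variant look-up step described in the abstract above (generate inflected forms of an OOV root, look them up in the phrase table, and keep the translation with the highest unigram count) can be sketched roughly as below. Everything concrete here is an assumption for illustration: the romanized toy words, the plain concatenation used to form variants (real inflection would need sandhi rules), the dictionary-based phrase table, and the function names.

```python
def generate_variants(root, suffixes):
    """Form inflected variants by plain concatenation (a simplification)."""
    return [root + suffix for suffix in suffixes]


def translate_oov(root, suffixes, phrase_table, unigram_counts):
    """Pick a translation for an OOV root via its inflected variants.

    phrase_table maps a source word to a list of target translations;
    unigram_counts maps a target word to its corpus frequency.
    Returns None when no variant is found in the phrase table.
    """
    candidates = []
    for variant in generate_variants(root, suffixes):
        candidates.extend(phrase_table.get(variant, []))
    if not candidates:
        return None
    # Vocabulary filter: prefer the candidate seen most often in the target corpus.
    return max(candidates, key=lambda target: unigram_counts.get(target, 0))


# Toy, hand-made data (illustrative only).
phrase_table = {"veedu": ["house", "home"], "veedukal": ["houses"]}
unigram_counts = {"house": 12, "home": 7, "houses": 3}
best = translate_oov("veed", ["u", "ukal", "inte"], phrase_table, unigram_counts)
```

With this toy data the candidates are `house`, `home` and `houses`, and the unigram filter selects `house`.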

Techniques to Improve the word alignments in Statistical Machine Translation from English to Malayalam

Santhosh Kumar, G; Mary, Priya Sebastian; Sheena Kurian, K (2010)

Abstract:

In Statistical Machine Translation from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam translation using statistical components such as a translation model, a language model and a decoder. A parallel English-Malayalam corpus is used in the training phase. Word-to-word alignments have to be set up among the sentence pairs of the source and target language before subjecting them to training. This paper deals with techniques that can be adopted to improve the alignment model of SMT. Incorporating parts-of-speech information into the bilingual corpus has eliminated many of the insignificant alignments. Identifying the named entities and cognates present in the sentence pairs has also proved advantageous when setting up the alignments. Moreover, reducing the unwanted alignments has brought better training results. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.

URI:

http://dyuthi.cusat.ac.in/purl/4187

Files in this item: 1

Files	Size
Techniques to I ... m English to Malayalam.pdf	(368.5Kb)
