This paper investigates certain methods of training adopted in the
Statistical Machine Translator (SMT) from English to Malayalam. In
English-Malayalam SMT, the word-to-word translation is determined by training
on the parallel corpus. Our primary goal is to improve the alignment model by
reducing the number of possible alignments of all sentence pairs present in the
bilingual corpus. Incorporating morphological information into the parallel
corpus with the help of a parts-of-speech tagger has brought about better
training results with improved accuracy.
Sumam, Mary Idicula; Bindu, Baby Thomas; Sindhu, L(February 9, 2013)
Abstract:
In this paper a method of copy detection in short Malayalam text passages is proposed. Given two passages, one as the source text and another as the copied text, it is determined whether the second passage is a plagiarized version of the source text. An algorithm for plagiarism detection using the n-gram model for word retrieval is developed, and trigrams are found to be the best model for comparing Malayalam text. Based on the probability and resemblance measures calculated from the n-gram comparison, the text is categorized against a threshold. Texts are compared by variable-length n-gram (n = {2, 3, 4}) comparisons. The experiments show that the trigram model gives acceptable average performance at an affordable cost in terms of complexity.
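The n-gram comparison described in this abstract can be illustrated with a minimal sketch. The abstract does not define its probability and resemblance measures, so the code below assumes a Jaccard-style resemblance over word n-grams; the function names and the threshold value of 0.5 are hypothetical and are not taken from the paper.

```python
def word_ngrams(text, n):
    """Return the set of word n-grams (as tuples) in a whitespace-tokenised text."""
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def resemblance(source, suspect, n=3):
    """Assumed measure: shared n-grams divided by all distinct n-grams (Jaccard)."""
    a = word_ngrams(source, n)
    b = word_ngrams(suspect, n)
    if not a or not b:
        return 0.0  # a passage shorter than n words yields no n-grams
    return len(a & b) / len(a | b)


def is_copied(source, suspect, n=3, threshold=0.5):
    """Flag the suspect passage when resemblance reaches the (hypothetical) threshold."""
    return resemblance(source, suspect, n) >= threshold
```

Calling `resemblance` with n set to 2, 3 and 4 in turn would mirror the variable-length comparison mentioned in the abstract. Whitespace tokenisation is a simplification; Malayalam text would probably need language-aware preprocessing.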
Sreeraj, M; Sumam, Mary Idicula(IEEE, December 7, 2012)
Abstract:
The span of writer identification extends to broad
domains such as digital rights administration, forensic expert
decision-making systems, and document analysis systems. As the
success rate of a writer identification scheme is highly dependent
on the features extracted from the documents, the phase of
feature extraction and selection is highly significant for
writer identification schemes. In this paper, writer
identification in the Malayalam language is addressed by utilizing
feature extraction techniques such as the Scale Invariant Feature
Transform (SIFT). The schemes are tested on a test bed of 280
writers and their performance is evaluated.
Sumam, Mary Idicula; Nikesh, P L; David, Peter S(IEEE, 2008)
Abstract:
This paper describes an English-Malayalam
Cross-Lingual Information Retrieval system. The system
retrieves Malayalam documents in response to a query given in
English or Malayalam. Thus monolingual information retrieval is
also supported in this system. Malayalam is one of the most
prominent regional languages of the Indian subcontinent. It is
spoken by more than 37 million people and is the native language
of Kerala state in India. Since we had neither a full-fledged
online bilingual dictionary nor any parallel corpora to build the
statistical lexicon, we used a bilingual dictionary developed in
house for translation. Other language-specific resources, such as a
Malayalam stemmer and a Malayalam morphological root analyzer,
developed in house were also used in this work.
Kannan, Balakrishnan; Jomy, John; Pramod, K V(June 1, 2011)
Abstract:
Handwritten character recognition has always been a frontier
area of research in the field of pattern recognition and image
processing, and there is a large demand for OCR on handwritten
documents. Even though sufficient studies have been performed on
foreign scripts such as Chinese, Japanese and Arabic, only
very few works can be traced for handwritten character
recognition of Indian scripts, especially the South Indian scripts.
This paper provides an overview of offline handwritten character
recognition in South Indian scripts, namely Malayalam, Tamil,
Kannada and Telugu.
Description:
National Conference on Indian Language Computing, Kochi, Feb 19-20, 2011
On-line handwriting recognition has been a frontier
area of research for the last few decades under the purview of
pattern recognition. Word processing in Indian languages turns out
to be a vexing experience even with the assistance of an
alphanumeric keyboard. A natural solution to this
problem is offered through online character recognition. There is
abundant literature on the handwriting recognition of western,
Chinese and Japanese scripts, but very little related to
the recognition of Indic scripts such as Malayalam. This paper
presents an efficient Online Handwritten character Recognition
System for Malayalam Characters (OHR-M) using the K-NN
algorithm. It would help in recognizing Malayalam text entered
using pen-like devices. A novel feature extraction method, a
combination of time-domain features and a dynamic representation
of writing direction along with its curvature, is used for
recognizing Malayalam characters. This writer-independent
system gives an accuracy of 98.125% with a recognition
time of 15-30 milliseconds.
Description:
2010 First International Conference on Integrated Intelligent Computing
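The K-NN classification step named in the abstract above can be sketched as follows. This is a generic k-nearest-neighbour majority vote, not the OHR-M implementation: the Euclidean distance, the value of k, the function name and the toy feature vectors are all assumptions made for illustration, since the abstract gives no such details.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Label a query feature vector by majority vote among its k nearest samples.

    train is a list of (feature_vector, label) pairs; distance is Euclidean
    (an assumption, the abstract does not name the metric).
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors standing in for direction/curvature features.
samples = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"),
    ((5.0, 5.0), "b"), ((6.0, 5.0), "b"), ((5.0, 6.0), "b"),
]
label = knn_classify(samples, (0.2, 0.1), k=3)
```

In a real recogniser each feature vector would be derived from the pen trajectory, for example the sequence of writing directions and curvature values sampled along each stroke.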
Development of Malayalam speech recognition systems is in its infancy, although much work has
been done in other Indian languages. In this paper we present the first work on a speaker-independent
Malayalam isolated speech recognizer based on PLP (Perceptual Linear Predictive) Cepstral Coefficients
and the Hidden Markov Model (HMM). The performance of the developed system has been evaluated with
different numbers of HMM states. The system is trained with 21 male and
female speakers in the age group ranging from 19 to 41 years. The system obtained an accuracy of 99.5%
on unseen data.
Description:
International Journal of Advanced Information Technology (IJAIT) Vol. 1, No.5, October 2011
Kannan, Balakrishnan; Pramod, K V; Jomy, John(IEEE, March 23, 2011)
Abstract:
Optical Character Recognition plays an important role
in Digital Image Processing and Pattern Recognition. Even
though ample study has been performed on foreign languages
like Chinese and Japanese, effort on Indian scripts is still
immature. OCR in the Malayalam language is more complex as it is
enriched with the largest number of characters among all Indian
languages. The challenge of recognizing characters is even higher
in the handwritten domain, due to the varying writing style of each
individual. In this paper we propose a system for recognition of
offline handwritten Malayalam vowels. The proposed method
uses Chain code and Image Centroid for the purpose of
extracting features and a two-layer feed-forward network with
scaled conjugate gradient for classification.
Description:
Emerging Trends in Electrical and Computer Technology (ICETECT), 2011 International Conference on
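The chain code and image centroid features mentioned in the abstract above can be illustrated with a short sketch. It assumes the standard 8-direction Freeman chain code over an ordered list of boundary points and a normalised direction histogram as the resulting feature; the function names and the histogram step are hypothetical, and the paper's actual feature layout and network are not reproduced here.

```python
# Freeman 8-direction codes keyed by the (dx, dy) step between neighbouring
# boundary points, with y increasing upwards (an assumed convention).
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def chain_code(points):
    """Encode an ordered boundary as a list of direction codes between neighbours."""
    return [DIRECTIONS[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(points, points[1:])]


def chain_histogram(codes):
    """Normalised frequency of each of the 8 directions (length-independent)."""
    hist = [0] * 8
    for code in codes:
        hist[code] += 1
    total = len(codes) or 1
    return [count / total for count in hist]


def centroid(pixels):
    """Mean (x, y) position of the foreground pixels of a character image."""
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n, sum(y for _, y in pixels) / n)
```

Consecutive points must be 8-connected neighbours, otherwise the lookup in `DIRECTIONS` raises a `KeyError`. A feature vector for a classifier could then combine the histogram with the centroid coordinates.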
This paper presents a novel approach to recognize Grantha, an ancient script of South India, and convert it to Malayalam, a prevalent language in South India, using an online character recognition mechanism. The motivation behind this work is (i) developing a mechanism to recognize Grantha script in the modern world and (ii) affirming the strong connection between Grantha and Malayalam. A framework for the recognition of Grantha script using online character recognition is designed and implemented. The features extracted from the Grantha script consist mainly of time-domain features based on writing direction and curvature. The recognized characters are mapped to the corresponding Malayalam characters. The framework was tested on a bed of medium-length manuscripts containing 9-12 sample lines and on printed pages of a book titled Soundarya Lahari, written in Grantha by Sri Adi Shankara, to recognize the words and sentences. The manuscript recognition rates with the system are 92.11% for Grantha, 90.82% for Old Malayalam and 89.56% for the new Malayalam script. The recognition rates for pages of the printed book are 96.16% for Grantha, 95.22% for the Old Malayalam script and 92.32% for the new Malayalam script. These results show the efficiency of the developed system.
Description:
(IJACSA) International Journal of Advanced Computer Science and Applications,
Vol. 3, No. 7, 2012
Kannan, Balakrishnan; Jomy, John; Pramod, K V(MECS, April, 2013)
Abstract:
In this paper, we propose a handwritten character recognition system for the Malayalam language. The feature extraction phase consists of gradient and curvature calculation and dimensionality reduction using Principal Component Analysis. Directional information from the arc tangent of the gradient is used as the gradient feature. The strength of the gradient in the curvature direction is used as the curvature feature. The proposed system uses a combination of gradient and curvature features in reduced dimension as the feature vector. For classification, the discriminative power of the Support Vector Machine (SVM) is evaluated. The results reveal that SVM with a Radial Basis Function (RBF) kernel yields the best performance, with accuracies of 96.28% and 97.96% on two different datasets. This is the highest accuracy ever reported on these datasets.
Description:
I.J. Image, Graphics and Signal Processing, 2013, 4, 53-59
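The gradient feature described in the abstract above (direction taken from the arc tangent of the gradient) can be sketched in a few lines. The central-difference gradient, the 8 direction bins, the magnitude weighting and the function name are assumptions made for illustration; the curvature feature, the PCA reduction and the SVM classifier of the paper are not shown.

```python
import math


def gradient_direction_histogram(image, bins=8):
    """Accumulate gradient magnitude into direction bins over interior pixels.

    image is a list of rows of grey values. Direction is atan2(gy, gx),
    mapped to [0, 2*pi) and quantised into `bins` equal sectors.
    """
    rows, cols = len(image), len(image[0])
    hist = [0.0] * bins
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = image[r][c + 1] - image[r][c - 1]  # horizontal central difference
            gy = image[r + 1][c] - image[r - 1][c]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat region carries no direction
            angle = math.atan2(gy, gx) % (2 * math.pi)
            index = int(angle / (2 * math.pi) * bins) % bins
            hist[index] += magnitude
    return hist
```

In practice such histograms are usually computed per image block and concatenated, which gives a high-dimensional vector of the kind that PCA would then reduce.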