Sumam, Mary Idicula; Soumya, S; Manju, K (IEEE, 2009)
Abstract:
A stochastic part-of-speech tagger for Malayalam is proposed.
The tagger uses word frequencies and bigram statistics drawn
from a corpus. Because no annotated corpus is available for
Malayalam, a morphological analyzer is used to generate the
tagged corpus. Although the experiments were performed on
a very small corpus, the results show that the statistical
approach works well for a highly agglutinative language like
Malayalam.
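To illustrate the general idea of tagging from word frequencies and bigram statistics, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the statistics are combined, so the Viterbi decoding, the add-one smoothing, the function names (train, log_prob, tag), and the toy English corpus standing in for morphologically analyzed Malayalam text are all assumptions made for this example.

```python
from collections import Counter, defaultdict
import math

START = "<s>"  # hypothetical sentence-start marker

def train(tagged_sentences):
    """Count word frequencies per tag and tag-bigram frequencies."""
    emit = defaultdict(Counter)   # tag -> counts of words seen with that tag
    trans = defaultdict(Counter)  # previous tag -> counts of the following tag
    for sentence in tagged_sentences:
        prev = START
        for word, tag_ in sentence:
            emit[tag_][word] += 1
            trans[prev][tag_] += 1
            prev = tag_
    return emit, trans

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (smoothing choice is an assumption)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def tag(words, emit, trans):
    """Return the most probable tag sequence under the bigram model (Viterbi)."""
    if not words:
        return []
    tags = list(emit)
    n_tags = len(tags)
    # Vocabulary size plus one slot reserved for unseen words.
    n_words = len({w for counts in emit.values() for w in counts}) + 1

    # best[i][t] = (log score of the best path ending in tag t, previous tag)
    best = [{
        t: (log_prob(trans[START], t, n_tags)
            + log_prob(emit[t], words[0], n_words), None)
        for t in tags
    }]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max(
                (best[-1][p][0] + log_prob(trans[p], t, n_tags), p)
                for p in tags
            )
            column[t] = (score + log_prob(emit[t], word, n_words), prev)
        best.append(column)

    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for column in reversed(best[1:]):
        last = column[last][1]
        path.append(last)
    return path[::-1]

# Toy placeholder corpus (not data from the paper).
TOY_CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
emit, trans = train(TOY_CORPUS)
predicted = tag(["the", "cat", "runs"], emit, trans)
```

In the setting the abstract describes, the training sentences would come from the output of the morphological analyzer rather than from a hand-annotated corpus.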
Description:
2009 International Conference on Advances in Recent Technologies in Communication and Computing