Please use this identifier to cite or link to this item: http://purl.org/purl/4103

Title: Author Identification in Malayalam using n-grams
Authors: Sumam, Mary Idicula
Bindu, Baby Thomas
Sindhu, L
Keywords: stylometrics
feature extraction
author profile
lexical features
character features
collocations
classification
n-grams
distance measure
Issue Date: 2009
Abstract: Author identification is the problem of identifying, from a given set of authors, the author of an anonymous text or of a text whose authorship is in doubt. Works by different authors are strongly distinguished by quantifiable features of the text. This paper describes attempts to identify the most likely author of a Malayalam text from a list of candidate authors. Malayalam is an agglutinative Dravidian language, and few successful tools have been developed to extract syntactic and semantic features from texts in this language. We made a detailed study of the stylometric features that can be used to form an author's profile and found that the frequencies of word collocations can clearly distinguish an author in a highly inflected language such as Malayalam. In our work we extract the word-level and character-level features present in a text to characterize the style of its author. Our first step was to create a profile for each candidate author whose texts were available to us, first from word n-gram frequencies and then from variable-length character n-gram frequencies. The profiles thus formed were then compared with the features extracted from the anonymous text to suggest the most likely author.
URI: http://dyuthi.cusat.ac.in/purl/4103
Appears in Collections: Dr. Sumam Mary Idicula
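
The profile-and-compare approach outlined in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say which distance measure, n-gram lengths, or profile size were used, so the relative-difference dissimilarity, the 2 to 4 character range, the 500-entry profile, and all function names below are assumptions chosen for illustration.

```python
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Yield every character n-gram of length n_min..n_max (assumed range)."""
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            yield text[i:i + n]


def word_ngrams(text, n=2):
    """Yield word n-grams; n=2 gives adjacent-word collocations."""
    words = text.split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])


def build_profile(ngrams, size=500):
    """Map the `size` most frequent n-grams to their relative frequencies."""
    counts = Counter(ngrams)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {g: c / total for g, c in counts.most_common(size)}


def dissimilarity(p, q):
    """Assumed measure: squared relative frequency difference, summed over
    the n-grams found in either profile (0.0 means identical profiles)."""
    score = 0.0
    for g in set(p) | set(q):
        fp, fq = p.get(g, 0.0), q.get(g, 0.0)
        score += ((fp - fq) / ((fp + fq) / 2)) ** 2
    return score


def most_likely_author(author_texts, unknown_text):
    """Return the candidate whose character n-gram profile is closest."""
    unknown = build_profile(char_ngrams(unknown_text))
    return min(
        author_texts,
        key=lambda name: dissimilarity(
            build_profile(char_ngrams(author_texts[name])), unknown
        ),
    )
```

A word-level profile is built the same way by passing `word_ngrams(text)` to `build_profile`. Because Python strings are sequences of Unicode code points, the same functions apply to Malayalam text, although a real system would probably need script-aware normalization first.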

Files in This Item:

File: Author Identification in Malayalam using n-grams.pdf
Description: pdf
Size: 379.04 kB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

 
