dc.contributor.author |
Sumam, Mary Idicula |
|
dc.contributor.author |
Bindu, Baby Thomas |
|
dc.contributor.author |
Sindhu, L |
|
dc.date.accessioned |
2014-07-18T04:58:29Z |
|
dc.date.available |
2014-07-18T04:58:29Z |
|
dc.date.issued |
2009 |
|
dc.identifier.uri |
http://dyuthi.cusat.ac.in/purl/4103 |
|
dc.description.abstract |
Author identification is the problem of determining, from a given set of candidate authors, who wrote an anonymous text or a text whose authorship is in doubt. Works by different authors can be distinguished by quantifiable features of the text. This paper describes our attempts to identify the most likely author of a Malayalam text from a list of authors. Malayalam is an agglutinative Dravidian language, and few successful tools have been developed to extract syntactic and semantic features from texts in this language. We have made a detailed study of the stylometric features that can be used to form an author's profile and have found that the frequencies of word collocations can clearly distinguish an author in a highly inflectional language such as Malayalam. In our work we extract the word-level and character-level features present in a text to characterize the style of its author. Our first step was to create a profile for each candidate author whose texts were available to us, first from word n-gram frequencies and then from variable-length character n-gram frequencies. The profiles thus formed for the authors under consideration were then compared with the features extracted from the anonymous text to suggest the most likely author. |
en_US |
dc.description.sponsorship |
Cochin University Of Science And Technology |
en_US |
dc.language.iso |
en |
en_US |
dc.subject |
stylometrics |
en_US |
dc.subject |
feature extraction |
en_US |
dc.subject |
author profile |
en_US |
dc.subject |
lexical features |
en_US |
dc.subject |
character features |
en_US |
dc.subject |
collocations |
en_US |
dc.subject |
classification |
en_US |
dc.subject |
n-grams |
en_US |
dc.subject |
distance measure |
en_US |
dc.title |
Author Identification in Malayalam using n-grams |
en_US |
dc.type |
Article |
en_US |
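The profile-and-compare procedure outlined in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `top_k` cutoff, whitespace tokenisation, and the dissimilarity formula (a sum of squared normalised frequency differences) are all assumptions, since the record does not say which distance measure or profile size the paper uses. The sample strings are English placeholders rather than Malayalam data.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return word n-grams (as tuples) from whitespace-tokenised text."""
    tokens = text.split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, min_n, max_n):
    """Return character n-grams for every length from min_n to max_n."""
    grams = []
    for n in range(min_n, max_n + 1):
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams


def build_profile(grams, top_k=500):
    """Map the top_k most frequent n-grams to their relative frequencies.

    top_k is a hypothetical cutoff chosen for illustration.
    """
    counts = Counter(grams)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {g: c / total for g, c in counts.most_common(top_k)}


def dissimilarity(p, q):
    """Assumed measure: sum of squared normalised frequency differences."""
    score = 0.0
    for g in set(p) | set(q):
        a, b = p.get(g, 0.0), q.get(g, 0.0)
        # a + b > 0 because g occurs in at least one profile.
        score += ((a - b) / ((a + b) / 2)) ** 2
    return score


def most_likely_author(author_texts, anonymous_text, featurise):
    """Return the candidate whose profile is closest to the anonymous text."""
    profiles = {a: build_profile(featurise(t)) for a, t in author_texts.items()}
    target = build_profile(featurise(anonymous_text))
    return min(profiles, key=lambda a: dissimilarity(profiles[a], target))


# Placeholder corpus; real use would load Malayalam texts per author.
samples = {
    "A": "the cat sat on the mat the cat sat",
    "B": "dogs run fast and dogs bark loud",
}
unknown = "the cat sat on the mat"

by_words = most_likely_author(samples, unknown, lambda t: word_ngrams(t, 2))
by_chars = most_likely_author(samples, unknown, lambda t: char_ngrams(t, 2, 3))
print(by_words, by_chars)
```

Passing the feature extractor as a parameter keeps the comparison step identical for word-level and character-level profiles, which mirrors the two-stage approach the abstract describes.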