Please use this identifier to cite or link to this item: http://purl.org/purl/4104

Title: A Copy detection Method for Malayalam Text Documents using N-grams Model
Authors: Sumam, Mary Idicula; Bindu, Baby Thomas; Sindhu, L
Keywords: Copy detection; N-gram Model; Bi-gram; Tri-gram; Malayalam; Plagiarism
Issue Date: 9-Feb-2013
Abstract: This paper proposes a method for copy detection in short Malayalam text passages. Given two passages, one treated as the source text and the other as the suspected copy, the method determines whether the second passage is a plagiarized version of the first. A plagiarism detection algorithm based on the n-gram model for word retrieval is developed, and tri-grams are found to be the best model for comparing Malayalam text. The text is categorized against a threshold, based on the probability and resemblance measures calculated from the n-gram comparison. Texts are compared using variable-length n-grams (n = 2, 3, 4). The experiments show that the trigram model gives acceptable average performance at an affordable cost in terms of complexity.
URI: http://dyuthi.cusat.ac.in/purl/4104
Appears in Collections: Dr. Sumam Mary Idicula
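
The n-gram comparison described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define its probability or resemblance measures, so everything here is an assumption made for illustration: tokens are obtained by splitting on whitespace, resemblance is taken to be the Jaccard ratio over sets of word n-grams, containment is added as a second common measure, and the function names and the 0.5 threshold are hypothetical. It is not the authors' algorithm.

```python
def word_ngrams(text, n):
    """Return the set of word n-grams (as tuples) in a whitespace-tokenized text."""
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def resemblance(source, suspect, n):
    """Assumed measure: shared n-grams divided by all distinct n-grams (Jaccard)."""
    a, b = word_ngrams(source, n), word_ngrams(suspect, n)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(source, suspect, n):
    """Assumed measure: fraction of the suspect's n-grams that occur in the source."""
    a, b = word_ngrams(source, n), word_ngrams(suspect, n)
    return len(a & b) / len(b) if b else 0.0


def is_copy(source, suspect, n=3, threshold=0.5):
    """Flag the suspect passage when resemblance reaches a hypothetical threshold."""
    return resemblance(source, suspect, n) >= threshold


if __name__ == "__main__":
    # Placeholder tokens; any whitespace-separated text, including Malayalam, works.
    source = "a b c d e"
    suspect = "a b c d x"
    for n in (2, 3, 4):
        print(n, resemblance(source, suspect, n), containment(source, suspect, n))
```

Because the tokenizer only splits on whitespace, the sketch applies unchanged to Unicode Malayalam text. Longer n-grams make a match stricter, so a single changed word lowers the score more at n = 4 than at n = 2, which is one way to see why the choice of n matters in the comparison.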

