
Please use this identifier to cite or link to this item: http://purl.org/purl/4105

Title: A Novel Decision Tree Algorithm for Numeric Datasets - C 4.5*Stat
Authors: Sumam, Mary Idicula; Sudheep, Elayidom M; Joseph, Alexander
Keywords: Statistical variance; Data Mining; Decision tree; Statistical mean; Accuracy
Issue Date: 2013
Publisher: Recent Science Publications
Abstract: Decision trees are powerful tools for classification in data mining tasks that involve different types of attributes. Numeric data sets are usually first converted to categorical types and then classified using information gain. Information gain is a popular and useful concept that indicates whether splitting on a given attribute yields any benefit in terms of information content, but it is computationally intensive for large data sets. In addition, popular decision tree algorithms such as ID3 cannot handle numeric data sets. This paper proposes statistical variance as an alternative to information gain, and statistical mean as the way to split attributes, in completely numerical data sets. The new algorithm has been shown to be competitive with its information gain counterpart C4.5 and with many existing decision tree algorithms on the standard UCI benchmark datasets, using the ANOVA test. Its specific advantages are that it avoids the computational overhead of information gain computation for large data sets with many attributes, and that it avoids the time-consuming conversion of huge numeric data sets to categorical data. In summary, huge numeric datasets can be submitted directly to this algorithm without any attribute mappings or information gain computations. It also blends two closely related fields, statistics and data mining.
Description: International Journal of Advanced Computing, ISSN: 2051-0845, Vol. 36, Issue 1
URI: http://dyuthi.cusat.ac.in/purl/4105
ISSN: 2051-0845
Appears in Collections:Dr. Sumam Mary Idicula
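
The abstract describes replacing information gain with statistical variance and splitting numeric attributes at their statistical mean, but it does not state the exact selection rule. The following is a minimal sketch under one plausible reading, not the authors' published procedure: it assumes the attribute with the largest variance is selected and that its mean serves as the split threshold. The function names `choose_split` and `partition` are hypothetical.

```python
from statistics import mean, pvariance


def choose_split(rows):
    """Return (attribute_index, threshold) for a list of numeric rows.

    Assumption (not confirmed by the abstract): the attribute with the
    largest population variance is chosen, and its mean is the threshold.
    """
    columns = list(zip(*rows))  # transpose rows into per-attribute columns
    variances = [pvariance(col) for col in columns]
    best = max(range(len(columns)), key=variances.__getitem__)
    return best, mean(columns[best])


def partition(rows, attribute, threshold):
    """Split rows into (left, right) by comparing one attribute to threshold."""
    left = [r for r in rows if r[attribute] <= threshold]
    right = [r for r in rows if r[attribute] > threshold]
    return left, right


rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
attribute, threshold = choose_split(rows)   # attribute 1, threshold 25.0
left, right = partition(rows, attribute, threshold)
```

No entropy or categorical mapping is computed here, which matches the advantage the abstract claims; a full tree builder would apply these two steps recursively to each partition.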

Files in This Item:

File: A Novel Decision Tree Algorithm for Numeric Datasets - C 4.5Stat.pdf
Description: pdf
Size: 179.03 kB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

 
