Please use this identifier to cite or link to this item:
http://purl.org/purl/4105
Title: | A Novel Decision Tree Algorithm for Numeric Datasets - C 4.5*Stat |
Authors: | Sumam, Mary Idicula; Sudheep, Elayidom M; Joseph, Alexander |
Keywords: | Statistical variance; Data Mining; Decision tree; Statistical mean; Accuracy |
Issue Date: | 2013 |
Publisher: | Recent Science Publications |
Abstract: | Decision trees are powerful tools for classification in data mining tasks that involve different types of attributes. Numeric data sets are usually first converted to categorical types and then classified using information gain. Information gain is a popular and useful concept that indicates whether splitting on a given attribute yields any benefit in terms of information content. However, computing it is expensive for large data sets, and popular decision tree algorithms such as ID3 cannot handle numeric data sets. This paper proposes statistical variance as an alternative to information gain, together with the statistical mean to split attributes, in completely numerical data sets. Using the ANOVA test on the standard UCI benchmark datasets, the new algorithm is shown to be competitive with its information gain counterpart C4.5 and with many existing decision tree algorithms. The specific advantages of the proposed algorithm are that it avoids the computational overhead of information gain computation for large data sets with many attributes, and that it avoids converting huge numeric data sets to categorical data, which is also a time-consuming task. In summary, huge numeric datasets can be submitted directly to this algorithm without any attribute mappings or information gain computations. The work also blends two closely related fields, statistics and data mining. |
Description: | International Journal of Advanced Computing, ISSN: 2051-0845, Vol. 36, Issue 1 |
URI: | http://dyuthi.cusat.ac.in/purl/4105 |
ISSN: | 2051-0845 |
Appears in Collections: | Dr. Sumam Mary Idicula |
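
The abstract's core idea, scoring attributes by statistical variance and splitting at the statistical mean, can be illustrated with a minimal Python sketch. The abstract does not say how variance ranks attributes, so the rule used here (the attribute with the highest variance wins) is an assumption made for illustration only and is not the paper's procedure. The names `choose_split` and `partition` and the sample rows are hypothetical.

```python
from statistics import mean, pvariance


def choose_split(rows):
    """Pick an attribute index and a threshold for one tree node.

    rows: list of equal-length lists of numeric attribute values.
    """
    columns = list(zip(*rows))
    # ASSUMPTION: rank attributes by population variance, highest first.
    # The abstract names variance as the criterion but not the exact rule.
    best = max(range(len(columns)), key=lambda i: pvariance(columns[i]))
    # The split point is the attribute's mean, so the numeric values
    # never have to be mapped to categories.
    return best, mean(columns[best])


def partition(rows, attr, threshold):
    """Send each row left (<= threshold) or right (> threshold)."""
    left = [r for r in rows if r[attr] <= threshold]
    right = [r for r in rows if r[attr] > threshold]
    return left, right


# Hypothetical data: attribute 0 varies widely, attribute 1 barely moves.
rows = [[1.0, 10.0], [2.0, 10.5], [9.0, 9.5], [10.0, 10.0]]
attr, threshold = choose_split(rows)
left, right = partition(rows, attr, threshold)
```

Neither step needs class-conditional entropy or a discretisation pass, which is the saving the abstract points to. A full tree would apply the same two steps recursively to `left` and `right`.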
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.