A Novel Decision Tree Algorithm for Numeric Datasets - C 4.5*Stat



dc.contributor.author Sumam, Mary Idicula
dc.contributor.author Sudheep, Elayidom M
dc.contributor.author Joseph, Alexander
dc.date.accessioned 2014-07-18T05:16:40Z
dc.date.available 2014-07-18T05:16:40Z
dc.date.issued 2013
dc.identifier.issn 2051-0845
dc.identifier.uri http://dyuthi.cusat.ac.in/purl/4105
dc.description International Journal of Advanced Computing, ISSN: 2051-0845, Vol. 36, Issue 1 en_US
dc.description.abstract Decision trees are powerful tools for classification in data mining tasks that involve different types of attributes. Numeric data sets are usually first converted to categorical types and then classified using information gain. Information gain is a popular and useful measure that indicates whether splitting on a given attribute yields any benefit in terms of information content, but computing it is expensive for large data sets. In addition, popular decision tree algorithms such as ID3 cannot handle numeric data sets. This paper proposes statistical variance as an alternative to information gain, and the statistical mean as the split point for attributes, in completely numerical data sets. Using the ANOVA test on the standard UCI benchmark datasets, the new algorithm is shown to be competitive with its information gain counterpart, C4.5, and with many existing decision tree algorithms. The specific advantages of the proposed algorithm are that it avoids the computational overhead of information gain for large data sets with many attributes, and that it avoids converting huge numeric data sets to categorical data, which is also time consuming. In summary, huge numeric datasets can be submitted directly to this algorithm without any attribute mappings or information gain computations. The work also blends two closely related fields, statistics and data mining. en_US
dc.description.sponsorship Cochin University of Science and Technology en_US
dc.language.iso en en_US
dc.publisher Recent Science Publications en_US
dc.subject Statistical variance en_US
dc.subject Data Mining en_US
dc.subject Decision tree en_US
dc.subject Statistical mean en_US
dc.subject Accuracy en_US
dc.title A Novel Decision Tree Algorithm for Numeric Datasets - C 4.5*Stat en_US
dc.type Article en_US
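
The abstract describes splitting numeric attributes at their statistical mean and using statistical variance in place of information gain. The record does not give the exact selection rule, so the following is only a minimal sketch under stated assumptions: the hypothetical helper `choose_split` picks the attribute with the largest population variance (an assumed rule, not confirmed by the source) and uses that attribute's mean as the threshold, and the hypothetical helper `partition` divides the rows accordingly. The sample rows and labels are made up for illustration.

```python
from statistics import mean, pvariance


def choose_split(rows):
    """Return (attribute_index, threshold) for a list of numeric feature rows.

    Assumption: select the attribute with the largest population variance
    and split at its mean. The paper's actual selection rule may differ.
    """
    columns = list(zip(*rows))  # one tuple of values per attribute
    variances = [pvariance(col) for col in columns]
    best = max(range(len(columns)), key=lambda i: variances[i])
    return best, mean(columns[best])


def partition(rows, labels, attr, threshold):
    """Split (row, label) pairs into those at or below and those above the threshold."""
    left = [(r, y) for r, y in zip(rows, labels) if r[attr] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[attr] > threshold]
    return left, right


# Illustrative data only: two numeric attributes, two classes.
rows = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0)]
labels = ["a", "a", "b", "b"]

attr, threshold = choose_split(rows)
left, right = partition(rows, labels, attr, threshold)
```

Because the threshold is just the column mean, no conversion to categorical values and no entropy computation is needed at any node, which is the saving the abstract claims. A full tree would apply the same two steps recursively to `left` and `right`.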


Files in this item

File: A Novel Decisio ... c Datasets - C 4.5Stat.pdf
Size: 179.0Kb
Format: PDF
