Archive for February, 2009

Impurity in Tree Classifiers

“The first problem in tree construction is how to use L to determine the binary splits of X into smaller and smaller pieces. The fundamental idea is to select each split of a subset so that the data in each of the descendant subsets are ‘purer’ than the data in the parent subset…the node impurity [...]”
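A minimal Python sketch of this idea follows. It rests on assumptions that are not in the excerpt. Impurity is measured with the Gini index (1 minus the sum of squared class proportions). Candidate splits are restricted to thresholds of the form x <= c on a single numeric feature. A split s of node t is scored by the impurity decrease Δi(s, t) = i(t) − p_L i(t_L) − p_R i(t_R), where p_L and p_R are the fractions of cases sent to the left and right children. The helper names (`gini`, `impurity_decrease`, `best_threshold_split`) are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity 1 - sum_j p(j|t)^2 of the class labels at a node (assumed measure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right) -> float:
    """Delta i(s, t) = i(t) - p_L * i(t_L) - p_R * i(t_R)."""
    n = len(parent)
    p_left = len(left) / n
    p_right = len(right) / n
    return gini(parent) - p_left * gini(left) - p_right * gini(right)


def best_threshold_split(x: Sequence[float],
                         y: Sequence[Hashable]) -> Tuple[Optional[float], float]:
    """Try every split x <= c on one numeric feature; return (best c, its decrease)."""
    best_c: Optional[float] = None
    best_gain = 0.0
    values = sorted(set(x))
    # Candidate thresholds: midpoints between consecutive distinct feature values.
    for lo, hi in zip(values, values[1:]):
        c = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= c]
        right = [yi for xi, yi in zip(x, y) if xi > c]
        gain = impurity_decrease(y, left, right)
        if gain > best_gain:
            best_c, best_gain = c, gain
    return best_c, best_gain
```

For example, `best_threshold_split([1, 2, 3, 10, 11, 12], list("aaabbb"))` picks the threshold 6.5, which sends every `a` left and every `b` right and lowers the Gini impurity from 0.5 to 0. Weighting each child's impurity by p_L and p_R means that peeling off a tiny pure subset (such as the single case at the threshold 1.5) earns only a small decrease.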

The Gini Index

“The concept of a criterion depending on a node impurity measure has already been introduced. Given a node t with estimated class probabilities p(j|t), j=1, …, J, a measure of node impurity given t:
$$ i(t) = \psi\big(p(1 \mid t), \ldots, p(J \mid t)\big) $$
is defined and a search made for the split that reduces node, or equivalently tree, impurity. As remarked [...]”
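The Gini index is one common choice of ψ. As a worked sketch, assuming ψ is the Gini index, i(t) = 1 − Σ_j p(j|t)² = Σ_{j≠k} p(j|t) p(k|t). The two forms agree because the p(j|t) sum to 1, so (Σ_j p(j|t))² = 1 splits into the squared terms plus the cross terms. The index is 0 for a pure node and reaches its maximum, 1 − 1/J, when all J classes are equally likely. In the code below, p(j|t) is estimated by the class proportions at the node, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def class_probabilities(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Estimate p(j|t) as the fraction of cases at node t that belong to class j."""
    n = len(labels)
    return {j: count / n for j, count in Counter(labels).items()}


def gini_index(probs: Sequence[float]) -> float:
    """psi(p_1, ..., p_J) = 1 - sum_j p_j^2 (the Gini index)."""
    return 1.0 - sum(p * p for p in probs)


def gini_pairwise(probs: Sequence[float]) -> float:
    """Equivalent form when the p_j sum to 1: sum over ordered pairs j != k of p_j * p_k."""
    return sum(pj * pk
               for j, pj in enumerate(probs)
               for k, pk in enumerate(probs)
               if j != k)


def node_impurity(labels: Sequence[Hashable]) -> float:
    """i(t) = psi(p(1|t), ..., p(J|t)) with psi taken to be the Gini index."""
    return gini_index(list(class_probabilities(labels).values()))
```

A pure node such as `list("aaaa")` has impurity 0, a node split evenly between two classes (`list("aabb")`) has impurity 0.5, and `gini_index` and `gini_pairwise` return the same value (up to rounding) for any probability vector that sums to 1.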