<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Book Blurbs &#187; Classification and Regression Trees by Breiman et al</title>
	<atom:link href="http://books.hammerpig.com/category/classification-and-regression-trees-by-breiman-et-al/feed" rel="self" type="application/rss+xml" />
	<link>http://books.hammerpig.com</link>
	<description>Quotes to Remember From Some Great Books About Science, People, Technology, and Ideas</description>
	<lastBuildDate>Tue, 12 May 2009 16:53:08 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Impurity in Tree Classifiers</title>
		<link>http://books.hammerpig.com/impurity-in-tree-classifiers.html</link>
		<comments>http://books.hammerpig.com/impurity-in-tree-classifiers.html#comments</comments>
		<pubDate>Wed, 11 Feb 2009 15:37:04 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Classification and Regression Trees by Breiman et al]]></category>

		<guid isPermaLink="false">http://books.hammerpig.com/?p=38</guid>
		<description><![CDATA[&#8220;The first problem in tree construction is how to use L to determine the binary splits of X into smaller and smaller pieces. The fundamental idea is to select each split of a subset so that the data in each of the descendant subsets are &#8216;purer&#8217; than the data in the parent subset&#8230;the node impurity [...]]]></description>
			<content:encoded><![CDATA[<p>&#8220;The first problem in tree construction is how to use L to determine the binary splits of X into smaller and smaller pieces. The fundamental idea is to select each split of a subset so that the data in each of the descendant subsets are &#8216;purer&#8217; than the data in the parent subset&#8230;the node impurity is largest when all classes are equally mixed together in it, and smallest when the node contains only one class.&#8221; (pp. 23-24)</p>
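<p>The idea of choosing splits whose descendants are purer than the parent can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the passage above does not name a specific impurity function, so the sketch assumes the Gini measure (1 minus the sum of squared class proportions), and the function names gini_impurity and impurity_decrease, as well as the toy labels, are hypothetical.</p>

```python
from collections import Counter


def gini_impurity(labels):
    """Impurity of a node: 1 minus the sum of squared class proportions.

    Assumed measure for illustration; the quoted passage does not fix one.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted


# Hypothetical node with two classes mixed in equal proportion.
parent = ["a", "a", "a", "b", "b", "b"]

mixed = gini_impurity(parent)           # equally mixed node
pure = gini_impurity(["a", "a", "a"])   # node containing only one class
gain = impurity_decrease(parent, ["a", "a", "a"], ["b", "b", "b"])

print(mixed, pure, gain)
```

<p>For two classes the equally mixed node gives the largest value this measure can take, the single-class node gives zero, and a split that separates the classes perfectly removes all of the parent's impurity.</p>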
]]></content:encoded>
			<wfw:commentRss>http://books.hammerpig.com/impurity-in-tree-classifiers.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Gini Index</title>
		<link>http://books.hammerpig.com/the-gini-index.html</link>
		<comments>http://books.hammerpig.com/the-gini-index.html#comments</comments>
		<pubDate>Wed, 11 Feb 2009 15:29:27 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Classification and Regression Trees by Breiman et al]]></category>

		<guid isPermaLink="false">http://books.hammerpig.com/?p=34</guid>
		<description><![CDATA[&#8220;The concept of a criterion depending on a node impurity measure has already been introduced. Given a node t with estimated class probabilities p(j&#124;t), j=1, &#8230;, J, a measure of node impurity given t:
i(t) = psi(p(1&#124;t), &#8230;, p(J&#124;t))
is defined and a search made for the split that reduces node, or equivalently tree, impurity. As remarked [...]]]></description>
			<content:encoded><![CDATA[<p>&#8220;The concept of a criterion depending on a node impurity measure has already been introduced. Given a node t with estimated class probabilities p(j|t), j=1, &#8230;, J, a measure of node impurity given t:</p>
<p>i(t) = psi(p(1|t), &#8230;, p(J|t))</p>
<p>is defined and a search made for the split that reduces node, or equivalently tree, impurity. As remarked earlier, the original function selected was:</p>
<p>psi(p1, &#8230;, pJ) = -Sum(j)(pj * log(pj)).</p>
<p>&#8220;In later work the Gini diversity index was adopted. This has the form:</p>
<p>Sum (j!=i) (p(i|t)p(j|t)).</p>
<p>&#8220;The Gini index has an interesting interpretation. Instead of using the plurality rule to classify objects in a node t, use the rule that assigns an object selected at random from the node to class i with probability p(i|t). The estimated probability that the item is actually in class j is p(j|t). Therefore, the estimated probability of misclassification under this rule is the Gini index:</p>
<p>Sum (j!=i) (p(i|t)p(j|t)).</p>
<p>&#8220;Another interpretation is in terms of variances (see Light and Margolin, 1971). In a node t, assign all class j objects the value 1, and all other objects the value 0. Then the sample variance of these values is p(j|t)(1-p(j|t)). If this is repeated for all J classes and the variances summed, the result is:</p>
<p>Sum(j) (p(j|t)(1-p(j|t))) = 1 - Sum(j) (p^2(j|t)).</p>
<p>&#8230;</p>
<p>The Gini index is simple and quickly computed. It can also incorporate symmetric variable misclassification costs in a natural way.&#8221; (pp. 103-104)</p>
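<p>The formulas quoted above can be checked numerically. The following Python sketch is not from the book: the function names and the example probabilities are hypothetical, and the entropy function assumes the usual convention that a class with zero probability contributes nothing to the sum. It computes the original entropy-based function and the three equivalent forms of the Gini index (the sum over pairs of distinct classes, the summed variances, and one minus the sum of squares).</p>

```python
import math


def entropy(p):
    """psi(p1, ..., pJ) = -Sum(j) pj * log(pj); zero-probability terms are skipped."""
    return -sum(pj * math.log(pj) for pj in p if pj != 0)


def gini_pairwise(p):
    """Sum over all ordered pairs i != j of p(i|t) * p(j|t)."""
    k = len(p)
    return sum(p[i] * p[j] for i in range(k) for j in range(k) if i != j)


def gini_variance(p):
    """Sum over j of p(j|t) * (1 - p(j|t)), the summed indicator variances."""
    return sum(pj * (1 - pj) for pj in p)


def gini_complement(p):
    """1 - Sum over j of p(j|t) squared."""
    return 1 - sum(pj ** 2 for pj in p)


# Hypothetical estimated class probabilities for a node t with J = 3 classes.
probs = [0.5, 0.3, 0.2]

forms = (gini_pairwise(probs), gini_variance(probs), gini_complement(probs))
print(forms)           # the three values agree up to floating-point rounding
print(entropy(probs))
```

<p>The three Gini forms agree because the class probabilities sum to one, so the sum over pairs of distinct classes equals one minus the sum of the squared probabilities.</p>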
]]></content:encoded>
			<wfw:commentRss>http://books.hammerpig.com/the-gini-index.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
