Discussion:
[R] How to read this Rpart decision tree?
Kim C.
2015-02-11 08:02:22 UTC
Hi all,

In the attachment or at this link (http://oi58.tinypic.com/35ic9qc.jpg) you'll find the decision tree I made. I used the rpart package to build the tree and fancyRpartPlot from the rattle package to plot it. The tree looks different from almost every example I have seen, and I don't understand how to read it. I want to predict Product (the values are product keys). The predictors are age, incomegroup, gender, totalchildren, education, occupation, houseownerflag and numberCars. It looks like the top number in each node is a ProductKey. Is "n" the number of observations? And does the percentage belong to the yes/no question below it?

This is the code I used.
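library(rpart)   # rpart(), prune()
library(rattle)  # fancyRpartPlot()
ss.rpart1 <- rpart(Product ~ ., data=sstrain, control=rpart.control(minbucket=2, minsplit=1, cp=-1))
# pick the cp value with the lowest cross-validated error, then prune to it
opt <- which.min(ss.rpart1$cptable[, "xerror"])
cp <- ss.rpart1$cptable[opt, "CP"]
ss.rpart2 <- prune(ss.rpart1, cp=cp)
fancyRpartPlot(ss.rpart2)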
So why does my tree look so different from most others (for example: http://media.tumblr.com/a9f482ff88b0b9cfaffca7ffd46c6a8e/tumblr_inline_mz7pyuaYJQ1s5wtly.png)? That example is from Trevor Stephens' Titanic tutorial. Its first node shows that 62% of all passengers (100% of the observations) did not survive. If they were male, only 19% of them were survivors. A lot of examples look like that. Why does mine show a ProductKey in every node, with a different one each time? It doesn't make sense to me. And instead of two numbers like .62 and .38, it has n=197e+3. So should I read the first node as "For 100% of the observations of ProductKey 1074, the incomegroup was moderate"?

Thank you!

Kim
Sarah Goslee
2015-02-11 15:03:28 UTC
Hi Kim

fancyRpartPlot is a front-end to prp, and you can pass it all of the
prp options - it says this in the help for fancyRpartPlot, and that's
about all it says.

So you need to spend some time reading about prp options, and how to
customize your plot to get what you want. There are lots of detailed
resources; here's one to get you started.

http://www.milbo.org/rpart-plot/prp.pdf
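
As a minimal sketch of what the prp options control, the example below uses the kyphosis data that ships with rpart (not the data from the question) and calls prp from the rpart.plot package directly. The object names fit and fit_num are made up for illustration. It also shows that rpart picks the tree type from the response: a factor gives a classification tree, a numeric response gives a regression tree. Whether that explains the tree in the question is an assumption, since the thread does not say how Product is stored.

```r
library(rpart)       # rpart() and the kyphosis example data
library(rpart.plot)  # prp()

# Factor response: rpart fits a classification tree
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
print(fit$method)

# Numeric response (the factor recoded as numbers, for illustration only):
# rpart fits a regression tree instead
fit_num <- rpart(as.numeric(Kyphosis) ~ Age + Number + Start, data = kyphosis)
print(fit_num$method)

# extra = 104: class probabilities plus the percentage of observations per node
prp(fit, type = 2, extra = 104)

# extra = 101: observation counts plus the percentage of observations per node
prp(fit_num, type = 2, extra = 101)
```

If a response such as a product key is stored as a number, converting it with factor() before fitting would be one way to get a classification tree. With many distinct keys the node labels can become crowded.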

Sarah
Post by Kim C.
Hi all,
In the attachment or at this link (http://oi58.tinypic.com/35ic9qc.jpg) you'll find the decision tree I made. I used the rpart package to build the tree and fancyRpartPlot from the rattle package to plot it. The tree looks different from almost every example I have seen, and I don't understand how to read it. I want to predict Product (the values are product keys). The predictors are age, incomegroup, gender, totalchildren, education, occupation, houseownerflag and numberCars. It looks like the top number in each node is a ProductKey. Is "n" the number of observations? And does the percentage belong to the yes/no question below it?
This is the code I used.
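library(rpart)   # rpart(), prune()
library(rattle)  # fancyRpartPlot()
ss.rpart1 <- rpart(Product ~ ., data=sstrain, control=rpart.control(minbucket=2, minsplit=1, cp=-1))
# pick the cp value with the lowest cross-validated error, then prune to it
opt <- which.min(ss.rpart1$cptable[, "xerror"])
cp <- ss.rpart1$cptable[opt, "CP"]
ss.rpart2 <- prune(ss.rpart1, cp=cp)
fancyRpartPlot(ss.rpart2)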
So why does my tree look so different from most others (for example: http://media.tumblr.com/a9f482ff88b0b9cfaffca7ffd46c6a8e/tumblr_inline_mz7pyuaYJQ1s5wtly.png)? That example is from Trevor Stephens' Titanic tutorial. Its first node shows that 62% of all passengers (100% of the observations) did not survive. If they were male, only 19% of them were survivors. A lot of examples look like that. Why does mine show a ProductKey in every node, with a different one each time? It doesn't make sense to me. And instead of two numbers like .62 and .38, it has n=197e+3. So should I read the first node as "For 100% of the observations of ProductKey 1074, the incomegroup was moderate"?
Thank you!
Kim
--
Sarah Goslee
http://www.functionaldiversity.org
