R: Classical Decision Trees with rpart

2016-12-28 14:53
Implementing a decision tree in R
Code:
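> data('bodyfat', package = 'TH.data')
> set.seed(1234)
> # Randomly assign each row to group 1 (training, p = 0.7) or group 2 (test, p = 0.3)
> ind <- sample(2, nrow(bodyfat), replace = TRUE, prob = c(0.7, 0.3))
> bodyfat.train <- bodyfat[ind == 1, ]
> bodyfat.test <- bodyfat[ind == 2, ]
> library(rpart)
> # Regression tree for DEXfat; a node needs at least 10 observations before a split is attempted
> myFormula <- DEXfat ~ age + waistcirc + hipcirc + elbowbreadth + kneebreadth
> bodyfat_rpart <- rpart(myFormula, data = bodyfat.train, control = rpart.control(minsplit = 10))
> attributes(bodyfat_rpart)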
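The sample() call above labels each row independently, so the training share is only approximately 70% (the tree output in this post reports n = 56 training rows). A minimal sketch of the same mechanism, using a hypothetical row count; the value of n and the names ind_demo, train_rows and test_rows are assumptions for illustration, not part of the original code:

```r
# Hypothetical row count, used only to illustrate the mechanism.
n <- 71
set.seed(1234)

# Each row independently gets label 1 (prob 0.7) or label 2 (prob 0.3).
ind_demo <- sample(2, n, replace = TRUE, prob = c(0.7, 0.3))
train_rows <- which(ind_demo == 1)
test_rows  <- which(ind_demo == 2)

# The two groups partition the rows, but their sizes depend on the draw.
print(table(ind_demo))
print(length(train_rows) / n)
```

If an exact 70/30 split is needed, sampling row indices without replacement is a common alternative, but that is a different design from the one used in this post.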

> print(bodyfat_rpart$cptable)
          CP nsplit  rel error    xerror       xstd
1 0.67272638      0 1.00000000 1.0687268 0.19588877
2 0.09390665      1 0.32727362 0.4700500 0.11234371
3 0.06037503      2 0.23336696 0.4399747 0.09718181
4 0.03420446      3 0.17299193 0.3739830 0.09705168
5 0.01708278      4 0.13878747 0.3316298 0.07557418
6 0.01695763      5 0.12170469 0.3202835 0.06913430
7 0.01007079      6 0.10474706 0.3273852 0.06873450
8 0.01000000      7 0.09467627 0.3251986 0.06926171
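Columns: CP is the complexity parameter; nsplit is the number of splits; rel error is the training-set error of each tree, relative to the root (which is scaled to 1); xerror is the cross-validation error; xstd is the standard error of the cross-validation error.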
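The columns are tied together: in this table, each rel error equals the previous row's rel error minus CP times the number of splits added. The sketch below checks that relation using only the numbers printed above; the variable names are illustrative and nothing here calls rpart:

```r
# Values copied from the CP table printed above.
cp        <- c(0.67272638, 0.09390665, 0.06037503, 0.03420446,
               0.01708278, 0.01695763, 0.01007079, 0.01000000)
nsplit    <- 0:7
rel_error <- c(1.00000000, 0.32727362, 0.23336696, 0.17299193,
               0.13878747, 0.12170469, 0.10474706, 0.09467627)

# Rebuild rows 2..8 of rel error from the row before each of them:
# previous rel error minus CP times the number of splits added.
k <- seq_len(length(cp) - 1)
rebuilt <- rel_error[k] - cp[k] * (nsplit[k + 1] - nsplit[k])

# Largest disagreement with the printed column; only rounding remains.
max_gap <- max(abs(rebuilt - rel_error[k + 1]))
print(max_gap)
```

For example, row 2 follows from row 1 as 1.00000000 - 0.67272638 = 0.32727362.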
> print(bodyfat_rpart)
n= 56

node), split, n, deviance, yval
      * denotes terminal node

 1) root 56 7265.0290000 30.94589
   2) waistcirc< 88.4 31 960.5381000 22.55645
     4) hipcirc< 96.25 14 222.2648000 18.41143
       8) age< 60.5 9 66.8809600 16.19222 *
       9) age>=60.5 5 31.2769200 22.40600 *
     5) hipcirc>=96.25 17 299.6470000 25.97000
      10) waistcirc< 77.75 6 30.7345500 22.32500 *
      11) waistcirc>=77.75 11 145.7148000 27.95818
        22) hipcirc< 99.5 3 0.2568667 23.74667 *
        23) hipcirc>=99.5 8 72.2933500 29.53750 *
   3) waistcirc>=88.4 25 1417.1140000 41.34880
     6) waistcirc< 104.75 18 330.5792000 38.09111
      12) hipcirc< 109.9 9 68.9996200 34.37556 *
      13) hipcirc>=109.9 9 13.0832000 41.80667 *
     7) waistcirc>=104.75 7 404.3004000 49.72571 *
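For a regression tree, the deviance column is the sum of squared deviations of the response from the node mean (yval). That makes it possible to cross-check the printed tree against the CP table by hand. The sketch below uses only numbers copied from the output above; the variable names are illustrative:

```r
# Deviances copied from the printed tree.
root_dev  <- 7265.029
child_dev <- c(960.5381, 1417.114)   # nodes 2 and 3, the root's children

# Share of the root deviance removed by the first split.
first_cp <- (root_dev - sum(child_dev)) / root_dev
print(round(first_cp, 4))            # compare with CP in row 1 of the CP table

# Deviances of the eight terminal nodes (marked with *).
leaf_dev <- c(66.88096, 31.27692, 30.73455, 0.2568667,
              72.29335, 68.99962, 13.0832, 404.3004)

# Remaining deviance of the full tree, relative to the root.
final_rel_error <- sum(leaf_dev) / root_dev
print(round(final_rel_error, 4))     # compare with rel error in the last row
```

Both values agree with the CP table to the printed precision: about 0.6727 for the first split and about 0.0947 for the full tree.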
> plot(bodyfat_rpart)
> text(bodyfat_rpart,use.n = T)
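(Figure omitted.)
Select the tree with the smallest prediction error (here, the cross-validation error xerror):
> opt <- which.min(bodyfat_rpart$cptable[, 'xerror'])
> cp <- bodyfat_rpart$cptable[opt, 'CP']
> bodyfat_prune <- prune(bodyfat_rpart, cp = cp)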
> print(bodyfat_prune)
n= 56

node), split, n, deviance, yval
      * denotes terminal node

 1) root 56 7265.02900 30.94589
   2) waistcirc< 88.4 31 960.53810 22.55645
     4) hipcirc< 96.25 14 222.26480 18.41143
       8) age< 60.5 9 66.88096 16.19222 *
       9) age>=60.5 5 31.27692 22.40600 *
     5) hipcirc>=96.25 17 299.64700 25.97000 *
   3) waistcirc>=88.4 25 1417.11400 41.34880
     6) waistcirc< 104.75 18 330.57920 38.09111
      12) hipcirc< 109.9 9 68.99962 34.37556 *
      13) hipcirc>=109.9 9 13.08320 41.80667 *
     7) waistcirc>=104.75 7 404.30040 49.72571 *
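The selection step can be replayed by hand from the CP table printed earlier. The sketch below copies three of its columns into a data frame (cp_table, opt_row, chosen and n_leaves are illustrative names) and checks that the chosen row is consistent with the pruned tree shown above:

```r
# Columns copied from the CP table printed earlier in the post.
cp_table <- data.frame(
  CP     = c(0.67272638, 0.09390665, 0.06037503, 0.03420446,
             0.01708278, 0.01695763, 0.01007079, 0.01000000),
  nsplit = 0:7,
  xerror = c(1.0687268, 0.4700500, 0.4399747, 0.3739830,
             0.3316298, 0.3202835, 0.3273852, 0.3251986)
)

# Row with the smallest cross-validation error.
opt_row <- which.min(cp_table$xerror)
chosen  <- cp_table[opt_row, ]
print(chosen)

# A binary tree with s splits has s + 1 terminal nodes.
n_leaves <- chosen$nsplit + 1
print(n_leaves)
```

Row 6 has the smallest xerror, with CP = 0.01695763 and 5 splits, which matches the 6 terminal nodes of the pruned tree.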
> plot(bodyfat_prune,margin=0.1)
> text(bodyfat_prune,all=T,use.n = T)
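Use the pruned tree to predict on the test set and compare the predictions with the observed values. In the plot, abline() draws the diagonal. For a good predictive model, most points should lie on the diagonal or as close to it as possible.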
> DEXfat_pred <- predict(bodyfat_prune, newdata = bodyfat.test)
> xlim <- range(bodyfat$DEXfat)
> plot(DEXfat_pred ~ DEXfat, data = bodyfat.test, xlab = 'Observed',
+      ylab = 'Predicted', ylim = xlim, xlim = xlim)
> abline(a = 0, b = 1)
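For a regression tree, predict() returns the mean response (yval) of the terminal node that an observation falls into. To make that concrete, the pruned tree printed above can be transcribed as nested conditions. This is a hand-written sketch, not what rpart runs internally; predict_pruned and the input values are hypothetical:

```r
# Hand transcription of the pruned tree printed in this post.
# Returns the yval of the terminal node reached by one observation.
predict_pruned <- function(waistcirc, hipcirc, age) {
  if (waistcirc < 88.4) {
    if (hipcirc < 96.25) {
      if (age < 60.5) 16.19222 else 22.40600   # nodes 8 and 9
    } else {
      25.97000                                 # node 5
    }
  } else {
    if (waistcirc < 104.75) {
      if (hipcirc < 109.9) 34.37556 else 41.80667   # nodes 12 and 13
    } else {
      49.72571                                 # node 7
    }
  }
}

# Hypothetical observations (not taken from the bodyfat data).
p1 <- predict_pruned(waistcirc = 80,  hipcirc = 100, age = 50)
p2 <- predict_pruned(waistcirc = 110, hipcirc = 120, age = 40)
p3 <- predict_pruned(waistcirc = 70,  hipcirc = 90,  age = 65)
print(c(p1, p2, p3))
```

Because the pruned tree has only six terminal nodes, the predictions take at most six distinct values, so the points in the observed-versus-predicted plot line up in horizontal bands.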


