Hi All,
In the script below, the importance measure in column 4 (i.e.
MeanDecreaseGini) shows "Inf" for V7.
Running getTree showed that V7 had been selected at least twice
in one of the trees of the random forest, so the "Inf" value was
not produced by dividing the sum of the decreases by 0.
Any suggestions on what may be causing the Inf for V7 would be
helpful.
Thanks in advance,
-Melanie
---------
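library(randomForest)

# Read the UCI credit-screening data; "?" marks missing values.
# stringsAsFactors = TRUE keeps text columns (including the class V16)
# as factors, which is no longer the default from R 4.0.0 onwards.
credit <- read.csv(url("ftp://ftp.ics.uci.edu/pub/machine-learning-databases/credit-screening/crx.data"),
                   header = FALSE, na.strings = "?", stringsAsFactors = TRUE)
# Fit the forest with importance measures, dropping rows that contain NAs
credit.rf <- randomForest(V16 ~ ., credit, importance = TRUE,
                          do.trace = 100, na.action = na.omit)
imp <- round(importance(credit.rf), 2)
imp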
        -     + MeanDecreaseAccuracy MeanDecreaseGini
V1   0.00  0.00                 0.00             0.00
V2   0.75  0.25                 0.55            19.92
V3   0.41  0.57                 0.46            22.13
V4   0.39  0.33                 0.33             4.93
V5   0.26  0.24                 0.21             0.60
V6   0.39  0.50                 0.40           -46.21
V7   0.91  0.59                 0.71              Inf
V8   1.35  1.35                 1.06            37.15
V9   0.00  0.00                 0.00             0.00
V10  0.00  0.00                 0.00             0.00
V11  1.65  1.59                 1.23            49.16
V12  0.00  0.00                 0.00             0.00
V13 -0.11 -0.10                -0.10             0.21
V14  0.82  0.57                 0.66            20.71
V15  1.36  1.02                 1.01            33.47
getTree(credit.rf, 1)
      left daughter right daughter split var split point status prediction
 [1,]             2              3        15    492.0000      1          0
 [2,]             4              5        11      2.5000      1          0
 [3,]             6              7         2     38.5000      1          0
 [4,]             8              9        14     83.0000      1          0
 [5,]            10             11         7    207.0000      1          0
 [6,]            12             13        11      0.5000      1          0
 [7,]             0              0         0      0.0000     -1          2
 [8,]            14             15         7    117.0000      1          0
 [9,]            16             17         8      3.0625      1          0
[10,]            18             19         3      0.2700      1          0
[11,]             0              0         0      0.0000     -1          2
[12,]            20             21        15   4753.0000      1          0
[13,]            22             23         2     37.0850      1          0
[14,]            24             25        14      8.5000      1          0
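
One way to cross-check this over the whole forest rather than a single tree is sketched below. It is only a sketch under stated assumptions: the built-in iris data stands in for the credit data so that it runs without the download, the names rf, gini, suspect and used are made up for illustration, and it assumes a recent randomForest release in which importance() accepts type = 2 and varUsed() is available.

```r
library(randomForest)

set.seed(1)  # make the illustrative forest reproducible

# Assumption: iris is only a stand-in for the credit data
rf <- randomForest(Species ~ ., data = iris, importance = TRUE, ntree = 50)

# type = 2 selects the node-impurity (Gini) measure; take it as a named vector
gini <- importance(rf, type = 2)[, 1]

# Flag predictors whose Gini importance is non-finite or negative
suspect <- names(gini)[!is.finite(gini) | gini < 0]
print(suspect)

# Count how often each predictor is used as a split variable over all trees
used <- varUsed(rf, by.tree = FALSE, count = TRUE)
names(used) <- names(gini)
print(used)
```

With the credit data, credit.rf would take the place of rf. A predictor that shows up in suspect while having a non-zero count in used would confirm that the odd value does not come from the variable never being chosen for a split.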
That result looks fishy: not only should there be no Inf, but there
should be no negative values in that measure either (look at V6).
I will look into it.

I hope by now you realize that there's not much point in asking such
package-specific questions on R-help. Not all package maintainers are
on R-help, and they are the best people to ask package-specific
questions or to send bug reports to.

Andy

> From: Melanie Vida
>
> In the script below, the importance measure for column 4 (ie
> MeanDecreaseGini) indicated "Inf" for V7.
> [...]
> Any suggestions on what may be causing the Inf in "V7" would
> be helpful?