thr3ads.net - R help - [R] Random Forest Bug [Oct 2008]


Roeder Jens (CR/AEM5)

2008-Oct-28 17:17 UTC

[R] Random Forest Bug

Dear help list,

I think I found a bug in the R randomForest package. I hope you are able to
reproduce it.
I use R version 2.7.2 and randomForest version 4.5-27.
Here is minimal code that reproduces the problem:

library(randomForest)
tries <- 20
dimension <- 20
n <- 200
outlyingness <- rep(NaN, tries)
for (o_number in 1:tries) {
  # Generate features: n uncorrelated, normally distributed points
  features <- matrix(rnorm(n * dimension, 0, 1), n, dimension)
  # Compute a random forest (no response given) including the proximity matrix
  outlier.rf <- randomForest(features, ntree = 100, proximity = TRUE)
  # Compute the mean proximity for each of the n points
  outlyingness_all <- apply(outlier.rf$proximity, 2, mean)
  # Rank of a certain point (here point 1) according to the outlyingness:
  # 1 + number of points with a strictly larger mean proximity
  better <- sum(outlyingness_all[1] < outlyingness_all)
  outlyingness[o_number] <- 1 + better
}
outlyingness
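To make the ranking statistic in the loop explicit, here is a minimal base-R sketch that does not need randomForest. The helper names rank_by_mean_proximity and ranks_by_mean_proximity and the 3 x 3 proximity matrix are hypothetical, chosen only for illustration. The sketch assumes that 1 + sum(m[i] < m) is meant as a rank in decreasing order of mean proximity, which is what rank(-m, ties.method = "min") computes for all points at once.

```r
# Rank of point i by mean proximity, as in the loop:
# 1 + number of points whose mean proximity is strictly larger.
# (rank_by_mean_proximity is a hypothetical helper name.)
rank_by_mean_proximity <- function(prox, i) {
  m <- colMeans(prox)  # same as apply(prox, 2, mean)
  1 + sum(m[i] < m)
}

# Vectorised version for all points; ties.method = "min" reproduces
# the "1 + count of strictly larger values" rule.
# (ranks_by_mean_proximity is also a hypothetical helper name.)
ranks_by_mean_proximity <- function(prox) {
  rank(-colMeans(prox), ties.method = "min")
}

# Hypothetical symmetric proximity matrix with ones on the diagonal
prox <- matrix(c(1.0, 0.2, 0.1,
                 0.2, 1.0, 0.6,
                 0.1, 0.6, 1.0), nrow = 3, byrow = TRUE)

ranks_by_mean_proximity(prox)
sapply(1:3, function(i) rank_by_mean_proximity(prox, i))
```

In this toy matrix point 1 has the smallest mean proximity and therefore gets rank 3 = n, the same pattern described for point 1 in this post (rank 200 for n = 200). Computing the ranks of all points in one run is a cheap way to check whether point 1 is singled out within a single forest.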


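Point number 1 plays a special role in the results of this code fragment.
A typical value of "outlyingness" is
200 200 200 200 196 200 200 200 200 200 200 200 200 200 200 200 199 200
200 200
i.e. point 1 nearly always has the lowest mean proximity of all 200 points
(rank 200 means that all other 199 points have a larger mean proximity),
whereas for any other point one obtains what one would expect. So, if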
better <- sum(outlyingness_all[1]<outlyingness_all) 
is for example replaced by
better <- sum(outlyingness_all[17]<outlyingness_all) 
one gets
194   7 184  76  25  40 175 174 137  75  49 146 175 150 148 118 100  88
121 14
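Why the first result looks wrong can be made precise with a short calculation. This is a sketch under stated assumptions, not part of the original report: it assumes the n points are i.i.d., that the forest treats all rows symmetrically (so the mean proximities are exchangeable), and it ignores ties. Here P is the proximity matrix and R_1 is the rank computed in the loop.

```latex
% Assumptions: i.i.d. points, rows treated symmetrically, ties ignored.
\bar{p}_j = \frac{1}{n}\sum_{k=1}^{n} P_{kj}, \qquad
R_1 = 1 + \sum_{j=1}^{n} \mathbf{1}\{\bar{p}_1 < \bar{p}_j\}
\\
\Pr(R_1 = r) = \frac{1}{n}, \quad r = 1,\dots,n, \qquad
\mathbb{E}[R_1] = \frac{n+1}{2} = 100.5 \quad (n = 200)
\\
\Pr(R_1 \ge 196) = \frac{5}{200} = \frac{1}{40}, \qquad
\Pr(\text{all 20 tries have } R_1 \ge 196) = 40^{-20} \approx 9.1 \times 10^{-33}
```

The ranks obtained for point 17 look compatible with such a uniform distribution, while the ranks for point 1 do not.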

Is this a bug, or am I confused?
Can anybody help me? Does anybody know this problem?

Best regards

Jens Roeder




