Kilian
2012-Feb-01 10:38 UTC
[R] randomForest: proximity for new objects using an existing rf
An embedded text with an undefined character set was scrubbed. Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120201/cc22025d/attachment.pl>
Liaw, Andy
2012-Feb-01 15:39 UTC
[R] randomForest: proximity for new objects using an existing rf
There's an alternative, but it may not be any more efficient in time or memory...

You can run predict() on the training set once, setting nodes=TRUE. That will give you an n by ntree matrix recording which terminal node of each tree each data point falls in. For any new data, you would run predict() with nodes=TRUE as well, then compute the proximity "by hand" by counting how often any given pair landed in the same terminal node of each tree.

Andy

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Kilian
> Sent: Wednesday, February 01, 2012 5:39 AM
> To: r-help at r-project.org
> Subject: [R] randomForest: proximity for new objects using an existing rf
>
> Dear all,
>
> using an existing random forest, I would like to calculate the proximity
> for a new test object, i.e. the similarity between the new object and the
> old training objects which were used for building the random forest. I do
> not want to build a new random forest based on both old and new objects.
>
> Currently, my workaround is to calculate the proximities of a combined data
> set consisting of training and new objects like this:
>
> model <- randomForest(Xtrain, Ytrain)  # build random forest
> nnew <- nrow(Xnew)  # number of new objects
> Xcombi <- rbind(Xnew, Xtrain)  # combine new objects and training objects
> predcombi <- predict(model, Xcombi, proximity=TRUE)  # calculate proximities
> proxcombi <- predcombi$proximity  # get proximities of combined dataset
> proxnew <- proxcombi[(1:nnew),-(1:nnew)]  # get proximities of new objects only
>
> But this approach causes a lot of wasted computation time as I am not
> interested in the proximities among the training objects themselves but
> only among the training objects and the new objects. With 1000 training
> objects and 5 new objects, I have to calculate a 1005x1005 proximity matrix
> to get the essential 5x1000 matrix of the new objects only.
>
> Am I doing something wrong? I read through the documentation but could not
> find another solution. Any advice would be highly appreciated.
>
> Thanks in advance!
> Kilian
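As a rough sketch of the "by hand" computation Andy describes in his reply: given the two terminal-node matrices, count for each (new, training) pair how many trees put both objects in the same terminal node. The function name cross_proximity is made up for illustration. Dividing by ntree is an assumption about how the package normalizes its proximities, so it is worth comparing against the proximity=TRUE output on a small data set. As far as I know, predict() with nodes=TRUE returns the node matrix in the "nodes" attribute of its result, which is what the commented usage lines rely on.

```r
# Proximity between new and training objects from terminal-node matrices.
#   nodes_new:   n_new   x ntree matrix of terminal node ids
#   nodes_train: n_train x ntree matrix of terminal node ids
# Returns an n_new x n_train matrix holding the fraction of trees in which
# a new object and a training object share a terminal node.
cross_proximity <- function(nodes_new, nodes_train) {
  stopifnot(ncol(nodes_new) == ncol(nodes_train))
  ntree <- ncol(nodes_train)
  prox <- matrix(0, nrow(nodes_new), nrow(nodes_train))
  for (t in seq_len(ntree)) {
    # outer() compares every new object with every training object in tree t
    prox <- prox + outer(nodes_new[, t], nodes_train[, t], "==")
  }
  prox / ntree  # assumed normalization: share of trees with a common node
}

# Toy check with hand-made node ids (3 training objects, 1 new object, 2 trees)
nodes_train <- matrix(c(1, 2, 2,    # tree 1
                        3, 3, 4),   # tree 2
                      nrow = 3)
nodes_new <- matrix(c(2, 3), nrow = 1)
print(cross_proximity(nodes_new, nodes_train))

# Usage with the objects from the question (model, Xtrain, Xnew), untested here:
# nodes_train <- attr(predict(model, Xtrain, nodes = TRUE), "nodes")  # run once
# nodes_new   <- attr(predict(model, Xnew,   nodes = TRUE), "nodes")
# proxnew     <- cross_proximity(nodes_new, nodes_train)
```

This only ever allocates an n_new by n_train matrix (5 x 1000 in the example from the question) instead of the full 1005 x 1005 one, and the training node matrix can be computed once and reused for later batches of new objects. Whether it is actually faster than the combined predict() call would need to be measured.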