Hello,

I want to use randomForest to classify a matrix of 331030 x 42, where the last column is the class label. I used:

Memebers.rf <- randomForest(class ~ ., data = Memebers, proximity = TRUE, mtry = 6, ntree = 200)

which told me "the error is matrix(0,n,n) set too elements". Then I used:

Memebers.rf <- randomForest(class ~ ., data = Memebers, importance = TRUE, proximity = TRUE)

which told me "the error is na.fail.default(list(class = c(17L, 17L, 17L, 29L, 29L, 29L, : missing values in object".

What is wrong with it? Thanks a lot.

wanghong
wanghong at neusoft.edu.cn
2008-12-26
wanghong wrote:
> hello,
> I want to use randomForest to classify a matrix which is 331030 x 42, the last column is class signal. I use:
> Memebers.rf<-randomForest(class~.,data=Memebers,proximity=TRUE,mtry=6,ntree=200)
> which told me "the error is matrix(0,n,n) set too elements"

I doubt "the error is matrix(0,n,n) set too elements" is really an error message from randomForest. I would rather expect

Error in matrix(0, n, n) : too many elements specified

which tells us that randomForest cannot deal with such a huge *data.frame* (rather than a matrix, I guess). Finally, how much RAM do you think will be required to store 200 trees grown with default settings on such a huge data.frame? Without having done any calculations, I doubt it will fit on your whole HDD, and it will certainly never fit in your RAM.

> then I use:
> Memebers.rf<-randomForest(class~.,data=Memebers,importance=TRUE,proximity=TRUE)
> which told me "the error is na.fail.default(list(class = c(17L, 17L, 17L, 29L, 29L, 29L, :
> missing values in object"

Missing values?

Uwe Ligges

> what's wrong with it. Thanks a lot
>
> wanghong
> wanghong at neusoft.edu.cn
> 2008-12-26
> ______________________________________________
> R-help at r-project.org mailing list
> stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
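The size argument in the reply above can be checked with a quick calculation. The sketch below assumes that the matrix(0, n, n) call comes from the n x n proximity matrix requested by proximity=TRUE, that each element is an 8-byte double, and that the R version in use limits a vector to 2^31 - 1 elements (as R did at the time of this thread). It uses base R only; the variable names are illustrative.

```r
# Back-of-the-envelope size of an n x n proximity matrix.
n <- 331030                       # number of rows reported in the original post

cells <- n^2                      # elements requested by matrix(0, n, n)
gib   <- cells * 8 / 2^30         # assuming 8 bytes per double

# Assumption: vectors are limited to 2^31 - 1 elements, which is what
# triggers "too many elements specified" before any memory is allocated.
exceeds_limit <- cells > .Machine$integer.max

cat(sprintf("cells: %.3g  size: %.0f GiB  over element limit: %s\n",
            cells, gib, exceeds_limit))

# The same calculation for a sample of 5000 rows, for comparison.
n_small   <- 5000
mib_small <- n_small^2 * 8 / 2^20
cat(sprintf("sample of %d rows: %.0f MiB\n", n_small, mib_small))
```

On these assumptions the full proximity matrix would need several hundred GiB, while a sample of a few thousand rows stays within a few hundred MiB.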
Hi Wanghong,

Unless you have a huge Linux box, you will need to sample down your 300k rows to a few thousand. In marketing apps, I often have data sets of comparable size. I would suggest you start with just a few thousand rows to make sure everything else is working as you wish.

Also, study Andy's randomForest docs carefully, including the R News article from a couple of years ago. In particular:

1) The formula interface is a memory hog. Andy suggests passing the predictors and the response explicitly. In your case, something like randomForest(x = Memebers[-42], y = Memebers[[42]], ...

2) The proximity matrix is also memory and time intensive. I suggest proximity = FALSE until the other things are sorted out.

HTH,
Jim Porzak
TGN.com
San Francisco, CA
linkedin.com/in/jimporzak
useR Group SF: ia.meetup.com/67
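The advice in this thread could be put together roughly as follows. This is only a sketch under stated assumptions: the data frame built here is a synthetic stand-in for the poster's Memebers data (which is not available), the sample size of 2000 and the injected missing values are arbitrary, and dropping incomplete rows is just one possible way to deal with the "missing values in object" error. The response is converted to a factor on the assumption that classification, not regression, is wanted.

```r
# Hypothetical stand-in for the poster's data: 41 predictors plus an
# integer class code in column 42.
set.seed(1)
n <- 20000
Memebers <- as.data.frame(matrix(rnorm(n * 41), ncol = 41))
Memebers$class <- sample(c(17L, 29L), n, replace = TRUE)
Memebers[sample(n, 50), 3] <- NA          # inject some missing values

# 1) See which columns contain missing values before fitting.
na_per_column <- colSums(is.na(Memebers))

# 2) Keep complete rows only (one option among several), then
#    sample down to a few thousand rows.
complete <- Memebers[complete.cases(Memebers), ]
small    <- complete[sample(nrow(complete), 2000), ]

# 3) Use the x/y interface instead of the formula interface.
x <- small[-42]                           # predictors
y <- factor(small[[42]])                  # response as a factor

# Fit only if the randomForest package is installed.
if (requireNamespace("randomForest", quietly = TRUE)) {
  fit <- randomForest::randomForest(x = x, y = y, ntree = 200, mtry = 6,
                                    proximity = FALSE)
  print(fit)
}
```

Once this works on a small sample, the sample size can be increased step by step while watching memory use.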