kevin123
2012-Feb-25 23:24 UTC
[R] How to deal with missing values when using Random Forrest
I am using the randomForest package to train and test a model. I aim to predict LengthOfStay.days:

> library(randomForest)
> model <- randomForest( LengthOfStay.days~.,data = training,
+ importance=TRUE,
+ keep.forest=TRUE
+ )

*This is a small portion of the data frame:*

*data(training)*

   LengthOfStay.days CharlsonIndex.numeric DSFS.months
1                  0                   0.0         8.5
6                  0                   0.0         3.5
7                  0                   0.0         0.5
8                  0                   0.0         0.5
9                  0                   0.0         1.5
11                 0                   1.5         NaN

*Error message*

Error in na.fail.default(list(LengthOfStay.days = c(0, 0, 0, 0, 0, 0, :
  missing values in object

I would greatly appreciate any help.

Thanks,
Kevin

--
View this message in context: http://r.789695.n4.nabble.com/How-to-deal-with-missing-values-when-using-Random-Forrest-tp4421254p4421254.html
Sent from the R help mailing list archive at Nabble.com.
David Winsemius
2012-Feb-26 01:26 UTC
[R] How to deal with missing values when using Random Forrest
On Feb 25, 2012, at 6:24 PM, kevin123 wrote:

> *Error message*
>
> Error in na.fail.default(list(LengthOfStay.days = c(0, 0, 0, 0, 0, 0, :
>   missing values in object

What part of that error message is unclear? Have you looked at the randomForest help page? It tells you that the default na.action is na.fail.

> I would greatly appreciate any help

It would seem that the way forward is to remove the cases with missing values or to impute values.

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
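For readers following the thread, the first suggestion (removing the cases with missing values) might look roughly like the sketch below. The toy `training` data frame is invented for illustration and only borrows the column names from the original post. `complete.cases()` and `na.omit` are base R, and `na.action` is a documented argument of the formula interface of `randomForest()`.

```r
library(randomForest)

# Hypothetical stand-in for the poster's `training` data frame;
# the values are made up, only the column names come from the post.
set.seed(1)
training <- data.frame(
  LengthOfStay.days     = rpois(50, 3),
  CharlsonIndex.numeric = rep(c(0, 0, 1.5, 0, 3.5), 10),
  DSFS.months           = rep(c(0.5, 1.5, 3.5, 8.5, NaN), 10)
)

# See which columns contain missing values.
# is.na() is TRUE for both NA and NaN.
colSums(is.na(training))

# Option A: drop the incomplete rows before fitting.
complete <- training[complete.cases(training), ]

# Option B: let the formula interface drop them through na.action.
model <- randomForest(LengthOfStay.days ~ ., data = training,
                      na.action = na.omit,
                      importance = TRUE,
                      keep.forest = TRUE)
```

Dropping rows is the simplest fix, but it discards every case that has even one missing predictor, so it is worth checking how many rows remain before settling on it.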
Weidong Gu
2012-Feb-26 23:10 UTC
[R] How to deal with missing values when using Random Forrest
Hi,

You can set na.action=na.roughfix, which fills NAs with the median (numeric variables) or the mode (factors) of the variable that has missing values. The other option is to impute the missing values using rfImpute, then run randomForest on the complete data set.

Weidong Gu
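A minimal sketch of both options, using an invented `training` data frame that only reuses the column names from the original post. According to the randomForest documentation, `na.roughfix` replaces missing numeric values with the column median and missing factor values with the most frequent level, while `rfImpute` starts from that rough fix and refines it using proximities from a fitted forest. `rfImpute` requires a response without missing values.

```r
library(randomForest)

# Hypothetical stand-in for the poster's `training` data frame;
# the values are made up, only the column names come from the post.
set.seed(42)
training <- data.frame(
  LengthOfStay.days     = rpois(50, 3),
  CharlsonIndex.numeric = rep(c(0, 0, 1.5, 0, 3.5), 10),
  DSFS.months           = rep(c(0.5, 1.5, 3.5, 8.5, NaN), 10)
)

# Option 1: quick fix applied while fitting.
model_rough <- randomForest(LengthOfStay.days ~ ., data = training,
                            na.action = na.roughfix,
                            importance = TRUE,
                            keep.forest = TRUE)

# The same fix can be applied directly to inspect the filled-in values.
fixed <- na.roughfix(training)

# Option 2: proximity-based imputation, then fit on the completed data.
# The response is returned as the first column of the result.
imputed <- rfImpute(LengthOfStay.days ~ ., data = training)
model_imp <- randomForest(LengthOfStay.days ~ ., data = imputed,
                          importance = TRUE,
                          keep.forest = TRUE)
```

The rough fix is cheap but ignores relationships between variables. `rfImpute` costs several extra forest fits (its `iter` and `ntree` arguments control how many) in exchange for imputed values that depend on similar cases.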