Hi Jan-Paul,
You definitely want to be careful with na.omit in randomForest -- that
wipes out any row with even one NA. If NAs are sprawled throughout your
dataset, na.omit might end up killing a lot of rows. Here's my usual MO
for missing values:
1) "impute" in Hmisc fills in gaps with the mean, median, most common
value, etc.
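2) rfImpute (in randomForest): starts from a rough median/mode fill,
then uses the forest's proximities to refine the imputed values over a
few iterations. The response itself must have no NAs.
3) aregImpute (in Hmisc): multiple imputation based on additive
regression, bootstrapping, and predictive mean matching, rather than a
forest.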
4) You may want to consider using a single tree ("rpart" package) in
this case instead of a forest. Single trees deal with missing values
cleanly through surrogate splits.
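Here is a rough sketch of how the row loss from na.omit compares with
the randomForest and rpart routes. It is not run on your data: iris with
NAs punched in at random is only a stand-in, and the 15-per-column
missingness, iter and ntree settings are arbitrary choices of mine.

```r
library(randomForest)  # randomForest, rfImpute, na.roughfix
library(rpart)         # single tree with surrogate splits

set.seed(1)
dat <- iris
# Made-up missingness: blank out 15 random cells in each predictor column
for (v in names(dat)[1:4]) {
  dat[sample(nrow(dat), 15), v] <- NA
}

# How many rows would na.omit keep? Any row with a single NA is dropped.
n_complete <- sum(complete.cases(dat))
cat("rows kept by na.omit:", n_complete, "of", nrow(dat), "\n")

# Quick fix: median/mode fill done by na.roughfix while fitting
rf_rough <- randomForest(Species ~ ., data = dat, na.action = na.roughfix)

# rfImpute: proximity-based imputation (response must be complete),
# then fit a forest on the filled-in data frame
dat_imp <- rfImpute(Species ~ ., data = dat, iter = 5, ntree = 300)
rf_imp  <- randomForest(Species ~ ., data = dat_imp)

# Single tree: rows with some missing predictors are kept and routed
# through surrogate splits
tree <- rpart(Species ~ ., data = dat)
```

Comparing nrow(dat) with n_complete before choosing gives a quick feel
for how much na.omit would cost you.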
Good luck!
Kevin
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Uwe Ligges
Sent: Sunday, September 11, 2005 3:44 AM
To: Jan-Paul Roodbol
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] [handling] Missing [values in randomForest]
Jan-Paul Roodbol wrote:
> Does anyone know if randomForest in R can handle
> dataset with missings?
See ?randomForest, you can omit observations including NAs by specifying
na.action=na.omit
Please do not cross-post!
Please specify a sensible subject!
Uwe Ligges
> Thank you
>
> Kind regards
>
> Jan-Paul
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html