Eleni Rapsomaniki
2006-Oct-30 08:18 UTC
[R] how to combine imputed data-sets from mice for classification
Dear R users,

I want to combine multiply imputed data sets generated by mice to do classification. However, I have several questions about using the mice library. For example, suppose I want to predict the class in this data.frame:

    data(nhanes)
    mydf=nhanes
    mydf$class="pos"
    mydf$class[sample(1:nrow(mydf), size=0.5*nrow(mydf))]="neg"
    mydf$class=factor(mydf$class)

First I impute:

    imp=mice(mydf)

I want to use randomForest for my analysis, not the built-in glm.mids function. A previous post (http://tolstoy.newcastle.edu.au/R/help/06/03/22295.html) suggested replacing the call to (g)lm.mids with whatever analysis one needs to perform:

    analyses <- as.list(1:data$m)
    for (i in 1:data$m) {
      data.i <- complete(data, i)
      analyses[[i]] <- lm(formula, data = data.i, ...)
    }

Is the idea that I should then just combine the results (predictions) of randomForest from all 5 data sets? In that case, what does the pool function do? Do I need to use it?

Also, if I use glm.mids for my predictions, I get an error:

    > imp.fit=glm.mids(class ~., data=imp)
    Error: NA/NaN/Inf in foreign function call (arg 4)
    In addition: Warning messages:
    1: - not meaningful for factors in: Ops.factor(y, mu)
    2: - not meaningful for factors in: Ops.factor(eta, offset)
    3: - not meaningful for factors in: Ops.factor(y, mu)

But this works:

    > imp.fit=glm.mids((class=="pos") ~., data=imp)

In this case I don't know how to interpret the result.

I would appreciate any suggestions on these.

Many thanks,
Eleni Rapsomaniki
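
For concreteness, here is a minimal sketch of one way the quoted loop could be adapted to randomForest. It fits one forest per completed data set and averages the out-of-bag class probabilities across the imputations. This is only one possible combination rule, not an established mice procedure. The seed, the printFlag = FALSE argument, the floor() around the sample size, and the names probs, avg.prob and pred are assumptions added for illustration. As far as I understand, pool() combines parameter estimates and their variances (Rubin's rules), so it would not apply directly to a model like randomForest that has no coefficients.

```r
library(mice)          # mice(), complete()
library(randomForest)  # randomForest()

set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible

# Same toy data as in the post: nhanes plus a random two-level class
data(nhanes)
mydf <- nhanes
mydf$class <- "pos"
mydf$class[sample(1:nrow(mydf), size = floor(0.5 * nrow(mydf)))] <- "neg"
mydf$class <- factor(mydf$class)

# Multiple imputation (default m = 5); printFlag = FALSE silences the log
imp <- mice(mydf, printFlag = FALSE)

# One forest per completed data set; keep the out-of-bag class probabilities
probs <- vector("list", imp$m)
for (i in seq_len(imp$m)) {
  data.i <- mice::complete(imp, i)
  rf.i <- randomForest(class ~ ., data = data.i)
  probs[[i]] <- predict(rf.i, type = "prob")  # no newdata: OOB vote fractions
}

# Hypothetical combination rule: average the probabilities over imputations,
# then pick the class with the highest averaged probability for each row
avg.prob <- Reduce(`+`, probs) / imp$m
pred <- factor(colnames(avg.prob)[max.col(avg.prob)],
               levels = levels(mydf$class))

table(observed = mydf$class, predicted = pred)
```

Because the class labels here are assigned at random, the resulting table should show roughly chance-level agreement; the point is only the mechanics of looping over complete(imp, i) and combining the predictions afterwards.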