Dear all,
I am trying to impute missing data for a range of variables in my data set.
Unfortunately, most variables have missing values, and some have quite a few.
I set up the predictor matrix to exclude certain variables (setting the
relevant elements to zero) and then run the imputation. This works fine if
I use predictive mean matching (pmm) for the continuous variables in the data set.
When I switch to "norm" instead of pmm, the results also generally look fine.
However, for one variable I get some huge out-of-range values (negative
imputations, although every observed value is positive). Here are summary
statistics before and after imputation:
> summary(aux$emitters) # original data
      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's
   0.00219   2.10200   7.33800  17.87000  23.15000 136.20000  52.00000
> summary(complete(imp2)$emitters) # imputation 1
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -68.920   2.062  10.000  19.980  32.980 136.200
> summary(complete(imp2,2)$emitters) # imputation 2 (looks better)
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -30.650   1.848   8.808  20.480  32.980 136.200
etc.
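To see why "norm" can produce negative values while pmm cannot, here is a simplified base-R sketch on simulated data (not the data above). The skewed, strictly positive outcome, the 50 missing cases and the choice of 5 donors are all assumptions for illustration. Unlike the real mice methods, this sketch does not draw the regression parameters from their posterior; it only contrasts the two imputation ideas.

```r
set.seed(42)
n <- 200
x <- runif(n, 0, 10)                        # hypothetical, fully observed predictor
y <- exp(0.3 * x + rnorm(n, sd = 0.8))      # right-skewed, strictly positive outcome
miss <- sample(n, 50)                       # make 50 values missing at random
obs <- setdiff(seq_len(n), miss)

fit <- lm(y ~ x, subset = obs)              # linear model fitted on observed cases only
pred_obs <- fitted(fit)                     # predictions for observed cases (same order as obs)
pred_mis <- predict(fit, newdata = data.frame(x = x[miss]))
sigma <- summary(fit)$sigma

# "norm"-style idea: prediction plus normal noise; nothing keeps the draw above zero
y_norm <- pred_mis + rnorm(length(miss), sd = sigma)

# pmm-style idea: for each missing case, find the 5 observed cases with the closest
# predicted value and borrow the *observed* y of one of them at random
y_pmm <- sapply(pred_mis, function(p) {
  donors <- order(abs(pred_obs - p))[1:5]
  y[obs][donors][sample.int(5, 1)]
})

range(y[obs])   # observed range
range(y_norm)   # may extend below zero
range(y_pmm)    # always inside the observed range
```

Because the pmm-style values are copies of observed values, they can never fall outside the observed range, whereas the normal draws are limited only by the fitted line and the residual spread.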
Now my question is: in such cases, would it be better to use pmm for this
variable, or should I use the squeeze() function in MICE instead? I have
read a paper explaining MICE:
http://www.stefvanbuuren.nl/publications/MICE%20in%20R%20-%20Draft.pdf, but
I am still unsure how to proceed. I would be really grateful for some
advice, thanks!
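Both options can be sketched with the mice package as follows. This uses the nhanes example data that ships with mice rather than the data above; excluding hyp as a predictor and the bounds c(100, 300) for chl are illustrative assumptions. The squeeze() call is placed in the post argument, which mice evaluates after each imputation of that variable.

```r
library(mice)  # also provides the nhanes example data

# Dry run (maxit = 0) to obtain the default method, predictor matrix and post vectors
ini  <- mice(nhanes, maxit = 0, printFlag = FALSE)
meth <- ini$method
pred <- ini$predictorMatrix
post <- ini$post

# Exclude a predictor by zeroing its column (here, hypothetically, hyp)
pred[, "hyp"] <- 0

# Option 1: keep "norm" for chl but clamp each imputed value into [100, 300]
meth["chl"] <- "norm"
post["chl"] <- "imp[[j]][, i] <- squeeze(imp[[j]][, i], c(100, 300))"
imp <- mice(nhanes, method = meth, predictorMatrix = pred, post = post,
            m = 5, seed = 123, printFlag = FALSE)
summary(complete(imp, 1)$chl)

# Option 2: predictive mean matching, which imputes only observed values
meth2 <- meth
meth2["chl"] <- "pmm"
imp2 <- mice(nhanes, method = meth2, predictorMatrix = pred,
             m = 5, seed = 123, printFlag = FALSE)
summary(complete(imp2, 1)$chl)
```

For a variable like emitters, the analogous bounds would be something like c(0, 136.2), assuming it cannot be negative. Note that squeezing many draws piles them up exactly at the bound, whereas pmm keeps the shape of the observed distribution.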
--
View this message in context:
http://r.789695.n4.nabble.com/Multiple-imputation-using-mice-tp4452986p4452986.html
Sent from the R help mailing list archive at Nabble.com.