Dear all,
I have a question about using multiple imputation followed by a PCA on the
imputed datasets. My dataset has missing values, so applying a PCA directly
would probably mean deleting some of the variables. I therefore ran multiple
imputation on the original dataset to fill in the missing values, using
first the EM algorithm and then the data augmentation (DA) algorithm in the
program NORM by Schafer. I chose to generate 5 imputed datasets. The
datasets are identical except for the imputed values; using 5 of them allows
the uncertainty about the missing values to be taken into account.
Now the question.
What would be the correct procedure for doing a PCA, or any other analysis,
so as to produce estimates and their standard errors? Following Schafer's
suggestion, I did a separate PCA on each of the 5 datasets, which gives a
set of loadings and scores for each one. The idea would then be to combine
these into average estimates of the loadings and scores, together with
their standard errors.
A quick look at the summaries and scree plots of the 5 analyses shows that
they are broadly the same, although the loadings and scores of course
differ from one dataset to the next. Is this the correct way to analyse
multiply imputed datasets with PCA? Or should the data from all 5 datasets
be averaged into one dataset, with only one PCA conducted?
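
To make the per-dataset approach concrete, here is a minimal sketch in R of
what I have in mind. It is only an illustration under assumptions: the data
are simulated stand-ins (not output from NORM), the names imputed, fits and
aligned are my own placeholders, and the step that flips component signs to
match the first analysis is my assumption about how the 5 sets of loadings
could be made comparable before averaging. It also assumes the components
come out in the same order in every analysis. The spread it reports is only
the between-imputation part; a full combined standard error would also need
a within-imputation variance, which princomp does not supply.

```r
set.seed(1)
n <- 50; p <- 4; m <- 5

# Simulated stand-in for one complete dataset with correlated columns
f <- rnorm(n)
base <- sapply(1:p, function(j) f + rnorm(n))
colnames(base) <- paste0("V", 1:p)

# Pretend about 10% of the cells were missing and were imputed m times
miss <- matrix(runif(n * p) < 0.1, n, p)
imputed <- lapply(1:m, function(i) {
  x <- base
  x[miss] <- x[miss] + rnorm(sum(miss), sd = 0.3)  # fake imputation noise
  x
})

# One PCA per imputed dataset
fits <- lapply(imputed, princomp, cor = TRUE)

# The sign of each component is arbitrary, so flip columns to agree
# with the first analysis before combining (an assumption of this sketch)
ref <- unclass(loadings(fits[[1]]))
aligned <- lapply(fits, function(fit) {
  L <- unclass(loadings(fit))
  s <- sign(colSums(L * ref))
  s[s == 0] <- 1
  list(loadings = sweep(L, 2, s, "*"),
       scores   = sweep(fit$scores, 2, s, "*"))
})

# Average the loadings and measure their between-imputation spread
L_arr  <- simplify2array(lapply(aligned, `[[`, "loadings"))
L_mean <- apply(L_arr, 1:2, mean)
L_between_sd <- sqrt((1 + 1 / m) * apply(L_arr, 1:2, var))

print(round(L_mean, 3))
print(round(L_between_sd, 3))
```

The scores could be averaged in the same way from the scores element of
each entry in aligned.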
Any comments regarding this would be helpful.
By the way, I used princomp. I did not manage to get any scores using prcomp.
Thanks.
Peter
--------------------
Peter Ho
Escola Superior de Biotecnologia
Universidade Católica Portuguesa
Rua Dr. António Bernardino de Almeida
4200 Porto
Portugal
Tel: ++351-2-5580043