Conrad Zygmont
2012-Aug-20 14:19 UTC
[R] Combining imputed datasets for analysis using Factor Analysis
Dear R users and developers,
I have a dataset containing 34 variables measured in a survey, which has
some missing items. I would like to conduct a factor analysis of these
data. I tested mi, Amelia, and missForest as alternative packages in
order to impute the missing data. I now have 5 separate datasets with
the variables I am interested in factor analysing. In my reading of the
package help files, various articles, and books I have come across a
number of suggestions for combining analyses (mostly regression or other
linear models) using Rubin's (1987) rules.
However, I am not sure how I should proceed in the case of factor
analysis. Should I calculate the covariance matrix or correlation matrix
for each imputed dataset, combine these estimates, and then perform a
factor analysis? Or should I conduct a FA of each complete imputed dataset
and then combine the results (say eigenvalues or fit statistics)? Could
anyone point me to literature (if possible, not overly technical) that
would guide me in this regard? Or provide an example of a script that
would help me achieve this?
Your assistance and time are much appreciated.
Kind Regards,
Conrad Zygmont
Psychology Department
Helderberg College
South Africa
Additional info:
R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows"
Running on Linux version 3.3.8-gentoo (root at PsychStat) (gcc version
4.5.3 (Gentoo 4.5.3-r2 p1.5, pie-0.4.7) )
Script for multiple imputation:
> var.info <- mi.info(LRN)
> var.info
> # Mark all 34 items (LRN1 to LRN34) as ordered-categorical
> item.types <- setNames(as.list(rep("ordered-categorical", 34)),
paste0("LRN", 1:34))
> var.info <- update(var.info, "type", item.types)
> prepared.data <- mi.preprocess(LRN, info = var.info)
> ImpLRN <- mi(prepared.data, n.imp = 5, n.iter = 50,
check.coef.convergence = TRUE, add.noise = noise.control(post.run.iter =
30))
> LRN.imputed <- mi.completed(ImpLRN)
> LRN.first <- mi.data.frame(ImpLRN, m=1)
> cov.mat <- polychoric(LRN.first,std.err=TRUE)
... and so on
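The two strategies raised in the question can be sketched in base R as follows. This is only an illustration under stated assumptions, not a recommendation from the thread: the toy data, the hypothetical loadings, and the crude resampling step that stands in for mi/Amelia/missForest output are all invented here, and Pearson correlations are used for simplicity (with ordinal items, the polychoric matrix from each completed dataset could be pooled in the same way). Option A averages the correlation matrices and fits one factor model; option B fits one model per imputed dataset, rotates each loading matrix towards a reference solution (orthogonal Procrustes), and averages the aligned loadings.

```r
set.seed(42)
n <- 300; p <- 6

# Toy two-factor data; these loadings are hypothetical, for illustration only
L <- rbind(c(.8, 0), c(.7, 0), c(.6, 0), c(0, .8), c(0, .7), c(0, .6))
scores <- matrix(rnorm(n * 2), n, 2)
noise  <- matrix(rnorm(n * p), n, p) %*% diag(sqrt(1 - rowSums(L^2)))
full   <- scores %*% t(L) + noise
colnames(full) <- paste0("V", 1:p)

# Stand-in for real multiple-imputation output: 10% of cells are treated as
# missing and filled by resampling observed values of the same variable.
# This is NOT a proper imputation model; it only yields 5 completed datasets.
miss <- matrix(runif(n * p) < 0.10, n, p)
imputed <- lapply(1:5, function(i) {
  d <- full
  for (j in seq_len(p)) {
    d[miss[, j], j] <- sample(full[!miss[, j], j], sum(miss[, j]),
                              replace = TRUE)
  }
  as.data.frame(d)
})

## Option A: pool the correlation matrices, then run one factor analysis
cor.list   <- lapply(imputed, cor)
pooled.cor <- Reduce(`+`, cor.list) / length(cor.list)
fa.pooled  <- factanal(covmat = pooled.cor, factors = 2, n.obs = n)

## Option B: one factor analysis per dataset, then combine the loadings
fits <- lapply(imputed, factanal, factors = 2)
ref  <- unclass(loadings(fits[[1]]))   # reference solution

# Orthogonal Procrustes rotation of A towards ref, so that factor order and
# sign agree across imputations before averaging
align <- function(A, ref) {
  s <- svd(crossprod(A, ref))
  A %*% s$u %*% t(s$v)
}
aligned      <- lapply(fits, function(f) align(unclass(loadings(f)), ref))
avg.loadings <- Reduce(`+`, aligned) / length(aligned)

print(round(unclass(loadings(fa.pooled)), 2))
print(round(avg.loadings, 2))
```

Averaging is only meaningful for quantities that are comparable across imputations: loadings need the alignment step because rotation, factor order, and sign are otherwise arbitrary. Fit statistics from option A should be read with care, since n.obs = n treats the pooled matrix as if it came from fully observed data.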
Jose Iparraguirre
2012-Aug-22 11:02 UTC
[R] Combining imputed datasets for analysis using Factor Analysis
Dear Conrad,
1) Have you tried the missMDA package? It imputes the missing values of a
dataset so that principal components analysis, multiple factor analysis,
etc. can be performed.
2) The EM algorithm can be used for FA even in the presence of missing values.
Two references in this regard:
a) Jamshidian, M. (1997). "An EM Algorithm for ML Factor Analysis with
Missing Data", Lecture Notes in Statistics, V. 120, 247-258.
b) Little, R. and Rubin, D. (1987). Statistical Analysis with Missing Data.
John Wiley & Sons. (Chapter 8)
3) A different approach to dealing with missing data in FA, based on rescaled
Bartlett-corrected statistics, is proposed by:
Yuan, K-H.; Marshall, L.; and Bentler, P. (2002). "A unified approach to
exploratory factor analysis with missing data, nonnormal data, and in the
presence of outliers", Psychometrika, V. 67, N. 1, 95-121.
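As a rough illustration of point 2, here is a base-R sketch of a two-stage variant: EM for the unstructured mean and covariance matrix under multivariate normality (the kind of EM described in Little and Rubin), followed by factanal on the EM covariance. It is not Jamshidian's algorithm, which applies EM to the factor model itself. The function name em_mvn, the toy data, and the choice n.obs = n are assumptions made for the example, and the approach treats the items as continuous, which is only an approximation for ordinal survey items.

```r
# EM estimates of mean and covariance for multivariate normal data with NAs
# (em_mvn is a hypothetical helper written for this sketch)
em_mvn <- function(X, max.iter = 200, tol = 1e-6) {
  X <- as.matrix(X); n <- nrow(X); p <- ncol(X)
  mu    <- colMeans(X, na.rm = TRUE)
  Sigma <- diag(apply(X, 2, var, na.rm = TRUE))
  for (iter in seq_len(max.iter)) {
    Xhat <- X                   # data with missing cells replaced (E-step)
    Csum <- matrix(0, p, p)     # summed conditional covariances of those cells
    for (i in seq_len(n)) {
      m <- is.na(X[i, ])
      if (!any(m)) next
      o <- !m
      if (!any(o)) {            # row with nothing observed
        Xhat[i, ] <- mu; Csum <- Csum + Sigma; next
      }
      # Regression of the missing on the observed variables
      B <- Sigma[m, o, drop = FALSE] %*% solve(Sigma[o, o, drop = FALSE])
      Xhat[i, m] <- mu[m] + B %*% (X[i, o] - mu[o])
      Csum[m, m] <- Csum[m, m] + Sigma[m, m, drop = FALSE] -
        B %*% Sigma[o, m, drop = FALSE]
    }
    # M-step: update mean and covariance from the expected sufficient statistics
    mu.new    <- colMeans(Xhat)
    Xc        <- sweep(Xhat, 2, mu.new)
    Sigma.new <- (crossprod(Xc) + Csum) / n
    delta <- max(abs(Sigma.new - Sigma), abs(mu.new - mu))
    mu <- mu.new; Sigma <- Sigma.new
    if (delta < tol) break
  }
  list(mu = mu, Sigma = Sigma, iterations = iter)
}

# Toy two-factor data with 10% of cells missing completely at random
# (loadings are hypothetical, for illustration only)
set.seed(1)
n <- 300; p <- 6
L <- rbind(c(.8, 0), c(.7, 0), c(.6, 0), c(0, .8), c(0, .7), c(0, .6))
full <- matrix(rnorm(n * 2), n, 2) %*% t(L) +
  matrix(rnorm(n * p), n, p) %*% diag(sqrt(1 - rowSums(L^2)))
colnames(full) <- paste0("V", 1:p)
X <- full
X[matrix(runif(n * p) < 0.10, n, p)] <- NA

em    <- em_mvn(X)
fa.em <- factanal(covmat = em$Sigma, factors = 2, n.obs = n)
print(round(unclass(loadings(fa.em)), 2))
```

Passing n.obs = n overstates the information in the data somewhat, because some cells were never observed, so the test statistic reported by factanal should be taken as approximate here.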
Hope this helps.
José
José Iparraguirre
Chief Economist
Age UK
T 020 303 31482
E Jose.Iparraguirre at ageuk.org.uk
Twitter @jose.iparraguirre at ageuk
Tavis House, 1- 6 Tavistock Square
London, WC1H 9NB
www.ageuk.org.uk | ageukblog.org.uk | @ageukcampaigns