thr3ads.net - R help - [R] EM for missing data [Jul 2012]


2012-Jul-21 11:55 UTC

[R] EM for missing data

Hi list,

I am wondering whether there is a way to use the EM algorithm to handle missing
data and get a completed dataset in R.

I usually do this in SPSS, because EM in SPSS kind of "fills in" estimated
values for the missing data, and the completed dataset can then be saved and
used for further analysis. But I have not found a way to get a completed
dataset like this in R or SAS. With Amelia or MICE, the dataset is imputed
several times, and the imputed datasets are not combined. I understand that
parameter estimates can still be obtained by combining the estimates from each
imputed dataset, but it would be more convenient to have a single combined
dataset for some analyses, for example, an ANOVA with IVs that have more than
two categories. In this case, the only way to get the main effect of the whole
IV is to estimate the parameters in a single dataset (as far as I know). If the
separate imputed datasets are used, the effects shown in the results are for
each category of the IV separately. I figure readers and reviewers would
sometimes like to see how big the effect of the whole IV is, rather than the
effect of each category of that IV.

This is one of the reasons I cannot fully move from SPSS to R. Any
suggestions?

Thank you very much.




ya

Tal Galili

2012-Jul-21 15:45 UTC


Hello Ya.

I am no expert, so I am eager to read suggestions from other people on the
mailing list. But here are a few pointers I am (somewhat) sure of:

You can try using this package:
http://cran.r-project.org/web/packages/imputation/imputation.pdf
and use something like kNNImpute. KNN imputation is not strictly an EM
method, but it does give you a filled-in dataset.
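For readers unfamiliar with the idea, here is a minimal Python/NumPy sketch of k-nearest-neighbour imputation. It is not the imputation package's kNNImpute (its arguments are not reproduced here); the name knn_impute, drawing donors only from fully observed rows, and measuring Euclidean distance on the row's observed columns are all assumptions made for illustration.

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill NaNs in X with the mean of the k nearest fully observed rows.

    Hypothetical helper: distances use only the columns observed in the
    row being imputed, and donors are restricted to rows with no NaNs.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    complete = X[~np.isnan(X).any(axis=1)]  # donor pool
    if len(complete) < k:
        raise ValueError("need at least k complete rows to act as donors")
    for i, row in enumerate(X):
        miss = np.isnan(row)
        if not miss.any():
            continue
        obs = ~miss
        # Euclidean distance to every donor on the observed coordinates
        d = np.sqrt(((complete[:, obs] - row[obs]) ** 2).sum(axis=1))
        nearest = complete[np.argsort(d)[:k]]
        # Each missing cell gets the average of the donors' values
        filled[i, miss] = nearest[:, miss].mean(axis=0)
    return filled
```

For example, with rows [1.0, 2.0], [1.1, NaN], [5.0, 10.0], [0.9, 1.8], [5.2, 10.4] and k=2, the missing cell is filled with 1.9, the mean of the two donors closest on the first column. Note that this is still a single imputation: the filled value carries no random error.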

In any event, an imputation based on EM also rests on some assumption about
the underlying distribution of the data (observed and missing). From what I
see here:
http://www.youtube.com/watch?v=xEkJxl6mmQ0
it seems that SPSS's EM assumes a (multivariate?) normal distribution of the
data, which is a stronger assumption than what KNN uses. Also, the function I
linked to has a CV option to check how stable the imputation process is.
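The idea behind such a stability check can be sketched by hiding some observed cells, imputing them, and measuring the error on the hidden cells over several random maskings. The Python below is a minimal sketch, not the package's CV option: masked_rmse and the stand-in imputer mean_impute are hypothetical names, and any function that maps an array with NaNs to a filled array can be passed as impute.

```python
import numpy as np

def mean_impute(X):
    """Stand-in imputer (hypothetical): replace each NaN by its column mean."""
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

def masked_rmse(X, impute, frac=0.1, n_reps=5, seed=0):
    """Hide a random fraction of the observed cells, impute them, and
    return the RMSE on the hidden cells for each repetition."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    obs_idx = np.argwhere(~np.isnan(X))  # (row, col) of observed cells
    n_hide = max(1, int(frac * len(obs_idx)))
    scores = []
    for _ in range(n_reps):
        pick = obs_idx[rng.choice(len(obs_idx), size=n_hide, replace=False)]
        X_masked = X.copy()
        X_masked[pick[:, 0], pick[:, 1]] = np.nan
        X_filled = impute(X_masked)
        err = X_filled[pick[:, 0], pick[:, 1]] - X[pick[:, 0], pick[:, 1]]
        scores.append(float(np.sqrt(np.mean(err ** 2))))
    return scores
```

A small spread of the returned scores across repetitions suggests the imputation is stable for that data; comparing imputers on the same maskings (same seed) gives a rough accuracy comparison.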

If you are looking for more options, just google R+imputation. There are
numerous packages and functions for this.

Good luck,
Tal










Greg Snow

2012-Jul-21 20:35 UTC


The EM algorithm does not impute missing data, rather it estimates
parameters when you have missing data (those parameters can then be
used to impute the missing values, but that is separate from the EM
algorithm).
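To illustrate the distinction, here is a minimal Python/NumPy sketch of EM for a multivariate normal sample with values missing at random. The normality and MAR assumptions and the name em_mvn are ours; this is not SPSS's documented implementation. The E-step computes each row's conditional mean of the missing entries given the observed ones and accumulates their conditional covariance; the M-step updates the mean vector and the (maximum-likelihood, divide-by-n) covariance matrix. What comes out is the parameter estimates, not a completed dataset.

```python
import numpy as np

def em_mvn(X, n_iter=200, tol=1e-8):
    """EM estimates of the mean vector and covariance matrix of a
    multivariate normal sample whose missing cells are NaN (assumed MAR).
    Only the parameter estimates are returned, not a completed dataset."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    # Starting values: available-case means and a diagonal covariance
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        x_hat = np.where(miss, 0.0, X)  # expected value of every cell
        extra = np.zeros((p, p))        # summed conditional covariances
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            s_mo = sigma[np.ix_(m, o)]
            if o.any():
                # Regression coefficients of missing on observed variables
                beta = np.linalg.solve(sigma[np.ix_(o, o)], s_mo.T).T
            else:
                beta = np.zeros((m.sum(), 0))
            # E-step: conditional mean and covariance of the missing part
            x_hat[i, m] = mu[m] + beta @ (X[i, o] - mu[o])
            extra[np.ix_(m, m)] += sigma[np.ix_(m, m)] - beta @ s_mo.T
        # M-step: update parameters from the expected sufficient statistics
        new_mu = x_hat.mean(axis=0)
        centered = x_hat - new_mu
        new_sigma = (centered.T @ centered + extra) / n
        done = (np.abs(new_mu - mu).max() < tol
                and np.abs(new_sigma - sigma).max() < tol)
        mu, sigma = new_mu, new_sigma
        if done:
            break
    return mu, sigma
```

With no missing cells this reduces to the sample mean and the divide-by-n covariance. Writing the conditional means from the final E-step back into the data would be a separate imputation step, and it would produce a single dataset whose filled-in values carry no random error.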

If you create a dataset that has missing values imputed (a single
time) and then analyze that dataset as if there were no missing data,
then your results will be wrong. The better approach is multiple
imputation (there are packages, including MICE, to do this), where
more than one new dataset is imputed (including error on the imputed
missing values), then each of the imputed datasets is analyzed (don't
look at the results yet, they are each still wrong), and then the
analyses are combined to give a correct answer (well, as correct as any
statistical procedure is; approximate is probably the better term).
This, of course, assumes that your assumptions are reasonable.
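The combining step can be sketched with Rubin's rules for a single scalar parameter: average the m point estimates, and add the average within-imputation variance to (1 + 1/m) times the between-imputation variance. The Python below is a minimal sketch with a hypothetical function name, pool_scalar, and it uses Rubin's original large-sample degrees of freedom; packages may apply a small-sample adjustment instead. In R, mice's pool() applies these rules to fitted models. An omnibus test for a multi-category IV, as asked about in the original question, needs a multiparameter pooled test that this scalar version does not cover.

```python
import numpy as np

def pool_scalar(estimates, variances):
    """Combine m per-imputation results for one parameter (Rubin's rules).

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    if m < 2:
        raise ValueError("need at least two imputations")
    q_bar = q.mean()               # pooled point estimate
    u_bar = u.mean()               # average within-imputation variance
    b = q.var(ddof=1)              # between-imputation variance
    t = u_bar + (1 + 1 / m) * b    # total variance
    # Rubin's (1987) large-sample degrees of freedom
    if b == 0:
        df = float("inf")
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df
```

For example, estimates [1.0, 2.0, 3.0] with squared standard errors [0.5, 0.5, 0.5] pool to an estimate of 2.0 with total variance of about 1.833 and about 3.78 degrees of freedom. Analyzing any one imputed dataset alone would report a variance of 0.5, which shows how a single imputation understates the uncertainty.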

If SPSS really gives you a single imputed dataset after running EM for
you to analyze using other tools, then my opinion of SPSS will go down.
The reason you probably have not found a way to do this in SAS or R is
that they are useful tools that try not to make it easy to do the
wrong thing.


-- 
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com
