thr3ads.net - R help - [R] missing values imputation [May 2004]

Anne

2004-May-12 14:42 UTC

[R] missing values imputation

What R functionality is there for missing-value imputation (with a
substantial proportion of missing data)?
I would prefer to use maximum likelihood methods; is the EM algorithm
implemented? In which package?


Thanks

Anne


----------------------------------------------------
Anne Piotet
Tel: +41 79 359 83 32 (mobile)
Email: anne.piotet@m-td.com
---------------------------------------------------
M-TD Modelling and Technology Development
PSE-C
CH-1015 Lausanne
Switzerland
Tel: +41 21 693 83 98
Fax: +41 21 646 41 33
--------------------------------------------------
 

Rolf Turner

2004-May-12 15:00 UTC

[R] missing values imputation

Anne Piotet wrote:
> What R functionality is there for missing-value imputation
> (with a substantial proportion of missing data)?  I would prefer to
> use maximum likelihood methods; is the EM algorithm implemented? In
> which package?
	The so-called ``EM algorithm'' is ***NOT*** an
	algorithm.  It is a methodology or a unifying concept.
	It would be impossible to ``implement'' it.  (Except
	possibly by means of some extremely advanced and
	sophisticated Artificial Intelligence software.)

				cheers,

					Rolf Turner
					rolf at math.unb.ca

(Ted Harding)

2004-May-12 16:44 UTC

[R] missing values imputation

On 12-May-04 Anne wrote:
> What R functionality is there for missing-value imputation
> (with a substantial proportion of missing data)?
> I would prefer to use maximum likelihood methods; is the EM algorithm
> implemented? In which package?
Hi Anne,
R already has packages/libraries called "cat", "norm" and "mix" which,
while they are not part of the standard installation, can be readily
downloaded and installed from any CRAN website -- see under "contributed
sources".

These implement in R Schafer's S code for what he calls "CAT", "NORM"
and "MIX". These are for imputing missing data where the data are
respectively entirely categorical, entirely continuous ("norm" operates
on the basis that the data are a sample from a multivariate normal
distribution) and a mixture of both (some variables categorical, some
continuous). All include routines for multiple imputation, and for
extracting appropriate information about the parameters from the
imputations.

Schafer also has an S function "PAN" which imputes missing values
from "panel" data. I don't think this has been implemented for R yet.

There is one type of data which also, I think, has nothing implemented
for R (and I have not heard of a specially written routine for S-plus
either). This is so-called "semi-continuous" data -- where the value
of a variable may either be "continuous" or else take a specific
value (typically zero). E.g. "How much did you spend on alcohol last
week?" -- answer may be a positive amount, maybe log-normally distributed,
or else zero. You can approach data of this kind with missing values
by combining "cat" and "norm", but it's tricky and may not correspond
to a valid model.

All of Schafer's methods use maximum-likelihood estimation of the
parameters for the first phase of the imputation, using the EM algorithm
(and I'll respond to Rolf Turner's comments shortly).

After that, you can make a simple imputation by sampling from the
distribution thus estimated, or in a more general and indeed sounder
way, first sample from the posterior parameter distribution, sample
imputed values from the resulting distribution, and then repeat
sampling from parameters and resulting distributions to build up
an array of datasets with the missing data filled in by multiple
imputation.
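
As a rough illustration of the two phases described above (EM for the
parameter estimates, then imputation by sampling), here is a minimal
sketch in plain Python. It is not Schafer's code and not what the
"norm" package does internally; it assumes the simplest possible case,
a bivariate normal (x, y) where x is always observed and only y has
missing values (coded as None). The function names and the toy data
are made up for the example. Note that impute_once() is the "simple
imputation" only: it samples from the fitted distribution and does not
draw the parameters from their posterior first, so repeating it does
not give proper multiple imputation.

```python
import math
import random


def em_bivariate_normal(x, y, tol=1e-8, max_iter=1000):
    """EM estimates (ML, divisor n) for a bivariate normal; y may hold None."""
    n = len(x)
    obs = [i for i in range(n) if y[i] is not None]
    # x is fully observed, so its ML estimates never change.
    mu_x = sum(x) / n
    s_xx = sum((xi - mu_x) ** 2 for xi in x) / n
    # Starting values for the y-related parameters from the observed y only.
    mu_y = sum(y[i] for i in obs) / len(obs)
    s_yy = sum((y[i] - mu_y) ** 2 for i in obs) / len(obs)
    s_xy = 0.0
    for _ in range(max_iter):
        # E step: expected y and y^2 for each case, given x and current estimates.
        beta = s_xy / s_xx
        resid_var = s_yy - beta * s_xy
        sum_y = sum_yy = sum_xy = 0.0
        for xi, yi in zip(x, y):
            if yi is None:
                ey = mu_y + beta * (xi - mu_x)
                eyy = ey * ey + resid_var
            else:
                ey, eyy = yi, yi * yi
            sum_y += ey
            sum_yy += eyy
            sum_xy += xi * ey
        # M step: ML estimates from the expected sufficient statistics.
        new_mu_y = sum_y / n
        new_s_xy = sum_xy / n - mu_x * new_mu_y
        new_s_yy = sum_yy / n - new_mu_y ** 2
        change = max(abs(new_mu_y - mu_y),
                     abs(new_s_xy - s_xy),
                     abs(new_s_yy - s_yy))
        mu_y, s_xy, s_yy = new_mu_y, new_s_xy, new_s_yy
        if change < tol:
            break
    return {"mu_x": mu_x, "mu_y": mu_y,
            "s_xx": s_xx, "s_xy": s_xy, "s_yy": s_yy}


def impute_once(x, y, theta, rng):
    """Fill each missing y with one draw from its fitted conditional normal."""
    beta = theta["s_xy"] / theta["s_xx"]
    sd = math.sqrt(max(theta["s_yy"] - beta * theta["s_xy"], 0.0))
    return [yi if yi is not None
            else rng.gauss(theta["mu_y"] + beta * (xi - theta["mu_x"]), sd)
            for xi, yi in zip(x, y)]


# Toy data, invented for the example.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 1.9, 3.1, None, 5.2, None, 6.8, None]
theta = em_bivariate_normal(x, y)
completed = impute_once(x, y, theta, random.Random(1))
```

With this missingness pattern the ML estimates also have a closed form
(regress y on x using the complete cases, then combine with the mean
and variance of x from all cases), which gives a convenient check on
the iteration.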

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 12-May-04                                       Time: 17:44:50
------------------------------ XFMail ------------------------------

Liaw, Andy

2004-May-12 16:55 UTC

[R] missing values imputation

> From: Rolf Turner
> 
> Anne Piotet wrote:
> 
> > What R functionality is there for missing-value imputation
> > (with a substantial proportion of missing data)?  I would prefer to
> > use maximum likelihood methods; is the EM algorithm implemented? In
> > which package?
> 
> 	The so-called ``EM algorithm'' is ***NOT*** an
> 	algorithm.  It is a methodology or a unifying concept.
> 	It would be impossible to ``implement'' it.  (Except
> 	possibly by means of some extremely advanced and
> 	sophisticated Artificial Intelligence software.)
Yes, but EM for missing value imputation is a bit narrower, I guess.  At
least the `norm' package on CRAN has em.norm() for multivariate gaussian...

Andy

> 				cheers,
> 
> 					Rolf Turner
> 					rolf at math.unb.ca
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
> 
>

Rolf Turner

2004-May-12 19:38 UTC

[R] missing values imputation

Thanks, Brian.

The EM algorithm requires an ``E'' step and an ``M'' step.  Harding
and Rossini appear to be seriously suggesting that an R function
could be written which would

	(a) Perform the E step in arbitrary contexts, and
	(b) For that given expected value, work out a procedure
	    to effect its maximization.

Or maybe they're not serious.

For the M step (b) general numerical optimization would theoretically
do the trick.  (But would be fraught with peril.)  For the E step
(a), forget it.

The point is, the EM ``algorithm'' is NOT an algorithm which could be
effected by an R function.  This is in complete contrast with
integrate() --- it's there; the code is written.  Hand integrate() an
integration problem, and it'll do it.  One of the differences is that
the input to an integration problem is clearly defined and readily
specifiable as an R function.  The input to a general missing values
problem is amorphous.
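
To make the distinction concrete, here is a minimal sketch, in plain
Python, of what a "generic" EM routine amounts to.  The outer loop is
trivial to write once; everything that matters has to be supplied by
the caller for the model at hand.  The names run_em, e_step and m_step
are hypothetical, the parameter is assumed to be a single number, and
the example model (exponential lifetimes with right censoring) and its
data are chosen only because both steps are one-liners there.

```python
def run_em(theta, e_step, m_step, tol=1e-10, max_iter=10_000):
    """Generic EM iteration for a scalar parameter.

    e_step(theta) must return whatever expected quantities the model needs;
    m_step(expected) must return the maximizing parameter value.
    Neither can be written without knowing the model.
    """
    for _ in range(max_iter):
        new_theta = m_step(e_step(theta))
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


# Model-specific part: exponential lifetimes, some right-censored.
times = [2.0, 3.5, 1.2, 4.0, 4.0]
censored = [False, False, False, True, True]


def e_step(mean):
    # Memoryless property: E[T | T > c] = c + mean for a censored case.
    return [t + mean if c else t for t, c in zip(times, censored)]


def m_step(expected_times):
    # ML estimate of the mean from the (expected) complete lifetimes.
    return sum(expected_times) / len(expected_times)


mean_hat = run_em(1.0, e_step, m_step)
```

For this particular model the answer is known in closed form (total
time on test divided by the number of uncensored cases), so EM is not
actually needed; the point is only where the model-specific work sits.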

Arguing about what constitutes an algorithm according to some
abstract definition is mindless.  If you define ``algorithm'' to suit
yourself, then the EM algorithm is an algorithm; otherwise not.

The original questioner wanted an R function to effect the EM
algorithm.  My point was that this is a silly request because such a
function would be impossible to write.

Call the EM algorithm an algorithm if it makes you happy.  But
remember that by doing so you'll mislead the naive inquirer
who will expect there to be a real live implementation of that
algorithm.  In computer (R) code.  Like integrate().

If you can write an R function to effect the EM ``algorithm'' --- in
general, not just in a special case --- you'll win the Chambers Prize
in computing and a few other things as well.


				cheers,

					Rolf Turner
					rolf at math.unb.ca
