thr3ads.net - R help - [R] normal mixture EM not working? [Mar 2013]

If this information is useful, please help other people find it:
Share via:

Stat Tistician

2013-Mar-30 10:20 UTC

[R] normal mixture EM not working?

Hi,
I am currently working on fitting a mixture density to financial data.

I have the following data:
http://s000.tinyupload.com/?file_id=00083355432555420222

I want to fit a mixture density of two normal distributions.

I have the formula:
f(l)=πϕ(l;μ1,σ21)+(1−π)ϕ(l;μ2,σ22)

my R code is:

normalmix<-normalmixEM(dat,k=2,fast=TRUE)

pi<-normalmix$lambda[1]
    mu1<-normalmix$mu[1]
mu2<-normalmix$mu[2]
    sigma1<-normalmix$sigma[1]
sigma2<-normalmix$sigma[2]

Now I have the problem, that the output is not consistent, i.e. every time
I run the code, I get different outputs! And they are very different, no
small differences, which could be due to the precision of the numerical
procedures.

E.g. sometimes for pi I get

[1] 0.2653939

or

[1] 0.3318069

I already recognized, that sometimes the numbering is changed, so the pi of
0.7 would be equal to a pi of 0.3. Okay, I got this, I don't know why the R
procedures does this, but this would not be a problem. But the problem is,
that the outputs are way to different, sometimes I even get an error
message (german): Fehler in while (dl > eps && iter < maxit) { :
Fehlender
Wert, wo TRUE/FALSE nötig ist

Also, the number of iterations is very different, from 29 up to 1000 ......

Anyone can help? Thanks a lot!

	[[alternative HTML version deleted]]

David Winsemius

2013-Mar-30 16:31 UTC

head link

[R] normal mixture EM not working?

Dear Stat Tistician;

Have you looked at the original distribution graphically? When I use densityplot
on that data I am unable to discern even a hint of two separate peaks.
Furthermore the density looks peaked versus what I would expect from a
"true" normal distribution with those empiric parameters. So I think
the algorithm is attempting to estimate the proportions and variances of two
distributions with equal means and different standard deviations. That would
seem to be a particularly difficult task for a non-deterministic algorithm as
this one clearly is. Under these circumstances I thought you might get better
results if you constrained the means to be identically zero.

On the other hand my experiments with mixtools::normalmixEM on the clearly
separable peaks in the faithful$waiting vector used in that package's help
page also gave widely divergent results on estimated mixing proportions as well
(even with arbvar=FALSE), so I am now very suspicious about the stability of
either the method or possibly the implementation of that method for data of this
size:

data(faithful)> attach(faithful)
> length(waiting)
[1] 272> system.time(out<-normalmixEM(waiting, arbvar = FALSE, epsilon = 1e-03))
> system.time(out2<-normalmixEM(waiting, arbvar = FALSE, epsilon = 1e-03))
> out$lambda
[1] 0.4548029 0.5451971> out2$lambda
[1] 0.998016838 0.001983162> system.time(out3<-normalmixEM(waiting, arbvar = FALSE, epsilon = 1e-03))number of iterations= 5 
   user  system elapsed 
  0.019   0.001   0.021 > out3$lambda[1] 0.3609454 0.6390546

I think I could do a lot better just "drawing a line by eyeball" 
between the two peaks:
> densityplot(waiting)
> sum(waiting>68)
[1] 171> sum(waiting<=68)[1] 101

This makes the observation of implausibly high variation in estimated mixing
proportions more reproducible (at least if you are on a Mac):
> set.seed(123)
> out1 <- normalmixEM(waiting, arbvar = FALSE, epsilon = 1e-03)
number of iterations= 8 > out1$lambda
[1] 0.3608581 0.6391419> out2 <- normalmixEM(waiting, arbvar = FALSE, epsilon = 1e-03)
number of iterations= 3 > out2$lambda[1] 0.08257889 0.91742111

(I'm copying the package maintainer, Derek Young .)
>  sessionInfo()R version 3.0.0 beta (2013-03-22 r62364)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grDevices datasets  splines   graphics  utils     stats     methods   base

other attached packages:
 [1] mixtools_0.4.6    segmented_0.2-9.4 boot_1.3-9        car_2.0-16       
nnet_7.3-6        MASS_7.3-26
 [7] data.table_1.8.8  reshape2_1.2.2    rms_3.6-3         Hmisc_3.10-1     
survival_2.37-4   sos_1.3-5
[13] brew_1.0-6        lattice_0.20-14  

loaded via a namespace (and not attached):
 [1] cluster_1.14.3     colorspace_1.2-1   dichromat_2.0-0    digest_0.6.3      
ggplot2_0.9.3.1
 [6] grid_3.0.0         gtable_0.1.2       labeling_0.1       munsell_0.4       
plyr_1.8
[11] proto_0.3-10       RColorBrewer_1.0-5 scales_0.2.3       stringr_0.6.2     
tools_3.0.0

On Mar 30, 2013, at 3:20 AM, Stat Tistician wrote:
> Hi,
> I am currently working on fitting a mixture density to financial data.
> 
> I have the following data:
> http://s000.tinyupload.com/?file_id=00083355432555420222
> 
> I want to fit a mixture density of two normal distributions.
> 
> I have the formula:
> f(l)=??(l;?1,?21)+(1??)?(l;?2,?22)
> 
> my R code is:
> 
> normalmix<-normalmixEM(dat,k=2,fast=TRUE)
> 
> pi<-normalmix$lambda[1]
>    mu1<-normalmix$mu[1]
> mu2<-normalmix$mu[2]
>    sigma1<-normalmix$sigma[1]
> sigma2<-normalmix$sigma[2]
> 
> Now I have the problem, that the output is not consistent, i.e. every time
> I run the code, I get different outputs! And they are very different, no
> small differences, which could be due to the precision of the numerical
> procedures.
> 
> E.g. sometimes for pi I get
> 
> [1] 0.2653939
> 
> or
> 
> [1] 0.3318069
> 
> I already recognized, that sometimes the numbering is changed, so the pi of
> 0.7 would be equal to a pi of 0.3. Okay, I got this, I don't know why
the R
> procedures does this, but this would not be a problem. But the problem is,
> that the outputs are way to different, sometimes I even get an error
> message (german): Fehler in while (dl > eps && iter < maxit)
{ : Fehlender
> Wert, wo TRUE/FALSE n?tig ist
> 
> Also, the number of iterations is very different, from 29 up to 1000 ......
> 
> Anyone can help? Thanks a lot!
> 
> 	[[alternative HTML version deleted]]
> 
David Winsemius
Alameda, CA, USA

Seemingly Similar Threads

Search for more possibly parallel threads

R help - Mar 2013 - normal mixture EM not working?

[R] normal mixture EM not working?

[R] normal mixture EM not working?

Seemingly Similar Threads