thr3ads.net - R help - [R] Fitting Mixture distributions [Sep 2016]


Aanchal Sharma

2016-Sep-07 22:51 UTC

[R] Fitting Mixture distributions

Hi Simon

I am facing the same problem as in the message quoted below. I am trying to fit
a Gaussian mixture model to my data using normalmixEM. I am running an Rscript
that calls this function for about 17000 datasets (in a loop).
The script runs fine for some datasets, but it terminates when it
encounters one dataset with the following error:

Error in normalmixEM(expr_glm_residuals, lambda = c(0.75, 0.25), k = 2,  : 
  Too many tries!

Command used:

  expr_mix_gau <- normalmixEM(expr_glm_residuals, lambda = c(0.75, 0.25), k = 2,
                              epsilon = 1e-08, maxit = 10000,
                              maxrestarts = 200, verb = TRUE)

(expr_glm_residuals is my dataset, which contains residual values for
different samples.)

It has been suggested that one should set mu and sigma in the call after
looking at the data. But in my case there are many datasets, and these values
change every time. Please suggest what I can do to resolve this issue.

Regards
Anchal
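Whatever fitting call is used, the "one bad dataset terminates the whole script" part can be handled in R itself. A minimal sketch, assuming each fit is wrapped in a function `fit_fun(x, s)` that takes a dataset and a list of settings: it tries progressively more permissive settings with tryCatch() and records a failure instead of stopping the loop. `fit_with_fallback`, `toy_fit` and the `tol` setting are hypothetical names for illustration only; `toy_fit` merely imitates a "Too many tries!" error.

```r
# Try a fit with each settings list in turn; return the first success,
# or a failure record (instead of an error) if every attempt fails.
# `fit_fun(x, s)` is a hypothetical wrapper around the real fitting call.
fit_with_fallback <- function(x, fit_fun, settings) {
  stopifnot(length(settings) > 0)
  for (s in settings) {
    res <- tryCatch(fit_fun(x, s), error = function(e) e)
    if (!inherits(res, "error")) {
      return(list(ok = TRUE, fit = res, settings = s))
    }
  }
  # Every attempt failed: keep the last error message, do not stop()
  list(ok = FALSE, fit = NULL, message = conditionMessage(res))
}

# Toy stand-in fitter (hypothetical): errors on very short or nearly
# constant data, imitating the "Too many tries!" failure.
toy_fit <- function(x, s) {
  if (length(x) < 5 || sd(x) < s$tol) stop("Too many tries!")
  list(mean = mean(x), sd = sd(x))
}

set.seed(2)
datasets <- list(a = rnorm(50), b = rnorm(50, sd = 0.1), c = c(1, 2))
settings <- list(list(tol = 0.5), list(tol = 0.01))  # strict, then relaxed
results <- lapply(datasets, fit_with_fallback,
                  fit_fun = toy_fit, settings = settings)
ok <- vapply(results, function(r) r$ok, logical(1))
print(ok)
```

Here dataset `a` succeeds with the first setting, `b` only with the relaxed one, and `c` is recorded as a failure while the loop carries on. In a real run, `fit_fun` might be something like `function(x, s) mixtools::normalmixEM(x, k = 2, epsilon = s$epsilon, maxit = s$maxit, maxrestarts = s$maxrestarts)`, with the settings lists holding those values; the datasets with `ok == FALSE` can then be inspected afterwards.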

On Tuesday, 16 July 2013 17:53:09 UTC-4, Simon Zehnder wrote:
>
> Hi Tjun Kiat Teo,
>
> You are trying to fit a Normal mixture to some data. The Normal mixture is
> very delicate when it comes to parameter search: if the variance of one
> component gets closer and closer to zero while its mean sits on a data
> point, the log-likelihood becomes larger and larger, whatever the values of
> the remaining parameters. Furthermore, the EM algorithm is known to
> sometimes take a very long time to converge.
>
> Try the following:
>
> Use as starting values for the component means:
>
> start.par <- mean(your.data, na.rm = TRUE) + sd(your.data, na.rm = TRUE) * runif(K)
>
> For the weights, use either 1/K or the cluster proportions from an R
> clustering function (e.g. kmeans()) with K clusters.
>
> Here K is the number of components. Also increase the maximum number of
> iterations. You could also try randomizing the starting parameters and
> running an SEM (Stochastic EM). In my opinion, the better method in this
> case is a Bayesian one: MCMC.
>
>
> Best 
>
> Simon 
>
>
> On Jul 16, 2013, at 10:59 PM, Tjun Kiat Teo <teot... at gmail.com> wrote:
>
> > I was trying to use normalmixEM in mixtools and got this error message:
> >
> > One of the variances is going to zero;  trying new starting values.
> > Error in normalmixEM(as.matrix(temp[[gc]][, -(f + 1)])) : Too many tries!
> >
> > Are there any other packages for fitting mixture distributions?
> >
> >
> > Tjun Kiat Teo
> > 
> > 
> > ______________________________________________
> > R-h... at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
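Simon's remark about a collapsing variance can be made precise for a two-component one-dimensional mixture with weight λ, means μ1, μ2 and standard deviations σ1, σ2 (notation introduced here for illustration). The blow-up needs the collapsing component's mean to sit on a data point, say μ1 = x1; every other observation's contribution stays bounded below by the second component, so the log-likelihood has no finite maximum. This is why an EM run can drift towards σ1 → 0 and has to be restarted, as in the "One of the variances is going to zero" message above.

```latex
\begin{aligned}
f(x) &= \lambda\,\phi(x;\mu_1,\sigma_1) + (1-\lambda)\,\phi(x;\mu_2,\sigma_2),
\qquad \phi(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}
       \exp\!\Big(-\frac{(x-\mu)^2}{2\sigma^2}\Big),\\[4pt]
\log L &= \sum_{i=1}^{n} \log f(x_i)
 \;\ge\; \log\frac{\lambda}{\sqrt{2\pi}\,\sigma_1}
 \;+\; \sum_{i=2}^{n} \log\!\big[(1-\lambda)\,\phi(x_i;\mu_2,\sigma_2)\big]
 \qquad (\mu_1 = x_1),\\[4pt]
 &\longrightarrow +\infty \quad \text{as } \sigma_1 \to 0,
 \text{ for fixed } \lambda \in (0,1),\ \mu_2,\ \sigma_2 > 0 .
\end{aligned}
```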

Bert Gunter

2016-Sep-08 06:47 UTC


[R] Fitting Mixture distributions

"please suggest what can I do to resolve this
issue."

Fitting normal mixtures can be difficult, and sometimes the
optimization algorithm (EM) will get stuck with very slow convergence.
Presumably the package has options either to increase the maximum
number of iterations before giving up or to loosen the convergence
criterion. The former will increase the run time, and the latter will
reduce the optimality (possibly leaving you farther from the true
optimum). So you should look into changing these as you think
appropriate.
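To make these two knobs concrete, here is a minimal base-R EM for a two-component one-dimensional normal mixture (an illustrative sketch, not the mixtools implementation). `epsilon` is the tolerance on the change in log-likelihood and `maxit` the iteration cap; the random starts follow Simon's recipe, and a restart is triggered when a standard deviation falls below `min_sd`, a threshold assumed here for illustration that loosely imitates the "trying new starting values" / "Too many tries!" behaviour reported in this thread.

```r
# Minimal EM for a two-component 1-D normal mixture (illustrative sketch,
# not the mixtools implementation).
#   epsilon: stop when the log-likelihood changes by less than this
#   maxit:   maximum number of EM iterations per start
#   min_sd:  assumed threshold below which a component counts as collapsed
em_norm2 <- function(x, epsilon = 1e-8, maxit = 1000, maxrestarts = 20,
                     min_sd = 1e-6 * sd(x)) {
  for (restart in 0:maxrestarts) {
    # Random starts following Simon's recipe: means mean(x) + sd(x) * runif(K),
    # equal weights 1/K; both standard deviations start at sd(x)
    mu <- mean(x) + sd(x) * runif(2)
    sigma <- rep(sd(x), 2)
    lambda <- rep(1 / 2, 2)
    loglik_old <- -Inf
    collapsed <- FALSE
    for (iter in seq_len(maxit)) {
      # E-step: weighted densities, log-likelihood, responsibilities
      dens <- cbind(lambda[1] * dnorm(x, mu[1], sigma[1]),
                    lambda[2] * dnorm(x, mu[2], sigma[2]))
      tot <- rowSums(dens)
      loglik <- sum(log(tot))
      post <- dens / tot
      # M-step: update weights, means and standard deviations
      n_k <- colSums(post)
      lambda <- n_k / length(x)
      mu <- colSums(post * x) / n_k
      sigma <- sqrt(colSums(post * outer(x, mu, "-")^2) / n_k)
      # Restart if a standard deviation collapsed or became non-finite
      if (any(!is.finite(sigma)) || any(sigma < min_sd)) {
        collapsed <- TRUE
        break
      }
      if (abs(loglik - loglik_old) < epsilon) break  # converged
      loglik_old <- loglik
    }
    if (!collapsed) {
      return(list(lambda = lambda, mu = mu, sigma = sigma, loglik = loglik,
                  iterations = iter, restarts = restart))
    }
  }
  stop("Too many tries!")
}

set.seed(1)
x <- c(rnorm(300, 0, 1), rnorm(100, 4, 0.5))
fit <- em_norm2(x, epsilon = 1e-8, maxit = 1000)
str(fit)
```

Raising `maxit` or loosening `epsilon` only trades run time against closeness to a local maximum; neither helps when a variance genuinely collapses, which is where better starting values matter.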

Cheers,
Bert




Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)



Martin Maechler

2016-Sep-08 12:38 UTC


[R] Fitting Mixture distributions

>>>>> Bert Gunter <bgunter.4567 at gmail.com>
>>>>>     on Wed, 7 Sep 2016 23:47:40 -0700 writes:
    > "please suggest what can I do to resolve this
    > issue."

    > Fitting normal mixtures can be difficult, and sometimes the
    > optimization algorithm (EM) will get stuck with very slow convergence.
    > Presumably the package has options either to increase the maximum
    > number of iterations before giving up or to loosen the convergence
    > criterion. The former will increase the run time, and the latter will
    > reduce the optimality (possibly leaving you farther from the true
    > optimum). So you should look into changing these as you think
    > appropriate.

I'm jumping in late, without having read everything preceding.

One of the last messages seemed to indicate that you are looking
at mixtures of *one*-dimensional Gaussians.

If this is the case, I strongly recommend looking at (my) CRAN
package 'nor1mix' (the "1" is for *one*-dimensional).

For a while now, that small package has provided an alternative
to the EM, namely direct MLE, simply using optim(<likelihood>) where the
likelihood uses a somewhat smart parametrization.

Of course, *like the EM*, this also depends on the starting value,
but my (limited) experience has been that
  nor1mix::norMixMLE()
works considerably faster and more reliably than the EM (which I
also provide as nor1mix::norMixEM()).
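The direct-MLE idea can be sketched in a few lines of base R. This is not nor1mix's code: the parametrization below (log standard deviations, logit weight) is an assumption chosen so that optim() searches over unconstrained values, and the log-likelihood uses a log-sum-exp step to avoid underflow. Like EM, it still depends on the starting values; the quantile-based start in the usage lines is also just an illustrative choice.

```r
# Negative log-likelihood of a two-component 1-D normal mixture on an
# unconstrained scale: theta = (mu1, mu2, log sd1, log sd2, logit lambda1).
negloglik_norm2 <- function(theta, x) {
  mu <- theta[1:2]
  sigma <- exp(theta[3:4])            # standard deviations stay positive
  lambda1 <- plogis(theta[5])         # weight stays in (0, 1)
  l1 <- log(lambda1) + dnorm(x, mu[1], sigma[1], log = TRUE)
  l2 <- log1p(-lambda1) + dnorm(x, mu[2], sigma[2], log = TRUE)
  m <- pmax(l1, l2)                   # log-sum-exp for numerical stability
  -sum(m + log(exp(l1 - m) + exp(l2 - m)))
}

# Direct MLE with optim(); `start` is list(lambda1, mu, sigma).
mle_norm2 <- function(x, start) {
  theta0 <- c(start$mu, log(start$sigma), qlogis(start$lambda1))
  opt <- optim(theta0, negloglik_norm2, x = x, method = "BFGS",
               control = list(maxit = 1000))
  lambda1 <- plogis(opt$par[5])
  list(lambda = c(lambda1, 1 - lambda1), mu = opt$par[1:2],
       sigma = exp(opt$par[3:4]), loglik = -opt$value,
       convergence = opt$convergence)  # 0 means optim reports success
}

set.seed(1)
x <- c(rnorm(300, 0, 1), rnorm(100, 4, 0.5))
start <- list(lambda1 = 0.5, mu = unname(quantile(x, c(0.1, 0.9))),
              sigma = rep(sd(x) / 2, 2))
fit <- mle_norm2(x, start)
str(fit)
```

Working on the log/logit scale removes the box constraints, so a general-purpose optimizer can be used directly; it does not remove the unbounded-likelihood problem, which still has to be avoided through reasonable starts.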

Apropos 'starting value': the help page shows how to use
kmeans() for "somewhat" reliable starts; alternatively, I'd
recommend using cluster::pam() to get starting values.
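A minimal sketch of data-driven starts, which also addresses the original concern that mu and sigma cannot be chosen by hand for 17000 datasets: cluster each dataset with kmeans() and use the cluster proportions, centres and within-cluster standard deviations. The function name `kmeans_starts` and the `0.1 * sd(x)` floor for degenerate clusters are assumptions for illustration; cluster::pam() could replace kmeans() here as suggested above.

```r
# Data-driven starting values for a k-component 1-D normal mixture,
# taken from a kmeans() clustering of the data (illustrative sketch).
kmeans_starts <- function(x, k = 2, nstart = 10) {
  x <- x[is.finite(x)]                          # drop NA / NaN / Inf
  km <- kmeans(x, centers = k, nstart = nstart)
  ord <- order(km$centers[, 1])                 # label components by mean
  mu <- unname(km$centers[ord, 1])
  lambda <- km$size[ord] / length(x)            # cluster proportions
  sigma <- vapply(ord, function(j) sd(x[km$cluster == j]), numeric(1))
  # A one-point cluster gives sd = NA and tied values give 0; replace
  # these (and any very small sd) by an assumed floor of 0.1 * sd(x).
  floor_sd <- 0.1 * sd(x)
  sigma[is.na(sigma) | sigma < floor_sd] <- floor_sd
  list(lambda = lambda, mu = mu, sigma = sigma)
}

set.seed(1)
x <- c(rnorm(300, 0, 1), rnorm(100, 4, 0.5))
st <- kmeans_starts(x, k = 2)
str(st)
```

The result could then be passed to mixtools as `normalmixEM(x, lambda = st$lambda, mu = st$mu, sigma = st$sigma, k = 2)`, or used to build the starting vector for a direct MLE; check the relevant help pages for the exact arguments each fitter accepts.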

I'd be glad to hear about experiences using these or comparing
them with other approaches.

Martin


--
Martin Maechler,
ETH Zurich


