thr3ads.net - R help - [R] mixture univariate distributions fit [Jan 2022]

PIKAL Petr

2021-Dec-31 09:48 UTC

[R] mixture univariate distributions fit

Hello Ivan

Thanks. Yes, this approach seems viable. I had not considered using
dnorm in the fitting procedure. But as you pointed out:
> (Some nonlinear least squares problems will be much harder to solve
> though.)
This simple example is quite easy. The messier the data and the more
distributions are mixed in them, the harder it becomes to choose correct
starting values, and errors could be quite common.

x <- (0:200)/100
y1 <- dnorm(x, mean=.3, sd=.1)
y2 <- dnorm(x, mean=.7, sd=.2)
y3 <- dnorm(x, mean=.5, sd=.1)

ymix <- ((y1+2*y2+y3)/max(y1+2*y2+y3))+rnorm(201, sd=.001)
plot(x, ymix)

With the starting values for sd1 and sd2 set just slightly higher, the fit
fails with an error.
> fit <- minpack.lm::nlsLM(
+  ymix ~ a1 * dnorm(x, mu1, sd1) + a2 * dnorm(x, mu2, sd2)+
+  a3 * dnorm(x, mu3, sd3),
+  start = c(a1 = 1, mu1 = .3, sd1=.3, a2 = 2, mu2 = .7, sd2 =.3,
+  a3 = 1, mu3 = .5, sd3 = .1),
+  lower = rep(0, 9) # help minpack avoid NaNs
+ )
Error in nlsModel(formula, mf, start, wts) : 
  singular gradient matrix at initial parameter estimates

If the starting values for sd1 and sd2 are set lower, the gradient matrix is
no longer singular and the fit arrives at a result.

Well, it seems that the only way to proceed is to code such a function
myself and take care of suitable starting values.
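
One possible way to take care of starting values is to try several candidates and keep the best fit that converges. The following is only a minimal sketch under assumptions: `fit_multistart` is a hypothetical helper, not part of any package; it uses base R's `nls()` with the "port" algorithm instead of `minpack.lm::nlsLM()` so that it runs without extra packages; the seed and the grid of candidate sd values are arbitrary choices for illustration.

```r
# Simulated data as in the example above (seed added for reproducibility).
set.seed(42)
x  <- (0:200) / 100
y1 <- dnorm(x, mean = .3, sd = .1)
y2 <- dnorm(x, mean = .7, sd = .2)
y3 <- dnorm(x, mean = .5, sd = .1)
ymix <- (y1 + 2 * y2 + y3) / max(y1 + 2 * y2 + y3) + rnorm(201, sd = .001)
dat <- data.frame(x = x, ymix = ymix)

model <- ymix ~ a1 * dnorm(x, mu1, sd1) + a2 * dnorm(x, mu2, sd2) +
  a3 * dnorm(x, mu3, sd3)

# Hypothetical helper: fit the model once per row of `starts`,
# skip rows where nls() fails, keep the fit with the smallest
# residual sum of squares. Returns NULL if no start converges.
fit_multistart <- function(formula, data, starts, lower) {
  best <- NULL
  for (i in seq_len(nrow(starts))) {
    fit <- tryCatch(
      nls(formula, data = data, start = as.list(starts[i, ]),
          algorithm = "port", lower = lower),
      error = function(e) NULL  # e.g. singular gradient, no convergence
    )
    if (!is.null(fit) && (is.null(best) || deviance(fit) < deviance(best))) {
      best <- fit
    }
  }
  best
}

# Candidate starting values: only sd1 and sd2 vary here (assumed grid).
starts <- expand.grid(a1 = 1, mu1 = .3, sd1 = c(.05, .1, .3),
                      a2 = 2, mu2 = .7, sd2 = c(.1, .2, .3),
                      a3 = 1, mu3 = .5, sd3 = .1)

best <- fit_multistart(model, dat, starts, lower = rep(0, 9))
if (is.null(best)) message("no starting point converged") else print(coef(best))
```

Whether any candidate converges still depends on the data and on how the grid is chosen, so this only reduces the manual trial and error; it does not remove it.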

Best regards.
Petr
> -----Original Message-----
> From: Ivan Krylov <krylov.r00t at gmail.com>
> Sent: Friday, December 31, 2021 9:26 AM
> To: PIKAL Petr <petr.pikal at precheza.cz>
> Cc: r-help mailing list <r-help at r-project.org>
> Subject: Re: [R] mixture univariate distributions fit
> 
> On Fri, 31 Dec 2021 07:59:11 +0000
> PIKAL Petr <petr.pikal at precheza.cz> wrote:
> 
> > x <- (0:100)/100
> > y1 <- dnorm(x, mean=.3, sd=.1)
> > y2 <- dnorm(x, mean=.7, sd=.1)
> > ymix <- ((y1+2*y2)/max(y1+2*y2))
> 
> > My question is if there is some package or function which could get
> > those values ***directly from x and ymix values***, which is
> > basically what is measured in my case.
> 
> Apologies if I'm missing something, but, this being a peak fitting
> problem, shouldn't nls() (or something from the minpack.lm or nlsr
> packages) work for you here?
> 
> minpack.lm::nlsLM(
>  ymix ~ a1 * dnorm(x, mu1, sigma1) + a2 * dnorm(x, mu2, sigma2),
>  start = c(a1 = 1, mu1 = 0, sigma1 = 1, a2 = 1, mu2 = 1, sigma2 = 1),
>  lower = rep(0, 6) # help minpack avoid NaNs
> )
> # Nonlinear regression model
> #  model: ymix ~ a1 * dnorm(x, mu1, sigma1) + a2 * dnorm(x, mu2, sigma2)
> #  data: parent.frame()
> #      a1    mu1 sigma1     a2    mu2 sigma2
> #  0.1253 0.3000 0.1000 0.2506 0.7000 0.1000
> # residual sum-of-squares: 1.289e-31
> #
> # Number of iterations to convergence: 23
> # Achieved convergence tolerance: 1.49e-08
> 
> (Some nonlinear least squares problems will be much harder to solve
> though.)
> 
> --
> Best regards,
> Ivan
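
The fitted amplitudes in the quoted output can be checked by hand: because ymix was divided by max(y1+2*y2), the weights 1 and 2 of the two components become 1/max and 2/max. A short sketch of that arithmetic (the variable names `peak`, `a1_expected` and `a2_expected` are introduced here for illustration only):

```r
# Noise-free two-component example from the original question.
x  <- (0:100) / 100
y1 <- dnorm(x, mean = .3, sd = .1)
y2 <- dnorm(x, mean = .7, sd = .1)

# ymix = (1 * y1 + 2 * y2) / peak, so the amplitudes a1 and a2
# that nlsLM should recover are simply 1 / peak and 2 / peak.
peak <- max(y1 + 2 * y2)
a1_expected <- 1 / peak
a2_expected <- 2 / peak
round(c(a1 = a1_expected, a2 = a2_expected), 4)
```

These values agree with the a1 = 0.1253 and a2 = 0.2506 reported in the fit above.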

Bert Gunter

2021-Dec-31 17:57 UTC

[R] mixture univariate distributions fit

Petr:
Please feel free to ignore and not reply if you think the following
questions are unhelpful.

1. Do you want to know the location of peaks (local modes) or the
parameters of the/a mixture distribution? Peaks do not have to be
located at the modes of the individual components of the mixture.

2. Do you know the number of components in the mixture? This would
simplify the problem (a lot, I believe; though those more
knowledgeable should comment on that).

3. Do you know that the points on the fitted density you get are
obtained as a mixture of normals? Or at least of symmetric
distributions? ... or whether they are obtained by some sort of
(algorithmic) density estimation procedure?
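
The first point can be illustrated with the three-component example from earlier in the thread. This is only a sketch: it uses the noise-free curve, and the local-maximum search via sign changes of the first difference is an assumed, simple approach, not a general peak finder.

```r
# Noise-free three-component mixture from the thread.
x  <- (0:200) / 100
y1 <- dnorm(x, mean = .3, sd = .1)
y2 <- dnorm(x, mean = .7, sd = .2)
y3 <- dnorm(x, mean = .5, sd = .1)
y  <- y1 + 2 * y2 + y3

# A grid point is a local maximum when the slope changes from
# positive to negative, i.e. diff(sign(diff(y))) equals -2.
peak_idx <- which(diff(sign(diff(y))) == -2) + 1

component_means <- c(.3, .5, .7)
x[peak_idx]       # locations of the visible peaks
component_means   # locations of the component modes
```

Because the components overlap heavily, the curve shows fewer visible maxima than there are components, and a visible maximum need not sit exactly on a component mean.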

Best and New Year's greeting to all,
Bert




PIKAL Petr

2022-Jan-03 09:37 UTC

[R] mixture univariate distributions fit

Hello Bert

The discussion is starting to drift off topic here, as you already pointed
out. There is probably no package (or function) in R designed for easy
fitting of overlapping peaks (distributions). With the original data one
could use mixtools; with density or cumulative density values, Ivan's
suggestion seems to work reasonably well.

To your questions:
1. No, the peak locations are not known. If I decide to code the function
(package) myself, I would start with a plot and let the user select possible
locations with locator().

2. No, but one should restrict the number of components to some reasonable
value.

3. In particle size measurement the distribution is usually lognormal or
normal, for which the approach suggested by Ivan is a workable solution.
However, in the other case I have in mind, the peak shape could be more
variable (Fraser-Suzuki, Cauchy, Pseudo-Voigt, ...) and I would need to
program the curves myself. A possible way is to make a plot with the
starting values and let the user change them until the fit is relatively
close to the measured values.
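
As a rough illustration of programming such curves by hand, here is a minimal sketch of a pseudo-Voigt peak, written as a weighted sum of a Lorentzian (Cauchy-shaped) and a Gaussian profile that share one centre and one full width at half maximum. The function names and the parametrization (height `h`, centre `mu`, width `w`, mixing fraction `eta`) are assumptions made for this example, not taken from any package.

```r
# Unit-height Gaussian profile with full width at half maximum w.
gauss_peak <- function(x, mu, w) exp(-4 * log(2) * (x - mu)^2 / w^2)

# Unit-height Lorentzian (Cauchy-shaped) profile with the same width w.
lorentz_peak <- function(x, mu, w) 1 / (1 + 4 * (x - mu)^2 / w^2)

# Pseudo-Voigt: eta = 1 gives a pure Lorentzian, eta = 0 a pure Gaussian.
pseudo_voigt <- function(x, h, mu, w, eta) {
  h * (eta * lorentz_peak(x, mu, w) + (1 - eta) * gauss_peak(x, mu, w))
}

# Quick check of the parametrization: full height at the centre,
# half height at a distance of w / 2 from the centre.
pseudo_voigt(0.5, h = 2, mu = 0.5, w = 0.2, eta = 0.3)
pseudo_voigt(0.6, h = 2, mu = 0.5, w = 0.2, eta = 0.3)
```

A function like this can be used on the right-hand side of an nls() or nlsLM() formula in the same way as dnorm in Ivan's example, for instance ymix ~ pseudo_voigt(x, h1, mu1, w1, eta1) + pseudo_voigt(x, h2, mu2, w2, eta2), with the same care needed for starting values and bounds.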

So unless somebody can point me to an R package for such peak-shape mixture
evaluation, I do not consider further discussion necessary. I first need to
do my homework if I decide to code such a function myself.

Thank you again and best regards.
Petr
