thr3ads.net - R help - [R] mixture univariate distributions fit [Dec 2021]

If this information is useful, please help other people find it:
Share via:

Bert Gunter

2021-Dec-30 16:09 UTC

[R] mixture univariate distributions fit

Petr:

1. I now am somewhat confused by your goals. Any curve fitting
procedure can fit curves to data, including discrete points on a
smooth curve. What cannot be done is to recover the exact parameters
of the smooth curve from which your data derive, at least not without
knowing how the curve was fitted to that underlying (particle size)
data, i.e. the nature of the parameterization (if even there was one
-- it may just be some sort of empirical smoother like kernel density
estimation or some such).

2. mixtools uses EM, so the similarity you noted is probably not surprising.

3. Given my evident confusion, I hope that this email may prompt
someone more knowledgeable than I to correct or clarify my "advice."
You may wish to further clarify what you wish to do with the
parameters that you hope to get to encourage such comments. For
example, do you wish to compare the densities you get via their
parameterizations? If so, others may be able to offer better
strategies to do this, perhaps on SO --
https://stats.stackexchange.com/  -- rather than here.

Cheers,
Bert Gunter

On Thu, Dec 30, 2021 at 12:36 AM PIKAL Petr <petr.pikal at precheza.cz>
wrote:>
> Thank you Bert
>
> The values are results from particle size measurement by sedimentation and
> they are really available only as these cumulative or density
distributions.
> What I thought about was that there is some package which could fit data of
> such curves and deliver parameters of fitted curves.
>
> Something like
> https://chrisostrouchov.com/post/peak_fit_xrd_python/
>
> I found package EMpeaksR which results close to values estimated from
mixtools
> package.
>
> test <- spect_em_gmm(temp1$velik, temp1$proc, mu=c(170, 220),
> mix_ratio=c(1,1), sigma=c(5,5), maxit=2000, conv.cri=1e-8)
> print(cbind(test$mu, test$sigma, test$mix_ratio))
>          [,1]      [,2]      [,3]
> [1,] 170.7744  7.200109 0.5759867
> [2,] 229.1815 10.831626 0.4240133
>
> But it is probably in stage of intensive development as it is limited in
data
> visualisation
>
> Any further hint is appreciated.
>
> Regards
> Petr
>
> > -----Original Message-----
> > From: Bert Gunter <bgunter.4567 at gmail.com>
> > Sent: Wednesday, December 29, 2021 5:01 PM
> > To: PIKAL Petr <petr.pikal at precheza.cz>
> > Cc: r-help mailing list <r-help at r-project.org>
> > Subject: Re: [R] mixture univariate distributions fit
> >
> > No.
> >
> > However, if the object returned is the "Value" structure of
whatever density
> > function you use, it probably contains the original data. You need to
check
> > the docs to see. But this does not appear to be your situation.
> >
> > Bert Gunter
> >
> > "The trouble with having an open mind is that people keep coming
along and
> > sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic
strip )
> >
> > On Wed, Dec 29, 2021 at 3:05 AM PIKAL Petr <petr.pikal at
precheza.cz>
> > wrote:
> > >
> > > Dear all
> > >
> > > I have data which are either density distribution estimate or
> > > cummulative density distribution estimate (temp1, temp2 below). I
> > > would like to get values (mu, sd) for underlaying original data
but they
> > > are
> > not available.
> > >
> > > I found mixtools package which calculate what I need but it
requires
> > > original data (AFAIK). They could be generated from e.g. temp1 by
> > >
> > > set.seed(111)
> > > x<- sample(temp1$velik, size=100000, replace=TRUE,
prob=temp1$proc)
> > >
> > > library(mixtools)
> > > fit <- normalmixEM(x)
> > > plot(fit, which=2)
> > > summary(fit)
> > > summary of normalmixEM object:
> > >            comp 1     comp 2
> > > lambda   0.576346   0.423654
> > > mu     170.784520 229.192823
> > > sigma    7.203491  10.793461
> > > loglik at estimate:  -424062.7
> > > >
> > >
> > > Is there any way how to get such values directly from density or
> > > cummulative density estimation without generating fake data by
sample?
> > >
> > > Best regards
> > > Petr
> > >
> > >
> > > temp1 <- structure(list(velik = c(155, 156.8, 157.9, 158.8,
159.6,
> > > 160.4, 161.2, 161.9, 162.5, 163.1, 163.8, 164.3, 164.7, 165.3,
165.8,
> > > 166.2, 166.7, 167.2, 167.7, 168.2, 168.7, 169.1, 169.6, 170.1,
170.6,
> > > 171.1, 171.6, 172, 172.5, 173, 173.5, 174, 174.5, 175.1, 175.7,
176.3,
> > > 177, 177.6, 178.3, 179.1, 179.9, 180.6, 181.4, 182.4, 183.5,
184.7,
> > > 186.1, 187.9, 189.8, 192, 194.4, 197, 200.1, 203.5, 206.7, 209.2,
> > > 211.3, 213.1, 214.8, 216.3, 217.4, 218.5, 219.5, 220.4, 221.3,
222.1,
> > > 223, 223.7, 224.5, 225.2, 225.9, 226.7, 227.5, 228.2, 228.9,
229.6,
> > > 230.4, 231.2, 231.9, 232.6, 233.4, 234.2, 235, 235.9, 236.8,
237.7,
> > > 238.6, 239.7, 241, 242.3, 243.6, 245.2, 247.1, 249.3, 251.9,
255.3,
> > > 260, 266, 274.9, 323.4 ), proc = c(0.6171, 1.583, 1.371, 2.13,
1.828,
> > > 2.095, 1.994, 2.694, 2.824, 2.41, 2.909, 3.768, 3.179, 3.029,
3.798,
> > > 3.743, 3.276, 3.213, 3.579, 2.928, 4.634, 3.415, 3.473, 3.135,
3.476,
> > > 3.759, 3.726, 3.9, 3.593, 2.89, 3.707, 4.08, 2.846, 2.685, 3.394,
> > > 2.737, 2.693, 2.878, 2.248, 2.368, 2.258, 2.662, 1.866, 1.895,
1.457,
> > > 1.513, 1.181, 1.008, 0.9641, 0.799, 0.7878, 0.7209, 0.5869,
0.5778,
> > > 0.7313, 0.9531, 1.053, 1.317, 1.247, 1.739, 2.064, 1.99, 2.522,
2.401,
> > > 2.48, 2.687, 2.797, 2.918, 3.243, 3.055, 3.009, 2.89, 3.037,
3.25,
> > > 3.349, 3.141, 2.771, 2.985, 3.203, 3.298, 3.215, 2.637, 2.683,
2.782,
> > > 2.632, 2.625, 2.475, 2.014, 1.781, 1.987, 1.627, 1.374, 1.352,
0.9441,
> > > 1.01, 0.5737, 0.5265, 0.3794, 0.2513, 0.0351)), row.names =
2:101,
> > > class = "data.frame")
> > >
> > > temp2 <- structure(list(velik = c(153.8, 156.3, 157.3, 158.4,
159.2,
> > > 160.1, 160.8, 161.6, 162.2, 162.8, 163.5, 164, 164.5, 165, 165.5,
166,
> > > 166.4, 166.9, 167.5, 167.9, 168.5, 168.9, 169.4, 169.8, 170.4,
170.9,
> > > 171.3, 171.8, 172.2, 172.7, 173.3, 173.8, 174.2, 174.8, 175.5,
176,
> > > 176.6, 177.3, 177.9, 178.7, 179.5, 180.3, 180.9, 181.9, 182.9,
184.1,
> > > 185.3, 186.9, 188.8, 190.8, 193.2, 195.6, 198.4, 201.8, 205.3,
208.1,
> > > 210.3, 212.3, 213.9, 215.7, 216.9, 218, 219.1, 219.9, 220.8,
221.7,
> > > 222.6, 223.4, 224.1, 224.8, 225.6, 226.3, 227.1, 227.8, 228.5,
229.2,
> > > 230, 230.8, 231.6, 232.3, 233, 233.7, 234.6, 235.5, 236.3, 237.2,
> > > 238.1, 239.1, 240.3, 241.6, 242.9, 244.4, 246.1, 248, 250.6,
253.1,
> > > 257.6, 262.5, 269.5, 280.4, 372.9), proc = c(0, 1, 2, 3, 4, 5, 6,
7,
> > > 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
25,
> > > 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41,
42,
> > > 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58,
59,
> > > 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75,
76,
> > > 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92,
93,
> > > 94, 95, 96, 97, 98, 99, 100)), row.names = c(NA, 101L), class
> > > "data.frame")
> > > >
> > >
> > > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible
code.

PIKAL Petr

2021-Dec-31 07:59 UTC

head link

[R] mixture univariate distributions fit

Hallo Bert

Sorry for the confusion, I may not have used correct wording in describing 
what I wanted to do. Consider this example

# 2 different distribution densities mixed together in 1:2 proportion
x <- (0:100)/100
y1 <- dnorm((x, mean=.3, sd=.1)
y2 <- dnorm((x, mean=.7, sd=.1)
ymix <- ((y1+2*y2)/max(y1+2*y2))
plot((x, ymix)

# sampling  x based on mixed distribution density
dist <- sample((x, size=10000, replace=TRUE, prob=ymix)
library(mixtools)
fit <- normalmixEM(dist)
summary(fit)

summary of normalmixEM object:
         comp 1    comp 2
lambda 0.334926 0.6650742
mu     0.304862 0.7036926
sigma  0.098673 0.0994382
loglik at estimate:  30935.34

I can get estimation of mu and sigma together with proportion of each (lambda) 
distribution in mixture. My question is if there is some package or function 
which could get those values ***directly from x and ymix values***, which is 
basically what is measured in my case.

I know I could code some optimisation procedure and adopt what was done in 
Python https://chrisostrouchov.com/post/peak_fit_xrd_python/ for peak fitting 
or this one 
https://stats.stackexchange.com/questions/92748/multi-peak-gaussian-fit-in-r 
but I do not like to reinvent the wheel.

I went through Chemometrics and Computational Physics task view, which I 
consider most appropriate to have such functions but it is usually quite 
specific, especially in required input data form.

Anyway, thanks for your effort.

I wish you best for new year 2022.
Petr

> -----Original Message-----
> From: Bert Gunter <bgunter.4567 at gmail.com>
> Sent: Thursday, December 30, 2021 5:10 PM
> To: PIKAL Petr <petr.pikal at precheza.cz>
> Cc: r-help mailing list <r-help at r-project.org>
> Subject: Re: [R] mixture univariate distributions fit
>
> Petr:
>
> 1. I now am somewhat confused by your goals. Any curve fitting procedure
> can fit curves to data, including discrete points on a smooth curve. What
> cannot be done is to recover the exact parameters of the smooth curve from
> which your data derive, at least not without knowing how the curve was
> fitted to that underlying (particle size) data, i.e. the nature of the
> parameterization (if even there was one
> -- it may just be some sort of empirical smoother like kernel density
> estimation or some such).
>
> 2. mixtools uses EM, so the similarity you noted is probably not
surprising.
>
> 3. Given my evident confusion, I hope that this email may prompt someone
> more knowledgeable than I to correct or clarify my "advice."
> You may wish to further clarify what you wish to do with the parameters
that
> you hope to get to encourage such comments. For example, do you wish to
> compare the densities you get via their parameterizations? If so, others
may
> be able to offer better strategies to do this, perhaps on SO --
> https://stats.stackexchange.com/  -- rather than here.
>
> Cheers,
> Bert Gunter
>
> On Thu, Dec 30, 2021 at 12:36 AM PIKAL Petr <petr.pikal at
precheza.cz>
> wrote:
> >
> > Thank you Bert
> >
> > The values are results from particle size measurement by sedimentation
> > and they are really available only as these cumulative or density
> distributions.
> > What I thought about was that there is some package which could fit
> > data of such curves and deliver parameters of fitted curves.
> >
> > Something like
> > https://chrisostrouchov.com/post/peak_fit_xrd_python/
> >
> > I found package EMpeaksR which results close to values estimated from
> > mixtools package.
> >
> > test <- spect_em_gmm(temp1$velik, temp1$proc, mu=c(170, 220),
> > mix_ratio=c(1,1), sigma=c(5,5), maxit=2000, conv.cri=1e-8)
> > print(cbind(test$mu, test$sigma, test$mix_ratio))
> >          [,1]      [,2]      [,3]
> > [1,] 170.7744  7.200109 0.5759867
> > [2,] 229.1815 10.831626 0.4240133
> >
> > But it is probably in stage of intensive development as it is limited
> > in data visualisation
> >
> > Any further hint is appreciated.
> >
> > Regards
> > Petr
> >
> > > -----Original Message-----
> > > From: Bert Gunter <bgunter.4567 at gmail.com>
> > > Sent: Wednesday, December 29, 2021 5:01 PM
> > > To: PIKAL Petr <petr.pikal at precheza.cz>
> > > Cc: r-help mailing list <r-help at r-project.org>
> > > Subject: Re: [R] mixture univariate distributions fit
> > >
> > > No.
> > >
> > > However, if the object returned is the "Value"
structure of whatever
> > > density function you use, it probably contains the original data.
> > > You need to check the docs to see. But this does not appear to be
your
> situation.
> > >
> > > Bert Gunter
> > >
> > > "The trouble with having an open mind is that people keep
coming
> > > along and sticking things into it."
> > > -- Opus (aka Berkeley Breathed in his "Bloom County"
comic strip )
> > >
> > > On Wed, Dec 29, 2021 at 3:05 AM PIKAL Petr <petr.pikal at
precheza.cz>
> > > wrote:
> > > >
> > > > Dear all
> > > >
> > > > I have data which are either density distribution estimate
or
> > > > cummulative density distribution estimate (temp1, temp2
below). I
> > > > would like to get values (mu, sd) for underlaying original
data
> > > > but they are
> > > not available.
> > > >
> > > > I found mixtools package which calculate what I need but it
> > > > requires original data (AFAIK). They could be generated from
e.g.
> > > > temp1 by
> > > >
> > > > set.seed(111)
> > > > x<- sample(temp1$velik, size=100000, replace=TRUE,
> > > > prob=temp1$proc)
> > > >
> > > > library(mixtools)
> > > > fit <- normalmixEM(x)
> > > > plot(fit, which=2)
> > > > summary(fit)
> > > > summary of normalmixEM object:
> > > >            comp 1     comp 2
> > > > lambda   0.576346   0.423654
> > > > mu     170.784520 229.192823
> > > > sigma    7.203491  10.793461
> > > > loglik at estimate:  -424062.7
> > > > >
> > > >
> > > > Is there any way how to get such values directly from
density or
> > > > cummulative density estimation without generating fake data
by
> sample?
> > > >
> > > > Best regards
> > > > Petr
> > > >
> > > >
> > > > temp1 <- structure(list(velik = c(155, 156.8, 157.9,
158.8, 159.6,
> > > > 160.4, 161.2, 161.9, 162.5, 163.1, 163.8, 164.3, 164.7,
165.3,
> > > > 165.8, 166.2, 166.7, 167.2, 167.7, 168.2, 168.7, 169.1,
169.6,
> > > > 170.1, 170.6, 171.1, 171.6, 172, 172.5, 173, 173.5, 174,
174.5,
> > > > 175.1, 175.7, 176.3, 177, 177.6, 178.3, 179.1, 179.9, 180.6,
> > > > 181.4, 182.4, 183.5, 184.7, 186.1, 187.9, 189.8, 192, 194.4,
197,
> > > > 200.1, 203.5, 206.7, 209.2, 211.3, 213.1, 214.8, 216.3,
217.4,
> > > > 218.5, 219.5, 220.4, 221.3, 222.1, 223, 223.7, 224.5, 225.2,
> > > > 225.9, 226.7, 227.5, 228.2, 228.9, 229.6, 230.4, 231.2,
231.9,
> > > > 232.6, 233.4, 234.2, 235, 235.9, 236.8, 237.7, 238.6, 239.7,
241,
> > > > 242.3, 243.6, 245.2, 247.1, 249.3, 251.9, 255.3, 260, 266,
274.9,
> > > > 323.4 ), proc = c(0.6171, 1.583, 1.371, 2.13, 1.828, 2.095,
1.994,
> > > > 2.694, 2.824, 2.41, 2.909, 3.768, 3.179, 3.029, 3.798,
3.743,
> > > > 3.276, 3.213, 3.579, 2.928, 4.634, 3.415, 3.473, 3.135,
3.476,
> > > > 3.759, 3.726, 3.9, 3.593, 2.89, 3.707, 4.08, 2.846, 2.685,
3.394,
> > > > 2.737, 2.693, 2.878, 2.248, 2.368, 2.258, 2.662, 1.866,
1.895,
> > > > 1.457, 1.513, 1.181, 1.008, 0.9641, 0.799, 0.7878, 0.7209,
0.5869,
> > > > 0.5778, 0.7313, 0.9531, 1.053, 1.317, 1.247, 1.739, 2.064,
1.99,
> > > > 2.522, 2.401, 2.48, 2.687, 2.797, 2.918, 3.243, 3.055,
3.009,
> > > > 2.89, 3.037, 3.25, 3.349, 3.141, 2.771, 2.985, 3.203, 3.298,
> > > > 3.215, 2.637, 2.683, 2.782, 2.632, 2.625, 2.475, 2.014,
1.781,
> > > > 1.987, 1.627, 1.374, 1.352, 0.9441, 1.01, 0.5737, 0.5265,
0.3794,
> > > > 0.2513, 0.0351)), row.names = 2:101, class =
"data.frame")
> > > >
> > > > temp2 <- structure(list(velik = c(153.8, 156.3, 157.3,
158.4,
> > > > 159.2, 160.1, 160.8, 161.6, 162.2, 162.8, 163.5, 164, 164.5,
165,
> > > > 165.5, 166, 166.4, 166.9, 167.5, 167.9, 168.5, 168.9, 169.4,
> > > > 169.8, 170.4, 170.9, 171.3, 171.8, 172.2, 172.7, 173.3,
173.8,
> > > > 174.2, 174.8, 175.5, 176, 176.6, 177.3, 177.9, 178.7, 179.5,
> > > > 180.3, 180.9, 181.9, 182.9, 184.1, 185.3, 186.9, 188.8,
190.8,
> > > > 193.2, 195.6, 198.4, 201.8, 205.3, 208.1, 210.3, 212.3,
213.9,
> > > > 215.7, 216.9, 218, 219.1, 219.9, 220.8, 221.7, 222.6, 223.4,
> > > > 224.1, 224.8, 225.6, 226.3, 227.1, 227.8, 228.5, 229.2, 230,
> > > > 230.8, 231.6, 232.3, 233, 233.7, 234.6, 235.5, 236.3, 237.2,
> > > > 238.1, 239.1, 240.3, 241.6, 242.9, 244.4, 246.1, 248, 250.6,
> > > > 253.1, 257.6, 262.5, 269.5, 280.4, 372.9), proc = c(0, 1, 2,
3, 4,
> > > > 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22,
> > > > 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
38,
> > > > 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
54,
> > > > 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70,
> > > > 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85,
86,
> > > > 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100)),
> > > > row.names = c(NA, 101L), class > > > >
"data.frame")
> > > > >
> > > >
> > > > >
> > > > ______________________________________________
> > > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide
> > > > http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible
code.

R help - Dec 2021 - mixture univariate distributions fit

[R] mixture univariate distributions fit

[R] mixture univariate distributions fit