thr3ads.net - R help - [R] mixture univariate distributions fit [Dec 2021]

If this information is useful, please help other people find it:
Share via:

PIKAL Petr

2021-Dec-30 08:36 UTC

[R] mixture univariate distributions fit

Thank you Bert

The values are results from particle size measurement by sedimentation and 
they are really available only as these cumulative or density distributions. 
What I thought about was that there is some package which could fit data of 
such curves and deliver parameters of fitted curves.

Something like
https://chrisostrouchov.com/post/peak_fit_xrd_python/

I found package EMpeaksR which results close to values estimated from mixtools 
package.

test <- spect_em_gmm(temp1$velik, temp1$proc, mu=c(170, 220), 
mix_ratio=c(1,1), sigma=c(5,5), maxit=2000, conv.cri=1e-8)
print(cbind(test$mu, test$sigma, test$mix_ratio))
         [,1]      [,2]      [,3]
[1,] 170.7744  7.200109 0.5759867
[2,] 229.1815 10.831626 0.4240133

But it is probably in stage of intensive development as it is limited in data 
visualisation

Any further hint is appreciated.

Regards
Petr
> -----Original Message-----
> From: Bert Gunter <bgunter.4567 at gmail.com>
> Sent: Wednesday, December 29, 2021 5:01 PM
> To: PIKAL Petr <petr.pikal at precheza.cz>
> Cc: r-help mailing list <r-help at r-project.org>
> Subject: Re: [R] mixture univariate distributions fit
>
> No.
>
> However, if the object returned is the "Value" structure of
whatever density
> function you use, it probably contains the original data. You need to check
> the docs to see. But this does not appear to be your situation.
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip
)
>
> On Wed, Dec 29, 2021 at 3:05 AM PIKAL Petr <petr.pikal at
precheza.cz>
> wrote:
> >
> > Dear all
> >
> > I have data which are either density distribution estimate or
> > cummulative density distribution estimate (temp1, temp2 below). I
> > would like to get values (mu, sd) for underlaying original data but
they
> > are
> not available.
> >
> > I found mixtools package which calculate what I need but it requires
> > original data (AFAIK). They could be generated from e.g. temp1 by
> >
> > set.seed(111)
> > x<- sample(temp1$velik, size=100000, replace=TRUE, prob=temp1$proc)
> >
> > library(mixtools)
> > fit <- normalmixEM(x)
> > plot(fit, which=2)
> > summary(fit)
> > summary of normalmixEM object:
> >            comp 1     comp 2
> > lambda   0.576346   0.423654
> > mu     170.784520 229.192823
> > sigma    7.203491  10.793461
> > loglik at estimate:  -424062.7
> > >
> >
> > Is there any way how to get such values directly from density or
> > cummulative density estimation without generating fake data by sample?
> >
> > Best regards
> > Petr
> >
> >
> > temp1 <- structure(list(velik = c(155, 156.8, 157.9, 158.8, 159.6,
> > 160.4, 161.2, 161.9, 162.5, 163.1, 163.8, 164.3, 164.7, 165.3, 165.8,
> > 166.2, 166.7, 167.2, 167.7, 168.2, 168.7, 169.1, 169.6, 170.1, 170.6,
> > 171.1, 171.6, 172, 172.5, 173, 173.5, 174, 174.5, 175.1, 175.7, 176.3,
> > 177, 177.6, 178.3, 179.1, 179.9, 180.6, 181.4, 182.4, 183.5, 184.7,
> > 186.1, 187.9, 189.8, 192, 194.4, 197, 200.1, 203.5, 206.7, 209.2,
> > 211.3, 213.1, 214.8, 216.3, 217.4, 218.5, 219.5, 220.4, 221.3, 222.1,
> > 223, 223.7, 224.5, 225.2, 225.9, 226.7, 227.5, 228.2, 228.9, 229.6,
> > 230.4, 231.2, 231.9, 232.6, 233.4, 234.2, 235, 235.9, 236.8, 237.7,
> > 238.6, 239.7, 241, 242.3, 243.6, 245.2, 247.1, 249.3, 251.9, 255.3,
> > 260, 266, 274.9, 323.4 ), proc = c(0.6171, 1.583, 1.371, 2.13, 1.828,
> > 2.095, 1.994, 2.694, 2.824, 2.41, 2.909, 3.768, 3.179, 3.029, 3.798,
> > 3.743, 3.276, 3.213, 3.579, 2.928, 4.634, 3.415, 3.473, 3.135, 3.476,
> > 3.759, 3.726, 3.9, 3.593, 2.89, 3.707, 4.08, 2.846, 2.685, 3.394,
> > 2.737, 2.693, 2.878, 2.248, 2.368, 2.258, 2.662, 1.866, 1.895, 1.457,
> > 1.513, 1.181, 1.008, 0.9641, 0.799, 0.7878, 0.7209, 0.5869, 0.5778,
> > 0.7313, 0.9531, 1.053, 1.317, 1.247, 1.739, 2.064, 1.99, 2.522, 2.401,
> > 2.48, 2.687, 2.797, 2.918, 3.243, 3.055, 3.009, 2.89, 3.037, 3.25,
> > 3.349, 3.141, 2.771, 2.985, 3.203, 3.298, 3.215, 2.637, 2.683, 2.782,
> > 2.632, 2.625, 2.475, 2.014, 1.781, 1.987, 1.627, 1.374, 1.352, 0.9441,
> > 1.01, 0.5737, 0.5265, 0.3794, 0.2513, 0.0351)), row.names = 2:101,
> > class = "data.frame")
> >
> > temp2 <- structure(list(velik = c(153.8, 156.3, 157.3, 158.4,
159.2,
> > 160.1, 160.8, 161.6, 162.2, 162.8, 163.5, 164, 164.5, 165, 165.5, 166,
> > 166.4, 166.9, 167.5, 167.9, 168.5, 168.9, 169.4, 169.8, 170.4, 170.9,
> > 171.3, 171.8, 172.2, 172.7, 173.3, 173.8, 174.2, 174.8, 175.5, 176,
> > 176.6, 177.3, 177.9, 178.7, 179.5, 180.3, 180.9, 181.9, 182.9, 184.1,
> > 185.3, 186.9, 188.8, 190.8, 193.2, 195.6, 198.4, 201.8, 205.3, 208.1,
> > 210.3, 212.3, 213.9, 215.7, 216.9, 218, 219.1, 219.9, 220.8, 221.7,
> > 222.6, 223.4, 224.1, 224.8, 225.6, 226.3, 227.1, 227.8, 228.5, 229.2,
> > 230, 230.8, 231.6, 232.3, 233, 233.7, 234.6, 235.5, 236.3, 237.2,
> > 238.1, 239.1, 240.3, 241.6, 242.9, 244.4, 246.1, 248, 250.6, 253.1,
> > 257.6, 262.5, 269.5, 280.4, 372.9), proc = c(0, 1, 2, 3, 4, 5, 6, 7,
> > 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
> > 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
> > 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
> > 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
> > 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93,
> > 94, 95, 96, 97, 98, 99, 100)), row.names = c(NA, 101L), class >
> "data.frame")
> > >
> >
> > >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

Bert Gunter

2021-Dec-30 16:09 UTC

head link

[R] mixture univariate distributions fit

Petr:

1. I now am somewhat confused by your goals. Any curve fitting
procedure can fit curves to data, including discrete points on a
smooth curve. What cannot be done is to recover the exact parameters
of the smooth curve from which your data derive, at least not without
knowing how the curve was fitted to that underlying (particle size)
data, i.e. the nature of the parameterization (if even there was one
-- it may just be some sort of empirical smoother like kernel density
estimation or some such).

2. mixtools uses EM, so the similarity you noted is probably not surprising.

3. Given my evident confusion, I hope that this email may prompt
someone more knowledgeable than I to correct or clarify my "advice."
You may wish to further clarify what you wish to do with the
parameters that you hope to get to encourage such comments. For
example, do you wish to compare the densities you get via their
parameterizations? If so, others may be able to offer better
strategies to do this, perhaps on SO --
https://stats.stackexchange.com/  -- rather than here.

Cheers,
Bert Gunter

On Thu, Dec 30, 2021 at 12:36 AM PIKAL Petr <petr.pikal at precheza.cz>
wrote:>
> Thank you Bert
>
> The values are results from particle size measurement by sedimentation and
> they are really available only as these cumulative or density
distributions.
> What I thought about was that there is some package which could fit data of
> such curves and deliver parameters of fitted curves.
>
> Something like
> https://chrisostrouchov.com/post/peak_fit_xrd_python/
>
> I found package EMpeaksR which results close to values estimated from
mixtools
> package.
>
> test <- spect_em_gmm(temp1$velik, temp1$proc, mu=c(170, 220),
> mix_ratio=c(1,1), sigma=c(5,5), maxit=2000, conv.cri=1e-8)
> print(cbind(test$mu, test$sigma, test$mix_ratio))
>          [,1]      [,2]      [,3]
> [1,] 170.7744  7.200109 0.5759867
> [2,] 229.1815 10.831626 0.4240133
>
> But it is probably in stage of intensive development as it is limited in
data
> visualisation
>
> Any further hint is appreciated.
>
> Regards
> Petr
>
> > -----Original Message-----
> > From: Bert Gunter <bgunter.4567 at gmail.com>
> > Sent: Wednesday, December 29, 2021 5:01 PM
> > To: PIKAL Petr <petr.pikal at precheza.cz>
> > Cc: r-help mailing list <r-help at r-project.org>
> > Subject: Re: [R] mixture univariate distributions fit
> >
> > No.
> >
> > However, if the object returned is the "Value" structure of
whatever density
> > function you use, it probably contains the original data. You need to
check
> > the docs to see. But this does not appear to be your situation.
> >
> > Bert Gunter
> >
> > "The trouble with having an open mind is that people keep coming
along and
> > sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic
strip )
> >
> > On Wed, Dec 29, 2021 at 3:05 AM PIKAL Petr <petr.pikal at
precheza.cz>
> > wrote:
> > >
> > > Dear all
> > >
> > > I have data which are either density distribution estimate or
> > > cummulative density distribution estimate (temp1, temp2 below). I
> > > would like to get values (mu, sd) for underlaying original data
but they
> > > are
> > not available.
> > >
> > > I found mixtools package which calculate what I need but it
requires
> > > original data (AFAIK). They could be generated from e.g. temp1 by
> > >
> > > set.seed(111)
> > > x<- sample(temp1$velik, size=100000, replace=TRUE,
prob=temp1$proc)
> > >
> > > library(mixtools)
> > > fit <- normalmixEM(x)
> > > plot(fit, which=2)
> > > summary(fit)
> > > summary of normalmixEM object:
> > >            comp 1     comp 2
> > > lambda   0.576346   0.423654
> > > mu     170.784520 229.192823
> > > sigma    7.203491  10.793461
> > > loglik at estimate:  -424062.7
> > > >
> > >
> > > Is there any way how to get such values directly from density or
> > > cummulative density estimation without generating fake data by
sample?
> > >
> > > Best regards
> > > Petr
> > >
> > >
> > > temp1 <- structure(list(velik = c(155, 156.8, 157.9, 158.8,
159.6,
> > > 160.4, 161.2, 161.9, 162.5, 163.1, 163.8, 164.3, 164.7, 165.3,
165.8,
> > > 166.2, 166.7, 167.2, 167.7, 168.2, 168.7, 169.1, 169.6, 170.1,
170.6,
> > > 171.1, 171.6, 172, 172.5, 173, 173.5, 174, 174.5, 175.1, 175.7,
176.3,
> > > 177, 177.6, 178.3, 179.1, 179.9, 180.6, 181.4, 182.4, 183.5,
184.7,
> > > 186.1, 187.9, 189.8, 192, 194.4, 197, 200.1, 203.5, 206.7, 209.2,
> > > 211.3, 213.1, 214.8, 216.3, 217.4, 218.5, 219.5, 220.4, 221.3,
222.1,
> > > 223, 223.7, 224.5, 225.2, 225.9, 226.7, 227.5, 228.2, 228.9,
229.6,
> > > 230.4, 231.2, 231.9, 232.6, 233.4, 234.2, 235, 235.9, 236.8,
237.7,
> > > 238.6, 239.7, 241, 242.3, 243.6, 245.2, 247.1, 249.3, 251.9,
255.3,
> > > 260, 266, 274.9, 323.4 ), proc = c(0.6171, 1.583, 1.371, 2.13,
1.828,
> > > 2.095, 1.994, 2.694, 2.824, 2.41, 2.909, 3.768, 3.179, 3.029,
3.798,
> > > 3.743, 3.276, 3.213, 3.579, 2.928, 4.634, 3.415, 3.473, 3.135,
3.476,
> > > 3.759, 3.726, 3.9, 3.593, 2.89, 3.707, 4.08, 2.846, 2.685, 3.394,
> > > 2.737, 2.693, 2.878, 2.248, 2.368, 2.258, 2.662, 1.866, 1.895,
1.457,
> > > 1.513, 1.181, 1.008, 0.9641, 0.799, 0.7878, 0.7209, 0.5869,
0.5778,
> > > 0.7313, 0.9531, 1.053, 1.317, 1.247, 1.739, 2.064, 1.99, 2.522,
2.401,
> > > 2.48, 2.687, 2.797, 2.918, 3.243, 3.055, 3.009, 2.89, 3.037,
3.25,
> > > 3.349, 3.141, 2.771, 2.985, 3.203, 3.298, 3.215, 2.637, 2.683,
2.782,
> > > 2.632, 2.625, 2.475, 2.014, 1.781, 1.987, 1.627, 1.374, 1.352,
0.9441,
> > > 1.01, 0.5737, 0.5265, 0.3794, 0.2513, 0.0351)), row.names =
2:101,
> > > class = "data.frame")
> > >
> > > temp2 <- structure(list(velik = c(153.8, 156.3, 157.3, 158.4,
159.2,
> > > 160.1, 160.8, 161.6, 162.2, 162.8, 163.5, 164, 164.5, 165, 165.5,
166,
> > > 166.4, 166.9, 167.5, 167.9, 168.5, 168.9, 169.4, 169.8, 170.4,
170.9,
> > > 171.3, 171.8, 172.2, 172.7, 173.3, 173.8, 174.2, 174.8, 175.5,
176,
> > > 176.6, 177.3, 177.9, 178.7, 179.5, 180.3, 180.9, 181.9, 182.9,
184.1,
> > > 185.3, 186.9, 188.8, 190.8, 193.2, 195.6, 198.4, 201.8, 205.3,
208.1,
> > > 210.3, 212.3, 213.9, 215.7, 216.9, 218, 219.1, 219.9, 220.8,
221.7,
> > > 222.6, 223.4, 224.1, 224.8, 225.6, 226.3, 227.1, 227.8, 228.5,
229.2,
> > > 230, 230.8, 231.6, 232.3, 233, 233.7, 234.6, 235.5, 236.3, 237.2,
> > > 238.1, 239.1, 240.3, 241.6, 242.9, 244.4, 246.1, 248, 250.6,
253.1,
> > > 257.6, 262.5, 269.5, 280.4, 372.9), proc = c(0, 1, 2, 3, 4, 5, 6,
7,
> > > 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
25,
> > > 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41,
42,
> > > 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58,
59,
> > > 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75,
76,
> > > 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92,
93,
> > > 94, 95, 96, 97, 98, 99, 100)), row.names = c(NA, 101L), class
> > > "data.frame")
> > > >
> > >
> > > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible
code.

R help - Dec 2021 - mixture univariate distributions fit

[R] mixture univariate distributions fit

[R] mixture univariate distributions fit