PIKAL Petr
2022-Feb-03 16:44 UTC
[R] generate distribution based on summary data and add random noise
Hallo Bert

probably not, sorry. Did you try my examples?

To make it maybe simpler:
1. sample a vector with given proportions and generate new data
2. add random noise to each generated value, with sd given by the value of a vector.

Let's say

x <- c(10, 100)
y <- c(.6, .4)
set.seed(200)
z <- sample(x, 10, replace = TRUE, prob = y)
ind <- order(z)
bins <- rle(z[ind])
bin1 <- rnorm(bins$lengths[1], mean = 0, sd = bins$values[1]/5)
bin2 <- rnorm(bins$lengths[2], mean = 0, sd = bins$values[2]/5)
z[ind] + c(bin1, bin2)

Sorry that I did not explain myself more clearly; I hoped the example showed what I have in mind.

Basically it is a particle size cumulative distribution, but size is expressed as size bins. Normally I have an exact size measurement for each particle.

Best Regards
RNDr. Petr PIKAL
Research Manager
PRECHEZA a.s.
nábř. Dr. Edvarda Beneše 1170/24 | 750 02 Přerov | Czech Republic
Tel: +420 581 252 256 | GSM: +420 724 008 364
petr.pikal at precheza.cz | https://www.precheza.cz/

Personal data: Information about the processing and protection of the personal data of business partners of PRECHEZA a.s. is published at https://www.precheza.cz/zasady-ochrany-osobnich-udaju/ (Czech) and https://www.precheza.cz/en/personal-data-protection-principles/ (English).
Confidentiality: This email and any documents attached to it may be confidential and are subject to the legally binding disclaimer: https://www.precheza.cz/01-dovetek/ (Czech) | https://www.precheza.cz/en/01-disclaimer/ (English)

From: Bert Gunter <bgunter.4567 at gmail.com>
Sent: Thursday, February 3, 2022 5:10 PM
To: PIKAL Petr <petr.pikal at precheza.cz>
Cc: R-help <r-help at r-project.org>
Subject: Re: [R] generate distribution based on summary data and add random noise

If I understand correctly:
To generate a sample of total size N, generate a uniform sample of size p*N for a bin with proportion p?
?runif

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Thu, Feb 3, 2022 at 7:52 AM PIKAL Petr <petr.pikal at precheza.cz> wrote:
Hallo all

I have summary data with size bins and the percentage below each size.

dat <- structure(list(size = c(10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L,
  90L, 100L, 110L, 120L, 130L, 140L, 150L, 160L, 170L, 180L, 190L,
  200L, 250L, 300L, 400L, 500L), percent = c(0L, 0L, 0L, 1L, 1L,
  2L, 4L, 8L, 13L, 18L, 24L, 31L, 38L, 44L, 50L, 57L, 65L, 72L,
  76L, 83L, 95L, 98L, 100L, 100L)), class = "data.frame",
  row.names = c(NA, -24L))

# I want to generate the original distribution (I know it is better not to do
# it, but I have no other choice), so I calculated the mids of those bins

xd <- dat$size - c(5, diff(dat$size)/2)
xd <- xd[-1]

# I can sample the size bins with probability given by percent.
Result <- sample(xd, 1000, replace = TRUE, prob = diff(dat$percent)/100)
plot(ecdf(Result))

# and I can add some noise to it, which is satisfactory for lower size bins
# but not enough for higher size bins.
Result <- sample(xd, 1000, replace = TRUE, prob = diff(dat$percent)/100) +
  rnorm(1000, mean = 0, sd = 5)
plot(ecdf(Result))

I can increase sd to suit the bigger bin sizes, but in that case the noise is too big for the lower bin sizes.
I would like to add smaller random noise to the lower size bins and bigger random noise to the higher size bins, which seems to be an easy task, but I am stuck on how to do it. It should be somehow proportional to the size value. The only way forward I see is to sort the generated result and use something like

+ rnorm(1000, mean=xd, sd=xd/10)

But it is not correct.

I'd appreciate any hint on how to add random noise to values in an ordered manner.

Best regards.
Petr

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
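A minimal sketch of one way to get noise that grows with particle size, assuming (as a tuning choice, not something fixed in the thread) a standard deviation of 10% of each sampled bin midpoint. It relies on rnorm() recycling its mean and sd arguments, so every draw can carry its own sd and no sorting or rle() is needed; by the same reasoning the two-bin example reduces to z + rnorm(length(z), mean = 0, sd = z/5). The seed and the names centre and result are illustrative only.

```r
# Binned data from the thread: upper size limit and cumulative percent below it
dat <- data.frame(
  size = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140,
           150, 160, 170, 180, 190, 200, 250, 300, 400, 500),
  percent = c(0, 0, 0, 1, 1, 2, 4, 8, 13, 18, 24, 31, 38, 44, 50, 57,
              65, 72, 76, 83, 95, 98, 100, 100)
)

# Bin midpoints; these are the same values as xd in the original post
xd <- (head(dat$size, -1) + tail(dat$size, -1)) / 2
p <- diff(dat$percent) / 100      # probability of each bin

set.seed(1)                       # arbitrary seed, for reproducibility only
n <- 1000
centre <- sample(xd, n, replace = TRUE, prob = p)

# rnorm() recycles mean and sd, so each value gets its own spread.
# ASSUMPTION: sd = 10% of the bin midpoint; adjust the factor to taste.
result <- centre + rnorm(n, mean = 0, sd = centre / 10)

plot(ecdf(result))
```

With a normal error, values can spill into neighbouring bins, so the ECDF of result will only approximately reproduce the tabulated percentages at the bin limits.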
Bert Gunter
2022-Feb-03 17:34 UTC
[R] generate distribution based on summary data and add random noise
Nope. I think I provided what you asked for: random data in each bin, with the amount of data proportional to the bin percentage and the distribution of that data uniform (not normal) within the bin. So maybe someone else can give you what you want if this ain't it.

Cheers,
Bert
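Bert's suggestion (p*N values per bin, uniform within the bin) can be sketched as follows. The bin limits come from dat in the original post; N = 1000 and set.seed(1) are arbitrary choices, and the variable names are illustrative. Like rnorm(), runif() recycles its min and max arguments, so each value is drawn over the width of its own bin.

```r
# Binned data from the thread: upper size limit and cumulative percent below it
dat <- data.frame(
  size = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140,
           150, 160, 170, 180, 190, 200, 250, 300, 400, 500),
  percent = c(0, 0, 0, 1, 1, 2, 4, 8, 13, 18, 24, 31, 38, 44, 50, 57,
              65, 72, 76, 83, 95, 98, 100, 100)
)

lower <- head(dat$size, -1)       # lower limit of each bin
upper <- tail(dat$size, -1)       # upper limit of each bin
p <- diff(dat$percent) / 100      # fraction of particles in each bin

set.seed(1)                       # arbitrary seed, for reproducibility only
n <- 1000                         # total sample size N
counts <- round(p * n)            # p*N values per bin
bin <- rep(seq_along(p), counts)  # bin index for every simulated particle

# runif() recycles min and max, so each value is spread uniformly
# over the width of its own bin
result <- runif(length(bin), min = lower[bin], max = upper[bin])

plot(ecdf(result))
```

Because each bin receives exactly round(p*N) values and runif() does not return its endpoints for bins this wide, ecdf(result) passes through the tabulated percentages at every bin limit; between limits the cumulative curve is a straight line, which is the consequence of assuming a flat distribution inside each bin.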