Bert Gunter
2022-Feb-03 16:09 UTC
[R] generate distribution based on summary data and add random noise
If I understand correctly: to generate a sample of total size N, generate a uniform sample of size p*N for each bin with proportion p? See ?runif.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Thu, Feb 3, 2022 at 7:52 AM PIKAL Petr <petr.pikal at precheza.cz> wrote:

> Hello all
>
> I have summary data with size bins and the percentage below each size.
>
> dat <- structure(list(size = c(10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L,
> 90L, 100L, 110L, 120L, 130L, 140L, 150L, 160L, 170L, 180L, 190L,
> 200L, 250L, 300L, 400L, 500L), percent = c(0L, 0L, 0L, 1L, 1L,
> 2L, 4L, 8L, 13L, 18L, 24L, 31L, 38L, 44L, 50L, 57L, 65L, 72L,
> 76L, 83L, 95L, 98L, 100L, 100L)), class = "data.frame", row.names = c(NA,
> -24L))
>
> # I want to regenerate the original distribution (I know it is better not
> # to do it, but I have no other choice), so I calculated the mids of those bins.
>
> xd <- dat$size - c(5, diff(dat$size)/2)
> xd <- xd[-1]
>
> # I can sample the size bins with probability given by percent.
> Result <- sample(xd, 1000, replace = TRUE, prob = diff(dat$percent)/100)
> plot(ecdf(Result))
>
> # And I can add some noise to it, which is satisfactory for the lower size
> # bins but not enough for the higher size bins.
>
> Result <- sample(xd, 1000, replace = TRUE, prob = diff(dat$percent)/100) +
>   rnorm(1000, mean = 0, sd = 5)
> plot(ecdf(Result))
>
> I can increase sd to suit the bigger bin sizes, but in that case the noise
> is too big for the lower bin sizes.
>
> I would like to add smaller random noise to the lower size bins and bigger
> random noise to the higher size bins, which seems to be an easy task, but I
> am stuck on how to do it. It should be somehow proportional to the size value.
> The only way forward I see is to sort the generated result and to use
> something like
>
> + rnorm(1000, mean=xd, sd=xd/10)
>
> But it is not correct.
>
> I'd appreciate any hint on how to add random noise to values in an ordered
> manner.
>
> Best regards.
> Petr
>
> Osobní údaje: Informace o zpracování a ochraně osobních údajů obchodních
> partnerů PRECHEZA a.s. jsou zveřejněny na:
> https://www.precheza.cz/zasady-ochrany-osobnich-udaju/ | Information
> about processing and protection of business partner's personal data are
> available on website:
> https://www.precheza.cz/en/personal-data-protection-principles/
> Důvěrnost: Tento e-mail a jakékoliv k němu připojené dokumenty jsou
> důvěrné a podléhají tomuto právně závaznému prohlášení o vyloučení
> odpovědnosti: https://www.precheza.cz/01-dovetek/ | This email and any
> documents attached to it may be confidential and are subject to the legally
> binding disclaimer: https://www.precheza.cz/en/01-disclaimer/
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
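
Bert's suggestion of drawing uniform values within each bin could look roughly like the sketch below. It is only one reading of that one-line hint, not code posted in the thread. The names lower, upper, p, n_bin and result are made up for illustration, the seed is arbitrary, and rounding p*N to whole counts is an assumption.

```r
# Cumulative summary data from the original post
size <- c(seq(10, 200, by = 10), 250, 300, 400, 500)
percent <- c(0, 0, 0, 1, 1, 2, 4, 8, 13, 18, 24, 31, 38, 44, 50,
             57, 65, 72, 76, 83, 95, 98, 100, 100)

lower <- head(size, -1)        # lower edge of each bin
upper <- tail(size, -1)        # upper edge of each bin
p     <- diff(percent) / 100   # proportion of particles falling in each bin

N <- 1000
set.seed(1)                    # arbitrary seed, only for reproducibility
n_bin <- round(p * N)          # assumption: rounding p*N to integers is acceptable

# runif() is vectorized over min and max, so each draw uses its own bin edges
result <- runif(sum(n_bin),
                min = rep(lower, n_bin),
                max = rep(upper, n_bin))
plot(ecdf(result))
```

Because the spread of each draw equals the bin width, wider bins automatically get more scatter than narrow ones, with no separate noise term. Whether a flat distribution inside each bin is realistic for particle sizes is a separate question that this sketch does not settle.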
PIKAL Petr
2022-Feb-03 16:44 UTC
[R] generate distribution based on summary data and add random noise
Hello Bert,

probably not, sorry. Did you try my examples?

To make it maybe simpler:

1. Sample a vector with given proportions to generate new data.
2. Add random noise to each generated value, with sd proportional to that value.

Let's say

x <- c(10, 100)
y <- c(.6, .4)
set.seed(200)
z <- sample(x, 10, replace = TRUE, prob = y)
# sort the sample so that equal values form consecutive runs
ind <- order(z)
bins <- rle(z[ind])
# one noise vector per bin, with sd equal to one fifth of the bin value
bin1 <- rnorm(bins$lengths[1], mean = 0, sd = bins$values[1]/5)
bin2 <- rnorm(bins$lengths[2], mean = 0, sd = bins$values[2]/5)
z[ind] + c(bin1, bin2)

Sorry that I did not explain myself more clearly; I hoped that the example showed what I have in mind. Basically it is a cumulative particle size distribution, but the size is expressed as size bins. Normally I have an exact size measurement for each particle.

S pozdravem | Best Regards
RNDr. Petr PIKAL
Vedoucí Výzkumu a vývoje | Research Manager
PRECHEZA a.s.
nábř. Dr. Edvarda Beneše 1170/24 | 750 02 Přerov | Czech Republic
Tel: +420 581 252 256 | GSM: +420 724 008 364
mailto:petr.pikal at precheza.cz | https://www.precheza.cz/

From: Bert Gunter <bgunter.4567 at gmail.com>
Sent: Thursday, February 3, 2022 5:10 PM
To: PIKAL Petr <petr.pikal at precheza.cz>
Cc: R-help <r-help at r-project.org>
Subject: Re: [R] generate distribution based on summary data and add random noise

> If I understand correctly: To generate a sample of total size N, generate a
> uniform sample of size p*N for a bin with proportion p?
> ?runif
>
> Bert Gunter
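
Petr's two-step scheme (sample the bin values, then add noise whose sd scales with each value) does not seem to need sorting or per-bin handling, because rnorm() accepts a vector for sd and uses it element by element. The sketch below is an illustration under that reading, not a reply from the thread. The names noise, z_noisy, size and percent are introduced here, and the scale factors 1/5 and 1/10 are simply the ones that appear in the posts, not recommended values.

```r
# Toy example from the post: two bin values with proportions 0.6 and 0.4
x <- c(10, 100)
y <- c(0.6, 0.4)
set.seed(200)
z <- sample(x, 10, replace = TRUE, prob = y)

# rnorm() recycles 'sd' element-wise, so draw i gets sd = z[i] / 5.
# No sorting or run-length bookkeeping is required.
noise <- rnorm(length(z), mean = 0, sd = z / 5)
z_noisy <- z + noise

# Same idea for the binned data; xd holds the bin midpoints (15, 25, ..., 450)
size <- c(seq(10, 200, by = 10), 250, 300, 400, 500)
percent <- c(0, 0, 0, 1, 1, 2, 4, 8, 13, 18, 24, 31, 38, 44, 50,
             57, 65, 72, 76, 83, 95, 98, 100, 100)
xd <- size[-1] - diff(size) / 2
Result <- sample(xd, 1000, replace = TRUE, prob = diff(percent) / 100)
# assumption: sd = value / 10, the factor used in the original post
Result <- Result + rnorm(length(Result), mean = 0, sd = Result / 10)
plot(ecdf(Result))
```

The random numbers differ from those produced by the rle() version because the draws are made in a different order, but each value still receives noise with sd tied to its own bin value.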