Benjamin Otto
2007-Jan-24 16:34 UTC
[R] Fit model to data and use model for data generation
Hi,

Suppose I have a set of values x and I want to calculate the distribution of the data. Usually I would use the "density" command. Now, can I use the resulting "density-object" model to generate a number of new values which have the same distribution? Or do I have to use some different function?

Regards,

Benjamin

--
Benjamin Otto
Universitaetsklinikum Eppendorf Hamburg
Institut fuer Klinische Chemie
Martinistrasse 52
20246 Hamburg
Stephen D. Weigand
2007-Jan-25 05:03 UTC
[R] Fit model to data and use model for data generation
On Jan 24, 2007, at 10:34 AM, Benjamin Otto wrote:

> Hi,
>
> Suppose I have a set of values x and I want to calculate the
> distribution of the data. Usually I would use the "density" command.
> Now, can I use the resulting "density-object" model to generate a
> number of new values which have the same distribution? Or do I have
> to use some different function?
>
> Regards,
>
> Benjamin

You could sample from the x's in the density object with probability given by the y's:

    ### Create a bimodal distribution
    x <- c(rnorm(25, -2, 1), rnorm(50, 3, 2))
    d <- density(x, n = 1000)
    plot(d)

    ### Sample from the distribution and show the two
    ### distributions are the same
    x.new <- sample(d$x, size = 100000, # large n for proof of concept
                    replace = TRUE, prob = d$y/sum(d$y))
    dx.new <- density(x.new)
    lines(dx.new$x, dx.new$y, col = "blue")

Hope this helps,

Stephen
Rochester, Minnesota, USA
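A note on the grid-sampling idea above: the draws come from d$x, so they can only take values on the evaluation grid (1000 points in this example). The sketch below wraps the idea in a helper and checks that property. The name sample_from_grid and the seed are illustrative assumptions, not part of the original post.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical helper: draw from the evaluation grid of a density
# object, weighting each grid point by its estimated density.
sample_from_grid <- function(d, size) {
  sample(d$x, size = size, replace = TRUE, prob = d$y / sum(d$y))
}

x <- c(rnorm(25, -2, 1), rnorm(50, 3, 2))
d <- density(x, n = 1000)
x.new <- sample_from_grid(d, 100000)

# Every draw is one of the grid points, so the number of distinct
# values is bounded by length(d$x) however large the sample is.
all(x.new %in% d$x)
length(unique(x.new)) <= length(d$x)
```

Increasing n in density() makes the grid finer, but the simulated values stay discrete.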
Roberto Perdisci
2007-Jan-25 15:13 UTC
[R] Fit model to data and use model for data generation
On 1/25/07, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:

> That gives a discrete distribution, which may well matter for small
> samples.
>
> Since density() is returning an equal-weighted mixture of (by default)
> normal distributions, all you need to do is
>
> x.new <- rnorm(n, sample(x, size = n, replace=TRUE), bw)

Prof. Ripley,

I didn't understand why you used

    sample(x, size = n, replace=TRUE)

I thought the mixture should be computed using all the points in x as means, like in

    x.new <- rnorm(n, x, bw)

Could you explain why you propose

    x.new <- rnorm(n, sample(x, size = n, replace=TRUE), bw)

instead? Could you also briefly say in what sense kde is biased?

Thank you very much,
best regards,
Roberto

> where bw is the bandwidth used by density (d$bw in this example).
> (This is known as a 'smoothed bootstrap' in some circles.)
>
> > ### Create a bimodal distribution
> > x <- c(rnorm(25, -2, 1), rnorm(50, 3, 2))
> > d <- density(x, n = 1000)
> > plot(d)
> >
> > ### Sample from the distribution and show the two
> > ### distributions are the same
> > x.new <- sample(d$x, size = 100000, # large n for proof of concept
> >                 replace = TRUE, prob = d$y/sum(d$y))
> > dx.new <- density(x.new)
> > lines(dx.new$x, dx.new$y, col = "blue")
>
> BTW, lines(density(x.new), col = "blue") works here, and you do need to
> remember that a kde is biased. But my solution matches better than yours.
>
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
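The two calls in the question can be compared directly. The following is a minimal sketch, not Prof. Ripley's own answer; the seed, the sample sizes and the names x.first10 and v.expected are illustrative assumptions. It relies on two documented behaviours: rnorm() recycles its mean argument in order, and d$bw is the standard deviation of the smoothing kernel.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

x  <- c(rnorm(25, -2, 1), rnorm(50, 3, 2))
d  <- density(x)   # Gaussian kernel by default
bw <- d$bw         # standard deviation of each kernel
n  <- 100000

# Smoothed bootstrap: choose a data point at random as the component
# mean for each draw, then add Gaussian kernel noise.
x.new <- rnorm(n, mean = sample(x, size = n, replace = TRUE), sd = bw)

# rnorm(n, x, bw) recycles x in its stored order, so the component of
# each draw is fixed by its position. With n smaller than length(x),
# only the first n data points are ever used as means.
x.first10 <- rnorm(10, mean = x, sd = bw)  # means are x[1:10] only

# The kernel mixture has the empirical variance of x (divisor
# length(x)) plus bw^2, so simulated values are more spread out
# than the data themselves.
v.expected <- mean((x - mean(x))^2) + bw^2
c(simulated = var(x.new), expected = v.expected)
```

The extra bw^2 in the variance is one concrete sense in which the kernel estimate is biased: it describes a smoothed, slightly wider version of the distribution of the data.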