Hello,

I have some data, and I want to generate random numbers following the distribution of this data (in other words, to generate a synthetic data set sharing the same statistics as a given data set). Reading an old thread, I found the following text:

> If you can compute the quantile function of the distribution (i.e., the
> inverse of the integral of the pdf), then you can use the probability
> integral transform: If U is a U(0,1) random variable and Q is the quantile
> function of the distribution F, then Q(U) is a random variable distributed
> as F.

That sounds good, but is there a quick way to do this in R? Let's say my data is contained in "ee". I can get the quantiles using:

  qq = quantile(ee, probs=seq(0,1,0.25))
             0%           25%           50%           75%          100%
  -0.2573385519 -0.0041451053  0.0004538924  0.0049276991  0.1037823292

Then I "know" how to use the above method to generate Q(U) (by looking up U in the first row and then mapping it to a number using the second row), but is there an R function that does that? Otherwise I need to write my own function to look up the table.

Thanks in advance,
Ivan
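The table lookup described above can be sketched in base R with approx(), which linearly interpolates U between the probability row and the quantile row. This is a minimal sketch, assuming a simulated stand-in for "ee" (the real data are not in the post); the names p, u and sim are illustrative only.

```r
# Hypothetical stand-in for the data vector 'ee' (not the real data).
set.seed(42)
ee <- rnorm(1000, mean = 0, sd = 0.01)

# The five-point table: probabilities in one row, sample quantiles in the other.
# Note seq(0, 1, 0.25); the bare (0, 1, 0.25) is a syntax error in R.
p  <- seq(0, 1, 0.25)
qq <- quantile(ee, probs = p, names = FALSE)

# Inverse-transform sampling: draw U ~ U(0,1), then "look up" Q(U) by
# linear interpolation between neighbouring table entries.
u   <- runif(5000)
sim <- approx(p, qq, xout = u)$y

summary(sim)
```

With only five table points, Q is piecewise linear between the quartiles, so within each quarter the draws are spread uniformly; a finer grid in probs gives a closer match to the data.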
ivan popivanov wrote:
> [...] is there an R function that does that? Otherwise I need to write
> my own to lookup the table.

Q <- approxfun(x, sort(ee)), with x <- (0:(n-1))/(n-1) and n <- length(ee), is your friend, I think. Beware the details of the interpolation, though: in some variants you end up reinventing the bootstrap. Also, the fact that your generated variables tend to be constrained to the range of ee should at least be noted.

--
Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918, FAX: (+45) 35327907
(p.dalgaard at biostat.ku.dk)
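Spelled out as a self-contained script, the approxfun() suggestion looks roughly like the sketch below. It assumes simulated stand-in data for ee; the names u, sim and boot are illustrative. It also checks that this interpolation matches quantile()'s default (type = 7), and shows the step-function variant that amounts to the bootstrap.

```r
# Hypothetical stand-in for the data vector 'ee'.
set.seed(1)
ee <- rnorm(200, sd = 0.01)

n <- length(ee)
x <- (0:(n - 1)) / (n - 1)        # probabilities 0, 1/(n-1), ..., 1
Q <- approxfun(x, sort(ee))       # piecewise-linear empirical quantile function

u   <- runif(10000)
sim <- Q(u)                       # Q(U): draws approximately distributed like ee

# Same interpolation as quantile()'s default type = 7, so
# quantile(ee, u) gives the same draws without building Q.
all.equal(sim, quantile(ee, u, names = FALSE))   # should be TRUE

# Every draw lies inside range(ee), as noted above.
range(sim)

# A step-function variant only returns observed values: each observation is
# picked with probability 1/n, which is the bootstrap, sample(ee, replace = TRUE).
boot <- sort(ee)[ceiling(n * u)]
```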
Look at the logspline package for an alternative.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of ivan popivanov
> Sent: Saturday, December 12, 2009 7:38 PM
> To: r-help at r-project.org
> Subject: [R] A random number from any distribution??
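A rough sketch of the logspline route: fit a smooth log-spline density to the data, then draw from it. This assumes the CRAN package logspline is installed (install.packages("logspline")) and uses simulated stand-in data for ee. The script skips itself if the package is missing.

```r
# Hypothetical stand-in for the data vector 'ee'.
set.seed(7)
ee <- rnorm(500, sd = 0.01)

# Assumes the CRAN package 'logspline' is available; skipped otherwise.
if (requireNamespace("logspline", quietly = TRUE)) {
  fit <- logspline::logspline(ee)           # smooth log-spline density estimate
  sim <- logspline::rlogspline(1000, fit)   # random draws from the fitted density
  print(summary(sim))
} else {
  message("Package 'logspline' is not installed; skipping the example.")
}
```

Unlike the interpolated empirical quantile function, the fitted density is not limited to range(ee) unless bounds are given (see the lbound and ubound arguments in ?logspline). Draws can therefore fall outside the observed range.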