Marshall Feldman
2010-Jul-27 14:35 UTC
[R] how to generate random data from an empirical distribution
On 7/27/2010 6:00 AM, r-help-request at r-project.org wrote:
> Date: Mon, 26 Jul 2010 11:36:29 -0700 (PDT)
> From: xin wei <xinwei at stat.psu.edu>
> To: r-help at r-project.org
> Subject: [R] how to generate random data from an empirical distribution
>
> hi, this is more a statistical question than an R question, but I do want to
> know how to implement this in R.
> I have 10,000 data points. Is there any way to generate an empirical
> probability distribution from them? (The problem is that I do not know which
> distribution they follow: normal, beta?) My ultimate goal is to
> generate an additional 20,000 data points from the empirical distribution
> created from the existing 10,000 data points.
> thank you all in advance.

Ah! This brings back memories of the halcyon days of my youth when, as a junior in college, I took a course in introductory probability theory around this time during the summer in preparation for working as a co-op student the coming fall.

Conceptually, why not treat your empirical sample as an "urn" with 10,000 items? Then draw a sample of 20,000 from it with equal probabilities and with replacement (otherwise you'll run out of cases before reaching 20,000).

Remember that all the common distributions (normal, etc.) were derived because they fit certain common situations (e.g., the binomial), are of particular use (e.g., Student's t), can be derived from other distributions (e.g., the normal and the Central Limit Theorem), or some combination of these.
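The "urn" idea above is just sampling with replacement from the observed values. In R that is essentially `sample(x, 20000, replace = TRUE)`. The minimal Python sketch below shows the same idea; the function name `resample_empirical` and the simulated gamma data are hypothetical stand-ins, since the poster's actual 10,000 values are not available.

```python
import random
import statistics


def resample_empirical(data, n, seed=None):
    """Draw n values from the empirical distribution of `data`.

    Every observed value is one equally likely ball in the "urn";
    drawing with replacement lets n exceed len(data).
    """
    if not data:
        raise ValueError("data must be non-empty")
    rng = random.Random(seed)
    return rng.choices(data, k=n)  # uniform weights, with replacement


# Hypothetical stand-in for the poster's 10,000 observations (the real
# distribution is unknown; a gamma is used here only to have some data).
data_rng = random.Random(1)
observed = [data_rng.gammavariate(2.0, 3.0) for _ in range(10_000)]

new_draws = resample_empirical(observed, 20_000, seed=42)
print(len(new_draws))                   # 20000
print(set(new_draws) <= set(observed))  # True: only observed values reappear
print(statistics.mean(observed), statistics.mean(new_draws))  # should be close
```

Because every draw is one of the observed values, the new points never fall outside the range of the original 10,000. That is usually acceptable for Monte Carlo work, but it is worth keeping in mind if the tails matter.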
In other words, whether an empirical sample fits one of these standard distributions is always contingent, although understanding the processes that generate the sample might point toward certain distributions over others. Nonetheless, for something like a Monte Carlo simulation, knowing the underlying distribution is not necessary.

Also remember that many things in statistics were developed largely because they made certain problems mathematically tractable. (Hence, for example, the many methods that assume independent, identically distributed random samples, or the popularity of ordinary least-squares regression.) Today, most of us have more computing power at our desks than entire mainframe computing centers had a few decades ago. So in many instances, we don't need no stinkin' complex formulas anymore.

If you suspect the distribution corresponds to one of the mathematically studied distributions, why not fit a curve to a plot of your data points and see if it looks familiar? Then run a goodness-of-fit test to see whether the theoretical distribution is a reasonable approximation.

--
Dr. Marshall Feldman, PhD
Director of Research and Academic Affairs
Center for Urban Studies and Research <http://www.uri.edu/prov/research/urbanstudies.html>
The University of Rhode Island <http://www.uri.edu>
email: marsh @ uri .edu (remove spaces)
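The suggestion to fit candidate distributions and then check goodness of fit could look roughly like the minimal Python sketch below (in R, `MASS::fitdistr()` and `ks.test()` are common starting points). The candidate list (normal and exponential), the helper names `ks_statistic` and `fit_candidates`, and the simulated data are all illustrative assumptions. The sketch only compares Kolmogorov-Smirnov distances D across candidates. When parameters are estimated from the same data, textbook KS p-values are too optimistic, so D is treated here as a relative score, not a formal test.

```python
import math
import random
import statistics


def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov distance sup |F_n(x) - F(x)| for a continuous CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF steps from i/n up to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def fit_candidates(data):
    """Fit illustrative candidate distributions; return {name: fitted CDF}."""
    normal = statistics.NormalDist.from_samples(data)  # mean and sample sd
    rate = 1.0 / statistics.fmean(data)  # exponential MLE; assumes positive data

    def exponential_cdf(x):
        return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

    return {"normal": normal.cdf, "exponential": exponential_cdf}


# Hypothetical data whose "true" distribution we pretend not to know.
sample_rng = random.Random(0)
sample = [sample_rng.expovariate(0.5) for _ in range(2_000)]

scores = {name: ks_statistic(sample, cdf) for name, cdf in fit_candidates(sample).items()}
for name, d in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:12s} D = {d:.3f}")  # smaller D means a closer fit
```

Plotting the empirical CDF against each fitted CDF is a useful visual companion to the D values, in the spirit of seeing whether the curve "looks familiar".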