Michael Haenlein
2013-Jan-22 11:49 UTC
[R] Approximating discrete distribution by continuous distribution
Dear all,

I have a discrete distribution showing how age is distributed across a population using a certain set of bands:

    Age <- matrix(c(74045062, 71978405, 122718362, 40489415), ncol=1,
                  dimnames=list(c("<18", "18-34", "35-64", "65+"), c()))
    Age_dist <- Age/sum(Age)

For example, I know that 23.94% of all people are between 0 and 18 years old, 23.28% are between 18 and 34, and so forth.

I would like to find a continuous approximation of this discrete distribution in order to estimate the probability that a person is, for example, 16 years old.

Is there some automatic way to do this in R? I tried a kernel density estimation of the histogram, but this does not seem to provide what I'm looking for.

Thanks very much for your help,

Michael

Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France
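Running the question's code gives the share of each band, and cumulating those shares gives the value of the distribution function at the upper edge of each band. This is a minimal sketch; `cum_dist` is an illustrative name not used in the original message.

```r
# Band counts from the question (NULL column names, same as c() in the original)
Age <- matrix(c(74045062, 71978405, 122718362, 40489415), ncol = 1,
              dimnames = list(c("<18", "18-34", "35-64", "65+"), NULL))
Age_dist <- Age / sum(Age)

# Share of each band, as percentages
round(100 * Age_dist[, 1], 2)

# Cumulative share up to the upper edge of each band
cum_dist <- cumsum(Age_dist[, 1])
round(cum_dist, 4)
```

The cumulative shares (roughly 0.2394, 0.4722, 0.8691 and 1) are the only points of the distribution function that the data actually pin down.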
Barry Rowlingson
2013-Jan-22 12:39 UTC
[R] Approximating discrete distribution by continuous distribution
On Tue, Jan 22, 2013 at 11:49 AM, Michael Haenlein <haenlein at escpeurope.eu> wrote:
> I would like to find a continuous approximation of this discrete
> distribution in order to estimate the probability that a person is for
> example 16 years old.

Given that people age continuously (and continually...), you sound like you are trying to replace one discrete distribution with another (discretised by year). A continuous distribution would give you, for example, the probability that a person is between 16.0 and 16.1 years old.

Barry
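To make Barry's point concrete, here is a minimal sketch of one possible continuous approximation: a piecewise-constant density that spreads each band's share evenly across its width, so probabilities attach to intervals rather than to single ages. The lower edge of 0 and the upper cap of 100 years for the "65+" band are assumptions, as is reading "16 years old" as the interval [16, 17). `edges`, `dens_height` and `prob_between` are illustrative names.

```r
# Band counts from the original question
counts <- c(74045062, 71978405, 122718362, 40489415)
# Band edges in years; 0 and the cap of 100 for "65+" are assumptions
edges <- c(0, 18, 35, 65, 100)
props <- counts / sum(counts)

# Piecewise-constant density: each band's share spread evenly over its width
dens_height <- props / diff(edges)

# P(lo <= age < hi) under that density: density times overlap with each band
prob_between <- function(lo, hi) {
  overlap <- pmax(0, pmin(hi, edges[-1]) - pmax(lo, edges[-length(edges)]))
  sum(dens_height * overlap)
}

prob_between(16, 17)    # "16 years old" read as the interval [16, 17)
prob_between(16, 16.1)  # the interval from Barry's example
prob_between(0, 100)    # sanity check: the whole assumed support
```

Under this assumption every one-year interval below 18 gets the same probability (about 1.33%); that flatness is a modelling choice, not something the band counts show.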
Prof Brian Ripley
2013-Jan-22 12:45 UTC
[R] Approximating discrete distribution by continuous distribution
On 22/01/2013 11:49, Michael Haenlein wrote:
> Dear all,
>
> I have a discrete distribution showing how age is distributed across a
> population using a certain set of bands:
>
> Age <- matrix(c(74045062, 71978405, 122718362, 40489415), ncol=1,
> dimnames=list(c("<18", "18-34", "35-64", "65+"),c()))
> Age_dist <- Age/sum(Age)
>
> For example I know that 23.94% of all people are between 0-18 years, 23.28%
> between 18-34 years and so forth.
>
> I would like to find a continuous approximation of this discrete
> distribution in order to estimate the probability that a person is for
> example 16 years old.
>
> Is there some automatic way in R through which this can be done? I tried a
> Kernel density estimation of the histogram but this does not seem to
> provide what I'm looking for.

This is not really an R question, but a statistics one.

It is almost guesswork: if, for example, these were drivers in the UK, the answer is 0. So you need to supply some information about the shape of the distribution of <18 year olds.

You have estimates of the cumulative distribution function at c(0, 18, 35, 65, Inf) (or some better upper limit). You want to interpolate it. You could use linear interpolation (approx or approxfun) or a monotone spline interpolation (spline or splinefun) or any other interpolation method which meets your needs. But whatever you use, you will be supplying a lot of information not actually in your data.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
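Ripley's suggestion could be sketched as follows: build the cumulative distribution function at the band boundaries, then interpolate it linearly with `approxfun` and with a monotone cubic spline via `splinefun(method = "hyman")`. This assumes an upper limit of 100 years in place of Inf (a hypothetical choice) and reads "16 years old" as [16, 17); `knots`, `F_lin` and `F_mono` are illustrative names.

```r
# Band counts from the original question
counts <- c(74045062, 71978405, 122718362, 40489415)
# CDF knots; the upper limit of 100 years replaces Inf and is an assumption
knots <- c(0, 18, 35, 65, 100)
cdf_at_knots <- c(0, cumsum(counts) / sum(counts))

# Linear interpolation of the CDF (a uniform density within each band)
F_lin <- approxfun(knots, cdf_at_knots, rule = 2)

# Monotone cubic spline (Hyman filter) through the same points
F_mono <- splinefun(knots, cdf_at_knots, method = "hyman")

# P(16 <= age < 17) under each interpolant
p16_lin  <- F_lin(17) - F_lin(16)
p16_mono <- F_mono(17) - F_mono(16)
c(linear = p16_lin, hyman = p16_mono)
```

Both interpolants pass through the same knots, yet they can give different answers for a single year such as 16; the difference comes entirely from the choice of interpolant, not from the data.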