bogdan romocea
2004-Nov-13 03:53 UTC
[R] density estimation: compute sum(value * probability) for given distribution
Dear R users,

This is a KDE beginner's question. I have this distribution:

> length(cap)
[1] 200
> summary(cap)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  459.9   802.3   991.6  1066.0  1242.0  2382.0

I need to compute the sum of the values times their probability of occurrence.

The graph is fine:

den <- density(cap, from=min(cap), to=max(cap), give.Rkern=FALSE)
plot(den)

However, how do I compute sum(values*probabilities)? The probabilities produced by the density function sum to only 26%:

> sum(den$y)
[1] 0.2611142

Would it perhaps be ok to simply do

> sum(den$x*den$y) * (1/sum(den$y))
[1] 1073.22

?

Thank you,
b.
Liaw, Andy
2004-Nov-13 13:11 UTC
[R] density estimation: compute sum(value * probability) for given distribution
First thing you probably should realize is that density is _not_ probability. A probability density function _integrates_ to one; it does not _sum_ to one. If X is an absolutely continuous RV with density f, then Pr(X=x)=0 for all x, and

    Pr(a < X < b) = \int_a^b f(x) dx.

sum x*Pr(X=x) (over all possible values of x) for a discrete distribution is just the expectation, or mean, of the distribution. The expectation for a continuous distribution is

    \int x f(x) dx,

where the integral is over the support of f. This is all elementary math stat that you can find in any textbook.

Could you tell us exactly what you are trying to compute, or why you're computing it?

HTH,
Andy

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
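The definitions in this reply can be checked numerically. The following is only a sketch: it uses an exponential distribution with rate 0.5 as a hypothetical example (it does not come from the thread), chosen because its mean, 1/rate = 2, is known in closed form. It relies on base R's integrate(), dexp() and pexp().

```r
# Hypothetical example: X ~ Exponential(rate = 0.5); its true mean is 1 / rate = 2.
rate <- 0.5
f <- function(x) dexp(x, rate = rate)

# A density integrates to one over its support [0, Inf).
total <- integrate(f, lower = 0, upper = Inf)$value

# Pr(a < X < b) is the integral of f over (a, b); compare with the CDF difference.
a <- 1
b <- 3
p_ab  <- integrate(f, lower = a, upper = b)$value
p_cdf <- pexp(b, rate = rate) - pexp(a, rate = rate)

# Expectation of a continuous RV: integral of x * f(x) over the support.
ex <- integrate(function(x) x * f(x), lower = 0, upper = Inf)$value

print(c(total = total, p_ab = p_ab, p_cdf = p_cdf, mean = ex))
```

Note that f(x) itself is never treated as a probability here; only integrals of f over an interval are.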
Uwe Ligges
2004-Nov-13 20:53 UTC
[R] density estimation: compute sum(value * probability) for given distribution
bogdan romocea wrote:

> Dear R users,
>
> This is a KDE beginner's question.
> I have this distribution:
>
> > length(cap)
> [1] 200
> > summary(cap)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   459.9   802.3   991.6  1066.0  1242.0  2382.0
> I need to compute the sum of the values times their probability of
> occurence.
>
> The graph is fine,
> den <- density(cap, from=min(cap),
> to=max(cap), give.Rkern=F)
> plot(den)
>
> However, how do I compute sum(values*probabilities)?

I don't get the point. You are estimating using a gaussian kernel. Hint: What's the probability to get x=0 for a N(0,1) distribution? So sum(values*probabilities) is zero!

> The probabilities produced by the density function sum to only 26%:

and could also sum to, e.g., 783453.9, depending on the number of observations and the estimated parameters of the density ...

> > sum(den$y)
> [1] 0.2611142
>
> Would it perhaps be ok to simply do
>
> > sum(den$x*den$y) * (1/sum(den$y))
> [1] 1073.22
> ?

No. den$x is a point where the density function is equal to den$y, but den$y is not the probability to get den$x (you know, the stuff with intervals)! I fear you are mixing theory from discrete with continuous distributions.

Uwe Ligges
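One way to see that sum(den$y) has no probabilistic meaning is to vary the evaluation grid. The sketch below is illustrative only: the original `cap` data are not available, so `x` is a simulated, hypothetical stand-in, and the grid sizes are arbitrary. It uses the `n` argument of density(), which sets the number of equally spaced points at which the estimate is returned.

```r
set.seed(1)
# Hypothetical stand-in for 'cap': 200 positive, right-skewed values.
x <- rlnorm(200, meanlog = 6.9, sdlog = 0.35)

grid_sizes <- c(128, 512, 2048)
res <- t(sapply(grid_sizes, function(n) {
  den <- density(x, n = n)        # same data, different number of grid points
  dx  <- den$x[2] - den$x[1]      # spacing of the equally spaced grid
  c(n = n, sum_y = sum(den$y), sum_y_dx = sum(den$y) * dx)
}))

# sum_y changes with the grid; sum_y_dx (a Riemann sum) should stay close to 1.
print(res)
```

The raw sum scales roughly with the number of grid points, so a value such as 0.26 reflects the grid spacing rather than any missing probability.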
(Ted Harding)
2004-Nov-14 08:50 UTC
[R] density estimation: compute sum(value * probability) for
On 13-Nov-04 bogdan romocea wrote:
> Dear R users,
>
> However, how do I compute sum(values*probabilities)? The
> probabilities produced by the density function sum to only 26%:
> > sum(den$y)
> [1] 0.2611142
>
> Would it perhaps be ok to simply do
> > sum(den$x*den$y) * (1/sum(den$y))
> [1] 1073.22
> ?

What you're missing is the "dx"!

A density estimation estimates the probability density function g(x) such that

  int[g(x)*dx] = 1,

and R's 'density' function returns estimated values of "g" at a discrete set of points. An integral can be approximated by a discrete summation of the form

  sum(g(x.i)*delta.x)

You can recover the set of x-values at which the density is estimated, and hence the implicit value of delta.x, from the returned density.

Example:

  X<-rnorm(1000)
  f<-density(X)
  x<-f$x               # grid of x-values (equally spaced)
  delta.x<-x[2]-x[1]   # the implicit "dx"
  g<-f$y               # estimated density at each grid point
  sum(g*delta.x)
  [1] 1.000976

Hoping this helps,
Ted.
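The same Riemann-sum idea extends from the total area to the expectation, int[x*g(x)*dx]. The sketch below makes some assumptions: `cap_sim` is simulated, hypothetical data standing in for the original `cap`, and the gamma parameters were picked only to give values of a similar magnitude. It also shows that on an equally spaced grid delta.x cancels in the ratio sum(x*g)/sum(g), so that ratio is the Riemann-sum mean divided by the Riemann-sum area.

```r
set.seed(42)
# Hypothetical stand-in for 'cap' (mean around 1000); not the original data.
cap_sim <- rgamma(200, shape = 8, rate = 0.0075)

f <- density(cap_sim)
x <- f$x
g <- f$y
delta.x <- x[2] - x[1]

area     <- sum(g * delta.x)       # approximates int[g(x)*dx], close to 1
mean_kde <- sum(x * g * delta.x)   # approximates int[x*g(x)*dx]
ratio    <- sum(x * g) / sum(g)    # delta.x cancels: equals mean_kde / area

print(c(area = area, mean_kde = mean_kde, ratio = ratio,
        sample_mean = mean(cap_sim)))
```

When the grid is restricted, for example with from=min(cap) and to=max(cap), the kernel mass outside that range is cut off and the area falls below 1; the ratio form then renormalizes the estimate to the truncated range. For a plain expected value, mean(cap) gives essentially the same answer without any density estimate.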