Visual inspection of the plot of a density() function vs a normal with the same mean and variance suggests the area under the density curve is bigger than under the normal curve. The two curves are very close over most of the domain. Assuming the normal curve does integrate to 1, this implies the area under density() is > 1.

Is there any assurance that the density kernel smoother produces something that integrates to 1? Or am I seeing things?

I suppose an additional complexity is that density() produces discrete output, but then I'm looking at the continuous curve plot produced.
On 24-Sep-03 Ross Boylan wrote:
> Visual inspection of the plot of a density() function vs a normal with
> the same mean and variance suggests the area under the density curve is
> bigger than under the normal curve. The two curves are very close over
> most of the domain. Assuming the normal curve does integrate to 1,
> this implies the area under density() is > 1.
>
> Is there any assurance that the density kernel smoother produces
> something that integrates to 1? Or am I seeing things?
>
> I suppose an additional complexity is that density() produces discrete
> output, but then I'm looking at the continuous curve plot produced.

It should integrate to 1 (see help for density), and sum to something very close to 1 depending on the number of points ("n=...") at which density is evaluated. Example:

  > X<-rnorm(1000)
  > Y<-density(X)   # n = 512 (default)
  > x<-Y$x; y<-Y$y;
  > k<-length(x);d<-min(x[2:k]-x[1:(k-1)]);   # d = grid spacing
  > sum(y*d)
  [1] 1.000975

  > Y<-density(X,n=2000)
  > x<-Y$x; y<-Y$y;
  > k<-length(x);d<-min(x[2:k]-x[1:(k-1)]);
  > sum(y*d)
  [1] 1.000240

  > Y<-density(X,n=100000)
  > x<-Y$x; y<-Y$y;
  > k<-length(x);d<-min(x[2:k]-x[1:(k-1)]);
  > sum(y*d)
  [1] 0.9999996

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 24-Sep-03                                       Time: 23:15:15
------------------------------ XFMail ------------------------------
There was a related thread on R-help, probably last year. The question was whether density() can be made to integrate numerically to 1. The answer is yes: with a fine enough partition, you will see that it integrates to one.

And yes, a kernel density estimate is theoretically a true density (assuming the kernel used is a pdf), because it is just an n-component mixture of the kernel.

Andy

> -----Original Message-----
> From: Ross Boylan [mailto:ross at biostat.ucsf.edu]
> Sent: Wednesday, September 24, 2003 5:36 PM
> To: r-help
> Subject: [R] density() integrates to 1?
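To make the mixture argument concrete, here is a minimal sketch that builds the estimate by hand rather than through density(). It assumes a Gaussian kernel and the bw.nrd0() bandwidth rule; the sample, the seed and the helper name kde are made up for illustration only.

```r
set.seed(1)            # arbitrary seed, for reproducibility only
x  <- rnorm(200)       # hypothetical sample
bw <- bw.nrd0(x)       # rule-of-thumb bandwidth (assumed here)

# Kernel density estimate written out as an equal-weight mixture of
# Gaussian kernels, one centred at each observation.
kde <- function(t) sapply(t, function(ti) mean(dnorm(ti, mean = x, sd = bw)))

# Integrate the mixture numerically over the whole real line.
area <- integrate(kde, lower = -Inf, upper = Inf)$value
print(area)
```

Each component integrates to 1 and the weights 1/n sum to 1, so the printed area should agree with 1 up to the tolerance of integrate().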
We can try to approximate the area under the curve using the trapezoidal rule on the plotting coordinates that density() produces.

    nbin <- 1024                            # number of grid points
    d <- density(rnorm(50000), n = nbin)
    totalArea <- 0
    for (i in 1:(nbin - 1)) {
      width  <- d$x[i + 1] - d$x[i]         # width of bin
      height <- (d$y[i + 1] + d$y[i]) / 2   # average height of bin
      totalArea <- totalArea + width * height
    }
    print(totalArea)

We can see that the total area under the curve is close to 1, and the approximation generally gets better as nbin is increased. Note that the sign of the error is not fixed: the trapezoidal rule underestimates where the curve is concave (around the mode) and overestimates where it is convex (in the tails), so the result can land slightly above or below 1.
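The same trapezoidal sum can be written without an explicit loop, which makes it easy to compare several grid sizes on one sample. This is only a sketch: the helper name trapz, the seed and the particular grid sizes are illustrative choices, not anything prescribed by density().

```r
set.seed(1)                 # arbitrary seed, for reproducibility only
x <- rnorm(50000)

# Trapezoidal rule on the (x, y) grid returned by density():
# bin widths times the average of adjacent heights.
trapz <- function(d) sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

grid_sizes <- c(64, 512, 4096)
areas <- sapply(grid_sizes, function(n) trapz(density(x, n = n)))

# One line per grid size: number of points and estimated area.
for (i in seq_along(grid_sizes)) {
  cat(sprintf("n = %5d  area = %.6f\n", grid_sizes[i], areas[i]))
}
```

All three areas should come out close to 1; any small residual reflects the discretisation and the fact that density() evaluates the estimate over a finite range rather than the whole real line.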