Gonçalo Ferraz
2011-Sep-08 15:36 UTC
[R] Density function: Area under density plot is not equal to 1. Why?
Hi,

I have a vector 'data' of 58 probability values (bounded between 0 and 1) and want to draw a probability density function of these values. For this, I used the commands:

data <- runif(58)

a <- density(data, from=0, to=1)
plot(a, type="l", lwd=3)

But then, when I try to approximate the area under the plotted curve with the command:

area <- sum(a$y)*(a$x[1]-a$y[2])

I get an area that is clearly smaller than 1.

Strangely, if I don't bound the density function with 'from=0, to=1' (which is against my purpose because it extends the pdf beyond the limits of a probability value), I get an area of 1.000. This suggests that I am computing the area correctly, but using the density function improperly.

Why is this happening? Does anyone know how to constrain the density function while still getting a true pdf (integrating to 1 under the curve) at the end? Should I use a different function? I read through the density function notes but could not figure out a solution.

Thank you!

Gonçalo
Jean-Christophe BOUËTTÉ
2011-Sep-08 16:29 UTC
[R] Density function: Area under density plot is not equal to 1. Why?
Is your "data" supposed to be observations, or values of the density of the underlying law?

Also, could you explain the rationale behind

sum(a$y)*(a$x[1]-a$y[2])

because it is not immediately clear to the reader.
Greg Snow
2011-Sep-08 16:57 UTC
[R] Density function: Area under density plot is not equal to 1. Why?
For bounded density estimation look at the logspline package instead of the regular density function.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
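
A minimal sketch of what the logspline suggestion might look like, assuming the logspline package is installed from CRAN. The names 'fit' and 'area_ls' and the seed are illustrative and not from the thread; the call relies on the 'lbound' and 'ubound' arguments of logspline() and on dlogspline() to evaluate the fitted density.

```r
# Illustrative data standing in for the 58 probability values
set.seed(1)
data <- runif(58)

area_ls <- NA_real_
if (requireNamespace("logspline", quietly = TRUE)) {
  # Fit a density whose support is restricted to [0, 1]
  fit <- logspline::logspline(data, lbound = 0, ubound = 1)

  # Integrate the fitted density over its support; this should be close to 1
  area_ls <- integrate(function(q) logspline::dlogspline(q, fit), 0, 1)$value
}
area_ls
```

The fitted curve can then be drawn with something like curve(logspline::dlogspline(x, fit), 0, 1), which stays inside the [0, 1] range by construction.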
Albyn Jones
2011-Sep-08 17:03 UTC
[R] Density function: Area under density plot is not equal to 1. Why?
Look at

area <- sum(a$y)*(a$x[1]-a$y[2])

The problem appears to be "a$x[1]-a$y[2]"; that is not the length of the base of an approximating rectangle, whatever it is :-)

albyn

-- 
Albyn Jones
Reed College
jones at reed.edu
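
A short sketch of the corrected area calculation, assuming the intended rectangle base was the grid spacing a$x[2] - a$x[1]. The seed and the names 'dx', 'area_bounded', 'b' and 'area_full' are illustrative and not from the thread. Note that 'from' and 'to' only set the grid on which density() evaluates the estimate; they do not truncate or renormalize it, so any kernel mass that falls outside [0, 1] is simply not counted.

```r
# Illustrative data standing in for the 58 probability values
set.seed(1)
data <- runif(58)

# Estimate evaluated only on a grid spanning [0, 1]
a <- density(data, from = 0, to = 1)
dx <- a$x[2] - a$x[1]            # width of each approximating rectangle
area_bounded <- sum(a$y) * dx    # noticeably below 1: mass beyond 0 and 1 is missing

# Default grid extends past the data range, so nearly all mass is captured
b <- density(data)
area_full <- sum(b$y) * (b$x[2] - b$x[1])

c(bounded = area_bounded, full = area_full)
```

With the width corrected, the bounded estimate should still come out below 1, which would suggest the shortfall comes from how density() treats the bounds rather than from the area formula alone.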