Jonathan Li
2002-Aug-30 19:50 UTC
[R] density() returns a density function that does not add up to 1
Dear R users, I ran into this curious problem:> d <- rnorm(100) > d.density <- density(d) > sum( d.density$x * d.density$y)[1] 2.517502 Admittedly the method of computing the mass under the density curve at line 3 is crude. But 2.5 is pretty far from 1, the value it should be. I tried a few other dataset and got similar result. Am I missing something obvious? Or is the return of density() not supposed to be normalized? Thanks in advance! Jonathan -- Jonathan Q. Li, PhD Agilent Technologies Laboratory Palo Alto, California, USA -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
A.J. Rossini
2002-Aug-30 20:05 UTC
[R] density() returns a density function that does not add up to 1
>>>>> "jonathan" == Jonathan Li <jonqli at labs.agilent.com> writes:jonathan> Dear R users, jonathan> I ran into this curious problem: >> d <- rnorm(100) >> d.density <- density(d) >> sum( d.density$x * d.density$y) jonathan> [1] 2.517502 I get 6.5 jonathan> Admittedly the method of computing the mass under the density curve at jonathan> line 3 is crude. It's actually wrong. You aren't computing an integral, you need the the approximate area under d.density$y[i], NOT the value of the X axis. jonathan> But 2.5 is pretty far from 1, the value it should be. nope. jonathan> I tried a few other dataset and got similar result. Am I missing jonathan> something obvious? jonathan> Or is the return of density() not supposed to be normalized? You just need to integrate properly. You are attempting Riemann integration, but need to change things a bit. best, -tony -- A.J. Rossini Rsrch. Asst. Prof. of Biostatistics U. of Washington Biostatistics rossini at u.washington.edu FHCRC/SCHARP/HIV Vaccine Trials Net rossini at scharp.org -------------- http://software.biostat.washington.edu/ ---------------- FHCRC: M: 206-667-7025 (fax=4812)|Voicemail is pretty sketchy/use Email UW: Th: 206-543-1044 (fax=3286)|Change last 4 digits of phone to FAX (my tuesday/wednesday/friday locations are completely unpredictable.) -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
A.J. Rossini
2002-Aug-30 20:09 UTC
[R] density() returns a density function that does not add up to 1
>>>>> "jonathan" == Jonathan Li <jonqli at labs.agilent.com> writes:jonathan> Dear R users, jonathan> I ran into this curious problem: >> d <- rnorm(100) >> d.density <- density(d) >> sum( d.density$x * d.density$y) jonathan> [1] 2.517502 whoops, you probably want something like:> sum( diff(d.density$x) * d.density$y)[1] 1.000950 instead, for a crude integration (and yes, it throws a warning). -- A.J. Rossini Rsrch. Asst. Prof. of Biostatistics U. of Washington Biostatistics rossini at u.washington.edu FHCRC/SCHARP/HIV Vaccine Trials Net rossini at scharp.org -------------- http://software.biostat.washington.edu/ ---------------- FHCRC: M: 206-667-7025 (fax=4812)|Voicemail is pretty sketchy/use Email UW: Th: 206-543-1044 (fax=3286)|Change last 4 digits of phone to FAX (my tuesday/wednesday/friday locations are completely unpredictable.) -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Jonathan Li
2002-Aug-30 20:28 UTC
[R] density() returns a density function that does not add up to 1
Thank you for the message. It helps a lot. I see where my method was wrong. Now another case:> dd <- c(rnorm(1000,0,1), rnorm(100, 100,1), rnorm(10, 1000,1)) > ddd <- density(dd) > sum( diff(ddd$x)*ddd$y)[1] 2.856715 If you add more extreme tails, the thing gets worse:> dd <- c(rnorm(1000,0,1), rnorm(100, 100,1), rnorm(10, 1000,1), rnorm(5, 5000,1)) > ddd <- density(dd) > sum( diff(ddd$x) * ddd$y)[1] 13.90902 I am not sure what is going on now? Thanks again! Jonathan "A.J. Rossini" wrote:> > >>>>> "jonathan" == Jonathan Li <jonqli at labs.agilent.com> writes: > > jonathan> Dear R users, > jonathan> I ran into this curious problem: > > >> d <- rnorm(100) > >> d.density <- density(d) > >> sum( d.density$x * d.density$y) > jonathan> [1] 2.517502 > > whoops, you probably want something like: > > > sum( diff(d.density$x) * d.density$y) > [1] 1.000950 > > instead, for a crude integration (and yes, it throws a warning). > > -- > A.J. Rossini Rsrch. Asst. Prof. of Biostatistics > U. of Washington Biostatistics rossini at u.washington.edu > FHCRC/SCHARP/HIV Vaccine Trials Net rossini at scharp.org > -------------- http://software.biostat.washington.edu/ ---------------- > FHCRC: M: 206-667-7025 (fax=4812)|Voicemail is pretty sketchy/use Email > UW: Th: 206-543-1044 (fax=3286)|Change last 4 digits of phone to FAX > (my tuesday/wednesday/friday locations are completely unpredictable.)-- Jonathan Q. Li, PhD Agilent Technologies Laboratory Palo Alto, California, USA -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
ripley@stats.ox.ac.uk
2002-Aug-30 20:28 UTC
User error (was [R] density() returns a density function that does not add up to 1)
On Fri, 30 Aug 2002, Jonathan Li wrote:> I ran into this curious problem:(or, I did something silly)> > d <- rnorm(100) > > d.density <- density(d) > > sum( d.density$x * d.density$y) > [1] 2.517502 > > Admittedly the method of computing the mass under the density curve at > line 3 is crude.(and got a nonsensical answer)> But 2.5 is pretty far from 1, the value it should be. > > I tried a few other dataset and got similar result. Am I missing > something obvious? > Or is the return of density() not supposed to be normalized?I think you intended sum( unique(diff(d.density$x)) * d.density$y ) Your sum is 512x the estimate of E(X). -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272860 (secr) Oxford OX1 3TG, UK Fax: +44 1865 272595 -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Liaw, Andy
2002-Sep-02 03:26 UTC
[R] density() returns a density function that does not add up to 1
Note that Tony did say "crude". For your second example, try ddd <- density(dd, n=2^16) That, hopefully, gives you enough clue. (What exactly are you trying to do, anyway? If the kernel used is an actual pdf, as is case with all the kernel types in density(), I think, then the estimated density is guaranteed to be a true density, since it's just an n-component mixture of the kernels. There's no need to use crude numerical method to check the integral.) Andy> -----Original Message----- > From: Jonathan Li [mailto:jonqli at labs.agilent.com] > Sent: Friday, August 30, 2002 4:29 PM > To: rossini at u.washington.edu > Cc: r-help at stat.math.ethz.ch > Subject: Re: [R] density() returns a density function that > does not add > up to 1 > > > Thank you for the message. It helps a lot. I see where my method was > wrong. > Now another case: > > > dd <- c(rnorm(1000,0,1), rnorm(100, 100,1), rnorm(10, 1000,1)) > > ddd <- density(dd) > > sum( diff(ddd$x)*ddd$y) > [1] 2.856715 > > If you add more extreme tails, the thing gets worse: > > > dd <- c(rnorm(1000,0,1), rnorm(100, 100,1), rnorm(10, > 1000,1), rnorm(5, 5000,1)) > > ddd <- density(dd) > > sum( diff(ddd$x) * ddd$y) > [1] 13.90902 > > I am not sure what is going on now? > Thanks again! > Jonathan > > > "A.J. Rossini" wrote: > > > > >>>>> "jonathan" == Jonathan Li <jonqli at labs.agilent.com> writes: > > > > jonathan> Dear R users, > > jonathan> I ran into this curious problem: > > > > >> d <- rnorm(100) > > >> d.density <- density(d) > > >> sum( d.density$x * d.density$y) > > jonathan> [1] 2.517502 > > > > whoops, you probably want something like: > > > > > sum( diff(d.density$x) * d.density$y) > > [1] 1.000950 > > > > instead, for a crude integration (and yes, it throws a warning). > > > > -- > > A.J. Rossini Rsrch. Asst. Prof. > of Biostatistics > > U. of Washington Biostatistics rossini at u.washington.edu > > FHCRC/SCHARP/HIV Vaccine Trials Net rossini at scharp.org > > -------------- http://software.biostat.washington.edu/ > ---------------- > > FHCRC: M: 206-667-7025 (fax=4812)|Voicemail is pretty > sketchy/use Email > > UW: Th: 206-543-1044 (fax=3286)|Change last 4 digits of > phone to FAX > > (my tuesday/wednesday/friday locations are completely > unpredictable.) > > -- > Jonathan Q. Li, PhD > Agilent Technologies Laboratory > Palo Alto, California, USA > -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-. > -.-.-.-.-.-.-.-.- > r-help mailing list -- Read > http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html > Send "info", "help", or "[un]subscribe" > (in the "body", not the subject !) To: > r-help-request at stat.math.ethz.ch > _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._. > _._._._._._._._._ >------------------------------------------------------------------------------ Notice: This e-mail message, together with any attachments, contains information of Merck & Co., Inc. (Whitehouse Station, New Jersey, USA) that may be confidential, proprietary copyrighted and/or legally privileged, and is intended solely for the use of the individual or entity named in this message. If you are not the intended recipient, and have received this message in error, please immediately return this by e-mail and then delete it. ============================================================================= -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._