Dear R-list,

I have made a simple kernel density estimation with

  x <- c(2,1,3,2,3,0,4,5,10,11,12,11,10)
  kde <- density(x, n=100)

Now I would like to know the estimated probability that a new observation falls into the interval 0<x<3.

How can I integrate over the corresponding interval? I did not find a corresponding function in the several R packages for kernel density estimation I looked at. I could integrate with Simpson's rule, but perhaps somebody knows a better solution.

Thanks a lot for your help!

Pedro
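For reference, the Simpson's rule route mentioned in the question can be sketched in R as follows. This is a minimal sketch, not code from the thread: the helper name simpson_kde() and the number of Simpson panels (m = 200) are illustrative assumptions, and the KDE is linearly interpolated between the grid points returned by density().

```r
# Data and KDE as in the question
x <- c(2, 1, 3, 2, 3, 0, 4, 5, 10, 11, 12, 11, 10)
kde <- density(x, n = 100)

# Composite Simpson's rule for the estimated density over [a, b].
# The KDE is only known on the grid kde$x, so it is linearly interpolated
# (approx) onto m + 1 equally spaced points; m must be even.
# Outside the grid the density is taken to be 0.
simpson_kde <- function(kde, a, b, m = 200) {
  stopifnot(m %% 2 == 0, a < b)
  xs <- seq(a, b, length.out = m + 1)
  ys <- approx(kde$x, kde$y, xout = xs, yleft = 0, yright = 0)$y
  h <- (b - a) / m
  w <- c(1, rep(c(4, 2), length.out = m - 1), 1)  # 1, 4, 2, 4, ..., 2, 4, 1
  sum(w * ys) * h / 3
}

p_simpson <- simpson_kde(kde, 0, 3)
p_simpson
```

Because the interpolated KDE is already smooth on this grid, a fairly small m is usually enough; the main approximation comes from the finite grid of density(), not from Simpson's rule itself.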
Not a direct answer to your question, but if you use a logspline density estimate rather than a kernel density estimate, then the logspline package will help you: it has built-in functions dlogspline, qlogspline, and plogspline that do the integrals for you.

If you want to stick with the KDE, then you could find the area under each of the kernels over the range you are interested in and then just sum the individual areas (each kernel carries weight 1/n). For the default gaussian kernel, use pnorm with the kernel's standard deviation; in density() the bandwidth kde$bw is already that standard deviation.

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Pedro Ramirez
Sent: Wednesday, June 07, 2006 11:00 AM
To: r-help at stat.math.ethz.ch
Subject: [R] Density Estimation

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
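A minimal sketch of the kernel-summing idea, assuming the default Gaussian kernel and an unweighted density() call. The helper kernel_sum_prob() is a hypothetical name used for illustration. Each observation contributes a normal component N(x_i, bw^2) with weight 1/n, so the interval probability is the average of pnorm() differences; a kernel that straddles an endpoint is automatically counted only for the part lying inside the interval.

```r
x <- c(2, 1, 3, 2, 3, 0, 4, 5, 10, 11, 12, 11, 10)
kde <- density(x, n = 100)   # defaults: kernel = "gaussian", bw = "nrd0"

# In density(), kde$bw is the standard deviation of the kernel, so each
# observation contributes a N(x_i, bw^2) component with weight 1/n.
kernel_sum_prob <- function(data, bw, a, b) {
  # Area of each kernel inside (a, b), averaged over observations
  mean(pnorm(b, mean = data, sd = bw) - pnorm(a, mean = data, sd = bw))
}

p_kernel <- kernel_sum_prob(x, kde$bw, 0, 3)
p_kernel
```

This gives the probability under the KDE exactly (up to floating point), independent of the grid size n used by density().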
Pedro wrote:
> I have made a simple kernel density estimation by
>
> x <- c(2,1,3,2,3,0,4,5,10,11,12,11,10)
> kde <- density(x,n=100)
>
> Now I would like to know the estimated probability that a
> new observation falls into the interval 0<x<3.
>
> How can I integrate over the corresponding interval?

One possibility is to use splinefun():

  > spiffy <- splinefun(kde$x, kde$y)
  > integrate(spiffy, 0, 3)
  0.2353400 with absolute error < 2e-09

cheers,

Rolf Turner
rolf at math.unb.ca
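Rolf's approach can be wrapped in a small function and checked against the exact Gaussian-kernel sum. This is a sketch under two assumptions: the default Gaussian kernel, and integration limits clipped to the range of kde$x, because the spline returned by splinefun() is not zero outside the grid. The names spline_prob() and exact_prob() are illustrative, not from the thread.

```r
x <- c(2, 1, 3, 2, 3, 0, 4, 5, 10, 11, 12, 11, 10)
kde <- density(x, n = 100)

# Interpolate the KDE grid with a cubic spline and integrate it numerically.
# Limits are clipped to the grid range, since the spline extrapolates
# rather than dropping to zero outside it.
spline_prob <- function(kde, a, b) {
  f <- splinefun(kde$x, kde$y)
  lo <- max(a, min(kde$x))
  hi <- min(b, max(kde$x))
  if (lo >= hi) return(0)
  integrate(f, lo, hi)$value
}

# Exact interval probability for the default Gaussian kernel
exact_prob <- function(data, bw, a, b) {
  mean(pnorm(b, data, bw) - pnorm(a, data, bw))
}

p_spline <- spline_prob(kde, 0, 3)
p_exact  <- exact_prob(x, kde$bw, 0, 3)
c(spline = p_spline, exact = p_exact)
```

Both values should agree closely for this data; the spline version depends on how finely density() evaluates the estimate (its n argument).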
>Not a direct answer to your question, but if you use a logspline density
>estimate rather than a kernel density estimate then the logspline
>package will help you and it has built in functions for dlogspline,
>qlogspline, and plogspline that do the integrals for you.
>
>If you want to stick with the KDE, then you could find the area under
>each of the kernels for the range you are interested in (need to work
>out the standard deviation used from the bandwidth, then use pnorm for
>the default gaussian kernel), then just sum the individual areas.
>
>Hope this helps,

Thanks a lot for your quick help! I think I will follow your first suggestion (logspline density estimation) rather than summing over the kernel areas, because kernels near the boundaries of the range lie only partly inside it (truncated kernel areas), so I think it is easier to do it with logsplines.

Thanks again for your help!!

Pedro
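For the logspline route Pedro chose, the calls look roughly like this. A sketch assuming the logspline package is installed from CRAN and that the default fitting options of logspline() are acceptable for this small sample; the variable names are illustrative.

```r
# install.packages("logspline")  # if not already installed
library(logspline)

x <- c(2, 1, 3, 2, 3, 0, 4, 5, 10, 11, 12, 11, 10)
fit <- logspline(x)   # default knot selection and unbounded support

# P(0 < X < 3) under the fitted logspline density
p_logspline <- plogspline(3, fit) - plogspline(0, fit)
p_logspline

# Density values and quantiles come from the same fit
dlogspline(c(0, 3), fit)
qlogspline(c(0.25, 0.5, 0.75), fit)
```

The result is a different estimator from the KDE, so it need not match the kernel-based probabilities elsewhere in the thread.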
>In mathematical terms the optimal bandwidth for density estimation
>decreases at rate n^{-1/5}, while the one for distribution function
>decreases at rate n^{-1/3}, if n is the sample size. In practical terms,
>one must choose an appreciably smaller bandwidth in the second case
>than in the first one.

Thanks a lot for your remark! I was not aware that the optimal bandwidths for the density and the distribution function do not decrease at the same rate.

>Besides the computational aspect, there is a statistical one:
>the optimal choice of bandwidth for estimating the density function
>is not optimal (and possibly not even just sensible) for estimating
>the distribution function, and the stated problem is equivalent to
>estimation of the distribution function.

The interval "0<x<3" was only an example; in fact I would like to estimate the probabilities for intervals such as "0<=x<1", "1<=x<2", "2<=x<3", "3<=x<4", ... and compare them with the estimates of a corresponding histogram. In this case the stated problem is no longer equivalent to the estimation of the distribution function. What do you think: can I go ahead in this case with the optimal bandwidth for the density?

Thanks a lot for your help!

Best wishes
Pedro

>best wishes,
>
>Adelchi
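The bin-by-bin comparison with a histogram can be sketched as below. Each KDE bin probability is the difference of the estimated distribution function at the bin edges, F(b) - F(a), computed exactly for the default Gaussian kernel. The unit-width breaks 0:13, the helper kde_bin_probs(), and the halved bandwidth (one arbitrary way to try a smaller bandwidth, in the spirit of Adelchi's remark) are illustrative assumptions, not a recommended bandwidth rule.

```r
x <- c(2, 1, 3, 2, 3, 0, 4, 5, 10, 11, 12, 11, 10)
breaks <- 0:13    # unit-width bins [0,1), [1,2), ..., [12,13]

# Histogram relative frequencies; right = FALSE gives left-closed bins
h <- hist(x, breaks = breaks, right = FALSE, plot = FALSE)
p_hist <- h$counts / length(x)

# KDE bin probabilities F(b) - F(a) for a Gaussian kernel with sd = bw
kde_bin_probs <- function(data, bw, breaks) {
  Fhat <- sapply(breaks, function(t) mean(pnorm(t, data, bw)))
  diff(Fhat)
}

bw_density  <- bw.nrd0(x)                                  # density() default
p_kde       <- kde_bin_probs(x, bw_density, breaks)
p_kde_small <- kde_bin_probs(x, bw_density / 2, breaks)   # arbitrary smaller bw

round(cbind(lower = head(breaks, -1), hist = p_hist,
            kde = p_kde, kde_half_bw = p_kde_small), 3)
```

The KDE columns do not sum to 1, because some kernel mass lies outside [0, 13].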