Hi folks,

I know that the density() function gives an estimated density for a given dataset. From that estimate I want the percentage of data falling in a certain range. For example:

> y = density(c(-20,rep(0,98),20))
> plot(y, xlim=c(-4,4))

Suppose I want to know the percentage of data lying in (-20,2). Basically it should be the area under the curve from -20 to 2. Does anybody know a simple function to do it?

Thanks,

D.
On 28/01/12 11:44, Duke wrote:
> Now if I want to know the percentage of data lying in (-20,2).
> Basically it should be the area of the curve from -20 to 2. Anybody
> knows a simple function to do it?

You could try:

    foo <- with(y, splinefun(x, y))
    integrate(foo, lower=-20, upper=2)

Note that

    integrate(foo, lower=min(y$x), upper=max(y$x))

yields "1.000951 with absolute error < 0.00011", rather than giving exactly 1, so there's a bit of slop in the system.

cheers,

Rolf Turner
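For reference, the suggestion above can be written as a self-contained sketch. The names v, d, f, area, total and p_spline are made up for this illustration. Dividing by the total area is an added assumption, not part of the reply, meant to take out the small amount of slop mentioned there; subdivisions = 1000L is only a precaution.

```r
# Sample data from the original question
v <- c(-20, rep(0, 98), 20)

# Kernel density estimate on density()'s default grid
d <- density(v)

# Spline interpolation of the gridded estimate, as in the reply above
f <- splinefun(d$x, d$y)

# Area under the interpolated curve over (-20, 2)
area <- integrate(f, lower = -20, upper = 2, subdivisions = 1000L)$value

# Area over the whole grid; close to, but not exactly, 1
total <- integrate(f, lower = min(d$x), upper = max(d$x),
                   subdivisions = 1000L)$value

# Assumption: renormalise so the result is a proper proportion
p_spline <- area / total
p_spline
```

The result depends on the bandwidth that density() picked, so it is an estimate under that particular smoothing rather than a count of the data.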
If you use logspline estimation (logspline package) instead of kernel density estimation, this is simple, as there are cumulative area functions for logspline fits.

If you need to do this with kernel density estimates, you can find the area over your region for the kernel centered at each data point and average those values to get the area under the entire density estimate over that region.

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Duke
Sent: Friday, January 27, 2012 3:45 PM
To: r-help at r-project.org
Subject: [R] percentage from density()

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
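The kernel-averaging idea in the message above might look like the following sketch. It assumes the default Gaussian kernel, for which density() reports bw as the standard deviation of the kernel, so the area of one kernel over (a, b) is a difference of two pnorm() values. kde_prob is a hypothetical helper name, not a function from any package.

```r
# Sample data from the original question
v <- c(-20, rep(0, 98), 20)

# Bandwidth chosen by density(); assumption: default Gaussian kernel
bw <- density(v)$bw

# Hypothetical helper: area over (a, b) of the kernel centred at each
# data point, averaged over all data points
kde_prob <- function(a, b, x, bw) {
  mean(pnorm(b, mean = x, sd = bw) - pnorm(a, mean = x, sd = bw))
}

p_kernel <- kde_prob(-20, 2, v, bw)
p_kernel
```

Because each kernel is integrated exactly, this avoids the gridding and interpolation error of integrating the output of density() numerically, and kde_prob(-Inf, Inf, v, bw) is 1.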
If v is your original data,

    v <- c(-20, rep(0,98), 20)

why not use

    mean(-20 < v & v < 2)

as your estimate of the probability that v is in (-20,2)?

Estimating a density is like taking the derivative of a smooth of the empirical distribution function, so why not eliminate the middleman instead of integrating the estimated density? Any difference between the two methods tells more about the smoothing used than about the data involved. (Not that I am any sort of expert in this matter.)

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Greg Snow
> Sent: Saturday, January 28, 2012 8:12 PM
> To: Duke; r-help at r-project.org
> Subject: Re: [R] percentage from density()
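To illustrate the point about smoothing, here is a small sketch that compares the empirical proportion with a Gaussian kernel estimate at a few bandwidths. The helper p_kde and the bandwidth values in bws are arbitrary choices for this illustration, not anything taken from the thread.

```r
# Sample data from the original question
v <- c(-20, rep(0, 98), 20)

# Empirical proportion of points strictly inside (-20, 2):
# the 98 zeros qualify, the points at -20 and 20 do not
p_emp <- mean(-20 < v & v < 2)

# Hypothetical helper: the same interval under a Gaussian kernel
# estimate with bandwidth bw (average of per-point kernel areas)
p_kde <- function(bw) mean(pnorm(2, v, bw) - pnorm(-20, v, bw))

# Arbitrary bandwidths, from very little to heavy smoothing
bws <- c(0.1, 1, 5)
p_by_bw <- sapply(bws, p_kde)

p_emp
rbind(bw = bws, p = p_by_bw)
```

The empirical value stays put while the kernel value drifts as the bandwidth grows. Even at the smallest bandwidth the two need not agree, since half of the kernel centred on the boundary point -20 falls inside the interval.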
Great suggestions and comments, Bill, Greg and Rolf. You gave me some valuable ways to deal with the data I am working with. Thank you all so much!

Bests,

D.

On 1/29/12 4:03 PM, William Dunlap wrote:
> If v is your original data,
>    v <- c(-20, rep(0,98), 20)
> why not use
>    mean(-20 < v & v < 2)
> as your estimate of the probability that v is in (-20,2)?