On Thu, 11 Dec 2008 14:28:31 +0100, Viktor Nagy wrote:
VN> Hi,
VN>
VN> I've estimated a simple kernel density of a univariate variable with
VN> density(), but afterwards I would like to find out the CDF at specific
VN> values.
VN> How can I do it?
VN>
Answer 1.
Use approxfun() to interpolate the output of density() and then
apply integrate() to the interpolant. The following lines show a
*crude* coding of this idea:
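R> x <- rnorm(200)
R> pdf <- density(x)
R> f <- approxfun(pdf$x, pdf$y, yleft=0, yright=0)
R> # f is zero outside range(pdf$x), so start there rather than at -Inf
R> cdf <- integrate(f, min(pdf$x), 2)  # replace '2' by any other value
R> cdf$value                           # the estimated CDF at 2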
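If the CDF is needed at several points, the same idea can be wrapped in
a small function. This is only a sketch: the helper name cdf_at, the
seed and the evaluation points are made up for illustration, and the
larger 'subdivisions' value is a precaution for the piecewise-linear
integrand, not something the method requires.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
x   <- rnorm(200)
pdf <- density(x)
f   <- approxfun(pdf$x, pdf$y, yleft = 0, yright = 0)

# Hypothetical helper: integrate the interpolated density up to q.
# The lower limit is the left end of the density() grid, below which
# f is zero, so a finite integration range is enough.
cdf_at <- function(q) {
  lo <- min(pdf$x)
  hi <- min(q, max(pdf$x))       # nothing to add beyond the grid
  if (hi <= lo) return(0)
  integrate(f, lo, hi, subdivisions = 1000L)$value
}

q <- c(-1, 0, 1, 2)
est <- sapply(q, cdf_at)
round(est, 3)                    # estimated CDF at the chosen points
round(pnorm(q), 3)               # true N(0,1) CDF, for comparison
```

Values of q far to the right give a result close to, but not exactly,
1, because density() evaluates the estimate on a finite grid.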
Answer 2.
Do not integrate the estimated density, since this is not the most
efficient estimate of the underlying CDF. Instead, smooth the empirical
distribution function, using a smaller bandwidth for the kernel. The
optimal bandwidth for kernel density estimation is of order O(n^{-1/5}),
while for CDF estimation it is O(n^{-1/3}), where n denotes the sample
size. In practical terms you can still use density(), as indicated
above, but with a suitably smaller bandwidth than the one you would
choose for density estimation.
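As a rough illustration of this second route, with a Gaussian kernel
the smoothed empirical distribution function has a closed form: it is
the average of pnorm((q - x_i)/h) over the sample, which is what exact
integration of a Gaussian kernel density estimate with bandwidth h
would give. In the sketch below only the rate n^(-1/3) comes from the
argument above; the constant sd(x) in the bandwidth and the name
cdf_smooth are assumptions made for the example, not a recommended
bandwidth selector.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
x <- rnorm(200)
n <- length(x)

# Assumed bandwidth: the n^(-1/3) rate scaled by the sample sd.
# The multiplying constant is an arbitrary illustrative choice.
h <- sd(x) * n^(-1/3)

# Kernel-smoothed empirical CDF (Gaussian kernel):
# F_hat(q) = mean over i of pnorm((q - x_i) / h)
cdf_smooth <- function(q) {
  sapply(q, function(qq) mean(pnorm((qq - x) / h)))
}

q <- c(-1, 0, 1, 2)
round(cbind(q,
            smooth = cdf_smooth(q),   # smoothed estimate
            ecdf   = ecdf(x)(q),      # raw empirical CDF
            true   = pnorm(q)), 3)    # true N(0,1) CDF, for comparison
```

For comparison, bw.nrd0(x), the default bandwidth of density(), shrinks
at the slower n^(-1/5) rate.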
Best wishes
Adelchi Azzalini
--
Adelchi Azzalini <azzalini at stat.unipd.it>
Dipart.Scienze Statistiche, Università di Padova, Italia
tel. +39 049 8274147, http://azzalini.stat.unipd.it/