>>>>> "RV" == Ravi Varadhan <RVaradhan at
jhmi.edu>
>>>>> on Thu, 2 Jul 2009 10:51:03 -0400 writes:
RV> Dear Martin, I have been playing a lot with the
RV> glkerns() function in the "lokern" package for
RV> "automatic" smoothing of time-series data. This kernel
RV> smoothing approach of Gasser and Mueller seems to
RV> perform quite well for estimating the function and its
RV> derivatives (first and second derivatives). In fact,
RV> this is one of the best methods based on my simulation
RV> studies for comparing a number of "automatic" bandwidth
RV> selection methods.
That's quite interesting!
What methods did you use in your comparison?
[I assume you did use smooth.spline() as well]
RV> I am interested in applying this to
RV> automatic smoothing and feature extraction for a "large"
RV> number (thousands) of time series, with hundreds of
RV> points per time series. This is where I am interested
RV> in seeing if the efficiency of "glkerns" can be
RV> improved.
RV> Here follows my specific question:
RV> You have to call glkerns() separately each time to
RV> compute the function and its derivatives, ie. if I want
RV> the function and its first 2 derivatives, I have to make
RV> 3 calls to glkerns().
Yes. Note however that each of these calls chooses a different
kernel order ('korder') by default { korder <- deriv + 2 },
and uses *different* automatic bandwidths depending on both
deriv and korder.
Consequently, the result of glkerns(x,y, deriv=1) is
*not* the derivative of the estimate glkerns(x,y, deriv=0)
but rather an independent estimate of the derivative of the
unknown g(). The same applies for deriv=2 etc.
But the problem is even "worse": Let's assume you get the
"optimal" global
bandwidth estimate 'bandwidth' = h from the deriv=0 case.
Then you could set this bandwidth explicitly for the deriv=1 and
deriv=2 calls {and you'd gain quite a bit of execution time !}.
*HOWEVER*, as the internal codes absolutely require
k_{ord} - nu == korder - nu to be an *even* number,
you can *not* keep 'korder' fixed together with 'bandwidth'.
But if you change 'korder', the kernels change and the meaning
of the bandwidth has changed too.
I have just uploaded version 1.0-8 of package 'lokern' to CRAN,
and this now contains a demo("gkl-derivs") which defines a utility
function
to play a bit with this (keeping bandwidth fixed).
RV> This seems to me to be inefficient, especially for large
RV> time series. In smooth.spline(), for example, you can
RV> call it once to get the fit, and then use the fitted
RV> object to compute the derivatives using predict().
Note that smooth.spline() is very different:
If you use deriv=1 or deriv=2 in the predict() call,
you get exactly the 1st or 2nd derivative of the same smoothing
spline function.
RV> Is it possible to have a similar feature in glkerns?
As I have explained, this is not easily possible currently
more for conceptual reasons than others.
In principle it should be possible however,
but that would both need some "theoretical" work
{maybe just collecting the papers and formulae used} and
then some Fortran code digging and shuffling.
Would be a nice "semester work" for a student...
Martin
--
Martin <Maechler at stat.math.ethz.ch>
http://stat.ethz.ch/people/maechler
Seminar f?r Statistik, ETH Z?rich HG G 16 R?mistrasse 101
CH-8092 Zurich, SWITZERLAND
phone: +41-44-632-3408 fax: ...-1228 <><
RV> Thanks for any suggestions.
RV> Ravi.
RV>
----------------------------------------------------------------------------
RV> -------
RV> Ravi Varadhan, Ph.D.
RV> Assistant Professor, The Center on Aging and Health
RV> Division of Geriatric Medicine and Gerontology
RV> Johns Hopkins University
RV> Ph: (410) 502-2619
RV> Fax: (410) 614-9625
RV> Email: rvaradhan at jhmi.edu
RV> Webpage:
RV>
http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.h
RV> tml
RV>
----------------------------------------------------------------------------
RV> --------
RV> [[alternative HTML version deleted]]
RV> ______________________________________________
RV> R-help at r-project.org mailing list
RV> https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
RV> read the posting guide
RV> http://www.R-project.org/posting-guide.html and provide
RV> commented, minimal, self-contained, reproducible code.