thr3ads.net - R help - [R] lokern package [Jul 2009]

If this information is useful, please help other people find it:
Share via:

Ravi Varadhan

2009-Jul-02 14:51 UTC

[R] lokern package

Dear Martin,
 
I have been playing a lot with the glkerns() function in the "lokern"
package for "automatic" smoothing of time-series data.  This kernel
smoothing approach of Gasser and Mueller seems to perform quite well for
estimating the function and its derivatives (first and second derivatives).
In fact, this is one of the best methods based on my simulation studies for
comparing a number of "automatic" bandwidth selection methods.  I am
interested in applying this to automatic smoothing and feature extraction
for a "large" number (thousands) of time series, with hundreds of
points per
time series.  This is where I am interested in seeing if the efficiency of
"glkerns" can be improved.  
 
Here follows my specific question:
 
You have to call glkerns() separately each time to compute the function and
its derivatives, ie. if I want the function and its first 2 derivatives, I
have to make 3 calls to glkerns().  This seems to me to be inefficient,
especially for large time series.  In smooth.spline(), for example, you can
call it once to get the fit, and then use the fitted object to compute the
derivatives using predict().  Is it possible to have a similar feature in
glkerns? 
 
Thanks for any suggestions.
 
Ravi.
 
----------------------------------------------------------------------------
-------

Ravi Varadhan, Ph.D.

Assistant Professor, The Center on Aging and Health

Division of Geriatric Medicine and Gerontology 

Johns Hopkins University

Ph: (410) 502-2619

Fax: (410) 614-9625

Email: rvaradhan@jhmi.edu

Webpage:
http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.h
tml

 

----------------------------------------------------------------------------
--------

 

	[[alternative HTML version deleted]]

Martin Maechler

2009-Jul-02 16:34 UTC

head link

[R] lokern package

>>>>> "RV" == Ravi Varadhan <RVaradhan at
jhmi.edu>
>>>>>     on Thu, 2 Jul 2009 10:51:03 -0400 writes:
    RV> Dear Martin, I have been playing a lot with the
    RV> glkerns() function in the "lokern" package for
    RV> "automatic" smoothing of time-series data.  This kernel
    RV> smoothing approach of Gasser and Mueller seems to
    RV> perform quite well for estimating the function and its
    RV> derivatives (first and second derivatives).  In fact,
    RV> this is one of the best methods based on my simulation
    RV> studies for comparing a number of "automatic" bandwidth
    RV> selection methods.  

That's quite interesting! 
What methods did you use in your comparison?
[I assume you did use  smooth.spline() as well]

    RV> I am interested in applying this to
    RV> automatic smoothing and feature extraction for a "large"
    RV> number (thousands) of time series, with hundreds of
    RV> points per time series.  This is where I am interested
    RV> in seeing if the efficiency of "glkerns" can be
    RV> improved.

    RV> Here follows my specific question:

    RV> You have to call glkerns() separately each time to
    RV> compute the function and its derivatives, ie. if I want
    RV> the function and its first 2 derivatives, I have to make
    RV> 3 calls to glkerns().  

Yes.  Note however that each of these calls chooses a different
kernel order ('korder') by default { korder <- deriv + 2 },
and uses *different* automatic bandwidths depending on both
deriv and korder.

Consequently, the result of	     glkerns(x,y, deriv=1)  is 
*not* the derivative of the estimate glkerns(x,y, deriv=0)
but rather an independent estimate of the derivative of the
unknown g(). The same applies for deriv=2 etc.

But the problem is even "worse": Let's assume you get the
"optimal" global
bandwidth estimate 'bandwidth' = h  from the deriv=0 case.
Then you could set this bandwidth explicitly for the deriv=1 and
deriv=2 calls {and you'd gain quite a bit of execution time !}.
*HOWEVER*, as the internal codes absolutely require

    k_{ord} - nu  ==  korder - nu  to be an *even* number,

you can *not* keep 'korder' fixed together with 'bandwidth'.
But if you change  'korder', the kernels change and the meaning
of the bandwidth has changed too.

I have just uploaded version 1.0-8 of package 'lokern' to CRAN,
and this now contains a demo("gkl-derivs") which defines a utility
function
to play a bit with this (keeping bandwidth fixed).

    RV> This seems to me to be inefficient, especially for large
    RV> time series.  In smooth.spline(), for example, you can
    RV> call it once to get the fit, and then use the fitted
    RV> object to compute the derivatives using predict().  

Note that smooth.spline() is very different:
If you use deriv=1 or deriv=2 in the predict() call,
you get exactly the 1st or 2nd derivative of the same smoothing
spline function.

    RV> Is it possible to have a similar feature in glkerns?

As I have explained, this is not easily possible currently
more for conceptual reasons than others.

In principle it should be possible however,
but that would both need some "theoretical" work
{maybe just collecting the papers and formulae used} and 
then some Fortran code digging and shuffling.

Would be a nice "semester work" for a student...
Martin

--
Martin <Maechler at stat.math.ethz.ch> 
http://stat.ethz.ch/people/maechler
Seminar f?r Statistik, ETH Z?rich  HG G 16      R?mistrasse 101
CH-8092 Zurich, SWITZERLAND
phone: +41-44-632-3408       fax: ...-1228      <><

    RV> Thanks for any suggestions.

    RV> Ravi.
    RV>
----------------------------------------------------------------------------
    RV> -------

    RV> Ravi Varadhan, Ph.D.

    RV> Assistant Professor, The Center on Aging and Health

    RV> Division of Geriatric Medicine and Gerontology

    RV> Johns Hopkins University

    RV> Ph: (410) 502-2619

    RV> Fax: (410) 614-9625

    RV> Email: rvaradhan at jhmi.edu

    RV> Webpage:
    RV>
http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.h
    RV> tml

    RV>
----------------------------------------------------------------------------
    RV> --------

    RV> 	[[alternative HTML version deleted]]

    RV> ______________________________________________
    RV> R-help at r-project.org mailing list
    RV> https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
    RV> read the posting guide
    RV> http://www.R-project.org/posting-guide.html and provide
    RV> commented, minimal, self-contained, reproducible code.

Apparently Analagous Threads

Search for more seemingly similar threads

R help - Jul 2009 - lokern package

[R] lokern package

[R] lokern package

Apparently Analagous Threads