Hello,
I was hoping that someone familiar with the implementation details of the
loess algorithm might be able to help me resolve some difficulties I am
having. I am attempting to reproduce some of the functionality of the
loess() function in C++. My primary motivation is that I would like to
understand the algorithm in detail.
So far I have managed to create a working port in C++ for the linear
(degree=1) case. I use the GNU Scientific Library (GSL) to perform the
least squares fitting. The program I wrote can be viewed here:
http://pastebin.com/0mhMiZee
The test data I am using comes from here:
http://www.itl.nist.gov/div898/handbook/pmd/section1/dep/dep144.htm
After reading Cleveland 1992 paper: A Package of C and Fortran Routines for
Fitting Local Regression Models, I was under the impression that in
principle the quadratic (degree=2) case would be straight forward, simply
fit y ~ x + I(x^2) instead of y ~ x. I realize that Cleveland goes to some
effort to optimize the computation by using kd-trees and interpolation to
reduce the calculations, but I believe that the proper options to the R
loess.control() function can disable that.
When I attempt to fit the quadratic in the simple way I describe above, I
get results that are very close to R but well outside of any floating point
rounding error and signals something more serious.
The test program I have written in C++ for the quadratic case can be viewed
here: http://pastebin.com/8LZSFSyg
Here is some working R code that performs the quadratic loess fit:
x <- c(0.5578196, 2.0217271, 2.5773252, 3.4140288, 4.3014084, 4.7448394,
5.1073781, 6.5411662, 6.7216176, 7.2600583, 8.1335874, 9.1224379,
11.9296663, 12.3797674, 13.2728619, 14.2767453, 15.3731026, 15.6476637,
18.5605355, 18.5866354, 18.7572812)
y <- c(18.63654, 103.49646, 150.35391, 190.51031, 208.70115, 213.71135,
228.49353, 233.55387, 234.55054, 223.89225, 227.68339, 223.91982, 168.01999,
164.95750, 152.61107, 160.78742, 168.55567, 152.42658, 221.70702, 222.69040,
243.18828)
df <- data.frame(x=x, y=y)
y.loess <- loess(y ~ x, span=0.33, degree=2,
control=loess.control(surface="direct", statistics="exact",
trace.hat="exact"), data=df)
y.predict <- predict(y.loess)
And the output:
> y.predict
[1] 17.7138 111.3904 147.5650 190.6561 209.4832 217.5014 225.5488 233.8617
231.7305 226.7886 224.7809 224.1512
[13] 167.6779 162.0389 154.9118 159.9907 161.1376 157.1839 221.0499 223.8055
242.7856
My C++ program for the quadratic case mentioned above produces the following
output:
X Y SMOOTHED
0.557820 18.636540 17.128316
2.021727 103.496460 113.404389
2.577325 150.353910 145.509245
3.414029 190.510310 188.070299
4.301408 208.701150 209.587060
4.744839 213.711350 217.841596
5.107378 228.493530 223.961597
6.541166 233.553870 232.331387
6.721618 234.550540 231.725151
7.260058 223.892250 229.100556
8.133587 227.683390 225.721553
9.122438 223.919820 222.121049
11.929666 168.019990 167.448082
12.379767 164.957500 162.050603
13.272862 152.611070 158.478596
14.276745 160.787420 157.644046
15.373103 168.555670 161.076443
15.647664 152.426580 161.123333
18.560536 221.707020 225.721022
18.586635 222.690400 226.986782
18.757281 243.188280 235.647504
If you are still reading, thanks!
I have tried poking around in the R source code for the R_loess_raw
(loessc.c) method, but I find it quite dense and hard to follow... if anyone
has some debugging ideas, they would be greatly appreciated.
Thank you for your attention,
Scott
[[alternative HTML version deleted]]