Hello,

I need help with the details of the loess prediction algorithm. I would like to get it implemented as part of a measurement system programmed in LabView. My job is to provide a detailed description of the algorithm. This is a simple one-dimensional problem - smoothing an (x, y) data set.

I found quite a detailed description of the fitting procedure in the "white book". It is also described in great detail at the NIST site in the Engineering Statistics Handbook, which provides an example of loess computations. I managed to reproduce their example exactly in R. At each data point I compute a weighted local linear fit using a number of points based on the span. Then I predict the values from these local fits. This matches R "loess" predictions exactly.

The problem starts when I try to predict at x values not in the data set. The "white book" does not talk about predictions at all. In the NIST handbook, in the "Final note on Loess Computations", they mention this type of prediction but just say that the same steps are used for predictions as for fitting. When I try to use "the same steps" I get predictions that are quite different from the predictions obtained by fitting an R loess model to a data set and running predict(<model object>, newdata=<grid of x values>). They match quite well at the lowest and highest ends of the x grid, but in the middle they are different and much less smooth. I can provide details, but basically what I do to create the prediction at x0 is this:

1. I append c(x0, NA) to the data frame of (x, y) data.
2. I calculate abs(xi - x0), i.e., the absolute deviations between the x values in the data set and the given x0 value.
3. I sort the data set according to these deviations. This way the first row has the (x0, NA) value.
4. I drop the first row.
5. I divide all the deviations by the m-th one, where m is the number of points used in local fitting - m = floor(n*span), where n is the number of points.
6. I calculate the "tricube" weights and assign 0's to the negative ones. This eliminates all the points except the m points of interest.
7. I fit a weighted linear regression using lm.
8. I predict y(x0) from this linear model.

This is basically the same procedure I use to predict at the x values from the data set, except for point 4 (a sketch of these steps in R follows below).

I got the R sources for loess, but it looks to me like most of the work is done in a bunch of Fortran modules. They are very difficult to read and understand, especially since they handle multiple x values. A couple of things that worry me are parameters in loess.control such as surface and cell. They seem to have something to do with predictions, but I do not account for them in my simple procedure above.

Could anyone shed some light on this problem? Any comment will be appreciated.

I apologize in advance if this should have been posted in r-help. I figured that I have a better chance asking people who read the r-devel group, since they are likely to know more details about the inner workings of R.

Thanks in advance,

Andy

__________________________________
Andy Jaworski
518-1-01
Process Laboratory
3M Corporate Research Laboratory
-----
E-mail: apjaworski at mmm.com
Tel: (651) 733-6092
Fax: (651) 736-3122
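For concreteness, here is a minimal R sketch of steps 1-8 above. The function name is illustrative, and the (x0, NA) append-and-drop bookkeeping of steps 1, 3, and 4 is folded into computing the distances directly, which selects the same m nearest points:

    # Hand-rolled loess-style prediction at a single x0, assuming
    # degree-1 local fits and span given as a fraction of n.
    loess_predict_x0 <- function(x, y, x0, span = 0.75) {
      n <- length(x)
      m <- floor(n * span)              # points used in the local fit
      d <- abs(x - x0)                  # step 2: absolute deviations
      o <- order(d)                     # step 3: sort by distance to x0
      d <- d[o]; xs <- x[o]; ys <- y[o]
      u <- d / d[m]                     # step 5: scale by the m-th deviation
      w <- (1 - u^3)^3                  # step 6: tricube weights ...
      w[w < 0] <- 0                     # ... negative values set to 0
      fit <- lm(ys ~ xs, weights = w)   # step 7: weighted linear fit
      unname(predict(fit, newdata = data.frame(xs = x0)))  # step 8
    }
    # e.g. loess_predict_x0(dat$x, dat$y, x0 = 0.5)   # dat is a placeholder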
The R interface is just a wrapper for those Netlib C/Fortran functions. I don't think anyone is going to be able (or willing) to read and explain those for you. You do need to understand the loess.control parameters, and I believe they are explained in the White Book. But perhaps you should use the simplest options in R as a baseline.

I don't believe your sketchy description of tricube weights is correct: the White Book has the details. The default degree is 2, not linear fits.

On Wed, 25 Jul 2007, apjaworski at mmm.com wrote:

> Hello,
>
> I need help with the details of loess prediction algorithm. [...]

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel: +44 1865 272861 (self)
1 South Parks Road,                    +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax: +44 1865 272595
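A minimal sketch of the "simplest options" baseline suggested above, assuming degree-1 local fits, the default gaussian family, and exact rather than interpolated evaluation; the data frame name dat and the prediction grid are placeholders:

    # Baseline loess call: degree = 1 so the local fits are linear, and
    # surface = "direct" so predictions are exact local fits rather than
    # interpolated from a k-d tree.
    fit <- loess(y ~ x, data = dat, span = 0.75, degree = 1,
                 family = "gaussian",
                 control = loess.control(surface = "direct"))
    newx <- data.frame(x = seq(min(dat$x), max(dat$x), length.out = 101))
    p <- predict(fit, newdata = newx)

With these options a hand computation like the one described in the original post should be directly comparable to predict().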
Andy,

If you could provide an example of the R code with which you call loess(), I can post R code which will duplicate what predict.loess does without having to call the C/Fortran. There are a lot of implementation details that are easy to overlook, but without knowing the arguments to your call it is difficult to guess the source of your problem.

Ben
The loess.demo function in the TeachingDemos package may help you to understand better what is happening (both running the demo and looking at the code).

One common reason why predictions from the loess function and hand-computed predictions don't match is that the loess function does an additional smoothing step by default. The demo shows both curves (with the additional smoothing and without) so you can see how close they are and how the smoothness differs.

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111

> -----Original Message-----
> From: r-devel-bounces at r-project.org
> [mailto:r-devel-bounces at r-project.org] On Behalf Of apjaworski at mmm.com
> Sent: Wednesday, July 25, 2007 4:57 PM
> To: r-devel at r-project.org
> Subject: [Rd] loess prediction algorithm
>
> Hello,
>
> I need help with the details of loess prediction algorithm. [...]
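A sketch of the point above, on synthetic data: by default loess computes exact local fits only at vertices of a k-d tree partition (its fineness governed by the cell parameter in loess.control) and interpolates between them, while surface = "direct" computes an exact local fit at every prediction point. The comparison below, and the loess.demo call, assume current argument defaults:

    # Compare interpolated (default) and direct loess predictions.
    # The difference is typically small but nonzero; the interpolation
    # step is what makes the default predictions smoother than
    # pointwise hand-computed local fits.
    set.seed(1)
    x <- sort(runif(60))
    y <- sin(2 * pi * x) + rnorm(60, sd = 0.2)
    grid <- data.frame(x = seq(min(x), max(x), length.out = 201))
    f_int <- loess(y ~ x, degree = 1)   # default: surface = "interpolate"
    f_dir <- loess(y ~ x, degree = 1,
                   control = loess.control(surface = "direct"))
    range(predict(f_int, newdata = grid) - predict(f_dir, newdata = grid))
    # For an interactive view of the same idea:
    # library(TeachingDemos); loess.demo(x, y)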