Dear R listers, When using smooth.spline to interpolate data, results are generally good. However, some cases produce totally unreasonable results. The data are values of pressure, temperature, and salinity from a probe that is lowered into the ocean, and the objective is to interpolate temperature and salinity to specified pressures. While smooth.spline provides excellent values at the observed pressures, there are cases when the values at the desired pressures are unusable. A dataframe with four such profiles, indicated by values of id, is attached. My target values for pressure are seq(25,1600,25), but 1:500 is also interesting. Setting all.knots = TRUE helps, but it would be nice to be able to do better. Any suggestions? Thanks, Carlisle> version_ platform sparc-sun-solaris2.9 arch sparc os solaris2.9 system sparc, solaris2.9 status major 1 minor 8.0 year 2003 month 10 day 08 language R -- William Carlisle Thacker Atlantic Oceanographic and Meteorological Laboratory 4301 Rickenbacker Causeway, Miami, Florida 33149 USA Office: (305) 361-4323 Fax: (305) 361-4392 "Too many have dispensed with generosity in order to practice charity." Albert Camus
If you really want interpolation, should you be using spline() rather than smooth.spline()? The later is for smoothing data observed with noise, not for interpolation. Andy> From: W. C. Thacker > > Dear R listers, > > When using smooth.spline to interpolate data, results are generally > good. However, some cases produce totally unreasonable results. > > The data are values of pressure, temperature, and salinity from a > probe that is lowered into the ocean, and the objective is to > interpolate temperature and salinity to specified pressures. While > smooth.spline provides excellent values at the observed pressures, > there are cases when the values at the desired pressures are > unusable. A dataframe with four such profiles, indicated by values of > id, is attached. My target values for pressure are seq(25,1600,25), > but 1:500 is also interesting. > > Setting all.knots = TRUE helps, but it would be nice to be able to do > better. > > Any suggestions? > > Thanks, > > Carlisle > > > version > _ > platform sparc-sun-solaris2.9 > arch sparc > os solaris2.9 > system sparc, solaris2.9 > status > major 1 > minor 8.0 > year 2003 > month 10 > day 08 > language R > > > -- > > William Carlisle Thacker > > Atlantic Oceanographic and Meteorological Laboratory > 4301 Rickenbacker Causeway, Miami, Florida 33149 USA > Office: (305) 361-4323 Fax: (305) 361-4392 > > "Too many have dispensed with generosity > in order to practice charity." Albert Camus > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > >------------------------------------------------------------------------------ Notice: This e-mail message, together with any attachments,...{{dropped}}
Andy, As the data are often noisy, smoothing splines should be appropriate. The first example profile shows an isothermal (constant temperature) layer in the upper ocean followed by a sharp thermocline (large temperature gradient), but there are relatively few observations defining this sharp transition. In this case simple linear interpolation works fine, but smooth.spline() with all defaults gives an absolutely absurd value in the isothermal layer. With all.knots TRUE, the values in the isothermal layer are much better but still peculiar. Given the sampling and the data, is it possible to get smooth.spline() do better? If so, would that adversely impact its performance for other cases? (There are thousands of profiles.) If not, is there a simp[le way to select cases that smooth.spline() should not be expected to handle, so they can be treated separately? Thanks, Carlisle "Liaw, Andy" wrote:> > If you really want interpolation, should you be using spline() rather than > smooth.spline()? The later is for smoothing data observed with noise, not > for interpolation. > > Andy > > > From: W. C. Thacker > > > > Dear R listers, > > > > When using smooth.spline to interpolate data, results are generally > > good. However, some cases produce totally unreasonable results. > > > > The data are values of pressure, temperature, and salinity from a > > probe that is lowered into the ocean, and the objective is to > > interpolate temperature and salinity to specified pressures. While > > smooth.spline provides excellent values at the observed pressures, > > there are cases when the values at the desired pressures are > > unusable. A dataframe with four such profiles, indicated by values of > > id, is attached. My target values for pressure are seq(25,1600,25), > > but 1:500 is also interesting. > > > > Setting all.knots = TRUE helps, but it would be nice to be able to do > > better. > > > > Any suggestions? > > > > Thanks, > > > > Carlisle > > > > > version > > _ > > platform sparc-sun-solaris2.9 > > arch sparc > > os solaris2.9 > > system sparc, solaris2.9 > > status > > major 1 > > minor 8.0 > > year 2003 > > month 10 > > day 08 > > language R > > > > > > -- > > > > William Carlisle Thacker > > > > Atlantic Oceanographic and Meteorological Laboratory > > 4301 Rickenbacker Causeway, Miami, Florida 33149 USA > > Office: (305) 361-4323 Fax: (305) 361-4392 > > > > "Too many have dispensed with generosity > > in order to practice charity." Albert Camus > > > > ______________________________________________ > > R-help at stat.math.ethz.ch mailing list > > https://www.stat.math.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide! > > http://www.R-project.org/posting-guide.html > > > > > > ------------------------------------------------------------------------------ > Notice: This e-mail message, together with any attachments,...{{dropped}} > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html-- William Carlisle Thacker Atlantic Oceanographic and Meteorological Laboratory 4301 Rickenbacker Causeway, Miami, Florida 33149 USA Office: (305) 361-4323 Fax: (305) 361-4392 "Too many have dispensed with generosity in order to practice charity." Albert Camus
Hi Carlisle, If I understand you correctly, the problem is smooth.spline() not handling sharp jump(s), right? If so, it's probably easier to try something that can handle such features. Wavelet `denoising' (as opposed to `smoothing', and available in the wavethresh package) is well known for being able to handle abrupt changes (very `spatially adaptive'). Other things you might consider are mars() in the `mda' package (which fits splines in an adaptive fashion) and locfit() in the `locfit' package. For locfit, you will want to specify local smoothing parameter selection, via a call like locfit(..., alpha=c(0, 0, 2), acri="cp") You might need to play with the `2' a bit to get the right amount of smoothing. The details are in Loader's book `Local regression and Likelihood'. HTH, Andy> From: W. C. Thacker > > Andy, > > As the data are often noisy, smoothing splines should be appropriate. > > The first example profile shows an isothermal (constant temperature) > layer in the upper ocean followed by a sharp thermocline (large > temperature gradient), but there are relatively few observations > defining this sharp transition. In this case simple linear > interpolation works fine, but smooth.spline() with all defaults gives > an absolutely absurd value in the isothermal layer. With all.knots > TRUE, the values in the isothermal layer are much better but still > peculiar. > > Given the sampling and the data, is it possible to get smooth.spline() > do better? If so, would that adversely impact its performance for > other cases? (There are thousands of profiles.) If not, is there a > simp[le way to select cases that smooth.spline() should not be > expected to handle, so they can be treated separately? > > Thanks, > > Carlisle > > "Liaw, Andy" wrote: > > > > If you really want interpolation, should you be using > spline() rather than > > smooth.spline()? The later is for smoothing data observed > with noise, not > > for interpolation. > > > > Andy > > > > > From: W. C. Thacker > > > > > > Dear R listers, > > > > > > When using smooth.spline to interpolate data, results are > generally > > > good. However, some cases produce totally unreasonable results. > > > > > > The data are values of pressure, temperature, and salinity from a > > > probe that is lowered into the ocean, and the objective is to > > > interpolate temperature and salinity to specified > pressures. While > > > smooth.spline provides excellent values at the observed pressures, > > > there are cases when the values at the desired pressures are > > > unusable. A dataframe with four such profiles, indicated > by values of > > > id, is attached. My target values for pressure are > seq(25,1600,25), > > > but 1:500 is also interesting. > > > > > > Setting all.knots = TRUE helps, but it would be nice to > be able to do > > > better. > > > > > > Any suggestions? > > > > > > Thanks, > > > > > > Carlisle > > > > > > > version > > > _ > > > platform sparc-sun-solaris2.9 > > > arch sparc > > > os solaris2.9 > > > system sparc, solaris2.9 > > > status > > > major 1 > > > minor 8.0 > > > year 2003 > > > month 10 > > > day 08 > > > language R > > > > > > > > > -- > > > > > > William Carlisle Thacker > > > > > > Atlantic Oceanographic and Meteorological Laboratory > > > 4301 Rickenbacker Causeway, Miami, Florida 33149 USA > > > Office: (305) 361-4323 Fax: (305) 361-4392 > > > > > > "Too many have dispensed with generosity > > > in order to practice charity." Albert Camus > > > > > > ______________________________________________ > > > R-help at stat.math.ethz.ch mailing list > > > https://www.stat.math.ethz.ch/mailman/listinfo/r-help > > > PLEASE do read the posting guide! > > > http://www.R-project.org/posting-guide.html > > > > > > > > > > > -------------------------------------------------------------- > ---------------- > > Notice: This e-mail message, together with any > attachments,...{{dropped}} > > > > ______________________________________________ > > R-help at stat.math.ethz.ch mailing list > > https://www.stat.math.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > > -- > > William > Carlisle Thacker > > Atlantic Oceanographic and Meteorological Laboratory > 4301 Rickenbacker Causeway, Miami, Florida 33149 USA > Office: (305) 361-4323 Fax: (305) 361-4392 > > "Too many have dispensed with generosity > in order to practice charity." Albert Camus > >------------------------------------------------------------------------------ Notice: This e-mail message, together with any attachments,...{{dropped}}
Martin Maechler
2004-Mar-04 15:44 UTC
[R] No attachments for R-help ("need help with smooth.spline")
I hope you see that almost all binary attachments are not allowed (and silently deleted) on R-help, as it clearly says in the posting guide, or directly on http://www.r-project.org/mail.html You should typically use dump(<dataframe>) and paste ( not attach !) the result of that to (the end of) your message {or do something even more readable -- plain text in any case!} With regards from your list maintainer, Martin Maechler <maechler at stat.math.ethz.ch> http://stat.ethz.ch/~maechler/ Seminar fuer Statistik, ETH-Zentrum LEO C16 Leonhardstr. 27 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-1-632-3408 fax: ...-1228 <><
The strange thing (to me, and to Carlisle) is the behavior of predict.smooth.spline() on the data he posted: part of the predictions are way off from the data. If one just plot prediction at the input data, it looks just fine. What am I missing? Andy> From: Roger Koenker [mailto:roger at ysidro.econ.uiuc.edu] > > If one repeats the experiments in Craven and Wahba, the paper that > "invented" GCV you find, or at least I found, when I tried to do this > some years ago, that GCV fails in about 10% > of cases rather catastrophically, and this is a fairly innocuous > setting. So one way to interpret Brian's comment would be that maybe > it is GCV that is failing, and another choice of lambda might > do better. > > Obviously, Brian can interpret for himself. I would only add > that in cases > where you really want something with sharp breaks in derivatives then > then the usual L_2 roughness penalties are not very > appropriate however > you choose to do the smoothing. > > url: www.econ.uiuc.edu/~roger/my.html Roger Koenker > email rkoenker at uiuc.edu Department of Economics > vox: 217-333-4558 University of Illinois > fax: 217-244-6678 > Champaign, IL 61820 > > > >------------------------------------------------------------------------------ Notice: This e-mail message, together with any attachments,...{{dropped}}
roger koenker wrote:> > If one repeats the experiments in Craven and Wahba, the paper that > "invented" GCV you find, or at least I found, when I tried to do this > some years ago, that GCV in about 10% > of cases fails rather catastrophically, and this is a fairly innocuous > setting. So one way to interpret Brian's comment would be that maybe > it is GCV that is failing, and another choice of lambda might do better.Making spar large enough avoids the outrageous values but gives a poor approximation to much of the data. The problems seem to occur in a region where the data indicate the ocean should be well-mixed, i.e. the curve should be constant. What's more, the data look as though they have been edited to remove the boring repeated values, keeping only the first and last values within the mixed layer. When constant t and s values are inserted into the data at integer values for p within the mixed layer (interpolating by hand), smooth.spline() with GCV gives a very different result: much smoother and less faithful to the observations. Much more reasonable. It seems that the problem might be expected when the density of points changes abruptly at pretty much the same place where the gradient is changing. Maybe these cases can be captured and treated separately. Maybe the total variation penalty methods or some of Andy's suggestions (polymars(), mars(), locfit(), denoising with wavelets) will work better. I'll have to explore some packages. Thanks to everybody for the gracious help. Carlisle -- William Carlisle Thacker Atlantic Oceanographic and Meteorological Laboratory 4301 Rickenbacker Causeway, Miami, Florida 33149 USA Office: (305) 361-4323 Fax: (305) 361-4392 "Too many have dispensed with generosity in order to practice charity." Albert Camus
I looked at rqss() in nprq, as Prof. Koenker suggested, but that doesn't have a predict() method, so I don't know how you'd get the smooth at values other than the observed... The criteria (CV, GCV, etc.) could have multiple local minima for some data, as Prof. Ripley and Prof. Koenker pointed out, so relying on those `automatic' selection procedure may not be the best thing to do. Theoretically as spar (lambda) goes to 0, smooth.spline should linearly interpolate the data. I guess the routine could run into numerical problems before that. Here's yet another thing to try (thanks to Martin for the `lokern' package): library(lokerns) par(mfrow=c(2,4)) for (i in 1:4) { plot(dat[[i]]$p, dat[[i]]$t); lines(lokerns(dat[[i]]$p, dat[[i]]$t, x.out=seq(25,1000,25))) plot(dat[[i]]$p, dat[[i]]$s) lines(lokerns(dat[[i]]$p, dat[[i]]$s, x.out=seq(25,1000,25))) } Best, Andy> From: W. C. Thacker > > roger koenker wrote: > > > > If one repeats the experiments in Craven and Wahba, the paper that > > "invented" GCV you find, or at least I found, when I tried > to do this > > some years ago, that GCV in about 10% > > of cases fails rather catastrophically, and this is a > fairly innocuous > > setting. So one way to interpret Brian's comment would be that maybe > > it is GCV that is failing, and another choice of lambda > might do better. > > Making spar large enough avoids the outrageous values but gives a poor > approximation to much of the data. > > The problems seem to occur in a region where the data indicate the > ocean should be well-mixed, i.e. the curve should be constant. What's > more, the data look as though they have been edited to remove the > boring repeated values, keeping only the first and last values within > the mixed layer. > > When constant t and s values are inserted into the data at integer > values for p within the mixed layer (interpolating by hand), > smooth.spline() with GCV gives a very different result: much smoother > and less faithful to the observations. Much more reasonable. It > seems that the problem might be expected when the density of points > changes abruptly at pretty much the same place where the gradient is > changing. Maybe these cases can be captured and treated separately. > > Maybe the total variation penalty methods or some of Andy's > suggestions (polymars(), mars(), locfit(), denoising with wavelets) > will work better. I'll have to explore some packages. > > Thanks to everybody for the gracious help. > > Carlisle > -- > > William Carlisle Thacker > > Atlantic Oceanographic and Meteorological Laboratory > 4301 Rickenbacker Causeway, Miami, Florida 33149 USA > Office: (305) 361-4323 Fax: (305) 361-4392 > > "Too many have dispensed with generosity > in order to practice charity." Albert Camus > >------------------------------------------------------------------------------ Notice: This e-mail message, together with any attachments,...{{dropped}}
1. It may not be the thing to do, but if you really want to, try: install.packages(CRAN.packages()[,1], lib.loc=...) 2. No. Some packages have vignettes, but most not. Any additional info should have been cited in the `Reference' section of the help page. Cheers, Andy> From: thacker at aoml.noaa.gov [mailto:thacker at aoml.noaa.gov] > > "Liaw, Andy" wrote: > > > > I looked at rqss() in nprq, as Prof. Koenker suggested, but > that doesn't > > have a predict() method, so I don't know how you'd get the > smooth at values > > other than the observed... > > > > The criteria (CV, GCV, etc.) could have multiple local > minima for some data, > > as Prof. Ripley and Prof. Koenker pointed out, so relying on those > > `automatic' selection procedure may not be the best thing to do. > > Theoretically as spar (lambda) goes to 0, smooth.spline > should linearly > > interpolate the data. I guess the routine could run into > numerical problems > > before that. > > > > Here's yet another thing to try (thanks to Martin for the > `lokern' package): > > > > library(lokerns) > > par(mfrow=c(2,4)) > > for (i in 1:4) { > > plot(dat[[i]]$p, dat[[i]]$t); > > lines(lokerns(dat[[i]]$p, dat[[i]]$t, x.out=seq(25,1000,25))) > > plot(dat[[i]]$p, dat[[i]]$s) > > lines(lokerns(dat[[i]]$p, dat[[i]]$s, x.out=seq(25,1000,25))) > > } > > > > Best, > > Andy > > Thanks for still another suggestion. I'm keeping our system manager > busy loading packages. Is there a way he can get them all at once? > > Also, do you know whether any of these packages come with a manual --- > other than the help pages? > > Regards, > > Carlisle > >------------------------------------------------------------------------------ Notice: This e-mail message, together with any attachments,...{{dropped}}