x <- c(0,7,8,14,15,120,242) y <- c(122,128,130,158,110,110,92) lowess(x,y) $x [1] 0 7 8 14 15 120 242 $y [1] 122.0000 128.0000 132.2857 158.0000 110.0000 -4930.0000 110.0000 R version 2.2.1, 2005-12-20, i486-pc-linux-gnu attached base packages: [1] "methods" "stats" "graphics" "grDevices" "utils" "datasets" [7] "base" other attached packages: lattice Hmisc chron "0.12-11" "3.1-1" "2.3-8" -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
On Wed, 2006-10-11 at 22:29 -0500, Frank E Harrell Jr wrote:> x <- c(0,7,8,14,15,120,242) > y <- c(122,128,130,158,110,110,92) > > lowess(x,y) > > $x > [1] 0 7 8 14 15 120 242 > > $y > [1] 122.0000 128.0000 132.2857 158.0000 110.0000 -4930.0000 > 110.0000Same behaviour here on a more recent R (below), and I recall a posting for a year or so back that reported similar behaviour in the then current version of R. G> x <- c(0,7,8,14,15,120,242) > y <- c(122,128,130,158,110,110,92) > > lowess(x,y)$x [1] 0 7 8 14 15 120 242 $y [1] 122.0000 128.0000 132.2857 158.0000 110.0000 -4930.0000 110.0000> sessionInfo()R version 2.4.0 Patched (2006-10-03 r39576) i686-pc-linux-gnu locale: LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C attached base packages: [1] "methods" "stats" "graphics" "grDevices" "utils" "datasets" [7] "base"> > > > R version 2.2.1, 2005-12-20, i486-pc-linux-gnu > > attached base packages: > [1] "methods" "stats" "graphics" "grDevices" "utils" "datasets" > [7] "base" > > other attached packages: > lattice Hmisc chron > "0.12-11" "3.1-1" "2.3-8" >-- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Gavin Simpson [t] +44 (0)20 7679 0522 ECRC & ENSIS, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
On Thu, 12 Oct 2006, Gavin Simpson wrote:> On Wed, 2006-10-11 at 22:29 -0500, Frank E Harrell Jr wrote: >> x <- c(0,7,8,14,15,120,242) >> y <- c(122,128,130,158,110,110,92) >> >> lowess(x,y) >> >> $x >> [1] 0 7 8 14 15 120 242 >> >> $y >> [1] 122.0000 128.0000 132.2857 158.0000 110.0000 -4930.0000 >> 110.0000 > > Same behaviour here on a more recent R (below), and I recall a posting > for a year or so back that reported similar behaviour in the then > current version of R.Actually, it is system-dependent as I get (on x86_64)> lowess(x,y, iter=3)lowess(): ns = 4 cmad = 0.25589 cmad = 0 cmad = 0.00583385 $x [1] 0 7 8 14 15 120 242 $y [1] 128.0000 128.0000 132.2857 158.0000 110.0000 109.9990 110.0000 having turned DEBUG_lowess on. So the issue is finite-precision arithmetic, once again. It seems rather a moot point as to what the right answer actually is here, and even if that found by Frank is indeed wrong, as lowess() is defined by an algorithm. Perhaps the best one can hope for is a good approximation to what the algorithm would give in infinite-precision arithmetic (having defined what should happen if cmod is zero). -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Frank Harrell wrote: [...]> Thank you Brian. It seems that no matter what is the right answer, the > answer currently returned on my system is clearly wrong. lowess()$y > should be constrained to be within range(y).Really? That assertion is offered without proof and I am pretty sure is incorrect. Consider> x <- c(1:10, 20) > y <- c(1:10, 5) + 0.01*rnorm(11) > lowess(x,y)$x [1] 1 2 3 4 5 6 7 8 9 10 20 $y [1] 0.9983192 1.9969599 2.9960805 3.9948224 4.9944158 5.9959855 [7] 6.9964400 7.9981434 8.9990607 10.0002567 19.9946117 Remember that lowess is a local *linear* fitting method, and may give zero weight to some data points, so (as here) it can extrapolate. After reading what src/appl/lowess.doc says should happen with zero weights, I think the answer given on Frank's system probably is the correct one. Rounding error is determining which of the last two points is given zero robustness weight: on a i686 system both of the last two are, and on mine only the last is. As far as I can tell in infinite-precision arithmetic both would be zero, and hence the value at x=120 would be found by extrapolation from those (far) to the left of it. I am inclined to think that the best course of action is to quit with a warning when the MAD of the residuals is effectively zero. However, we need to be careful not to call things 'bugs' that we do not understand well enough. This might be a design error in lowess, but it is not AFAICS a bug in the implementation. -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595