Hello,

When I run

scatter.smooth(jitter(weight), jitter(height2), span = .25, evaluation = 50, pch = '.')

I get the type of graph I thought I would get, but also a warning:

k-d tree limited by memory. ncmax= 528

I always get concerned when there are warnings I don't understand. What's a k-d tree? Is this something to be concerned about?

Thanks

Peter

Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
www.peterflom.com
New York, NY 10010
(212) 845-4485 (voice)
(917) 438-0894 (fax)
Martin Maechler
2003-Sep-05 14:36 UTC
[R] scatter.smooth warning "k-d tree" --> "loess bowels"
>>>>> "Peter" == Peter Flom <flom at ndri.org>
>>>>>     on Thu, 04 Sep 2003 13:28:12 -0400 writes:

    Peter> Hello When I run
    Peter> scatter.smooth(jitter(weight), jitter(height2),
    Peter> span = .25, evaluation = 50, pch = '.')

    Peter> I get the type of graph I thought I would get, but
    Peter> also a warning.....

(and not an "error" as said in the original Subject)

    Peter> k-d tree limited by memory. ncmax= 528

    Peter> I always get concerned when there are warnings I
    Peter> don't understand. What's a k-d tree? Is this
    Peter> something to be concerned about?

scatter.smooth() builds on loess(), and the reference in help(loess) is chapter 8 of "the white book":

  W.S. Cleveland, E. Grosse and W.M. Shyu (1992) Local regression models. Chapter 8 of _Statistical Models in S_, eds J.M. Chambers and T.J. Hastie, Wadsworth & Brooks/Cole.

Specifically, Section 8.4.2, pp. 373-376, is what you need here. There you can learn that a k-d tree is the data structure used to represent a particular kind of rpart()-like partitioning of the predictor space. (The fun part is the subsection "Error Messages from the Bowels of Loess", where you learn why you can even get an error message "Chernobyl! ...")

---

The warning means that the k-d tree could not be refined as far as requested, so the loess() approximation will be a bit rougher than might be desired, since help(loess.control) has

>> Usage:
>>
>>      loess.control(surface = c("interpolate", "direct"),
>>                    statistics = c("approximate", "exact"),
>>                    trace.hat = c("exact", "approximate"),
>>                    cell = 0.2, iterations = 4, ...)
>>
>> Arguments:
>>
>>  surface: should the fitted surface be computed exactly or via
>>           interpolation from a kd tree?

By setting surface = "direct" you will certainly get rid of the above warning, but you will probably pay a (too) big performance penalty.

Unfortunately, the Fortran code underlying loess() is pretty messy (with many dozens of subroutines called ehg125(), ehg126(), ...), so it's not obvious how to improve it to adapt memory usage to the size of the k-d tree used.
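To make the "partitioning of the predictor space" concrete, here is a toy sketch. It is a hypothetical helper, not the loess Fortran code. With a single predictor, as in scatter.smooth(), the cells are just intervals. The sketch assumes that each cell is split at its median until no cell holds more than floor(n * span * cell) points; that threshold is the one help(loess.control) gives for the `cell` argument. The real splitting rule and memory bound are not reproduced. It shows why a small span such as .25 gives a much bigger tree, with more vertices to store, than the default span = 2/3.

```r
## Toy 1-D "k-d tree" (hypothetical helper, NOT the loess Fortran code):
## recursively split the sorted x values at the median until no cell holds
## more than `fc` points; return the sorted cell boundaries (vertices).
kd_vertices <- function(x, fc) {
  x <- sort(x)
  verts <- c(x[1], x[length(x)])      # outer ends of the bounding interval
  split_cell <- function(lo, hi) {     # lo, hi: indices into sorted x
    if (hi - lo + 1 <= fc) return(invisible(NULL))  # small enough: a leaf
    mid <- (lo + hi) %/% 2             # median position
    verts <<- c(verts, (x[mid] + x[mid + 1]) / 2)   # new interior vertex
    split_cell(lo, mid)
    split_cell(mid + 1, hi)
  }
  split_cell(1L, length(x))
  sort(verts)
}

set.seed(1)
n <- 2000
x <- runif(n)
cell <- 0.2                            # loess.control() default
for (span in c(2/3, 0.25)) {
  fc <- floor(n * span * cell)         # max points per cell, per help(loess.control)
  cat(sprintf("span = %.3f: fc = %d, vertices = %d\n",
              span, as.integer(fc), length(kd_vertices(x, fc))))
}
## should report 9 vertices for span = 2/3 and 33 for span = 0.25
```

The vertex count grows roughly like 1/span. A fixed-size work array in the real code can therefore run out when the span is small, which is one plausible reading of "limited by memory".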
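A minimal sketch of the surface = "direct" workaround follows. The variables `weight` and `height2` below are hypothetical simulated stand-ins for Peter's data. It is not obvious that scatter.smooth() passes a `control` setting through to the fit, so the sketch calls loess() directly. It mirrors scatter.smooth()'s defaults (degree = 1, family = "symmetric") with the small span from the question. It compares the default interpolated fit with the exact one and times the direct fit.

```r
## Hypothetical simulated data standing in for Peter's weight and height2
set.seed(2003)
n <- 2000
weight  <- runif(n, 40, 120)
height2 <- 100 + 0.5 * weight + rnorm(n, sd = 10)

## Default surface = "interpolate": the fit is computed at k-d tree
## vertices and interpolated in between (the path that can warn).
fit_interp <- loess(height2 ~ weight, span = 0.25, degree = 1,
                    family = "symmetric")

## surface = "direct": the local fit is computed at every point, no k-d tree.
fit_direct <- loess(height2 ~ weight, span = 0.25, degree = 1,
                    family = "symmetric",
                    control = loess.control(surface = "direct"))

## Largest gap between the interpolated and the exact fitted values
print(max(abs(fitted(fit_interp) - fitted(fit_direct))))

## Time the direct fit (expected to be the slower of the two)
print(system.time(loess(height2 ~ weight, span = 0.25, degree = 1,
                        family = "symmetric",
                        control = loess.control(surface = "direct"))))

## Scatterplot with the exact fit overlaid, in the spirit of scatter.smooth()
plot(jitter(weight), jitter(height2), pch = ".")
ord <- order(weight)
lines(weight[ord], fitted(fit_direct)[ord])
```

If the printed gap is small relative to the scatter of the data, the warning is probably harmless for a plot like Peter's. The run time of the direct fit shows how large the performance penalty is for a given n.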
I'm pretty sure that today's computers would allow much larger trees than the loess() algorithm was designed for.

Regards,
Martin Maechler <maechler at stat.math.ethz.ch>   http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16    Leonhardstr. 27
ETH (Federal Inst. Technology)   8092 Zurich SWITZERLAND
phone: x-41-1-632-3408   fax: ...-1228   <><