Hello,

I am wondering if there is an easy way to combine loess() with glm()
to produce a locally fitted generalised regression.

I have a data set of about 5,000 observations and 5 explanatory variables,
with a binary outcome. One of the explanatory variables (let's call it X)
is much more predictive than the others. A single glm() regression over
the entire data set produces rather poor results, so I have split the
data based on sub-ranges of X, and performed a separate glm() regression
on each subset.

This produces much more satisfactory results, but the problem is that
at the boundaries, the resulting hyper-surfaces don't coincide.

I am using this model in a predictive role: given a new observation
on the 5 explanatory variables, I want to predict the probability of a
positive outcome (actually whether a protein has a certain conformation
or not). At the boundary determined by the value of X, my prediction has
a discontinuity, which is not very satisfactory. My solution has been to
take a weighted average of the results of adjacent models for cases where X
is close to a boundary, so as to smooth over the discontinuities. Although
this works, it seems rather simplistic and arbitrary in terms of choices
about how and where the weighted averages are computed. It seems to me
that what I am doing is a kind of poor man's loess.

Can anyone suggest a better way to deal with this analysis? I have only
a sketchy knowledge of loess.

Thanks,

Luke Whitaker
Inpharmatica

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
You could take a look at gam() in package mgcv - it will fit thin plate
spline like multi-dimensional smoothers (and other models) in a GLM
setting. (With 5000 observations you'd probably want to check the help
files for some simple tricks to speed up fitting, though.)

Simon
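The gam() suggestion above might look roughly like the following. This is only a sketch under stated assumptions: the data are simulated stand-ins for the real data set, and the names dat, y, X and Z1 to Z4 are hypothetical. It puts a smooth term on X and leaves the other four variables as linear terms, which is one possible model, not necessarily the one that suits the protein data.

```r
library(mgcv)

set.seed(1)
n <- 5000

# Simulated stand-in data (an assumption, not the poster's data):
# X is the dominant predictor, Z1..Z4 are the weaker ones.
dat <- data.frame(X = runif(n), Z1 = rnorm(n), Z2 = rnorm(n),
                  Z3 = rnorm(n), Z4 = rnorm(n))
eta <- 2 * sin(2 * pi * dat$X) + 0.3 * dat$Z1 - 0.2 * dat$Z2
dat$y <- rbinom(n, 1, plogis(eta))

# Logistic GAM: smooth effect of X, linear effects of the other variables.
fit <- gam(y ~ s(X) + Z1 + Z2 + Z3 + Z4, family = binomial, data = dat)

# Predicted probabilities on either side of what would have been a
# subset boundary (here X = 0.5) now change smoothly.
newdat <- data.frame(X = c(0.499, 0.501), Z1 = 0, Z2 = 0, Z3 = 0, Z4 = 0)
p <- predict(fit, newdata = newdat, type = "response")
print(p)
```

If the effects of the other variables themselves change with X (which is what fitting separate glm() models per sub-range implies), mgcv's `by` argument, e.g. s(X, by = Z1), lets a coefficient vary smoothly with X instead of jumping at a boundary.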
If you really _must_ use loess, the gam() function in S-PLUS allows you to
use loess terms (e.g., as lo(x)). In R, the gam() function in the mgcv
package uses splines. However, it sounds like you don't have to use loess,
so mgcv should be sufficient. There's an article on using gam() in mgcv in
R News, I believe about two issues back.

Prof. Harrell's "Design" library also has functions that allow spline terms
in logistic regression. If you know roughly where the "break point" is, you
can even just do glm with ns() or bs() terms, so you're still essentially
fitting a parametric model.

HTH,
Andy

-----Original Message-----
From: Luke Whitaker [mailto:luke at inpharmatica.co.uk]
Sent: Thursday, October 31, 2002 10:10 AM
To: r-help
Subject: [R] Loess with glm ?

Hello,

I am wondering if there is an easy way to combine loess() with glm()
to produce a locally fitted generalised regression.
[...]
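The last suggestion in the reply above, glm with ns() or bs() terms, could be sketched as below. The data are simulated, the names dat, y, X and Z1 are hypothetical, and the break point at X = 0.4 is purely an assumed value for illustration. The knot is placed at the assumed break point, so the fitted curve can change shape there while remaining continuous.

```r
library(splines)

set.seed(2)
n <- 5000

# Simulated stand-in data with a change in slope at X = 0.4 (assumed).
dat <- data.frame(X = runif(n), Z1 = rnorm(n))
eta <- ifelse(dat$X < 0.4, -1 + 4 * dat$X, 0.6) + 0.5 * dat$Z1
dat$y <- rbinom(n, 1, plogis(eta))

# Ordinary logistic regression with a natural spline basis for X.
# One interior knot at the assumed break point gives two basis columns.
fit <- glm(y ~ ns(X, knots = 0.4) + Z1, family = binomial, data = dat)

# Predictions just below and just above the knot are nearly identical:
# there is no jump of the kind produced by separate per-subset models.
newdat <- data.frame(X = c(0.399, 0.401), Z1 = 0)
p <- predict(fit, newdata = newdat, type = "response")
print(p)
```

Because this is still a plain glm() fit, the usual summary(), anova() and predict() tools apply unchanged; the cost is that the knot locations have to be chosen by hand.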
I believe the `locfit' package does what you want.

-roger
_______________________________
UCLA Department of Statistics
rpeng at stat.ucla.edu
http://www.stat.ucla.edu/~rpeng

On Thu, 31 Oct 2002, Luke Whitaker wrote:

> I am wondering if there is an easy way to combine loess() with glm()
> to produce a locally fitted generalised regression.
> [...]
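For the locfit suggestion, a local-likelihood logistic fit might look something like this. It is a sketch only: the data are simulated, the names dat, y and X are hypothetical, only the dominant variable is used, and the nearest-neighbour fraction nn = 0.3 is an arbitrary assumed value that would need tuning on real data.

```r
library(locfit)

set.seed(3)
n <- 5000

# Simulated stand-in data: binary outcome driven by a smooth function of X.
dat <- data.frame(X = runif(n))
dat$y <- rbinom(n, 1, plogis(2 * sin(2 * pi * dat$X)))

# Local logistic regression: at each evaluation point a weighted
# binomial likelihood is fitted to the nearest 30% of the observations.
fit <- locfit(y ~ lp(X, nn = 0.3), family = "binomial", data = dat)

# Predicted probabilities at two values of X (back-transformed from the
# logit scale by default).
p <- predict(fit, newdata = c(0.25, 0.75))
print(p)
```

This is the closest of the three suggestions to "loess with glm": the weighting of nearby observations is done by the kernel inside the likelihood, rather than by averaging the predictions of separately fitted models.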