Liaw, Andy
2004-Dec-06 02:26 UTC
[R] What is the most useful way to detect nonlinearity in lo
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of
> Ted.Harding at nessie.mcc.ac.uk
> Sent: Sunday, December 05, 2004 7:14 PM
> To: r-help at stat.math.ethz.ch
> Subject: Re: [R] What is the most useful way to detect
> nonlinearity in lo
>
>
> On 05-Dec-04 Peter Dalgaard wrote:
> > (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> writes:
> >
> >> >> x <- runif(500)
> >> >> y <- rbinom(500,size=1,p=plogis(x))
> >> >> xx <- predict(loess(resid(glm(y~x,binomial))~x),se=T)
> >> >> matplot(x,cbind(xx$fit, 2*xx$se.fit, -2*xx$se.fit),pch=20)
> >> >>
> >> >> Not sure my money isn't still on the splines, though.
> > .....
> >> > Serves me right for posting way beyond my bedtime...
> >>
> >> Hi Peter,
> >>
> >> Yes, the above is certainly misleading (try it with 2000 instead
> >> of 500)! But what would you suggest instead?
> >
> > (I did and this little computer came tumbling down...).
>
> So did mine -- but at 5000 (which is the value I first tried):
> lots of disk grinding and then it went "prprprprp" and wrote
> words to the effect "Calloc cannot allocate (18790050 times 4)"
> i.e. it needed 72MB, which bankrupted my 192MB baby.
>
> 2000 was OK, however, but I had plenty of time for a meal etc.
> before it finished.
>
> Which brings up that predict(loess(....)) seems to be very
> memory-hungry.

locfit to the rescue, perhaps?

> library(locfit)
> n <- 5000
> x <- sort(runif(n))
> y <- rbinom(n, size=1, p=plogis(x))
> system.time(xx <- predict(locfit(resid(glm(y~x, binomial))~x),where="data",
+ se=TRUE), gcFirst=TRUE)
[1] 0.79 0.00 0.84 NA NA
> matplot(x, cbind(xx$fit, 2*xx$se.fit, -2*xx$se.fit), pch=20)

[The plot looks strange...]

This is on my mobile Pentium 1.6GHz w/512MB laptop. Using loess it also
ran out of memory. At n=2000, the loess route took just under 3 seconds.

Cheers,
Andy

> > Basically, I'd reconsider the type= option to residual.glm. As I said,
> > at least type="response" should have the right mean. Ideally, you'd
> > want to take advantage of the fact that the variance of the residuals
> > is known too, rather than have the smoother estimate it. The more I
> > think, the more I like the splines...
>
> I'll have a go at your suggestions (if I can get the syntax
> right ... )
>
> Thanks,
> Ted.
>
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
> Fax-to-email: +44 (0)870 094 0861 [NB: New number!]
> Date: 06-Dec-04 Time: 00:13:53
> ------------------------------ XFMail ------------------------------
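
For what it's worth, the two suggestions quoted above from Peter (response-type residuals, and splines) might be sketched roughly as follows. This is only an illustration, not something anyone in the thread actually ran: the seed, n = 2000, the use of lowess() with iter = 0 in place of loess(), and df = 4 for the spline are all my own assumptions.

```r
library(splines)   # ns(); ships with base R

set.seed(1)        # arbitrary seed (assumption), for reproducibility only
n <- 2000          # illustrative sample size (assumption)
x <- sort(runif(n))
y <- rbinom(n, size = 1, prob = plogis(x))  # truly linear on the logit scale

fit_lin <- glm(y ~ x, family = binomial)

# Response residuals, y - fitted probability: these should average to
# zero at every x if the linear logit is adequate.
r <- residuals(fit_lin, type = "response")

# lowess() is light on memory; iter = 0 switches off the robustness
# iterations, which are not meaningful for 0/1 data.
sm <- lowess(x, r, f = 2/3, iter = 0)
plot(x, r, pch = 20, col = "grey")
lines(sm, lwd = 2)
abline(h = 0, lty = 2)

# Spline route: does a natural cubic spline in x improve on the
# linear term? The linear model is nested in the spline model.
fit_ns <- glm(y ~ ns(x, df = 4), family = binomial)
cmp <- anova(fit_lin, fit_ns, test = "Chisq")
print(cmp)
```

A smoothed curve that wanders away from the dashed zero line, or a small p-value in the anova table, would point to nonlinearity on the logit scale. With data simulated as above, neither should show much.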