inaki at goisolutions.net
2006-Nov-23 17:01 UTC
[R] nonlinear regression-getting the explained variation
Hi, I'm trying to teach myself R, and by the way, re-learning statistics using Crawley's "Statistics: an introduction using R". I've reached the regression chapter, and when it deals with non-linear regresion using the nls library I face the following problem: I follow the steps--->deer<-read.table("c:\\temp\\jaws.txt",header=T)---data available at http://www.bio.ic.ac.uk/research/crawley/statistics/data/>attach(deer) >names(deer) >library(nls) >model2<-nls(bone~a*(1-exp(-c*age)),start=list(a=120,c=0.064)) >summary(model2)I've taken away those steps that regard plotting. And everything goes fine, I understand the steps taken and so, but on the text Crawley says "the model... and explained 84.6% of the total variation in bone lenght". I guess this is linked to the adjusted squared R for linear models, but I just can't find how to get it in this case... I've tried in Statgraphics, and it plots the anova table and r^2 right the way, how could I do so in R? Thanks in advance, and sorry to bother I?aki
Douglas Bates
2006-Nov-26 16:23 UTC
[R] nonlinear regression-getting the explained variation
On 11/23/06, inaki at goisolutions.net <inaki at goisolutions.net> wrote:> I'm trying to teach myself R, and by the way, re-learning statistics using > Crawley's "Statistics: an introduction using R". > I've reached the regression chapter, and when it deals with non-linear > regresion using the nls library I face the following problem:> I follow the steps--- > >deer<-read.table("c:\\temp\\jaws.txt",header=T) > ---data available at > http://www.bio.ic.ac.uk/research/crawley/statistics/data/ > >attach(deer) > >names(deer) > >library(nls) > >model2<-nls(bone~a*(1-exp(-c*age)),start=list(a=120,c=0.064)) > >summary(model2) > I've taken away those steps that regard plotting.But I hope you did actually plot the data when you tried this exercise. Also, you can, if you wish, read the data from the URL without copying the data to a local file.> deer <- read.table("http://www.bio.ic.ac.uk/research/crawley/statistics/data/jaws.txt", header = TRUE) > model2<-nls(bone~a*(1-exp(-c*age)),deer, start=list(a=120,c=0.064)) > summary(model2)Formula: bone ~ a * (1 - exp(-c * age)) Parameters: Estimate Std. Error t value Pr(>|t|) a 115.58056 2.84365 40.645 < 2e-16 c 0.11882 0.01233 9.635 3.69e-13 Residual standard error: 13.1 on 52 degrees of freedom If you want to make things even easier you could use the self-starting model SSasympOrig as> summary(fm1 <- nls(bone ~ SSasympOrig(age, Asym, lrc), deer))Formula: bone ~ SSasympOrig(age, Asym, lrc) Parameters: Estimate Std. Error t value Pr(>|t|) Asym 115.5805 2.8436 40.65 <2e-16 lrc -2.1302 0.1038 -20.52 <2e-16 Residual standard error: 13.1 on 52 degrees of freedom The parameter lrc for that model is the natural logarithm of the rate constant 'c'.> And everything goes fine, I understand the steps taken and so, but on the > text Crawley says "the model... and explained 84.6% of the total variation > in bone lenght". > I guess this is linked to the adjusted squared R for linear models, but I > just can't find how to get it in this case...Actually it's just an R-squared, not an adjusted R-squared. It is being calculated as 1 - RSS/AdjSS where RSS is the residual sum of squares from the fitted model and AdjSS is the "adjusted sum of squares" or the sum of squares of the deviations of the observed responses from their mean. In this case the values are> sum(resid(model2)^2) # RSS[1] 8929.143> with(deer, sum((bone - mean(bone))^2)) # AdjSS[1] 59007.99 so the value of R^2 from the formula is> with(deer, 1 - sum(resid(model2)^2)/sum((bone - mean(bone))^2))[1] 0.848679> I've tried in Statgraphics, and it plots the anova table and r^2 right the > way, how could I do so in R?Yes, many software packages that fit nonlinear regression models do produce an anova table and an R^2 value for any model, even when that table and the R^2 value do not apply. (I'm assuming that the anova table is based on dividing the adjusted sum of squares into "model" and "residual" components so it is essentially the same calculation as above.) It would not be difficult to include these values in the summary output from an nls model in R but we don't because they could be nonsense. To see why we must examine what the R^2 should represent. First you should read what Bill Venables has to say on the general subject of analysis of variance for linear models. Use install.packages("fortunes"); library(fortunes); fortune("curious") All the summaries like an anova table or an R^2 value are based on the comparison of two model fits - the model we just fit to the data and the corresponding "trivial" model. The trivial model is either an arbitrary constant or zero, depending on whether the model consisting of a constant only can be embedded in the model we have fit. If we can select values of the parameters that turn our model into a one-parameter model consisting of a constant then the comparison sum of squares is AdjSS as above because the parameter estimate in the trivial model is mean(response). If we can't embed the constant model in our model then the appropriate comparison sum of squares is the unadjusted sum of squares sum(response^2) Technically we can't embed the model for which the predictions are an arbitrary constant in this model because of that point at age == 0. No matter how we change the parameters 'a' and 'c' in a*(1-exp(-c*age)) we always predict zero at age == 0. Thus the only model with constant predictions that is embedded in this model is the model that predicts 0 for all ages so we should use the unadjusted sum of squares. However, if we ignore the point at age == 0 (which doesn't contribute any information to the model fit) then we can get a constant model by letting c go to +Inf. As c goes to +Inf the conditional estimate of a goes to mean(bone) so the constant model is embedded in the model we have fit. We don't include the R^2 or the anova table in the summary output for a nonlinear regression model because we can't tell if these are the appropriate formulas. Look at the model in ?SSfol That's a common nonlinear regression model in pharmacokinetics and it can't be reduced to a constant other than zero. A person with good calculus skills can discern that but I certainly wouldn't want to write a computer program to decide whether a given nonlinear model can be reduced to a nonzero constant. I don't regard the fact that other programs will produce summary quantities whether or not they apply to be an advantage. Such results only propagate the misconception that statistical modeling is about a collection of formulas. It's not. It's about models and you can't understand the modeling process if you don't understand the models.> Thanks in advance, and sorry to botherYou're welcome. Such questions are not a bother. You were very clear about what you are trying to determine and you provided a reproducible example.> I?aki > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >