Toby Marthews
2008-Aug-29 08:53 UTC
[R] nls() fails on a simple exponential fit, when lm() gets it right?
Dear R-help, Here's a simple example of nonlinear curve fitting where nls seems to get the answer wrong on a very simple exponential fit (my R version 2.7.2). Look at this code below for a very basic curve fit using nls to fit to (a) a logarithmic and (b) an exponential curve. I did the fits using self-start functions and I compared the results with a more simple fit using a straight lm() command. The logarithmic fit works 100% correctly below. The problem is with the exponential fit: the nls command gives the wrong values and I have completely failed to see why this should happen. I can't see any misake in my self-start function, so why the mismatch?? Even worse, if I give nls a trivial fit by removing the "#&#&" on line 6, it fails to converge at all when the 'simpler' method with lm() easily gives the right answer (r=0.03 and A=5). I did the same exp and ln fits using MS Excel and the numbers returned are exactly as for the 'simpler ways', so nls is definitely in the wrong, but the self-start is almost identical to the one for the logarithmic fit, which works perfectly .... I can't see my mistake at all. Can anyone help?? Thanks, Toby Marthews traceson=FALSE;maxint=10000;minstepfactor=2^(-30);tolerance=1e-7 xtxt="DBH in mm";ytxt="Height in m" dbh=c(0.9,1.0,1.1,1.2,4.8,4.9,5.0,5.1,8.9,9.0,9.1,9.2,11.8,11.9,12.0,12.1,14.9,15.1,15.2,15.5) height=c(5.770089,5.154794,4.888847,6.356770,14.849109,14.973146,15.423816,14.865038,21.335985,20.477624,20.915823,21.886004,23.714724,24.182210,23.954929,23.784659,25.334360,25.411320,26.218614,25.612478) #&#&#height=5*exp(0.03*dbh) logarithmicInit=function(mCall,LHS,data) { xy=sortedXyData(mCall[["x"]],LHS,data) aaa=0.01 #Just guess aaa=0.01 to start the fit bbb=min(xy$y) #Use the minimum y value as an initial guess for bbb value=c(aaa,bbb) names(value)=mCall[c("aaa","bbb")] return(value) } fmla=as.formula("~(aaa*log(x)+bbb)") SSlogarithmic=selfStart(fmla,initial=logarithmicInit,parameters=c("aaa","bbb")) exponenInit=function(mCall,LHS,data) { xy=sortedXyData(mCall[["x"]],LHS,data) r=0.01 #Just guess r=0.01 to start the fit A=min(xy$y) #Use the minimum y value as an initial guess for A value=c(r,A) names(value)=mCall[c("r","A")] return(value) } fmla=as.formula("~(A*exp(r*x))") SSexponen=selfStart(fmla,initial=exponenInit,parameters=c("r","A")) cat("Logarithmic fit (here, the self-start and the 'simpler' way match):\n") tmp=getInitial(height~SSlogarithmic(dbh,aaa,bbb),data=data.frame(dbh,height)) cat("(Starting values for the logarithmic fit were: aaa=",tmp[1],"and bbb=",tmp[2],")\n") modelfit=nls(height~SSlogarithmic(x=dbh,aaa,bbb),trace=traceson,control=list(maxiter=maxint,tol=tolerance,minFactor=minstepfactor)) bfaaa=summary(modelfit)$coefficients[1];bfbbb=summary(modelfit)$coefficients[2] cat("Best logarithmic fit is (",ytxt,")=(aaa*log(",xtxt,")+bbb) with parameters aaa=",bfaaa,"and bbb=",bfbbb,"\n") #Produces: Best logarithmic fit is ( Height in m )=(aaa*log( DBH in mm )+bbb) with parameters aaa= 7.551666 and bbb= 4.594367 lnfit=lm(height~log(dbh)) #This is doing the ln fit without a self-start function cat("Done another, simpler, way, the best logarithmic fit has parameters: aaa=",lnfit$coefficients[2],"and bbb=",lnfit$coefficients[1],"\n") #Produces: Done another, simpler, way, the best logarithmic fit has parameters: aaa= 7.551666 and bbb= 4.594367 cat("Exponential fit (here, the self-start and the 'simpler' way DON'T match):\n") tmp=getInitial(height~SSexponen(dbh,r,A),data=data.frame(dbh,height)) cat("(Starting values for the exponential fit: r=",tmp[1],"and A=",tmp[2],")\n") modelfit=nls(height~SSexponen(x=dbh,r,A),trace=traceson,control=list(maxiter=maxint,tol=tolerance,minFactor=minstepfactor)) bfr=summary(modelfit)$coefficients[1];bfA=summary(modelfit)$coefficients[2] cat("Best exponential fit is (",ytxt,")=(A*exp(r*(",xtxt,"))) with parameters r=",bfr,"and A=",bfA,"\n") #Produces: Best exponential fit is ( Height in m )=(A*exp(r*( DBH in mm ))) with parameters r= 0.07134055 and A= 9.47907 expfit=lm(log(height)~dbh) #This is doing the exp fit without a self-start function cat("Done another, simpler, way, the best exponential fit has parameters: r=",expfit$coefficients[2],"and A=",exp(expfit$coefficients[1]),"\n") #Produces: Done another, simpler, way, the best exponential fit has parameters: r= 0.1028805 and A= 6.75045
Peter Dalgaard
2008-Aug-29 12:24 UTC
[R] nls() fails on a simple exponential fit, when lm() gets it right?
Toby Marthews wrote:> Dear R-help, > > Here's a simple example of nonlinear curve fitting where nls seems to get > the answer wrong on a very simple exponential fit (my R version 2.7.2). > > Look at this code below for a very basic curve fit using nls to fit to (a) > a logarithmic and (b) an exponential curve. I did the fits using > self-start functions and I compared the results with a more simple fit > using a straight lm() command. > > The logarithmic fit works 100% correctly below. The problem is with the > exponential fit: the nls command gives the wrong values and I have > completely failed to see why this should happen. I can't see any misake in > my self-start function, so why the mismatch?? Even worse, if I give nls a > trivial fit by removing the "#&#&" on line 6, it fails to converge at all > when the 'simpler' method with lm() easily gives the right answer (r=0.03 > and A=5). > > I did the same exp and ln fits using MS Excel and the numbers returned are > exactly as for the 'simpler ways', so nls is definitely in the wrong, but > the self-start is almost identical to the one for the logarithmic fit, > which works perfectly .... > > I can't see my mistake at all. Can anyone help?? >It's in the theory. A least squares fit to log(Y) is not equivalent to a least squares fit to Y. The latter assigns more weight to larger values. Fitting on a log scale is warranted if the error variance after transformation is independent of the magnitude of the response. This is pretty patently wrong for your data: look at plot(log(height)~dbh), which by the way also shows that the exponential model is wrong for these data. So EXCEL is definitely in the wrong!!!!!!> Thanks, > Toby Marthews > > > traceson=FALSE;maxint=10000;minstepfactor=2^(-30);tolerance=1e-7 > xtxt="DBH in mm";ytxt="Height in m" > dbh=c(0.9,1.0,1.1,1.2,4.8,4.9,5.0,5.1,8.9,9.0,9.1,9.2,11.8,11.9,12.0,12.1,14.9,15.1,15.2,15.5) > height=c(5.770089,5.154794,4.888847,6.356770,14.849109,14.973146,15.423816,14.865038,21.335985,20.477624,20.915823,21.886004,23.714724,24.182210,23.954929,23.784659,25.334360,25.411320,26.218614,25.612478) > #&#&#height=5*exp(0.03*dbh) > > logarithmicInit=function(mCall,LHS,data) { > xy=sortedXyData(mCall[["x"]],LHS,data) > aaa=0.01 #Just guess aaa=0.01 to start the fit > bbb=min(xy$y) #Use the minimum y value as an initial guess for bbb > value=c(aaa,bbb) > names(value)=mCall[c("aaa","bbb")] > return(value) > } > fmla=as.formula("~(aaa*log(x)+bbb)") > SSlogarithmic=selfStart(fmla,initial=logarithmicInit,parameters=c("aaa","bbb")) > > exponenInit=function(mCall,LHS,data) { > xy=sortedXyData(mCall[["x"]],LHS,data) > r=0.01 #Just guess r=0.01 to start the fit > A=min(xy$y) #Use the minimum y value as an initial guess for A > value=c(r,A) > names(value)=mCall[c("r","A")] > return(value) > } > fmla=as.formula("~(A*exp(r*x))") > SSexponen=selfStart(fmla,initial=exponenInit,parameters=c("r","A")) > > cat("Logarithmic fit (here, the self-start and the 'simpler' way match):\n") > tmp=getInitial(height~SSlogarithmic(dbh,aaa,bbb),data=data.frame(dbh,height)) > cat("(Starting values for the logarithmic fit were: aaa=",tmp[1],"and > bbb=",tmp[2],")\n") > modelfit=nls(height~SSlogarithmic(x=dbh,aaa,bbb),trace=traceson,control=list(maxiter=maxint,tol=tolerance,minFactor=minstepfactor)) > bfaaa=summary(modelfit)$coefficients[1];bfbbb=summary(modelfit)$coefficients[2] > cat("Best logarithmic fit is (",ytxt,")=(aaa*log(",xtxt,")+bbb) with > parameters aaa=",bfaaa,"and bbb=",bfbbb,"\n") > #Produces: Best logarithmic fit is ( Height in m )=(aaa*log( DBH in mm > )+bbb) with parameters aaa= 7.551666 and bbb= 4.594367 > > lnfit=lm(height~log(dbh)) #This is doing the ln fit without a self-start > function > cat("Done another, simpler, way, the best logarithmic fit has parameters: > aaa=",lnfit$coefficients[2],"and bbb=",lnfit$coefficients[1],"\n") > #Produces: Done another, simpler, way, the best logarithmic fit has > parameters: aaa= 7.551666 and bbb= 4.594367 > > cat("Exponential fit (here, the self-start and the 'simpler' way DON'T > match):\n") > tmp=getInitial(height~SSexponen(dbh,r,A),data=data.frame(dbh,height)) > cat("(Starting values for the exponential fit: r=",tmp[1],"and > A=",tmp[2],")\n") > modelfit=nls(height~SSexponen(x=dbh,r,A),trace=traceson,control=list(maxiter=maxint,tol=tolerance,minFactor=minstepfactor)) > bfr=summary(modelfit)$coefficients[1];bfA=summary(modelfit)$coefficients[2] > cat("Best exponential fit is (",ytxt,")=(A*exp(r*(",xtxt,"))) with > parameters r=",bfr,"and A=",bfA,"\n") > #Produces: Best exponential fit is ( Height in m )=(A*exp(r*( DBH in mm > ))) with parameters r= 0.07134055 and A= 9.47907 > > expfit=lm(log(height)~dbh) #This is doing the exp fit without a self-start > function > cat("Done another, simpler, way, the best exponential fit has parameters: > r=",expfit$coefficients[2],"and A=",exp(expfit$coefficients[1]),"\n") > #Produces: Done another, simpler, way, the best exponential fit has > parameters: r= 0.1028805 and A= 6.75045 > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- O__ ---- Peter Dalgaard ?ster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907