Toby Marthews
2008-Aug-29 08:53 UTC
[R] nls() fails on a simple exponential fit, when lm() gets it right?
Dear R-help,
Here's a simple example of nonlinear curve fitting where nls seems to get
the answer wrong on a very simple exponential fit (my R version 2.7.2).
Look at this code below for a very basic curve fit using nls to fit to (a)
a logarithmic and (b) an exponential curve. I did the fits using
self-start functions and I compared the results with a more simple fit
using a straight lm() command.
The logarithmic fit works 100% correctly below. The problem is with the
exponential fit: the nls command gives the wrong values and I have
completely failed to see why this should happen. I can't see any misake in
my self-start function, so why the mismatch?? Even worse, if I give nls a
trivial fit by removing the "#&#&" on line 6, it fails to
converge at all
when the 'simpler' method with lm() easily gives the right answer
(r=0.03
and A=5).
I did the same exp and ln fits using MS Excel and the numbers returned are
exactly as for the 'simpler ways', so nls is definitely in the wrong,
but
the self-start is almost identical to the one for the logarithmic fit,
which works perfectly ....
I can't see my mistake at all. Can anyone help??
Thanks,
Toby Marthews
traceson=FALSE;maxint=10000;minstepfactor=2^(-30);tolerance=1e-7
xtxt="DBH in mm";ytxt="Height in m"
dbh=c(0.9,1.0,1.1,1.2,4.8,4.9,5.0,5.1,8.9,9.0,9.1,9.2,11.8,11.9,12.0,12.1,14.9,15.1,15.2,15.5)
height=c(5.770089,5.154794,4.888847,6.356770,14.849109,14.973146,15.423816,14.865038,21.335985,20.477624,20.915823,21.886004,23.714724,24.182210,23.954929,23.784659,25.334360,25.411320,26.218614,25.612478)
#&#&#height=5*exp(0.03*dbh)
logarithmicInit=function(mCall,LHS,data) {
xy=sortedXyData(mCall[["x"]],LHS,data)
aaa=0.01 #Just guess aaa=0.01 to start the fit
bbb=min(xy$y) #Use the minimum y value as an initial guess for bbb
value=c(aaa,bbb)
names(value)=mCall[c("aaa","bbb")]
return(value)
}
fmla=as.formula("~(aaa*log(x)+bbb)")
SSlogarithmic=selfStart(fmla,initial=logarithmicInit,parameters=c("aaa","bbb"))
exponenInit=function(mCall,LHS,data) {
xy=sortedXyData(mCall[["x"]],LHS,data)
r=0.01 #Just guess r=0.01 to start the fit
A=min(xy$y) #Use the minimum y value as an initial guess for A
value=c(r,A)
names(value)=mCall[c("r","A")]
return(value)
}
fmla=as.formula("~(A*exp(r*x))")
SSexponen=selfStart(fmla,initial=exponenInit,parameters=c("r","A"))
cat("Logarithmic fit (here, the self-start and the 'simpler' way
match):\n")
tmp=getInitial(height~SSlogarithmic(dbh,aaa,bbb),data=data.frame(dbh,height))
cat("(Starting values for the logarithmic fit were:
aaa=",tmp[1],"and
bbb=",tmp[2],")\n")
modelfit=nls(height~SSlogarithmic(x=dbh,aaa,bbb),trace=traceson,control=list(maxiter=maxint,tol=tolerance,minFactor=minstepfactor))
bfaaa=summary(modelfit)$coefficients[1];bfbbb=summary(modelfit)$coefficients[2]
cat("Best logarithmic fit is
(",ytxt,")=(aaa*log(",xtxt,")+bbb) with
parameters aaa=",bfaaa,"and bbb=",bfbbb,"\n")
#Produces: Best logarithmic fit is ( Height in m )=(aaa*log( DBH in mm
)+bbb) with parameters aaa= 7.551666 and bbb= 4.594367
lnfit=lm(height~log(dbh)) #This is doing the ln fit without a self-start
function
cat("Done another, simpler, way, the best logarithmic fit has parameters:
aaa=",lnfit$coefficients[2],"and
bbb=",lnfit$coefficients[1],"\n")
#Produces: Done another, simpler, way, the best logarithmic fit has
parameters: aaa= 7.551666 and bbb= 4.594367
cat("Exponential fit (here, the self-start and the 'simpler' way
DON'T
match):\n")
tmp=getInitial(height~SSexponen(dbh,r,A),data=data.frame(dbh,height))
cat("(Starting values for the exponential fit: r=",tmp[1],"and
A=",tmp[2],")\n")
modelfit=nls(height~SSexponen(x=dbh,r,A),trace=traceson,control=list(maxiter=maxint,tol=tolerance,minFactor=minstepfactor))
bfr=summary(modelfit)$coefficients[1];bfA=summary(modelfit)$coefficients[2]
cat("Best exponential fit is
(",ytxt,")=(A*exp(r*(",xtxt,"))) with
parameters r=",bfr,"and A=",bfA,"\n")
#Produces: Best exponential fit is ( Height in m )=(A*exp(r*( DBH in mm
))) with parameters r= 0.07134055 and A= 9.47907
expfit=lm(log(height)~dbh) #This is doing the exp fit without a self-start
function
cat("Done another, simpler, way, the best exponential fit has parameters:
r=",expfit$coefficients[2],"and
A=",exp(expfit$coefficients[1]),"\n")
#Produces: Done another, simpler, way, the best exponential fit has
parameters: r= 0.1028805 and A= 6.75045
Peter Dalgaard
2008-Aug-29 12:24 UTC
[R] nls() fails on a simple exponential fit, when lm() gets it right?
Toby Marthews wrote:> Dear R-help, > > Here's a simple example of nonlinear curve fitting where nls seems to get > the answer wrong on a very simple exponential fit (my R version 2.7.2). > > Look at this code below for a very basic curve fit using nls to fit to (a) > a logarithmic and (b) an exponential curve. I did the fits using > self-start functions and I compared the results with a more simple fit > using a straight lm() command. > > The logarithmic fit works 100% correctly below. The problem is with the > exponential fit: the nls command gives the wrong values and I have > completely failed to see why this should happen. I can't see any misake in > my self-start function, so why the mismatch?? Even worse, if I give nls a > trivial fit by removing the "#&#&" on line 6, it fails to converge at all > when the 'simpler' method with lm() easily gives the right answer (r=0.03 > and A=5). > > I did the same exp and ln fits using MS Excel and the numbers returned are > exactly as for the 'simpler ways', so nls is definitely in the wrong, but > the self-start is almost identical to the one for the logarithmic fit, > which works perfectly .... > > I can't see my mistake at all. Can anyone help?? >It's in the theory. A least squares fit to log(Y) is not equivalent to a least squares fit to Y. The latter assigns more weight to larger values. Fitting on a log scale is warranted if the error variance after transformation is independent of the magnitude of the response. This is pretty patently wrong for your data: look at plot(log(height)~dbh), which by the way also shows that the exponential model is wrong for these data. So EXCEL is definitely in the wrong!!!!!!> Thanks, > Toby Marthews > > > traceson=FALSE;maxint=10000;minstepfactor=2^(-30);tolerance=1e-7 > xtxt="DBH in mm";ytxt="Height in m" > dbh=c(0.9,1.0,1.1,1.2,4.8,4.9,5.0,5.1,8.9,9.0,9.1,9.2,11.8,11.9,12.0,12.1,14.9,15.1,15.2,15.5) > height=c(5.770089,5.154794,4.888847,6.356770,14.849109,14.973146,15.423816,14.865038,21.335985,20.477624,20.915823,21.886004,23.714724,24.182210,23.954929,23.784659,25.334360,25.411320,26.218614,25.612478) > #&#&#height=5*exp(0.03*dbh) > > logarithmicInit=function(mCall,LHS,data) { > xy=sortedXyData(mCall[["x"]],LHS,data) > aaa=0.01 #Just guess aaa=0.01 to start the fit > bbb=min(xy$y) #Use the minimum y value as an initial guess for bbb > value=c(aaa,bbb) > names(value)=mCall[c("aaa","bbb")] > return(value) > } > fmla=as.formula("~(aaa*log(x)+bbb)") > SSlogarithmic=selfStart(fmla,initial=logarithmicInit,parameters=c("aaa","bbb")) > > exponenInit=function(mCall,LHS,data) { > xy=sortedXyData(mCall[["x"]],LHS,data) > r=0.01 #Just guess r=0.01 to start the fit > A=min(xy$y) #Use the minimum y value as an initial guess for A > value=c(r,A) > names(value)=mCall[c("r","A")] > return(value) > } > fmla=as.formula("~(A*exp(r*x))") > SSexponen=selfStart(fmla,initial=exponenInit,parameters=c("r","A")) > > cat("Logarithmic fit (here, the self-start and the 'simpler' way match):\n") > tmp=getInitial(height~SSlogarithmic(dbh,aaa,bbb),data=data.frame(dbh,height)) > cat("(Starting values for the logarithmic fit were: aaa=",tmp[1],"and > bbb=",tmp[2],")\n") > modelfit=nls(height~SSlogarithmic(x=dbh,aaa,bbb),trace=traceson,control=list(maxiter=maxint,tol=tolerance,minFactor=minstepfactor)) > bfaaa=summary(modelfit)$coefficients[1];bfbbb=summary(modelfit)$coefficients[2] > cat("Best logarithmic fit is (",ytxt,")=(aaa*log(",xtxt,")+bbb) with > parameters aaa=",bfaaa,"and bbb=",bfbbb,"\n") > #Produces: Best logarithmic fit is ( Height in m )=(aaa*log( DBH in mm > )+bbb) with parameters aaa= 7.551666 and bbb= 4.594367 > > lnfit=lm(height~log(dbh)) #This is doing the ln fit without a self-start > function > cat("Done another, simpler, way, the best logarithmic fit has parameters: > aaa=",lnfit$coefficients[2],"and bbb=",lnfit$coefficients[1],"\n") > #Produces: Done another, simpler, way, the best logarithmic fit has > parameters: aaa= 7.551666 and bbb= 4.594367 > > cat("Exponential fit (here, the self-start and the 'simpler' way DON'T > match):\n") > tmp=getInitial(height~SSexponen(dbh,r,A),data=data.frame(dbh,height)) > cat("(Starting values for the exponential fit: r=",tmp[1],"and > A=",tmp[2],")\n") > modelfit=nls(height~SSexponen(x=dbh,r,A),trace=traceson,control=list(maxiter=maxint,tol=tolerance,minFactor=minstepfactor)) > bfr=summary(modelfit)$coefficients[1];bfA=summary(modelfit)$coefficients[2] > cat("Best exponential fit is (",ytxt,")=(A*exp(r*(",xtxt,"))) with > parameters r=",bfr,"and A=",bfA,"\n") > #Produces: Best exponential fit is ( Height in m )=(A*exp(r*( DBH in mm > ))) with parameters r= 0.07134055 and A= 9.47907 > > expfit=lm(log(height)~dbh) #This is doing the exp fit without a self-start > function > cat("Done another, simpler, way, the best exponential fit has parameters: > r=",expfit$coefficients[2],"and A=",exp(expfit$coefficients[1]),"\n") > #Produces: Done another, simpler, way, the best exponential fit has > parameters: r= 0.1028805 and A= 6.75045 > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- O__ ---- Peter Dalgaard ?ster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907