Dear all,
I am trying to analyze some non-linear data to which I have fit a curve of
the following form:

dum <- nls(y ~ A + (B*x)/(C+x), start = list(A = 370, B = 100, C = 23000))

I am wondering if there is any way to determine meaningful quality-of-fit
statistics from the nls function?

A summary yields highly significant p-values, but my impression is that
these are questionable at best, given the iterative nature of the fit:

> summary(dum)

Formula: y ~ (A + (B * x)/(C + x))

Parameters:
    Estimate Std. Error t value Pr(>|t|)
A    388.753      4.794  81.090  < 2e-16 ***
B    115.215      5.006  23.015  < 2e-16 ***
C  20843.832   4646.937   4.485 1.12e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 18.25 on 245 degrees of freedom

Number of iterations to convergence: 4
Achieved convergence tolerance: 2.244e-06

Is there any other means of assessing the quality of the curve fit? I have
tried computing confidence intervals with confint(dum), but these seem
unrealistically narrow. Thanks so much for your help!
-Max
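
[For anyone who wants to run the situation above, here is a minimal
self-contained sketch of this kind of fit. The data are simulated; the
"true" parameter values, sample size, and noise level are assumptions
chosen only to roughly mimic the posted summary, not the original data.]

## Simulated stand-in for the original data (assumed values, for
## illustration only).
set.seed(1)
x <- seq(0, 2e5, length.out = 248)     # 248 obs, 3 params -> 245 residual df
y <- 370 + (100 * x)/(23000 + x) + rnorm(length(x), sd = 18)

## The fit as posted, plus the usual follow-up output.
dum <- nls(y ~ A + (B * x)/(C + x), start = list(A = 370, B = 100, C = 23000))
summary(dum)     # Wald-type t statistics for each parameter
confint(dum)     # profile-based confidence intervals (see ?confint)
plot(fitted(dum), residuals(dum))     # visual check for lack of fit
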
Inline below.

-- Bert

On Thu, Jan 26, 2012 at 12:16 PM, Max Brondfield
<max.brondfield at gmail.com> wrote:

> Dear all,
> I am trying to analyze some non-linear data to which I have fit a curve of
> the following form:
>
> dum <- nls(y ~ A + (B*x)/(C+x), start = list(A = 370, B = 100, C = 23000))
>
> I am wondering if there is any way to determine meaningful quality-of-fit
> statistics from the nls function?
>
> A summary yields highly significant p-values, but my impression is that
> these are questionable at best, given the iterative nature of the fit:

No. They are questionable primarily because there is no clear null model.
The confint() intervals, for their part, are based on profile likelihoods
(as ?confint tells you), which may or may not be what you want for
"goodness of fit." One can always get "goodness of fit" statistics, but
the question in nonlinear models is: goodness of fit with respect to what?

So the answer to your question is: if you know what you're doing,
certainly. Otherwise, find someone who does.

--
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
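
[One way to make "goodness of fit with respect to what?" concrete is to
fit an explicit null model and compare. A sketch, assuming the simulated
data and `dum` fit above, and taking the mean-only model as the null;
that choice is an assumption, not the only sensible one.]

## Explicit null model: a constant, i.e. the mean of y.
null <- nls(y ~ A, start = list(A = mean(y)))

## Compare the nested fits. Both comparisons only answer "better than
## this particular null?", which is the point of the remark above.
anova(null, dum)     # extra-sum-of-squares F test
AIC(null, dum)       # likelihood-based comparison
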
Peter and Bert have already made some pertinent remarks. This comment is a
bit tangential, but in the same flavour. As they note, it is "goodness of
fit relative to what?" that is important.

As a matter of course when doing nonlinear least squares, I generally
compute the quantity 1 - (residual sum of squares)/(total sum of squares).
In linear modelling this is usually called R-squared, but I don't want to
create a firestorm of complaints by suggesting it be called that here. I'm
not doing anything here other than checking for silly results: all I'm
suggesting is that a comparison to the model that is the mean of the
variable being fitted is a minimal sanity check. Surely we should be able
to do better than the mean?

It has saved me from wasting time on several occasions: sometimes because
the proposed model was really wrong, sometimes because there was a
nuisance local minimum well away from a solution, and most often because
of a silly typo in setting things up. And it can usually be computed
within a cat() statement.

Best, John Nash
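
[A sketch of the check John describes, assuming the `dum` fit and the
response vector `y` from earlier in the thread:]

## 1 - RSS/TSS: how much better the fit does than the mean of y.
rss <- deviance(dum)            # residual sum of squares of the nls fit
tss <- sum((y - mean(y))^2)     # total sum of squares about the mean
cat("1 - RSS/TSS =", 1 - rss/tss, "\n")
## A value near 0 (or negative) says the model does no better than the
## mean: often a wrong model, a bad local minimum, or a setup typo.
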