Nerak
2012-Jun-04  12:19 UTC
[R] Non-linear curve fitting (nls): starting point and quality of fit
Hi all,

Like a lot of people, I noticed that I get different results when I use nls in R compared to the exponential fit in Excel. This is a bit annoying, because the R^2 is often higher in Excel, but from reading the different topics on this forum I understand that using R is better than Excel?

(I don't really understand how the difference occurs, but I understand that the fitting is done in a different way: in Excel a single value can make the difference, while R looks at the whole function? I read this: "Fitting a function is an approximation, trying to find a minimum. Think of frozen mountain lake surrounded by mountains. Excel's Solver will report the highest tip of the snowflake on the lake, if it finds it. nls will find out that the lake is essentially flat compare to the surrounding and tell you this fact in unkind word.")

I have several questions about nls:

1. The nls method doesn't give an R^2, but I want to determine the quality of the fit. To understand how to use nls I read "Technical note: Curve fitting with the R environment for Statistical Computing". That document suggests calculating R^2 like this:

    RSS.p <- sum(residuals(fit)^2)
    TSS <- sum((y - mean(y))^2)
    r.squared <- 1 - (RSS.p/TSS)
    LIST.rsq <- r.squared

(Here fit is the result of my nls call with the formula y ~ exp.f(x, a, b), where exp.f computes a*exp(-b*x).)

While searching the internet for a possible reason why I get different results with R and Excel, I also read lots of different things about the "R^2 problem" in nls. Is the method I'm using now OK, or would someone suggest using something else?

2. Like a lot of people, I also have a question about the singular gradient problem. I didn't know the best way to choose the starting values for my coefficients. When one of them was too low, I got the singular gradient error. Raising the value got rid of that error, and changing it further changed neither my coefficients nor the R^2. Is it OK to simply raise the starting value of one of my coefficients?

The only things that change are the achieved convergence tolerance and the number of iterations to convergence. The p-values, the residual standard error and the coefficients are always exactly the same. What does the achieved convergence tolerance actually mean? What are its implications? (I suppose the computing time changes.)

(The most useful information I found about nls and the singular gradient error is this, and it is why I started playing with the starting values: "if the estimate of the rank that results is less than the number of columns in the gradient (the number of nonlinear parameters), or less than the number of rows (the number of observations), nls stops.")

I hope someone can help me with these questions. I would like to know what's happening and not just have to accept the results I get now :).

Kind regards,

Nerak

--
View this message in context: http://r.789695.n4.nabble.com/Non-linear-curve-fitting-nls-starting-point-and-quality-of-fit-tp4632295.html
Sent from the R help mailing list archive at Nabble.com.
Ben Bolker
2012-Jun-04  19:31 UTC
[R] Non-linear curve fitting (nls): starting point and quality of fit
Nerak <nerak.t <at> hotmail.com> writes:

> Hi all,
>
> Like a lot of people I noticed that I get different results when I use nls
> in R compared to the exponential fit in excel. A bit annoying because often
> the R^2 is higher in excel but when I'm reading the different topics on this
> forum I kind of understand that using R is better than excel?
>
> (I don't really understand how the difference occurs, but I understand that
> there is a different way in fitting, in excel a single value can make the
> difference, in R it looks at the whole function? I read this: "Fitting a
> function is an approximation, trying to find a minimum. Think of frozen
> mountain lake surrounded by mountains. Excel's Solver will report the
> highest tip of the snowflake on the lake, if it finds it. nls will find out
> that the lake is essentially flat compare to the surrounding and tell you
> this fact in unkind word." )

Snarky, but I like it.

Two alternatives to nls are:

(1) Gabor Grothendieck's nls2 package:

    nls2 is an R package that adds the "brute-force" algorithm and multiple starting values to the R nls function. nls2 is free software licensed under the GPL and available from CRAN. It provides a function, nls2, which is a superset of the R nls function which it, in turn, calls.

(2) John Nash's nlmrt package, https://r-forge.r-project.org/R/?group_id=395 :

    nlmrt provides tools for working with nonlinear least squares problems using a calling structure similar to, but much simpler than, that of the nls() function. Moreover, where nls() specifically does NOT deal with small or zero residual problems, nlmrt is quite happy to solve them. It also attempts to be more robust in finding solutions, thereby avoiding singular gradient messages that arise in the Gauss-Newton method within nls(). The Marquardt-Nash approach in nlmrt generally works more reliably to get a solution, though this may be one of a set of possibilities, and may also be statistically unsatisfactory.

> I have several questions about nls:
>
> 1. The nls method doesn't give an R^2. But I want to determine the quality
> of the fit. To understand how to use nls I read "Technical note: Curve
> fitting with the R environment for Statistical Computing". In that document
> they suggested this to calculate R^2:
>
> RSS.p<-sum(residuals(fit)^2)
> TSS<-sum((y-mean(y))^2)
> r.squared<-1-(RSS.p/TSS)
> LIST.rsq<-r.squared
>
> (with fit my results of the nls: formula y ~ exp.f(x, a, b) : y :
> a*exp(-b*x))
>
> While I was reading on the internet to find a possible reason why I get
> different results using R and excel, I also read lots of different things
> about the "R^2 problem" in nls.
>
> Is the method I'm using now ok, or should someone suggest to use something
> else?

You could use the residual sum of squares (i.e. RSS.p above) as the measure of the quality of the fit. If you want a _unitless_ metric of the quality of the fit, I'm not sure what you should do.

> 2. Another question I have is like a lot of people about the singular
> gradient problem. I didn't know the best way to chose my starting values for
> my coefficients. when it was too low, I got this singular gradient error.
> Raising the value helped me to get rid of that error. Changing that value
> didn't change my coefficients nor R^2. I was wondering if that's ok, just to
> raise the starting value of one of my coefficients?

[snip]

If you can find a set of starting coefficients that gives you a sensible fit to the data without any convergence warnings, you shouldn't worry that other sets of starting coefficients that *don't* work also exist.
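The points above about starting values and the residual sum of squares can be sketched in base R. This is only an illustration under stated assumptions: the data are simulated (the true values a = 5, b = 0.4 and the noise level are made up), the starting values come from a log-linear fit, which is one common heuristic and not something prescribed in this thread, and all object names are hypothetical.

```r
# Simulated data (an assumption for illustration): y = a * exp(-b * x) + noise
set.seed(1)
x <- seq(0, 6, by = 0.25)
y <- 5 * exp(-0.4 * x) + rnorm(length(x), sd = 0.05)
d <- data.frame(x = x, y = y)

# Heuristic starting values: fit log(y) = log(a) - b * x by ordinary least squares
lin <- lm(log(y) ~ x, data = d)
start1 <- list(a = exp(coef(lin)[[1]]), b = -coef(lin)[[2]])

# Nonlinear least squares fit of the additive-error model
fit <- nls(y ~ a * exp(-b * x), data = d, start = start1)

# A second, cruder starting point; if it also converges, it should
# arrive at essentially the same coefficients
fit2 <- nls(y ~ a * exp(-b * x), data = d, start = list(a = 3, b = 0.2))
same_coef <- isTRUE(all.equal(coef(fit), coef(fit2), tolerance = 1e-3))

# Quality-of-fit summaries
RSS <- sum(residuals(fit)^2)           # residual sum of squares
sigma_hat <- sqrt(RSS / df.residual(fit))  # residual standard error
TSS <- sum((d$y - mean(d$y))^2)
pseudo_r2 <- 1 - RSS / TSS             # the formula quoted in the question
```

RSS and sigma_hat are in the units of y (squared, for RSS), so they compare fits of the same response directly. pseudo_r2 is the unitless quantity from the question; it is computed here only to show the arithmetic, and it does not carry the same interpretation as R^2 in a linear model.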
Greg Snow
2012-Jun-05  19:27 UTC
[R] Non-linear curve fitting (nls): starting point and quality of fit
One thing to note is that there is more than one model that can be called exponential. Two of the common ones are:

    y = exp( a + b*x + error )
    y = exp( a + b*x ) + error

The common way to fit the first is to take the log of both sides and fit a linear model to log(y); I expect (but am not sure) that that is what Excel does.

It is likely that you get different results because you are fitting different models with the two programs. You first need to decide which is the correct model (and it may be different from the two already mentioned), then worry about fitting that model and what goes with that.

--
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com
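The difference between the two exponential models can be sketched in base R as follows. The data are simulated with additive error (the values 1.5, -0.4 and the noise level are assumptions made up for illustration), and the object names are hypothetical. Whether Excel's exponential trendline corresponds to the first fit is, as said above, an expectation and not something verified here.

```r
# Simulated data with additive error: y = exp(a + b * x) + noise
set.seed(42)
x <- seq(0, 6, by = 0.25)
y <- exp(1.5 - 0.4 * x) + rnorm(length(x), sd = 0.05)
d <- data.frame(x = x, y = y)

# Model 1: y = exp(a + b*x + error), fitted as a linear model for log(y)
fit_log <- lm(log(y) ~ x, data = d)

# Model 2: y = exp(a + b*x) + error, fitted by nonlinear least squares,
# using the log-linear estimates as starting values
fit_nls <- nls(y ~ exp(a + b * x), data = d,
               start = list(a = coef(fit_log)[[1]], b = coef(fit_log)[[2]]))

# The two fits minimise different criteria, so the estimates differ
est <- rbind(log_linear = coef(fit_log), nls = coef(fit_nls))

# Residual sums of squares on the original scale of y
rss_log <- sum((d$y - exp(fitted(fit_log)))^2)
rss_nls <- sum(residuals(fit_nls)^2)
```

On the original scale of y, rss_nls cannot exceed rss_log here, because nls starts from the log-linear estimates and only accepts steps that reduce that sum of squares. On the log scale the comparison goes the other way, which is one reason an R^2 reported for the log-linear fit is not comparable to one computed from an nls fit.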