Wolfgang Waser
2008-Mar-05 13:53 UTC
[R] nls: different results if applied to normal or linearized data
Dear all,

I did a non-linear least-squares model fit

y ~ a * x^b

(a) > nls(y ~ a * x^b, start=list(a=1,b=1))

to obtain the coefficients a & b.

I did the same with the linearized formula, including a linear model

log(y) ~ log(a) + b * log(x)

(b) > nls(log10(y) ~ log10(a) + b*log10(x), start=list(a=1,b=1))
(c) > lm(log10(y) ~ log10(x))

I expected coefficient b to be identical for all three cases. However, using my dataset, coefficient b was:
(a) 0.912
(b) 0.9794
(c) 0.9794

Coefficient a also varied between option (a) and (b), 107.2 and 94.7, respectively.

Is this supposed to happen? Which is the correct coefficient b?

Regards,

Wolfgang

--
Laboratory of Animal Physiology
Department of Biology
University of Turku
FIN-20014 Turku
Finland
Gabor Grothendieck
2008-Mar-05 14:06 UTC
[R] nls: different results if applied to normal or linearized data
Write out the objective functions that they are minimizing and it will be clear they are different so you can't expect the same results.
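As an illustration of the point above, the two objective functions can be written out directly in R. This is only a sketch: the original poster's data are not available, so the data here are simulated, the "true" values 100 and 0.95 are arbitrary, and the helper names sse_raw and sse_log are made up for this example. The nls() fit is started from the log-scale estimates rather than from a=1, b=1 purely for convenience.

```r
# Simulated data; the original poster's data set is not available.
set.seed(1)
x <- 1:50
y <- 100 * x^0.95 + rnorm(length(x), sd = 5)

# Objective minimised by (a): squared differences in y.
sse_raw <- function(a, b) sum((y - a * x^b)^2)

# Objective minimised by (b) and (c): squared differences in log10(y).
sse_log <- function(a, b) sum((log10(y) - (log10(a) + b * log10(x)))^2)

# (c) linear fit on the log scale; back-transform the intercept to get a.
fit_lm <- lm(log10(y) ~ log10(x))
a_log <- unname(10^coef(fit_lm)[1])
b_log <- unname(coef(fit_lm)[2])

# (a) nonlinear fit on the raw scale, started from the log-scale estimates.
fit_nls <- nls(y ~ a * x^b, start = list(a = a_log, b = b_log))
a_raw <- unname(coef(fit_nls)["a"])
b_raw <- unname(coef(fit_nls)["b"])

# Each pair of estimates does best on its own objective.
c(raw_at_nls = sse_raw(a_raw, b_raw), raw_at_lm = sse_raw(a_log, b_log))
c(log_at_lm = sse_log(a_log, b_log), log_at_nls = sse_log(a_raw, b_raw))
```

Neither pair of estimates is wrong; each minimises a different sum of squares, so there is no reason for them to coincide.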
Martin Elff
2008-Mar-05 14:16 UTC
[R] nls: different results if applied to normal or linearized data
On Wednesday 05 March 2008 (14:53:27), Wolfgang Waser wrote:
> Coefficient a also varied between option (a) and (b), 107.2 and 94.7,
> respectively.

Models (a) and (b) entail different distributions of the dependent variable y and different ranges of values that y may take.
(a) implies that y has, conditionally on x, a normal distribution and has a range of feasible values from -Inf to +Inf.
(b) and (c) imply that log(y) has a normal distribution, that is, y has a log-normal distribution and can take values from zero to +Inf.

> Is this supposed to happen?

Given the above considerations, different results with respect to the intercept are definitely to be expected.

> Which is the correct coefficient b?

That depends - is y strictly non-negative or not ...

Just my 20 cents...
Prof Brian Ripley
2008-Mar-05 15:47 UTC
[R] nls: different results if applied to normal or linearized data
On Wed, 5 Mar 2008, Wolfgang Waser wrote:
> Is this supposed to happen? Which is the correct coefficient b?

Yes. You are fitting by least-squares on two different scales: differences in y and differences in log(y) are not comparable. Both are correct solutions to different problems.

Since we have no idea what 'x' and 'y' are, we cannot even guess which is more appropriate in your context.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
Rolf Turner
2008-Mar-05 20:04 UTC
[R] nls: different results if applied to normal or linearized data
On 6/03/2008, at 2:53 AM, Wolfgang Waser wrote:
> Is this supposed to happen? Which is the correct coefficient b?

The two approaches assume two different models.

Model (1) is y = a*x^b + E (where different errors are independent and identically --- usually normally --- distributed).

Model (2) is y = a*(x^b)*E (and you are usually tacitly assuming that ln E is normally distributed).

The point estimates of a and b will consequently be different --- although usually not hugely different. Their distributional properties will be substantially different.

cheers,

Rolf Turner
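A small simulation may help to see how the two error models differ. It is a sketch under stated assumptions: the values a = 100 and b = 0.95, the error standard deviations, and all variable names are invented for illustration and are not taken from the thread.

```r
set.seed(42)
x <- seq(1, 100, length.out = 200)
a <- 100; b <- 0.95            # hypothetical true values for the simulation

# Model (1): y = a*x^b + E, with E normal and of constant spread.
y_add  <- a * x^b + rnorm(length(x), sd = 10)

# Model (2): y = a*(x^b)*E, with ln E normal, so E is always positive.
y_mult <- a * x^b * exp(rnorm(length(x), sd = 0.1))

# Deviations from the true curve on the y scale.
dev_add  <- y_add  - a * x^b
dev_mult <- y_mult - a * x^b

upper <- x > 50
# Model (1): similar spread at small and large x.
c(lower = sd(dev_add[!upper]),  upper = sd(dev_add[upper]))
# Model (2): the spread on the y scale grows with the mean ...
c(lower = sd(dev_mult[!upper]), upper = sd(dev_mult[upper]))
# ... but is roughly constant once y is put on the log scale.
dev_log <- log(y_mult) - log(a * x^b)
c(lower = sd(dev_log[!upper]),  upper = sd(dev_log[upper]))
```

Plotting y_add and y_mult against x should show the same contrast: a band of roughly constant width under model (1) and a band that fans out as x increases under model (2).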
Prof Brian Ripley
2008-Mar-06 06:03 UTC
[R] nls: different results if applied to normal or linearized data
The only thing you are adding to earlier replies is incorrect: fitting by least squares does not imply a normal distribution. For a regression model, least-squares is in various senses optimal when the errors are i.i.d. and normal, but it is a reasonable procedure for many other situations (but not for modestly long-tailed distributions, the point of robust statistics).

Although values from -Inf to +Inf are theoretically possible for a normal, it has very little mass in the tails and is often used as a model for non-negative quantities (and e.g. the justification of Box-Cox estimation relies on this).

On Wed, 5 Mar 2008, Martin Elff wrote:
> Models (a) and (b) entail different distributions of the dependent variable y
> and different ranges of values that y may take.
> (a) implies that y has, conditionally on x, a normal distribution and
> has a range of feasible values from -Inf to +Inf.
> (b) and (c) imply that log(y) has a normal distribution, that is,
> y has a log-normal distribution and can take values from zero to +Inf.
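To illustrate that least squares can be a reasonable procedure without normal errors, here is a minimal sketch using a straight-line regression with uniform errors. The model, the coefficient values 3 and 2, and the sample size are assumptions chosen only for this example.

```r
set.seed(7)
n <- 1000
x <- runif(n, 0, 10)

# Errors are uniform on (-1, 1): mean zero and i.i.d., but not normal.
y <- 3 + 2 * x + runif(n, -1, 1)

# Ordinary least squares makes no use of the error distribution.
fit <- lm(y ~ x)
coef(fit)   # compare with the hypothetical true values 3 and 2
```

The uniform distribution is short-tailed, so this says nothing about the long-tailed case mentioned above, where robust methods are the relevant tool.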