thr3ads.net - R help - [R] Finding starting values for the parameters using nls() or nls2() [Oct 2016]

If this information is useful, please help other people find it:
Share via:

Bert Gunter

2016-Oct-09 22:40 UTC

[R] Finding starting values for the parameters using nls() or nls2()

Well... (inline -- and I hope this isn't homework!)




On Sun, Oct 9, 2016 at 3:05 PM, Andrew Robinson
<A.Robinson at ms.unimelb.edu.au> wrote:> Here are some things to try.  Maybe divide Area by 1000 and retention
> by 100.  Try plotting the data and superimposing the line that
> corresponds to the 'fit' from nls2.  See if you can correct it with
> some careful guesses.
>
> Getting suitable starting parameters for non-linear modeling is one of
> the black arts of statistical fitting. ...
>
> Andrew
True. But it's usually worthwhile thinking about the math a bit before
guessing.

Note that the model can be linearized to:

log(log(Retention)) = b0 + b1*Area^th

So a plot of log(log(Retention)) vs Area may be informative and useful
for finding starting values. e.g., for a grid of th's, do linear
regression fits .

However, when I look at that plot, it seems pretty linear with a
negative slope. This suggests that you may have an overparametrization
problem . i.e. fix th =1 and use the b0 and b1 from the above
regression for starting values.

Do note that this strategy isn't foolproof, as it ignores that the
error term is additive in the above transformed metric, rather than
the original. This can sometimes mislead. But this is just a
heuristic.

Cheers,
Bert






>
> On 9 October 2016 at 22:21, Pinglei Gao <gaopinglei at 163.com>
wrote:
>> Hi,
>>
>> I have some data that i'm trying to fit a double exponential model:
data.
>> Frame (Area=c (521.5, 689.78, 1284.71, 2018.8, 2560.46, 524.91, 989.05,
>> 1646.32, 2239.65, 2972.96, 478.54, 875.52, 1432.5, 2144.74, 2629.2),
>>
>> Retention=c (95.3, 87.18, 44.94, 26.36, 18.12, 84.68, 37.24, 33.04,
23.46,
>> 9.72, 97.92, 71.44, 44.52, 24.44, 15.26) ) and the formula of the
double
>> exponential is: exp (b0*exp (b1*x^th)).
>>
>>
>>
>> I failed to guess the initial parameter values and then I learned a
measure
>> to find starting values from Nonlinear Regression with R (pp. 25-27):
>>
>>
>>
>>> cl<-data.frame(Area =c(521.5, 689.78, 1284.71, 2018.8, 2560.46,
524.91,
>> 989.05, 1646.32, 2239.65, 2972.96, 478.54, 875.52, 1432.5, 2144.74,
2629.2),
>>
>> + Retention =c(95.3, 87.18, 44.94, 26.36, 18.12, 84.68, 37.24, 33.04,
23.46,
>> 9.72, 97.92, 71.44, 44.52, 24.44, 15.26) )
>>
>>> expFct <- function(Area, b0, b1,th) {exp(b0*exp(b1*Area^th))}
>>
>>> grid.Disperse <- expand.grid(list(b0 = seq(0.01,4, by = 0.01),
th >> c(0.02),b1 = seq(0.01, 4, by = 0.01)))
>>
>>> Disperse.m2a <- nls2(Retention ~expFct(Area, b0, b1,th), data =
cl, start
>> = grid.Disperse, algorithm = "brute-force")
>>
>>> Disperse.m2a
>>
>> Nonlinear regression model
>>
>>   model: Retention ~ expFct(Area, b0, th, b1)
>>
>>    data: cl
>>
>> b0   th   b1
>>
>> 3.82 0.02 0.01
>>
>> residual sum-of-squares: 13596
>>
>> Number of iterations to convergence: 160000
>>
>> Achieved convergence tolerance: NA
>>
>>
>>
>> I got no error then I use the output as starting values to nls2 ():
>>
>>> nls.m2<- nls2(Retention ~ expFct(Area, b0, b1, th), data = cl,
start >> list(b0 = 3.82, b1 = 0.02, th = 0.01))
>>
>> Error in (function (formula, data = parent.frame(), start, control
>> nls.control(),  :
>>
>> Singular gradient
>>
>>
>>
>> Why? Did I do something wrong or misunderstand something?
>>
>>
>>
>> Later, I found another measure from Modern Applied Statistics with S
(pp.
>> 216-217):
>>
>>
>>
>>> negexp <- selfStart(model = ~ exp(b0*exp(b1*x^th)),initial
>> negexp.SSival, parameters = c("b0", "b1",
"th"),
>>
>> + template = function(x, b0, b1, th) {})
>>
>>> Disperse.ss <- nls(Retention ~ negexp(Area, b0, b1, th),data =
cl, trace >> T)
>>
>>          b0          b1          th
>>
>>    4.208763  144.205455 1035.324595
>>
>> Error in qr.default(.swts * attr(rhs, "gradient")) :
>>
>>  NA/NaN/Inf (arg1) can not be called when the external function is
called.
>>
>>
>>
>> Error happened again. How can I fix it? I am desperate.
>>
>>
>>
>> Best regards,
>>
>>
>>
>> Pinglei Gao
>>
>>
>>
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Andrew Robinson
> Deputy Director, CEBRA, School of Biosciences
> Reader & Associate Professor in Applied Statistics  Tel: (+61) 0403 138
955
> School of Mathematics and Statistics                        Fax: +61-3-8344
4599
> University of Melbourne, VIC 3010 Australia
> Email: a.robinson at ms.unimelb.edu.au
> Website: http://www.ms.unimelb.edu.au/~andrewpr
>
> MSME: http://www.crcpress.com/product/isbn/9781439858028
> FAwR: http://www.ms.unimelb.edu.au/~andrewpr/FAwR/
> SPuR: http://www.ms.unimelb.edu.au/spuRs/
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

peter dalgaard

2016-Oct-09 23:40 UTC

head link

[R] Finding starting values for the parameters using nls() or nls2()

> On 10 Oct 2016, at 00:40 , Bert Gunter <bgunter.4567 at gmail.com>
wrote:
> 
> Well... (inline -- and I hope this isn't homework!)
> 
Pretty much same as I thought. 

Fixing th=0.02 in the grid search looks wrong. Bert's plot is pretty linear,
so th=1 is a good guesstimate. There's a slight curvature but to reduce it,
you would increase th, not decrease it. Running the regression, as Bert
suggests, indicates that b0=5.16 and b1= -0.00024 could work as reasonable
starting values. Notice that the grid search had "b1 = seq(0.01, 4, by =
0.01)" which is wrong in both sign and scale.

Andrew's suggestion of dividing Retention by 100 is tempting, since it looks
like a percentage, but that would make all Y values less than 1 and the double
exponential function as written has values that are always bigger than 1. (It is
conceivable that the model itself is wrong, though. E.g. it could be that
Retention on a scale from 0 to 1 could be modeled as exp(-something), but we
really have no idea of the context here.)

(If this was in fact homework, you should now go and write a proper SelfStart
initializer routine for this model. Even if it isn't homework, you do need
to study the text again, because you have clearly not understood how
self-starting models work.)

-pd
> 
> 
> 
> On Sun, Oct 9, 2016 at 3:05 PM, Andrew Robinson
> <A.Robinson at ms.unimelb.edu.au> wrote:
>> Here are some things to try.  Maybe divide Area by 1000 and retention
>> by 100.  Try plotting the data and superimposing the line that
>> corresponds to the 'fit' from nls2.  See if you can correct it
with
>> some careful guesses.
>> 
>> Getting suitable starting parameters for non-linear modeling is one of
>> the black arts of statistical fitting. ...
>> 
>> Andrew
> 
> True. But it's usually worthwhile thinking about the math a bit before
guessing.
> 
> Note that the model can be linearized to:
> 
> log(log(Retention)) = b0 + b1*Area^th
> 
> So a plot of log(log(Retention)) vs Area may be informative and useful
> for finding starting values. e.g., for a grid of th's, do linear
> regression fits .
> 
> However, when I look at that plot, it seems pretty linear with a
> negative slope. This suggests that you may have an overparametrization
> problem . i.e. fix th =1 and use the b0 and b1 from the above
> regression for starting values.
> 
> Do note that this strategy isn't foolproof, as it ignores that the
> error term is additive in the above transformed metric, rather than
> the original. This can sometimes mislead. But this is just a
> heuristic.
> 
> Cheers,
> Bert
> 
> 
> 
> 
> 
> 
> 
>> 
>> On 9 October 2016 at 22:21, Pinglei Gao <gaopinglei at 163.com>
wrote:
>>> Hi,
>>> 
>>> I have some data that i'm trying to fit a double exponential
model: data.
>>> Frame (Area=c (521.5, 689.78, 1284.71, 2018.8, 2560.46, 524.91,
989.05,
>>> 1646.32, 2239.65, 2972.96, 478.54, 875.52, 1432.5, 2144.74,
2629.2),
>>> 
>>> Retention=c (95.3, 87.18, 44.94, 26.36, 18.12, 84.68, 37.24, 33.04,
23.46,
>>> 9.72, 97.92, 71.44, 44.52, 24.44, 15.26) ) and the formula of the
double
>>> exponential is: exp (b0*exp (b1*x^th)).
>>> 
>>> 
>>> 
>>> I failed to guess the initial parameter values and then I learned a
measure
>>> to find starting values from Nonlinear Regression with R (pp.
25-27):
>>> 
>>> 
>>> 
>>>> cl<-data.frame(Area =c(521.5, 689.78, 1284.71, 2018.8,
2560.46, 524.91,
>>> 989.05, 1646.32, 2239.65, 2972.96, 478.54, 875.52, 1432.5, 2144.74,
2629.2),
>>> 
>>> + Retention =c(95.3, 87.18, 44.94, 26.36, 18.12, 84.68, 37.24,
33.04, 23.46,
>>> 9.72, 97.92, 71.44, 44.52, 24.44, 15.26) )
>>> 
>>>> expFct <- function(Area, b0, b1,th)
{exp(b0*exp(b1*Area^th))}
>>> 
>>>> grid.Disperse <- expand.grid(list(b0 = seq(0.01,4, by =
0.01), th >>> c(0.02),b1 = seq(0.01, 4, by = 0.01)))
>>> 
>>>> Disperse.m2a <- nls2(Retention ~expFct(Area, b0, b1,th),
data = cl, start
>>> = grid.Disperse, algorithm = "brute-force")
>>> 
>>>> Disperse.m2a
>>> 
>>> Nonlinear regression model
>>> 
>>>  model: Retention ~ expFct(Area, b0, th, b1)
>>> 
>>>   data: cl
>>> 
>>> b0   th   b1
>>> 
>>> 3.82 0.02 0.01
>>> 
>>> residual sum-of-squares: 13596
>>> 
>>> Number of iterations to convergence: 160000
>>> 
>>> Achieved convergence tolerance: NA
>>> 
>>> 
>>> 
>>> I got no error then I use the output as starting values to nls2 ():
>>> 
>>>> nls.m2<- nls2(Retention ~ expFct(Area, b0, b1, th), data =
cl, start >>> list(b0 = 3.82, b1 = 0.02, th = 0.01))
>>> 
>>> Error in (function (formula, data = parent.frame(), start, control
>>> nls.control(),  :
>>> 
>>> Singular gradient
>>> 
>>> 
>>> 
>>> Why? Did I do something wrong or misunderstand something?
>>> 
>>> 
>>> 
>>> Later, I found another measure from Modern Applied Statistics with
S (pp.
>>> 216-217):
>>> 
>>> 
>>> 
>>>> negexp <- selfStart(model = ~ exp(b0*exp(b1*x^th)),initial
>>> negexp.SSival, parameters = c("b0", "b1",
"th"),
>>> 
>>> + template = function(x, b0, b1, th) {})
>>> 
>>>> Disperse.ss <- nls(Retention ~ negexp(Area, b0, b1, th),data
= cl, trace >>> T)
>>> 
>>>         b0          b1          th
>>> 
>>>   4.208763  144.205455 1035.324595
>>> 
>>> Error in qr.default(.swts * attr(rhs, "gradient")) :
>>> 
>>> NA/NaN/Inf (arg1) can not be called when the external function is
called.
>>> 
>>> 
>>> 
>>> Error happened again. How can I fix it? I am desperate.
>>> 
>>> 
>>> 
>>> Best regards,
>>> 
>>> 
>>> 
>>> Pinglei Gao
>>> 
>>> 
>>> 
>>> 
>>>        [[alternative HTML version deleted]]
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> 
>> 
>> 
>> --
>> Andrew Robinson
>> Deputy Director, CEBRA, School of Biosciences
>> Reader & Associate Professor in Applied Statistics  Tel: (+61) 0403
138 955
>> School of Mathematics and Statistics                        Fax:
+61-3-8344 4599
>> University of Melbourne, VIC 3010 Australia
>> Email: a.robinson at ms.unimelb.edu.au
>> Website: http://www.ms.unimelb.edu.au/~andrewpr
>> 
>> MSME: http://www.crcpress.com/product/isbn/9781439858028
>> FAwR: http://www.ms.unimelb.edu.au/~andrewpr/FAwR/
>> SPuR: http://www.ms.unimelb.edu.au/spuRs/
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

ProfJCNash

2016-Oct-10 00:14 UTC

head link

[R] Finding starting values for the parameters using nls() or nls2()

I didn't try very hard, but got a solution from .1, 1, .1 with nlxb() from
nlmrt. It took a lot
of iterations and looks to be pretty ill-conditioned. Note nlmrt uses analytic
derivatives if it
can, and a Marquardt method. It is designed to be a pit bull -- tenacious, not
fast.

I'm working on a replacement for this and nls(), but it will be a while.
However, I welcome
short scripts like this as tests. My script below

Best, JN

cl<-data.frame(Area =c(521.5, 689.78, 1284.71, 2018.8, 2560.46, 524.91,
989.05, 1646.32, 2239.65, 2972.96, 478.54, 875.52, 1432.5, 2144.74, 2629.2),

Retention =c(95.3, 87.18, 44.94, 26.36, 18.12, 84.68, 37.24, 33.04, 23.46,
9.72, 97.92, 71.44, 44.52, 24.44, 15.26) )

expFct <- function(Area, b0, b1,th) {exp(b0*exp(b1*Area^th))}
expf2 <- "Retention ~ exp(b0*exp(b1*Area^th))"

# grid.Disperse <- expand.grid(list(b0 = seq(0.01,4, by = 0.01), th
#c(0.02),b1 = seq(0.01, 4, by = 0.01)))

#> Disperse.m2a <- nls2(Retention ~expFct(Area, b0, b1,th), data = cl,
start
#= grid.Disperse, algorithm = "brute-force")

# Disperse.m2a

library(nlmrt)
test <- nlxb(expf2, start= c(b0=.1, b1=1, th=.1), trace=TRUE, data=cl)



On 16-10-09 07:40 PM, peter dalgaard wrote:> 
>> On 10 Oct 2016, at 00:40 , Bert Gunter <bgunter.4567 at
gmail.com> wrote:
>>
>> Well... (inline -- and I hope this isn't homework!)
>>
> 
> Pretty much same as I thought. 
> 
> Fixing th=0.02 in the grid search looks wrong. Bert's plot is pretty
linear, so th=1 is a good guesstimate. There's a slight curvature but to
reduce it, you would increase th, not decrease it. Running the regression, as
Bert suggests, indicates that b0=5.16 and b1= -0.00024 could work as reasonable
starting values. Notice that the grid search had "b1 = seq(0.01, 4, by =
0.01)" which is wrong in both sign and scale.
> 
> Andrew's suggestion of dividing Retention by 100 is tempting, since it
looks like a percentage, but that would make all Y values less than 1 and the
double exponential function as written has values that are always bigger than 1.
(It is conceivable that the model itself is wrong, though. E.g. it could be that
Retention on a scale from 0 to 1 could be modeled as exp(-something), but we
really have no idea of the context here.)
> 
> (If this was in fact homework, you should now go and write a proper
SelfStart initializer routine for this model. Even if it isn't homework, you
do need to study the text again, because you have clearly not understood how
self-starting models work.)
> 
> -pd
> 
>>
>>
>>
>> On Sun, Oct 9, 2016 at 3:05 PM, Andrew Robinson
>> <A.Robinson at ms.unimelb.edu.au> wrote:
>>> Here are some things to try.  Maybe divide Area by 1000 and
retention
>>> by 100.  Try plotting the data and superimposing the line that
>>> corresponds to the 'fit' from nls2.  See if you can correct
it with
>>> some careful guesses.
>>>
>>> Getting suitable starting parameters for non-linear modeling is one
of
>>> the black arts of statistical fitting. ...
>>>
>>> Andrew
>>
>> True. But it's usually worthwhile thinking about the math a bit
before guessing.
>>
>> Note that the model can be linearized to:
>>
>> log(log(Retention)) = b0 + b1*Area^th
>>
>> So a plot of log(log(Retention)) vs Area may be informative and useful
>> for finding starting values. e.g., for a grid of th's, do linear
>> regression fits .
>>
>> However, when I look at that plot, it seems pretty linear with a
>> negative slope. This suggests that you may have an overparametrization
>> problem . i.e. fix th =1 and use the b0 and b1 from the above
>> regression for starting values.
>>
>> Do note that this strategy isn't foolproof, as it ignores that the
>> error term is additive in the above transformed metric, rather than
>> the original. This can sometimes mislead. But this is just a
>> heuristic.
>>
>> Cheers,
>> Bert
>>
>>
>>
>>
>>
>>
>>
>>>
>>> On 9 October 2016 at 22:21, Pinglei Gao <gaopinglei at
163.com> wrote:
>>>> Hi,
>>>>
>>>> I have some data that i'm trying to fit a double
exponential model: data.
>>>> Frame (Area=c (521.5, 689.78, 1284.71, 2018.8, 2560.46, 524.91,
989.05,
>>>> 1646.32, 2239.65, 2972.96, 478.54, 875.52, 1432.5, 2144.74,
2629.2),
>>>>
>>>> Retention=c (95.3, 87.18, 44.94, 26.36, 18.12, 84.68, 37.24,
33.04, 23.46,
>>>> 9.72, 97.92, 71.44, 44.52, 24.44, 15.26) ) and the formula of
the double
>>>> exponential is: exp (b0*exp (b1*x^th)).
>>>>
>>>>
>>>>
>>>> I failed to guess the initial parameter values and then I
learned a measure
>>>> to find starting values from Nonlinear Regression with R (pp.
25-27):
>>>>
>>>>
>>>>
>>>>> cl<-data.frame(Area =c(521.5, 689.78, 1284.71, 2018.8,
2560.46, 524.91,
>>>> 989.05, 1646.32, 2239.65, 2972.96, 478.54, 875.52, 1432.5,
2144.74, 2629.2),
>>>>
>>>> + Retention =c(95.3, 87.18, 44.94, 26.36, 18.12, 84.68, 37.24,
33.04, 23.46,
>>>> 9.72, 97.92, 71.44, 44.52, 24.44, 15.26) )
>>>>
>>>>> expFct <- function(Area, b0, b1,th)
{exp(b0*exp(b1*Area^th))}
>>>>
>>>>> grid.Disperse <- expand.grid(list(b0 = seq(0.01,4, by =
0.01), th >>>> c(0.02),b1 = seq(0.01, 4, by = 0.01)))
>>>>
>>>>> Disperse.m2a <- nls2(Retention ~expFct(Area, b0, b1,th),
data = cl, start
>>>> = grid.Disperse, algorithm = "brute-force")
>>>>
>>>>> Disperse.m2a
>>>>
>>>> Nonlinear regression model
>>>>
>>>>  model: Retention ~ expFct(Area, b0, th, b1)
>>>>
>>>>   data: cl
>>>>
>>>> b0   th   b1
>>>>
>>>> 3.82 0.02 0.01
>>>>
>>>> residual sum-of-squares: 13596
>>>>
>>>> Number of iterations to convergence: 160000
>>>>
>>>> Achieved convergence tolerance: NA
>>>>
>>>>
>>>>
>>>> I got no error then I use the output as starting values to nls2
():
>>>>
>>>>> nls.m2<- nls2(Retention ~ expFct(Area, b0, b1, th), data
= cl, start >>>> list(b0 = 3.82, b1 = 0.02, th = 0.01))
>>>>
>>>> Error in (function (formula, data = parent.frame(), start,
control >>>> nls.control(),  :
>>>>
>>>> Singular gradient
>>>>
>>>>
>>>>
>>>> Why? Did I do something wrong or misunderstand something?
>>>>
>>>>
>>>>
>>>> Later, I found another measure from Modern Applied Statistics
with S (pp.
>>>> 216-217):
>>>>
>>>>
>>>>
>>>>> negexp <- selfStart(model = ~
exp(b0*exp(b1*x^th)),initial >>>> negexp.SSival, parameters =
c("b0", "b1", "th"),
>>>>
>>>> + template = function(x, b0, b1, th) {})
>>>>
>>>>> Disperse.ss <- nls(Retention ~ negexp(Area, b0, b1,
th),data = cl, trace >>>> T)
>>>>
>>>>         b0          b1          th
>>>>
>>>>   4.208763  144.205455 1035.324595
>>>>
>>>> Error in qr.default(.swts * attr(rhs, "gradient")) :
>>>>
>>>> NA/NaN/Inf (arg1) can not be called when the external function
is called.
>>>>
>>>>
>>>>
>>>> Error happened again. How can I fix it? I am desperate.
>>>>
>>>>
>>>>
>>>> Best regards,
>>>>
>>>>
>>>>
>>>> Pinglei Gao
>>>>
>>>>
>>>>
>>>>
>>>>        [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible
code.
>>>
>>>
>>>
>>> --
>>> Andrew Robinson
>>> Deputy Director, CEBRA, School of Biosciences
>>> Reader & Associate Professor in Applied Statistics  Tel: (+61)
0403 138 955
>>> School of Mathematics and Statistics                        Fax:
+61-3-8344 4599
>>> University of Melbourne, VIC 3010 Australia
>>> Email: a.robinson at ms.unimelb.edu.au
>>> Website: http://www.ms.unimelb.edu.au/~andrewpr
>>>
>>> MSME: http://www.crcpress.com/product/isbn/9781439858028
>>> FAwR: http://www.ms.unimelb.edu.au/~andrewpr/FAwR/
>>> SPuR: http://www.ms.unimelb.edu.au/spuRs/
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>

Pinglei Gao

2016-Oct-10 14:27 UTC

head link

[R] 答复: Finding starting values for the parameters using nls() or nls2()

Thanks very much for taking time on this. Your assistances are very much
appreciated. But, I am afraid that I still have a question to bother you. 

I am working on a paper about weed seeds dispersal with harvest machine. I
found three general models for seed dispersal and retention after a review
of relevant literature. All models were optimized using nonlinear least
squares via the nls function in the statistical package R. The model that
best described the data will be determined by comparing Akaike Information
Criterion (AIC) values and the model with the lowest AIC score will be
selected. 

The first general model incorporated simple exponential and power
exponential functions, its starting value was easily to be found. But, I am
stuck with model 2 (which was mentioned previously) and model 3 with the
form:  Retention = (b0*Area^th+1)^b1. The model 3 is totally different to
others. I tried the measures that you were mentioned. But I still can?t
find suitable starting values because of my limited knowledge. I hope you
can do me the favor again. I can send the draft to you when I finished the
paper, if it is necessary. Maybe you can give me some constructive
suggestion about statistic and model construction and I can name you as a
coauthor for your contributions.

Best,

Pinglei Gao

-----????-----
???: peter dalgaard [mailto:pdalgd at gmail.com] 
????: 2016?10?10? 7:41
???: Pinglei Gao
??: Andrew Robinson; R help (r-help at r-project.org); Bert Gunter
??: Re: [R] Finding starting values for the parameters using nls() or
nls2()

> On 10 Oct 2016, at 00:40 , Bert Gunter <bgunter.4567 at gmail.com>
wrote:
> 
> Well... (inline -- and I hope this isn't homework!)
> 
Pretty much same as I thought. 

Fixing th=0.02 in the grid search looks wrong. Bert's plot is pretty linear,
so th=1 is a good guesstimate. There's a slight curvature but to reduce it,
you would increase th, not decrease it. Running the regression, as Bert
suggests, indicates that b0=5.16 and b1= -0.00024 could work as reasonable
starting values. Notice that the grid search had "b1 = seq(0.01, 4, by
0.01)" which is wrong in both sign and scale.

Andrew's suggestion of dividing Retention by 100 is tempting, since it looks
like a percentage, but that would make all Y values less than 1 and the
double exponential function as written has values that are always bigger
than 1. (It is conceivable that the model itself is wrong, though. E.g. it
could be that Retention on a scale from 0 to 1 could be modeled as
exp(-something), but we really have no idea of the context here.)

(If this was in fact homework, you should now go and write a proper
SelfStart initializer routine for this model. Even if it isn't homework, you
do need to study the text again, because you have clearly not understood how
self-starting models work.)

-pd
> 
> 
> 
> On Sun, Oct 9, 2016 at 3:05 PM, Andrew Robinson 
> <A.Robinson at ms.unimelb.edu.au> wrote:
>> Here are some things to try.  Maybe divide Area by 1000 and retention 
>> by 100.  Try plotting the data and superimposing the line that 
>> corresponds to the 'fit' from nls2.  See if you can correct it
with
>> some careful guesses.
>> 
>> Getting suitable starting parameters for non-linear modeling is one 
>> of the black arts of statistical fitting. ...
>> 
>> Andrew
> 
> True. But it's usually worthwhile thinking about the math a bit before
guessing.> 
> Note that the model can be linearized to:
> 
> log(log(Retention)) = b0 + b1*Area^th
> 
> So a plot of log(log(Retention)) vs Area may be informative and useful 
> for finding starting values. e.g., for a grid of th's, do linear 
> regression fits .
> 
> However, when I look at that plot, it seems pretty linear with a 
> negative slope. This suggests that you may have an overparametrization 
> problem . i.e. fix th =1 and use the b0 and b1 from the above 
> regression for starting values.
> 
> Do note that this strategy isn't foolproof, as it ignores that the 
> error term is additive in the above transformed metric, rather than 
> the original. This can sometimes mislead. But this is just a 
> heuristic.
> 
> Cheers,
> Bert
> 
> 
> 
> 
> 
> 
> 
>> 
>> On 9 October 2016 at 22:21, Pinglei Gao <gaopinglei at 163.com>
wrote:
>>> Hi,
>>> 
>>> I have some data that i'm trying to fit a double exponential
model:
data.>>> Frame (Area=c (521.5, 689.78, 1284.71, 2018.8, 2560.46, 524.91, 
>>> 989.05, 1646.32, 2239.65, 2972.96, 478.54, 875.52, 1432.5, 2144.74,
>>> 2629.2),
>>> 
>>> Retention=c (95.3, 87.18, 44.94, 26.36, 18.12, 84.68, 37.24, 33.04,
>>> 23.46, 9.72, 97.92, 71.44, 44.52, 24.44, 15.26) ) and the formula
of
>>> the double exponential is: exp (b0*exp (b1*x^th)).
>>> 
>>> 
>>> 
>>> I failed to guess the initial parameter values and then I learned a
>>> measure to find starting values from Nonlinear Regression with R
(pp.
25-27):>>> 
>>> 
>>> 
>>>> cl<-data.frame(Area =c(521.5, 689.78, 1284.71, 2018.8,
2560.46,
>>>> 524.91,
>>> 989.05, 1646.32, 2239.65, 2972.96, 478.54, 875.52, 1432.5, 2144.74,
>>> 2629.2),
>>> 
>>> + Retention =c(95.3, 87.18, 44.94, 26.36, 18.12, 84.68, 37.24, 
>>> + 33.04, 23.46,
>>> 9.72, 97.92, 71.44, 44.52, 24.44, 15.26) )
>>> 
>>>> expFct <- function(Area, b0, b1,th)
{exp(b0*exp(b1*Area^th))}
>>> 
>>>> grid.Disperse <- expand.grid(list(b0 = seq(0.01,4, by =
0.01), th >>> c(0.02),b1 = seq(0.01, 4, by = 0.01)))
>>> 
>>>> Disperse.m2a <- nls2(Retention ~expFct(Area, b0, b1,th),
data = cl,
>>>> start
>>> = grid.Disperse, algorithm = "brute-force")
>>> 
>>>> Disperse.m2a
>>> 
>>> Nonlinear regression model
>>> 
>>>  model: Retention ~ expFct(Area, b0, th, b1)
>>> 
>>>   data: cl
>>> 
>>> b0   th   b1
>>> 
>>> 3.82 0.02 0.01
>>> 
>>> residual sum-of-squares: 13596
>>> 
>>> Number of iterations to convergence: 160000
>>> 
>>> Achieved convergence tolerance: NA
>>> 
>>> 
>>> 
>>> I got no error then I use the output as starting values to nls2 ():
>>> 
>>>> nls.m2<- nls2(Retention ~ expFct(Area, b0, b1, th), data =
cl,
>>>> start >>> list(b0 = 3.82, b1 = 0.02, th = 0.01))
>>> 
>>> Error in (function (formula, data = parent.frame(), start, control
=
>>> nls.control(),  :
>>> 
>>> Singular gradient
>>> 
>>> 
>>> 
>>> Why? Did I do something wrong or misunderstand something?
>>> 
>>> 
>>> 
>>> Later, I found another measure from Modern Applied Statistics with
S
(pp.>>> 216-217):
>>> 
>>> 
>>> 
>>>> negexp <- selfStart(model = ~ exp(b0*exp(b1*x^th)),initial
>>> negexp.SSival, parameters = c("b0", "b1",
"th"),
>>> 
>>> + template = function(x, b0, b1, th) {})
>>> 
>>>> Disperse.ss <- nls(Retention ~ negexp(Area, b0, b1, th),data
= cl,
>>>> trace >>> T)
>>> 
>>>         b0          b1          th
>>> 
>>>   4.208763  144.205455 1035.324595
>>> 
>>> Error in qr.default(.swts * attr(rhs, "gradient")) :
>>> 
>>> NA/NaN/Inf (arg1) can not be called when the external function is
called.>>> 
>>> 
>>> 
>>> Error happened again. How can I fix it? I am desperate.
>>> 
>>> 
>>> 
>>> Best regards,
>>> 
>>> 
>>> 
>>> Pinglei Gao
>>> 
>>> 
>>> 
>>> 
>>>        [[alternative HTML version deleted]]
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide 
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> 
>> 
>> 
>> --
>> Andrew Robinson
>> Deputy Director, CEBRA, School of Biosciences Reader & Associate 
>> Professor in Applied Statistics  Tel: (+61) 0403 138 955
>> School of Mathematics and Statistics                        Fax:
+61-3-8344 4599>> University of Melbourne, VIC 3010 Australia
>> Email: a.robinson at ms.unimelb.edu.au
>> Website: http://www.ms.unimelb.edu.au/~andrewpr
>> 
>> MSME: http://www.crcpress.com/product/isbn/9781439858028
>> FAwR: http://www.ms.unimelb.edu.au/~andrewpr/FAwR/
>> SPuR: http://www.ms.unimelb.edu.au/spuRs/
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see 
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see 
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000
Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

R help - Oct 2016 - Finding starting values for the parameters using nls() or nls2()

[R] Finding starting values for the parameters using nls() or nls2()

[R] Finding starting values for the parameters using nls() or nls2()

[R] Finding starting values for the parameters using nls() or nls2()

[R] 答复: Finding starting values for the parameters using nls() or nls2()