Dear Ivan,
Thanks for the reply.
But doesn't removing some of the parameters reduce the precision of the
relationship between the response variable and the predictors (i.e.,
inefficient estimates of the coefficients)?
Many thanks for your time and effort.
Yours sincerely,
AKSHAY M KULKARNI
________________________________
From: Ivan Krylov <krylov.r00t at gmail.com>
Sent: Wednesday, March 20, 2019 3:06 PM
To: akshay kulkarni
Cc: R help Mailing list
Subject: Re: [R] problem with nlsLM.....
On Wed, 20 Mar 2019 08:02:45 +0000
akshay kulkarni <akshay_e4 at hotmail.com> wrote:
> formulaDH5 <- as.formula(HM1 ~ (a + (b * ((HM2 + 0.3)^(1/2)))) +
> (A*sin(w*HM3 + c) + C))
The problem with this formula is simple: the partial derivative with
respect to `a` is the same as the partial derivative with respect to
`C`. This makes the regression problem have an infinite number of
solutions, all of them satisfying the equation \lambda_1 * a +
\lambda_2 * C + \lambda_3 = 0 for some values of \lambda_i.
Gradient-based optimizers (which both nls and nlsLM are) don't like
problems with non-unique solutions, especially when the model function
has the same partial derivative with respect to different parameters,
making them indistinguishable.
Solution: remove one of the variables.
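For illustration, here is a minimal sketch (made-up data and arbitrary
parameter values, just to show the symptom):
set.seed(1)
HM2 <- runif(50); HM3 <- runif(50)
HM1 <- 1 + 2 * (HM2 + 0.3)^(1/2) + 0.5 * sin(3 * HM3 + 1) + 4 +
  rnorm(50, 0, 0.01)
# a and C enter the model only through their sum a + C, so the
# gradient matrix is rank-deficient and nls() normally stops with a
# "singular gradient" error:
try(nls(HM1 ~ a + b * (HM2 + 0.3)^(1/2) + A * sin(w * HM3 + c) + C,
        start = list(a = 1, b = 1, A = 1, w = 1, c = 1, C = 1)))
With C (or a) dropped, the gradient is no longer rank-deficient,
although you may still need a good starting value for w.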
> > formulaDH3
> HM1 ~ (a + (b * ((HM2 + 0.3)^(1/3)))) * (c * log(HM3 + 27))
The problem with this formula is similar, albeit slightly different.
Suppose that (a, b, c) is a solution. Then (\lambda * a, \lambda * b,
c / \lambda) is also a solution for any real \lambda. Once again,
removing `c` should get rid of the ambiguity.
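Purely as an illustration of that scale ambiguity (arbitrary numbers),
the rescaled parameter set predicts exactly the same values:
# (a, b, c) and (lambda*a, lambda*b, c/lambda) predict the same values
HM2 <- runif(10); HM3 <- runif(10)
f <- function(a, b, c) (a + b * (HM2 + 0.3)^(1/3)) * (c * log(HM3 + 27))
lambda <- 2.5
all.equal(f(1, 2, 3), f(lambda * 1, lambda * 2, 3 / lambda))  # TRUE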
> I've checked the Internet for a method of getting the starting
> values, but they are not comprehensive....any resources for how to
> find the starting values?
Starting values depend on the particular function you are trying to
fit. The usual approach seems to be transforming the formula, getting
rid of parts you can safely assume to be small, until it looks like
linear regression, or applying domain-specific knowledge (e.g. when
trying to fit a peak function, look for the biggest local maximum in
the dataset).
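As a rough sketch of the peak-function case (a Gaussian peak chosen
just as an example, not your model):
# seed a Gaussian peak fit from the biggest maximum in the data
set.seed(2)
x <- seq(0, 10, length.out = 200)
y <- 3 * exp(-(x - 6)^2 / 2) + rnorm(200, 0, 0.05)
i <- which.max(y)
start <- list(h = y[i],                 # peak height
              m = x[i],                 # peak position
              s = diff(range(x)) / 10)  # crude width guess
fit <- nls(y ~ h * exp(-(x - m)^2 / (2 * s^2)), start = start)
coef(fit)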
If you cannot do that, there are also global optimization algorithms
(see `nloptr`), though they perform better on some problems and worse
on others. It certainly helps to have upper and lower
bounds on all parameter values. I've heard about a scientific group
creating a pool of many initial Levenberg-Marquardt parameter estimates,
then improving them using a genetic algorithm. The whole thing
"converged overnight" on a powerful desktop computer.
--
Best regards,
Ivan
Dear Ivan,
Thank you very much; you have been very helpful.
Many thanks for your time and effort.
Yours sincerely,
AKSHAY M KULKARNI
________________________________
From: Ivan Krylov <krylov.r00t at gmail.com>
Sent: Wednesday, March 20, 2019 4:08 PM
To: akshay kulkarni
Cc: R help Mailing list
Subject: Re: [R] problem with nlsLM.....
On Wed, 20 Mar 2019 09:43:11 +0000
akshay kulkarni <akshay_e4 at hotmail.com> wrote:
> But doesn't removing some of the parameters reduce the precision of
> the relationship between the response variable and the
> predictors (i.e., inefficient estimates of the coefficients)?
No, it doesn't, since there are already more variables in the formula
than there are relationships between the response and the predictors.
Let me offer you an example. Suppose you have a function y(x) = a*b*x +
c. Let's try to simulate some data and then fit it:
# choose according to your taste
a <- ...
b <- ...
c <- ...
# simulate model data
abc <- data.frame(x = runif(100))
abc$y <- a*b*abc$x + c
# add some normally distributed noise
abc$y <- abc$y + rnorm(100, 0, 0.01)
Now try to fit the formula y ~ a*b*x + c using the data in data frame
abc. Do you get any results? Do they match the values you originally
set?[*]
Then try a formula with the ambiguity removed: y ~ d*x + c. Do you get
a result? Does the obtained d match the a*b you originally set?
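To spell that out (values chosen arbitrarily, as per the "according to
your taste" comment above; this is only a sketch):
# the ambiguous and the reduced model, with arbitrary "true" values
set.seed(4)
a <- 2; b <- 3; c <- 1
abc <- data.frame(x = runif(100))
abc$y <- a * b * abc$x + c + rnorm(100, 0, 0.01)
# ambiguous: the gradient columns for a and b are always proportional,
# so nls() normally fails with a "singular gradient" error
try(nls(y ~ a * b * x + c, data = abc, start = list(a = 1, b = 1, c = 0)))
# ambiguity removed: d should come out close to a*b = 6
nls(y ~ d * x + c, data = abc, start = list(d = 1, c = 0))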
Note that for the d you obtained there is an infinite number of (a, b)
pairs that equally satisfy the equation d = a*b and the original
regression problem, unless you constrain a or b.
--
Best regards,
Ivan
[*] Using R, I couldn't, but the nonlinear solver in gnuplot is
sometimes able to give *a* result for such a degenerate problem when
the data is sufficiently noisy. Of course, such a result usually
doesn't match the originally set variable values and should not be
trusted.