Dear Ivan,

Thanks for the reply. But doesn't removing some of the parameters reduce the precision of the relationship between the response variable and the predictors (inefficient estimates of the coefficients)?

Very many thanks for your time and effort.

Yours sincerely,
AKSHAY M KULKARNI

________________________________
From: Ivan Krylov <krylov.r00t at gmail.com>
Sent: Wednesday, March 20, 2019 3:06 PM
To: akshay kulkarni
Cc: R help Mailing list
Subject: Re: [R] problem with nlsLM.....

On Wed, 20 Mar 2019 08:02:45 +0000
akshay kulkarni <akshay_e4 at hotmail.com> wrote:

> formulaDH5 <- as.formula(HM1 ~ (a + (b * ((HM2 + 0.3)^(1/2)))) +
>   (A*sin(w*HM3 + c) + C))

The problem with this formula is simple: the partial derivative with respect to `a` is the same as the partial derivative with respect to `C`. This makes the regression problem have an infinite number of solutions, all of them satisfying the equation \lambda_1 * a + \lambda_2 * C + \lambda_3 = 0 for some values of \lambda_i. Gradient-based optimizers (which both nls and nlsLM are) don't like problems with non-unique solutions, especially when the model function has the same partial derivative with respect to different variables, making them indistinguishable. Solution: remove one of the variables.

> formulaDH3
> HM1 ~ (a + (b * ((HM2 + 0.3)^(1/3)))) * (c * log(HM3 + 27))

The problem with this formula is similar, albeit slightly different. Suppose that (a, b, c) is a solution. Then (\lambda * a, \lambda * b, c / \lambda) is also a solution for any real \lambda. Once again, removing `c` should get rid of the ambiguity.

> I've checked the Internet for a method of getting the starting
> values, but they are not comprehensive....any resources for how to
> find the starting values?

Starting values depend on the particular function you are trying to fit. The usual approach seems to be transforming the formula and getting rid of parts you can safely assume to be small until it looks like linear regression, or applying domain-specific knowledge (e.g. when trying to fit a peak function, look for the biggest local maximum in the dataset). If you cannot do that, there are also global optimization algorithms (see `nloptr`), though they still perform better on some problems and worse on others. It certainly helps to have upper and lower bounds on all parameter values.

I've heard about a scientific group creating a pool of many initial Levenberg-Marquardt parameter estimates, then improving them using a genetic algorithm. The whole thing "converged overnight" on a powerful desktop computer.

--
Best regards,
Ivan
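As an illustration of the fix suggested above, here is a minimal sketch of the two formulas refit with the redundant parameter removed, using nlsLM from minpack.lm. The data frame name `mydata` and all starting values are placeholders, not taken from the thread; substitute your own.

library(minpack.lm)

# formulaDH5 with `C` dropped: `a` now absorbs the constant offset
fitDH5 <- nlsLM(HM1 ~ a + b * (HM2 + 0.3)^(1/2) + A * sin(w * HM3 + c),
                data = mydata,   # placeholder data frame with HM1, HM2, HM3
                start = list(a = 1, b = 1, A = 1, w = 1, c = 0))

# formulaDH3 with `c` dropped: `a` and `b` now absorb the overall scale
fitDH3 <- nlsLM(HM1 ~ (a + b * (HM2 + 0.3)^(1/3)) * log(HM3 + 27),
                data = mydata,
                start = list(a = 1, b = 1))

summary(fitDH5)
summary(fitDH3)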
On Wed, 20 Mar 2019 09:43:11 +0000
akshay kulkarni <akshay_e4 at hotmail.com> wrote:

> But doesn't removing some of the parameters reduce the precision of
> the relationship between the response variable and the
> predictors (inefficient estimates of the coefficients)?

No, it doesn't, since there are already more variables in the formula than there are relationships between the response and the predictors. Let me offer you an example.

Suppose you have a function y(x) = a*b*x + c. Let's simulate some data and then fit it:

# choose according to your taste
a <- ...
b <- ...
c <- ...

# simulate model data
abc <- data.frame(x = runif(100))
abc$y <- a*b*abc$x + c

# add some normally distributed noise
abc$y <- abc$y + rnorm(100, 0, 0.01)

Now try to fit the formula y ~ a*b*x + c using the data in the data frame abc. Do you get any results? Do they match the values you originally set? [*]

Then try a formula with the ambiguity removed: y ~ d*x + c. Do you get a result? Does the obtained d match the a*b you originally set?

Note that for the d you obtained, you can get an infinite number of (a, b) tuples equally satisfying both the equation d = a*b and the original regression problem, unless you constrain a or b.

--
Best regards,
Ivan

[*] Using R, I couldn't, but the nonlinear solver in gnuplot is sometimes able to give *a* result for such a degenerate problem when the data is sufficiently noisy. Of course, such a result usually doesn't match the originally set variable values and should not be trusted.
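A sketch of the exercise described above, with illustrative values filled in for a, b and c (the values and the seed are placeholders, not from the thread):

set.seed(1)
a <- 2; b <- 3; c <- 1

abc <- data.frame(x = runif(100))
abc$y <- a*b*abc$x + c + rnorm(100, 0, 0.01)

# Degenerate parameterization: nls() normally stops with a
# "singular gradient" error because a and b cannot be separated
try(nls(y ~ a*b*x + c, data = abc, start = list(a = 1, b = 1, c = 0)))

# With the ambiguity removed, the fit converges and d estimates a*b = 6
fit <- nls(y ~ d*x + c, data = abc, start = list(d = 1, c = 0))
coef(fit)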
Dear Ivan,

Thank you very much. You have been very helpful.

Very many thanks for your time and effort.

Yours sincerely,
AKSHAY M KULKARNI

________________________________
From: Ivan Krylov <krylov.r00t at gmail.com>
Sent: Wednesday, March 20, 2019 4:08 PM
To: akshay kulkarni
Cc: R help Mailing list
Subject: Re: [R] problem with nlsLM.....