Karl Ove Hufthammer
2012-Aug-18 09:30 UTC
[R] Parameter scaling problems with optim and Nelder-Mead method (bug?)
Dear all,
I'm having some problems getting optim with method="Nelder-Mead" to
work properly. There seems to be no way of controlling the step size,
and the step size seems to depend on the *magnitude* of the initial
values, which makes no sense. Example:
f = function(xy, mu1, mu2) {
  print(xy)  # trace every point at which optim evaluates the function
  dnorm(xy[1] - mu1) * dnorm(xy[2] - mu2)
}
f1 = function(xy) -f(xy, 0, 0)
optim(c(1, 1), f1)
The first four values evaluated are
1.0, 1.0
1.1, 1.0
1.0, 1.1
0.9, 1.1
which is reasonable (step size of 0.1) for this function. And if I
translate both the function and the initial values
f2 = function(xy) -f(xy, 5000, 5000)
optim(c(5001, 5001), f2)
the first four values are
5001.0, 5001.0
5501.1, 5001.0
5001.0, 5501.1
4500.9, 5501.1
With
f3 = function(xy) -f(xy, 0, 5000)
optim(c(1, 5001), f3)
they are
1.0, 5001.0
501.1, 5001.0
1.0, 5501.1
-499.1, 5501.1
and with
f4 = function(xy) -f(xy, -3000, 50000)
optim(c(-2999, 50001), f4)
-2999.0, 50001.0
2001.1, 50001.0
-2999.0, 55001.1
-7999.1, 55001.1
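In every case the step equals 10% of the largest absolute starting
value (my inference from the traces above, not something I have found
documented anywhere):

## Inferred initial step for each starting point:
## 0.1 * max(abs(initial values))
starts = list(c(1, 1), c(5001, 5001), c(1, 5001), c(-2999, 50001))
sapply(starts, function(p) 0.1 * max(abs(p)))
## [1]    0.1  500.1  500.1 5000.1   (the steps seen above)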
However, the function to optimise is the same in all cases, only
translated, not scaled, so the step size *should* be the same. From
reading the documentation, it looks like changing parscale should fix
this, and *relative* changes do have the intended effect. Example:
optim(c(1, 1), f1, control = list(parscale = c(1, 5)))
gives the function evaluations
1.0, 1.0
1.1, 1.0
1.0, 1.5
1.1, 0.5
But scaling both elements of parscale equally, e.g.,
optim(c(1, 1), f1, control = list(parscale = c(500, 500)))
gives the same first four values. There *are* eventually some
differences in the values tried, but these don't seem to correspond to
parscale as described in ?optim. For example, for parscale=c(1,1), the
parameter values tried are
1: 1, 1
2: 1.1, 1
3: 1, 1.1
4: 0.9, 1.1
5: 0.95, 1.075
6: 0.9, 1
7: 0.85, 0.95
8: 0.95, 0.85
9: 0.9375, 0.9125
10: 0.8, 0.8
11: 0.7, 0.7
12: 0.8, 0.6
13: 0.8125, 0.6875
14: 0.55, 0.45
while for parscale=c(500,500) they are
1: 1, 1
2: 1.1, 1
3: 1, 1.1
4: 0.9, 1.1
5: 0.95, 1.075
6: 0.9, 1
7: 0.85, 0.95
8: 0.95, 0.85
9: 0.975, 0.725
10: 0.825, 0.675
11: 0.7375, 0.5125
12: 0.8625, 0.2875
13: 0.859375, 0.453125
14: 0.625000000000001, 0.0750000000000004
and for parscale=1/c(50000,50000) they are
1: 1, 1
2: 1.1, 1
3: 1, 1.1
4: 0.9, 1.1
5: 0.95, 1.075
6: 0.9, 1
7: 0.85, 0.95
8: 0.95, 0.85
9: 0.9375, 0.9125
10: 0.8, 0.8
11: 0.7, 0.7
12: 0.8, 0.6
13: 0.8125, 0.6875
14: 0.55, 0.45
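The identical first evaluations are at least consistent with the
statement in ?optim that optimisation is performed on par/parscale,
with the step then derived from the scaled starting values. A rough
emulation of that reading (a sketch of my interpretation, not optim's
actual code path):

ps = c(1, 5)
g = function(z) f1(z * ps)  # z is the internally scaled parameter
optim(c(1, 1) / ps, g)      # should reproduce the parscale=c(1,5)
                            # trace above, if my reading is right

On this reading, multiplying both elements of parscale by the same
constant rescales the start and the step by the same factor, so the
first evaluated points come out unchanged (the small later differences
look like floating-point effects of the rescaling, though I have not
verified this).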
And there seems to be no way of actually changing the step size to
reasonable values (i.e., getting the same evaluation points when
optimising f1–f4).
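The only workaround I can see is to optimise over a displacement from
a fixed reference point, so the search itself always starts at the
same values (a sketch; it assumes a sensible reference point, here the
known optimum of f4, is available in advance):

ref = c(-3000, 50000)                           # reference point for f4
fit = optim(c(1, 1), function(d) f4(ref + d))   # same trace as for f1
ref + fit$par                                   # solution on the original scale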
Is there something I have missed in how one is supposed to use optim
with Nelder-Mead? Or is this actually a bug in the implementation?
$ sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-suse-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=nn_NO.UTF-8 LC_NUMERIC=C
[3] LC_TIME=nn_NO.UTF-8 LC_COLLATE=nn_NO.UTF-8
[5] LC_MONETARY=nn_NO.UTF-8 LC_MESSAGES=nn_NO.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=nn_NO.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
--
Karl Ove Hufthammer
Bert Gunter
2012-Aug-18 14:32 UTC
[R] Parameter scaling problems with optim and Nelder-Mead method (bug?)
Well, I'm no optimization guru, but a quick reading of Wikipedia says
that the step size depends on the initial value configuration and is
then "adjusted" by the algorithm, using alpha, beta and gamma scaling
parameters, through the optimization. So it seems that it is supposed
to work exactly as you describe. Why do you expect something else?
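For reference, here is a simplified textbook sketch of one Nelder-Mead
step with the classic coefficients (reflection alpha = 1, expansion
gamma = 2, contraction beta = 0.5), showing the general idea those
parameters control (not necessarily R's exact implementation):

nm_step = function(simplex, f, alpha = 1, gamma = 2, beta = 0.5) {
  vals = apply(simplex, 1, f)
  worst = which.max(vals)                     # vertex to replace
  cent = colMeans(simplex[-worst, , drop = FALSE])
  refl = cent + alpha * (cent - simplex[worst, ])  # reflection
  if (f(refl) < min(vals)) {                  # best so far: try expanding
    expd = cent + gamma * (refl - cent)
    simplex[worst, ] = if (f(expd) < f(refl)) expd else refl
  } else if (f(refl) < vals[worst]) {         # improvement: accept it
    simplex[worst, ] = refl
  } else {                                    # no improvement: contract
    simplex[worst, ] = cent + beta * (simplex[worst, ] - cent)
  }
  simplex
}

Note that every move is relative to the current simplex, so the scale
of the search is set by the initial simplex, never by anything
absolute.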
--
Bert Gunter
Genentech Nonclinical Biostatistics