?optim says, in describing the 'control' argument:

  'fnscale' An overall scaling to be applied to the value of 'fn' and 'gr' during optimization. If negative, turns the problem into a maximization problem. Optimization is performed on 'fn(par)/fnscale'.

  'parscale' A vector of scaling values for the parameters. Optimization is performed on 'par/parscale' and these should be comparable in the sense that a unit change in any element produces about a unit change in the scaled value.

1. Does the final phrase, 'produces about a unit change in the scaled value', refer to the value of the objective function? Substantively I think it must, though grammatically it is less clear.

2. "Optimization is performed on 'par/parscale'" means one of the following:

   a) If par is 3 and parscale is 10, then the objective function will be evaluated at 0.3. This strikes me as the literal reading of the clause; it also strikes me as extremely unlikely to be what really happens.

   b) If par is 3 and parscale is 10, then the objective function is evaluated at 3. The optimizer records this as if par were 30 and subsequently, e.g. when computing deltas or making steps, works in that space. So a step of d becomes a step of d/parscale for the real objective function.

   c) About the same as b), only steps of d become d*parscale.

3. Does scaling affect any of the final results (including log-likelihood, std errors, ...), assuming the scaled and unscaled methods find the same untransformed point?

I assume that scaling is transparent in the sense of 3, i.e. that it does not affect any of the reported results (unless it changes how well the optimizer works, or fnscale converts minimizing to maximizing). Even given that, suppose I think that f(x)-f(x1) approximately equals f(x)-f(x2), where x1[1] = x[1] + 10 and x2[2] = x[2] + 1, and x, x1, and x2 are otherwise equal. Does this mean I should have parscale = c(10, 1) or parscale = c(1/10, 1)?
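One way to check question 2 empirically, rather than by parsing the help text, is to log the points at which the objective function is actually called. The following is only a minimal sketch: the one-parameter objective, the starting value of 3, and the choice of method = "BFGS" are assumptions made for illustration.

```r
# Hypothetical one-parameter objective that logs every point it is called at.
evals <- numeric(0)
fn <- function(p) {
  evals <<- c(evals, p)  # c() copies the value, so later calls cannot alter it
  (p - 5)^2
}

# par = 3 and parscale = 10, as in the readings a), b), c) of question 2.
res <- optim(par = 3, fn = fn, method = "BFGS",
             control = list(parscale = 10))

print(head(evals, 3))  # initial point, then the first finite-difference points
print(res$par)         # reported estimate
print(res$value)       # reported objective value
```

If the first logged point is 3 rather than 0.3, reading a) is ruled out. The distance between the first point and the next two then shows how the default finite-difference step of 1e-3 is mapped back to the original parameter, which helps to separate b) from c).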
Since I'm not sure about parscale, I'm really not sure about ndeps:

  'ndeps' A vector of step sizes for the finite-difference approximation to the gradient, on 'par/parscale' scale. Defaults to '1e-3'.

So, if I don't do any other rescaling, I might say ndeps = c(1e-2, 1e-3) in the previous example (the response to x[1] is 10 times flatter than the response to x[2]). I guess that if I do have parscale set, I can leave ndeps at its default (1e-3 for both) and get the same effect. Right?
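The ndeps guess can be tested in the same spirit, by comparing the points used for the first numerical gradient under the two settings. This is a sketch under stated assumptions: the objective f, the helper run_logged, the starting point c(0, 0), and method = "BFGS" are all made up for the comparison, and parscale = c(10, 1) is just one of the two candidates from the question above.

```r
# Hypothetical objective: roughly 10 times flatter in p[1] than in p[2].
f <- function(p) ((p[1] - 20) / 10)^2 + (p[2] - 2)^2

# Run optim() and record every point at which f is actually evaluated.
run_logged <- function(control) {
  evals <- NULL
  logged <- function(p) {
    evals <<- rbind(evals, as.numeric(p))  # rbind() copies the values
    f(p)
  }
  res <- optim(c(0, 0), logged, method = "BFGS", control = control)
  list(res = res, evals = evals)
}

a <- run_logged(list(parscale = c(10, 1)))    # parscale set, default ndeps
b <- run_logged(list(ndeps = c(1e-2, 1e-3)))  # default parscale, ndeps set

# Rows 1 to 5: the initial point plus the four finite-difference points
# used for the first gradient.
print(a$evals[1:5, ])
print(b$evals[1:5, ])
print(all.equal(a$evals[1:5, ], b$evals[1:5, ]))

# Reported results under the two settings, for question 3.
print(rbind(a = a$res$par, b = b$res$par))
print(c(a = a$res$value, b = b$res$value))
```

If the first five rows agree, the two settings take the same finite-difference steps in the original parameters. Later rows may still differ, since parscale can also change the steps the optimizer itself takes, which ndeps does not.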