On 01/01/16 14:35, Andras Farkas wrote:
>
> Dear All,
>
> wonder if you have a thought on the following: if I have a simple
> model like model <- lm(log(y) ~ log(x) + log(z), data = data), where
> both the dependent and independent variables are log transformed, is
> it OK just to use ypred <- predict(model, type = "response") to get
> the predictions, then transform ypred with exp(ypred) back to y's
> original scale, so as to compare the observed (known) data (y) with
> the model predictions (ypred) on the original scale?
>
> appreciate your thoughts...
Is it OK? In what sense? It's a free country --- at least the country
that I live in is still free (more or less) --- so you can do whatever
you feel like.
However, it helps to do a little elementary mathematics rather than
just hammering and hoping. When in doubt, do the maths.
You are fitting the model
log(y) = beta_0 + beta_1 * log(x) + beta_2 * log(z) + E
where it is tacitly assumed that E ~ N(0,sigma^2) and that the E's
corresponding to different observations are independent. Using predict()
will get you
log(y).hat = b_0 + b_1 * log(x) + b_2 * log(z)
where the b_i are the *estimates* of the beta_i. Under the given
assumptions the b_i are unbiased, so the expected value of log(y).hat
is equal to the expected value of log(y), and the universe is in
harmony. OMMMMMMMMM!
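In code, that amounts to something like the following (a sketch only;
I'm just re-using the names x, y, z and the data frame "data" from
your post):

    model <- lm(log(y) ~ log(x) + log(z), data = data)
    lyhat <- predict(model)  # predictions of log(y), *not* of y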
However, when you exponentiate everything, things change; expected
values are not preserved, since exponentiation is *not* a linear
function!
If you set y.hat = exp(log(y).hat) you get
y.hat = exp(b_0) * x^{b_1} * z^{b_2}
It is not obvious what the expected value of this expression is, but
it is pretty surely *not* the same as the expected value of y.
Note that the expected value of y is *not* equal to:
exp(beta_0) * x^{beta_1} * z^{beta_2} (1)
Rather, it is equal to this expression multiplied by exp(sigma^2/2)
(under the stated assumptions). This is the standard lognormal fact:
if log(y) ~ N(mu, sigma^2) then E(y) = exp(mu + sigma^2/2).
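You can convince yourself of this lognormal fact numerically, e.g.
with mu = 0 and sigma = 1 (values chosen arbitrarily):

    set.seed(42)  # for reproducibility
    sig <- 1
    mean(exp(rnorm(1e6, mean = 0, sd = sig)))  # approximately 1.6487
    exp(sig^2/2)                               # exp(1/2) = 1.64872...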
I *think* (I have not checked this) that the expected value of y.hat
will be equal "asymptotically" to (1) --- that is, to the expected
value of y divided by exp(sigma^2/2).
Other younger and wiser heads who read this list might like to chime in
here and possibly correct me.
So, bottom line: my guess is that if your sample size is "large" then
you won't go too far wrong by predicting y by
exp(log(y).hat) * exp(s^2/2)
i.e. by exp(predict(model)) * exp(s^2/2)
where s^2 is the estimated error variance from the model that you
fitted on the log scale using lm().
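In code, again a sketch using the placeholder names from above
(summary(model)$sigma extracts s):

    s2    <- summary(model)$sigma^2      # estimated error variance, log scale
    y.hat <- exp(predict(model) + s2/2)  # bias-adjusted predictions, original scale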
I would advise doing some fairly thorough simulations to get a feel for
the size of the bias. (There will always be *some* bias.)
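Something along the following lines would do; note that all the
parameter values here are simply invented for illustration:

    set.seed(123)
    n    <- 100
    x    <- runif(n, 1, 10)
    z    <- runif(n, 1, 10)
    sig  <- 0.5
    Ey   <- exp(1 + 0.5*log(x) - 0.3*log(z) + sig^2/2)  # true E(y)
    nsim <- 1000
    bias.naive <- bias.adj <- numeric(nsim)
    for (i in 1:nsim) {
        y   <- exp(1 + 0.5*log(x) - 0.3*log(z) + rnorm(n, sd = sig))
        fit <- lm(log(y) ~ log(x) + log(z))
        s2  <- summary(fit)$sigma^2
        bias.naive[i] <- mean(exp(predict(fit)) - Ey)         # naive back-transform
        bias.adj[i]   <- mean(exp(predict(fit) + s2/2) - Ey)  # adjusted
    }
    mean(bias.naive)  # noticeably negative
    mean(bias.adj)    # much closer to zero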
You might also think about fitting the non-linear model
y = gamma_0 * x^{gamma_1} * z^{gamma_2} + E
if you think that additive errors make more sense than multiplicative
errors for your setting.
You can do this "fairly easily" using the function nls(), if you
choose to go this way.
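A sketch (untested; the lm() coefficients provide plausible starting
values, though you may need to fiddle with these):

    b <- coef(model)  # (Intercept), log(x), log(z)
    fit.nls <- nls(y ~ g0 * x^g1 * z^g2, data = data,
                   start = list(g0 = exp(b[[1]]), g1 = b[[2]], g2 = b[[3]]))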
cheers,
Rolf Turner
--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276