Dear all,
I am trying to get an estimate of uncertainty surrounding a single predicted
value from a beta regression model (this is similar to a logistic glm - in that
it involves a link function and linear predictor - but it uses the beta
distribution rather than discrete binomial). For example:
library(betareg)
data("GasolineYield")
test.model<-betareg(yield~gravity+temp,data=GasolineYield)
...and I would like a measure of uncertainty for a single predicted value, such
as:
predicted.value<-predict(test.model,newdata=GasolineYield[20,],type="response")
Ideally, this interval estimate would be a confidence interval (more
specifically, a prediction confidence interval) or, failing that, a standard
error.
However, using predict.betareg {betareg} there is no equivalent to the
‘se.fit=T’ or ‘interval=”confidence”’ arguments that are present in
predict.lm(). I note, also, that there is no ‘interval=”confidence”’ in
predict.glm. Am I missing something or is this an unresolved issue for beta
regression models (and perhaps glms as well)?
It is possible, however, to get the “fitted variances of response” (quoted from
?predict.betareg). For example:
var1<-predict(test.model,newdata=GasolineYield[20,],type="variance")
This seems to be the variance as got from the mean-variance relationship for the
beta distribution (though I’ve been unsuccessful in being able to unpack the
source code tar.gz file from CRAN in Windows to see this):
precis.param<-
predict(test.model,newdata=GasolineYield[20,],type="precision")
#This is from the description of the beta distribution in ?VGAM::betaff:
var2<-predicted.value*(1-predicted.value)/(1+precis.param)
var1==var2
...but I’m not sure what can be done with this value. One can’t simply take the
sq rt to get a “standard error” (of sorts), can they? I don’t think, for
example, this is taking much account of the uncertainty in the regression
parameter estimates.
Even if I was able to get a SE interval, how might I go about converting this to
a confidence interval? Would it be correct to use a quantile from the standard
normal distribution (i.e. CI<-SE*qnorm(p=0.975)) or, rather, from the beta
distribution? I think I’ve seen the former being suggested in the context of
non-normal glms (e.g. binomial errors) but the latter would make more sense to
me here because, for example, the distribution would be suitably asymmetric near
the bounds of the response, i.e. zero and 1.
I found the relevant parameterisation of the beta distribution (involving mu and
a ‘precision parameter’, already estimated in the betareg model) in the package
‘gamlss.dist’, so this could be done with:
library(gamlss.dist)
CI<-SE * qBE(p=0.975,mu=predicted.value,sigma=sqrt(var1))
#...where SE has already been calculated somehow!
Thanks in advance for any help/ideas/suggestions! As you can see, I am out of my
depth here!
Many thanks,
Oliver
[[alternative HTML version deleted]]