thr3ads.net - R devel - [Rd] "simulate" does not include variability in parameter estimation [Dec 2019]

If this information is useful, please help other people find it:
Share via:

Spencer Graves

2019-Dec-27 04:14 UTC

[Rd] "simulate" does not include variability in parameter estimation

Hello, All:


 ????? The default "simulate" method for lm and glm seems to ignore
the
sampling variance of the parameter estimates;? see the trivial lm and 
glm examples below.? Both these examples estimate a mean with formula = 
x~1.? In both cases, the variance of the estimated mean is 1.


 ??? ??????? * In the lm example with x0 = c(-1, 1), var(x0) = 2, and 
var(unlist(simulate(lm(x0~1), 10000, 1))) is 2.0064.? Shouldn't it be 3 
= var(mean(x0)) + var(x0) = (2/2) + 2?


 ??? ??????? * In the glm example with x1=1, 
var(unlist(simulate(glm(x1~1, poisson), 10000, 1))) = 1.006. Shouldn't 
it be 2 = var(glm estimate of the mean) + var(simulated Poisson 
distribution) = 1 + 1?


 ????? I'm asking, because I've recently written "simulate"
methods for
objects of class stats::glm and BMA::bic.glm, where my primary interest 
was simulating the predicted mean with "newdata".? I'm doing this,
so I
can get Monte Carlo prediction intervals.? My current code for 
"simulate.glm" and "simulate.bic.glm" are available in the
development
version of the "Ecfun" package on GitHub:


https://github.com/sbgraves237/Ecfun


 ????? Comparing my new code with "stats:::simulate.lm" raises the 
following questions in my mind regarding "simulate" of a fit object:


 ??? ??????? 1.? Shouldn't "simulate" start by simulating the
random
variability in the estimated parameters?? I need that for my current 
application.? If a generic "simulate" function should NOT include
this,
what should we call something that does include this?? And how does the 
current stats:::simulate.lm behavior fit with this?


 ???? ?????? 2.? Shouldn't "simulate" of a model fit include an
option
for "newdata"?? I need that for my application.


 ??? ??????? 3.? By comparing with "predict.glm", I felt I needed an 
argument 'type = c("link", "response")'.?
"predict.glm" has an argument
'type = c("link", "response", "terms")'.?
I didn't need "terms", so I
didn't take the time to code that.? However, a general "simulate" 
function should probably include that.


 ??? ??????? 4.? My application involves assumed Poisson counts.? I need 
to simulate those as well.? If I combined those with "simulate.glm", 
what would I call them?? I can't use the word "response", because
that's
already used with a different meaning. Might "observations" be the 
appropriate term?


 ????? What do you think?
 ????? Thanks,
 ????? Spencer Graves


 > x0 <- c(-1, 1)
 > var(x0)
[1] 2
 > fit0 <- lm(x0~1)
 > vcov(fit0)
 ??????????? (Intercept)
(Intercept)?????????? 1
 > sim0 <- simulate(fit0, 10000, 1)
 > var(unlist(sim0))
[1] 2.006408
 > x1 <- 1
 > fit1 <- glm(x1~1, poisson)
 > coef(fit1)
 ?(Intercept)
4.676016e-11
 > exp(coef(fit1))
(Intercept)
 ????????? 1
 > vcov(fit1)
 ??????????? (Intercept)
(Intercept)?? 0.9999903
 > sim1 <- simulate(fit1, 10000, 1)
 > var(unlist(sim1))
[1] 1.00617
 > sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.2

Matrix products: default
BLAS: 
/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: 
/Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats???? graphics? grDevices utils???? datasets? methods base

loaded via a namespace (and not attached):
[1] compiler_3.6.2 tools_3.6.2

Duncan Murdoch

2019-Dec-27 10:34 UTC

head link

[Rd] "simulate" does not include variability in parameter estimation

On 26/12/2019 11:14 p.m., Spencer Graves wrote:> Hello, All:
> 
> 
>   ????? The default "simulate" method for lm and glm seems to
ignore the
> sampling variance of the parameter estimates;? see the trivial lm and
> glm examples below.? Both these examples estimate a mean with formula >
x~1.? In both cases, the variance of the estimated mean is 1.
That's how it's documented to operate.  Nothing in the help page 
suggests it would try to simulate parameter values.  Indeed, it doesn't 
have enough information on the distribution to sample from:  the 
appropriate distribution to simulate from if you want to include 
uncertainty in the parameter estimates is the posterior distribution, 
but lm and glm take a classical point of view, not a Bayesian point of 
view, so they have no concept of a posterior.
>   ??? ??????? * In the lm example with x0 = c(-1, 1), var(x0) = 2, and
> var(unlist(simulate(lm(x0~1), 10000, 1))) is 2.0064.? Shouldn't it be 3
> = var(mean(x0)) + var(x0) = (2/2) + 2?
That calculation ignores the uncertainty in the estimation of sigma.

Duncan Murdoch
> 
> 
>   ??? ??????? * In the glm example with x1=1,
> var(unlist(simulate(glm(x1~1, poisson), 10000, 1))) = 1.006. Shouldn't
> it be 2 = var(glm estimate of the mean) + var(simulated Poisson
> distribution) = 1 + 1?
> 
> 
>   ????? I'm asking, because I've recently written
"simulate" methods for
> objects of class stats::glm and BMA::bic.glm, where my primary interest
> was simulating the predicted mean with "newdata".? I'm doing
this, so I
> can get Monte Carlo prediction intervals.? My current code for
> "simulate.glm" and "simulate.bic.glm" are available in
the development
> version of the "Ecfun" package on GitHub:
> 
> 
> https://github.com/sbgraves237/Ecfun
> 
> 
>   ????? Comparing my new code with "stats:::simulate.lm" raises
the
> following questions in my mind regarding "simulate" of a fit
object:
> 
> 
>   ??? ??????? 1.? Shouldn't "simulate" start by simulating
the random
> variability in the estimated parameters?? I need that for my current
> application.? If a generic "simulate" function should NOT include
this,
> what should we call something that does include this?? And how does the
> current stats:::simulate.lm behavior fit with this?
> 
> 
>   ???? ?????? 2.? Shouldn't "simulate" of a model fit include
an option
> for "newdata"?? I need that for my application.
> 
> 
>   ??? ??????? 3.? By comparing with "predict.glm", I felt I
needed an
> argument 'type = c("link", "response")'.?
"predict.glm" has an argument
> 'type = c("link", "response",
"terms")'.? I didn't need "terms", so I
> didn't take the time to code that.? However, a general
"simulate"
> function should probably include that.
> 
> 
>   ??? ??????? 4.? My application involves assumed Poisson counts.? I need
> to simulate those as well.? If I combined those with
"simulate.glm",
> what would I call them?? I can't use the word "response",
because that's
> already used with a different meaning. Might "observations" be
the
> appropriate term?
> 
> 
>   ????? What do you think?
>   ????? Thanks,
>   ????? Spencer Graves
> 
> 
>   > x0 <- c(-1, 1)
>   > var(x0)
> [1] 2
>   > fit0 <- lm(x0~1)
>   > vcov(fit0)
>   ??????????? (Intercept)
> (Intercept)?????????? 1
>   > sim0 <- simulate(fit0, 10000, 1)
>   > var(unlist(sim0))
> [1] 2.006408
>   > x1 <- 1
>   > fit1 <- glm(x1~1, poisson)
>   > coef(fit1)
>   ?(Intercept)
> 4.676016e-11
>   > exp(coef(fit1))
> (Intercept)
>   ????????? 1
>   > vcov(fit1)
>   ??????????? (Intercept)
> (Intercept)?? 0.9999903
>   > sim1 <- simulate(fit1, 10000, 1)
>   > var(unlist(sim1))
> [1] 1.00617
>   > sessionInfo()
> R version 3.6.2 (2019-12-12)
> Platform: x86_64-apple-darwin15.6.0 (64-bit)
> Running under: macOS Catalina 10.15.2
> 
> Matrix products: default
> BLAS:
>
/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
> LAPACK:
> /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
> 
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> 
> attached base packages:
> [1] stats???? graphics? grDevices utils???? datasets? methods base
> 
> loaded via a namespace (and not attached):
> [1] compiler_3.6.2 tools_3.6.2
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

Spencer Graves

2019-Dec-27 15:31 UTC

head link

[Rd] "simulate" does not include variability in parameter estimation

On 2019-12-27 04:34, Duncan Murdoch wrote:> On 26/12/2019 11:14 p.m., Spencer Graves wrote:
>> Hello, All:
>>
>>
>> ? ????? The default "simulate" method for lm and glm seems to
ignore the
>> sampling variance of the parameter estimates;? see the trivial lm and
>> glm examples below.? Both these examples estimate a mean with formula
>> x~1.? In both cases, the variance of the estimated mean is 1.
>
> That's how it's documented to operate.? Nothing in the help page 
> suggests it would try to simulate parameter values.? Indeed, it 
> doesn't have enough information on the distribution to sample from:? 
> the appropriate distribution to simulate from if you want to include 
> uncertainty in the parameter estimates is the posterior distribution, 
> but lm and glm take a classical point of view, not a Bayesian point of 
> view, so they have no concept of a posterior.

 ????? Thanks for the reply.? What do you suggest for someone who wants 
confidence, prediction and tolerance intervals for newdata for a general 
fit object?


 ????? For a glm object, one could get confidence intervals starting 
with predicted mean and standard errors
from predict(glm(...), newdata, type='link', se.fit=TRUE), then linkinv 
to get the confidence intervals on scale of expected values of the 
random variables.? From that one could compute tolerance intervals.


 ????? Is there a way to get more standard prediction intervals from a 
glm object, other than the Bayesian approach coded into 
Ecfun:::simulate.glm?? And that still doesn't answer the question re. 
confidence intervals for a more general fit object like BMA::bic.glm.


 ????? Comments?
 ????? Thanks,
 ????? Spencer Graves
>
>> ? ??? ??????? * In the lm example with x0 = c(-1, 1), var(x0) = 2, and
>> var(unlist(simulate(lm(x0~1), 10000, 1))) is 2.0064.? Shouldn't it
be 3
>> = var(mean(x0)) + var(x0) = (2/2) + 2?
>
> That calculation ignores the uncertainty in the estimation of sigma.
>
> Duncan Murdoch
>
>>
>>
>> ? ??? ??????? * In the glm example with x1=1,
>> var(unlist(simulate(glm(x1~1, poisson), 10000, 1))) = 1.006.
Shouldn't
>> it be 2 = var(glm estimate of the mean) + var(simulated Poisson
>> distribution) = 1 + 1?
>>
>>
>> ? ????? I'm asking, because I've recently written
"simulate" methods for
>> objects of class stats::glm and BMA::bic.glm, where my primary interest
>> was simulating the predicted mean with "newdata".? I'm
doing this, so I
>> can get Monte Carlo prediction intervals.? My current code for
>> "simulate.glm" and "simulate.bic.glm" are available
in the development
>> version of the "Ecfun" package on GitHub:
>>
>>
>> https://github.com/sbgraves237/Ecfun
>>
>>
>> ? ????? Comparing my new code with "stats:::simulate.lm"
raises the
>> following questions in my mind regarding "simulate" of a fit
object:
>>
>>
>> ? ??? ??????? 1.? Shouldn't "simulate" start by
simulating the random
>> variability in the estimated parameters?? I need that for my current
>> application.? If a generic "simulate" function should NOT
include this,
>> what should we call something that does include this?? And how does the
>> current stats:::simulate.lm behavior fit with this?
>>
>>
>> ? ???? ?????? 2.? Shouldn't "simulate" of a model fit
include an option
>> for "newdata"?? I need that for my application.
>>
>>
>> ? ??? ??????? 3.? By comparing with "predict.glm", I felt I
needed an
>> argument 'type = c("link", "response")'.?
"predict.glm" has an argument
>> 'type = c("link", "response",
"terms")'.? I didn't need "terms", so I
>> didn't take the time to code that.? However, a general
"simulate"
>> function should probably include that.
>>
>>
>> ? ??? ??????? 4.? My application involves assumed Poisson counts.? I 
>> need
>> to simulate those as well.? If I combined those with
"simulate.glm",
>> what would I call them?? I can't use the word "response",
because that's
>> already used with a different meaning. Might "observations"
be the
>> appropriate term?
>>
>>
>> ? ????? What do you think?
>> ? ????? Thanks,
>> ? ????? Spencer Graves
>>
>>
>> ? > x0 <- c(-1, 1)
>> ? > var(x0)
>> [1] 2
>> ? > fit0 <- lm(x0~1)
>> ? > vcov(fit0)
>> ? ??????????? (Intercept)
>> (Intercept)?????????? 1
>> ? > sim0 <- simulate(fit0, 10000, 1)
>> ? > var(unlist(sim0))
>> [1] 2.006408
>> ? > x1 <- 1
>> ? > fit1 <- glm(x1~1, poisson)
>> ? > coef(fit1)
>> ? ?(Intercept)
>> 4.676016e-11
>> ? > exp(coef(fit1))
>> (Intercept)
>> ? ????????? 1
>> ? > vcov(fit1)
>> ? ??????????? (Intercept)
>> (Intercept)?? 0.9999903
>> ? > sim1 <- simulate(fit1, 10000, 1)
>> ? > var(unlist(sim1))
>> [1] 1.00617
>> ? > sessionInfo()
>> R version 3.6.2 (2019-12-12)
>> Platform: x86_64-apple-darwin15.6.0 (64-bit)
>> Running under: macOS Catalina 10.15.2
>>
>> Matrix products: default
>> BLAS:
>>
/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
>>
>> LAPACK:
>>
/Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
>>
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats???? graphics? grDevices utils???? datasets? methods base
>>
>> loaded via a namespace (and not attached):
>> [1] compiler_3.6.2 tools_3.6.2
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
>

Apparently Analagous Threads

Search for more maybe matching threads

R devel - Dec 2019 - "simulate" does not include variability in parameter estimation

[Rd] "simulate" does not include variability in parameter estimation

[Rd] "simulate" does not include variability in parameter estimation

[Rd] "simulate" does not include variability in parameter estimation

Apparently Analagous Threads