I have found the "simulate" method (incorporated in some packages) very handy. As far as I can tell the only class for which simulate is actually implemented in base R is lm ... this is actually a little dangerous for a naive user who might be tempted to try simulate(X) where X is a glm fit instead, because it defaults to simulate.lm (since glm inherits from the lm class), and the answers make no sense ... Here is my simulate.glm(), which is modeled on simulate.lm . It implements simulation for poisson and binomial (binary or non-binary) models, should be easy to implement others if that seems necessary. I hereby request comments and suggest that it wouldn't hurt to incorporate it into base R ... (I will write docs for it if necessary, perhaps by modifying ?simulate -- there is no specific documentation for simulate.lm) cheers Ben Bolker simulate.glm <- function (object, nsim = 1, seed = NULL, ...) { ## RNG stuff copied from simulate.lm if (!exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)) runif(1) if (is.null(seed)) RNGstate <- get(".Random.seed", envir = .GlobalEnv) else { R.seed <- get(".Random.seed", envir = .GlobalEnv) set.seed(seed) RNGstate <- structure(seed, kind = as.list(RNGkind())) on.exit(assign(".Random.seed", R.seed, envir = .GlobalEnv)) } ## get probabilities/intensities pred <- matrix(rep(predict(object,type="response"),nsim),ncol=nsim) ntot <- length(pred) if (object$family$family=="binomial") { resp <- object$model[[1]] size <- if (is.matrix(resp)) rowSums(resp) else 1 } val <- switch(object$family$family, poisson=rpois(ntot,pred), binomial=rbinom(ntot,prob=pred,size=size), stop("family ",object$family$family," not implemented")) ans <- as.data.frame(matrix(val,ncol=nsim)) attr(ans, "seed") <- RNGstate ans } if (FALSE) { ## examples: modified from ?simulate x <- 1:10 n <- 10 y <- rbinom(length(x),prob=plogis((x-5)/2),size=n) y2 <- c("a","b")[1+rbinom(length(x),prob=plogis((x-5)/2),size=1)] mod1 <- glm(cbind(y,n-y) ~ x,family=binomial) mod2 <- glm(factor(y2) ~ x,family=binomial) S1 <- simulate(mod1, nsim = 4) S1B <- simulate(mod2, nsim = 4) ## repeat the simulation: .Random.seed <- attr(S1, "seed") identical(S1, simulate(mod1, nsim = 4)) S2 <- simulate(mod1, nsim = 200, seed = 101) rowMeans(S2)/10 # after correcting for binomial sample size, should be about fitted(mod1) plot(rowMeans(S2)/10) lines(fitted(mod1)) ## repeat identically: (sseed <- attr(S2, "seed")) # seed; RNGkind as attribute stopifnot(identical(S2, simulate(mod1, nsim = 200, seed = sseed))) } -- Ben Bolker Associate professor, Biology Dep't, Univ. of Florida bolker at ufl.edu / www.zoology.ufl.edu/bolker GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 260 bytes Desc: OpenPGP digital signature URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20090212/fb93e658/attachment.bin>
There is functionality similar to this included in the Zelig package with it's "sim" method. The "sim" method goes a step further and replicates the fitted model's analysis on the generated datasets as well. I would suggest taking a look -- Zelig supports most (if not all) glm models and a wide range of others. The Zelig maintainers' site can be found at: http://gking.harvard.edu/zelig/. Full disclosure: I am an employee of the Institute for Quantitative Social Science, which performs most of the development and support for the Zelig package. Best, Alex D'Amour Statistical Programmer Harvard Institute for Quantitative Social Science 2009/2/12 Ben Bolker <bolker at ufl.edu>:> > I have found the "simulate" method (incorporated > in some packages) very handy. As far as I can tell the > only class for which simulate is actually implemented > in base R is lm ... this is actually a little dangerous > for a naive user who might be tempted to try > simulate(X) where X is a glm fit instead, because > it defaults to simulate.lm (since glm inherits from > the lm class), and the answers make no sense ... > > Here is my simulate.glm(), which is modeled on > simulate.lm . It implements simulation for poisson > and binomial (binary or non-binary) models, should > be easy to implement others if that seems necessary. > > I hereby request comments and suggest that it wouldn't > hurt to incorporate it into base R ... (I will write > docs for it if necessary, perhaps by modifying ?simulate -- > there is no specific documentation for simulate.lm) > > cheers > Ben Bolker > -- > Ben Bolker > Associate professor, Biology Dep't, Univ. of Florida > bolker at ufl.edu / www.zoology.ufl.edu/bolker > GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc > > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > >
>>>>> "BB" == Ben Bolker <bolker at ufl.edu> >>>>> on Thu, 12 Feb 2009 11:29:14 -0500 writes:BB> I have found the "simulate" method (incorporated BB> in some packages) very handy. As far as I can tell the BB> only class for which simulate is actually implemented BB> in base R is lm ... this is actually a little dangerous BB> for a naive user who might be tempted to try BB> simulate(X) where X is a glm fit instead, because BB> it defaults to simulate.lm (since glm inherits from BB> the lm class), and the answers make no sense ... BB> Here is my simulate.glm(), which is modeled on BB> simulate.lm . It implements simulation for poisson BB> and binomial (binary or non-binary) models, should BB> be easy to implement others if that seems necessary. BB> I hereby request comments and suggest that it wouldn't BB> hurt to incorporate it into base R ... (I will write BB> docs for it if necessary, perhaps by modifying ?simulate -- BB> there is no specific documentation for simulate.lm) BB> cheers BB> Ben Bolker [...............] Hi Ben, thank you for your proposals. I agree that simulate.glm() has been in missing in some way, till now, in particular, as, as you note, "glm" objects extend "lm" ones and hence simulate(<glm>, ...) currently dispatches to calling simulate.lm(....) which is only correct in the case of the gaussian family. I have looked at your proposal a bit, already "improved" the code slightly (e.g. re-include the comment you lost when you ``copied'' simulate.lm(): In such cases, please work from the source, not from what you get by print()ing stats:::simulate.lm --- the source is either a recent tarball, or the SVN repository, in this case, file https://svn.r-project.org/R/trunk/src/library/stats/R/lm.R ] and am planning to look at your and some own examples; all with the goal to indeed include this in the R standard 'stats' package in R-devel [to become R 2.9.0 in the future]. About the help page: At the moment, I think that only a few words would need to be added to the simulate help page, i.e., https://svn.r-project.org/R/trunk/src/library/stats/man/simulate.Rd and will be happy to receive a patch against this file. Thank you again, and best regards, Martin Maechler, ETH Zurich
Hi, For extended glms such as gams, gnm or other distributions such as negative binomial, would there need to be a separate simulate method? Or, could the current framework, rather than stopping with an error look for the appropriate model matrix, coefficients, distribution function and family object to simulate from? Nicholas> Message: 9 > Date: Fri, 13 Feb 2009 21:27:57 +0100 > From: Martin Maechler <maechler at stat.math.ethz.ch> > Subject: Re: [Rd] proposed simulate.glm method > To: Heather Turner <Heather.Turner at warwick.ac.uk> > Cc: r-devel at r-project.org, Martin Maechler > <maechler at stat.math.ethz.ch> > Message-ID: <18837.55245.15158.29378 at cmath-5.math.ethz.ch> > Content-Type: text/plain; charset=us-ascii > > Thank you, Heather and Ben, > > >>>>> "HT" == Heather Turner <Heather.Turner at warwick.ac.uk> > >>>>> on Fri, 13 Feb 2009 15:52:37 +0000 writes: > > HT> Yes, thanks to Ben for getting the ball rolling. His > HT> code was more streamlined than mine, pointing to further > HT> simplifications which I've included in the extended > HT> version below. > > HT> The code for the additional families uses functions from > HT> MASS and SuppDists - I wasn't sure about the best way to > HT> do this, so have just used :: for now. > > HT> It appears to be working happily for both glm and gnm > HT> objects (no gnm-specific code used). > > HT> Best wishes, > > HT> Heather > > [....] > > I have now followed Brian Ripley's suggetion to just extend > simulate.lm() to also deal with "glm" objects, but using > Heather's suggestions for the different families; > I've just commited src/library/stats/R/lm.R with the new code. > (get it from svn.r-project.org/R/trunk/ or this night's R-devel > tarball). > > One difference to your propsal: Instead of just > object$fitted , the code is using > fitted(object) ... something which should properly to the na.action > used. > > For the (MASS and) SuppDists package requirement, I'm using > the following > > if(is.null(tryCatch(loadNamespace("SuppDists"), > error = function(e) NULL))) > stop("Need CRAN package 'SuppDists' for 'inverse.gaussian' family") > > > I've not yet updated the help page for simulate(), > and have only tested relatively few cases for binomial, poisson > and Gamma. > I've wanted to expose this to you, so you can provide more > feedback and possibly even a patch to > svn.r-project.org/R/trunk/src/library/stats/man/simulate.Rd > > Martin > > >