On Sun, 11 May 2003, Nirmala Ravishankar wrote:
> I am trying to use gee() to calculate the robust standard errors for a
> logit model. My dataset (zol) has 195019 observations; winner, racebl,
> raceas, racehi are all binary variables. ID is saved as a vector of
> length 195019 with alternating 0's and 1's. I get the following
> message. I also tried the same command with corstr set to
> and got the same error message.
> > ID <- as.vector(array(0, nrow(zol)))
> > k <- seq(2, nrow(zol), 2)
> > ID[k] <- 1
> > fm <- gee(winner ~ racebl + racehi + raceas, id = ID, data = zol,
> family = binomial(logit), corstr = "exchangeable")
> [1] "Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27"
> [1] "running glm to get initial regression estimate"
> [1] 0.4308219 -0.1929547 -0.1741733 -0.1925523
> Error in rep(0, maxclsz * maxclsz) : invalid number of copies in
> In addition: Warning message:
> NAs produced by integer overflow in: maxclsz * maxclsz
> What am I doing wrong?
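Using a much larger dataset than the author of gee envisaged: the warning
message is pretty explicit. Your alternating 0/1 ID defines just two clusters
of about 97,500 observations each, so maxclsz * maxclsz (about 9.5e9) exceeds
the largest R integer (2^31 - 1) and overflows to NA. Not that I think you will
get clusters of size 1e5 to work, since rep(0, maxclsz * maxclsz) would be a
vector of about 80GB, and on a 32-bit machine the OS can only address 4GB at
most per process.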
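A minimal R sketch of that arithmetic, assuming only the row count from the original post (the data frame zol itself is not needed). It rebuilds the alternating 0/1 ID vector, shows that it yields just two clusters, and reproduces the integer overflow. The name maxclsz mirrors the error message, but this is an illustration, not gee's internal code. With a largest cluster of 97,510 rather than exactly 1e5, the matrix comes to roughly 76 GB instead of 80 GB.

```r
# Sketch: why the alternating 0/1 ID breaks gee().
# Assumes only n = 195019 rows, as in the original post.
n <- 195019L
ID <- rep(c(0L, 1L), length.out = n)   # 0,1,0,1,... like ID[seq(2, n, 2)] <- 1
sizes <- table(ID)                      # only two clusters
maxclsz <- max(sizes)                   # largest cluster size, stored as an integer
print(sizes)

# R integers are 32-bit signed, so this product overflows to NA (with a warning)
overflowed <- maxclsz * maxclsz
print(overflowed)
print(.Machine$integer.max)

# Even computed in double precision, the working vector would be enormous
cells <- as.numeric(maxclsz)^2          # about 9.5e9 elements
bytes <- cells * 8                      # 8 bytes per double
cat(sprintf("%.1f GB would be needed\n", bytes / 1e9))
```

The point of the sketch is that the ID vector, not the row count as such, drives the cluster size: every observation sharing an ID value is treated as one correlated group.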
I cannot imagine a real statistical problem with a homogeneous group of
1e5 observations, but if you have one, a 1% subsample ought to suffice for
all practical purposes. And any statistical fluctuations (variance)
will be swamped by model inadequacy (bias) for 2e5 binary observations.
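A hedged sketch of the 1% subsample suggestion. Because zol is not available, zol_demo below is a hypothetical stand-in with made-up binary columns using the same names as the post. glm() from base R is used only so the sketch runs without the gee package; the same formula could be passed to gee() on the subsample. The last lines illustrate why variance is not the worry here: standard errors shrink like 1/sqrt(n), so even 1% of the rows gives an SE near 0.011 for a proportion near 0.5.

```r
# Sketch: draw a 1% subsample and compare standard-error scales.
# zol_demo is a hypothetical stand-in for zol with simulated binary columns.
set.seed(1)
n <- 195019L
zol_demo <- data.frame(
  racebl = rbinom(n, 1, 0.2),
  racehi = rbinom(n, 1, 0.1),
  raceas = rbinom(n, 1, 0.05)
)
zol_demo$winner <- rbinom(n, 1, plogis(0.4 - 0.2 * zol_demo$racebl))

# 1% simple random sample of rows
sub <- zol_demo[sample(n, round(0.01 * n)), ]

# glm() stands in for gee() so the sketch runs with base R only
fit <- glm(winner ~ racebl + racehi + raceas,
           family = binomial(logit), data = sub)
print(summary(fit)$coefficients)

# Standard errors scale like 1/sqrt(n); for a proportion near 0.5:
se_full <- sqrt(0.25 / n)          # about 0.0011 with all rows
se_sub  <- sqrt(0.25 / nrow(sub))  # about 0.011 with 1% of the rows
cat(sprintf("SE full: %.4f, SE 1%%: %.4f\n", se_full, se_sub))
```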
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595