On Sun, 11 May 2003, Nirmala Ravishankar wrote:
> I am trying to use gee() to calculate the robust standard errors for a
> logit model. My dataset (zol) has 195019 observations; winner, racebl,
> raceas, racehi are all binary variables. ID is saved as a vector of
> length 195019 with alternating 0's and 1's. I get the following error
> message. I also tried the same command with corstr set to "independence"
> and got the same error message.
>
>
> > ID <- as.vector(array(0, nrow(zol)))
> > k <- seq(2, nrow(zol), 2)
> > ID[k] <- 1
>
>
> > fm <- gee(winner ~ racebl + racehi + raceas, id = ID, data = zol,
>             family = binomial(logit), corstr = "exchangeable")
> [1] "Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27"
> [1] "running glm to get initial regression estimate"
> [1] 0.4308219 -0.1929547 -0.1741733 -0.1925523
> Error in rep(0, maxclsz * maxclsz) : invalid number of copies in "rep"
> In addition: Warning message:
> NAs produced by integer overflow in: maxclsz * maxclsz
>
>
>
> What am I doing wrong?

Using a much larger dataset than the author of gee envisaged: the warning
message is pretty explicit. Not that I think you will get clusters of size
1e5 to work, since rep(0, maxclsz * maxclsz) is a vector of about 80GB,
and on a 32-bit machine the OS can address at most 4GB per process.
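
As a rough check of the arithmetic above, the following R sketch reproduces
the overflow and estimates the memory needed. It assumes, as the reply does,
that the alternating 0/1 ID ends up as two clusters of about 1e5 observations
each; apart from maxclsz, the variable names are illustrative and are not
taken from the gee source.

```r
# Assumption: ID with alternating 0s and 1s is treated as two clusters,
# so the largest cluster holds about half of the 195019 observations.
n_obs   <- 195019L
maxclsz <- (n_obs + 1L) %/% 2L            # size of the larger cluster (integer)

# Integer multiplication overflows past .Machine$integer.max and yields NA,
# which is what the warning in the quoted output reports.
overflow <- suppressWarnings(maxclsz * maxclsz)

# Repeat the calculation in double precision to size the vector:
# one 8-byte double per element of a maxclsz x maxclsz working array.
n_elem <- as.numeric(maxclsz)^2
gib    <- n_elem * 8 / 2^30

cat("largest cluster:", maxclsz, "\n")
cat("integer product is NA:", is.na(overflow), "\n")
cat("approx. GiB needed:", round(gib), "\n")
```

The estimate comes out in the tens of gigabytes, which is the same order of
magnitude as the figure quoted above and far beyond what a 32-bit process
can address.
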
I cannot imagine a real statistical problem with a homogeneous group of
1e5 observations, but if you have one, a 1% subsample ought to suffice for
all practical purposes. And any statistical fluctuations (variance)
will be swamped by model inadequacy (bias) for 2e5 binary observations.
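
A 1% subsample of the kind suggested above could be drawn roughly as in this
sketch. The data frame zol_sim is simulated purely for illustration (the real
zol is not available here), the coefficients used to simulate it are
arbitrary, and an ordinary glm() fit stands in for whatever model is actually
refitted on the subsample.

```r
set.seed(1)                                # reproducible illustration only

# Hypothetical stand-in for zol: one binary covariate, one binary outcome.
n <- 195019L
zol_sim <- data.frame(racebl = rbinom(n, 1, 0.1))
zol_sim$winner <- rbinom(n, 1, plogis(0.4 - 0.2 * zol_sim$racebl))

# Draw 1% of the rows without replacement.
idx     <- sample.int(n, size = ceiling(0.01 * n))
zol_sub <- zol_sim[idx, ]

# Fit the logit model on the subsample only.
fit <- glm(winner ~ racebl, family = binomial(logit), data = zol_sub)

print(nrow(zol_sub))
print(coef(fit))
```

With real data, the same idx could be used to subset both the data frame and
the ID vector so that the two stay aligned.
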
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595