Strickland, Matthew (CDC/CCHP/NCBDDD) (CTR)
2007-May-31 12:49 UTC
[R] Conditional logistic regression for "events/trials" format
Dear R users,

I have a large individual-level dataset (~700,000 records) on which I am performing a conditional logistic regression. Key variables include the dichotomous outcome, the dichotomous exposure, and the stratum to which each person belongs.

Using this individual-level dataset I can successfully use clogit to create the model I want. However, reading this large .csv file into R and running the models takes a fair amount of time.

Alternatively, I could choose to "collapse" the dataset so that each row has the number of events, number of individuals, and the exposure and stratum. In SAS they call this the "events/trials" format. This would make my dataset much smaller and presumably speed things up.

So my question is: can I use clogit (or possibly another function) to perform a conditional logistic regression when the data is in this "events/trials" format? I am using R version 2.5.0.

Thank you very much,
Matt Strickland
Birth Defects Branch
U.S. Centers for Disease Control
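For readers following the thread, the "collapse" step described in this message might look roughly like the sketch below. The data frame `indiv` and its columns `stratum`, `exposure`, and `outcome` are hypothetical stand-ins simulated for illustration; they are not the poster's data.

```r
# Hypothetical individual-level data: one row per person (simulated).
set.seed(1)
n <- 2000
indiv <- data.frame(
  stratum  = sample(1:20, n, replace = TRUE),
  exposure = rbinom(n, 1, 0.4)
)
indiv$outcome <- rbinom(n, 1, plogis(-1 + 0.5 * indiv$exposure))

# Collapse to "events/trials" format: one row per stratum-by-exposure cell,
# with events = number of outcomes and trials = number of individuals.
collapsed <- aggregate(
  data.frame(events = indiv$outcome, trials = 1),
  by  = list(stratum = indiv$stratum, exposure = indiv$exposure),
  FUN = sum
)

head(collapsed)
```

With a dichotomous exposure, the collapsed table has at most two rows per stratum, however many individual records went into it.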
Charles C. Berry
2007-May-31 17:11 UTC
[R] Conditional logistic regression for "events/trials" format
On Thu, 31 May 2007, Strickland, Matthew (CDC/CCHP/NCBDDD) (CTR) wrote:

> Dear R users,
>
> I have a large individual-level dataset (~700,000 records) which I am
> performing a conditional logistic regression on. Key variables include
> the dichotomous outcome, dichotomous exposure, and the stratum to which
> each person belongs.
>
> Using this individual-level dataset I can successfully use clogit to
> create the model I want. However reading this large .csv file into R and
> running the models takes a fair amount of time.
>
> Alternatively, I could choose to "collapse" the dataset so that each row
> has the number of events, number of individuals, and the exposure and
> stratum. In SAS they call this the "events/trials" format. This would
> make my dataset much smaller and presumably speed things up.

I think you have described the data for forming a 2 by 2 by K table of counts. In which case, loglin(), loglm(), mantelhaen.test(), and - if K is not too large - glm(... , family=poisson) would be suitable.

But you say 'models' above, suggesting that there are some other variables. If so, you need to be a bit more specific in describing your setup.

> So my question is: can I use clogit (or possibly another function) to
> perform a conditional logistic regression when the data is in this
> "events/trials" format? I am using R version 2.5.0.
>
> Thank you very much,
> Matt Strickland
> Birth Defects Branch
> U.S. Centers for Disease Control

Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0901
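A minimal sketch of two of the approaches named in the reply, applied to collapsed data. The data frame `collapsed`, its counts, and the choice of K = 3 strata are made up for illustration. Note the assumption behind the Poisson fit: a log-linear model with stratum-by-exposure and stratum-by-outcome margins estimates the same common log odds ratio as an unconditional logistic regression with one stratum effect per stratum, which is not identical to the conditional estimate from clogit.

```r
# Hypothetical collapsed data: K = 3 strata, one row per stratum-by-exposure cell.
collapsed <- data.frame(
  stratum  = rep(1:3, each = 2),
  exposure = rep(c(0, 1), times = 3),
  events   = c(10, 18, 12, 25, 8, 15),
  trials   = c(100, 110, 120, 130, 90, 95)
)

# Expand to one row per cell of the 2 x 2 x K table (events and non-events).
long <- rbind(
  data.frame(collapsed[c("stratum", "exposure")],
             outcome = 1, count = collapsed$events),
  data.frame(collapsed[c("stratum", "exposure")],
             outcome = 0, count = collapsed$trials - collapsed$events)
)

# 2 x 2 x K array of counts; strata must be the last dimension.
tab <- xtabs(count ~ exposure + outcome + stratum, data = long)

# Mantel-Haenszel test and estimate of the common odds ratio.
mh <- mantelhaen.test(tab)
mh$estimate

# Poisson log-linear model with a single exposure-by-outcome association term.
fit <- glm(count ~ factor(stratum) * exposure + factor(stratum) * outcome +
             exposure:outcome,
           family = poisson, data = long)

# Estimated common log odds ratio (unconditional).
coef(fit)["exposure:outcome"]
```

The Poisson model carries one exposure effect and one outcome effect for every stratum, which is presumably why it is only practical when K is not too large.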