Hi all,

I need to run a logit (logistic) regression on a large dataset and I am not sure of the best way to do it. The dataset is about 200,000 x 2,000, and R runs out of memory when creating it. After going through the help archives and the mailing lists, I think there are two main options, though I am not sure which one is better. Of course, any alternative is welcome as well. Actually, I am not even sure whether either of these options will work, so before getting into it I would like some advice.

- The first option is to use the ff package, which allows working with the dataset without loading it into RAM. Combined with the bigglm function, this should do the job (rough sketch below).

- The dataset contains many sparse variables, so I was wondering whether building the model matrix as a sparse matrix might give good results. Here I am not sure whether glm, or some extension of it, can handle sparse matrices (I could not find any documentation on this). If it works, this second option seems more efficient, since R might be able to exploit the sparsity to speed up the computations (rough sketch below).

Thanks in advance. All the best!

Julio
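
P.S. In case it makes option 1 clearer, this is roughly what I had in mind (untested; the file name and variable names are just placeholders, and in reality the formula would have to cover all ~2,000 predictors):

## Option 1: keep the data on disk with ff, fit with bigglm
library(ff)
library(ffbase)   # provides a bigglm method for ffdf objects
library(biglm)

## read the file into an ffdf, so the data stay on disk
dat <- read.csv.ffdf(file = "data.csv", header = TRUE)

## fit the logit model in chunks
fit <- bigglm(y ~ x1 + x2, data = dat, family = binomial(),
              chunksize = 10000)
summary(fit)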
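
For option 2, I was imagining something along these lines. I am using glmnet only as a stand-in, because it accepts sparse design matrices and with lambda = 0 it should approximate an unpenalised logit, but it is not plain glm (again untested, and it assumes the raw data.frame itself can be held in memory, the problem being the dense model matrix):

## Option 2: sparse model matrix + a sparse-aware fitter
library(Matrix)
library(glmnet)

## dat: data.frame with 0/1 outcome y and many sparse predictors
## sparse.model.matrix() builds the design matrix as a sparse dgCMatrix
X <- sparse.model.matrix(y ~ . - 1, data = dat)
y <- dat$y

## lambda = 0 so the penalty is (approximately) switched off
fit <- glmnet(X, y, family = "binomial", lambda = 0)
coef(fit)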