Daniel Nordlund
2010-Dec-17 06:15 UTC
[R] excessive (?) memory utilization by package foreign when reading SAS xport file
Someone recently asked about reading the BRFSS data into R. The answer was as
simple as using the foreign package to read the SAS xport dataset. I have been
asked to assist someone in using R and the survey package to analyze the BRFSS
survey. At work, I have a 64-bit system running Windows 7 Enterprise edition,
with 12 GB of RAM, and 64-bit R-2.12.0 installed from CRAN. The xport file
is about 830 MB. I executed the following code to read the file:
library(foreign)
brfss09 <-
read.xport("C:/Users/djnordlund/Documents/R-examples/BRFSS/cdbrfs09.xpt")
The file was read in and I was able to begin playing with it. I then tried to
read the file on my home system, which is a 64-bit system running Windows 7
Professional edition with 8 GB of RAM, also with 64-bit R-2.12.0 installed
from CRAN. I tried to read in the data using the same syntax as above, and
after a couple of minutes I received an error message saying that a 3.3 MB
vector could not be allocated because all available memory had been used.
When I ran gc(), it reported that the maximum amount of memory used was over
7 GB. On my work computer, where the read succeeded, gc() showed that 9.5 GB
of RAM had been used while reading the BRFSS xport file.
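For reference, this is roughly how I checked the usage on each machine
(memory.size() is Windows-only; brfss09 is the data frame created above):

gc()                                       # "max used" column shows the high-water mark
memory.size(max = TRUE)                    # most memory obtained from Windows, in MB
print(object.size(brfss09), units = "Mb")  # size of the final data frame itself

The gap between object.size() on the finished data frame and the gc()
high-water mark is what surprised me.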
Is it expected that read.xport() requires much more memory to read a file than
is required to store it in memory? If necessary, I can install a database as a
back-end and read in pieces for analysis, but I guess I was surprised by the
memory requirement of using read.xport(). Or am I doing something wrong?
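If I do end up going the database route, I was picturing something along
these lines with RSQLite (the column names in the SELECT are just
placeholders, not the real BRFSS variable names):

library(RSQLite)

## one-time load, done on a machine with enough memory to hold brfss09
con <- dbConnect(SQLite(), dbname = "brfss09.sqlite")
dbWriteTable(con, "brfss09", brfss09)

## later, pull in only the variables a given analysis needs
## (var1, var2, weightvar are placeholders for real BRFSS names)
piece <- dbGetQuery(con, "SELECT var1, var2, weightvar FROM brfss09")
dbDisconnect(con)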
Thanks,
Dan
Daniel Nordlund
Bothell, WA USA