On Wed, Feb 18, 2009 at 9:21 PM, dobomode <dobomode@gmail.com> wrote:
> I am trying to import a large dataset from SPSS into R. The SPSS file
> is in .SAV format and is about 1GB in size. I use read.spss to import
> the file and get an error saying that I have run out of memory. I am
> on a Mac OS X 10.5 system with 4GB of RAM. Monitoring the R process
> tells me that R runs out of memory when reaching about 3GB of RAM so I
> suppose the remaining 1GB is used up by the OS.
>
(An obviously late, and thus unhelpful, answer, but maybe someone else has a
similar problem.)
I managed to read a file of more than 300 MB using the following simple function:
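read.big.spss.file <- function(file) {
  require(foreign)  # loads the compiled code that provides do_read_SPSS
  # Call foreign's internal C routine directly, skipping read.spss's
  # post-processing (not a public API, so it may change between versions)
  .Call("do_read_SPSS", file, PACKAGE = "foreign")
}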
The result is a list with some attributes (not a data frame).
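Because the return value is a plain list rather than a data frame, a quick look at its structure is a cheap first step. The sketch below uses a small hand-built list, `raw`, as a hypothetical stand-in for what `read.big.spss.file()` returns (assumed here to hold one vector per SPSS variable); with real data you would instead write something like `raw <- read.big.spss.file("mydata.sav")`, where the file name is made up.

```r
# Hypothetical stand-in for read.big.spss.file("mydata.sav"):
# a named list with one numeric vector per SPSS variable (an assumption).
raw <- list(id     = as.numeric(1:6),
            region = c(1, 2, 1, 3, 2, 1),
            income = c(1200, 3400, 2100, 5600, 4300, 2800))

is.data.frame(raw)       # FALSE: it is just a list
names(raw)               # variable names
sapply(raw, typeof)      # storage type of each variable
sapply(raw, length)      # number of cases per variable
# Approximate memory used by each variable, in bytes
sapply(raw, function(v) as.numeric(object.size(v)))
names(attributes(raw))   # any extra attributes carried along with the list
```

Looking at per-variable sizes like this helps decide which variables are worth keeping before anything is converted.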
The idea is that read.spss does a lot of work after the data have actually been
read in. The memory problems may well arise at the step where the list is
converted to a data frame and/or where the value labels are attached to the
values. Once you have the list (though of course I can't guarantee this will
work with a 1GB file), you can reduce the data, e.g. by keeping only a few
variables or aggregating some of the cases, before doing further statistics.
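A minimal sketch of that workflow, again using a small hand-built list as a hypothetical stand-in for the real return value (the variable names `region`, `income` and `notes` are invented): subset the list first, convert only what remains to a data frame, then aggregate if needed. Using `aggregate()` with a formula is just one way to do the last step.

```r
# Hypothetical stand-in for the list returned by read.big.spss.file()
raw <- list(id     = as.numeric(1:6),
            region = c(1, 2, 1, 3, 2, 1),
            income = c(1200, 3400, 2100, 5600, 4300, 2800),
            notes  = c("a", "b", "c", "d", "e", "f"))

# 1. Keep only the variables needed for the analysis
keep  <- c("region", "income")
small <- raw[keep]
rm(raw); invisible(gc())   # drop the full list so its memory can be reclaimed

# 2. Build a data frame from the reduced list only
df <- as.data.frame(small, stringsAsFactors = FALSE)

# 3. Optionally aggregate cases, e.g. mean income per region
by_region <- aggregate(income ~ region, data = df, FUN = mean)
by_region
```

The point of the ordering is that the expensive data-frame conversion only ever sees the reduced data.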
Maybe it would make sense to add this as an option to read.spss, i.e. an
extra argument that, when set to TRUE, makes it simply return whatever it gets
from do_read_SPSS.
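To make the suggestion concrete, here is a sketch of such an option as a separate wrapper. Both the function name `read.spss.opt` and its `raw` argument are hypothetical; they are not part of foreign's `read.spss`.

```r
# Hypothetical wrapper illustrating the proposed option; `raw` is NOT an
# argument of foreign::read.spss itself.
read.spss.opt <- function(file, raw = FALSE, ...) {
  if (raw) {
    loadNamespace("foreign")  # make sure foreign's compiled code is loaded
    # Return the unprocessed list from foreign's internal C routine
    # (internal entry point, not a documented API; it may change).
    return(.Call("do_read_SPSS", file, PACKAGE = "foreign"))
  }
  # Otherwise fall back to the usual, fully processed import
  foreign::read.spss(file, ...)
}
```

Usage would then be `x <- read.spss.opt("mydata.sav", raw = TRUE)` (file name made up), while any other arguments are passed through unchanged to `read.spss`.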
Best regards,
Kenn
>
> Why would a 1GB SPSS file take up more than 3GB of memory in R? Is it
> perhaps because R is converting each SPSS column to a less memory-
> efficient data type? In general, what is the best strategy to load
> large datasets in R?
>
> Thanks!
>
> P.S.
>
> I exported the SPSS .SAV file to .CSV and tried importing the
> comma-delimited file. Same result: the import was much slower, but
> eventually I ran out of memory again...
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
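For the CSV route mentioned in the P.S., the same "keep only a few variables" idea can be applied while reading. A minimal sketch, assuming the exported file has a header row; the temporary file and its column names are made up. In `read.csv`, a `colClasses` entry of `"NULL"` skips that column entirely, and the `?read.table` help page notes that specifying `colClasses` and `nrows` (even as a mild over-estimate) helps memory usage.

```r
# Write a small example CSV to a temporary file (stand-in for the exported data)
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:4, region = c(1, 2, 1, 3),
                     income = c(1200, 3400, 2100, 5600),
                     notes = c("a", "b", "c", "d")),
          tmp, row.names = FALSE)

# Column types in file order: id, region, income, notes.
# "NULL" tells read.csv to skip a column, so it is never stored in memory.
cls <- c("NULL", "integer", "numeric", "NULL")
df  <- read.csv(tmp, colClasses = cls, nrows = 4)
str(df)
unlink(tmp)  # remove the temporary file
```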