On Sat, 1 Oct 2005, Jim Hurd wrote:
> Which provides data in DTA (STATA), XPT (SAS), and POR (SPSS) formats, all
> of which I have tried to read with the foreign package, but I have not been
> able to load any of them. I have 2 GB of RAM, but R crashes when memory use
> gets just over 1 GB. I am using the Windows version of R 2.1.1. The DTA file
> is 48 MB; the XPT file is 188 MB.
>
If you mean the NCS 1 data file from that link (da06694-0001.dta), then I
don't have this problem.

I have been able to load the .dta file under Windows on a computer with
1 GB of RAM. Peak memory use was about 350 MB. It was very slow, though:
about half an hour. This is because read.dta processes missing values and
factor levels very inefficiently when dealing with very wide data frames.
It calls [.data.frame, [<-.data.frame, etc. for each column, and so the
time is probably quadratic in the number of columns.

The call to .External that does the actual reading took less than 1% of
the time. If you only want a hundred or so of the 3000 variables, it may be
worth calling that .External() directly to read the data, subsetting the
result, and then working out how to apply the factor levels and so on.

read.dta clearly needs a different algorithm to handle very wide data sets
efficiently.
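
A minimal sketch of the general idea, not read.dta's actual code: the data,
column names (V1, V2, ...), codes and value labels below are all made up,
and a plain list of integer columns stands in for whatever the low-level
reader returns. It contrasts a column-by-column assignment into a data frame
(the same kind of per-column data frame indexing described above) with doing
the per-column work on a plain list and building the data frame once.

```r
# Synthetic stand-in for a wide file: 3000 integer-coded columns.
# (Hypothetical data; the real NCS file is not used here.)
set.seed(1)
n_rows <- 100
n_cols <- 3000
raw <- lapply(seq_len(n_cols), function(i) sample(1:3, n_rows, replace = TRUE))
names(raw) <- paste0("V", seq_len(n_cols))

# Hypothetical value labels, shared by every column for simplicity.
codes  <- 1:3
labels <- c("yes", "no", "unknown")

# Column-by-column pattern: every assignment goes through
# `[[<-.data.frame`, so the data frame method is dispatched once per column.
t_slow <- system.time({
  slow <- as.data.frame(raw)
  for (nm in names(slow)) {
    slow[[nm]] <- factor(slow[[nm]], levels = codes, labels = labels)
  }
})

# List-based pattern: convert each column while the data are still a
# plain list, then build the data frame once at the end.
t_fast <- system.time({
  fast <- as.data.frame(lapply(raw, factor, levels = codes, labels = labels))
})

# Both routes give the same result; only the cost differs.
stopifnot(identical(slow, fast))
print(c(slow = t_slow[["elapsed"]], fast = t_fast[["elapsed"]]))
```

Timings will vary by machine and R version, so treat the printed numbers as
a rough comparison only. The same list-based approach also fits the
subset-first route: index the raw list by the wanted column names before
calling lapply().
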
-thomas