saurav pathak
2009-Jun-30 15:29 UTC
[R] Stata file and R Interaction :File Size Problem in Import
Hi,

I am using Stata 10 and need to import a data set from Stata 10 into R. I have also saved the data set in lower versions of Stata using Stata's saveold command. My machine has 4 GB of RAM and the Stata file is 600 MB. I am getting this error message:

"Error: cannot allocate vector of size 3.4 Mb
In addition: There were 50 or more warnings (use warnings() to see the first 50)"

So far I have tried the following:

1. Right-clicking the R icon and adding --max-mem-size=1000M to the "Target" field under the icon's properties.
2. Loading library(foreign) at the command prompt.
3. Running trialfile <- read.dta("C:/filename.dta").

This fails for the 600 MB Stata file; however, with data sets of about 200 KB saved in both Stata 10 and Stata 8 format, I have been able to import them into R successfully. I am therefore unsure whether the problem is the version of my Stata file (which should not be the case, since the smaller files in both versions work fine) or its size.

This is quite important for me, so I would appreciate any help with this question.

Thanks,
Saurav

--
Dr. Saurav Pathak
PhD, Univ. of Florida, Mechanical Engineering
Doctoral Student, Innovation and Entrepreneurship
Imperial College Business School
s.pathak08@imperial.ac.uk
0044-7795321121
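[For context on the error above, a minimal sketch of my own, not from the post: R stores numeric data as 8-byte doubles, so a data set often needs considerably more memory inside R than it occupies on disk. The vector size here is illustrative.]

```r
# Each element of a numeric vector in R is an 8-byte double,
# so one million numbers need roughly 8 MB of RAM (plus a small
# fixed overhead for the vector header).
x <- numeric(1e6)
print(object.size(x))   # roughly 8 MB: 8 bytes per element
```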
Thomas Lumley
2009-Jun-30 16:09 UTC
[R] Stata file and R Interaction :File Size Problem in Import
This is at least the fourth time you have asked this question, which is at least two more than the maximum excusable number of times.

The error message says that your computer doesn't have enough memory to load this data set. This pretty clearly suggests that the size of the file is the problem. This isn't something we can fix.

If you need the whole file for your analysis you are probably out of luck -- since Stata binary files use smaller data types than R uses in memory, a 600Mb file is quite likely over 1Gb in memory, and this isn't going to work on a 32-bit system [it looks as though you are using Windows, though you don't actually *say*]. The fact that you told R to use at most 1Gb of memory with --max-mem-size=1000M would have pretty much guaranteed that it would fail, but I think it is likely to be impossible even if you allow R to use all your available memory.

If you don't need the whole file at once, saving it as a text file will allow you to read just parts of the file. Alternatively, if I recall correctly, two-stage least squares just involves two ordinary least squares fits, so you could use the biglm package to fit each of these least squares fits to a data set that was too big to fit in memory.

-thomas

Thomas Lumley
Assoc. Professor, Biostatistics
tlumley at u.washington.edu
University of Washington, Seattle