Dear all,

One of my students (whom I am trying to convince to use R) wants to get a fairly large SAS dataset into R (about 150 MB). An obvious and simple thing she tried was to write the dataset out as a .csv file and then read that into R, but that takes forever (or something close to it). The dataset is so large that exporting it as an Excel file from SAS is not feasible (more than 65,000 lines). I am reluctant to ask her to go through all the database steps (then she'll just stick to SAS...). Can anyone help me out with this one?

Thanks in advance
Søren Højsgaard
Søren Højsgaard wrote:
> One of my students (whom I am trying to convince to use R) wants to get a
> fairly large SAS dataset into R (about 150 MB). [...] Can anyone help me
> out with this one?

See ?read.ssd in package:foreign or ?sas.get in package:Hmisc. Both require that you have SAS installed and on your PATH.

--sundar
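For reference, a minimal sketch of both routes. The library path and member name below are placeholders, and both functions shell out to SAS, so SAS must be installed and on the PATH:

  library(foreign)
  # read.ssd() writes a small SAS program that copies the member to a
  # transport file and then reads that back in with read.xport()
  dat <- read.ssd(libname = "C:/sasdata", sectionnames = "mydata",
                  sascmd = "sas")

  library(Hmisc)
  # sas.get() is similar, but also brings over SAS variable labels,
  # value labels and special missing values
  dat2 <- sas.get("C:/sasdata", "mydata")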
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Søren Højsgaard
> Sent: Friday, August 27, 2004 11:46 AM
> To: r-help at stat.math.ethz.ch
> Cc: Søren Højsgaard
> Subject: [R] Reading SAS data into R
>
> One of my students (whom I am trying to convince to use R) wants to get a
> fairly large SAS dataset into R (about 150 MB). [...] Can anyone help me
> out with this one?

What platform are you on, and how much memory do you have? 150 MB isn't *that* large, but it will depend on your system. See the FAQs regarding memory issues, as well as ?mem.limits and ?gc. It also depends on the type of data you're working with (integers take half the space of numerics) and what type of analysis you want to do.

The CSV approach should work fine, but you'll want to use scan instead of read.table. You can use scan to read the data in chunks (using skip and nlines), do something useful with each chunk, run gc(), and then read in the next chunk.

In general, users I've seen who try to go from SAS to R don't seem to realize that R is not a data manipulation language, and hence just try to shove their entire dataset into R and manipulate it there (which is what they would do in SAS). Perl, awk, cut, etc. (not to mention DBMSes) are all very useful for processing data before putting it into R.
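A minimal sketch of the chunked-scan approach described above, assuming a comma-separated file with a header row and three numeric columns (the file name and column layout are made up):

  infile <- "bigdata.csv"
  chunk.size <- 10000               # rows to read per pass
  skip <- 1                         # skip the header row on the first pass
  repeat {
    chunk <- scan(infile, what = list(x = 0, y = 0, z = 0), sep = ",",
                  skip = skip, nlines = chunk.size, quiet = TRUE)
    n <- length(chunk[[1]])
    if (n == 0) break               # nothing left to read
    # ... do something useful with this chunk, e.g. accumulate sums ...
    skip <- skip + n
    gc()                            # free the memory used by the chunk
  }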
Gilpin, Scott wrote:
> In general, users I've seen who try to go from SAS to R don't seem to
> realize that R is not a data manipulation language, and hence just try to
> shove their entire dataset into R and manipulate it there (which is what
> they would do in SAS). Perl, awk, cut, etc. (not to mention DBMSes) are
> all very useful for processing data before putting it into R.

R is also a data manipulation language, and once you get used to it, it is better than SAS at data manipulation for non-huge datasets. We have many examples of data manipulation with R on our web site http://biostat.mc.vanderbilt.edu (see especially the Alzola and Harrell text).

What we do for importing SAS datasets is described at http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/SASexportHowto . This preserves labels and value labels, handles dates, times and date/times, and gets around problems we've faced when importing SAS V5 transport files with the foreign package, by having SAS run PROC EXPORT to create csv files.

--
Frank E Harrell Jr
Professor and Chair, School of Medicine
Department of Biostatistics, Vanderbilt University
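As a rough illustration of the PROC EXPORT route (dataset, file and variable names are made up; the how-to above covers the full procedure, including carrying labels and value labels across):

  # On the SAS side, something along the lines of:
  #   proc export data=mylib.bigdata
  #       outfile='bigdata.csv' dbms=csv replace;
  #   run;

  # Then in R: read the csv without converting strings to factors
  dat <- read.csv("bigdata.csv", as.is = TRUE)

  # SAS writes dates out as formatted text; assuming a (hypothetical)
  # visitdate column exported as mm/dd/yyyy, convert it to Date
  dat$visitdate <- as.Date(dat$visitdate, format = "%m/%d/%Y")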