Hi there!

I am considering porting a SAS application to R, and I would like to hear your opinion on whether this is possible and worthwhile. SAS is mainly used for data management, then for some aggregations and simple computations on the data, and finally to output a modified data set. The main problem I see is the size of the data file. As I have no access to SAS yet, I cannot give real details, but the SAS data file is about 7 gigabytes. (It is only the basic SAS system without any additional modules.)

What do you think: would a port to R be possible with reasonable effort? Is R able to handle data of that size? Or can R work together with a database system?

Thanks for your thoughts!

Best regards,
Werner
Please read the R Data Import/Export manual provided with every version of R, and come back with more specific questions.

In general, R cannot deal with datasets as large as those handled by SAS. But this is true only when you use standard R functions such as read.table(), which are not written to save memory when loading very large datasets (they are optimized for other aspects). I would advise putting your data in a database and then accessing it piece by piece using SQL queries. There are very few cases where you actually need the whole dataset in memory at once. A simple database system, if you just need to access the data (no complex database operations required), is SQLite. There is an R package that connects to such a database without any extra software, which makes it very convenient.

Best,

Philippe Grosjean

Prof. Philippe Grosjean
Numerical Ecology of Aquatic Systems
Mons-Hainaut University, Belgium

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
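The piece-by-piece access described above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's standard sqlite3 module rather than the R package mentioned in the reply, purely so the example is self-contained, and the table name `sales` and its columns are hypothetical. The point is that only one chunk of rows plus two running totals are held in memory at any time.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory SQLite database (hypothetical table and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 10.0), ("south", 20.0), ("north", 30.0), ("south", 40.0)],
    )
    conn.commit()
    return conn


def mean_by_chunks(conn, chunk_size=1000):
    """Compute the mean of sales.amount while fetching chunk_size rows at a time."""
    cur = conn.execute("SELECT amount FROM sales")
    total, count = 0.0, 0
    while True:
        rows = cur.fetchmany(chunk_size)  # at most chunk_size rows in memory
        if not rows:
            break
        total += sum(row[0] for row in rows)
        count += len(rows)
    return total / count if count else float("nan")


if __name__ == "__main__":
    conn = make_demo_db()
    print(mean_by_chunks(conn, chunk_size=2))
    conn.close()
```

For a real 7-gigabyte table the chunk size would be much larger, and a simple mean like this one could also be computed by the database itself; the chunked loop matters when the per-chunk computation cannot be expressed in SQL.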
R supports a number of databases, and if you only need to work with a small amount of data at once, this should be readily doable; however, R keeps objects in memory, so if you need large amounts at once you could run into problems. Note that S-Plus keeps objects on disk and has other features aimed at large data, so it might be an alternative if R cannot handle the size and you want something based on the S language.

Since SAS was developed many years ago, when optimizing computer resources was more important than it is now, it might be difficult to find an alternative that matches its performance with large data sets.

You probably want to quickly develop the core of your application in such a way that it has the main performance characteristics of the full application, so that you can get an idea of whether it will work before spending the time on the full code.

Also note that R typically processes matrices faster than data frames and that, in general, how you write your application may affect its performance.
Forget about R for now and port the application to MySQL, PostgreSQL, or a similar database system; this is both possible and worthwhile. If you happen to use (and really need) some of the SAS DATA step looping features, you might be forced to look into SQL cursors; otherwise the port should be (very) straightforward.

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Werner Wernersen
> Sent: Friday, April 21, 2006 7:09 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Considering port of SAS application to R
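To illustrate the distinction drawn above between plain SQL and cursor-style looping, here is a minimal sketch. It makes several assumptions: SQLite (via Python's standard sqlite3 module) stands in for MySQL or PostgreSQL, the `orders` table and its columns are hypothetical, and the row-by-row loop uses a client-side Python cursor as a rough analogue of a server-side SQL cursor. The first function shows an aggregation that maps directly onto GROUP BY; the second shows the kind of order-dependent, carry-a-value-forward logic that a DATA step loop typically performs.

```python
import sqlite3


def build_orders(conn):
    """Create a small hypothetical orders table."""
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)",
        [("a", 5.0), ("b", 7.0), ("a", 3.0), ("b", 1.0)],
    )
    conn.commit()


def totals_per_customer(conn):
    """Set-based aggregation: the database does all the work."""
    query = (
        "SELECT customer, SUM(amount), COUNT(*) "
        "FROM orders GROUP BY customer ORDER BY customer"
    )
    return conn.execute(query).fetchall()


def running_totals(conn):
    """Row-by-row processing in id order, carrying state between rows."""
    running = {}
    result = []
    query = "SELECT id, customer, amount FROM orders ORDER BY id"
    for order_id, customer, amount in conn.execute(query):
        running[customer] = running.get(customer, 0.0) + amount
        result.append((order_id, customer, running[customer]))
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_orders(conn)
    print(totals_per_customer(conn))
    print(running_totals(conn))
    conn.close()
```

Whether a given DATA step needs the second style at all depends on the application; many running computations can also be written with SQL window functions where the database supports them.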
It now sounds like there is a consensus that it is not advisable to do everything in R, but rather to use a database system and a scripting language for the rough work.

Thanks for all of your suggestions!

Werner

Steve Miller <steve.miller@jhu.edu> wrote:

Good suggestion. Multiple gigabytes is stretching it with R. Use PostgreSQL, Python, and Python DBI database connectivity to replace your SAS data step, then use the RODBC package to import data into R "convenience stores" as appropriate.

Steve Miller

-----Original Message-----
From: r-help-bounces@stat.math.ethz.ch [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of bogdan romocea
Sent: Friday, April 21, 2006 7:59 AM
To: pensterfuzzer@yahoo.de
Cc: r-help
Subject: Re: [R] Considering port of SAS application to R
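The workflow suggested above (do the heavy data-step work in the database from a script, then let R import only a small result) might look roughly like the sketch below. It is an illustration under assumptions, not the setup described in the thread: SQLite via Python's standard sqlite3 module stands in for PostgreSQL, and the `measurements` and `region_summary` tables are hypothetical. With PostgreSQL, a DB-API driver for that database would replace sqlite3, and the parameter placeholder syntax may differ.

```python
import sqlite3


def load_measurements(conn):
    """Create and fill a hypothetical raw table (stands in for the large data set)."""
    conn.execute("CREATE TABLE measurements (region TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO measurements VALUES (?, ?)",
        [("east", 1.0), ("east", 3.0), ("west", 10.0)],
    )
    conn.commit()


def build_summary(conn):
    """Materialize a small summary table inside the database.

    Only this table, not the raw data, would later be imported into R.
    """
    conn.execute("DROP TABLE IF EXISTS region_summary")
    conn.execute(
        "CREATE TABLE region_summary AS "
        "SELECT region, COUNT(*) AS n, AVG(value) AS mean_value "
        "FROM measurements GROUP BY region"
    )
    conn.commit()
    query = "SELECT region, n, mean_value FROM region_summary ORDER BY region"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_measurements(conn)
    print(build_summary(conn))
    conn.close()
```

The design choice here is that the summary lives in the database as an ordinary table, so any client that can run a SELECT, including R through a database interface package, can pick it up without touching the raw rows.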