JWilliamson at lecg.com
2008-Jan-19 04:10 UTC
[R] MySQL + R as a Replacement for SAS Proc SQL + Various Stat Procs
For cost reasons, I'd like to replace SAS on my PC under Win XP Pro. Nearly all my work involves medium-size datasets (100k-10M) records which I cleanup, relate, fliter and get into shape for analysis using SAS/SQL followed by standard statistical procedures, e.g. regression using SAS proc reg. It seems to me that this type of analysis could be done in MySQL followed by R, but I'd like some advice about the best way to pass datasets from MySQL to R. I understand there are various connectivity packages avaialble, RODBC and RMySQL, but I'd appreciate some advice about where to dig in first. I'd like to avoid additional syntax in my SQL code -- so it seems better not to coat SQL queries in R wrappers and pass them to MySQL -- probably better for my way of working to finish all the SQL work and pass a clean table ready to analyze to R. One of the great advantages for me using SAS is that I can beat the data into shape using proc SQL and then call proc STAT_OF_THE_DAY all in the same batch file with no plumbing, data conversion or additional machination. But if I could do this in MySQL + R, well in a few years I'd have the down payment for an Audi TT instead of having given it to SAS Institute. If someone found a practical way to make this work -- pls let me know. And thanks in advance. Jack Williamson LECG 2049 Century Park East, Suite 2300 Los Angeles CA 90067 323-683-5004 jwilliamson at lecg.com
Prof Brian Ripley
2008-Jan-19 07:12 UTC
[R] MySQL + R as a Replacement for SAS Proc SQL + Various Stat Procs
On Fri, 18 Jan 2008, JWilliamson at lecg.com wrote:> For cost reasons, I'd like to replace SAS on my PC under Win XP Pro. > > Nearly all my work involves medium-size datasets (100k-10M) records which I > cleanup, relate, fliter and get into shape for analysis using SAS/SQL > followed by standard statistical procedures, e.g. regression using SAS proc > reg. > > It seems to me that this type of analysis could be done in MySQL followed > by R, but I'd like some advice about the best way to pass datasets from > MySQL to R. I understand there are various connectivity packages > avaialble, RODBC and RMySQL, but I'd appreciate some advice about where to > dig in first. > > I'd like to avoid additional syntax in my SQL code -- so it seems better > not to coat SQL queries in R wrappers and pass them to MySQL -- probably > better for my way of working to finish all the SQL work and pass a clean > table ready to analyze to R.The way these work (using RODBC is an example) is - If necessary, send the data to MySQL via sqlSave(). - Use sqlQuery() to send SQL statements verbatim to the RDBMS (here MySQL) - Retrieve a table via sqlFetch(). - Do the analysis on the fetched table. If the table is very large, you can fetch in junks and use the facilities in the 'biglm' package to do a regression a block of data at a time. However, I am not sure of the value of using more than 10,000 cases in a regression, as well before that non-sampling errors will dominate the error distribution: e.g. the systematic error from model misfit may be larger than the nominal standard errors. I can see why experienced SAS users like to use it for data cleanup, but it seem generally true that the user is a more important variable than the tool: people work best with the tools they understand best (and personal preference comes into it).> One of the great advantages for me using SAS is that I can beat the data > into shape using proc SQL and then call proc STAT_OF_THE_DAY all in the > same batch file with no plumbing, data conversion or additional > machination.The way I am sketching above is using R as the scripting language. It's a pretty powerful one, certainly powerful enuough to do the text processing needed to prepare SQL queries.> But if I could do this in MySQL + R, well in a few years I'd have the down > payment for an Audi TT instead of having given it to SAS Institute. > > If someone found a practical way to make this work -- pls let me know. And > thanks in advance. > > Jack Williamson > LECG > 2049 Century Park East, Suite 2300 > Los Angeles CA 90067 > 323-683-5004 > jwilliamson at lecg.com > > ______________________________________________ > R-help at r-project.org mailing list > stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, stats.ox.ac.uk/~ripley University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Wensui Liu
2008-Jan-19 07:35 UTC
[R] MySQL + R as a Replacement for SAS Proc SQL + Various Stat Procs
Hi, there, it seems size of your data is very manageable in R. i've been using proc sql in SAS a lot and found sqldf package is very similar to proc sql. check it out. one more thought. instead of using R as scripting lang. to process data, it might be worthy to take a look at python. On Jan 18, 2008 11:10 PM, <JWilliamson at lecg.com> wrote:> > For cost reasons, I'd like to replace SAS on my PC under Win XP Pro. > > Nearly all my work involves medium-size datasets (100k-10M) records which I > cleanup, relate, fliter and get into shape for analysis using SAS/SQL > followed by standard statistical procedures, e.g. regression using SAS proc > reg. > > It seems to me that this type of analysis could be done in MySQL followed > by R, but I'd like some advice about the best way to pass datasets from > MySQL to R. I understand there are various connectivity packages > avaialble, RODBC and RMySQL, but I'd appreciate some advice about where to > dig in first. > > I'd like to avoid additional syntax in my SQL code -- so it seems better > not to coat SQL queries in R wrappers and pass them to MySQL -- probably > better for my way of working to finish all the SQL work and pass a clean > table ready to analyze to R. > > One of the great advantages for me using SAS is that I can beat the data > into shape using proc SQL and then call proc STAT_OF_THE_DAY all in the > same batch file with no plumbing, data conversion or additional > machination. > > But if I could do this in MySQL + R, well in a few years I'd have the down > payment for an Audi TT instead of having given it to SAS Institute. > > If someone found a practical way to make this work -- pls let me know. And > thanks in advance. > > Jack Williamson > LECG > 2049 Century Park East, Suite 2300 > Los Angeles CA 90067 > 323-683-5004 > jwilliamson at lecg.com > > ______________________________________________ > R-help at r-project.org mailing list > stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- ==============================WenSui Liu Statistical Project Manager ChoicePoint Precision Marketing (spaces.msn.com/statcompute/blog)