Hello,

I know that the topics of large datasets in R vs. SAS, PostgreSQL vs. MySQL, and using databases with R have been discussed extensively on this list and elsewhere, but I hope I have a slightly new combination of questions here. I am doing my PhD research on a large dataset and am trying to decide whether to use PostgreSQL or MySQL with R, or simply to use SAS, to which I also have access. My database experience is currently limited to reading a few chapters of a textbook, but I have some general programming experience in C, C++, and Perl. I am leaning towards PostgreSQL at the moment because it seems to have more built-in functions (i.e. less writing for me to do), but on the other hand our sysadmin already has MySQL installed, and I hear that it is faster.

Of PostgreSQL vs. MySQL, which has the more mature interface with R? Are there any issues with RMySQL or Rdbi.PgSQL (or Rdbi.MySQL) that I should be aware of, and should they influence my choice between MySQL, PostgreSQL, and SAS's integrated database?

My dataset is about 26 GB, currently split into files of 260 MB each: roughly 540,000 records with 40 "explanatory" variables, many of which are probably redundant, though I don't yet know which. It was far too slow to work with in R on Red Hat Linux machines with 500 MB to 1 GB of RAM, especially when producing plots, and preprocessing with Perl scripts every time I wanted to look at a different subset of the data became too tedious. My hope is to keep the data in a database and pull only the subset I need into R each time (a sketch of what I have in mind is in the postscript below).

I hope to create exploratory graphics such as sunflower plots, and also to try some lattice plots to get a feel for the data. Then I am interested in trying some stepwise ANOVA, and finally in searching for patterns using discriminant analysis and/or classification trees.

I would greatly appreciate any advice you might have on choosing a database environment.

Thank you,
Jean Chung
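
P.S. For concreteness, here is the kind of workflow I have in mind, as a minimal sketch assuming RMySQL; the database, table, and column names ("thesis", "measurements", "year", "site") are all made up:

library(DBI)
library(RMySQL)

## connect to the (hypothetical) database holding the full 26 GB
con <- dbConnect(MySQL(), dbname = "thesis", user = "jchung",
                 password = "...", host = "localhost")

## pull only the subset of interest into an R data frame, instead of
## re-running Perl preprocessing scripts over the flat files each time
dat <- dbGetQuery(con,
    "SELECT * FROM measurements WHERE year = 1999 AND site = 'A'")

dbDisconnect(con)

My understanding is that the Rdbi.PgSQL equivalent would differ only in the driver and connection arguments.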
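The exploratory plots would then be run on that subset; again just a sketch, with hypothetical variable names x, y, and group:

## sunflower plot: shows how many observations overlap at each point,
## which matters with this many records
sunflowerplot(dat$x, dat$y)

## lattice of scatterplots, conditioned on a third variable
library(lattice)
print(xyplot(y ~ x | factor(group), data = dat))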
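And the later modelling steps would look roughly like this; the formulas and the response variable are placeholders, since I have not settled on a model, and I have been reading about stepAIC() in MASS as one way to do the stepwise part:

library(MASS)    # lda() and stepAIC()
library(rpart)   # classification trees

## stepwise selection starting from a full linear model
fit.lm   <- lm(response ~ ., data = dat)
fit.step <- stepAIC(fit.lm)

## linear discriminant analysis and a classification tree
fit.lda  <- lda(group ~ ., data = dat)
fit.tree <- rpart(group ~ ., data = dat, method = "class")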