A colleague is receiving some data from another person. That person reads the data in SAS and it takes 30s and uses 64k RAM. That person then tries to read the data in R and it takes 10 minutes and uses a gigabyte of RAM. Person then goes on to say:

    It's not that I think SAS is such great software,
    it's not. But I really hate badly designed
    software. R is designed by committee. Worse,
    it's designed by a committee of statisticians.
    They tend to confuse numerical analysis with
    computer science and don't have any idea about
    software development at all. The result is R.

    I do hope [your colleague] won't have to waste time doing
    [this analysis] in an outdated and poorly designed piece
    of software like R.

Would any of the "committee" like to respond to this? Or shall we just slap our collective forehead and wonder how someone could get such a view?

Barry
My reaction, as a "mere" individual user:

Of course, one cannot have any idea what's really going on, so a rational reply to the rant is impossible. But, as this list repeatedly demonstrates (and as we all have probably experienced), it is possible to do things foolishly in any software.

Worth noting: John Chambers, the designer of the S language (of which R is an implementation) won an ACM computing award (readers -- please correct details of this citation) for his achievement; so apparently the professional computing community disagreed with the sentiments expressed in the rant.

Cheers,

-- Bert Gunter
Non-Clinical Biostatistics
Genentech MS: 240B
Phone: 650-467-7374

"The business of the statistician is to catalyze the scientific learning process." -- George E.P. Box

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
I'm not too concerned about your colleague's view about R. S/He doesn't have to like it, and I don't think anyone actually believes that R is designed to make *everyone* happy. For me, R does about 99% of the things I need to do, but sadly, when I need to order a pizza, I still have to pick up the telephone.

What worries me more is that your colleague seems to have lost sight of the fact that just about all software development involves tradeoffs. Although I've never used SAS, I've used other stat packages and it's clear that all of them (including R) have traded in some things to get out other things. An example is R's potentially large memory usage, which, one might argue, trades in analyses of very large datasets but gets out a very powerful and elegant programming language.

Rather than use absolutes, I'd encourage your colleague to be more specific. Rather than say things like "R is poorly designed", I'd like to hear "R is poorly designed for [fill in the blank]". Then we can get a better handle on the world in which s/he lives.

-roger

-- Roger D. Peng http://www.biostat.jhsph.edu/~rpeng/
My $0.02:

R, being a flexible programming language, has an amazing ability to cope with people's laziness/ignorance/inelegance, but it comes at a (sometimes hefty) price. While there are no specifics on the situation leading to the person's comments, here's one (not as extreme) example that I happened to come across today:

    > system.time(spam <- read.table("data_dmc2003_train.txt",
    +                                header=T,
    +                                colClasses=c(rep("numeric", 833),
    +                                             "character")))
    [1] 15.92  0.09 16.80    NA    NA
    > system.time(spam <- read.table("data_dmc2003_train.txt", header=T))
    [1] 187.29   0.60 200.19     NA     NA

My SAS ability is rather severely limited, but AFAIK, one needs to specify _all_ variables to be read into a dataset in order to read in the data in SAS. If one has that information, R can be very efficient as well. Without that information, one gets nothing in SAS, or one just lets R do the hard work.

Best,
Andy
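The effect of declaring column types can be tried on synthetic data. The sketch below is only an illustration under stated assumptions: the file is a temporary one generated on the spot (it is not data_dmc2003_train.txt), the dimensions are arbitrary, and the size of any speed-up will depend on the machine and the R version.

```r
# Illustrative data only: 2000 rows, 50 numeric columns plus one text column.
set.seed(1)
n_num <- 50
dat <- data.frame(matrix(rnorm(2000 * n_num), ncol = n_num),
                  label = sample(c("spam", "ham"), 2000, replace = TRUE))
path <- tempfile(fileext = ".txt")
write.table(dat, path, row.names = FALSE, quote = FALSE)

# Variant 1: let read.table() work out every column type by itself.
t_guess <- system.time(
  guessed <- read.table(path, header = TRUE, stringsAsFactors = FALSE)
)

# Variant 2: declare the column types up front via colClasses.
t_typed <- system.time(
  typed <- read.table(path, header = TRUE,
                      colClasses = c(rep("numeric", n_num), "character"))
)

# Elapsed seconds for both variants; the values vary from run to run.
print(c(guess = t_guess[["elapsed"]], typed = t_typed[["elapsed"]]))
```

Both calls should return the same data frame; the only difference is how much work read.table() has to do to determine the types.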
Barry Rowlingson <B.Rowlingson <at> lancaster.ac.uk> writes:

: A colleague is receiving some data from another person. That person
: reads the data in SAS and it takes 30s and uses 64k RAM. That person
: then tries to read the data in R and it takes 10 minutes and uses a
: gigabyte of RAM.

Does he have to repeatedly read in different large datasets or is this just a one-time requirement? In the latter case, he could read in the data, save it (using the save command), and then just load it (using the load command) in subsequent sessions. He would only have to wait 10 minutes the first time. If he has that much data it's probably a large project, and a one-time hit of 10 minutes versus several days, weeks or months of work seems negligible.
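A minimal sketch of the read once, save, then load workflow described above. The file names and the tiny data frame are placeholders invented for illustration; only save() and load() come from the message itself.

```r
# Stand-in for the large text file (hypothetical; five rows only).
path <- tempfile(fileext = ".txt")
write.table(data.frame(x = 1:5, y = letters[1:5]), path, row.names = FALSE)

# The slow step, done once: parse the text file.
big <- read.table(path, header = TRUE, stringsAsFactors = FALSE)

# Store the parsed object in R's binary format.
cache <- tempfile(fileext = ".RData")
save(big, file = cache)

# In a later session: skip read.table() and restore the object directly.
rm(big)
load(cache)   # recreates the object under its original name, `big`
str(big)
```

Note that load() restores objects under the names they were saved with, so the later session has to know that the data frame is called `big`.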
Barry Rowlingson <B.Rowlingson at lancaster.ac.uk> writes:

> It's not that I think SAS is such great software,
> it's not. But I really hate badly designed
> software. R is designed by committee. Worse,
> it's designed by a committee of statisticians.
> They tend to confuse numerical analysis with
> computer science and don't have any idea about
> software development at all. The result is R.

They'd probably prefer computer scientists and numerical analysts who confuse data munging with statistical data analysis, a common problem in mixed departments...

best,
-tony

--
rossini at u.washington.edu http://www.analytics.washington.edu/
Biomedical and Health Informatics University of Washington
Biostatistics, SCHARP/HVTN Fred Hutchinson Cancer Research Center
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC (M/W): 206-667-7025 FAX=206-667-4812 | use Email
Hi,

I wonder why SAS should be faster at reading data into the system. I have an example that shows that R is (sometimes? always?) faster.

-----------------
Data with 14432 observations and 120 variables. Time for reading the data:

SAS 8e:
    data testt; set l1.lse01; run;
    real time 1.46 seconds
    cpu time 0.18 seconds

R 1.9.0:
    system.time(read.table("lse01.txt", header=T))
    [1] 0.63 0.06 6.22 NA NA
----------------

And this is 2.5 times faster than SAS. (SAS reads the .sas7bdat file and R the .txt file.)

I'm working with SAS (I am supposed to work with SAS) and R (I'm going to work with R) on the same computer. In my examples with time series, and in simple but time-consuming procedures like summaries, R is always 2 times faster and sometimes 30 times faster (with the same results).

I think R is great software and you can do more things than in SAS. Some new developments in SAS 9, like the COM server to Excel, some new procedures, better graphs, ... were developed and implemented in R many years ago.

Thanks to the R Development Team!!!

Matthias
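When comparing timings like those above, it helps to know which field of the system.time() result is which. The snippet below is a small sketch, not part of the original messages: it times a deliberate pause (an arbitrary stand-in for real work) to show that CPU time and elapsed wall-clock time can differ widely.

```r
# Time an operation that waits rather than computes.
timing <- system.time(for (i in 1:3) Sys.sleep(0.1))

# Field names of the result: user CPU, system CPU, elapsed wall-clock time,
# followed by the CPU times of any child processes.
print(names(timing))

# Wall-clock seconds, the figure comparable to a "real time" measurement.
print(timing[["elapsed"]])

# CPU seconds spent by the R process itself, small here because it only slept.
print(timing[["user.self"]])
```

The first three numbers printed by system.time() are user CPU, system CPU, and elapsed time, in that order, so a comparison with another program's "real time" would normally use the third.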
I am curious. What were the dimensions of this data set? Did this person use read.table() or scan()? Did they know about the possibility of reading the data one part at a time?

The way that SAS processes the data row by row limits what can be done. It is often possible with scant loss of information, and more satisfactory, to work with a subset of the large data set or with multiple subsets. Neither SAS (in my somewhat dated experience of it) nor R is entirely satisfactory for this purpose. But at least in R, given a subset that fits so easily into memory that the graphs are not masses of black, there are few logistical problems in doing, rapidly and interactively, a variety of manipulations and plots, with each new task taking advantage of the learning that has gone before. To do that well in the SAS world, it is necessary to use something like JMP or its equivalent in one of the newer modules, which process data in a way that is not all that different from R.

I have wondered about possibilities for a suite of functions that would make it easy to process through R data that is stored in one large data set, with a mix of adding a new variable or variables, repeating a calculation on successive subsets of the data, producing predictions or suchlike for separate subsets, etc. Database connections may be the way to go (cf. the Ripley and Fei Chen paper at ISI 2003), but it might also be useful to have a simple set of functions that would handle some standard requirements.

John Maindonald.

John Maindonald
email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473
fax : +61 2(6125)5549
Centre for Bioinformation Science, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
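Reading the data one part at a time, and repeating a calculation on successive subsets, can be sketched with an open connection and the nrows argument of read.table(). This is only one possible approach under stated assumptions: the file, its two columns, and the chunk size are invented for illustration, and the running mean is a stand-in for whatever per-subset calculation is needed.

```r
# Small stand-in file; a real case would be a file too large to load at once.
path <- tempfile(fileext = ".txt")
write.table(data.frame(id = 1:1000, value = seq(0.5, 500, by = 0.5)),
            path, row.names = FALSE)

con <- file(path, open = "rt")
col_names <- scan(con, what = "", nlines = 1, quiet = TRUE)  # consume the header

chunk_size <- 300   # arbitrary illustrative value
total <- 0
rows  <- 0
repeat {
  # Each call continues from where the previous one stopped.
  chunk <- tryCatch(
    read.table(con, nrows = chunk_size, col.names = col_names,
               colClasses = c("integer", "numeric")),
    error = function(e) NULL   # read.table() fails once no lines remain
  )
  if (is.null(chunk)) break
  total <- total + sum(chunk$value)
  rows  <- rows + nrow(chunk)
  if (nrow(chunk) < chunk_size) break   # a short chunk means end of file
}
close(con)

print(total / rows)   # mean of `value`, computed without holding the whole file
```

Only one chunk is in memory at a time, so peak memory use is governed by chunk_size rather than by the size of the file.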