Hi all,

Reading the wikipedia page on R, I stumbled across the following:
http://fluff.info/blog/arch/00000172.htm

It does seem interesting that the C execution is that much slower from
R than from a native C program. Could any of the more technically
knowledgeable people explain why this is so?

The author also have some thought-provoking opinions on R being
no-good and that you should write everything in C instead (mainly
because R is slow and too good at graphics, encouraging data
snooping). See http://fluff.info/blog/arch/00000041.htm

While I don't agree (granted, I can't really write C), it was
interesting to read something from a very different perspective than
I'm used to.

Best regards,

Gustaf

_____
Department of Epidemiology,
Swedish Institute for Infectious Disease Control
work email: gustaf.rydevik at smi dot ki dot se
skype:gustaf_rydevik

fisher.test seems to use the .C calling convention in a couple of
different places.
for example:
tmp <- .C("fisher_sim", as.integer(nr), as.integer(nc), 
                as.integer(sr), as.integer(sc), as.integer(n), 
                as.integer(B), integer(nr * nc), double(n + 1), 
                integer(nc), results = double(B), PACKAGE = "stats")$results
perhaps some R experts on the list can tell us whether there is
significant overhead to .C vs .Call.
Does .C really duplicate its arguments?  What does RObjToCPtr do?
(line 1682.. in dotcode.c)
    /* Convert the arguments for use in foreign */
    /* function calls.  Note that we copy twice */
    /* once here, on the way into the call, and */
    /* once below on the way out. */
    cargs = (void**)R_alloc(nargs, sizeof(void*));
    nargs = 0;
    for(pargs = args ; pargs != R_NilValue; pargs = CDR(pargs)) {
#ifdef THROW_REGISTRATION_TYPE_ERROR
        if(checkTypes &&
           !comparePrimitiveTypes(checkTypes[nargs], CAR(pargs), dup)) {
            /* We can loop over all the arguments and report all the
               erroneous ones, but then we would also want to avoid
               the conversions.  Also, in the future, we may just
               attempt to coerce the value to the appropriate
               type. This is why we pass the checkTypes[nargs] value
               to RObjToCPtr(). We just have to sort out the ability
               to return the correct value which is complicated by
               dup, etc. */
            errorcall(call, _("Wrong type for argument %d in call to %s"),
                      nargs+1, symName);
        }
#endif
        cargs[nargs] = RObjToCPtr(CAR(pargs), naok, dup, nargs + 1,
                                  which, symName, argConverters + nargs,
                                  checkTypes ? checkTypes[nargs] : 0,
                                  encname);
#ifdef R_MEMORY_PROFILING
        if (TRACE(CAR(pargs)) && dup)
                memtrace_report(CAR(pargs), cargs[nargs]);
#endif
        nargs++;
    }
Thanks,
Whit
> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Gustaf Rydevik
> Sent: Wednesday, January 09, 2008 10:25 AM
> To: r-help at r-project.org
> Subject: [R] An "R is slow"-article
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

Gustaf Rydevik wrote:
> Hi all,
>
> Reading the wikipedia page on R, I stumbled across the following:
> http://fluff.info/blog/arch/00000172.htm
>
> It does seem interesting that the C execution is that much slower from
> R than from a native C program. Could any of the more technically
> knowledgeable people explain why this is so?

I don't think it is. He's comparing some C code with calling fisher.test() from R, which he claims does 'nothing but call C code over and over'. Wrong. It checks its arguments in R, it checks for multiple arguments, it does all sorts of goodness before finally calling .C("fexact"). And then it does even more things. Confidence intervals, odds ratios, p-values and so on.

He needs to re-run his tests but instead of calling fisher.test() he should prepare the data and call .C("fexact",...) directly.

> The author also have some thought-provoking opinions on R being
> no-good and that you should write everything in C instead (mainly
> because R is slow and too good at graphics, encouraging data
> snooping). See http://fluff.info/blog/arch/00000041.htm

And of course C is good at buffer overflows and memory leaks and spending ages compiling when you really just want to do fisher.test(foo) and have done with it.

He says: "I used to have a simulation written in R calling compiled C that took overnight to process 100 agents, but now that it's all in C simulations with 9,000 agents run in forty minutes. Don't risk it--learn to do statistical computing in C today!".

Fine, but I imagine his R code was created much quicker than the C code. R is quicker to write, and once you have established that your code is running too slow for you, then you optimise. By that point you've hopefully debugged your algorithm and spotted all the nasty traps that would have tied you up in the C debugger for a week. You then rewrite in pure C for speed, and you of course have a set of test cases generated from R to verify your C is doing the same as your R. Win win.

He claims to be an economist but clearly doesn't recognise the economy of rapid development...

Barry
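
A minimal sketch of the comparison suggested above, written in R. For a 2x2 table the two-sided p-value can be computed straight from dhyper(), leaving out the argument checks, the odds-ratio estimate and the confidence interval. The helper name fisher_p_2x2 is made up for illustration, and the rule for which tables count as "at least as extreme" (probability no larger than the observed table's, with a small relative tolerance) is an assumption about what fisher.test() does for 2x2 input, so the agreement of the two p-values is something to check rather than take for granted. Timings will differ from machine to machine.

```r
x <- matrix(c(300, 860, 240, 380), nrow = 2)

# Hypothetical helper: bare two-sided p-value for a 2x2 table, no checks.
fisher_p_2x2 <- function(x) {
  m <- sum(x[, 1])                      # total of first column
  n <- sum(x[, 2])                      # total of second column
  k <- sum(x[1, ])                      # total of first row
  support <- max(0, k - n):min(k, m)    # possible values of x[1, 1]
  d <- dhyper(support, m, n, k)
  obs <- dhyper(x[1, 1], m, n, k)
  # Sum the probabilities of all tables no more probable than the
  # observed one (the 1e-7 relative tolerance is an assumption).
  sum(d[d <= obs * (1 + 1e-7)])
}

p_bare <- fisher_p_2x2(x)
p_full <- fisher.test(x)$p.value

# Time the full function against the bare calculation.
test_ct <- 1000
t_full <- system.time(for (i in 1:test_ct) fisher.test(x))[["elapsed"]]
t_bare <- system.time(for (i in 1:test_ct) fisher_p_2x2(x))[["elapsed"]]

print(c(p_bare = p_bare, p_full = p_full))
print(c(t_full = t_full, t_bare = t_bare))
```

If the bare version comes out much faster, the gap reflects the extra work fisher.test() does in R around the core calculation, not compiled code running slower when called from R.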

On 1/9/2008 10:25 AM, Gustaf Rydevik wrote:
> Hi all,
>
> Reading the wikipedia page on R, I stumbled across the following:
> http://fluff.info/blog/arch/00000172.htm
>
> It does seem interesting that the C execution is that much slower from
> R than from a native C program. Could any of the more technically
> knowledgeable people explain why this is so?

That conclusion isn't supported by his test. The main source of the difference is interpreting the loop:

    test_ct <- 10000
    x <- matrix(c(300, 860, 240, 380), nrow=2)
    for (i in 1:test_ct) {fisher.test(x)}

If he wanted to show that R makes C go slower, he should have put together an example that spent most of its time in C, without returning to R 10000 times. For example, make the entries in that table 1000 times larger, and do the test just once:

    > x <- matrix(c(300000, 860000, 240000, 380000), nrow=2)
    > fisher.test(x)

This takes about 20 seconds on my PC, and I'd guess it would take about the same amount of time in this author's pure C implementation.

My own experience is that R is about 100 times slower than pure C, and usually it doesn't matter. In cases where it does, I'll move the calculations into C. If I followed Blair's advice and did everything in C, then development would take much longer, the code would be much buggier (even his example has bugs, and he admits it!!) and all those cases where R is fast enough would just never get done.

Duncan Murdoch

> The author also have some thought-provoking opinions on R being
> no-good and that you should write everything in C instead (mainly
> because R is slow and too good at graphics, encouraging data
> snooping). See http://fluff.info/blog/arch/00000041.htm
> While I don't agree (granted, I can't really write C), it was
> interesting to read something from a very different perspective than
> I'm used to.
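
One way to see where the time in such a loop actually goes is R's sampling profiler. The following is only a sketch: it assumes an R build with profiling support, and the iteration count of 5000 is an arbitrary choice meant to make the run long enough to collect samples. The ranking it prints depends on the R version and the machine, so no particular output is claimed here.

```r
x <- matrix(c(300, 860, 240, 380), nrow = 2)
test_ct <- 5000   # arbitrary; raise it if too few samples are collected

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)                        # start the sampling profiler
for (i in 1:test_ct) fisher.test(x)
Rprof(NULL)                             # stop profiling

prof <- summaryRprof(prof_file)
# Functions ranked by total time, including time spent in their callees.
print(head(prof$by.total, 10))
```

Reading the by.total table from the top shows which R-level functions inside fisher.test() account for most of the elapsed time.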

Hello Gustaf, List.

Thanks Gustaf for your post!

Well, I am working pretty intensively with fisher.test() right now, as some of you will know.

The comparison is not fair: R's fisher.test() does a whole bunch of error checking and testing for the size of the input matrix and assessing of other arguments, and puts together a nice little list of class "htest". The C routine does none of this.

The clincher is that fisher.test() as called gives an estimate for the odds ratio using uniroot() to numerically solve an equation in terms of the hypergeometric probability distribution. This takes a looooonnnngggg time, but one doesn't notice it in a standard R session.

Sorry, but the time comparison is simply not worth reporting.

On 9 Jan 2008, at 15:25, Gustaf Rydevik wrote:
> Hi all,
>
> Reading the wikipedia page on R, I stumbled across the following:
> http://fluff.info/blog/arch/00000172.htm
>
> It does seem interesting that the C execution is that much slower from
> R than from a native C program. Could any of the more technically
> knowledgeable people explain why this is so?
>
> The author also have some thought-provoking opinions on R being
> no-good and that you should write everything in C instead (mainly
> because R is slow and too good at graphics, encouraging data
> snooping). See http://fluff.info/blog/arch/00000041.htm
> While I don't agree (granted, I can't really write C), it was
> interesting to read something from a very different perspective than
> I'm used to.

--
Robin Hankin
Uncertainty Analyst and Neutral Theorist,
National Oceanography Centre, Southampton
European Way, Southampton SO14 3ZH, UK
tel 023-8059-7743
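
To make the uniroot() step concrete, here is a rough sketch of that kind of root-finding in R. It is not fisher.test()'s actual code: the names mean_nhyper and cmle, the search interval c(1e-3, 1e3) and the tolerance are all choices made for illustration. The idea is to find the odds ratio at which the expected value of the top-left cell, under the noncentral hypergeometric distribution, equals the observed count.

```r
x <- matrix(c(300, 860, 240, 380), nrow = 2)

m <- sum(x[, 1])                      # total of first column
n <- sum(x[, 2])                      # total of second column
k <- sum(x[1, ])                      # total of first row
support <- max(0, k - n):min(k, m)    # possible values of x[1, 1]
logdc <- dhyper(support, m, n, k, log = TRUE)

# Hypothetical helper: mean of x[1, 1] when the true odds ratio is `or`.
mean_nhyper <- function(or) {
  d <- logdc + log(or) * support      # unnormalised log-probabilities
  d <- exp(d - max(d))                # subtract the max to avoid overflow
  sum(support * d / sum(d))
}

# Conditional MLE: the odds ratio whose expected count matches the data.
cmle <- uniroot(function(or) mean_nhyper(or) - x[1, 1],
                interval = c(1e-3, 1e3), tol = 1e-10)$root

print(c(sketch = cmle, fisher.test = unname(fisher.test(x)$estimate)))
```

Every uniroot() iteration re-evaluates the whole distribution over the support, which is why this step can dominate the run time when the function is called thousands of times on small tables.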

Gustaf Rydevik wrote:
> Hi all,
>
> Reading the wikipedia page on R, I stumbled across the following:
> http://fluff.info/blog/arch/00000172.htm
>
> It does seem interesting that the C execution is that much slower from
> R than from a native C program. Could any of the more technically
> knowledgeable people explain why this is so?
>

Well, if you are obsessed with speed, R can be the wrong tool. This is an ingrained aspect of the language itself; if you are interested, consult some of Luke Tierney's writings about the difficulties of writing an R compiler. To some extent, it is a tradeoff for flexibility and expressiveness.

The example is somewhat misleading. The C execution time is probably the same, but it is drowned out by the administrative overhead of fisher.test (a 2x2 Fisher test is really not a very complex operation when cell counts are in the hundreds.)

> The author also have some thought-provoking opinions on R being
> no-good and that you should write everything in C instead (mainly
> because R is slow and too good at graphics, encouraging data
> snooping). See http://fluff.info/blog/arch/00000041.htm
> While I don't agree (granted, I can't really write C), it was
> interesting to read something from a very different perspective than
> I'm used to.
>

The idea that you really shouldn't look at data before testing statistical hypotheses is not without merit, but taken to the extreme, it tends to become ridiculous. You end up in a situation where you either can't do anything or you don't know what you are doing. It is related to the discussions about randomized trials versus observational studies. The former are in many ways stronger, but sometimes unavailable, and they tend to be using a very big hammer to whack in a single nail.

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907

Gustaf Rydevik wrote:
> Hi all,
>
> Reading the wikipedia page on R, I stumbled across the following:
> http://fluff.info/blog/arch/00000172.htm
>

There are certainly situations where one would want to consider faster solutions than interpreted languages but, having been through these arguments a few times over the years, here are a few things you might consider:

1/ How much is your time worth, how much does the computer time cost, and how much does a faster computer cost when you start writing your code?

2/ How much is your time worth, how much does the computer time cost, and how much does a faster computer cost when you finish writing your code?

3/ If you tweak the code, or use someone else's private tweaks, how much do you trust the results relative to more widely used and tested versions?

4/ You should do speed comparisons with something resembling your real problem.

5/ If you want to make R look really bad use a loop that gobbles lots of memory, so your machine starts to swap. (This is my guess of part of the problem with the "script".)

6/ If you want your code to be really fast, don't do any error checking. (This also avoids the enormous amount of time you waste when you find errors.)

> It does seem interesting that the C execution is that much slower from
> R than from a native C program. Could any of the more technically
> knowledgeable people explain why this is so?
>
> The author also have some thought-provoking opinions on R being
> no-good and that you should write everything in C

People used to say assembler, that's progress.

Paul Gilbert

> instead (mainly
> because R is slow and too good at graphics, encouraging data
> snooping). See http://fluff.info/blog/arch/00000041.htm
> While I don't agree (granted, I can't really write C), it was
> interesting to read something from a very different perspective than
> I'm used to.

Gustaf Rydevik wrote:
> Hi all,
>
> Reading the wikipedia page on R, I stumbled across the following:
> http://fluff.info/blog/arch/00000172.htm
>
> It does seem interesting that the C execution is that much slower from
> R than from a native C program. Could any of the more technically
> knowledgeable people explain why this is so?
>
> The author also have some thought-provoking opinions on R being
> no-good and that you should write everything in C instead (mainly
> because R is slow and too good at graphics, encouraging data
> snooping). See http://fluff.info/blog/arch/00000041.htm
> While I don't agree (granted, I can't really write C), it was
> interesting to read something from a very different perspective than
> I'm used to.

The important aspect of R is not that it is slower for a particular kind of analysis than a dedicated program written in a compiled language like C, Pascal, or Fortran. That is not really surprising, and not relevant for anything but the most extreme situations given the speed (and low price) of modern computers.

What is really relevant is (a) the context of any operation: R is a well documented language where a very large number of operations may be combined in an extremely large number of ways with a very low probability of errors, and (b) all aspects of the language are peer reviewed. Both points are extremely important in any research context, where everything, including the software used in computations, should be possible to document. These qualities are difficult to achieve in homebrewed programs. Therefore one should not resort to programming anything on one's own unless the operations needed are definitely not present in the language one is using. Apart from that, you have to think about the cost, in time and resources, of developing your own substitutes for something that already exists.

He also says that R encourages "fishing trips" in the data. Well, that may be somewhat true for R as well as any of the major statistical packages. But that is a problem that really is in a different domain, one of attitudes on how to do research in the first place.

Tom

--
+----------------------------------------------------------------+
| Tom Backer Johnsen, Psychometrics Unit,  Faculty of Psychology |
| University of Bergen, Christies gt. 12, N-5015 Bergen,  NORWAY |
| Tel : +47-5558-9185                        Fax : +47-5558-9879 |
| Email : backer at psych.uib.no  URL : http://www.galton.uib.no/ |
+----------------------------------------------------------------+