thr3ads.net - R help - [R] An "R is slow"-article [Jan 2008]

Gustaf Rydevik

2008-Jan-09 15:25 UTC

[R] An "R is slow"-article

Hi all,

Reading the wikipedia page on R, I stumbled across the following:
http://fluff.info/blog/arch/00000172.htm

It does seem interesting that the C execution is that much slower from
R than from a native C program. Could any of the more technically
knowledgeable people explain why this is so?

The author also has some thought-provoking opinions on R being
no-good and that you should write everything in C instead (mainly
because R is slow and too good at graphics, encouraging data
snooping). See  http://fluff.info/blog/arch/00000041.htm
 While I don't agree (granted, I can't really write C), it was
interesting to read something from a very different perspective than
I'm used to.

Best regards,

Gustaf

_____
Department of Epidemiology,
Swedish Institute for Infectious Disease Control
work email: gustaf.rydevik at smi dot ki dot se
skype:gustaf_rydevik

Armstrong, Whit

2008-Jan-09 15:49 UTC

fisher.test seems to use the .C calling convention in a couple of
different places.

for example:

tmp <- .C("fisher_sim", as.integer(nr), as.integer(nc),
          as.integer(sr), as.integer(sc), as.integer(n),
          as.integer(B), integer(nr * nc), double(n + 1),
          integer(nc), results = double(B),
          PACKAGE = "stats")$results


perhaps some R experts on the list can tell us whether there is
significant overhead to .C vs .Call.

Does .C really duplicate its arguments?  What does RObjToCPtr do?


(line 1682.. in dotcode.c)

    /* Convert the arguments for use in foreign */
    /* function calls.  Note that we copy twice */
    /* once here, on the way into the call, and */
    /* once below on the way out. */
    cargs = (void**)R_alloc(nargs, sizeof(void*));
    nargs = 0;
    for(pargs = args ; pargs != R_NilValue; pargs = CDR(pargs)) {
#ifdef THROW_REGISTRATION_TYPE_ERROR
        if(checkTypes &&
           !comparePrimitiveTypes(checkTypes[nargs], CAR(pargs), dup)) {
            /* We can loop over all the arguments and report all the
               erroneous ones, but then we would also want to avoid
               the conversions.  Also, in the future, we may just
               attempt to coerce the value to the appropriate
               type. This is why we pass the checkTypes[nargs] value
               to RObjToCPtr(). We just have to sort out the ability
               to return the correct value which is complicated by
               dup, etc. */
            errorcall(call, _("Wrong type for argument %d in call to %s"),
                      nargs+1, symName);
        }
#endif
        cargs[nargs] = RObjToCPtr(CAR(pargs), naok, dup, nargs + 1,
                                  which, symName, argConverters + nargs,
                                  checkTypes ? checkTypes[nargs] : 0,
                                  encname);
#ifdef R_MEMORY_PROFILING
        if (TRACE(CAR(pargs)) && dup)
                memtrace_report(CAR(pargs), cargs[nargs]);
#endif
        nargs++;
    }

Thanks,
Whit


Barry Rowlingson

2008-Jan-09 15:56 UTC

Gustaf Rydevik wrote:
> Hi all,
> 
> Reading the wikipedia page on R, I stumbled across the following:
> http://fluff.info/blog/arch/00000172.htm
> 
> It does seem interesting that the C execution is that much slower from
> R than from a native C program. Could any of the more technically
> knowledgeable people explain why this is so?
  I don't think it is. He's comparing some C code with calling 
fisher.test() from R, which he claims does 'nothing but call C code over 
and over'. Wrong. It checks its arguments in R, it checks for multiple 
arguments, it does all sorts of goodness before finally calling 
.C("fexact"). And then it does even more things. Confidence intervals,
odds ratios, p-values and so on.

  He needs to re-run his tests but instead of calling fisher.test() he 
should prepare the data and call .C("fexact",...) directly.
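
One way to approximate that comparison without touching compiled entry points is to compute the core quantity directly. The sketch below assumes that, for a 2x2 table, the two-sided p-value is the sum of the hypergeometric probabilities of all tables no more likely than the observed one. The helper name fisher_p_2x2 is hypothetical and not part of R.

```r
# Sketch: two-sided Fisher p-value for a 2x2 table computed directly from
# hypergeometric probabilities, with no argument checking and no extras.
x <- matrix(c(300, 860, 240, 380), nrow = 2)

fisher_p_2x2 <- function(tab) {        # hypothetical helper, not part of R
  m <- sum(tab[, 1])                   # first column total
  n <- sum(tab[, 2])                   # second column total
  k <- sum(tab[1, ])                   # first row total
  support <- max(0, k - n):min(k, m)   # possible values of the top-left cell
  d <- dhyper(support, m, n, k)        # null probability of each value
  d_obs <- d[support == tab[1, 1]]     # probability of the observed table
  # Sum over tables no more likely than the observed one; the small
  # relative tolerance guards against floating-point ties.
  sum(d[d <= d_obs * (1 + 1e-7)])
}

p_direct  <- fisher_p_2x2(x)
p_wrapper <- fisher.test(x)$p.value
print(c(direct = p_direct, wrapper = p_wrapper))
```

Timing fisher_p_2x2(x) against fisher.test(x) with system.time() gives a rough idea of how much of each call goes to the wrapper rather than to the test itself.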
> The author also has some thought-provoking opinions on R being
> no-good and that you should write everything in C instead (mainly
> because R is slow and too good at graphics, encouraging data
> snooping). See  http://fluff.info/blog/arch/00000041.htm
  And of course C is good at buffer overflows and memory leaks and 
spending ages compiling when you really just want to do fisher.test(foo) 
and have done with it.

  He says: "I used to have a simulation written in R calling compiled C 
that took overnight to process 100 agents, but now that it's all in C 
simulations with 9,000 agents run in forty minutes. Don't risk it--learn 
to do statistical computing in C today!". Fine, but I imagine his R code 
was created much quicker than the C code. R is quicker to write, and 
once you have established that your code is running too slow for you, 
then you optimise. By that point you've hopefully debugged your 
algorithm and spotted all the nasty traps that would have tied you up in 
the C debugger for a week. You then rewrite in pure C for speed, and you 
of course have a set of test cases generated from R to verify your C is 
doing the same as your R. Win win.

  He claims to be an economist but clearly doesn't recognise the economy 
of rapid development...

Barry

Duncan Murdoch

2008-Jan-09 15:57 UTC

On 1/9/2008 10:25 AM, Gustaf Rydevik wrote:
> Hi all,
> 
> Reading the wikipedia page on R, I stumbled across the following:
> http://fluff.info/blog/arch/00000172.htm
> 
> It does seem interesting that the C execution is that much slower from
> R than from a native C program. Could any of the more technically
> knowledgeable people explain why this is so?
That conclusion isn't supported by his test. The main source of the 
difference is interpreting the loop:

test_ct <- 10000
x       <- matrix(c(300, 860, 240, 380), nrow=2)
for (i in 1:test_ct)
     {fisher.test(x)}

If he wanted to show that R makes C go slower, he should have put 
together an example that spent most of its time in C, without returning 
to R 10000 times.  For example, make the entries in that table 1000 
times larger, and do the test just once:

 > x       <- matrix(c(300000, 860000, 240000, 380000), nrow=2)
 > fisher.test(x)

This takes about 20 seconds on my PC, and I'd guess it would take about 
the same amount of time in this author's pure C implementation.
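
To see what the looped version costs per call, it can be timed with system.time(). This is only a sketch: the repetition count is reduced from the 10000 in the article so the run stays short, and the numbers will differ between machines and R versions.

```r
# Time repeated calls to fisher.test() on the small table from the article.
x <- matrix(c(300, 860, 240, 380), nrow = 2)
test_ct <- 200   # assumption: far fewer repetitions than the original 10000

timing <- system.time(
  for (i in seq_len(test_ct)) fisher.test(x)
)

# Average wall-clock seconds spent per call, including all R-level work.
per_call <- timing[["elapsed"]] / test_ct
cat("seconds per fisher.test() call:", per_call, "\n")
```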

My own experience is that R is about 100 times slower than pure C, and 
usually it doesn't matter.  In cases where it does, I'll  move the 
calculations into C.
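
A small illustration of that gap, as a sketch only: the same sum computed by an explicit R loop and by the built-in sum(), which runs in compiled code. The function name loop_sum is made up for this example, and the ratio you see depends heavily on the R version and the machine.

```r
# Compare an interpreted loop with the built-in, compiled sum().
set.seed(1)
v <- runif(1e6)

loop_sum <- function(v) {      # hypothetical helper for this comparison
  total <- 0
  for (val in v) total <- total + val
  total
}

t_loop <- system.time(s_loop <- loop_sum(v))[["elapsed"]]
t_vec  <- system.time(s_vec  <- sum(v))[["elapsed"]]

cat("loop:", t_loop, "s   built-in:", t_vec, "s\n")
```

Both versions return the same total up to floating-point rounding, so the only thing being compared is where the work is done.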

If I followed Blair's advice and did everything in C, then development 
would take much longer, the code would be much buggier (even his example 
has bugs, and he admits it!!) and all those cases where R is fast enough 
would just never get done.

Duncan Murdoch

Robin Hankin

2008-Jan-09 16:03 UTC

Hello Gustaf, List.

Thanks Gustaf for your post!

Well, I am working pretty intensively with fisher.test() right now, as
some of you will know.

The comparison is not fair:  R's fisher.test() does a whole
bunch of error checking and testing for the size of the
input matrix and assessing of other arguments, and
puts together a nice little list of class "htest".

The C routine does none of this.

The clincher is that fisher.test() as called gives an estimate
for the odds ratio using uniroot() to numerically solve an
equation in terms of the hypergeometric probability
distribution.  This takes a looooonnnngggg time, but
one doesn't notice it in a standard R session.
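
The cost of the extras can be probed with the conf.int argument of fisher.test(). The sketch below assumes that conf.int = FALSE skips only the confidence interval and that the point estimate of the odds ratio is still computed, so it shows part of the overhead at most. The repetition count is arbitrary.

```r
# Compare fisher.test() with and without the confidence interval.
x <- matrix(c(300, 860, 240, 380), nrow = 2)
reps <- 200   # assumption: enough repetitions to get a measurable time

t_full <- system.time(
  for (i in seq_len(reps)) full <- fisher.test(x)
)[["elapsed"]]

t_noci <- system.time(
  for (i in seq_len(reps)) noci <- fisher.test(x, conf.int = FALSE)
)[["elapsed"]]

cat("with CI:", t_full, "s   without CI:", t_noci, "s\n")
# The p-value itself does not depend on the confidence interval.
print(all.equal(full$p.value, noci$p.value))
```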

Sorry, but the time comparison is simply not worth reporting.

--
Robin Hankin
Uncertainty Analyst and Neutral Theorist,
National Oceanography Centre, Southampton
European Way, Southampton SO14 3ZH, UK
  tel  023-8059-7743

Peter Dalgaard

2008-Jan-09 16:14 UTC

Gustaf Rydevik wrote:
> Hi all,
>
> Reading the wikipedia page on R, I stumbled across the following:
> http://fluff.info/blog/arch/00000172.htm
>
> It does seem interesting that the C execution is that much slower from
> R than from a native C program. Could any of the more technically
> knowledgeable people explain why this is so?
>
Well, if you are obsessed with speed, R can be the wrong tool. This is
an ingrained aspect of the language itself; if you are interested,
consult some of Luke Tierney's writings about the difficulties of
writing an R compiler. To some extent, it is a tradeoff for flexibility
and expressiveness.

The example is somewhat misleading. The C execution time is probably the
same, but it is drowned out by the administrative overhead of
fisher.test (a 2x2 Fisher test is really not a very complex operation
when cell counts are in the hundreds.)
> The author also has some thought-provoking opinions on R being
> no-good and that you should write everything in C instead (mainly
> because R is slow and too good at graphics, encouraging data
> snooping). See  http://fluff.info/blog/arch/00000041.htm
>  While I don't agree (granted, I can't really write C), it was
> interesting to read something from a very different perspective than
> I'm used to.
The idea that you really shouldn't look at data before testing
statistical hypotheses is not without merit, but taken to the extreme,
it tends to become ridiculous. You end up in a situation where you
either can't do anything or you don't know what you are doing. It is
related to the discussions about randomized trials versus observational
studies. The former are in many ways stronger, but sometimes
unavailable, and they tend to be using a very big hammer to whack in a
single nail.


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907

Paul Gilbert

2008-Jan-09 16:20 UTC

Gustaf Rydevik wrote:
> Hi all,
> 
> Reading the wikipedia page on R, I stumbled across the following:
> http://fluff.info/blog/arch/00000172.htm
There are certainly situations where one would want to consider faster 
solutions than interpreted languages but, having been through these 
arguments a few times over the years, here are a few things you might 
consider:

1/ How much is your time worth, how much does the computer time cost, 
and how much does a faster computer cost when you start writing your code?

2/ How much is your time worth, how much does the computer time cost, 
and how much does a faster computer cost when you finish writing your code?

3/ If you tweak the code, or use someone else's private tweaks, how much 
do you trust the results relative to more widely used and tested versions?

4/ You should do speed comparisons with something resembling your real 
problem.

5/ If you want to make R look really bad use a loop that gobbles lots of 
memory, so your machine starts to swap. (This is my guess of part of the 
problem with the "script".)

6/ If you want your code to be really fast, don't do any error checking. 
(This also avoids the enormous amount of time you waste when you find 
errors.)
> It does seem interesting that the C execution is that much slower from
> R than from a native C program. Could any of the more technically
> knowledgeable people explain why this is so?
> 
> The author also has some thought-provoking opinions on R being
> no-good and that you should write everything in C 
People used to say assembler, that's progress.

Paul Gilbert


Tom Backer Johnsen

2008-Jan-10 15:38 UTC

Gustaf Rydevik wrote:
> Hi all,
> 
> Reading the wikipedia page on R, I stumbled across the following:
> http://fluff.info/blog/arch/00000172.htm
> 
> It does seem interesting that the C execution is that much slower from
> R than from a native C program. Could any of the more technically
> knowledgeable people explain why this is so?
> 
> The author also has some thought-provoking opinions on R being
> no-good and that you should write everything in C instead (mainly
> because R is slow and too good at graphics, encouraging data
> snooping). See  http://fluff.info/blog/arch/00000041.htm
>  While I don't agree (granted, I can't really write C), it was
> interesting to read something from a very different perspective than
> I'm used to.
The important aspect of R is not that it is slower for a particular
kind of operation than a dedicated program written in a compiled
language like C, Pascal, or Fortran for a particular kind of analysis.
That is not really surprising, and not relevant for anything but the
most extreme situations given the speed (and low price) of modern
computers.

What is really relevant is (a) the context of any operation: R is a
well documented language where a very large number of
operations may be combined in an extremely large number of ways where
the probability of errors is very low, and (b) all aspects of the
language are peer reviewed.

Both points are extremely important in any research context, where
everything, including the software used in computations, should be
possible to document.  These qualities are difficult to achieve in
homebrewed programs.  Therefore one should not resort to programming
anything on your own unless the operations you need are definitely not
present in the language you are using.  Apart from that, you have to
think about cost in respect to the time and resources used to develop
your own substitutes for something that already exists.

He also says that R encourages "fishing trips" in the data.  Well,
that may be somewhat true for R as well as any of the major
statistical packages.  But that is a problem that really is in a
different domain, one of attitudes on how to do research in the first
place.

Tom

-- 
+----------------------------------------------------------------+
| Tom Backer Johnsen, Psychometrics Unit,  Faculty of Psychology |
| University of Bergen, Christies gt. 12, N-5015 Bergen,  NORWAY |
| Tel : +47-5558-9185                        Fax : +47-5558-9879 |
| Email : backer at psych.uib.no    URL : http://www.galton.uib.no/ |
+----------------------------------------------------------------+
