On Thu, Mar 8, 2012 at 2:06 PM, Justin Talbot <jtalbot at cs.stanford.edu>
wrote:
> For the last couple of years I've been working on an academic project on
> R performance, which has involved writing an interpreter for R from
> scratch and a JIT for R vector operations.
>
> With the recent comments on Julia, I thought I'd share some thoughts
> from my experience since they differ substantially from the common
> speculation on R performance.
>
> I went into the project thinking that R would be slow for the commonly
> cited reasons: NAs, call-by-value, immutable values, the ability to
> dynamically add/remove variables from environments, etc. But this is
> largely *not* true. It does require being somewhat clever, but most of
> the cost of these features can be either eliminated or moved to
> uncommon cases that won't affect most code. And there's plenty of room
> for innovation here. The history of Javascript runtimes over the last
> decade has shown that dramatic performance improvements are possible
> even for difficult languages.
>
> This is good news. I think we can keep essentially everything that
> people like about R and still achieve great performance.
>
> So why is R performance poor now? I think the fundamental reason is
> related to software engineering: R is nearly impossible to experiment
> with, so no one tries out new performance techniques on it. There are
> two main issues here:
>
> 1) The R Language Definition doesn't get enough love. I could point
> out plenty of specific problems, omissions, etc., but I think the
> high-level problem is that the Language Definition currently conflates
> three things: 1) the actual language definition, 2) the definition of
> what is more properly the standard library, and 3) the implementation.
> This conflation hides how simple the R/S language actually is and, by
> assuming that the current implementation is the only implementation,
> obscures performance improvements that could be made by changing the
> implementation.
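That simplicity is easy to see from a running session: once the standard
library is factored out, nearly everything in R/S reduces to function calls
over a small expression language. A quick sketch:

    e <- quote(if (x > 0) sqrt(x) else 0)
    is.call(e)    # TRUE: even control flow parses to a call to `if`
    e[[1]]        # the symbol `if`
    `+`(2, 3)     # 5: operators are ordinary functions as well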
>
> 2) The R core implementation (e.g. everything in src/main) is too big.
> There are ~900 functions listed in names.c. This has got to be simply
> unmanageable. If one were to change the SEXP representation, how many
> internal functions would have to be checked and updated? This is a
> severe hindrance to improving performance.
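The scale of that C-level surface is visible even from a stock R session. A
rough sketch (an approximation only; it counts the base functions that enter
C via .Primitive or .Internal, both of which dispatch through the table in
names.c):

    base_fns <- Filter(is.function, mget(ls(baseenv()), envir = baseenv()))
    n_prim <- sum(vapply(base_fns, is.primitive, logical(1)))
    n_int  <- sum(vapply(base_fns, function(f)
      !is.primitive(f) && any(grepl(".Internal", deparse(f), fixed = TRUE)),
      logical(1)))
    c(primitives = n_prim, internals = n_int)

Each of those entry points is code that would need checking if the SEXP
representation changed.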
>
> I see little value in debating changes to the language semantics until
> we've addressed this low-hanging fruit and at least tried to make the
> current R/S semantics run fast.
Isn't R much like Lisp under the covers? After all, it evolved from Scheme.
Hasn't there been a great deal of work done on optimizing Lisp over the
last 30 years? This suggests that, instead of dropping the R/S semantics
and moving to another language like Julia, we could follow the proposals
of Ross Ihaka and Duncan Temple Lang and provide the familiar
R/S syntax on top of an optimized Lisp engine.
One could view the R language as "syntactic sugar" for Lisp and focus
on optimizing the Lisp engine, in the same way that functional languages
are viewed as syntactic sugar for the lambda calculus.
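The correspondence is already visible in R itself: a quoted R expression is
an S-expression, a (head . args) tree, so the infix syntax really is a thin
layer of sugar. For example:

    e <- quote(x + y * 2)
    as.list(e)        # list(`+`, x, y * 2): prefix form of the infix syntax
    as.list(e[[3]])   # list(`*`, y, 2)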
Another possibility is to implement R/S on top of an optimized virtual
machine like the JVM, LLVM, etc.
Of course, no matter what strategy is followed, a foreign function
interface will be very important to leverage the existing base of
C/C++/Fortran numerical and graphics libs.
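R's existing .C/.Call interface shows the shape such an FFI needs to keep. A
minimal, hypothetical sketch (the library name 'vecadd.so' and the routine
'vec_add' are invented for illustration):

    # C side: void vec_add(double *x, double *y, double *out, int *n)
    dyn.load("vecadd.so")
    x <- as.double(1:5); y <- as.double(5:1)
    .C("vec_add", x, y, out = double(5), n = as.integer(5))$out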
Dominick
> Justin