Hi, people. Allow me to chat a little about two vectorisation-related
matters, in the context of R. I'm curious whether the following ideas
have already been considered, and rejected.

The first is about using the so-called Duff's device for partially
unrolling loops. I did not check the R sources thoroughly, and am not
familiar with them anyway, but the only usage I saw is within
"src/gnuwin32/malloc.c". Maybe it could be put to good use in
"src/main/arithmetic.c" and elsewhere.

The second is about what is called "chaining" on some vector computers,
in which one vector operation uses, as an operand, the result of another
vector operation, even before that result is sent to a register or to
memory for storage. R could use this technique to spare memory when it
"knows" that the result is going to be discarded anyway.

I used and abused Duff's device a good while ago, when I was working
in computer graphics; it was routinely used to speed up image-wide
operations. With a few properly devised C pre-processor macros, it was
made easy to use (I threw mine away a few years ago, recognizing that
I had lost interest in low-level coding matters; the macros could
easily be rethought anyway). Questions existed at the time about
whether unrolled loops would fit within the specialised
instruction-fetch caches of some CPUs, but nowadays memory caches are
much bigger than they used to be, and I have the prejudice that it is
just not a problem anymore. A bigger concern might be the conditionals
implementing vector recycling (already hidden in macros), as they may
disrupt the speed of merely falling through linear code. One could
probably do without jumps by using clever masking operations, yet
I wonder how far we could safely benchmark at configuration time to
decide the best code to generate, and how good C would be for writing
masked conditionals. I'm not familiar enough with modern CPUs to judge
whether this really needs to be addressed or not.
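
For readers who have not met Duff's device, here is a minimal sketch of
the idea applied to element-wise addition of two vectors of equal
length. The function name add_vectors_duff is hypothetical and nothing
here is taken from the R sources; in particular, recycling of a shorter
operand is left out.

```c
#include <stddef.h>

/* Hypothetical helper, not from the R sources: dst[i] = a[i] + b[i]
   for two vectors of equal length n, with the loop body unrolled
   eight times by means of Duff's device. */
void add_vectors_duff(double *dst, const double *a, const double *b,
                      size_t n)
{
    if (n == 0)
        return;                   /* the do-while below assumes n > 0 */

    size_t passes = (n + 7) / 8;  /* trips through the do-while */
    size_t i = 0;

    /* The switch jumps into the middle of the unrolled body, so the
       first pass handles the n % 8 leftover elements and every later
       pass handles a full group of eight. */
    switch (n % 8) {
    case 0: do { dst[i] = a[i] + b[i]; i++;
    case 7:      dst[i] = a[i] + b[i]; i++;
    case 6:      dst[i] = a[i] + b[i]; i++;
    case 5:      dst[i] = a[i] + b[i]; i++;
    case 4:      dst[i] = a[i] + b[i]; i++;
    case 3:      dst[i] = a[i] + b[i]; i++;
    case 2:      dst[i] = a[i] + b[i]; i++;
    case 1:      dst[i] = a[i] + b[i]; i++;
            } while (--passes > 0);
    }
}
```

Optimising compilers often unroll simple loops like this on their own,
so any gain would have to be measured rather than assumed.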

I would not doubt that hardware chaining is worth all the effort the
engineers put in so that the hardware recognises and activates it on
the fly. Vectorised chaining implemented in software, as a way to spare
memory, may be much of a challenge, as it requires a sort of
half-compilation. On one hand, it might alleviate memory problems which
are often the subject of discussions on R-help; through thrashing,
going over real memory and into paging may considerably slow down an
R application. On the other hand, unless very carefully implemented,
chaining overhead might slow down all non-thrashing applications, which
is most of them. Nevertheless, being softer on memory requirements is
already a concern in R: I vaguely remember having read that R "tries to
prove" that a vector being modified will not be needed anymore in its
original form, and when the proof succeeds, the original vector gets
modified without prior copying. Chaining, despite being difficult to
implement, might be a significant further step, and so be worth
a discussion.
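
To make the memory argument concrete, here is a small sketch of what
software chaining would buy for an expression such as (a + b) * c. Both
function names are hypothetical, and this is not a description of how
the R evaluator works; it only contrasts evaluating one operation at
a time, through a full-length temporary, with a single fused pass.

```c
#include <stddef.h>
#include <stdlib.h>

/* Unchained: a + b is first stored in a temporary of length n, which
   is discarded once the product has been computed.
   Returns 0 on success, -1 if the temporary cannot be allocated. */
int mul_of_sum_with_temp(double *dst, const double *a, const double *b,
                         const double *c, size_t n)
{
    if (n == 0)
        return 0;

    double *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    for (size_t i = 0; i < n; i++)
        tmp[i] = a[i] + b[i];
    for (size_t i = 0; i < n; i++)
        dst[i] = tmp[i] * c[i];

    free(tmp);
    return 0;
}

/* Chained: each sum is consumed by the multiplication as soon as it
   is produced, so no temporary vector is ever allocated. */
void mul_of_sum_chained(double *dst, const double *a, const double *b,
                        const double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (a[i] + b[i]) * c[i];
}
```

The hard part, the "half-compilation", is not shown: something has to
recognise at run time that the intermediate sum is never needed on its
own before the fused loop can be chosen.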
--
François Pinard http://pinard.progiciels-bpi.ca