Hi, R core developers,
I work at the Swiss Institute of Bioinformatics. We have two Intel Itanium2
clusters that bioinformaticians use to crunch their data, and one piece of
software they use heavily is R with Bioconductor. I have already ported R
and its packages to this platform and am now working on optimizing their
performance, using the Intel C/C++ compiler on Linux. One of my findings is
that turning some functions in R into "inline" functions boosts performance
significantly.
While R currently follows the strict C89 standard, there are good reasons to
relax that rule somewhat. From my experience in software development in
industry, I understand very well both the portability issue and the backward
compatibility issue, but I also see the hidden cost of holding back for too
long and not fully exploiting new technology. I therefore recommend that we
allow "inline" functions in R's C code.
The following explains why I recommend the above.
In modern processor microarchitectures, pipelining is a major approach to
achieving higher clock speeds. Super-pipelining divides the pipeline into
even finer stages. With far more instructions in flight in a super-pipelined
microarchitecture, events that disrupt the pipeline, such as cache misses,
interrupts and branch mispredictions, can be costly.
A case in point is the Intel Itanium architecture, EPIC (Explicitly Parallel
Instruction Computing). EPIC lets the programmer or compiler indicate the
inherent parallelism of a program *explicitly* in the instruction sequence.
Its main performance features are application registers, predication,
branching and register rotation. The implication is that the cost of
disrupting the pipeline is greatly magnified on this architecture.
In R, quite a few simple functions, such as "R_IsNaNorNA" and "R_finite",
are called extremely often, mostly inside heavy loops. These calls disrupt
the pipeline and hurt the performance of the software. For instance, on
IA64 a call to "isnan" costs 4 cycles, while a wrapper like "R_IsNaNorNA"
can cost several times more.
One way to reduce this kind of disruption in C/C++ is to "inline" a
function, i.e., to integrate it into the code for its callers, eliminating
the function-call overhead.
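As a minimal sketch of the idea, assuming C99 and that R's NA is stored as a NaN with a special payload (as NA_real_ is), the hypothetical helper `is_nan_or_na` below plays the role of a wrapper like R_IsNaNorNA, and `count_missing` is a made-up hot loop that calls it once per element. This is not R's actual source; it only shows where `static inline` would go.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical stand-in for a small R helper such as R_IsNaNorNA.
 * Assumes NA is encoded as a NaN, so isnan() catches both NaN and NA. */
static inline int is_nan_or_na(double x)
{
    return isnan(x) != 0;
}

/* Hypothetical hot loop: count missing values in a vector.
 * Because the helper is static inline and visible here, an optimizing
 * compiler may substitute its body instead of emitting one call per element. */
size_t count_missing(const double *x, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (size_t) is_nan_or_na(x[i]);
    return count;
}
```

Compiled with optimization (for example `gcc -O2`), the call inside the loop would typically be integrated; without `-O`, GCC keeps it as an ordinary call, as the attached manual notes.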
The benefits of inlining are greatest for very short functions. On Unix and
Linux, examples of inline functions can be found in the standard .h files in
/usr/include or /usr/local/include.
C++ has supported "inline" functions from the beginning, and the "inline"
keyword was introduced into C by the C99 standard in 1999. A feature that
has been standardized for that many years is considered very mature in the
computer industry. Historically, some C++ compilers (Cfront, for example)
translated C++ code into C, so it is natural for the corresponding C
compilers to support inline functions. Because the compiler may choose
either to generate a function call or to inline the function, this feature
poses little risk to the application.
The default compilers that R uses, gcc/g++, have supported it at least since
version 2.95, released in July 1999. The GCC user's manual states that
inlining "works" only in optimizing compilation.
Since R already requires C, C++ and FORTRAN compilers, it would be natural
to allow "inline" functions as well. This would not only improve R's
performance on modern processors with little effort, but also encourage
people to develop and use R packages for more challenging problems.
In its configure step, R already checks for many OS- and compiler-related
issues; this could be just one more check. I expect inline functions would
initially be used mainly for small but heavily used functions, so the
impact of such a change could be managed.
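One way the configure check could feed into the sources, sketched here as an assumption rather than a worked-out design: Autoconf's AC_C_INLINE macro defines `inline` to `__inline__` or `__inline` when the compiler accepts only one of those spellings, or to nothing when it accepts none, so code can use `static inline` unconditionally. The preprocessor block below imitates that outcome for a hand-written config header; `sign_of` is a hypothetical helper.

```c
/* Hypothetical config-header fallback imitating what AC_C_INLINE achieves:
 * pre-C99 C compilers get GCC's __inline__ spelling, or no keyword at all,
 * in which case the function is simply an ordinary static function. */
#if !defined(__cplusplus) && \
    (!defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L)
#  if defined(__GNUC__)
#    define inline __inline__
#  else
#    define inline /* empty: calls remain ordinary function calls */
#  endif
#endif

/* Hypothetical small helper of the kind the proposal targets:
 * returns -1, 0 or 1 according to the sign of x. */
static inline int sign_of(double x)
{
    return (x > 0) - (x < 0);
}
```

The design choice is that only the header knows about compiler differences; the rest of the C sources can be written against plain C99 `inline`.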
The attached excerpts from the GCC User's Manual and the C99 Rationale
concern "inline" functions.
Thanks for considering this issue.
Li Long
-------------- next part --------------
From C99 Rationale
=================
6.4.1 Keywords
Several keywords were added in C89: const, enum, signed, void and volatile.
New in C9X are the keywords inline, restrict, _Bool, _Complex and _Imaginary.
Where possible, however, new features have been added by overloading
existing keywords, as, for example, long double instead of extended. It is
recognized that each added keyword will require some existing code that used
it as an identifier to be rewritten. No meaningful programs are known to be
quietly changed by adding the new keywords.
6.7.4 Function specifiers
A new feature of C99: The inline keyword, adapted from C++, is a
function-specifier that can be used only in function declarations. It is
useful for program optimizations that require the definition of a function
to be visible at the site of a call. (Note that the Standard does not
attempt to specify the nature of these optimizations.)
Visibility is assured if the function has internal linkage, or if it has
external linkage and the call is in the same translation unit as the
external definition. In these cases, the presence of the inline keyword in
a declaration or definition of the function has no effect beyond indicating
a preference that calls of that function should be optimized in preference
to calls of other functions declared without the inline keyword.
Visibility is a problem for a call of a function with external linkage
where the call is in a different translation unit from the function's
definition. In this case, the inline keyword allows the translation unit
containing the call to also contain a local, or inline, definition of the
function.
A program can contain a translation unit with an external definition, a
translation unit with an inline definition, and a translation unit with a
declaration but no definition for a function. Calls in the latter
translation unit will use the external definition as usual.
An inline definition of a function is considered to be a different
definition than the external definition. If a call to some function func
with external linkage occurs where an inline definition is visible, the
behavior is the same as if the call were made to another function, say
__func, with internal linkage. A conforming program must not depend on
which function is called. This is the inline model in the Standard.
A conforming program must not rely on the implementation using the inline
definition, nor may it rely on the implementation using the external
definition. The address of a function is always the address corresponding
to the external definition, but when this address is used to call the
function, the inline definition might be used. Therefore, the following
example might not behave as expected.
    inline const char *saddr(void)
    {
        static const char name[] = "saddr";
        return name;
    }
    int compare_name(void)
    {
        return saddr() == saddr(); // unspecified behavior
    }
Since the implementation might use the inline definition for one of the
calls to saddr and use the external definition for the other, the equality
operation is not guaranteed to evaluate to 1 (true). This shows that static
objects defined within the inline definition are distinct from their
corresponding object in the external definition. This motivated the
constraint against even defining a non-const object of this type.
Inlining was added to the Standard in such a way that it can be implemented
with existing linker technology, and a subset of C99 inlining is compatible
with C++. This was achieved by requiring that exactly one translation unit
containing the definition of an inline function be specified as the one
that provides the external definition for the function. Because that
specification consists simply of a declaration that either lacks the inline
keyword, or contains both inline and extern, it will also be accepted by a
C++ translator.
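To make the external-definition rule concrete, here is a sketch collapsed into a single file; in practice the inline definition would sit in a header included by several .c files, and only one .c file would carry the extra declaration. `square` is a hypothetical function name. The declaration that lacks `inline` is what makes this translation unit provide the external definition, so calls that are not inlined, and uses of the function's address, still link.

```c
/* Inline definition (in a real project: in a shared header).
 * On its own, under C99 rules, this does not emit an external symbol. */
inline int square(int x)
{
    return x * x;
}

/* The one declaration without `inline`, placed in exactly one
 * translation unit, turns the definition above into the external
 * definition that every other translation unit can link against. */
extern int square(int x);
```

Without that one declaration, compiling with C99 semantics and no optimization can end in an undefined-reference link error, because an inline definition alone provides no external symbol.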
Inlining in C99 does extend the C++ specification in two ways. First, if a
function is declared inline in one translation unit, it need not be declared
inline in every other translation unit. This allows, for example, a library
function that is to be inlined within the library but available only
through an external definition elsewhere. The alternative of using a wrapper
function for the external function requires an additional name; and it may
also adversely impact performance if a translator does not actually do
inline substitution.
Second, the requirement that all definitions of an inline function be
"exactly the same" is replaced by the requirement that the behavior of the
program should not depend on whether a call is implemented with a visible
inline definition, or the external definition, of a function. This allows
an inline definition to be specialized for its use within a particular
translation unit. For example, the external definition of a library
function might include some argument validation that is not needed for
calls made from other functions in the same library. These extensions do
offer some advantages; and programmers who are concerned about
compatibility can simply abide by the stricter C++ rules.
Note that it is not appropriate for implementations to provide inline
definitions of standard library functions in the standard headers because
this can break some legacy code that redeclares standard library functions
after including their headers. The inline keyword is intended only to
provide users with a portable way to suggest inlining of functions. Because
the standard headers need not be portable, implementations have other
options along the lines of:
    #define abs(x) __builtin_abs(x)
or other non-portable mechanisms for inlining standard library functions.
-------------- next part --------------
From GCC 2.95.3 Manual
=====================
4.31 An Inline Function is As Fast As a Macro
By declaring a function inline, you can direct GNU CC to integrate that
function's code into the code for its callers. This makes execution faster
by eliminating the function-call overhead; in addition, if any of the actual
argument values are constant, their known values may permit simplifications at
compile time so that not all of the inline function's code needs to be
included. The effect on code size is less predictable; object code may be larger
or smaller with function inlining, depending on the particular case. Inlining of
functions is an optimization and it really "works" only in optimizing
compilation. If you don't use `-O', no function is really inline.
To declare a function inline, use the inline keyword in its declaration, like
this:
    inline int
    inc (int *a)
    {
      return (*a)++;  /* return the old value, as the int return type requires */
    }
(If you are writing a header file to be included in ANSI C programs, write
__inline__ instead of inline. See section 4.35 Alternate Keywords.) You can also
make all "simple enough" functions inline with the option
`-finline-functions'.
Note that certain usages in a function definition can make it unsuitable for
inline substitution. Among these usages are: use of varargs, use of alloca, use
of variable sized data types (see section 4.14 Arrays of Variable Length), use
of computed goto (see section 4.3 Labels as Values), use of nonlocal goto, and
nested functions (see section 4.4 Nested Functions). Using `-Winline' will
warn when a function marked inline could not be substituted, and will give the
reason for the failure.
Note that in C and Objective C, unlike C++, the inline keyword does not affect
the linkage of the function.
GNU CC automatically inlines member functions defined within the class body of
C++ programs even if they are not explicitly declared inline. (You can override
this with `-fno-default-inline'; see section Options Controlling C++
Dialect.)
When a function is both inline and static, if all calls to the function are
integrated into the caller, and the function's address is never used, then
the function's own assembler code is never referenced. In this case, GNU CC
does not actually output assembler code for the function, unless you specify the
option `-fkeep-inline-functions'. Some calls cannot be integrated for
various reasons (in particular, calls that precede the function's definition
cannot be integrated, and neither can recursive calls within the definition). If
there is a nonintegrated call, then the function is compiled to assembler code
as usual. The function must also be compiled as usual if the program refers to
its address, because that can't be inlined.
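A small sketch of the case just described, with hypothetical names: `twice` is `static inline` and is called directly, but its address is also passed to `apply`. According to the manual, referring to the address means GCC must also compile an ordinary copy of `twice`, alongside any integrated calls; the program's results are the same either way.

```c
/* Hypothetical static inline function: direct calls may be integrated. */
static inline int twice(int x)
{
    return 2 * x;
}

/* Hypothetical helper that calls through a function pointer; an indirect
 * call like this generally needs the out-of-line copy of the target. */
int apply(int (*fn)(int), int v)
{
    return fn(v);
}

/* Uses twice both ways: once as a direct call, once via its address. */
int twice_direct_and_indirect(int v)
{
    return twice(v) + apply(twice, v);
}
```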
When an inline function is not static, then the compiler must assume that there
may be calls from other source files; since a global symbol can be defined only
once in any program, the function must not be defined in the other source files,
so the calls therein cannot be integrated. Therefore, a non-static inline
function is always compiled on its own in the usual fashion.
If you specify both inline and extern in the function definition, then the
definition is used only for inlining. In no case is the function compiled on its
own, not even if you refer to its address explicitly. Such an address becomes an
external reference, as if you had only declared the function, and had not
defined it.
This combination of inline and extern has almost the effect of a macro. The way
to use it is to put a function definition in a header file with these keywords,
and put another copy of the definition (lacking inline and extern) in a library
file. The definition in the header file will cause most calls to the function to
be inlined. If any uses of the function remain, they will refer to the single
copy in the library.
GNU C does not inline any functions when not optimizing. It is not clear whether
it is better to inline or not, in this case, but we found that a correct
implementation when not optimizing was difficult. So we did the easy thing, and
turned it off.
4.35 Alternate Keywords
The option `-traditional' disables certain keywords; `-ansi' disables
certain others. This causes trouble when you want to use GNU C extensions, or
ANSI C features, in a general-purpose header file that should be usable by all
programs, including ANSI C programs and traditional ones. The keywords asm,
typeof and inline cannot be used since they won't work in a program compiled
with `-ansi', while the keywords const, volatile, signed, typeof and inline
won't work in a program compiled with `-traditional'.
The way to solve these problems is to put `__' at the beginning and end of
each problematical keyword. For example, use __asm__ instead of asm, __const__
instead of const, and __inline__ instead of inline.
Other C compilers won't accept these alternative keywords; if you want to
compile with another compiler, you can define the alternate keywords as macros
to replace them with the customary keywords. It looks like this:
    #ifndef __GNUC__
    #define __asm__ asm
    #endif
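Applied to `inline`, the same advice might look like the sketch below: a header spells the keyword `__inline__` so it still works under `gcc -ansi`, and maps it back to plain `inline` for non-GNU compilers, which are assumed here to accept the C99 keyword. The helper name `r_max_double` is hypothetical.

```c
/* For non-GNU compilers, map GCC's alternate spelling back to the
 * standard keyword (assumes such compilers accept C99 `inline`). */
#ifndef __GNUC__
#define __inline__ inline
#endif

/* Hypothetical header helper: usable from gcc -ansi and C99 code alike. */
static __inline__ double r_max_double(double a, double b)
{
    return a > b ? a : b;
}
```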
`-pedantic' causes warnings for many GNU C extensions. You can prevent such
warnings within one expression by writing __extension__ before the expression.
__extension__ has no effect aside from this.