On May 5, Misha Brukman wrote:

> Maybe we can use you for a testimonial... :)

Certainly.

> > Tail Call Elimination:
> >
> > I've read over the "Random llvm notes", and see that you guys have
> > thought about this already.
> >
> > However, the note dates from last year, so I am wondering if there
> > is an implementation in the works. If no one is working on this or
> > is planning to work on this in the near future, I would be willing
> > to give it a shot if I was given some direction as to where to
> > start.
>
> To the best of my knowledge, this has not been done and no one has
> announced their intent to work on it, so if you are interested,
> you'd be more than welcome to do so.

My C++ knowledge is completely non-existent, but so far I've had a
surprisingly easy time reading the source. This change seems somewhat
involved - I will have to implement different calling conventions -
i.e., passing a return address to the callee, etc. Who is the right
person to talk to about this?

> The use case scenario is usually like this:
>
> llvm-gcc/llvm-g++ produces very simple, brain-dead code for a given
> C/C++ file. It does not create SSA form, but creates stack
> allocations for all variables. This makes it easier to write a
> front-end. We turned off all optimizations in GCC and so the code
> produced by the C/C++ front-end is really not pretty.

[ ... ]

> After this, you can use llc or lli (JIT) on the resulting bytecode,
> and llc or lli don't have to do any optimizations, because they have
> already been performed.

> > Does it re-do these optimizations when functions are added/removed/
> > changed? Are there parameters to tune the compiler's
> > aggressiveness?
>
> There is a JIT::recompileAndRelinkFunction() method, but it doesn't
> optimize the code.

OK, this makes sense. However, am I correct in assuming that the
interprocedural optimizations performed in gccas will make it
problematic to call 'JIT::recompileAndRelinkFunction()'?
For example, suppose I run some module that looks like

module a

int foo () {
  ...
  bar()
  ...
}

int bar () {
  ...
}

through all of those optimizations. Will the result necessarily have
a bar() function? If inlining is enabled, replacing bar might have no
effect if it's inlined in foo. If inlining is not enabled, are there
other gotchas like this? If there are complications like this, how
much of a performance gain do the interprocedural opts give?

Also, compileAndRelink(F) seems to update references in call sites of
F. Does this mean that every function call incurs an extra 'load', or
is there some cleverer solution? Finally, if I jit-compile several
modules, can they reference each other's functions? If this is
answered somewhere in the docs, I apologize.

> > If it does indeed optimize the input, does it attempt to do global
> > optimizations on the functions (intraprocedural register
> > allocation, inlining, whatever)?
>
> The default register allocator in use for most platforms is a
> linear-scan register allocator, and the SparcV9 backend uses a
> graph-coloring register allocator. However, the JIT performs no
> inlining, as mentioned above.

Why use linear scan on X86? Does it have some benefits over graph
coloring? FWIW, Lal George has a paper on using graph coloring on the
register-poor X86 by implicitly taking advantage of Intel's register
mapping to emulate 32 registers. The result is between 10 and 100%
improvement on the benchmarks he ran (but the allocator is 40%
slower).

> llvm/examples/HowToUseJIT pretty much has the minimal support one
> needs for a JIT, but if you can make it even smaller, I'd be
> interested.

Sorry, what I actually meant was: what are the minimum libraries that
I have to compile in order to be able to build the HowToUseJIT (and
all the passes in gccas/ld)?
We will eventually need to distribute the sources and binaries with
our Scheme distribution, and so I need to find the smallest set of
files that need to be compiled in order to have the JIT + optimizers
working.

> > [...] configure script seems to ignore my directives. For example,
> > it always builds all architectures, ...
>
> Are you using a release or CVS version? Support for this just went
> into CVS recently, so you should check it out and see if it works
> for you. If you *are* using CVS, are you saying you used `configure
> -enable-target=[blah]' and it compiled and linked them all? In that
> case, it's a bug, so please post your results over here:

Yes, I just tried with CVS and it still compiles all back-ends. I'll
try it again to make sure, and then report the bug.

> > ... and it always statically links each binary.
>
> Yes, that is currently the default method of building libraries and
> tools. If you were to make all the libraries shared, you would be
> doing the same linking/relocating at run-time every time you start
> the tool.

It's not the linking/relocating that's the problem. The problem is
that each binary winds up being rather large. However, since these
tools don't need to be distributed or compiled for my purposes, I
guess I'm not really worried about it.

--
-Alex
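[Editor's note: the tail-call elimination Alex asks about above amounts to turning a call in tail position into a jump that reuses the current stack frame. A minimal C sketch of the before/after shapes, illustrative only and nothing LLVM-specific:]

```c
#include <assert.h>

/* Recursive sum with the call in tail position: without tail-call
 * elimination, every recursive call pushes a fresh stack frame. */
long sum_rec(long n, long acc) {
    if (n == 0)
        return acc;
    return sum_rec(n - 1, acc + n); /* tail call */
}

/* What tail-call elimination effectively produces: the tail call
 * becomes a back edge, so the frame is reused and stack usage
 * stays constant no matter how large n is. */
long sum_iter(long n, long acc) {
    for (;;) {
        if (n == 0)
            return acc;
        acc += n;
        n -= 1;
    }
}
```

Both compute the same result; the second runs in constant stack space, which is what makes deeply recursive code (idiomatic in Scheme) safe to execute.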
On Thu, May 05, 2005 at 03:46:58AM -0400, Alexander Friedman wrote:

> On May 5, Misha Brukman wrote:
> > To the best of my knowledge, this has not been done and no one has
> > announced their intent to work on it, so if you are interested,
> > you'd be more than welcome to do so.
>
> My C++ knowledge is completely non-existent, but so far I've had a
> surprisingly easy time reading the source. This change seems
> somewhat involved - I will have to implement different calling
> conventions - i.e., passing a return address to the callee, etc. Who
> is the right person to talk to about this?

The notes you refer to belong to Chris Lattner, but you should just
post your questions on llvmdev and someone will answer them. The
benefits are that you may get your response faster than emailing
someone directly, you may get multiple perspectives, and the
discussion is archived for future LLVMers who are looking for similar
advice.

> Ok, this makes sense. However, am I correct in assuming that the
> interprocedural optimizations performed in gccas will make it
> problematic to call 'JIT::recompileAndRelinkFunction()'? For
> example, suppose I run some module that looks like
>
> module a
>
> int foo () {
>   ...
>   bar()
>   ...
> }
>
> int bar () {
>   ...
> }
>
> through all of those optimizations.
> Will the result necessarily have a bar() function?

You are correct, it may get inlined.

> If inlining is enabled, replacing bar might have no effect if it's
> inlined in foo.

True.

> If inlining is not enabled, are there other gotchas like this?

There are optimizations that will remove dead code (unreachable basic
blocks, functions that are never called directly or possibly aliased,
global variables that are not referenced or possibly aliased),
optimizations that eliminate unused arguments from functions
interprocedurally, structs may be broken up into their constituent
elements (scalar replacement of aggregates), arguments may become
passed by value instead of by reference if it's legal to do so, etc.
These all affect the structure of the code, and if you care to
preserve some elements of it, you might do well to select your own
set of optimization passes that does what you want and none of the
things you don't.

Alternatively, I *think* you can use the debugging intrinsics to
"mark" what you want preserved, and that will pessimize the
optimizations. I am not well-versed with the debugging intrinsics, so
that's a guess. See
http://llvm.cs.uiuc.edu/docs/SourceLevelDebugging.html for info.

However, let's step back for a second. I am talking about what effect
gccas/gccld will have on code generated by some front-end.
Presumably, you want to write a single stand-alone JIT that will take
Scheme -> LLVM -> native code via JIT. Hence, gccas/gccld
optimization selection doesn't really apply to you. You can write
your own "tool" that will use the JIT libraries and the optimizations
of your choosing, if you so desire.

If, however, you are using the gccas/gccld + lli model, then we're
talking about two different "Modules", i.e. the one before
optimization that gccas/gccld hack on, and the one after optimization
that lli actually sees and compiles/executes. Once the bytecode file
is passed to the JIT, *that* is what I would call the "Module being
executed".
So while you are in the JIT environment, the JIT will not do any
inlining; any inlining will have already been performed. So if the
JIT has no function "bar", well, it's not that it "disappeared", it's
just that in the lifetime of the JIT, it never existed in the first
place.

> If there are complications like this, how much of a performance gain
> do the interprocedural opts give?

I don't have any numbers for this, because the interprocedural
optimizations are bunched in with the intraprocedural ones, and we
always run them all. So we don't really have any measurements for the
gain of JUST the interprocedural optimizations.

You can try turning off ALL optimizations with

  llvm-gcc -Wa,-disable-opt -Wl,-disable-opt [...]

The first disables all gccas optimizations, and the second disables
gccld's. The problem is that removing them all REALLY pessimizes the
code, as all locals will be stack-allocated, for instance. If
inlining is really the biggest problem you're facing, there's a
-disable-inlining flag for gccld.

> Also, compileAndRelink(F) seems to update references in call sites
> of F. Does this mean that every function call incurs an extra
> 'load', or is there some cleverer solution?

We don't track all the call sites. Instead, what recompile-and-relink
does is add an unconditional branch from the old function (in native
code) to the new one (in native code again), so what this does is add
an extra indirection to all subsequent calls to that function, but
not an extra load.

One cleverer solution would be to actually track all the call sites,
but then if recompile-and-relink is called rarely, it would be extra
overhead, not a gain, so it would slow down the common case.
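[Editor's note: the branch-to-the-new-function scheme described above can be modeled in portable C with a function pointer standing in for the patched-in unconditional branch. This is only a model of the control flow; the real JIT rewrites native code directly, and none of these names are LLVM APIs.]

```c
#include <assert.h>
#include <stddef.h>

int new_impl(int x) { return x * 2; }  /* freshly recompiled body */

/* Stands in for the branch patched into the old function's entry
 * point; NULL means "not relinked yet". (The real mechanism is a
 * direct jump in machine code, not a pointer load.) */
int (*redirect)(int) = NULL;

int old_impl(int x) {
    if (redirect)
        return redirect(x);  /* one extra hop for every later call */
    return x + 1;            /* original body */
}
```

Callers keep calling old_impl unchanged; once the analogue of recompileAndRelinkFunction() installs the redirect, every subsequent call falls through to the new code without any call site being updated.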
Another cleverer solution would be to overwrite the machine code in
place with the code for the new function, but the problem there is
that we lay out the code for functions sequentially in memory so as
not to waste space, and hence each recompilation of the function had
better fit into the place of the old one, or else we might run into
the code region of the next function. This means we would then have
to break up the code region for a function into multiple code
sections, possibly quite far apart in memory, and that leads to more
issues. So we've taken the simpler approach above, which works.

> Finally, if I jit-compile several modules, can they reference each
> other's functions? If this is answered somewhere in the docs, I
> apologize.

At present, I am not quite sure that the JIT will accept two
different Modules; most tools (except the linkers) assume a single
Module is given to them. I have not used two Modules with the JIT and
I haven't seen anyone else do that, so it may be a limitation or it
may just need some extension to support it; I'm not sure.

> Why use linear scan on X86?

I should mention that the SparcV9 backend is different in structure
from the other backends, so the only backend that can use the
graph-coloring register allocator is the SparcV9 backend. The
linear-scan allocator was written in a different framework, and it
can be used on any backend except the V9.

Also, why linear scan vs. XYZ? Because someone wrote it and
contributed it to LLVM (that someone happens to be Alkis, and he's a
member of the LLVM group). :)

> Does it have some benefits over graph coloring?

Algorithmically, I think it's O(n) for linear scan vs. O(n^3) for
graph coloring. Specifically for LLVM, Alkis wrote a linear-scan
allocator and so we have it.
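[Editor's note: for a sense of why linear scan is cheap, the core of the algorithm (in the style of the Poletto-Sarkar formulation) fits in a few dozen lines. This is a from-scratch toy over made-up live intervals, not LLVM's allocator.]

```c
#include <assert.h>

#define NREGS 2
#define NONE  (-1)  /* "spilled" for an interval, "free" for a register */

typedef struct { int start, end, reg; } Interval;

/* iv[] must be sorted by increasing start. Each interval receives a
 * register 0..NREGS-1 or NONE. On conflict, the interval that ends
 * last is the one spilled (the classic linear-scan heuristic). */
void linear_scan(Interval *iv, int n) {
    int active[NREGS];              /* interval index held by each reg */
    for (int r = 0; r < NREGS; r++) active[r] = NONE;
    for (int i = 0; i < n; i++) {
        /* Expire intervals that ended before this one starts. */
        for (int r = 0; r < NREGS; r++)
            if (active[r] != NONE && iv[active[r]].end <= iv[i].start)
                active[r] = NONE;
        /* Take a free register if there is one. */
        iv[i].reg = NONE;
        for (int r = 0; r < NREGS; r++)
            if (active[r] == NONE) { iv[i].reg = r; active[r] = i; break; }
        if (iv[i].reg != NONE)
            continue;
        /* All registers busy: find the active interval ending last. */
        int victim = 0;
        for (int r = 1; r < NREGS; r++)
            if (iv[active[r]].end > iv[active[victim]].end)
                victim = r;
        if (iv[active[victim]].end > iv[i].end) {
            iv[i].reg = victim;              /* steal its register */
            iv[active[victim]].reg = NONE;   /* and spill it */
            active[victim] = i;
        }                                    /* else i stays spilled */
    }
}
```

Every interval is visited once and the active set is bounded by the register count, which is where the near-linear running time comes from; a graph-coloring allocator instead builds and simplifies an interference graph, which is far more expensive.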
Someone at a different school wrote a graph-coloring allocator for
LLVM, but they haven't contributed it back to us, so we don't have
it.

> FWIW, Lal George has a paper on using graph coloring on the
> register-poor X86 by implicitly taking advantage of Intel's register
> mapping to emulate 32 registers. The result is between 10 and 100%
> improvement on the benchmarks he ran (but the allocator is 40%
> slower).

It's not that we're against graph coloring per se; it's just that no
one has stepped up to the plate to do it and share the code with us.
I should mention that we accept patches. :)

> > llvm/examples/HowToUseJIT pretty much has the minimal support one
> > needs for a JIT, but if you can make it even smaller, I'd be
> > interested.
>
> Sorry, what I actually meant was: what are the minimum libraries
> that I have to compile in order to be able to build the HowToUseJIT
> (and all the passes in gccas/ld)?

The Makefiles for gccas/gccld/lli list the libraries they use in the
USEDLIBS variable. Additionally, lli uses the special meta-library
"JIT", which expands to include the generic JIT libraries and the
target backend(s) that you have selected. See llvm/Makefile.rules,
search for JIT_LIBS, and read from there. You want to take the union
of these sets, and that would do it.

> Yes, I just tried with CVS and it still compiles all back-ends. I'll
> try it again to make sure, and then report the bug.

Instead of opening a new bug, please reopen this one:
http://llvm.cs.uiuc.edu/PR518

> It's not the linking/relocating that's the problem. The problem is
> that each binary winds up being rather large. However, since these
> tools don't need to be distributed or compiled for my purposes, I
> guess I'm not really worried about it.

Compiling optimized binaries rather than a debug build would save
quite a bit of space in the binaries.
Other than that, I'm not really sure, except maybe to compile LLVM
with LLVM and then use its aggressive optimizations and dead-code
elimination transformations? :)

--
Misha Brukman :: http://misha.brukman.net :: http://llvm.cs.uiuc.edu
On Thu, 5 May 2005, Misha Brukman wrote:

> On Thu, May 05, 2005 at 03:46:58AM -0400, Alexander Friedman wrote:
>> On May 5, Misha Brukman wrote:
>>> To the best of my knowledge, this has not been done and no one has
>>> announced their intent to work on it, so if you are interested,
>>> you'd be more than welcome to do so.
>>
>> My C++ knowledge is completely non-existent, but so far I've had a
>> surprisingly easy time reading the source. This change seems
>> somewhat involved - I will have to implement different calling
>> conventions - i.e., passing a return address to the callee, etc.
>> Who is the right person to talk to about this?
>
> The notes you refer to belong to Chris Lattner, but you should just
> post your questions on llvmdev and someone will answer them. The
> benefits are that you may get your response faster than emailing
> someone directly, you may get multiple perspectives, and the
> discussion is archived for future LLVMers who are looking for
> similar advice.

I agree with Misha. This should definitely be discussed on-list if
possible.

>> Ok, this makes sense. However, am I correct in assuming that the
>> interprocedural optimizations performed in gccas will make it
>> problematic to call 'JIT::recompileAndRelinkFunction()'? For
>> example, suppose I run some module that looks like

...

>> through all of those optimizations. Will the result necessarily
>> have a bar() function?
>
> You are correct, it may get inlined.
>
>> If inlining is enabled, replacing bar might have no effect if it's
>> inlined in foo.
>
> True.

Yes, this is an important point. We build LLVM to be as modular as
possible, which means that you get to choose exactly which pieces you
want to put together into your program. If you're interested in doing
function-level replacement, you basically have to avoid *all
interprocedural optimizations*.
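[Editor's note: the hazard being agreed on here can be made concrete in a few lines of C. The indirect call models a function the JIT can still relink; the "inlined" version has the callee's body baked into the caller, so no later replacement can reach it. Illustrative only, not LLVM API.]

```c
#include <assert.h>

int bar_v1(void) { return 1; }
int bar_v2(void) { return 2; }

/* foo calls bar through a mutable binding: replacement is visible. */
int (*bar)(void) = bar_v1;
int foo_indirect(void) { return bar(); }

/* After inlining, foo no longer calls bar at all: bar_v1's body was
 * copied in, so rebinding bar later has no effect here. */
int foo_inlined(void) { return 1; /* inlined body of bar_v1 */ }
```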
There may be ways around this in specific cases, but anything that
assumes something about a function that has been replaced will need
to be updated. I don't think there is anything LLVM-specific about
this problem, though.

> However, let's step back for a second. I am talking about what
> effect gccas/gccld will have on code generated by some front-end.
> Presumably, you want to write a single stand-alone JIT that will
> take Scheme -> LLVM -> native code via JIT. Hence, gccas/gccld
> optimization selection doesn't really apply to you. You can write
> your own "tool" that will use the JIT libraries and the
> optimizations of your choosing, if you so desire.

Yup, exactly: you get to choose exactly what you want to use. :)

>> If there are complications like this, how much of a performance
>> gain do the interprocedural opts give?

This is impossible to say: it totally depends on the program. I know
that some real-world C codes are sped up by 30-50% in some cases, but
others are not sped up at all. I can't say for Scheme programs, but I
expect that the situation would be similar.

>> Also, compileAndRelink(F) seems to update references in call sites
>> of F. Does this mean that every function call incurs an extra
>> 'load', or is there some cleverer solution?
>
> We don't track all the call sites. Instead, what recompile-and-
> relink does is add an unconditional branch from the old function (in
> native code) to the new one (in native code again), so what this
> does is add an extra indirection to all subsequent calls to that
> function, but not an extra load.
>
> One cleverer solution would be to actually track all the call sites,
> but then if recompile-and-relink is called rarely, it would be extra
> overhead, not a gain, so it would slow down the common case.

Actually, this is not clear; it might be a win to do this.
I don't think anyone has pounded on the replace-function
functionality enough for this to show up, though.

> Another cleverer solution would be to overwrite the machine code in
> place with the code for the new function, but the problem is that we
> lay out the code for functions sequentially in memory so as not to
> waste space, and hence each recompilation of the function had better
> fit into the place of the old one, or else we might run into the
> code region of the next function. This means that we then have to
> break up the code region for a function into multiple code sections,
> possibly quite far apart in memory, and this leads to more issues.

Also, if the function is currently being executed by a stack frame
higher up on the stack, when we got back to that function, chaos
would be unleashed. :)

>> Finally, if I jit-compile several modules, can they reference each
>> other's functions? If this is answered somewhere in the docs, I
>> apologize.
>
> At present, I am not quite sure that the JIT will accept two
> different Modules; most tools (except the linkers) assume a single
> Module is given to them. I have not used two Modules with the JIT
> and I haven't seen anyone else do that, so it may be a limitation or
> it may just need some extension to support it; I'm not sure.

I don't think lli supports this (yet!), but there is no fundamental
reason why it could not be extended, and the JIT library might
already work. I'm not sure.

>> It's not the linking/relocating that's the problem. The problem is
>> that each binary winds up being rather large. However, since these
>> tools don't need to be distributed or compiled for my purposes, I
>> guess I'm not really worried about it.
>
> Compiling optimized binaries rather than a debug build would save
> quite a bit of space in the binaries. Other than that, I'm not
> really sure, except maybe to compile LLVM with LLVM and then use its
> aggressive optimizations and dead-code elimination transformations?
> :)

Like Misha said, please try compiling with 'make ENABLE_OPTIMIZED=1'.
This will produce files in the llvm/Release/bin directory which are
much smaller than the debug files (e.g. opt goes from 72M -> 4M
without debug info).

-Chris

--
http://nondot.org/sabre/ http://llvm.cs.uiuc.edu/