Hi List,

I am in the preliminary stages of adding a JIT compiler to a sizable Scheme system (PLT Scheme). The original plan was to use GNU Lightning, but 1) it seems to be dead, and 2) LLVM has already done a huge amount of stuff that I would have had to write (poorly) from scratch. At the moment, LLVM seems to be the ideal choice for implementing the Scheme JIT, but there are problems that need to be addressed first. I hope you guys can help me with these - I'll list them in descending order of importance.

Tail Call Elimination:

I've read over the "Random llvm notes", and see that you guys have thought about this already. However, the note dates from last year, so I am wondering if there is an implementation in the works. If no one is working on this or planning to work on it in the near future, I would be willing to give it a shot if I was given some direction as to where to start.

Explicitly managed stack frames would also be nice, but are not a necessity, unlike the mixed calling conventions and tail call elimination. For all of you who are wondering about call/cc, we currently implement it via stack copying (and will continue to), so I am not worried about LLVM not having a representation for continuations.

JIT + Optimization interactions:

I have looked over the JIT documentation (which is a bit sparse) and the examples. So far I am completely unclear as to what the JIT compiler actually does with the code that is passed to it. To be more precise, does the JIT perform all of the standard LLVM optimizations on the code, or does it depend on its clients to do so themselves? Are there some examples of that? If it does indeed optimize the input, does it attempt to do global optimizations on the functions (intraprocedural register allocation, inlining, whatever)? Does it re-do these optimizations when functions are added/removed/changed? Are there parameters to tune the compiler's aggressiveness?

C-Interface:

Does there happen to be a C interface to the JIT?
Our scheme impl has a good FFI, but it doesn't do C++. If not, this is no big deal, and I'll just write something myself.

Size of Distro / Compilation Speed:

While the sources of LLVM are not that big, the project builds very slowly into something very large. Someone already asked about what is the minimum needed for just a JIT compiler, and I think I have a vague idea of what needs to be tweaked. However, I want to minimize the changes I make to my LLVM tree. I know that no one can make g++ run any faster, but part of the speed problem and the resulting size of the compilation is that the configure script seems to ignore my directives. For example, it always builds all architectures, and it always statically links each binary.

Well, that's all I can think of for now. Any help will be greatly appreciated :)

--
-Alex
Hi, Alexander!

On Wed, May 04, 2005 at 11:59:06PM -0400, Alexander Friedman wrote:
> I am in the preliminary stages of adding a JIT compiler to a sizable
> Scheme system (PLT Scheme).

Cool!

> The original plan was to use GNU Lightning, but 1) it seems to be
> dead, and 2) LLVM has already done a huge amount of stuff that I would
> have had to write (poorly) from scratch.

Maybe we can use you for a testimonial... :)

> At the moment, LLVM seems to be the ideal choice for implementing the
> Scheme JIT, but there are problems that need to be addressed first. I
> hope you guys can help me with these - I'll list them in descending
> order of importance.

Sounds good, I'll do my best.

> Tail Call Elimination:
>
> I've read over the "Random llvm notes", and see that you guys have
> thought about this already.
>
> However, the note dates from last year, so I am wondering if there is
> an implementation in the works. If no one is working on this or is
> planning to work on this in the near future, I would be willing to
> give it a shot if I was given some direction as to where to start.

To the best of my knowledge, this has not been done and no one has announced their intent to work on it, so if you are interested, you'd be more than welcome to do so.

> I have looked over the JIT documentation (which is a bit sparse) and
> the examples. So far I am completely unclear as to what the JIT
> compiler actually does with the code that is passed to it.

A target runs the passes listed in the method <target>JITInfo::addPassesToJITCompile() to emit machine code.

> To be more precise, does the JIT perform all of the standard LLVM
> optimizations on the code, or does it depend on its clients to do so
> themselves? Are there some examples of that?

No, the JIT performs no optimizations. The method I mentioned above just lowers the constructs the instruction selector cannot handle (yet) or things that the target does not have support for.
About the only thing the JIT does (on X86) is eliminate unreachable blocks (dead code). Then the code is passed on to the instruction selector, which creates machine code; some peephole optimizations are run, and then the prologue/epilogue are inserted. I glossed over the x86 floating point details, but you get the idea.

The use case scenario is usually like this:

llvm-gcc/llvm-g++ produces very simple, brain-dead code for a given C/C++ file. It does not create SSA form, but creates stack allocations for all variables. This makes it easier to write a front-end. We turned off all optimizations in GCC, so the code produced by the C/C++ front-end is really not pretty. Then gccas is run on each LLVM assembly file; gccas is basically an optimizing assembler that runs the optimizations listed in llvm/tools/gccas/gccas.cpp, which you can inspect. Once all the files for a program are compiled to bytecode, they are linked with gccld, an optimizing linker that does a lot of interprocedural optimization and creates the final bytecode file. After this, you can use llc or lli (JIT) on the resulting bytecode; llc and lli don't have to do any optimizations, because they have already been performed.

> If it does indeed optimize the input, does it attempt to do global
> optimizations on the functions (intraprocedural register allocation,
> inlining, whatever)?

The default register allocator in use for most platforms is a linear-scan register allocator, and the SparcV9 backend uses a graph-coloring register allocator. However, the JIT performs no inlining, as mentioned above.

> Does it re-do these optimizations when functions are added/removed/
> changed? Are there parameters to tune the compiler's aggressiveness?

There is a JIT::recompileAndRelinkFunction() method, but it doesn't optimize the code.

> Does there happen to be a C interface to the JIT? Our scheme impl
> has a good FFI, but it doesn't do C++.
> If not, this is no big deal,
> and I'll just write something myself.

No, this would have to be added.

> While the sources of LLVM are not that big, the project builds very
> slowly into something very large. Someone already asked about what is
> the minimum needed for just a JIT compiler, and I think I have a
> vague idea of what needs to be tweaked. However, I want to minimize the
> changes I make to my LLVM tree.

llvm/examples/HowToUseJIT pretty much has the minimal support one needs for a JIT, but if you can make it even smaller, I'd be interested.

> [...] configure script seems to ignore my directives. For example, it
> always builds all architectures, ...

Are you using a release or CVS version? Support for this just went into CVS recently, so you should check it out and see if it works for you. If you *are* using CVS, are you saying you used `configure -enable-target=[blah]' and it compiled and linked them all? In that case, it's a bug, so please post your results over here: http://llvm.cs.uiuc.edu/PR518

> ... and it always statically links each binary.

Yes, that is currently the default method of building libraries and tools. If you were to make all the libraries shared, you would be doing the same linking/relocating at run-time every time you start the tool. There is support for loading target backends, etc. from shared objects with -load, and we may move to the model of having shared objects for targets in the future, but at present, they are static. Having more shared libraries may speed up link time, but I suspect it will negatively impact run time.

--
Misha Brukman :: http://misha.brukman.net :: http://llvm.cs.uiuc.edu
On May 5, Misha Brukman wrote:
> Maybe we can use you for a testimonial... :)

Certainly.

> > Tail Call Elimination:
> >
> > I've read over the "Random llvm notes", and see that you guys have
> > thought about this already.
> >
> > However, the note dates from last year, so I am wondering if there is
> > an implementation in the works. If no one is working on this or is
> > planning to work on this in the near future, I would be willing to
> > give it a shot if I was given some direction as to where to start.
>
> To the best of my knowledge, this has not been done and no one has
> announced their intent to work on it, so if you are interested, you'd be
> more than welcome to do so.

My C++ knowledge is completely non-existent, but so far I've had a surprisingly easy time reading the source. This change seems somewhat involved - I will have to implement different calling conventions - i.e., passing a return address to the callee, etc. Who is the right person to talk to about this?

> The use case scenario is usually like this:
>
> llvm-gcc/llvm-g++ produces very simple, brain-dead code for a given
> C/C++ file. It does not create SSA form, but creates stack allocations
> for all variables. This makes it easier to write a front-end. We
> turned off all optimizations in GCC and so the code produced by the
> C/C++ front-end is really not pretty.

[ ... ]

> After this, you can use llc or lli (JIT) on the resulting bytecode, and
> llc or lli don't have to do any optimizations, because they have already
> been performed.

> > Does it re-do these optimizations when functions are added/removed/
> > changed? Are there parameters to tune the compiler's aggressiveness?
>
> There is a JIT::recompileAndRelinkFunction() method, but it doesn't
> optimize the code.

OK, this makes sense. However, am I correct in assuming that the interprocedural optimizations performed in gccas will make it problematic to call JIT::recompileAndRelinkFunction()?
For example, suppose I run some module that looks like

  module a:
    int foo () { ... bar() ... }
    int bar () { ... }

through all of those optimizations. Will the result necessarily have a bar() function? If inlining is enabled, replacing bar might have no effect if it's inlined in foo. If inlining is not enabled, are there other gotchas like this? If there are complications like this, how much of a performance gain do the interprocedural opts give?

Also, compileAndRelink(F) seems to update references in call sites of F. Does this mean that every function call incurs an extra 'load', or is there some cleverer solution? Finally, if I JIT-compile several modules, can they reference each other's functions? If this is answered somewhere in the docs, I apologize.

> > If it does indeed optimize the input, does it attempt to do global
> > optimizations on the functions (intraprocedural register allocation,
> > inlining, whatever)?
>
> The default register allocator in use for most platforms is a
> linear-scan register allocator, and the SparcV9 backend uses a
> graph-coloring register allocator. However, the JIT performs no
> inlining, as mentioned above.

Why use linear scan on X86? Does it have some benefits over graph coloring? FWIW, Lal George has a paper on using graph coloring on the register-poor X86 by implicitly taking advantage of Intel's register mapping to emulate 32 registers. The result is between a 10 and 100% improvement on the benchmarks he ran (but the allocator is 40% slower).

> llvm/examples/HowToUseJIT pretty much has the minimal support one needs
> for a JIT, but if you can make it even smaller, I'd be interested.

Sorry, what I actually meant was: what are the minimum libraries that I have to compile in order to be able to build the HowToUseJIT (and all the passes in gccas/ld)?
We will eventually need to distribute the sources and binaries with our Scheme distribution, and so I need to find the smallest set of files that need to be compiled in order to have the JIT + optimizers working.

> > [...] configure script seems to ignore my directives. For example, it
> > always builds all architectures, ...
>
> Are you using a release or CVS version? Support for this just went into
> CVS recently, so you should check it out and see if it works for you.
> If you *are* using CVS, are you saying you used `configure
> -enable-target=[blah]' and it compiled and linked them all? In that
> case, it's a bug, so please post your results over here:

Yes, I just tried with CVS and it still compiles all back-ends. I'll try it again to make sure, and then report the bug.

> > ... and it always statically links each binary.
>
> Yes, that is currently the default method of building libraries and
> tools. If you were to make all the libraries shared, you would be doing
> the same linking/relocating at run-time every time you start the tool.

It's not the linking/relocating that's the problem. The problem is that each binary winds up being rather large. However, since these tools don't need to be distributed or compiled for my purposes, I guess I'm not really worried about it.

--
-Alex
On Wed, 2005-05-04 at 23:59 -0400, Alexander Friedman wrote:
> Hi List,
>
> I am in the preliminary stages of adding a JIT compiler to a sizable
> Scheme system (PLT Scheme). The original plan was to use GNU
> Lightning, but 1) it seems to be dead, and 2) LLVM has already done a
> huge amount of stuff that I would have had to write (poorly) from
> scratch.

Yay! A real language :)

> Explicitly managed stack frames would also be nice, but are not a
> necessity, unlike the mixed calling conventions and tail call
> elimination. For all of you who are wondering about call/cc, we
> currently implement it via stack copying (and will continue to), so I
> am not worried about LLVM not having a representation for
> continuations.

Mixed calling conventions are on a lot of people's lists of things they want to see. The only mixed calling convention hack I know of is in the Alpha backend, to avoid indirect calls and some prologue. However, this is just a single arch's hack. Work on the general case would probably receive a fair amount of interest and support.

> JIT + Optimization interactions:
>
> To be more precise, does the JIT perform all of the standard LLVM
> optimizations on the code, or does it depend on its clients to do so
> themselves? Are there some examples of that?

So as it stands, one should think of our JIT as something akin to the early Java JITs: one function at a time and only one compile per function. This is extremely primitive by modern JIT standards, where a JIT will do profiling, find hot functions and reoptimize them, reoptimize functions when more information about the call tree is available, have several levels of optimization, etc. There isn't, AFAIK, anything stopping a user of the JIT from doing much of this work, but it would be nice to improve the JIT.

> C-Interface:
>
> Does there happen to be a C interface to the JIT? Our scheme impl
> has a good FFI, but it doesn't do C++.
> If not, this is no big deal,
> and I'll just write something myself.

No, but such bindings would be *very useful*. And since there might be other people who need them this summer, such work might also get a lot of help.

> Size of Distro / Compilation Speed
>
> While the sources of LLVM are not that big, the project builds very
> slowly into something very large. Someone already asked about what is
> the minimum needed for just a JIT compiler, and I think I have a
> vague idea of what needs to be tweaked. However, I want to minimize the
> changes I make to my LLVM tree. I know that no one can make g++ run
> any faster, but part of the speed problem and resulting size of the
> compilation is that the configure script seems to ignore my
> directives. For example, it always builds all architectures, and it
> always statically links each binary.

See Misha's comment about some new build flags. Also, things are considerably smaller (an order of magnitude) if one makes a release build (make ENABLE_OPTIMIZED=1).

--
Andrew Lenharth <alenhar2 at cs.uiuc.edu>
> So as it stands, one should think of our JIT as something akin to the
> early Java JITs: one function at a time and only one compile per
> function. This is extremely primitive by modern JIT standards, where a
> JIT will do profiling, find hot functions and reoptimize them,
> reoptimize functions when more information about the call tree is
> available, have several levels of optimization, etc.

While this is extremely primitive by modern JIT standards, it is extremely good by modern Open Source standards, so I'm quite thankful for it. If no one else does, this is something I'll investigate in the future.

> > Does there happen to be a C interface to the JIT? Our scheme impl
> > has a good FFI, but it doesn't do C++. If not, this is no big deal,
> > and I'll just write something myself.
>
> No, but such bindings would be *very useful*. And since there might be
> other people who need them this summer, such work might also get a lot
> of help.

I'll probably wind up writing one, and if I do, I will certainly submit it - it seems like it should just be a matter of gluing a bunch of functions together and providing an 'extern "C"' declaration. What sort of interface should such an interface provide? The simplest is "pass-in-a-string-to-compile", but that's rather crude.

--
-Alex