On May 5, Misha Brukman wrote:

> Maybe we can use you for a testimonial... :)

Certainly.

> > Tail Call Elimination:
> >
> > I've read over the "Random llvm notes", and see that you guys have
> > thought about this already.
> >
> > However, the note dates from last year, so I am wondering if there
> > is an implementation in the works. If no one is working on this or
> > is planning to work on this in the near future, I would be willing
> > to give it a shot if I was given some direction as to where to
> > start.
>
> To the best of my knowledge, this has not been done and no one has
> announced their intent to work on it, so if you are interested,
> you'd be more than welcome to do so.

My C++ knowledge is completely non-existent, but so far I've had a
surprisingly easy time reading the source. This change seems somewhat
involved - I will have to implement different calling conventions -
i.e., passing a return address to the callee, etc. Who is the right
person to talk to about this?

> The use case scenario is usually like this:
>
> llvm-gcc/llvm-g++ produces very simple, brain-dead code for a given
> C/C++ file. It does not create SSA form, but creates stack
> allocations for all variables. This makes it easier to write a
> front-end. We turned off all optimizations in GCC and so the code
> produced by the C/C++ front-end is really not pretty.

[ ... ]

> After this, you can use llc or lli (JIT) on the resulting bytecode,
> and llc or lli don't have to do any optimizations, because they have
> already been performed.

> > Does it re-do these optimizations when functions are added/removed/
> > changed? Are there parameters to tune the compiler's
> > aggressiveness?
>
> There is a JIT::recompileAndRelinkFunction() method, but it doesn't
> optimize the code.

OK, this makes sense. However, am I correct in assuming that the
interprocedural optimizations performed in gccas will make it
problematic to call 'JIT::recompileAndRelinkFunction()'?
For example, suppose I run some module that looks like

module a

int foo () {
  ...
  bar()
  ...
}

int bar () {
  ...
}

through all of those optimizations. Will the result necessarily have
a bar() function? If inlining is enabled, replacing bar might have no
effect if it's inlined in foo. If inlining is not enabled, are there
other gotchas like this? If there are complications like this, how
much of a performance gain do the interprocedural opts give?

Also, compileAndRelink(F) seems to update references in call sites of
F. Does this mean that every function call incurs an extra 'load', or
is there some cleverer solution? Finally, if I jit-compile several
modules, can they reference each other's functions? If this is
answered somewhere in the docs, I apologize.

> > If it does indeed optimize the input, does it attempt to do global
> > optimizations on the functions (intraprocedural register
> > allocation, inlining, whatever)?
>
> The default register allocator in use for most platforms is a
> linear-scan register allocator, and the SparcV9 backend uses a
> graph-coloring register allocator. However, the JIT performs no
> inlining, as mentioned above.

Why use linear scan on X86? Does it have some benefits over graph
coloring? FWIW, Lal George has a paper on using graph coloring on the
register-poor X86 by implicitly taking advantage of Intel's register
mapping to emulate 32 registers. The result is between 10 and 100%
improvement on the benchmarks he ran (but the allocator is 40%
slower).

> llvm/examples/HowToUseJIT pretty much has the minimal support one
> needs for a JIT, but if you can make it even smaller, I'd be
> interested.

Sorry, what I actually meant was: what are the minimum libraries that
I have to compile in order to be able to build the HowToUseJIT (and
all the passes in gccas/ld)?
We will eventually need to distribute the sources and binaries with
our Scheme distribution, and so I need to find the smallest set of
files that need to be compiled in order to have the JIT + optimizers
working.

> > [...] configure script seems to ignore my directives. For example,
> > it always builds all architectures, ...
>
> Are you using a release or CVS version? Support for this just went
> into CVS recently, so you should check it out and see if it works
> for you. If you *are* using CVS, are you saying you used `configure
> -enable-target=[blah]' and it compiled and linked them all? In that
> case, it's a bug, so please post your results over here:

Yes, I just tried with CVS and it still compiles all back-ends. I'll
try it again to make sure, and then report the bug.

> > ... and it always statically links each binary.
>
> Yes, that is currently the default method of building libraries and
> tools. If you were to make all the libraries shared, you would be
> doing the same linking/relocating at run-time every time you start
> the tool.

It's not the linking/relocating that's the problem. The problem is
that each binary winds up being rather large. However, since these
tools don't need to be distributed or compiled for my purposes, I
guess I'm not really worried about it.

--
-Alex
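[Editor's note: the tail-call elimination Alex asks about above amounts to turning a call in tail position into a jump that reuses the current stack frame. A minimal C sketch of the before/after shapes, illustrative only and nothing LLVM-specific:]

```c
#include <assert.h>

/* Recursive sum with the call in tail position: without tail-call
 * elimination, every recursive call pushes a fresh stack frame. */
long sum_rec(long n, long acc) {
    if (n == 0)
        return acc;
    return sum_rec(n - 1, acc + n); /* tail call */
}

/* What tail-call elimination effectively produces: the tail call
 * becomes a back edge, so the frame is reused and stack usage
 * stays constant no matter how large n is. */
long sum_iter(long n, long acc) {
    for (;;) {
        if (n == 0)
            return acc;
        acc += n;
        n -= 1;
    }
}
```

Both compute the same result; the second runs in constant stack space, which is what makes deeply recursive code (idiomatic in Scheme) safe to execute.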
On Thu, May 05, 2005 at 03:46:58AM -0400, Alexander Friedman wrote:

> On May 5, Misha Brukman wrote:
> > To the best of my knowledge, this has not been done and no one has
> > announced their intent to work on it, so if you are interested,
> > you'd be more than welcome to do so.
>
> My C++ knowledge is completely non-existent, but so far I've had a
> surprisingly easy time reading the source. This change seems
> somewhat involved - I will have to implement different calling
> conventions - i.e., passing a return address to the callee, etc. Who
> is the right person to talk to about this?

The notes you refer to belong to Chris Lattner, but you should just
post your questions on llvmdev and someone will answer them. The
benefits are that you may get your response faster than emailing
someone directly, you may get multiple perspectives, and the
discussion is archived for future LLVMers who are looking for similar
advice.

> Ok, this makes sense. However, am I correct in assuming that the
> interprocedural optimizations performed in gccas will make it
> problematic to call 'JIT::recompileAndRelinkFunction()'? For
> example, suppose I run some module that looks like
>
> module a
>
> int foo () {
>   ...
>   bar()
>   ...
> }
>
> int bar () {
>   ...
> }
>
> through all of those optimizations.
> Will the result necessarily have a bar() function?

You are correct, it may get inlined.

> If inlining is enabled, replacing bar might have no effect if it's
> inlined in foo.

True.

> If inlining is not enabled, are there other gotchas like this?

There are optimizations that will remove dead code (unreachable basic
blocks, functions that are never called directly or possibly aliased,
global variables that are not referenced or possibly aliased),
optimizations that eliminate unused arguments from functions
interprocedurally, structs may be broken up into their constituent
elements (scalar replacement of aggregates), arguments may become
passed by value instead of by reference if it's legal to do so, etc.
These all affect the structure of the code, and if you care to
preserve some elements of it, you might do well to select your own
set of optimization passes that does what you want and none of the
things you don't.

Alternatively, I *think* you can use the debugging intrinsics to
"mark" what you want preserved, and that will pessimize the
optimizations. I am not well-versed with the debugging intrinsics, so
that's a guess. See
http://llvm.cs.uiuc.edu/docs/SourceLevelDebugging.html for info.

However, let's step back for a second. I am talking about what effect
gccas/gccld will have on code generated by some front-end.
Presumably, you want to write a single stand-alone JIT that will take
Scheme -> LLVM -> native code via JIT. Hence, gccas/gccld
optimization selection doesn't really apply to you. You can write
your own "tool" that will use the JIT libraries and the optimizations
of your choosing, if you so desire.

If, however, you are using the gccas/gccld + lli model, then we're
talking about two different "Modules", i.e. the one before
optimization that gccas/gccld hack on, and the one after optimization
that lli actually sees and compiles/executes. Once the bytecode file
is passed to the JIT, *that* is what I would call the "Module being
executed".
So while you are in the JIT environment, the JIT will not do any
inlining; any inlining will have already been performed. So if the
JIT has no function "bar", well, it's not that it "disappeared", it's
just that in the lifetime of the JIT, it never existed in the first
place.

> If there are complications like this, how much of a performance gain
> do the interprocedural opts give?

I don't have any numbers for this, because the interprocedural
optimizations are bunched in with the intraprocedural ones, and we
always run them all. So we don't really have any measurements for the
gain of JUST the interprocedural optimizations.

You can try turning off ALL optimizations with

  llvm-gcc -Wa,-disable-opt -Wl,-disable-opt [...]

The first disables all gccas optimizations, and the second disables
gccld's. The problem is that removing them all REALLY pessimizes the
code, as all locals will be stack-allocated, for instance. If
inlining is really the biggest problem you're facing, there's a
-disable-inlining flag for gccld.

> Also, compileAndRelink(F) seems to update references in call sites
> of F. Does this mean that every function call incurs an extra
> 'load', or is there some cleverer solution?

We don't track all the call sites. Instead, what recompile-and-relink
does is add an unconditional branch from the old function (in native
code) to the new one (in native code again), so what this does is add
an extra indirection to all subsequent calls to that function, but
not an extra load.

One cleverer solution would be to actually track all the call sites,
but then if recompile-and-relink is called rarely, it would be extra
overhead, not a gain, so it would slow down the common case.
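[Editor's note: the branch-to-the-new-function scheme described above can be modeled in portable C with a function pointer standing in for the patched-in unconditional branch. This is only a model of the control flow; the real JIT rewrites native code directly, and none of these names are LLVM APIs.]

```c
#include <assert.h>
#include <stddef.h>

int new_impl(int x) { return x * 2; }  /* freshly recompiled body */

/* Stands in for the branch patched into the old function's entry
 * point; NULL means "not relinked yet". (The real mechanism is a
 * direct jump in machine code, not a pointer load.) */
int (*redirect)(int) = NULL;

int old_impl(int x) {
    if (redirect)
        return redirect(x);  /* one extra hop for every later call */
    return x + 1;            /* original body */
}
```

Callers keep calling old_impl unchanged; once the analogue of recompileAndRelinkFunction() installs the redirect, every subsequent call falls through to the new code without any call site being updated.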
Another cleverer solution would be to overwrite the machine code in
place with the code for the new function, but the problem there is
that we lay out the code for functions sequentially in memory so as
not to waste space, and hence each recompilation of the function had
better fit into the place of the old one, or else we might run into
the code region of the next function. This means we would then have
to break up the code region for a function into multiple code
sections, possibly quite far apart in memory, and that leads to more
issues. So we've taken the simpler approach above, which works.

> Finally, if I jit-compile several modules, can they reference each
> other's functions? If this is answered somewhere in the docs, I
> apologize.

At present, I am not quite sure that the JIT will accept two
different Modules; most tools (except the linkers) assume a single
Module is given to them. I have not used two Modules with the JIT and
I haven't seen anyone else do that, so it may be a limitation or it
may just need some extension to support it; I'm not sure.

> Why use linear scan on X86?

I should mention that the SparcV9 backend is different in structure
from the other backends, so the only backend that can use the
graph-coloring register allocator is the SparcV9 backend. The
linear-scan allocator was written in a different framework, and it
can be used on any backend except the V9.

Also, why linear scan vs. XYZ? Because someone wrote it and
contributed it to LLVM (that someone happens to be Alkis, and he's a
member of the LLVM group). :)

> Does it have some benefits over graph coloring?

Algorithmically, I think it's O(n) for linear scan vs. O(n^3) for
graph coloring. Specifically for LLVM, Alkis wrote a linear-scan
allocator and so we have it.
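[Editor's note: for a sense of why linear scan is cheap, the core of the algorithm (in the style of the Poletto-Sarkar formulation) fits in a few dozen lines. This is a from-scratch toy over made-up live intervals, not LLVM's allocator.]

```c
#include <assert.h>

#define NREGS 2
#define NONE  (-1)  /* "spilled" for an interval, "free" for a register */

typedef struct { int start, end, reg; } Interval;

/* iv[] must be sorted by increasing start. Each interval receives a
 * register 0..NREGS-1 or NONE. On conflict, the interval that ends
 * last is the one spilled (the classic linear-scan heuristic). */
void linear_scan(Interval *iv, int n) {
    int active[NREGS];              /* interval index held by each reg */
    for (int r = 0; r < NREGS; r++) active[r] = NONE;
    for (int i = 0; i < n; i++) {
        /* Expire intervals that ended before this one starts. */
        for (int r = 0; r < NREGS; r++)
            if (active[r] != NONE && iv[active[r]].end <= iv[i].start)
                active[r] = NONE;
        /* Take a free register if there is one. */
        iv[i].reg = NONE;
        for (int r = 0; r < NREGS; r++)
            if (active[r] == NONE) { iv[i].reg = r; active[r] = i; break; }
        if (iv[i].reg != NONE)
            continue;
        /* All registers busy: find the active interval ending last. */
        int victim = 0;
        for (int r = 1; r < NREGS; r++)
            if (iv[active[r]].end > iv[active[victim]].end)
                victim = r;
        if (iv[active[victim]].end > iv[i].end) {
            iv[i].reg = victim;              /* steal its register */
            iv[active[victim]].reg = NONE;   /* and spill it */
            active[victim] = i;
        }                                    /* else i stays spilled */
    }
}
```

Every interval is visited once and the active set is bounded by the register count, which is where the near-linear running time comes from; a graph-coloring allocator instead builds and simplifies an interference graph, which is far more expensive.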
Someone at a different school wrote a graph-coloring allocator for
LLVM, but they haven't contributed it back to us, so we don't have
it.

> FWIW, Lal George has a paper on using graph coloring on the
> register-poor X86 by implicitly taking advantage of Intel's register
> mapping to emulate 32 registers. The result is between 10 and 100%
> improvement on the benchmarks he ran (but the allocator is 40%
> slower).

It's not that we're against graph coloring per se; it's just that no
one has stepped up to the plate to do it and share the code with us.
I should mention that we accept patches. :)

> > llvm/examples/HowToUseJIT pretty much has the minimal support one
> > needs for a JIT, but if you can make it even smaller, I'd be
> > interested.
>
> Sorry, what I actually meant was: what are the minimum libraries
> that I have to compile in order to be able to build the HowToUseJIT
> (and all the passes in gccas/ld)?

The Makefiles for gccas/gccld/lli list the libraries they use in the
USEDLIBS variable. Additionally, lli uses the special meta-library
"JIT", which expands to include the generic JIT libraries and the
target backend(s) that you have selected. See llvm/Makefile.rules,
search for JIT_LIBS, and read from there. You want to take the union
of these sets, and that would do it.

> Yes, I just tried with CVS and it still compiles all back-ends. I'll
> try it again to make sure, and then report the bug.

Instead of opening a new bug, please reopen this one:
http://llvm.cs.uiuc.edu/PR518

> It's not the linking/relocating that's the problem. The problem is
> that each binary winds up being rather large. However, since these
> tools don't need to be distributed or compiled for my purposes, I
> guess I'm not really worried about it.

Compiling optimized binaries rather than a debug build would save
quite a bit of space in the binaries.
Other than that, I'm not really sure, except maybe to compile LLVM
with LLVM and then use its aggressive optimizations and dead-code
elimination transformations? :)

--
Misha Brukman :: http://misha.brukman.net :: http://llvm.cs.uiuc.edu
On Thu, 5 May 2005, Misha Brukman wrote:

> On Thu, May 05, 2005 at 03:46:58AM -0400, Alexander Friedman wrote:
>> On May 5, Misha Brukman wrote:
>>> To the best of my knowledge, this has not been done and no one has
>>> announced their intent to work on it, so if you are interested,
>>> you'd be more than welcome to do so.
>>
>> My C++ knowledge is completely non-existent, but so far I've had a
>> surprisingly easy time reading the source. This change seems
>> somewhat involved - I will have to implement different calling
>> conventions - i.e., passing a return address to the callee, etc.
>> Who is the right person to talk to about this?
>
> The notes you refer to belong to Chris Lattner, but you should just
> post your questions on llvmdev and someone will answer them. The
> benefits are that you may get your response faster than emailing
> someone directly, you may get multiple perspectives, and the
> discussion is archived for future LLVMers who are looking for
> similar advice.

I agree with Misha. This should definitely be discussed on-list if
possible.

>> Ok, this makes sense. However, am I correct in assuming that the
>> interprocedural optimizations performed in gccas will make it
>> problematic to call 'JIT::recompileAndRelinkFunction()'? For
>> example, suppose I run some module that looks like

...

>> through all of those optimizations. Will the result necessarily
>> have a bar() function?
>
> You are correct, it may get inlined.
>
>> If inlining is enabled, replacing bar might have no effect if it's
>> inlined in foo.
>
> True.

Yes, this is an important point. We build LLVM to be as modular as
possible, which means that you get to choose exactly which pieces you
want to put together into your program. If you're interested in doing
function-level replacement, you basically have to avoid *all
interprocedural optimizations*.
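[Editor's note: the hazard being agreed on here can be made concrete in a few lines of C. The indirect call models a function the JIT can still relink; the "inlined" version has the callee's body baked into the caller, so no later replacement can reach it. Illustrative only, not LLVM API.]

```c
#include <assert.h>

int bar_v1(void) { return 1; }
int bar_v2(void) { return 2; }

/* foo calls bar through a mutable binding: replacement is visible. */
int (*bar)(void) = bar_v1;
int foo_indirect(void) { return bar(); }

/* After inlining, foo no longer calls bar at all: bar_v1's body was
 * copied in, so rebinding bar later has no effect here. */
int foo_inlined(void) { return 1; /* inlined body of bar_v1 */ }
```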
There may be ways around this in specific cases, but anything that
assumes something about a function that has been replaced will need
to be updated. I don't think there is anything LLVM-specific about
this problem, though.

> However, let's step back for a second. I am talking about what
> effect gccas/gccld will have on code generated by some front-end.
> Presumably, you want to write a single stand-alone JIT that will
> take Scheme -> LLVM -> native code via JIT. Hence, gccas/gccld
> optimization selection doesn't really apply to you. You can write
> your own "tool" that will use the JIT libraries and the
> optimizations of your choosing, if you so desire.

Yup, exactly: you get to choose exactly what you want to use. :)

>> If there are complications like this, how much of a performance
>> gain do the interprocedural opts give?

This is impossible to say: it totally depends on the program. I know
that some real-world C codes are sped up by 30-50% in some cases, but
others are not sped up at all. I can't say for Scheme programs, but I
expect that the situation would be similar.

>> Also, compileAndRelink(F) seems to update references in call sites
>> of F. Does this mean that every function call incurs an extra
>> 'load', or is there some cleverer solution?
>
> We don't track all the call sites. Instead, what recompile-and-
> relink does is add an unconditional branch from the old function (in
> native code) to the new one (in native code again), so what this
> does is add an extra indirection to all subsequent calls to that
> function, but not an extra load.
>
> One cleverer solution would be to actually track all the call sites,
> but then if recompile-and-relink is called rarely, it would be extra
> overhead, not a gain, so it would slow down the common case.

Actually, this is not clear; it might be a win to do this.
I don't think anyone has pounded on the replace-function
functionality enough for this to show up, though.

> Another cleverer solution would be to overwrite the machine code in
> place with the code for the new function, but the problem is that we
> lay out the code for functions sequentially in memory so as not to
> waste space, and hence each recompilation of the function had better
> fit into the place of the old one, or else we might run into the
> code region of the next function. This means that we then have to
> break up the code region for a function into multiple code sections,
> possibly quite far apart in memory, and this leads to more issues.

Also, if the function is currently being executed by a stack frame
higher up on the stack, when we got back to that function, chaos
would be unleashed. :)

>> Finally, if I jit-compile several modules, can they reference each
>> other's functions? If this is answered somewhere in the docs, I
>> apologize.
>
> At present, I am not quite sure that the JIT will accept two
> different Modules; most tools (except the linkers) assume a single
> Module is given to them. I have not used two Modules with the JIT
> and I haven't seen anyone else do that, so it may be a limitation or
> it may just need some extension to support it; I'm not sure.

I don't think lli supports this (yet!), but there is no fundamental
reason why it could not be extended, and the JIT library might
already work. I'm not sure.

>> It's not the linking/relocating that's the problem. The problem is
>> that each binary winds up being rather large. However, since these
>> tools don't need to be distributed or compiled for my purposes, I
>> guess I'm not really worried about it.
>
> Compiling optimized binaries rather than a debug build would save
> quite a bit of space in the binaries. Other than that, I'm not
> really sure, except maybe to compile LLVM with LLVM and then use its
> aggressive optimizations and dead-code elimination transformations?
> :)

Like Misha said, please try compiling with 'make ENABLE_OPTIMIZED=1'.
This will produce files in the llvm/Release/bin directory which are
much smaller than the debug files (e.g. opt goes from 72M -> 4M
without debug info).

-Chris

--
http://nondot.org/sabre/ http://llvm.cs.uiuc.edu/