Roland Scheidegger
2010-Oct-19 18:40 UTC
[LLVMdev] llvm register reload/spilling around calls
Hi,

I was investigating some performance issues with LLVM JIT-generated code (x86_64), and looking at the assembly it did indeed seem quite suboptimal. In particular, the code basically implements a kind of cache. On a cache hit, the code just takes the value from the cache; on a miss, it does whatever is necessary to update the cache. The update is expensive, but it happens in only about 1% of all cases and is just a call to a function.

I saw that the code is doing lots of register spilling/reloading. I understand that, due to calling conventions, there's not really a way to avoid this. I tried using coldcc, but apparently the backend doesn't implement it, so it is ignored. What is really bad, though, is that the spilling/reloading ALWAYS happens, regardless of whether the branch containing the call is taken. Since the branch is almost never taken, that is obviously quite bad (and even if the branch were taken more often, which the compiler can't know, I can't see why the reloading always happens). I'm not sure what the performance impact is, but it looks to me like it would definitely make a difference, as the code that does not take the branch is quite simple.

I tried with both LLVM 2.7 and 2.8; no difference.

So is there any optimization option I'm missing which could improve this? Or is this simply the way things are (would that be considered a bug)? If this is a known limitation, any ideas whether it's possible to work around it (by changing the affected JIT code)?

I'm attaching the IR I've hack-extracted from the JIT code (it might be bogus, but it compiles just fine). I think the assembly shows what I'm talking about quite well (it even has the comments about the restores/spills). I used llc -O3 to compile.

Roland

Name: sillysaverestore
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20101019/d4578f8d/attachment.ksh>
Name: sillysaverestore.s
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20101019/d4578f8d/attachment-0001.ksh>
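For readers without access to the attachments, the shape of the problem can be sketched in C++. This is not Roland's code: the names `CacheEntry`, `refill`, and `accumulate`, the cache size, and the arithmetic are all hypothetical stand-ins. The point is only that `acc`, `scale`, and `bias` are floating-point values live across a rarely executed call, which is the situation in which spills and reloads around the call were reported.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical cache entry; the layout is illustrative only.
struct CacheEntry {
    std::uint64_t key = 0;
    float value = 0.0f;
    bool valid = false;
};

constexpr std::size_t kCacheSize = 64;
std::array<CacheEntry, kCacheSize> g_cache{};
int g_miss_count = 0;  // counts how often the slow path runs

// Rare, expensive path. Kept out of line so that it stays a real call,
// as in the JIT-generated code described in the message.
#if defined(__GNUC__)
__attribute__((noinline))
#endif
float refill(CacheEntry& e, std::uint64_t key) {
    ++g_miss_count;
    e.key = key;
    e.value = static_cast<float>(key) * 0.5f;  // stand-in for the real work
    e.valid = true;
    return e.value;
}

// acc, scale and bias are live across the call to refill(). On x86-64
// System V every XMM register is call-clobbered, so the register
// allocator has to keep these values safe somewhere around the call.
float accumulate(const std::uint64_t* keys, std::size_t n,
                 float scale, float bias) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        CacheEntry& e = g_cache[keys[i] % kCacheSize];
        float v;
        if (e.valid && e.key == keys[i]) {
            v = e.value;             // hot path: cache hit, no call
        } else {
            v = refill(e, keys[i]);  // cold path: rare call
        }
        acc += v * scale + bias;
    }
    return acc;
}
```

Compiling something of this shape and reading the assembly is one way to check whether the spill code ends up only on the cold path or also on the hot one; the outcome depends on the compiler version and register allocator in use.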
Jakob Stoklund Olesen
2010-Oct-19 21:21 UTC
[LLVMdev] llvm register reload/spilling around calls
On Oct 19, 2010, at 11:40 AM, Roland Scheidegger wrote:

> So I saw that the code is doing lots of register spilling/reloading. Now
> I understand that due to calling conventions, there's not really a way
> to avoid this - I tried using coldcc but apparently the backend doesn't
> implement it and hence this is ignored.

Yes, unfortunately the list of call-clobbered registers is fixed at the moment, so coldcc is mostly ignored by the backend.

Patches welcome.

> So is there any optimization option I'm missing which could improve
> this? Or is this simply the way things are (would that be considered a
> bug?). If this is a known limitation, any ideas if it's possible to work
> around that (by changing the affected jit code)?

The -pre-alloc-split option should handle stuff like this when calls clobber an entire register class. That probably only applies to XMM registers.

Work on proper live range splitting is in progress. You can try it out with -spiller=inline, but it is highly experimental and volatile at the moment.

I don't know any short term solutions.

/jakob
Roland Scheidegger
2010-Oct-20 01:37 UTC
[LLVMdev] llvm register reload/spilling around calls
Thanks for giving it a look!

On 19.10.2010 23:21, Jakob Stoklund Olesen wrote:
> On Oct 19, 2010, at 11:40 AM, Roland Scheidegger wrote:
>
>> So I saw that the code is doing lots of register
>> spilling/reloading. Now I understand that due to calling
>> conventions, there's not really a way to avoid this - I tried using
>> coldcc but apparently the backend doesn't implement it and hence
>> this is ignored.
>
> Yes, unfortunately the list of call-clobbered registers is fixed at
> the moment, so coldcc is mostly ignored by the backend.
>
> Patches welcome.

What would be needed there? I actually tried a quick hack and simply changed the registers included in the list in X86RegisterInfo::getCalleeSavedRegs, so some xmm regs were included (similar to what was done for win64). But the result wasn't what I expected - the callee now indeed saved/restored all the xmm regs I added, however the calling code did not change at all...

>> So is there any optimization option I'm missing which could improve
>> this? Or is this simply the way things are (would that be
>> considered a bug?). If this is a known limitation, any ideas if
>> it's possible to work around that (by changing the affected jit
>> code)?
>
> The -pre-alloc-split option should handle stuff like this when calls
> clobber an entire register class. That probably only applies to XMM
> registers.

I tried that and the generated code did not change at all.

> Work on proper live range splitting is in progress. You can try it
> out with -spiller=inline, but it is highly experimental and volatile
> at the moment.

Tried that too but the code mostly remained the same (there were 2 additional spills right at the beginning and some of the register numbers changed but that was all). There's also a -spiller=splitting option, I don't know what it should do but it just crashed...

Roland

> I don't know any short term solutions.
>
> /jakob
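The ABI background to the win64 comparison can be written down as a small C++ model. This is a sketch of the calling-convention rules only, not LLVM code: the `Abi` enum and both function names are made up for illustration, and nothing here says where the backend keeps its lists. Jakob's remark that the call-clobbered list is fixed suggests, though the thread does not confirm it, that the caller consults a different list than the one returned by `getCalleeSavedRegs`, which would match the observation that only the callee changed.

```cpp
#include <set>
#include <string>

// Hypothetical model of two x86-64 calling conventions.
enum class Abi { SysV64, Win64 };

// Registers the callee must preserve (stack pointer omitted).
// System V x86-64: no XMM register is callee-saved.
// Win64: xmm6 through xmm15 are callee-saved.
const std::set<std::string>& callee_saved(Abi abi) {
    static const std::set<std::string> sysv = {
        "rbx", "rbp", "r12", "r13", "r14", "r15"};
    static const std::set<std::string> win64 = {
        "rbx", "rbp", "rdi", "rsi", "r12", "r13", "r14", "r15",
        "xmm6", "xmm7", "xmm8", "xmm9", "xmm10",
        "xmm11", "xmm12", "xmm13", "xmm14", "xmm15"};
    return abi == Abi::SysV64 ? sysv : win64;
}

// A value held in `reg` across a call has to be preserved by the caller
// exactly when the callee gives no guarantee for that register.
bool caller_must_preserve(Abi abi, const std::string& reg) {
    return callee_saved(abi).count(reg) == 0;
}
```

Under this model, `caller_must_preserve(Abi::SysV64, "xmm8")` is true while the same query for `Abi::Win64` is false. Extending the callee's saved set helps only if the caller's notion of what a call clobbers shrinks to match.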