Roland Scheidegger
2010-Oct-20 14:46 UTC
[LLVMdev] llvm register reload/spilling around calls
On 20.10.2010 05:00, Jakob Stoklund Olesen wrote:
> On Oct 19, 2010, at 6:37 PM, Roland Scheidegger wrote:
>
>> Thanks for giving it a look!
>>
>> On 19.10.2010 23:21, Jakob Stoklund Olesen wrote:
>>> On Oct 19, 2010, at 11:40 AM, Roland Scheidegger wrote:
>>>
>>>> So I saw that the code is doing lots of register
>>>> spilling/reloading. Now I understand that due to calling
>>>> conventions, there's not really a way to avoid this - I tried
>>>> using coldcc but apparently the backend doesn't implement it
>>>> and hence this is ignored.
>>> Yes, unfortunately the list of call-clobbered registers is fixed
>>> at the moment, so coldcc is mostly ignored by the backend.
>>>
>>> Patches welcome.
>> What would be needed there? I actually tried a quick hack and
>> simply changed the registers included in the list in
>> X86RegisterInfo::getCalleeSavedRegs, so some xmm regs were included
>> (similar to what was done for win64). But the result wasn't what I
>> expected - the callee now indeed saved/restored all the xmm regs I
>> added, however the calling code did not change at all...
>
> Look in X86InstrControl.td. The call instructions are all prefixed
> by:
>
> let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2,
> FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7,
> XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10,
> XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS],
>
> This is the fixed list of call-clobbered registers. It should really
> be controlled by the calling convention of the called function
> instead.
>
> The WINCALL* instructions only exist because of this.

Ahh I see now. I hacked this up and indeed the code looks much better.
I can't force it to use win64 calling conventions right?
Would do just fine for this case (much closer to a cold calling
convention, I really only need 5 preserved xmm regs).

Roland

> One problem is that calling conventions are handled while building
> the selection DAG, and the DAG doesn't really know to represent
> clobbered registers.
>
> Perhaps X86TargetLowering::LowerCall() could decorate the
> X86ISD::CALL node with the calling convention somehow?
>
> Dan, do you have any thoughts on how to communicate the calling
> convention and call clobbered registers to the eventual CALL
> MachineInstr?
>
> /jakob
Jakob Stoklund Olesen
2010-Oct-20 16:13 UTC
[LLVMdev] llvm register reload/spilling around calls
On Oct 20, 2010, at 7:46 AM, Roland Scheidegger wrote:

> On 20.10.2010 05:00, Jakob Stoklund Olesen wrote:
>> Look in X86InstrControl.td. The call instructions are all prefixed
>> by:
>>
>> let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2,
>> FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7,
>> XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10,
>> XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS],
>>
>> This is the fixed list of call-clobbered registers. It should really
>> be controlled by the calling convention of the called function
>> instead.
>>
>> The WINCALL* instructions only exist because of this.
> Ahh I see now. I hacked this up and indeed the code looks much better.
> I can't force it to use win64 calling conventions right?

No, only by targeting Windows.

> Would do just fine for this case (much closer to a cold calling
> convention, I really only need 5 preserved xmm regs).

If XMM registers are the problem, -pre-alloc-split really ought to help you.

You may want to investigate why it doesn't.

/jakob
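To make the effect of a fixed clobber list concrete, here is a minimal sketch in C++. It is not LLVM code: it assumes a toy model in which a calling convention is reduced to the set of XMM register numbers a call may overwrite, and the names `CallConvModel`, `sysvModel`, `win64Model`, and `mustSpill` are hypothetical, invented only for illustration. The register sets follow the System V x86-64 ABI (XMM0-XMM15 all caller-saved) and the Win64 ABI (XMM6-XMM15 callee-saved).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model, not an LLVM data structure: a calling convention is
// just the set of XMM register numbers that a call is assumed to clobber.
struct CallConvModel {
    std::string name;
    std::set<int> clobberedXmm;
};

// System V x86-64: every XMM register (0..15) is caller-saved.
CallConvModel sysvModel() {
    CallConvModel cc{"sysv", {}};
    for (int r = 0; r < 16; ++r) cc.clobberedXmm.insert(r);
    return cc;
}

// Win64: XMM0..XMM5 are caller-saved, XMM6..XMM15 are callee-saved.
CallConvModel win64Model() {
    CallConvModel cc{"win64", {}};
    for (int r = 0; r < 6; ++r) cc.clobberedXmm.insert(r);
    return cc;
}

// Returns the registers in `liveAcrossCall` that the caller would have to
// spill before the call and reload afterwards under the given model.
std::vector<int> mustSpill(const CallConvModel &cc,
                           const std::set<int> &liveAcrossCall) {
    std::vector<int> spilled;
    for (int r : liveAcrossCall) {
        assert(r >= 0 && r < 16 && "XMM register number out of range");
        if (cc.clobberedXmm.count(r)) spilled.push_back(r);
    }
    return spilled;
}
```

For five values held in XMM6-XMM10 across a call, `mustSpill(sysvModel(), {6, 7, 8, 9, 10})` reports all five, while the Win64 model reports none. A real register allocator chooses the registers itself rather than receiving them as input, so this only illustrates why the size of the clobber set matters, not how the allocator works.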
Roland Scheidegger
2010-Oct-20 23:31 UTC
[LLVMdev] llvm register reload/spilling around calls
(repost with right sender address)

On 20.10.2010 18:13, Jakob Stoklund Olesen wrote:
> On Oct 20, 2010, at 7:46 AM, Roland Scheidegger wrote:
>
>> On 20.10.2010 05:00, Jakob Stoklund Olesen wrote:
>>> Look in X86InstrControl.td. The call instructions are all prefixed
>>> by:
>>>
>>> let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2,
>>> FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7,
>>> XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10,
>>> XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS],
>>>
>>> This is the fixed list of call-clobbered registers. It should really
>>> be controlled by the calling convention of the called function
>>> instead.
>>>
>>> The WINCALL* instructions only exist because of this.
>> Ahh I see now. I hacked this up and indeed the code looks much better.
>> I can't force it to use win64 calling conventions right?
>
> No, only by targeting Windows.
>
>> Would do just fine for this case (much closer to a cold calling
>> convention, I really only need 5 preserved xmm regs).
>
> If XMM registers are the problem, -pre-alloc-split really ought to help you.
>
> You may want to investigate why it doesn't.

Ok, I see if I can figure out something, though I have no in-depth
knowledge of llvm. I think only xmm regs are really a problem because
r12-r15 are callee-saved and hence used for holding the most frequently
used values, which seems to be enough to avoid spilling there.

It looked to me like it could be related to something mentioned in the
lib/Target/README.txt file:

//===---------------------------------------------------------------------===//

We should investigate an instruction sinking pass. Consider this silly
example in pic mode:

#include <assert.h>
void foo(int x) {
  assert(x);
  //...
}

we compile this to:
_foo:
        subl $28, %esp
        call "L1$pb"
"L1$pb":
        popl %eax
        cmpl $0, 32(%esp)
        je LBB1_2 # cond_true
LBB1_1: # return
        # ...
        addl $28, %esp
        ret
LBB1_2: # cond_true
...

The PIC base computation (call+popl) is only used on one path through the
code, but is currently always computed in the entry block. It would be better
to sink the picbase computation down into the block for the assertion, as it
is the only one that uses it. This happens for a lot of code with early outs.

Another example is loads of arguments, which are usually emitted into the
entry block on targets like x86. If not used in all paths through a function,
they should be sunk into the ones that do.

In this case, whole-function-isel would also handle this.

//===---------------------------------------------------------------------===//

Though maybe that's not related, since the arguments are actually
(mostly) used in all paths.

Roland
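The sinking idea from the README excerpt can be sketched at source level in C++. This is only an analogy under stated assumptions: a real sinking pass works on compiler IR or machine code, and `failureContext`, `checkUnsunk`, `checkSunk`, and the counter `setupCalls` are hypothetical names standing in for the PIC base computation and the two code shapes.

```cpp
#include <string>

// Counts how often the failure-path-only work is executed.
int setupCalls = 0;

// Stand-in for work that only the failure path needs (the role played by
// the PIC base computation in the README example).
std::string failureContext(int x) {
    ++setupCalls;
    return "check failed for x=" + std::to_string(x);
}

// Unsunk shape: the work is done in the entry block on every call,
// even though only the failure path uses the result.
std::string checkUnsunk(int x) {
    std::string ctx = failureContext(x);
    if (x != 0) return "";  // common early-out path, ctx is unused
    return ctx;
}

// Sunk shape: the work is moved into the only path that uses it.
std::string checkSunk(int x) {
    if (x != 0) return "";  // common early-out path does no extra work
    return failureContext(x);
}
```

Both functions return the same result for every input; the difference is that `checkSunk` increments `setupCalls` only when `x` is zero, whereas `checkUnsunk` increments it on every call.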