thr3ads.net - llvm dev - [LLVMdev] Register scavenger and SP/FP adjustments [Sep 2013]

Krzysztof Parzyszek

2013-Sep-26 19:00 UTC

[LLVMdev] Register scavenger and SP/FP adjustments

Consider this example:

--- ex.ll ---
declare void @bar()

; Function Attrs: nounwind optsize
define void @main() {
entry:
   %hin = alloca [256 x i32], align 4
   %xin = alloca [256 x i32], align 4
   call void @bar()
   ret void
}
-------------


Freshly built llc:

llc -O2 -march=x86 < ex.ll -print-before-all

# *** IR Dump Before Prologue/Epilogue Insertion & Frame Finalization ***:
# Machine code for function main: Post SSA
Frame Objects:
   fi#0: size=1024, align=4, at location [SP+4]
   fi#1: size=1024, align=4, at location [SP+4]

BB#0: derived from LLVM BB %entry
         ADJCALLSTACKDOWN32 0, %ESP<imp-def>, %EFLAGS<imp-def,dead>, %ESP<imp-use>
         CALLpcrel32 <ga:@bar>, <regmask>, %ESP<imp-use>, %ESP<imp-def>
         ADJCALLSTACKUP32 0, 0, %ESP<imp-def>, %EFLAGS<imp-def,dead>, %ESP<imp-use>
         RET

# End machine code for function main.

before replace frame indices
# Machine code for function main: Post SSA
Frame Objects:
   fi#0: size=1024, align=4, at location [SP-1024]
   fi#1: size=1024, align=4, at location [SP-2048]

BB#0: derived from LLVM BB %entry
         %ESP<def,tied1> = SUB32ri %ESP<tied0>, 2060, %EFLAGS<imp-def,dead>; flags: FrameSetup
         PROLOG_LABEL <MCSym=.Ltmp0>
         CALLpcrel32 <ga:@bar>, <regmask>, %ESP<imp-use>, %ESP<imp-def>
         %ESP<def,tied1> = ADD32ri %ESP<tied0>, 2060, %EFLAGS<imp-def,dead>
         RET

# End machine code for function main.



Let's see what happens if we remove the call to "bar".

There aren't any pseudo-instructions that set up the frame to begin
with, even though the SP is actually modified.  (This is to show that
the register scavenger (RS) has no way of finding out that SP was
adjusted in such cases.)


# *** IR Dump Before Prologue/Epilogue Insertion & Frame Finalization ***:
# Machine code for function main: Post SSA
Frame Objects:
   fi#0: size=1024, align=4, at location [SP+4]
   fi#1: size=1024, align=4, at location [SP+4]

BB#0: derived from LLVM BB %entry
         RET

# End machine code for function main.

before replace frame indices
# Machine code for function main: Post SSA
Frame Objects:
   fi#0: size=1024, align=4, at location [SP-1024]
   fi#1: size=1024, align=4, at location [SP-2048]

BB#0: derived from LLVM BB %entry
         %ESP<def,tied1> = SUB32ri %ESP<tied0>, 2048, %EFLAGS<imp-def,dead>; flags: FrameSetup
         PROLOG_LABEL <MCSym=.Ltmp0>
         %ESP<def,tied1> = ADD32ri %ESP<tied0>, 2048, %EFLAGS<imp-def,dead>
         RET

# End machine code for function main.



And here's where the problem becomes more apparent.

Compile for Thumb and see that there is a virtual register used in the 
frame setup:

# *** IR Dump Before Prologue/Epilogue Insertion & Frame Finalization ***:
# Machine code for function main: Post SSA
Frame Objects:
   fi#0: size=1024, align=4, at location [SP]
   fi#1: size=1024, align=4, at location [SP]

BB#0: derived from LLVM BB %entry
         tBX_RET pred:14, pred:%noreg

# End machine code for function main.

before replace frame indices
# Machine code for function main: Post SSA
Frame Objects:
   fi#0: size=1024, align=4, at location [SP-1032]
   fi#1: size=1024, align=4, at location [SP-2056]
   fi#2: size=4, align=4, at location [SP-4]
   fi#3: size=4, align=4, at location [SP-8]
Constant Pool:
   cp#0: -2048, align=4
   cp#1: 2048, align=4

BB#0: derived from LLVM BB %entry
     Live Ins: %R4 %LR
         tPUSH pred:14, pred:%noreg, %R4<kill>, %LR<kill>, %SP<imp-def>, %SP<imp-use>; flags: FrameSetup
         %vreg0<def> = tLDRpci <cp#0>, pred:14, pred:%noreg; flags: FrameSetup tGPR:%vreg0
         %SP<def,tied1> = tADDhirr %SP<tied0>, %vreg0<kill>, pred:14, pred:%noreg; tGPR:%vreg0
         %vreg1<def> = tLDRpci <cp#1>, pred:14, pred:%noreg; tGPR:%vreg1
         %SP<def,tied1> = tADDhirr %SP<tied0>, %vreg1<kill>, pred:14, pred:%noreg; tGPR:%vreg1
         tPOP_RET pred:14, pred:%noreg, %R4<def>, %PC<def>, %SP<imp-def>, %SP<imp-use>

# End machine code for function main.


On Thumb you can save/restore a register without having to use a spill
slot, so the scavenger won't run into problems.  But if a target had to
spill, we would end up with a register save before the SP update and a
restore after the SP update, and the RS would use the same offset in
both instructions.
I don't have a working testcase (i.e. one that demonstrates the failure)
that I can post, but if I trick the RS into believing that it has to
spill, the problem does happen.

Here's a sample result of this.  Don't mind the FixedStack-1: I
explicitly used a base offset of 0 in the code, to illustrate the lack
of adjustment in RS:

         tSTRspi %R1<kill>, %SP, 0, pred:14, pred:%noreg; mem:ST4[FixedStack-1]    <- spill to *(SP+0)
         %R1<def> = tLDRpci <cp#1>, pred:14, pred:%noreg
         %SP<def,tied1> = tADDhirr %SP<tied0>, %R1<kill>, pred:14, pred:%noreg     <- SP = something different
         %R3<def> = tLDRspi %SP, 0, pred:14, pred:%noreg; mem:LD4[FixedStack-1]
         %R1<def> = tLDRspi %SP, 0, pred:14, pred:%noreg; mem:LD4[FixedStack-1]    <- restore from *(NewSP+0)   !!


-Krzysztof



On 9/26/2013 1:24 PM, Evan Cheng wrote:
> CallFrameSetupOpcode is a pseudo opcode like X86::ADJCALLSTACKDOWN64.
> That means the code is expected to be called before the pseudo
> instructions are eliminated. I don't know why that's not the case for
> you. A quick look at the PEI code indicates the pseudos should not have
> been removed by the time replaceFrameIndices is run.
>
> Evan
>
>
> On Sep 25, 2013, at 8:57 AM, Krzysztof Parzyszek
> <kparzysz at codeaurora.org> wrote:
>
>> Hi All,
>> I'm dealing with a problem where the spill/restore instructions
>> inserted during scavenging span an adjustment of the SP/FP register.
>>  The result is that despite the base register (SP/FP) being changed
>> between the spill and the restore, both store and load use the same
>> immediate offset.
>>
>> I see code in the PEI (replaceFrameIndices) that is supposed to track
>> the SP/FP adjustment:
>>
>> ----------------------------------------
>> void PEI::replaceFrameIndices(MachineBasicBlock *BB,
>>                               MachineFunction &Fn, int &SPAdj) {
>>   const TargetMachine &TM = Fn.getTarget();
>>   assert(TM.getRegisterInfo() &&
>>          "TM::getRegisterInfo() must be implemented!");
>>   const TargetInstrInfo &TII = *Fn.getTarget().getInstrInfo();
>>   const TargetRegisterInfo &TRI = *TM.getRegisterInfo();
>>   const TargetFrameLowering *TFI = TM.getFrameLowering();
>>   bool StackGrowsDown =
>>     TFI->getStackGrowthDirection() ==
>>     TargetFrameLowering::StackGrowsDown;
>>   int FrameSetupOpcode   = TII.getCallFrameSetupOpcode();
>>   int FrameDestroyOpcode = TII.getCallFrameDestroyOpcode();
>>
>>   if (RS && !FrameIndexVirtualScavenging) RS->enterBasicBlock(BB);
>>
>>   for (MachineBasicBlock::iterator I = BB->begin(); I != BB->end(); ) {
>>
>>     if (I->getOpcode() == FrameSetupOpcode ||
>>         I->getOpcode() == FrameDestroyOpcode) {
>>       // Remember how much SP has been adjusted to create the call
>>       // frame.
>>       int Size = I->getOperand(0).getImm();
>>
>>       if ((!StackGrowsDown && I->getOpcode() == FrameSetupOpcode) ||
>>           (StackGrowsDown && I->getOpcode() == FrameDestroyOpcode))
>>         Size = -Size;
>>
>>       SPAdj += Size;
>>
>>  [...]
>> ----------------------------------------
>>
>>
>> The problem is that it expects frame-setup and frame-destroy opcodes,
>> but at the time it runs (after emitPrologue/emitEpilogue) the frame
>> setup and teardown will have been expanded into instruction sequences
>> that can be different for each target, let alone having the immediate
>> value in the 0-th operand.
>>
>> As I see it, this code won't work, although I'm not sure what the
>> original idea behind it was.  Should this code run before the
>> target-specific generation of prolog/epilog?  Even then, there need not
>> be any ADJCALLSTACKUP/DOWN instructions (if it's a leaf function).  If it
>> runs where it should, should it instead use some target-specific hook
>> that identifies the actual stack adjustment amount?
>>
>> -Krzysztof
>>
>>
>> --
>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>> hosted by The Linux Foundation
>

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
hosted by The Linux Foundation

Evan Cheng

2013-Sep-26 20:30 UTC

[LLVMdev] Register scavenger and SP/FP adjustments

The code has changed a lot over the years. It looks like the assumption was
broken at some point. calculateCallsInformation() may have already eliminated
the pseudo setup instructions.

    // If call frames are not being included as part of the stack frame, and
    // the target doesn't indicate otherwise, remove the call frame pseudos
    // here. The sub/add sp instruction pairs are still inserted, but we don't
    // need to track the SP adjustment for frame index elimination.
    if (TFI->canSimplifyCallFramePseudos(Fn))
=>    TFI->eliminateCallFramePseudoInstr(Fn, *I->getParent(), I);

Perhaps there is a bug in canSimplifyCallFramePseudos?

Evan

Krzysztof Parzyszek

2013-Sep-26 20:41 UTC

[LLVMdev] Register scavenger and SP/FP adjustments

Thanks, I'll look into that.  Still, the case where the function does
not call anything remains: in such a situation there are no
ADJCALLSTACK pseudos, so regardless of what the function you pointed to
does, there won't be any target-independent information about the SP
adjustment by the time the frame index elimination runs.

Would it make sense to have ADJCALLSTACK pseudos every time there are 
objects to be allocated on the stack (regardless of whether the function 
is a leaf or not)?  What would be the implications of that?

An alternative approach would be to never use virtual registers in frame 
setup, but I'm not sure how popular that would be.  So far I have only 
seen that in the Thumb backend.

-Krzysztof


