thr3ads.net - llvm dev - [LLVMdev] ScheduleDAGInstrs computes deps using IR Values that may be invalid [Feb 2015]

If this information is useful, please help other people find it:
Share via:

Chad Rosier

2015-Feb-19 18:06 UTC

[LLVMdev] ScheduleDAGInstrs computes deps using IR Values that may be invalid

Hi All,
I've encountered an issue where tail merging MIs is causing a problem with
the post-RA MI scheduler dependency analysis and I'm not sure of the best
way to address the problem.

In my case, the branch folding pass (lib/CodeGen/BranchFolding.cpp) is
merging common code from BB#14 and BB#15 into BB#16.  It's clear that
there are 4 common instructions (marked with an *) in BB#14 and BB#15:

--------------------------------------------------------------------------------
BB#14: derived from LLVM BB %if.end.1
    Live Ins: %X16 %X17 %X18 %X7 %X0 %X6 %X4 %W8 %X15 %X14 %W3 %W2 %X1
%X10 %X11 %X12 %X13 %X9
    Predecessors according to CFG: BB#12
        %X5<def> = ADDXrr %X16, %X13
*       %W19<def> = LDRBBui %X5, 1;
mem:LD1[%scevgep95](tbaa=<0x6e02518>)
*       %W3<def> = MADDWrrr %W2<kill>, %W3<kill>, %WZR
*       %W2<def> = SUBWrr %W3<kill>, %W19<kill>
*       STRBBui %W2<kill>, %X5<kill>, 1;
mem:ST1[%scevgep95](tbaa=<0x6e02518>)
    Successors according to CFG: BB#16

BB#15: derived from LLVM BB %L20.1
    Live Ins: %X16 %X17 %X18 %X7 %X0 %X6 %X4 %W8 %X15 %X14 %W3 %W2 %X1
%X10 %X11 %X12 %X13 %X9
    Predecessors according to CFG: BB#12
        %X5<def> = ADDXrr %X17, %X13
*       %W19<def> = LDRBBui %X5, 1;
mem:LD1[%scevgep91](tbaa=<0x6e02518>)
*       %W3<def> = MADDWrrr %W2<kill>, %W3<kill>, %WZR
*       %W2<def> = SUBWrr %W3<kill>, %W19<kill>
*       STRBBui %W2<kill>, %X5<kill>, 1;
mem:ST1[%scevgep91](tbaa=<0x6e02518>)
    Successors according to CFG: BB#16
--------------------------------------------------------------------------------

The pass splits the edges between BB#14/BB#15 and inserts the common code
into a new common successor BB#17.  BB#17 is then merged into BB#16:

--------------------------------------------------------------------------------
BB#16: derived from LLVM BB %L30.1
    Predecessors according to CFG: BB#15 BB#14
        %W19<def> = LDRBBui %X5, 1;
mem:LD1[%scevgep91](tbaa=<0x6e02518>)
    (BB#15)
        %W3<def> = MADDWrrr %W2<kill>, %W3<kill>, %WZR
    (BB#15)
        %W2<def> = SUBWrr %W3<kill>, %W19<kill>
    (BB#15)
        STRBBui %W2<kill>, %X5<kill>, 1;
mem:ST1[%scevgep91](tbaa=<0x6e02518>) (BB#15)

        %W7<def> = LDRBBui %X7<kill>, 1;
mem:LD1[%scevgep99](tbaa=<0x6e02518>)
        %W0<def> = LDRSBWui %X0<kill>, 1;
mem:LD1[%scevgep101](tbaa=<0x6e02518>)
        %W6<def> = LDRBBui %X6<kill>, 1;
mem:LD1[%scevgep103](tbaa=<0x6e02518>)
        %W5<def> = MADDWrrr %W6<kill>, %W0<kill>,
%W7<kill>
        %X9<def> = ADDXri %X9<kill>, 2, 0
        %X13<def> = ADDXri %X13<kill>, 2, 0
        %WZR<def,dead> = SUBSWrr %W4, %W8, %NZCV<imp-def>,
%X4<imp-use,kill>
        STRBBui %W5<kill>, %X18<kill>, 1;
mem:ST1[%31](tbaa=<0x6e02518>)
        Bcc 1, <BB#9>, %NZCV<imp-use>
    Successors according to CFG: BB#13(4) BB#9(124)
--------------------------------------------------------------------------------

This transformation looks fine, IMO.  However, notice that the merged
instructions are derived from BB#15.  The memory operations in BB#15 were
accessing array 'C', while the memory operations in BB#14 were accessing
array 'B'.

The next two loads

        %W7<def> = LDRBBui %X7<kill>, 1;
mem:LD1[%scevgep99](tbaa=<0x6e02518>)
        %W0<def> = LDRSBWui %X0<kill>, 1;
mem:LD1[%scevgep101](tbaa=<0x6e02518>)

load from array 'B' and 'C', respectively.  The problem occurs
because
ScheduleDAGInstrs does not correctly identify the true dependency between
the

        STRBBui %W2<kill>, %X5<kill>, 1;
mem:ST1[%scevgep91](tbaa=<0x6e02518>)

and

        %W0<def> = LDRSBWui %X0<kill>, 1;
mem:LD1[%scevgep101](tbaa=<0x6e02518>)

instructions.  AFAICT, the problem is that the Values associated with the
MI's memory operands are used to determine the underlying objects accessed
by a memory operation (See getUnderlyingObjectsForInstr() in
ScheduleDAGInstrs.cpp).  In turn the dependency analysis relies on these
Values.

When the STRBBui is merged into BB#16 it is derived from BB#15.  Thus, the
STRBBui associated Value is 'C'.  The dependency analysis detects the
dependency on array 'C', but misses the dependency on 'B'. 
Later, the
scheduler hoists the LDRSBWui above the STRBBui and violates a true
dependency.  Boo..

Assuming my analysis is correct, the first solution that comes to mind is
to disable tail merging when we encounter a memory operation (unless they
access the same objects).  Thoughts?

I'm not particularly fond of inspecting the Values associated with MI
operands for dependency analysis, but it's what we have now..

I would like to know everyones thoughts.  Is there a more robust solution
I'm missing?  What happens if another MI-level pass performs a similar
type of merging?

 Chad

Hal Finkel

2015-Feb-19 18:23 UTC

head link

[LLVMdev] ScheduleDAGInstrs computes deps using IR Values that may be invalid

----- Original Message -----> From: "Chad Rosier" <mcrosier at codeaurora.org>
> To: llvmdev at cs.uiuc.edu
> Sent: Thursday, February 19, 2015 12:06:10 PM
> Subject: [LLVMdev] ScheduleDAGInstrs computes deps using IR Values that may
be invalid
> 
> Hi All,
> I've encountered an issue where tail merging MIs is causing a problem
> with
> the post-RA MI scheduler dependency analysis and I'm not sure of the
> best
> way to address the problem.
> 
> In my case, the branch folding pass (lib/CodeGen/BranchFolding.cpp)
> is
> merging common code from BB#14 and BB#15 into BB#16.  It's clear that
> there are 4 common instructions (marked with an *) in BB#14 and
> BB#15:
> 
>
--------------------------------------------------------------------------------
> BB#14: derived from LLVM BB %if.end.1
>     Live Ins: %X16 %X17 %X18 %X7 %X0 %X6 %X4 %W8 %X15 %X14 %W3 %W2
>     %X1
> %X10 %X11 %X12 %X13 %X9
>     Predecessors according to CFG: BB#12
>         %X5<def> = ADDXrr %X16, %X13
> *       %W19<def> = LDRBBui %X5, 1;
> mem:LD1[%scevgep95](tbaa=<0x6e02518>)
> *       %W3<def> = MADDWrrr %W2<kill>, %W3<kill>, %WZR
> *       %W2<def> = SUBWrr %W3<kill>, %W19<kill>
> *       STRBBui %W2<kill>, %X5<kill>, 1;
> mem:ST1[%scevgep95](tbaa=<0x6e02518>)
>     Successors according to CFG: BB#16
> 
> BB#15: derived from LLVM BB %L20.1
>     Live Ins: %X16 %X17 %X18 %X7 %X0 %X6 %X4 %W8 %X15 %X14 %W3 %W2
>     %X1
> %X10 %X11 %X12 %X13 %X9
>     Predecessors according to CFG: BB#12
>         %X5<def> = ADDXrr %X17, %X13
> *       %W19<def> = LDRBBui %X5, 1;
> mem:LD1[%scevgep91](tbaa=<0x6e02518>)
> *       %W3<def> = MADDWrrr %W2<kill>, %W3<kill>, %WZR
> *       %W2<def> = SUBWrr %W3<kill>, %W19<kill>
> *       STRBBui %W2<kill>, %X5<kill>, 1;
> mem:ST1[%scevgep91](tbaa=<0x6e02518>)
>     Successors according to CFG: BB#16
>
--------------------------------------------------------------------------------
> 
> The pass splits the edges between BB#14/BB#15 and inserts the common
> code
> into a new common successor BB#17.  BB#17 is then merged into BB#16:
> 
>
--------------------------------------------------------------------------------
> BB#16: derived from LLVM BB %L30.1
>     Predecessors according to CFG: BB#15 BB#14
>         %W19<def> = LDRBBui %X5, 1;
>         mem:LD1[%scevgep91](tbaa=<0x6e02518>)
>     (BB#15)
>         %W3<def> = MADDWrrr %W2<kill>, %W3<kill>, %WZR
>     (BB#15)
>         %W2<def> = SUBWrr %W3<kill>, %W19<kill>
>     (BB#15)
>         STRBBui %W2<kill>, %X5<kill>, 1;
> mem:ST1[%scevgep91](tbaa=<0x6e02518>) (BB#15)
> 
>         %W7<def> = LDRBBui %X7<kill>, 1;
> mem:LD1[%scevgep99](tbaa=<0x6e02518>)
>         %W0<def> = LDRSBWui %X0<kill>, 1;
> mem:LD1[%scevgep101](tbaa=<0x6e02518>)
>         %W6<def> = LDRBBui %X6<kill>, 1;
> mem:LD1[%scevgep103](tbaa=<0x6e02518>)
>         %W5<def> = MADDWrrr %W6<kill>, %W0<kill>,
%W7<kill>
>         %X9<def> = ADDXri %X9<kill>, 2, 0
>         %X13<def> = ADDXri %X13<kill>, 2, 0
>         %WZR<def,dead> = SUBSWrr %W4, %W8, %NZCV<imp-def>,
>         %X4<imp-use,kill>
>         STRBBui %W5<kill>, %X18<kill>, 1;
>         mem:ST1[%31](tbaa=<0x6e02518>)
>         Bcc 1, <BB#9>, %NZCV<imp-use>
>     Successors according to CFG: BB#13(4) BB#9(124)
>
--------------------------------------------------------------------------------
> 
> This transformation looks fine, IMO.  However, notice that the merged
> instructions are derived from BB#15.  The memory operations in BB#15
> were
> accessing array 'C', while the memory operations in BB#14 were
> accessing
> array 'B'.
> 
> The next two loads
> 
>         %W7<def> = LDRBBui %X7<kill>, 1;
> mem:LD1[%scevgep99](tbaa=<0x6e02518>)
>         %W0<def> = LDRSBWui %X0<kill>, 1;
> mem:LD1[%scevgep101](tbaa=<0x6e02518>)
> 
> load from array 'B' and 'C', respectively.  The problem
occurs
> because
> ScheduleDAGInstrs does not correctly identify the true dependency
> between
> the
> 
>         STRBBui %W2<kill>, %X5<kill>, 1;
> mem:ST1[%scevgep91](tbaa=<0x6e02518>)
> 
> and
> 
>         %W0<def> = LDRSBWui %X0<kill>, 1;
> mem:LD1[%scevgep101](tbaa=<0x6e02518>)
> 
> instructions.  AFAICT, the problem is that the Values associated with
> the
> MI's memory operands are used to determine the underlying objects
> accessed
> by a memory operation (See getUnderlyingObjectsForInstr() in
> ScheduleDAGInstrs.cpp).  In turn the dependency analysis relies on
> these
> Values.
> 
> When the STRBBui is merged into BB#16 it is derived from BB#15.
>  Thus, the
> STRBBui associated Value is 'C'.  The dependency analysis detects
the
> dependency on array 'C', but misses the dependency on 'B'. 
Later,
> the
> scheduler hoists the LDRSBWui above the STRBBui and violates a true
> dependency.  Boo..
> 
> Assuming my analysis is correct, the first solution that comes to
> mind is
> to disable tail merging when we encounter a memory operation (unless
> they
> access the same objects).  Thoughts?
> 
> I'm not particularly fond of inspecting the Values associated with MI
> operands for dependency analysis, but it's what we have now..
> 
> I would like to know everyones thoughts.  Is there a more robust
> solution
> I'm missing?  What happens if another MI-level pass performs a
> similar
> type of merging?
You can also remove the MMOs when you tail merge, or add multiple MMOs.
Currently the two are equivalent because we ignore all MMOs on instructions with
multiple MMOs in ScheduleDAGInstrs.cpp. It would be really nice, however, to
support multiple MMOs, so work here would certainly be appreciated.

 -Hal
> 
>  Chad
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Chad Rosier

2015-Feb-19 19:02 UTC

head link

[LLVMdev] ScheduleDAGInstrs computes deps using IR Values that may be invalid

> ----- Original Message -----
>> From: "Chad Rosier" <mcrosier at codeaurora.org>
>> To: llvmdev at cs.uiuc.edu
>> Sent: Thursday, February 19, 2015 12:06:10 PM
>> Subject: [LLVMdev] ScheduleDAGInstrs computes deps using IR Values that
>> may	be invalid
>>
>> Hi All,
>> I've encountered an issue where tail merging MIs is causing a
problem
>> with
>> the post-RA MI scheduler dependency analysis and I'm not sure of
the
>> best
>> way to address the problem.
>>
>> In my case, the branch folding pass (lib/CodeGen/BranchFolding.cpp)
>> is
>> merging common code from BB#14 and BB#15 into BB#16.  It's clear
that
>> there are 4 common instructions (marked with an *) in BB#14 and
>> BB#15:
>>
>>
--------------------------------------------------------------------------------
>> BB#14: derived from LLVM BB %if.end.1
>>     Live Ins: %X16 %X17 %X18 %X7 %X0 %X6 %X4 %W8 %X15 %X14 %W3 %W2
>>     %X1
>> %X10 %X11 %X12 %X13 %X9
>>     Predecessors according to CFG: BB#12
>>         %X5<def> = ADDXrr %X16, %X13
>> *       %W19<def> = LDRBBui %X5, 1;
>> mem:LD1[%scevgep95](tbaa=<0x6e02518>)
>> *       %W3<def> = MADDWrrr %W2<kill>, %W3<kill>,
%WZR
>> *       %W2<def> = SUBWrr %W3<kill>, %W19<kill>
>> *       STRBBui %W2<kill>, %X5<kill>, 1;
>> mem:ST1[%scevgep95](tbaa=<0x6e02518>)
>>     Successors according to CFG: BB#16
>>
>> BB#15: derived from LLVM BB %L20.1
>>     Live Ins: %X16 %X17 %X18 %X7 %X0 %X6 %X4 %W8 %X15 %X14 %W3 %W2
>>     %X1
>> %X10 %X11 %X12 %X13 %X9
>>     Predecessors according to CFG: BB#12
>>         %X5<def> = ADDXrr %X17, %X13
>> *       %W19<def> = LDRBBui %X5, 1;
>> mem:LD1[%scevgep91](tbaa=<0x6e02518>)
>> *       %W3<def> = MADDWrrr %W2<kill>, %W3<kill>,
%WZR
>> *       %W2<def> = SUBWrr %W3<kill>, %W19<kill>
>> *       STRBBui %W2<kill>, %X5<kill>, 1;
>> mem:ST1[%scevgep91](tbaa=<0x6e02518>)
>>     Successors according to CFG: BB#16
>>
--------------------------------------------------------------------------------
>>
>> The pass splits the edges between BB#14/BB#15 and inserts the common
>> code
>> into a new common successor BB#17.  BB#17 is then merged into BB#16:
>>
>>
--------------------------------------------------------------------------------
>> BB#16: derived from LLVM BB %L30.1
>>     Predecessors according to CFG: BB#15 BB#14
>>         %W19<def> = LDRBBui %X5, 1;
>>         mem:LD1[%scevgep91](tbaa=<0x6e02518>)
>>     (BB#15)
>>         %W3<def> = MADDWrrr %W2<kill>, %W3<kill>,
%WZR
>>     (BB#15)
>>         %W2<def> = SUBWrr %W3<kill>, %W19<kill>
>>     (BB#15)
>>         STRBBui %W2<kill>, %X5<kill>, 1;
>> mem:ST1[%scevgep91](tbaa=<0x6e02518>) (BB#15)
>>
>>         %W7<def> = LDRBBui %X7<kill>, 1;
>> mem:LD1[%scevgep99](tbaa=<0x6e02518>)
>>         %W0<def> = LDRSBWui %X0<kill>, 1;
>> mem:LD1[%scevgep101](tbaa=<0x6e02518>)
>>         %W6<def> = LDRBBui %X6<kill>, 1;
>> mem:LD1[%scevgep103](tbaa=<0x6e02518>)
>>         %W5<def> = MADDWrrr %W6<kill>, %W0<kill>,
%W7<kill>
>>         %X9<def> = ADDXri %X9<kill>, 2, 0
>>         %X13<def> = ADDXri %X13<kill>, 2, 0
>>         %WZR<def,dead> = SUBSWrr %W4, %W8, %NZCV<imp-def>,
>>         %X4<imp-use,kill>
>>         STRBBui %W5<kill>, %X18<kill>, 1;
>>         mem:ST1[%31](tbaa=<0x6e02518>)
>>         Bcc 1, <BB#9>, %NZCV<imp-use>
>>     Successors according to CFG: BB#13(4) BB#9(124)
>>
--------------------------------------------------------------------------------
>>
>> This transformation looks fine, IMO.  However, notice that the merged
>> instructions are derived from BB#15.  The memory operations in BB#15
>> were
>> accessing array 'C', while the memory operations in BB#14 were
>> accessing
>> array 'B'.
>>
>> The next two loads
>>
>>         %W7<def> = LDRBBui %X7<kill>, 1;
>> mem:LD1[%scevgep99](tbaa=<0x6e02518>)
>>         %W0<def> = LDRSBWui %X0<kill>, 1;
>> mem:LD1[%scevgep101](tbaa=<0x6e02518>)
>>
>> load from array 'B' and 'C', respectively.  The problem
occurs
>> because
>> ScheduleDAGInstrs does not correctly identify the true dependency
>> between
>> the
>>
>>         STRBBui %W2<kill>, %X5<kill>, 1;
>> mem:ST1[%scevgep91](tbaa=<0x6e02518>)
>>
>> and
>>
>>         %W0<def> = LDRSBWui %X0<kill>, 1;
>> mem:LD1[%scevgep101](tbaa=<0x6e02518>)
>>
>> instructions.  AFAICT, the problem is that the Values associated with
>> the
>> MI's memory operands are used to determine the underlying objects
>> accessed
>> by a memory operation (See getUnderlyingObjectsForInstr() in
>> ScheduleDAGInstrs.cpp).  In turn the dependency analysis relies on
>> these
>> Values.
>>
>> When the STRBBui is merged into BB#16 it is derived from BB#15.
>>  Thus, the
>> STRBBui associated Value is 'C'.  The dependency analysis
detects the
>> dependency on array 'C', but misses the dependency on
'B'.  Later,
>> the
>> scheduler hoists the LDRSBWui above the STRBBui and violates a true
>> dependency.  Boo..
>>
>> Assuming my analysis is correct, the first solution that comes to
>> mind is
>> to disable tail merging when we encounter a memory operation (unless
>> they
>> access the same objects).  Thoughts?
>>
>> I'm not particularly fond of inspecting the Values associated with
MI
>> operands for dependency analysis, but it's what we have now..
>>
>> I would like to know everyones thoughts.  Is there a more robust
>> solution
>> I'm missing?  What happens if another MI-level pass performs a
>> similar
>> type of merging?
>
> You can also remove the MMOs when you tail merge, or add multiple MMOs.
> Currently the two are equivalent because we ignore all MMOs on
> instructions with multiple MMOs in ScheduleDAGInstrs.cpp. It would be
> really nice, however, to support multiple MMOs, so work here would
> certainly be appreciated.
Thanks for the advice, Hal.  That sounds like a much better solution now
that I better understand how ScheduleDAGInstrs.cpp uses MMOs.
>
>  -Hal
>
>>
>>  Chad
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>

llvm dev - Feb 2015 - [LLVMdev] ScheduleDAGInstrs computes deps using IR Values that may be invalid

[LLVMdev] ScheduleDAGInstrs computes deps using IR Values that may be invalid

[LLVMdev] ScheduleDAGInstrs computes deps using IR Values that may be invalid

[LLVMdev] ScheduleDAGInstrs computes deps using IR Values that may be invalid