thr3ads.net - llvm dev - [llvm-dev] Enforcing in post-RA scheduling to keep (two) MachineInstrs together [Feb 2017]

Alex Susu via llvm-dev

2017-Feb-10 20:52 UTC

[llvm-dev] Enforcing in post-RA scheduling to keep (two) MachineInstrs together

Hello.
     I am using the post-RA (Register Allocation) scheduler to avoid data
hazards by inserting other USEFUL instructions from the program (besides
NOPs), and it breaks apart some sequences of instructions which should remain
"glued" together.
     More exactly, in my [Target]ISelDAGToDAG.cpp it is possible that I
replace, for example, a BUILD_VECTOR with a machine SDNode called
VLOAD_D_WO_IMM and an INLINEASM, the latter having a simple dataflow
dependence (black solid edge when outputting the DAG as a .DOT after
instruction selection) on the result of the former instruction. (I can present
the .DOT after instruction selection obtained with llc -view-sched-dags).
     When I run the default pre-RA scheduler (which seems to be a "List
Scheduling" algorithm) I always obtain generated ASM code where the string of
the INLINEASM follows immediately after the associated asm instruction for the
VLOAD_D_WO_IMM. But when I also use the post-RA scheduler
(llc -post-RA-scheduler ...) I get some different instructions inserted
between the VLOAD_D_WO_IMM and the INLINEASM, which is not semantically
correct.

     How can I avoid these 2 instructions being separated by the post-RA
scheduler? Can I customize the behavior of the post-RA scheduler (I found some
documentation at
http://llvm.org/docs/doxygen/html/PostRASchedulerList_8cpp.html)?

     The first natural idea was to use SelectionDAG glue edges, but I noticed
that they are not very reliable (sometimes I even have difficulty creating
them, for example in the classes [Target]ISelDAGToDAG and
[Target]ISelLowering). I also understood that the scheduler can disregard the
glue edges between SelectionDAG nodes anyhow. For example:
         - from http://lists.llvm.org/pipermail/llvm-dev/2014-June/074046.html
             <<You can't Glue the two nodes together forever. All Glue really
             does is keep them together long enough for LLVM to put together a
             data dependency through "Uses" and "Defs" implicit operands. Once
             the MachineInstrs have been created, the two instructions are at
             the whim of the scheduler as much as any others.
             If you really need them to remain together, you have to either
             create a pseudo-instruction and expand it extremely late, or
             create a bundle (depending on what's natural for your target).>>
         - from http://lists.llvm.org/pipermail/llvm-dev/2016-June/100885.html:
             <<If you want to have these nodes stick together, using glue may
             not be sufficient.  After the machine instructions are generated,
             the scheduler may place instructions between the interrupt
             disable/restore and the atomic load itself.  Also, the register
             allocator may insert some spills there---there are ways that this
             sequence may get separated.
             For this, the best approach may be to define a
             pseudo-instruction, which will be expanded into real instruction
             in the post-RA expansion pass.>>

     Also, I don't want to use MachineInstr bundles or pseudo-instructions.
MachineInstr bundles seem too difficult to use and come too late in code
generation (I prefer working at the level of instruction selection). Also, I
found little information about pseudo-instructions: there is some API support,
namely expandPostRAPseudo(), described at
http://llvm.org/docs/doxygen/html/classllvm_1_1TargetInstrInfo.html, and some
documentation at
http://llvm.org/devmtg/2014-04/PDFs/Talks/Building%20an%20LLVM%20backend.pdf,
slide 55 (and 53, 54).

    Please let me know if I can customize the post-RA scheduler so that my two
SDNodes, which were created "together", are not scheduled in non-consecutive
cycles, or if you recommend a different approach.

   Thank you very much,
     Alex

Matthias Braun via llvm-dev

2017-Feb-10 21:26 UTC


> On Feb 10, 2017, at 12:52 PM, Alex Susu via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>    How can I avoid these 2 instructions being separated by the post-RA
> scheduler? Can I customize the behavior of the post-RA scheduler (I found
> some documentation at
> http://llvm.org/docs/doxygen/html/PostRASchedulerList_8cpp.html)?
>
>    Also, I don't want to use MachineInstr bundles or pseudo-instructions.

Well if it is two instructions, then there is always a chance that some pass
moves them around or inserts new instructions in between (esp. regalloc may
insert spills/reloads/copies). The only guaranteed solution is indeed to use a
pseudo instruction or an instruction bundle so the instructions look like a
single unit to codegen.
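
To illustrate the "single unit" idea in the paragraph above, here is a minimal sketch in plain C++. It is a toy model, not LLVM code: ToyInstr, kPseudo (with the made-up opcode name VLOAD_WITH_ASM_PSEUDO), expandPseudos and pairsAreAdjacent are all hypothetical names invented for this example. In a real backend the expansion step would presumably live in the target's expandPostRAPseudo() hook mentioned in the original question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model, not LLVM code: an "instruction" is just an opcode name.
struct ToyInstr {
  std::string Opcode;
};

// Hypothetical pseudo opcode standing for the VLOAD_D_WO_IMM + INLINEASM pair.
const std::string kPseudo = "VLOAD_WITH_ASM_PSEUDO";

// Late expansion: replace each pseudo with its two real instructions, in
// order. Passes that ran earlier only ever saw one instruction, so they
// could move it but had no way to put anything between the two halves.
std::vector<ToyInstr> expandPseudos(const std::vector<ToyInstr> &Block) {
  std::vector<ToyInstr> Out;
  for (const ToyInstr &I : Block) {
    if (I.Opcode == kPseudo) {
      Out.push_back({"VLOAD_D_WO_IMM"});
      Out.push_back({"INLINEASM"});
    } else {
      Out.push_back(I);
    }
  }
  assert(Out.size() >= Block.size()); // expansion never drops instructions
  return Out;
}

// True if every VLOAD_D_WO_IMM is immediately followed by an INLINEASM.
bool pairsAreAdjacent(const std::vector<ToyInstr> &Block) {
  for (std::size_t i = 0; i < Block.size(); ++i)
    if (Block[i].Opcode == "VLOAD_D_WO_IMM" &&
        (i + 1 == Block.size() || Block[i + 1].Opcode != "INLINEASM"))
      return false;
  return true;
}
```

In this model, any reordering of a block that still contains the pseudo yields an expanded block for which pairsAreAdjacent() holds, whereas a block that already contains the two real instructions offers no such guarantee once something reorders it.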

That said, if you use the PostMachineScheduler you can insert a schedule DAG
mutation in createPostMachineScheduler() that adds a cluster edge between the
two nodes so the scheduler tries hard to keep them together. Unfortunately this
doesn't always work today because the schedule model is always checked for
stalls first (Pending vs. Available lists in the MachineScheduler) before the
scheduler even checks its usual cost function with the cluster heuristic.
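
The limitation described above can be sketched with a toy list scheduler in plain C++. This is only an illustration under simplifying assumptions, not the MachineScheduler implementation: ToyNode, its ReadyCycle and ClusterWith fields, and toySchedule are hypothetical names, stalls are modelled as a fixed earliest issue cycle per node, and one node issues per cycle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only; none of these types are LLVM's.
struct ToyNode {
  int ReadyCycle;  // earliest cycle at which the node issues without a stall
  int ClusterWith; // index of the node it should directly follow, or -1
};

// Returns the issue order as node indices.
std::vector<int> toySchedule(const std::vector<ToyNode> &Nodes) {
  const int N = static_cast<int>(Nodes.size());
  std::vector<bool> Done(Nodes.size(), false);
  std::vector<int> Order;
  int CurCycle = 0;
  int Last = -1; // most recently issued node
  while (static_cast<int>(Order.size()) < N) {
    // Step 1: stall check. Nodes that would stall stay "pending" and are
    // not even considered by the cost function below.
    std::vector<int> Available;
    for (int i = 0; i < N; ++i) {
      assert(Nodes[i].ClusterWith < N);
      if (!Done[i] && Nodes[i].ReadyCycle <= CurCycle)
        Available.push_back(i);
    }
    if (Available.empty()) { // nothing can issue: advance time
      ++CurCycle;
      continue;
    }
    // Step 2: cost function. Prefer a node clustered with the last pick,
    // otherwise fall back to the first available node.
    int Pick = Available.front();
    for (int i : Available) {
      if (Last >= 0 && Nodes[i].ClusterWith == Last) {
        Pick = i;
        break;
      }
    }
    Done[Pick] = true;
    Order.push_back(Pick);
    Last = Pick;
    ++CurCycle;
  }
  return Order;
}
```

With nodes {0: load, 1: asm clustered with 0, 2: unrelated}, the pair stays together when node 1 is ready in the cycle right after the load, but if node 1 only becomes ready one cycle later, the unrelated node 2 is the only available candidate and gets issued in between, even though the cluster edge is present.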

- Matthias

Krzysztof Parzyszek via llvm-dev

2017-Feb-10 21:36 UTC


On 2/10/2017 3:26 PM, Matthias Braun via llvm-dev wrote:
> That said, if you use the PostMachineScheduler you can insert a schedule
> DAG mutation in createPostMachineScheduler() that adds a cluster edge
> between the two nodes so the scheduler tries hard to keep them together.
> Unfortunately this doesn't always work today because the schedule model is
> always checked for stalls first (Pending vs. Available lists in the
> MachineScheduler) before the scheduler even checks its usual cost function
> with the cluster heuristic.

You can do that with the regular post-RA scheduler as well via
"TargetSubtargetInfo::getPostRAMutations".

-Krzysztof

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
hosted by The Linux Foundation

Alex Susu via llvm-dev

2017-Feb-13 03:02 UTC


Hello.
     After looking at the debug information from llc, it seems it is actually
the pre-RA scheduler (NOT the post-RA scheduler) that is breaking my INLINEASM
SDNodes apart from the "associated" instructions in my program (there is a
simple dataflow edge between the INLINEASM and the associated node).

     Is it possible to generate instruction bundles (or pseudo-instructions)
in the pre-RA scheduler pass? At
http://llvm.org/docs/CodeGenerator.html#machineinstr-bundles it is written
that: "Packing / bundling of MachineInstr’s should be done as part of the
register allocation super-pass.", etc.

     Matthias, thank you for pointing out that at least the register allocator
can move my 2 instructions around, but note that a MachineSDNode with one
destination register and an immediate value and a consecutive INLINEASM (which
has no register) should NOT be separated by the register allocator. What other
passes from llc (llc -O3) do you believe could separate my 2 instructions?

     I will read about mutations in the documentation (for example,
http://llvm.org/docs/doxygen/html/classllvm_1_1ScheduleDAGMI.html and
http://llvm.org/docs/doxygen/html/MachineScheduler_8h_source.html).

   Thank you,
     Alex


