thr3ads.net - llvm dev - [llvm-dev] Understanding and Cleaning Up Machine Instruction Bundles [Oct 2016]

If this information is useful, please help other people find it:
Share via:

Matthias Braun via llvm-dev

2016-Oct-27 21:33 UTC

[llvm-dev] Understanding and Cleaning Up Machine Instruction Bundles

I am using machine instruction bundles [1] before register allocation. This
appears not to be too common today and I'd really like some input on the
intentions and
plans of the current system. And would like some input on clean up proposals.

[1] I am currently experimenting to use machine instruction bundles to reliably
form macroop fusion opportunities without spills, reloads, splits or further
scheduling sneaking instruction in between.

== Documentation =It seems the best existing description about bundles is Evans
original RFC:
http://lists.llvm.org/pipermail/llvm-dev/2011-December/045851.html
Are there any subsequent discussion threads that I missed?

== BUNDLE instruction / operands =For many backend passes a bundle can appear as
a single unit. However one important tool
here is having an iterator over all operands of this unit.

The original RFC indicates that to achieve this without changing a big number
of passes an additional bundle instruction is added in front of the bundle. A
copy of all register/regmask machine operands is added to this header
instruction. Internal reads inside the bundle are marked as such. This step is
called finalization of a bundle.

The system works because the default basic block iterator moves from bundle to
bundle skipping the instructions inside the bundle. Iterating over the operands
will only give us the operands of the BUNDLE instruction but that is fine,
because it basically has a copy of everything inside the bundle.

== When to finalize bundles; Remove the FinalizeMachineBundles pass? =
However there is a number of remaining questions/confusion: The RFC indicates
that the finalization step is done as a separate pass at the end of the
register allocation pipeline. In fact a FinalizeMachineBundles pass exists but
is not used by anyone. There is no in-tree target doing bundling before
register allocation, the one out of tree target I am aware of finalizes bundles
immediate after constructing them and is not using the separate pass.

In fact I am not sure why you would even wait with the finalization and do it
in a separate pass rather than doing it immediately after forming the bundle.
Using the pass today does not even work as the MachineVerifier will reject the
intermediate unfinalized state (missing internal read markers). I'd suggest
to get
rid of the pass and the idea of delegating finalization to an own pass, any
objections?

== Too many different iterators =
Another source of confusion even for experience register allocation developers
is that we have 3 kinds of iterators for MachineOperands:

  - There is MachineInstr::iterator which is used by the majority of passes and
    gives you the operands of a single instruction.
  - There is (Const)MIOperands which appears to be equivalent to
    MachineInstr::iterator. I think we do not need a 2nd iterator and should get
    rid of this one (the only real reason to use it today is
    analyze{Virt|Phys}Reg() but that can be changed).
  - There is (Const)MIBundleOperands which iterates all machine operands of all
    instructions inside a bundle.

The last one appears to be necessary in a world without the initial BUNDLE
instruction repeating all the operands inside the bundle. In a setting where
finalization happens as a separate pass at the end of register allocation this
would be necessary for earlier register allocation passes.

However given that delaying finalization to a pass appears broken/unused it
seems we could just as well use MachineInstr::iterator instead and remove
MIBundleOperands. Any objections?

== Moving to a scheme without repeating the operands in the bundle header =
I've heard some comments that the long term plan was to move to a scheme
where
the operands inside the bundle are not repeated in a bundle header and instead
everyone uses an iterator like MIBundleOperands. I could not find any mails
documenting this, so it would be nice if some people could chime in here if
that was indeed the plan.

Even with this long term plan in mind I would suggest to remove
MIBundleOperands. If we implement this plan we should rather change
MachineInstr::iterator later instead of being in the confusin in-between state
that we have today.

- Matthias

Andrew Trick via llvm-dev

2016-Oct-28 00:05 UTC

head link

[llvm-dev] Understanding and Cleaning Up Machine Instruction Bundles

> On Oct 27, 2016, at 2:33 PM, Matthias Braun <mbraun at apple.com>
wrote:
> 
> == BUNDLE instruction / operands => For many backend passes a bundle can
appear as a single unit. However one important tool
> here is having an iterator over all operands of this unit.
> 
> The original RFC indicates that to achieve this without changing a big
number
> of passes an additional bundle instruction is added in front of the bundle.
A
> copy of all register/regmask machine operands is added to this header
> instruction. Internal reads inside the bundle are marked as such. This step
is
> called finalization of a bundle.
That RFC was written before instructions had their own bundling flags.

http://lists.llvm.org/pipermail/llvm-dev/2011-December/045906.html
> The system works because the default basic block iterator moves from bundle
to
> bundle skipping the instructions inside the bundle. Iterating over the
operands
> will only give us the operands of the BUNDLE instruction but that is fine,
> because it basically has a copy of everything inside the bundle.
The BUNDLE instruction simply isn’t necessary to do anything you just described.
> == When to finalize bundles; Remove the FinalizeMachineBundles pass? => 
> However there is a number of remaining questions/confusion: The RFC
indicates
> that the finalization step is done as a separate pass at the end of the
> register allocation pipeline. In fact a FinalizeMachineBundles pass exists
but
> is not used by anyone. There is no in-tree target doing bundling before
> register allocation, the one out of tree target I am aware of finalizes
bundles
> immediate after constructing them and is not using the separate pass.
That is a 4-5 year old bootstrapping pass to defer updating post-RA passes to
use newer bundle operand iterator.
> In fact I am not sure why you would even wait with the finalization and do
it
> in a separate pass rather than doing it immediately after forming the
bundle.
> Using the pass today does not even work as the MachineVerifier will reject
the
> intermediate unfinalized state (missing internal read markers). I'd
suggest to get
> rid of the pass and the idea of delegating finalization to an own pass, any
objections?
Adding a BUNDLE instruction and duplicating operands doesn’t make sense in the
presence of virtual registers and live intervals.

The questions is not “why do we wait to insert BUNDLEs?”

The question is “Why do we ever insert BUNDLEs:.
> == Too many different iterators => 
> Another source of confusion even for experience register allocation
developers
> is that we have 3 kinds of iterators for MachineOperands:
> 
>  - There is MachineInstr::iterator which is used by the majority of passes
and
>    gives you the operands of a single instruction.
>  - There is (Const)MIOperands which appears to be equivalent to
>    MachineInstr::iterator. I think we do not need a 2nd iterator and should
get
>    rid of this one (the only real reason to use it today is
>    analyze{Virt|Phys}Reg() but that can be changed).
>  - There is (Const)MIBundleOperands which iterates all machine operands of
all
>    instructions inside a bundle.
A pass needs to know whether it’s cares about bundles or instructions.
I don’t understand how adding an extra BUNDLE instruction does anything to solve
this problem or make the MIR more robust.

A pass that cares about liveness, dependencies, instruction insertion or
reordering needs to work on bundles.
Machine-independent passes should probably work on bundles.

By default, passes now use the bundle iterator for instructions and non-bundle
iterator for operands. That allows passes to limp along in the presence of
bundles without actually handling the bundles. I think the bundles will just
silently defeat optimizations. It’s not safe, but it’s not too badly broken
either.

The MIBundleOperands iterator simply makes more sense to me than the BUNDLE
instruction. It seems straightforward to migrate passes to the new iterator, but
it’s a lot of places that need updating.
> The last one appears to be necessary in a world without the initial BUNDLE
> instruction repeating all the operands inside the bundle. In a setting
where
> finalization happens as a separate pass at the end of register allocation
this
> would be necessary for earlier register allocation passes.
> 
> However given that delaying finalization to a pass appears broken/unused it
> seems we could just as well use MachineInstr::iterator instead and remove
> MIBundleOperands. Any objections?
IIUC, live intervals, the register allocator, and the scheduler already handle
bundles.

I’m fairly sure that adding new vreg uses is not what we want to do.
> == Moving to a scheme without repeating the operands in the bundle header
=>
> I've heard some comments that the long term plan was to move to a
scheme where
> the operands inside the bundle are not repeated in a bundle header and
instead
> everyone uses an iterator like MIBundleOperands. I could not find any mails
> documenting this, so it would be nice if some people could chime in here if
> that was indeed the plan.
> 
> Even with this long term plan in mind I would suggest to remove
> MIBundleOperands. If we implement this plan we should rather change
> MachineInstr::iterator later instead of being in the confusin in-between
state
> that we have today.
> 
> - Matthias
I’m not sure what you mean by changing MachineInstr::iterator. You mean
mop_iterator?

You can’t replace an instr iterator with a bundle iterator without breaking some
basic invariants:
MI == MI->operands_begin()->getParent()

That’s why passes should explicitly ask for the bundle operands.

-Andy

Krzysztof Parzyszek via llvm-dev

2016-Oct-28 00:23 UTC

head link

[llvm-dev] Understanding and Cleaning Up Machine Instruction Bundles

On 10/27/2016 4:33 PM, Matthias Braun via llvm-dev
wrote:>
> In fact I am not sure why you would even wait with the finalization and do
it
> in a separate pass rather than doing it immediately after forming the
bundle.
> Using the pass today does not even work as the MachineVerifier will reject
the
> intermediate unfinalized state (missing internal read markers). I'd
suggest to get
> rid of the pass and the idea of delegating finalization to an own pass, any
objections?
No objections. Also, the existing finalizeBundle function works on 
Hexagon mostly by luck. Hexagon's packet semantics is "mostly"
parallel.
In general all registers are read before any register updates take 
place. The current function works by scanning the bundle sequentially. 
That doesn't cause problems for us, mostly because there is nothing that 
cares enough about the BUNDLE operands.

Actually, in our local branch we add immediate operands to the BUNDLE 
instruction to mark certain properties of the bundle as a whole that are 
of interest to us.  While we can't upstream that code at the moment, we 
would really like to retain that form of use.

> However given that delaying finalization to a pass appears broken/unused it
> seems we could just as well use MachineInstr::iterator instead and remove
> MIBundleOperands. Any objections?
No.

> == Moving to a scheme without repeating the operands in the bundle header
=>
> I've heard some comments that the long term plan was to move to a
scheme where
> the operands inside the bundle are not repeated in a bundle header and
instead
> everyone uses an iterator like MIBundleOperands. I could not find any mails
> documenting this, so it would be nice if some people could chime in here if
> that was indeed the plan.
That sounds wrong. To reiterate, a Hexagon's packet is not equivalent to 
a sequence of individual instructions. Packetization on Hexagon can be 
considered irreversible. This may not be the case on other 
architectures.  Iterating over operands within a bundle without 
additional information seems random at best.


-Krzysztof

Matthias Braun via llvm-dev

2016-Oct-28 00:30 UTC

head link

[llvm-dev] Understanding and Cleaning Up Machine Instruction Bundles

> On Oct 27, 2016, at 5:05 PM, Andrew Trick <atrick at apple.com>
wrote:
> 
> 
>> On Oct 27, 2016, at 2:33 PM, Matthias Braun <mbraun at apple.com>
wrote:
>> 
>> == BUNDLE instruction / operands =>> For many backend passes a
bundle can appear as a single unit. However one important tool
>> here is having an iterator over all operands of this unit.
>> 
>> The original RFC indicates that to achieve this without changing a big
number
>> of passes an additional bundle instruction is added in front of the
bundle. A
>> copy of all register/regmask machine operands is added to this header
>> instruction. Internal reads inside the bundle are marked as such. This
step is
>> called finalization of a bundle.
> 
> That RFC was written before instructions had their own bundling flags.
> 
> http://lists.llvm.org/pipermail/llvm-dev/2011-December/045906.html
<http://lists.llvm.org/pipermail/llvm-dev/2011-December/045906.html>
This appears to be from the same thread that proposes the flags and the bundling
scheme I linked to above. However I don't see any indication as to how the
finalization should be done or whether or not to use a BUNDLE/header
instruction.
> 
>> The system works because the default basic block iterator moves from
bundle to
>> bundle skipping the instructions inside the bundle. Iterating over the
operands
>> will only give us the operands of the BUNDLE instruction but that is
fine,
>> because it basically has a copy of everything inside the bundle.
> 
> The BUNDLE instruction simply isn’t necessary to do anything you just
described.
That may all be true. However I'd like to point out that this is the status
quo! finalizeBundle() will give you the BUNDLE instruction in the header and it
is used by everyone using bundles: ARM, and AMDGPU target and the DFAPacketizer
(which is used by Hexagon).

Not using BUNDLE and correctly using MIBundleOperands at the right places in the
register allocator is not where the code is today! I believe that we are far
enough away from it that we should rather fix the status quo first to avoid all
the confusion and then move forward to the header-less scheme in a targetted
change. That is why I added the last paragraph in my mail.
> 
>> == When to finalize bundles; Remove the FinalizeMachineBundles pass?
=>>
>> However there is a number of remaining questions/confusion: The RFC
indicates
>> that the finalization step is done as a separate pass at the end of the
>> register allocation pipeline. In fact a FinalizeMachineBundles pass
exists but
>> is not used by anyone. There is no in-tree target doing bundling before
>> register allocation, the one out of tree target I am aware of finalizes
bundles
>> immediate after constructing them and is not using the separate pass.
> 
> That is a 4-5 year old bootstrapping pass to defer updating post-RA passes
to use newer bundle operand iterator.
> 
>> In fact I am not sure why you would even wait with the finalization and
do it
>> in a separate pass rather than doing it immediately after forming the
bundle.
>> Using the pass today does not even work as the MachineVerifier will
reject the
>> intermediate unfinalized state (missing internal read markers). I'd
suggest to get
>> rid of the pass and the idea of delegating finalization to an own pass,
any objections?
> 
> Adding a BUNDLE instruction and duplicating operands doesn’t make sense in
the presence of virtual registers and live intervals.
> 
> The questions is not “why do we wait to insert BUNDLEs?”
> 
> The question is “Why do we ever insert BUNDLEs:.see above.
> 
>> == Too many different iterators =>> 
>> Another source of confusion even for experience register allocation
developers
>> is that we have 3 kinds of iterators for MachineOperands:
>> 
>> - There is MachineInstr::iterator which is used by the majority of
passes and
>>   gives you the operands of a single instruction.
>> - There is (Const)MIOperands which appears to be equivalent to
>>   MachineInstr::iterator. I think we do not need a 2nd iterator and
should get
>>   rid of this one (the only real reason to use it today is
>>   analyze{Virt|Phys}Reg() but that can be changed).
>> - There is (Const)MIBundleOperands which iterates all machine operands
of all
>>   instructions inside a bundle.
> 
> A pass needs to know whether it’s cares about bundles or instructions.
> I don’t understand how adding an extra BUNDLE instruction does anything to
solve this problem or make the MIR more robust.
> 
> A pass that cares about liveness, dependencies, instruction insertion or
reordering needs to work on bundles.
> Machine-independent passes should probably work on bundles.
> 
> By default, passes now use the bundle iterator for instructions and
non-bundle iterator for operands. That allows passes to limp along in the
presence of bundles without actually handling the bundles. I think the bundles
will just silently defeat optimizations. It’s not safe, but it’s not too badly
broken either.
> 
> The MIBundleOperands iterator simply makes more sense to me than the BUNDLE
instruction. It seems straightforward to migrate passes to the new iterator, but
it’s a lot of places that need updating.I am convinced that as soon as we decide for a scheme with or without BUNDLE
instruction we should remove all but one iterator (or at least write a long
comment on the other iterator why you should not use it in most situation).
Whatever the result it should use a C++ style iterator so at the very least
MIBundleOperators needs to be rewritten for that.
> 
>> The last one appears to be necessary in a world without the initial
BUNDLE
>> instruction repeating all the operands inside the bundle. In a setting
where
>> finalization happens as a separate pass at the end of register
allocation this
>> would be necessary for earlier register allocation passes.
>> 
>> However given that delaying finalization to a pass appears
broken/unused it
>> seems we could just as well use MachineInstr::iterator instead and
remove
>> MIBundleOperands. Any objections?
> 
> IIUC, live intervals, the register allocator, and the scheduler already
handle bundles.
> 
> I’m fairly sure that adding new vreg uses is not what we want to do.The code looks like it can handle it, but as I said above it is not exercised by
any of the existing targets and I can show you some places where uses of
MachineInstr::iterator sneaked in even in the regalloc passes which would be
invalid in the BUNDLE-header-less scheme.
> 
>> == Moving to a scheme without repeating the operands in the bundle
header =>>
>> I've heard some comments that the long term plan was to move to a
scheme where
>> the operands inside the bundle are not repeated in a bundle header and
instead
>> everyone uses an iterator like MIBundleOperands. I could not find any
mails
>> documenting this, so it would be nice if some people could chime in
here if
>> that was indeed the plan.
>> 
>> Even with this long term plan in mind I would suggest to remove
>> MIBundleOperands. If we implement this plan we should rather change
>> MachineInstr::iterator later instead of being in the confusin
in-between state
>> that we have today.
>> 
>> - Matthias
> 
> I’m not sure what you mean by changing MachineInstr::iterator. You mean
mop_iterator?Oh sorry, I was talking about mop_iterator indeed. The thing you get when you
use
    `for (MachineOperand &MO : someinstruction.operands()) { ... }` which is
the standard for the majority of codegen passes today.
> 
> You can’t replace an instr iterator with a bundle iterator without breaking
some basic invariants:
> MI == MI->operands_begin()->getParent()
> 
> That’s why passes should explicitly ask for the bundle operands.If we move to a BUNDLE-less world then the majority of passes will need
something like MIBundleOperands, in that case we really should replace
MachineInstr::iterator anyway, make the typical use the most convenient one and
adapt the passes to not expect those invariants. But again I consider this a
change of the status quo, not something we already do or just need to fix in a
handful of places.

- Matthias
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161027/87eca31d/attachment.html>

Matthias Braun via llvm-dev

2016-Oct-28 00:34 UTC

head link

[llvm-dev] Understanding and Cleaning Up Machine Instruction Bundles

> On Oct 27, 2016, at 5:23 PM, Krzysztof Parzyszek via llvm-dev <llvm-dev
at lists.llvm.org> wrote:
> 
> On 10/27/2016 4:33 PM, Matthias Braun via llvm-dev wrote:
>> 
>> In fact I am not sure why you would even wait with the finalization and
do it
>> in a separate pass rather than doing it immediately after forming the
bundle.
>> Using the pass today does not even work as the MachineVerifier will
reject the
>> intermediate unfinalized state (missing internal read markers). I'd
suggest to get
>> rid of the pass and the idea of delegating finalization to an own pass,
any objections?
> 
> No objections. Also, the existing finalizeBundle function works on Hexagon
mostly by luck. Hexagon's packet semantics is "mostly" parallel.
In general all registers are read before any register updates take place. The
current function works by scanning the bundle sequentially. That doesn't
cause problems for us, mostly because there is nothing that cares enough about
the BUNDLE operands.
> 
> Actually, in our local branch we add immediate operands to the BUNDLE
instruction to mark certain properties of the bundle as a whole that are of
interest to us.  While we can't upstream that code at the moment, we would
really like to retain that form of use.
> 
> 
>> However given that delaying finalization to a pass appears
broken/unused it
>> seems we could just as well use MachineInstr::iterator instead and
remove
>> MIBundleOperands. Any objections?
> 
> No.
> 
> 
>> == Moving to a scheme without repeating the operands in the bundle
header =>>
>> I've heard some comments that the long term plan was to move to a
scheme where
>> the operands inside the bundle are not repeated in a bundle header and
instead
>> everyone uses an iterator like MIBundleOperands. I could not find any
mails
>> documenting this, so it would be nice if some people could chime in
here if
>> that was indeed the plan.
> 
> That sounds wrong. To reiterate, a Hexagon's packet is not equivalent
to a sequence of individual instructions. Packetization on Hexagon can be
considered irreversible. This may not be the case on other architectures. 
Iterating over operands within a bundle without additional information seems
random at best.Just to clarify: No matter the proposal/style the whole bundle is considered a
single instruction in terms of liveness / identifying a position in the program.
We would not make any assumptions about how the bundle is executed/behaves
internally.
The iterator over all operands inside the bundle would just as well give you the
union of all defs and use operands inside the bundle (ignoring internal read of
course) in a similar fashion to the operands you have on the BUNDLE instruction
today.

- Matthias

Possibly Parallel Threads

Search for more reasonably related threads

llvm dev - Oct 2016 - Understanding and Cleaning Up Machine Instruction Bundles

[llvm-dev] Understanding and Cleaning Up Machine Instruction Bundles

[llvm-dev] Understanding and Cleaning Up Machine Instruction Bundles

[llvm-dev] Understanding and Cleaning Up Machine Instruction Bundles

[llvm-dev] Understanding and Cleaning Up Machine Instruction Bundles

[llvm-dev] Understanding and Cleaning Up Machine Instruction Bundles

Possibly Parallel Threads