thr3ads.net - llvm dev - [llvm-dev] MachinePipeliner refactoring [Jul 2019]


James Molloy via llvm-dev

2019-Jul-15 10:16 UTC

[llvm-dev] MachinePipeliner refactoring

Hi Brendan (and friends of MachinePipeliner, +llvm-dev for openness),

Over the past week or so I've been attempting to extend the
MachinePipeliner to support different idioms of code generation. To make
this a bit more concrete, there are two areas where the currently generated
code could be improved depending on architecture:

1) The epilog blocks peel off the final iterations in reverse order. This
means that the overall execution of loop iterations isn't in a perfectly
pipelined order. For architectures that have hardware constructs that
insist on a first-in-first-out order (queues), the currently generated code
cannot be used.
2) For architectures (like Hexagon) that have dedicated predicate
register files, we can generate a compact representation of the loop by
predicating stages of the loop kernel independently. In this case we can
either have a prolog, epilog, or neither (wrapping the prolog and epilog
inside the kernel by using PHIs of predicates).
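
To make idiom 2 concrete, here is a minimal sketch of a kernel with neither prolog nor epilog, where each stage is guarded by its own predicate. It is an illustrative model only, not code from MachinePipeliner or D64665; the function name and the choice of running the kernel `trip_count + num_stages - 1` times are assumptions made for the example.

```python
def predicated_kernel_schedule(trip_count, num_stages):
    """Return, for each kernel trip, the (iteration, stage) pairs that do real work."""
    schedule = []
    # With no prolog or epilog, the kernel runs extra trips to fill and drain.
    for k in range(trip_count + num_stages - 1):
        active = []
        for stage in range(num_stages):
            iteration = k - stage  # source-loop iteration this stage works on
            # Per-stage predicate: only execute for iterations that exist.
            if 0 <= iteration < trip_count:
                active.append((iteration, stage))
        schedule.append(active)
    return schedule


sched = predicated_kernel_schedule(trip_count=4, num_stages=3)
for k, active in enumerate(sched):
    print(k, active)
```

In this model every stage sees iterations 0, 1, 2, ... in increasing order, which is the first-in-first-out property mentioned in item 1.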

At the moment, a lot of the code generation helper code in MachinePipeliner
is tightly fit to its current code generation strategy ("If we're in the
epilog, do this, else do this"). I'm keen to try and make some of the
complex calculations it does, such as where PHIs should come from, more
abstract so they can be reused and composed.

https://reviews.llvm.org/D64665 is my current best-effort. This generates
perfect code for PowerPC, but causes a load of problems for Hexagon. It's
become apparent that I don't know enough about some of the edge cases in
the MachinePipeliner code to refactor this from scratch. I'm therefore
looking for direction on refactoring in an incremental fashion.

I think there are a couple of areas that I'd like to change, and I'd
appreciate your ideas and opinions because I clearly don't know enough
about the edge cases here.

a) TII->reduceLoopCount() is hard to understand. Understanding the
intended semantics of this hook from the documentation, I've found, is
hard. Its use appears to be strongly fit to Hexagon (there is even a
comment about the removal of LOOP0 in the MachinePipeliner target agnostic
code, which really shouldn't be there). Why it's called multiple times I
don't understand (why can't we just call it once with the total number of
iterations to peel?).
b) Understanding how loop-carried PHIs are linked together is really
hard. There are two functions dedicated to this with many edge cases, which
are linked with the prolog and epilog schedule. It'd be great to somehow
factor these such that they are independent of the code generation
strategy. Note that this is really important for some of the code gen
strategies I mention at the beginning, because loop-carried PHIs in this
case may actually end up being selects or uses of predicated instructions.
c) There is a slight conflation of "iteration" and "stage" in the
documentation that makes it hard to follow what VRMap contains and the
invariants between functions.
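
The distinction between "iteration" and "stage" in item c, and the trip-count arithmetic behind item a, can be sketched with a textbook prolog/kernel/epilog expansion. This is an assumption-laden model rather than a description of what TII->reduceLoopCount() or the current MachinePipeliner does: the function name is hypothetical, the epilog is drained in plain pipelined order, and the sketch assumes the kernel runs at least once.

```python
def expand_pipeline(trip_count, num_stages):
    """List (block, iteration, stage) triples for a textbook pipelined loop."""
    peeled = num_stages - 1  # iterations started in the prolog
    if trip_count < num_stages:
        raise ValueError("sketch assumes trip_count >= num_stages")
    out = []
    # Prolog block p fills the pipeline: only stages 0..p are present.
    for p in range(peeled):
        for stage in range(p + 1):
            out.append((f"prolog{p}", p - stage, stage))
    # The kernel holds every stage and runs with a reduced trip count.
    for k in range(trip_count - peeled):
        for stage in range(num_stages):
            out.append(("kernel", peeled + k - stage, stage))
    # Epilog block e drains the pipeline: only stages e+1.. remain.
    for e in range(peeled):
        for stage in range(e + 1, num_stages):
            out.append((f"epilog{e}", trip_count + e - stage, stage))
    return out


for block, iteration, stage in expand_pipeline(trip_count=4, num_stages=3):
    print(f"{block:8s} iteration={iteration} stage={stage}")
```

Each block executes several stages at once, and each of those stages belongs to a different source iteration, which is why indexing a map by one while meaning the other is easy to get wrong.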

My intent in D64665 was to create two abstractions: "Stage" and "Block".
Instead of referring to stages by index (VRMap), each Stage would take a
prior Stage as input. Stages are contained inside Blocks, which handle
predecessors and successors. I feel that arranging the code generation in
this CFG-like way will make the flow of data much easier to analyze. Of
course, a Stage doesn't just depend on a prior Stage - its loop-carried
inputs could come from any other Stage (and while I think I understand how
this works, I clearly don't get all the edge cases).
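
As a rough illustration of the shape described above, the sketch below models a Stage that points at its prior Stage and a Block that owns Stages and CFG edges. The class layout, field names, and `link` helper are all hypothetical and are not taken from D64665; loop-carried inputs from arbitrary other Stages are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Stage:
    index: int                        # pipeline stage number
    prior: Optional["Stage"] = None   # Stage of the same iteration feeding this one

    def chain(self):
        """Stage indices from the first stage of the iteration up to this one."""
        node, out = self, []
        while node is not None:
            out.append(node.index)
            node = node.prior
        return out[::-1]


@dataclass(eq=False)  # identity comparison; the CFG may contain cycles
class Block:
    name: str
    stages: List[Stage] = field(default_factory=list)
    preds: List["Block"] = field(default_factory=list, repr=False)
    succs: List["Block"] = field(default_factory=list, repr=False)


def link(src, dst):
    """Add a CFG edge src -> dst."""
    src.succs.append(dst)
    dst.preds.append(src)


# Two-stage pipeline: prolog -> kernel (self-loop) -> epilog.
s0 = Stage(0)
s1 = Stage(1, prior=s0)
prolog = Block("prolog", stages=[s0])
kernel = Block("kernel", stages=[s1, Stage(0)])
epilog = Block("epilog", stages=[Stage(1, prior=kernel.stages[1])])
link(prolog, kernel)
link(kernel, kernel)
link(kernel, epilog)
print(epilog.stages[0].chain())
```

Following `prior` pointers replaces a lookup by stage index, at the cost of needing some other mechanism for inputs that cross iterations.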

What do you think of this abstraction? Do you think it's doomed to failure
because it's too simplistic to cover all the cases?

Do you have any suggestions of areas where we can start to factor out
without a large-scale code breakage? I'm finding this hard to get my teeth
into as the proposed code structure is so different from its current form.

Thanks for any opinions or suggestions!

Cheers,

James

Jinsong Ji via llvm-dev

2019-Jul-15 15:32 UTC


Hi James:

Personally, I like the idea of refactoring and more abstraction,
but unfortunately I don't know enough about the edge cases either.

BTW: the prototype is still triggering quite a few assertions on PowerPC;
some nodes are not generated in the correct order.


Best,

Jinsong Ji (纪金松), PhD.

XL/LLVM on Power Compiler Development
E-mail: jji at us.ibm.com





James Molloy via llvm-dev

2019-Jul-15 16:05 UTC


Hi Jinsong,

Thanks for testing out the prototype! I'm not surprised there are errors in
that version; it's not 100% ready yet (mainly a demonstrator for the path
I'd like to take). I'm really trying to work out if it makes sense to try
and get from the current state of the world to that prototype
incrementally, or if it's just so far away that a full-on refactor (or two
code paths) is required. I suspect only Brendan really has the context to
give insight there!

James

