thr3ads.net - llvm dev - [llvm-dev] [RFC] Porting MachinePipeliner to AArch64+SVE [Jun 2018]

Masaki Arai via llvm-dev

2018-Jun-08 14:11 UTC

[llvm-dev] [RFC] Porting MachinePipeliner to AArch64+SVE

Hi,

I am extending LLVM for HPC applications.
As part of that work, I am trying to make MachinePipeliner available in
the AArch64 + Scalable Vector Extension (SVE) environment.

MachinePipeliner is currently used only by the Hexagon CPU.
Since the implementation is very portable, I think that it will work
on many CPUs with only a little additional code (see Code [2]).

The current MachinePipeliner is written on the premise that
DFAPacketizer is used for resource management.
However, I'd like to use MachinePipeliner in a way that does not use
DFAPacketizer for the reasons described below(*).
Only a small part of the MachinePipeliner implementation depends on
DFAPacketizer or instruction itineraries.
Therefore, I think that one of the following implementations is
possible:

(a) creating a path in MachinePipeliner that does not use DFAPacketizer
(b) making MachinePipeliner inheritable so that anyone can write code
    that does not use DFAPacketizer

Since implementations using only Instruction itineraries without
DFAPacketizer are possible, I don't think that I can use
TargetSchedModel::hasInstrItineraries to select the execution path.
Personally, I think that implementation of (b) is better.
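
A minimal sketch of what option (b) could look like, assuming the
scheduler queries resources only through a small virtual interface.
Every name here (ResourceModel, IssueWidthModel, placeAtOrAfter) is
hypothetical and does not exist in LLVM; the issue-width counter is only
a stand-in for whatever a target would really model.

```cpp
#include <cassert>
#include <vector>

// Hypothetical interface; none of these names exist in LLVM.
// A pipeliner written against it would not need to know whether a DFA,
// itineraries, or something else answers the resource queries.
class ResourceModel {
public:
  virtual ~ResourceModel() = default;
  // True if an instruction with this opcode still fits in `cycle`.
  virtual bool canReserve(unsigned opcode, int cycle) const = 0;
  // Records that the instruction was placed in `cycle`.
  virtual void reserve(unsigned opcode, int cycle) = 0;
};

// Example override that uses neither DFAPacketizer nor itineraries:
// it only counts issue slots per cycle, modulo the initiation interval.
class IssueWidthModel : public ResourceModel {
public:
  IssueWidthModel(int ii, int width) : II(ii), Width(width), Used(ii, 0) {
    assert(ii > 0 && width > 0);
  }
  bool canReserve(unsigned, int cycle) const override {
    return Used[slot(cycle)] < Width;
  }
  void reserve(unsigned, int cycle) override { ++Used[slot(cycle)]; }

private:
  // Maps any cycle (including negative ones) into [0, II).
  int slot(int cycle) const { return ((cycle % II) + II) % II; }
  int II;
  int Width;
  std::vector<int> Used;
};

// The scheduling loop sees only the base class. It tries II consecutive
// cycles starting at `earliest` and returns the chosen cycle, or -1.
int placeAtOrAfter(ResourceModel &rm, unsigned opcode, int earliest, int ii) {
  for (int c = earliest; c < earliest + ii; ++c) {
    if (rm.canReserve(opcode, c)) {
      rm.reserve(opcode, c);
      return c;
    }
  }
  return -1;
}
```

With an initiation interval of 2 and one issue slot per cycle, two
instructions fit (in cycles 0 and 1) and a third is rejected.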

Also, if predicated instructions such as those of SVE are available,
prologue and epilogue code generation using predicated execution, as
shown in reference [1], may be possible.
In this case, if we choose the implementation of (b) and it is
possible to override SwingSchedulerDAG::generatePipelinedLoop, I think
that it can easily be extended.
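
To illustrate the idea behind predicated prologue/epilogue generation,
here is a small simulation of a "kernel-only" schedule. It is a sketch
under simplifying assumptions (a fixed trip count, one operation per
stage) and is not how MachinePipeliner generates code; runKernelOnly and
stagePredicate are made-up names.

```cpp
#include <cassert>
#include <vector>

// Simulates a software-pipelined loop that has no separate prologue or
// epilogue. The kernel runs numIters + numStages - 1 times, and stage s
// is guarded by a predicate that is true only while it holds a valid
// source iteration. Returns, per source iteration, the stages it
// executed, in order.
std::vector<std::vector<int>> runKernelOnly(int numIters, int numStages) {
  assert(numIters >= 0 && numStages > 0);
  std::vector<std::vector<int>> trace(numIters);
  for (int k = 0; k < numIters + numStages - 1; ++k) {
    for (int s = 0; s < numStages; ++s) {
      int iter = k - s; // source iteration currently in stage s
      bool stagePredicate = iter >= 0 && iter < numIters;
      if (stagePredicate)
        trace[iter].push_back(s); // the stage's instructions execute
      // Otherwise the stage is predicated off: this is the ramp-up
      // (prologue) or ramp-down (epilogue) part of the pipeline.
    }
  }
  return trace;
}
```

For 4 iterations and 3 stages the kernel runs 6 times, and every source
iteration passes through stages 0, 1, 2 exactly once.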

Comments or suggestions are welcome.

Thank you very much.

Best regards,
--
--------------------------------------
Masaki Arai

=======================================
(*) Currently, many CPU scheduling models are defined without
instruction itineraries.
They therefore take form 1 or 2 of the categories described in
TargetSchedule.td:

// The SchedMachineModel is defined by subtargets for three categories of data:
// 1. Basic properties for coarse grained instruction cost model.
// 2. Scheduler Read/Write resources for simple per-opcode cost model.
// 3. Instruction itineraries for detailed reservation tables.

By making MachinePipeliner work even in a form not using Instruction
itineraries, we will be able to run MachinePipeliner's execution test
on various machines, even if we do not use it on those machines.

Instruction itineraries essentially express the following
correspondence:

  opcode ==> {FU1, FU2, ...}

and DFAPacketizer uses a DFA keyed on opcodes.
In order to schedule predicated instructions like those of SVE strictly,
we need to take into account that the following two instructions use
pipeline resources mutually exclusively in the same cycle:

  MI1 if P ==> {FU1, FU2, ...}
  MI2 if Q ==> {FU1, FU2, ...}

where predicates P and Q satisfy P == not Q.
However, I don't think that the current DFAPacketizer can represent
this situation.
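
The scenario above can be sketched with a predicate-aware modulo
reservation table. This only illustrates the idea and is not taken from
the patch: Guard, mutuallyExclusive and PredicatedMRT are hypothetical
names, and a predicate is assumed to be describable as a register number
plus a polarity.

```cpp
#include <cassert>
#include <vector>

// Hypothetical guard: predicate register number plus polarity.
// reg == 0 is used here to mean "unpredicated".
struct Guard {
  int reg;
  bool negated;
};

// True when the two guards can never both be true (P and not P).
bool mutuallyExclusive(Guard a, Guard b) {
  return a.reg != 0 && a.reg == b.reg && a.negated != b.negated;
}

// Reservation table indexed by (cycle mod II, functional unit). Each
// entry keeps the guards of the instructions already placed there.
class PredicatedMRT {
public:
  PredicatedMRT(int ii, int numFUs)
      : II(ii), Table(ii, std::vector<std::vector<Guard>>(numFUs)) {
    assert(ii > 0 && numFUs > 0);
  }

  // Places an instruction guarded by g on unit fu in the given cycle
  // (cycle >= 0), unless it could execute together with an occupant.
  bool tryReserve(int cycle, int fu, Guard g) {
    std::vector<Guard> &occupants = Table[cycle % II][fu];
    for (Guard o : occupants)
      if (!mutuallyExclusive(o, g))
        return false; // both might execute: a real resource conflict
    occupants.push_back(g);
    return true;
  }

private:
  int II;
  std::vector<std::vector<std::vector<Guard>>> Table;
};
```

With II = 2 and one unit, an instruction guarded by P in cycle 0 and one
guarded by not P in cycle 2 (the same slot) are both accepted, while an
instruction guarded by an unrelated predicate is rejected. An
opcode-only resource model cannot make that distinction.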

References:

[1] Code Generation Schemas for Modulo Scheduled DO-loops and WHILE-loops
http://www.hpl.hp.com/techreports/92/HPL-92-47.pdf?jumpid=reg_R1002_USEN

Code:

  The sample patch for origin/release_60 [2], which doesn't use
  DFAPacketizer, can generate executable files from sample-code.c for
  both AArch64 and x86_64.

  [AArch64]% clang -O2 -mcpu=thunderx2t99 -mllvm -enable-pipeliner -mllvm -pipeliner-max=100 sample-code.c
  [x86_64] % clang -O2 -march=sandybridge -mllvm -enable-pipeliner -mllvm -pipeliner-max=100 sample-code.c

[2] https://reviews.llvm.org/D47943
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sample-code.c
Type: application/octet-stream
Size: 468 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180608/10496bc3/attachment-0001.obj>

Masaki Arai via llvm-dev

2018-Jun-08 17:04 UTC

[llvm-dev] [RFC] Porting MachinePipeliner to AArch64+SVE

Hi,

Masaki Arai via llvm-dev <llvm-dev at lists.llvm.org> writes:
> Code:
>
> The sample patch for origin/release_60 [2], which doesn't use
> DFAPacketizer, can generate executable files from sample-code.c for
> both AArch64 and x86_64.
> ...
> [2] https://reviews.llvm.org/D47943

I am sorry; I mistakenly assumed that `origin/release_60' means
`LLVM 6.0.0', so the above link included many irrelevant differences.

I created a new review,

   https://reviews.llvm.org/D47948

so please check this instead.

Best regards,
--
--------------------------------------
Masaki Arai

Renato Golin via llvm-dev

2018-Jun-09 14:50 UTC

[llvm-dev] [RFC] Porting MachinePipeliner to AArch64+SVE

On 8 June 2018 at 18:04, Masaki Arai via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> I made new
>
>    https://reviews.llvm.org/D47948
>
> so please check this instead.

Hi Masaki,

You can update the diff on the old review; I think it'll be easier, as
we don't have to keep adding all the people to it.

Also, make sure the review is against trunk, not a release.

-- 
cheers,
--renato

Florian Hahn via llvm-dev

2018-Jun-11 09:42 UTC

[llvm-dev] [RFC] Porting MachinePipeliner to AArch64+SVE

Hi,

On 08/06/2018 15:11, Masaki Arai via llvm-dev wrote:
> Hi,
>
> I am extending LLVM for HPC applications.
> As one of them, I am trying to make MachinePipeliner available on
> AArch64 + Scalable Vector Extension environment.
>

Great, thanks for looking into that.

IIUC from having a first look at your patch, there is nothing SVE 
specific there so far. Although it potentially will be very useful for 
SVE, it should also be beneficial for AArch64 without SVE and X86, 
right? As there are no scheduling models available for SVE in LLVM yet, 
I suppose it would be a good motivation if you could show some benefit 
on existing AArch64 or X86 cores with your proposed modelling.

> MachinePipeliner is currently used only by Hexagon CPU.
> Since it is a very portable implementation, I think that it will
> actually work just by adding a little code for many CPUs(See Code [2]).
> 
> The current MachinePipeliner is written on the premise that
> DFAPacketizer is used for resource management.
> However, I'd like to use MachinePipeliner in a way that does not use
> DFAPacketizer for the reasons described below(*).
> In MachinePipeliner implementation, only a small part is dependent on
> DFAPacketizer or Instruction itineraries.
> Therefore, I think that one of the following implementations is
> possible:
> 
> (a) creating a path in MachinePipeliner that does not use DFAPacketizer
> (b) making MachinePipeliner inheritable so that anyone can write code
>      that does not use DFAPacketizer
> 
> Since implementations using only Instruction itineraries without
> DFAPacketizer are possible, I don't think that I can use
> TargetSchedModel::hasInstrItineraries to select the execution path.
> Personally, I think that implementation of (b) is better.
>

IMO it makes sense to go with (b), given that the dispatch overhead
should be tiny compared to the other work that's going on and we also 
added similar hooks to the generic machine scheduler recently. But it 
seems like this is a smaller implementation detail and making sure we 
are getting the modelling aspect right is more important.

Thanks,
Florian

Masaki Arai via llvm-dev

2018-Jun-11 12:23 UTC

[llvm-dev] [RFC] Porting MachinePipeliner to AArch64+SVE

Hi,

Thank you very much for your comments.

Florian Hahn <florian.hahn at arm.com> writes:
> IIUC from having a first look at your patch, there is nothing SVE
> specific there so far. Although it potentially will be very useful for
> SVE, it should also be beneficial for AArch64 without SVE and X86,
> right?

Yes.
Our main target is FUJITSU's AArch64+SVE CPU, but I think
MachinePipeliner is beneficial for AArch64 without SVE and for any ILP
RISC CPU.
However, I'm not sure about x86.

> As there are no scheduling models available for SVE in LLVM
> yet, I suppose it would be a good motivation if you could show some
> benefit on existing AArch64 or X86 cores with your proposed modelling.

It is easy to make a small test set that can confirm a performance
improvement.
However, I think many challenges remain before MachinePipeliner is
really beneficial on AArch64 without SVE for actual applications.
For example,
(a) Preparing the appropriate machine model for scheduling
(b) Consideration of register pressure in AArch64
    (Coordination with register allocation pass)
(c) Extending iteration dependence distance (2 or more)
(d) Consideration of the impact of VPlan's estimation
    (Coordination with VPlan)
(e) Consideration of the impact of loop optimizations
    (especially loop distribution)
(f) Consideration of the impact of flang

Until these issues are solved, I would like it to run only when the
option `-enable-pipeliner' is specified.

> IMO it makes sense to go with (b), given that the dispatch overhead
> should be tiny compared to the other work that's going on and we also
> added similar hooks to the generic machine scheduler recently. But it
> seems like this is a smaller implementation detail and making sure we
> are getting the modelling aspect right is more important.

One of the reasons for posting the RFC is that MachinePipeliner is
updated frequently.
Therefore, I would like to hear the opinion of MachinePipeliner
developers.
I am happy to write any patches, but since I do not have a Hexagon
environment, I'm worried about whether I can test them thoroughly.

Best regards,
--
--------------------------------------
Masaki Arai