Masaki Arai via llvm-dev
2018-Jun-08 14:11 UTC
[llvm-dev] [RFC] Porting MachinePipeliner to AArch64+SVE
Hi,

I am extending LLVM for HPC applications. As part of that work, I am trying to make MachinePipeliner available in the AArch64 + Scalable Vector Extension (SVE) environment.

MachinePipeliner is currently used only by the Hexagon target. Since its implementation is very portable, I think it will work on many CPUs with only a little additional code (see Code [2]).

The current MachinePipeliner is written on the premise that DFAPacketizer is used for resource management. However, I would like to use MachinePipeliner without DFAPacketizer, for the reasons described below (*). Only a small part of the MachinePipeliner implementation depends on DFAPacketizer or instruction itineraries. Therefore, I think that one of the following implementations is possible:

(a) creating a path in MachinePipeliner that does not use DFAPacketizer
(b) making MachinePipeliner inheritable so that anyone can write code that does not use DFAPacketizer

Since implementations that use only instruction itineraries, without DFAPacketizer, are possible, I don't think that I can use TargetSchedModel::hasInstrItineraries to select the execution path. Personally, I think that (b) is better.

Also, if predicated instructions like those of SVE are available, it may be possible to generate the prologue and epilogue code using predicated execution, as shown in reference [1]. In this case, if we choose (b) and it is possible to override SwingSchedulerDAG::generatePipelinedLoop, I think that it can easily be extended.

Comments or suggestions are welcome. Thank you very much.

Best regards,
--
Masaki Arai

=======================================

(*) Currently, many CPU scheduling models are defined in a form that does not use instruction itineraries. That is, they take form 1 or 2 of the following list from TargetSchedule.td:

  // The SchedMachineModel is defined by subtargets for three categories of data:
  // 1. Basic properties for coarse grained instruction cost model.
  // 2. Scheduler Read/Write resources for simple per-opcode cost model.
  // 3. Instruction itineraries for detailed reservation tables.

By making MachinePipeliner work even with models that do not use instruction itineraries, we will be able to run MachinePipeliner's execution tests on various machines, even if we do not use it on those machines.

Instruction itineraries essentially express the following correspondence:

  opcode ==> {FU1, FU2, ...}

and DFAPacketizer uses a DFA over opcodes. In order to schedule predicated instructions like those of SVE strictly, we need to take into account that the following two instructions use pipeline resources mutually exclusively in the same cycle:

  MI1 if P ==> {FU1, FU2, ...}
  MI2 if Q ==> {FU1, FU2, ...}

where the predicates P and Q satisfy P == not Q. However, I don't think that the current DFAPacketizer can represent these situations.

References:

[1] Code Generation Schemas for Modulo Scheduled DO-loops and WHILE-loops
    http://www.hpl.hp.com/techreports/92/HPL-92-47.pdf?jumpid=reg_R1002_USEN

Code:

The sample patch for origin/release_60 [2], which doesn't use DFAPacketizer, can generate executable files from sample-code.c for both AArch64 and x86_64.

  [AArch64]% clang -O2 -mcpu=thunderx2t99 -mllvm -enable-pipeliner -mllvm -pipeliner-max=100 sample-code.c
  [x86_64] % clang -O2 -march=sandybridge -mllvm -enable-pipeliner -mllvm -pipeliner-max=100 sample-code.c

[2] https://reviews.llvm.org/D47943

Attachment: sample-code.c (468 bytes)
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180608/10496bc3/attachment-0001.obj>
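To make the predicate issue in (*) concrete, here is a minimal sketch of a modulo reservation table whose conflict check knows about complementary predicates. It is a toy model, not LLVM code: the names Pred, Reservation and ModuloReservationTable are hypothetical, functional units are just bits in a mask, and the rule "two guards on the same predicate register with opposite polarity may share a unit" is an assumption taken from the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical predicate tag: a predicate register id plus its polarity.
// Reg == 0 is used here to mean "unpredicated".
struct Pred {
  unsigned Reg;
  bool Negated;
};

// True when the two guards are P and not P, so at most one of the
// guarded instructions actually executes.
inline bool areComplementary(const Pred &A, const Pred &B) {
  return A.Reg != 0 && A.Reg == B.Reg && A.Negated != B.Negated;
}

// One entry in a table slot: which units are taken, and under which guard.
struct Reservation {
  uint32_t Units; // bitmask of functional units
  Pred Guard;
};

// Toy modulo reservation table: cycle C maps to slot C % II.
class ModuloReservationTable {
  std::vector<std::vector<Reservation>> Slots;

public:
  explicit ModuloReservationTable(unsigned II) : Slots(II) {
    assert(II > 0 && "initiation interval must be positive");
  }

  // A request conflicts with an existing entry only if they share a unit
  // and their guards are not complementary.
  bool canReserve(unsigned Cycle, uint32_t Units, Pred Guard) const {
    for (const Reservation &R : Slots[Cycle % Slots.size()])
      if ((R.Units & Units) != 0 && !areComplementary(R.Guard, Guard))
        return false;
    return true;
  }

  // Records the reservation if it fits; returns whether it was recorded.
  bool reserve(unsigned Cycle, uint32_t Units, Pred Guard) {
    if (!canReserve(Cycle, Units, Guard))
      return false;
    Slots[Cycle % Slots.size()].push_back({Units, Guard});
    return true;
  }
};
```

With an initiation interval of 2, reserving unit 0x1 at cycle 0 under P and again at cycle 2 under not P both succeed, because the two cycles share a slot and the guards are complementary. A third request for the same unit under P, or an unpredicated one, is rejected. A table keyed only by opcode has no place to record the guard, which is the limitation described above.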
Masaki Arai via llvm-dev
2018-Jun-08 17:04 UTC
[llvm-dev] [RFC] Porting MachinePipeliner to AArch64+SVE
Hi,

Masaki Arai via llvm-dev <llvm-dev at lists.llvm.org> writes:

> Code:
>
> The sample patch for origin/release_60 [2], which doesn't use
> DFAPacketizer, can generate executable files from sample-code.c for
> both AArch64 and x86_64.
...
> [2] https://reviews.llvm.org/D47943

I am sorry. I mistakenly assumed that `origin/release_60' meant `LLVM 6.0.0', so the above link included many irrelevant differences. I created a new review:

https://reviews.llvm.org/D47948

so please check this one instead.

Best regards,
--
Masaki Arai
Renato Golin via llvm-dev
2018-Jun-09 14:50 UTC
[llvm-dev] [RFC] Porting MachinePipeliner to AArch64+SVE
On 8 June 2018 at 18:04, Masaki Arai via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> I made new
>
> https://reviews.llvm.org/D47948
>
> so please check this instead.

Hi Masaki,

You can update the diff on the old review. I think it'll be easier, as we don't have to keep adding all the people to it.

Also, make sure the review is against trunk, not a release.

--
cheers,
--renato
Florian Hahn via llvm-dev
2018-Jun-11 09:42 UTC
[llvm-dev] [RFC] Porting MachinePipeliner to AArch64+SVE
Hi,

On 08/06/2018 15:11, Masaki Arai via llvm-dev wrote:

> Hi,
>
> I am extending LLVM for HPC applications.
> As one of them, I am trying to make MachinePipeliner available on
> AArch64 + Scalable Vector Extension environment.
>

Great, thanks for looking into that. IIUC from having a first look at your patch, there is nothing SVE specific there so far. Although it potentially will be very useful for SVE, it should also be beneficial for AArch64 without SVE and X86, right? As there are no scheduling models available for SVE in LLVM yet, I suppose it would be a good motivation if you could show some benefit on existing AArch64 or X86 cores with your proposed modelling.

> MachinePipeliner is currently used only by Hexagon CPU.
> Since it is a very portable implementation, I think that it will
> actually work just by adding a little code for many CPUs(See Code [2]).
>
> The current MachinePipeliner is written on the premise that
> DFAPacketizer is used for resource management.
> However, I'd like to use MachinePipeliner in a way that does not use
> DFAPacketizer for the reasons described below(*).
> In MachinePipeliner implementation, only a small part is dependent on
> DFAPacketizer or Instruction itineraries.
> Therefore, I think that one of the following implementations is
> possible:
>
> (a) creating a path in MachinePipeliner that does not use DFAPacketizer
> (b) making MachinePipeliner inheritable so that anyone can write code
> that does not use DFAPacketizer
>
> Since implementations using only Instruction itineraries without
> DFAPacketizer are possible, I don't think that I can use
> TargetSchedModel::hasInstrItineraries to select the execution path.
> Personally, I think that implementation of (b) is better.
>

IMO it makes sense to go with (b), given that the dispatch overhead should be tiny compared to the other work that's going on and we also added similar hooks to the generic machine scheduler recently. But it seems like this is a smaller implementation detail and making sure we are getting the modelling aspect right is more important.

Thanks,
Florian
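For readers following option (b), the sketch below shows roughly what an overridable code-generation hook could look like, using the predicated prologue/epilogue idea from reference [1] of the original post as the override. Everything here is hypothetical: ToyPipeliner, PredicatedPipeliner, emittedCopies, stageEnabled and run are illustrative names, not the SwingSchedulerDAG interface. The copy count for the default schema assumes a conventional layout with one prologue and one epilogue copy per stage except the last.

```cpp
#include <cassert>
#include <vector>

// One executed operation: loop iteration Iter running pipeline stage Stage.
struct StageExec {
  unsigned Iter;
  unsigned Stage;
};

// Toy stand-in for a pipeliner with an overridable code-generation hook.
// This is NOT the LLVM SwingSchedulerDAG interface.
class ToyPipeliner {
public:
  virtual ~ToyPipeliner() = default;

  // Number of loop-body copies emitted for a schedule with NumStages stages.
  // Assumed default schema: (NumStages - 1) prologue copies, one kernel,
  // and (NumStages - 1) epilogue copies.
  virtual unsigned emittedCopies(unsigned NumStages) const {
    assert(NumStages >= 1 && "a schedule has at least one stage");
    return 2 * (NumStages - 1) + 1;
  }
};

// Hypothetical subclass for a target with predicated execution: emit only
// the kernel and guard each stage with a predicate.
class PredicatedPipeliner : public ToyPipeliner {
public:
  unsigned emittedCopies(unsigned) const override { return 1; }

  // Stage S is live in kernel pass K iff it belongs to a real iteration,
  // that is, 0 <= K - S < TripCount.
  static bool stageEnabled(unsigned K, unsigned S, unsigned TripCount) {
    return K >= S && K - S < TripCount;
  }

  // Simulates the kernel-only loop, which runs TripCount + NumStages - 1
  // passes, and records which (iteration, stage) pairs execute.
  std::vector<StageExec> run(unsigned NumStages, unsigned TripCount) const {
    std::vector<StageExec> Trace;
    for (unsigned K = 0; K + 1 < TripCount + NumStages; ++K)
      for (unsigned S = 0; S < NumStages; ++S)
        if (stageEnabled(K, S, TripCount))
          Trace.push_back({K - S, S});
    return Trace;
  }
};
```

The only cost of the hook is one virtual call per loop, which is the dispatch overhead mentioned above. In this toy model, a 3-stage schedule needs 5 body copies under the default schema and 1 under the predicated one, while the simulated kernel-only loop still runs every stage of every iteration exactly once.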
Masaki Arai via llvm-dev
2018-Jun-11 12:23 UTC
[llvm-dev] [RFC] Porting MachinePipeliner to AArch64+SVE
Hi,

Thank you very much for your comments.

Florian Hahn <florian.hahn at arm.com> writes:

> IIUC from having a first look at your patch, there is nothing SVE
> specific there so far. Although it potentially will be very useful for
> SVE, it should also be beneficial for AArch64 without SVE and X86,
> right?

Yes. Our main target is FUJITSU's AArch64+SVE CPU, but I think MachinePipeliner is beneficial for AArch64 without SVE and for any ILP RISC CPU. However, I am not sure about x86.

> As there are no scheduling models available for SVE in LLVM
> yet, I suppose it would be a good motivation if you could show some
> benefit on existing AArch64 or X86 cores with your proposed modelling.

It is easy to make a small test set that can confirm a performance improvement. However, I think there are many challenges in making MachinePipeliner really beneficial for actual applications on AArch64 without SVE. For example:

(a) Preparing the appropriate machine model for scheduling
(b) Consideration of register pressure in AArch64 (coordination with the register allocation pass)
(c) Extending iteration dependence distance (2 or more)
(d) Consideration of the impact of VPlan's estimation (coordination with VPlan)
(e) Consideration of the impact of loop optimizations (especially loop distribution)
(f) Consideration of the impact of flang

Until these issues are solved, I would like MachinePipeliner to run only when the option `-enable-pipeliner' is specified.

> IMO it makes sense to go with (b), given that the dispatch overhead
> should be tiny compared to the other work that's going on and we also
> added similar hooks to the generic machine scheduler recently. But it
> seems like this is a smaller implementation detail and making sure we
> are getting the modelling aspect right is more important.

One of the reasons for posting this RFC is that MachinePipeliner is updated frequently. Therefore, I would like to hear the opinions of the MachinePipeliner developers. I am glad to prepare any patches, but since I do not have a Hexagon environment, I am worried about whether I can test them thoroughly.

Best regards,
--
Masaki Arai