Ivan Llopard
2012-Jul-31  15:37 UTC
[LLVMdev] [RFC] Bundling support in the PostRA Scheduler
Hi, I'm working on a custom top-down post RA scheduler which builds bundles at the same time for our VLIW processor. I've borrowed most of the implementation from the resource priority queue implemented for the existent VLIW scheduler but applied to the context of MI scheduling. Basically, instructions that are likely to be bundled must be scheduled first (i.e. get higher priority). This work should integrate very well with the current infrastructure and I'd like to contribute with a patch to add bundling capabilities to the current post RA scheduler if the LLVM community is interested :-) (May Hexagon need it as well?). It would also be a great opportunity for us to get feedback from the community about this. We have a non-interlocked processor which relies on the post ra scheduler to emit cycle-accurate bundles (valid bundles without incurring in structural hazards). The construction of bundles outside the scope of post RA scheduling will require structural hazard information to work properly for processors without pipeline interlocks. For example, we can discover that an instruction can fit into the current packet (following a schema of linear code scanning, just like the current DFAPacketizer does) while in fact it cannot because of structural hazards. The two terms are strongly connected and necessary to build valid packets. The concerns are mainly based on our non-interlocked processor, where cycle-accurate bundle emission is necessary. Other approaches/ideas are very welcome. Do you have any plan for adding a more robust bundler into the current infrastructure ? Ivan
Tom Stellard
2012-Jul-31  20:47 UTC
[LLVMdev] [RFC] Bundling support in the PostRA Scheduler
On Tue, Jul 31, 2012 at 05:37:27PM +0200, Ivan Llopard wrote:> Hi, > > I'm working on a custom top-down post RA scheduler which builds bundles > at the same time for our VLIW processor. I've borrowed most of the > implementation from the resource priority queue implemented for the > existent VLIW scheduler but applied to the context of MI scheduling. > Basically, instructions that are likely to be bundled must be scheduled > first (i.e. get higher priority). > This work should integrate very well with the current infrastructure and > I'd like to contribute with a patch to add bundling capabilities to the > current post RA scheduler if the LLVM community is interested :-) (May > Hexagon need it as well?). It would also be a great opportunity for us > to get feedback from the community about this. >This could potentially be useful for the R600 backend, which targets VLIW GPUs (AMD HD2XXX-HD6XXX), so I would be very interested in this. -Tom> We have a non-interlocked processor which relies on the post ra > scheduler to emit cycle-accurate bundles (valid bundles without > incurring in structural hazards). The construction of bundles outside > the scope of post RA scheduling will require structural hazard > information to work properly for processors without pipeline interlocks. > For example, we can discover that an instruction can fit into the > current packet (following a schema of linear code scanning, just like > the current DFAPacketizer does) while in fact it cannot because of > structural hazards. The two terms are strongly connected and necessary > to build valid packets. > The concerns are mainly based on our non-interlocked processor, where > cycle-accurate bundle emission is necessary. Other approaches/ideas are > very welcome. > Do you have any plan for adding a more robust bundler into the current > infrastructure ? > > Ivan > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
Jordy Potman
2012-Aug-06  08:48 UTC
[LLVMdev] [RFC] Bundling support in the PostRA Scheduler
On Tue, 2012-07-31 at 17:37 +0200, Ivan Llopard wrote:> Hi, > > I'm working on a custom top-down post RA scheduler which builds bundles > at the same time for our VLIW processor. I've borrowed most of the > implementation from the resource priority queue implemented for the > existent VLIW scheduler but applied to the context of MI scheduling. > Basically, instructions that are likely to be bundled must be scheduled > first (i.e. get higher priority). > This work should integrate very well with the current infrastructure and > I'd like to contribute with a patch to add bundling capabilities to the > current post RA scheduler if the LLVM community is interested :-) (May > Hexagon need it as well?). It would also be a great opportunity for us > to get feedback from the community about this. > > We have a non-interlocked processor which relies on the post ra > scheduler to emit cycle-accurate bundles (valid bundles without > incurring in structural hazards). The construction of bundles outside > the scope of post RA scheduling will require structural hazard > information to work properly for processors without pipeline interlocks. > For example, we can discover that an instruction can fit into the > current packet (following a schema of linear code scanning, just like > the current DFAPacketizer does) while in fact it cannot because of > structural hazards. The two terms are strongly connected and necessary > to build valid packets. > The concerns are mainly based on our non-interlocked processor, where > cycle-accurate bundle emission is necessary. Other approaches/ideas are > very welcome.We would be interested in this since we are targeting a non-interlocked VLIW processor as well. Currently we are using a custom post RA scheduler implemented in a pre emit pass. However, we would like to integrate better with and reuse more of the current LLVM infrastructure such as the post RA scheduler framework and bundling support. Jordy> Do you have any plan for adding a more robust bundler into the current > infrastructure ? > > Ivan > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Andrew Trick
2012-Aug-06  17:12 UTC
[LLVMdev] [RFC] Bundling support in the PostRA Scheduler
On Jul 31, 2012, at 8:37 AM, Ivan Llopard <ivanllopard at gmail.com> wrote:> Hi, > > I'm working on a custom top-down post RA scheduler which builds bundles > at the same time for our VLIW processor. I've borrowed most of the > implementation from the resource priority queue implemented for the > existent VLIW scheduler but applied to the context of MI scheduling. > Basically, instructions that are likely to be bundled must be scheduled > first (i.e. get higher priority). > This work should integrate very well with the current infrastructure and > I'd like to contribute with a patch to add bundling capabilities to the > current post RA scheduler if the LLVM community is interested :-) (May > Hexagon need it as well?). It would also be a great opportunity for us > to get feedback from the community about this. > > We have a non-interlocked processor which relies on the post ra > scheduler to emit cycle-accurate bundles (valid bundles without > incurring in structural hazards). The construction of bundles outside > the scope of post RA scheduling will require structural hazard > information to work properly for processors without pipeline interlocks. > For example, we can discover that an instruction can fit into the > current packet (following a schema of linear code scanning, just like > the current DFAPacketizer does) while in fact it cannot because of > structural hazards. The two terms are strongly connected and necessary > to build valid packets. > The concerns are mainly based on our non-interlocked processor, where > cycle-accurate bundle emission is necessary. Other approaches/ideas are > very welcome. > Do you have any plan for adding a more robust bundler into the current > infrastructure ? > > IvanHi Ivan, Your description sounds fine to me. I assume you are totally decoupled from what LLVM currently calls the "postRA" scheduling pass. Hopefully you don't need anything in PostRASchedulerList.cpp. Running your bundler as a preEmit pass is the cleanest approach. But if need be, we can support preRA bundling at the time the MachineScheduler currently runs (if enabled). TargetPassConfig allows you to substitute your own pass in place of MachineScheduler. Passes that run after MachineScheduler are intended to support instruction bundles. This feature is not extensively tested, so anyone taking this approach would need to work with backend maintainers to get things fixed. -Andy
Sergei Larin
2012-Aug-09  19:03 UTC
[LLVMdev] [RFC] Bundling support in the PostRA Scheduler
Ivan, IMHO the current bundle infrastructure is in need of some additional features - not only bundle handling utilities, but also support for liveness computation, several (key) code transformation passes etc. (see my recent post for conditional defs for instance). Does your back-end perform any substantial transformations to the code _after_ the second pass (postRA) scheduler and bundle formation/finalization? Does any of them might demand bundle decomposition? If so, I really wonder how do you plan to address that. We also have an MI based version of the current VLIW scheduler, that I am planning to bring up to date with the tip, and upstream ASAP. But it is preRA scheduler, and it does not preserve bundles (though it constructs them and easily could keep them around). The issue is what I have said above, many current passes do not understand bundles (well or at all) and accurate liveness update has been an issue for me recently. I also tried to mess with PostRA scheduler to achieve similar goals, only to find out that all the additional dependencies after RA make it virtually impossible to produce high quality schedule, and obviously it is too late at that point to address reg pressure via scheduling techniques, so I have put that project on the back burner for a while. I guess what I am saying is that you are not alone in this :) ...and we do need to formulate our requirements clearly... and repeat them regularly :) Hope this helps. Sergei -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum.> -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Ivan Llopard > Sent: Tuesday, July 31, 2012 10:37 AM > To: LLVM Developers Mailing List > Subject: [LLVMdev] [RFC] Bundling support in the PostRA Scheduler > > Hi, > > I'm working on a custom top-down post RA scheduler which builds bundles > at the same time for our VLIW processor. I've borrowed most of the > implementation from the resource priority queue implemented for the > existent VLIW scheduler but applied to the context of MI scheduling. > Basically, instructions that are likely to be bundled must be scheduled > first (i.e. get higher priority). > This work should integrate very well with the current infrastructure > and I'd like to contribute with a patch to add bundling capabilities to > the current post RA scheduler if the LLVM community is interested :-) > (May Hexagon need it as well?). It would also be a great opportunity > for us to get feedback from the community about this. > > We have a non-interlocked processor which relies on the post ra > scheduler to emit cycle-accurate bundles (valid bundles without > incurring in structural hazards). The construction of bundles outside > the scope of post RA scheduling will require structural hazard > information to work properly for processors without pipeline > interlocks. > For example, we can discover that an instruction can fit into the > current packet (following a schema of linear code scanning, just like > the current DFAPacketizer does) while in fact it cannot because of > structural hazards. The two terms are strongly connected and necessary > to build valid packets. > The concerns are mainly based on our non-interlocked processor, where > cycle-accurate bundle emission is necessary. Other approaches/ideas are > very welcome. > Do you have any plan for adding a more robust bundler into the current > infrastructure ? > > Ivan > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Pekka Jääskeläinen
2012-Aug-10  07:28 UTC
[LLVMdev] [RFC] Bundling support in the PostRA Scheduler
Hi, On 08/09/2012 10:03 PM, Sergei Larin wrote:> I also tried to mess with PostRA scheduler to achieve similar goals, only > to find out that all the additional dependencies after RA make it virtually > impossible to produce high quality schedule, and obviously it is too late at > that point to address reg pressure via scheduling techniques, so I have put > that project on the back burner for a while.Has anyone studied how much work it would be to implement an integrated allocator/scheduler in LLVM now? It should be an efficient solution to the VLIW RA/scheduling phase ordering problem, but of course it's more complex to implement and modularize. (I'm unsure if it's possible to exploit the current register allocator code easily with it). Another solution (which we use in TCE) is to use register renaming. That is, rename schedule-restricting antideps introduced by the pre-RA "on the fly" during scheduling. It's sort of a middle ground solution that allows modularizing the proper RA to a separate pass cleanly. However, its proper implementation goes step-by-step towards the integrated allocator (smarter the renamer becomes, more it looks like a proper RA). BR, -- Pekka
Ivan Llopard
2012-Aug-13  10:07 UTC
[LLVMdev] [RFC] Bundling support in the PostRA Scheduler
Hi all, Thanks for your feed-backs :-) @Andrew: In fact, I've reused most of the postRA list scheduler code and the resource priority queue. Every time it needs to move forward, either because of a res hazard (HazardRec) or an invalid combination of instructions in the current packet (DFA), it closes the current bundle and advances to the next cycle. The non-interlocked nature of our processor forces the bundling logic to live with the scheduling logic. We cannot build bundles without the scoreboard. I also tried to build bundles as a preRA pass in order to reduce the register pressure (so the RA will take full advantage of the vliw architecture). I've ran into some problems such as the re-materialization one that we have discussed some time ago (http://llvm.1065342.n5.nabble.com/Instruction-bundles-before-RA-Rematerialization-td45900.html) and the liveness re-computation while moving MI's into packets, where we have contributed with a patch. Other problems are related to our specific BE implementation which doesn't allow us to get good performances with the preRA bundler. The preRA bundling forces a starting point from which the postRA bundler must start with which may or may not be the optimal point. Without bundles before RA, the register pressure will be higher but the postRA bundler will get the freedom it needs to build better bundles. It seems to have a trade-off between reg pressure and bundling capabilities. We have chosen the latter. @Sergei: It's good to see that you are working on it also :-). ATM, we don't do any transformation which may affect bundles. The postRA scheduler will schedule and packetize all MI's then we run the packetFinalization pass. Bundle decomposition is somewhat complex in non-interlocked processors where the DFA is not enough to rebuild them. @Pekka: We don't care about anti-deps as far as the dependent MI's can fit into the same bundle. There are anti-dep breakers in llvm and the requested one runs together with the postRA scheduler. The changes I'd like to propose are mainly based in: - Adapting the current resource priority queue to work with MI's. - And either add a new postRA SchedBundler or modify the existent one. But I think I should wait until Sergei send upstream his MI based scheduler. Sergei, are you working on some resource priority queue at MI level? Ivan On 06/08/2012 19:12, Andrew Trick wrote:> On Jul 31, 2012, at 8:37 AM, Ivan Llopard <ivanllopard at gmail.com> wrote: > >> Hi, >> >> I'm working on a custom top-down post RA scheduler which builds bundles >> at the same time for our VLIW processor. I've borrowed most of the >> implementation from the resource priority queue implemented for the >> existent VLIW scheduler but applied to the context of MI scheduling. >> Basically, instructions that are likely to be bundled must be scheduled >> first (i.e. get higher priority). >> This work should integrate very well with the current infrastructure and >> I'd like to contribute with a patch to add bundling capabilities to the >> current post RA scheduler if the LLVM community is interested :-) (May >> Hexagon need it as well?). It would also be a great opportunity for us >> to get feedback from the community about this. >> >> We have a non-interlocked processor which relies on the post ra >> scheduler to emit cycle-accurate bundles (valid bundles without >> incurring in structural hazards). The construction of bundles outside >> the scope of post RA scheduling will require structural hazard >> information to work properly for processors without pipeline interlocks. >> For example, we can discover that an instruction can fit into the >> current packet (following a schema of linear code scanning, just like >> the current DFAPacketizer does) while in fact it cannot because of >> structural hazards. The two terms are strongly connected and necessary >> to build valid packets. >> The concerns are mainly based on our non-interlocked processor, where >> cycle-accurate bundle emission is necessary. Other approaches/ideas are >> very welcome. >> Do you have any plan for adding a more robust bundler into the current >> infrastructure ? >> >> Ivan > Hi Ivan, > > Your description sounds fine to me. I assume you are totally decoupled from what LLVM currently calls the "postRA" scheduling pass. Hopefully you don't need anything in PostRASchedulerList.cpp. > > Running your bundler as a preEmit pass is the cleanest approach. But if need be, we can support preRA bundling at the time the MachineScheduler currently runs (if enabled). TargetPassConfig allows you to substitute your own pass in place of MachineScheduler. Passes that run after MachineScheduler are intended to support instruction bundles. This feature is not extensively tested, so anyone taking this approach would need to work with backend maintainers to get things fixed. > > -Andy
Possibly Parallel Threads
- [LLVMdev] [RFC] Bundling support in the PostRA Scheduler
- [LLVMdev] [RFC] Bundling support in the PostRA Scheduler
- [LLVMdev] [RFC] Bundling support in the PostRA Scheduler
- [LLVMdev] Enabling MI Scheduler on x86 (was Experimental Evaluation of the Schedulers in LLVM 3.3)
- [LLVMdev] VLIWPacketizerList: failing to schedule terminators