On Thu, 10 May 2012 20:33:53 -0700
Andrew Trick <atrick at apple.com> wrote:

> On May 9, 2012, at 8:34 AM, dag at cray.com wrote:
>
> > Andrew Trick <atrick at apple.com> writes:
> >
> >>> When I asked about enhancing scheduler heuristics a month or so
> >>> ago, I got a response about a MachineInstr scheduler and that
> >>> that was the way of the LLVM future. Is that so? Is the
> >>> ScheduleDAG going away?
> >>
> >> You sent a lengthy RFC on Apr 20 that demonstrated you aren't
> >> following developments on trunk. That's perfectly fine, but if you
> >> want to use the new scheduler before it is mature, you'll need to
> >> follow trunk.
> >
> > Ok, but that doesn't answer the question. Is ScheduleDAG going
> > away? If so, what's the timeframe for that? 3.2?
>
> ScheduleDAG is used for both SD scheduling and MI scheduling. It's
> not going away.
>
> SD scheduling is not going away in 3.2--it will be the first release
> with MI scheduling on by default.
>
> If all goes well, I expect SD scheduling to be removed by 3.3. That
> has not been discussed.
>
> Consider this the preliminary announcement. I'll post another
> announcement as soon as we have something that's more broadly
> interesting. In the current state it's only interesting for someone
> just beginning to write their own custom scheduler.
>
> Here's a more complete list of the implementation steps, but the real
> effort will be spent in performance analysis required before flipping
> the switch. Don't expect it to be an adequate replacement out-of-box
> for your benchmarks before 3.2.
>
> - Target pass configuration: DONE
> - MachineScheduler pass framework: DONE
> - MI Scheduling DAG: DONE
> - AliasAnalysis aware DAG option: In review (Sergei)
> - Bidirectional list scheduling: DONE
> - LiveInterval Update: WIP (simple instruction reordering is
>   supported)
> - Target-independent precise modeling of register pressure: DONE
> - Register pressure reduction scheduler: WIP
> - Support for existing HazardChecker plugin

Is support for the existing hazard detectors working now? [it does not
say DONE or WIP here, but your comment below implies, I think, that it
is at least partially working].

> - New target description feature: buffered resources
> - Modeling min latency, expected latency, and resource constraints

Can you comment on how min and expected latency will be used in the
scheduling?

> - Heuristics that balance interlocks, regpressure, latency and
>   buffered resources
>
> For targets where scheduling is critical, I encourage developers who
> stay in sync with trunk to write their own target-specific scheduler
> based on the pieces that are already available. Hexagon developers
> are doing this now. The LLVM toolkit for scheduling is all there--not
> perfect, but ready for developers.
>
> - Pluggable MachineScheduler pass
> - Scheduling DAG
> - LiveInterval Update
> - RegisterPressure tracking
> - InstructionItinerary and HazardChecker (to be extended)
>
> If you would simply like improved X86 scheduling without rolling your
> own, then providing feedback and test cases is useful so we can
> incorporate improvements into the standard scheduler while it's being
> developed.

Does this mean that we're going to see a new X86 scheduling paradigm,
or is the existing ILP heuristic, in large part, expected to stay?

Thanks again,
Hal

> >> Feel free to use the patch and send your thanks to Hal. It doesn't
> >> serve any purpose to mainline a partial solution only to replace it
> >> before it can ever be enabled by default, which would require a
> >> major performance investigation and introduces a huge risk
> >> (AliasAnalysis in CodeGen has not been well tested).
> >
> > Er, but as I understand it the MachineInstr scheduler will also use
> > alias analysis.
>
> AliasAnalysis is important. We want it fully supported and enabled by
> default, but that requires effort beyond simply enabling it. Today,
> that effort is in the MI scheduler.
>
> >> The last thing we want to do is duplicate what's already planned
> > to happen.
>
> The information I provided above is the best I can do, and as early
> as I could provide this level of detail. If you follow trunk, you can
> see the direction things are heading, but until recently I would not
> have been able to tell you "plans" in the form of dates or release
> goals.
>
> -Andy
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu llvm.cs.uiuc.edu
> lists.cs.uiuc.edu/mailman/listinfo/llvmdev

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
On May 10, 2012, at 9:06 PM, Hal Finkel <hfinkel at anl.gov> wrote:

>> - Target pass configuration: DONE
>> - MachineScheduler pass framework: DONE
>> - MI Scheduling DAG: DONE
>> - AliasAnalysis aware DAG option: In review (Sergei)
>> - Bidirectional list scheduling: DONE
>> - LiveInterval Update: WIP (simple instruction reordering is
>>   supported)
>> - Target-independent precise modeling of register pressure: DONE
>> - Register pressure reduction scheduler: WIP
>> - Support for existing HazardChecker plugin
>
> Is support for the existing hazard detectors working now? [it does not
> say DONE or WIP here, but your comment below implies, I think, that it
> is at least partially working].

Glad you're interested. I can explain. We have several important tools
in LLVM that most schedulers will need. That's what I was listing below
(Configurable pass, DAG, LI update, RegPressure, Itinerary,
HazardChecker--normally called a reservation table).

I really should have also mentioned the DFAPacketizer developed by the
Hexagon team. It's being used by their VLIW scheduler, but not by the
new "standard" scheduler that I'm working on.

Now that I mentioned that, I should mention MachineInstrBundles, which
was a necessary IR feature to support the VLIW scheduler, but has other
random uses--sometimes we want to glue machine instructions.

HazardChecker was already being used by the PostRA scheduler before I
started working on infrastructure for a new scheduler. So it's there,
and can be used by custom schedulers.

My first goal was to complete all of these pieces. They're in pretty
good shape now but not well tested. The target-independent model for
register pressure derived from arbitrary register definitions was by
far the most difficult aspect. Now I need to develop a standard
scheduling algorithm that will work reasonably well for any target
given the register description and optionally a scheduling itinerary.
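For readers who have not met the term, the reservation-table idea behind
a hazard checker can be sketched in a few lines of standalone C++. This
is only a toy model under stated assumptions: the names Usage,
ReservationTable, hasHazard, reserve and firstFreeCycle are hypothetical
and are not LLVM's HazardChecker interface, and functional units are
simply modeled as bits in a mask.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model, NOT an LLVM API.
// For each cycle after issue, a bitmask of the functional units that
// an instruction class occupies (bit 0 = unit 0, bit 1 = unit 1, ...).
struct Usage {
  std::vector<uint32_t> UnitsPerCycle; // index = cycles after issue
};

class ReservationTable {
  std::vector<uint32_t> Busy; // Busy[c] = units reserved in cycle c

public:
  // True if issuing at Cycle needs a unit that is already reserved.
  bool hasHazard(const Usage &U, unsigned Cycle) const {
    for (std::size_t I = 0; I < U.UnitsPerCycle.size(); ++I) {
      std::size_t C = Cycle + I;
      if (C < Busy.size() && (Busy[C] & U.UnitsPerCycle[I]))
        return true;
    }
    return false;
  }

  // Record the reservations of an instruction issued at Cycle.
  void reserve(const Usage &U, unsigned Cycle) {
    assert(!hasHazard(U, Cycle) && "caller must check for hazards first");
    if (Busy.size() < Cycle + U.UnitsPerCycle.size())
      Busy.resize(Cycle + U.UnitsPerCycle.size(), 0);
    for (std::size_t I = 0; I < U.UnitsPerCycle.size(); ++I)
      Busy[Cycle + I] |= U.UnitsPerCycle[I];
  }

  // Earliest cycle >= From at which U can issue without a conflict.
  // Terminates because cycles past the end of the table are always free.
  unsigned firstFreeCycle(const Usage &U, unsigned From) const {
    unsigned C = From;
    while (hasHazard(U, C))
      ++C;
    return C;
  }
};
```

A scheduler built on such a table would call hasHazard before placing
an instruction in the current cycle and either pick a different
candidate or advance the cycle when a conflict is reported.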
The register pressure reduction heuristic was the first that I threw
into the standard scheduler because it's potentially useful by itself.
It's WIP.

I haven't plugged in the HazardChecker, but it's quite straightforward.

At that point, I'll have two competing scheduling constraints and will
begin implementing a framework for balancing those constraints. I'll
also add fuzzy constraints such as expected latency and other cpu
resources. When I get to that point, I'll explain more, and I hope you
and others will follow along and help with performance analysis and
heuristics.

I will point out one important aspect of the design now. If scheduling
is very important for your target's performance, and you are highly
confident that you model your microarchitecture effectively and have
just the right heuristics, then it might make sense to give the
scheduler free rein to shuffle the instructions. The standard
MachineScheduler will not make that assumption. It certainly can be
tuned to be as aggressive as we like, but unless there is a high level
of confidence that reordering instructions will be beneficial, we don't
want to do it. Rule #1 is not to follow blind heuristics that
reschedule reasonable code into something pathologically bad. This
notion of confidence is not something schedulers typically have, and is
fundamental to the design.

For example, most schedulers have to deal with opposing constraints of
register pressure and ILP. An aggressive way to deal with this is by
running two separate scheduling passes. First top-down to find the
optimal latency, then bottom-up to minimize resources needed to achieve
that latency. Naturally, after the first pass, you've shuffled
instructions beyond all recognition. Instead, we deal with this problem
by scheduling in both directions simultaneously. At each point, we know
which resources and constraints are likely to impact the cpu pipeline
in both the top and bottom of the scheduling region. Doing this doesn't
solve any fundamental problem, but it gives the scheduler great freedom
at each point, including the freedom to do absolutely nothing, which is
probably exactly what you want for a fair amount of X86 code.

>> - New target description feature: buffered resources
>> - Modeling min latency, expected latency, and resource constraints
>
> Can you comment on how min and expected latency will be used in the
> scheduling?

In the new scheduler's terminology, min latency is an interlocked
resource, and expected latency is a buffered resource. Interlocked
resources are used to form instruction groups (for performance only,
not correctness). For out-of-order targets with register rename, we
can use zero-cycle min latency so there is no interlock within an issue
group. Instead we know the expected latency of the scheduled
instructions relative to the critical path. We can balance the schedule
so that neither the expected latency of the top nor that of the bottom
scheduled instructions exceeds the overall critical path. This way, we
will slice up two very long independent chains into neat chunks,
instead of the random shuffling that we do today.

>> - Heuristics that balance interlocks, regpressure, latency and
>>   buffered resources
>>
>> For targets where scheduling is critical, I encourage developers who
>> stay in sync with trunk to write their own target-specific scheduler
>> based on the pieces that are already available. Hexagon developers
>> are doing this now. The LLVM toolkit for scheduling is all there--not
>> perfect, but ready for developers.
>>
>> - Pluggable MachineScheduler pass
>> - Scheduling DAG
>> - LiveInterval Update
>> - RegisterPressure tracking
>> - InstructionItinerary and HazardChecker (to be extended)
>>
>> If you would simply like improved X86 scheduling without rolling your
>> own, then providing feedback and test cases is useful so we can
>> incorporate improvements into the standard scheduler while it's being
>> developed.
>
> Does this mean that we're going to see a new X86 scheduling paradigm,
> or is the existing ILP heuristic, in large part, expected to stay?

It's a new paradigm but not a change in focus--we're not modeling the
microarchitecture in any greater detail, although other contributors
are encouraged to do that.

Both schedulers will be supported for a time. In fact it will make
sense to run both in the same compile, until MISched is good enough to
take over. It will be easy to determine when one scheduler is doing
better than the other. I'm relying on you to tell me when it's doing
the wrong thing.

-Andy
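
The idea of scheduling from both ends of a region at once, with the
freedom to leave the code alone, can be illustrated with a small
standalone C++ sketch. It is not the MachineScheduler implementation:
Instr, Bias and scheduleBidirectional are hypothetical names, and the
single integer Bias stands in for whatever confidence the real
heuristics would compute (positive means "confident this should move
earlier", negative "later", zero "no opinion"). The sketch assumes the
source order is already a valid topological order of the dependences.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy model, NOT an LLVM API.
struct Instr {
  std::vector<int> Preds; // indices of instructions this one depends on
  int Bias = 0;           // >0: move earlier, <0: move later, 0: no opinion
};

// Returns the new order as indices into Region. The top zone grows
// downward, the bottom zone grows upward, and the two alternate.
std::vector<int> scheduleBidirectional(const std::vector<Instr> &Region) {
  const int N = static_cast<int>(Region.size());
  std::vector<std::vector<int>> Succs(N);
  for (int I = 0; I < N; ++I)
    for (int P : Region[I].Preds) {
      assert(P >= 0 && P < I && "source order must be topological");
      Succs[P].push_back(I);
    }

  std::vector<bool> Done(N, false);
  std::vector<int> Order(N);
  int Top = 0, Bot = N - 1;
  bool FromTop = true;

  auto allDone = [&](const std::vector<int> &Deps) {
    for (int D : Deps)
      if (!Done[D])
        return false;
    return true;
  };

  while (Top <= Bot) {
    int Best = -1;
    for (int I = 0; I < N; ++I) {
      if (Done[I])
        continue;
      if (FromTop) {
        if (!allDone(Region[I].Preds))
          continue;
        // Strict '>' keeps the earliest instruction on ties.
        if (Best < 0 || Region[I].Bias > Region[Best].Bias)
          Best = I;
      } else {
        if (!allDone(Succs[I]))
          continue;
        // '<=' keeps the latest instruction on ties.
        if (Best < 0 || Region[I].Bias <= Region[Best].Bias)
          Best = I;
      }
    }
    assert(Best >= 0 && "an acyclic region always has a ready instruction");
    Done[Best] = true;
    if (FromTop)
      Order[Top++] = Best;
    else
      Order[Bot--] = Best;
    FromTop = !FromTop;
  }
  return Order;
}
```

With every Bias left at zero, the tie-breaking rules reproduce the
original order exactly, which is the "do absolutely nothing" case. Only
an instruction with a nonzero Bias gets pulled toward one end of the
region, and dependences are still respected.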
My 2c...

I understand this might be way off in the future, but we are talking
about long-term plans here anyway. Also, as a VLIW backend maintainer,
I just have to say it :)

- We do need to have a way to assign bundles much earlier than we do
  now. And it needs to be intertwined with scheduling (Bundler
  currently reuses a good chunk of scheduler infrastructure). It is
  also obvious that it will have an adverse effect on all the
  downstream passes. It adds insult to injury that bundling is trivial
  to do during scheduling, yet it goes hard against the original
  assumptions made elsewhere. Re-bundling is also a distinct task that
  might need to be addressed in this context.

- We do need to have at least a distant plan for global scheduling. BB
  scope is nice and manageable, but I can easily get several percent of
  "missed" performance by simply implementing a pull-up pass at the
  very end of code generation... meaning multiple opportunities were
  lost earlier. The current way to express and maintain the state
  needed for global scheduling still needs improvement.

- SW pipelining and scheduler interaction with it. When (not if :) we
  have a robust SW pipeliner, it will likely take place before the
  first scheduling pass, and we do not want to "undo" decisions made
  there.

Sergei

--
Qualcomm Innovation Center, Inc.
is a member of Code Aurora Forum.

> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On Behalf Of Andrew Trick
> Sent: Friday, May 11, 2012 12:29 AM
> To: Hal Finkel
> Cc: dag at cray.com; wrf at cray.com; LLVM Developers Mailing List
> Subject: Re: [LLVMdev] Scheduler Roadmap