Hey Everyone, I'd like to begin a project to rework the scheduler to address some problems we've discovered on this end. The goal is to get a more configurable/flexible scheduler while simplifying maintenance by separating policy from implementation to get independent and interchangeable parts. This is going to be challenging because we are still stuck on LLVM 2.9. We will be upgrading to LLVM 3.1 when it is released but I really need to start this work now to meet work deadlines. But I want to make sure I have something I can contribute upstream so I want to present the idea and get some input on whether it is a good design and how best to move it upstream. It looks like the fundamentals of the scheduler haven't changed much from 2.9 to 3.1 so that makes it a bit easier. Most specifically, I would like some comment on whether it is reasonable to just start a new RegReduction-like scheduler from scratch rather than try to incrementally change the existing one. I'm looking at a pretty invasive reengineering of the scheduler structure so that it will have familiar elements but be put together quite differently. That's why I'm not sure incremental change to the existing code makes sense. The two implementations could live side-by-side for a while if that makes sense. The existing RegReduction scheduler design goes something like this: SchedulingPriorityQueue ^ | RegReductionPQBase ^ | RegReductionPriorityQueue<SF> ^ /|\ ______/ | \______ | | | Heuristic 1 ... Heuristic N The only point of customization right now is the template parameter to RegReductionPriorityQueue. This is the sort criteria for prioritizing which node gets scheduled next. SchedulingPriorityQueue has some virtual functions which can be (and are) overridden by RegReductionPQBase and/or RegReductionPriorityQueue. However, the current inheritance structure makes it impossible to tweak much even by subclassing RegReductionPriorityQueue, either because the customization points aren't virtual (easy to fix) or because customization points are coupled (hard to fix without a redesign). Here are some things we have found useful to tune. Most are not currently orthogonal and thus are not easily separated from each other: - Node priority function (the SF parameter above). This is well-separated and easily changed. - Queue implementation. The current implementation uses a std::vector<> and has very bad asymptotic performance (N^2 currently). Previously this was a std::priority_queue<> but was changed because it missed some of the dynamic variance of node priorities. We have found it useful and essential to be able to replace this but even after inheriting from RegReductionPriorityQueue it's pretty ugly because there are two spearate queues in the scheduler, one completely unused. Different queue types exhibit different heuristic behaviors based on how much dynamism they provide. Thus the queue implementation is policy that really ought to be separated from the scheduler proper. - Register pressure heuristic. The ILP scheduler uses a fairly simplistic register pressure calculation which is pretty good in most cases but we have found it to not work well in a few important cases. It would be nice to make this a customization point. It is a separate heuristic from the node priority function. In fact some node priority functions look at the register pressure heuristic to calculate their result. The two are orthogonal and we should be able to vary them independently. Currently the register pressure heuristic is coupled tightly to RegReductionPQBase. Again due to the data structures begin in RegReductionPQBase it is not convenient to simply make some of the heuristic methods virtual and override them. Such an arrangement doesn't allow easy mix-and-match among register pressure and node priority heuristics. Moreover, more elaborate register pressure calculations will almost certainly involve very different data structures. Here's a quick sketch of a new design floating in my head. I don't have any code yet so this is subject to change but it gets the general idea across. The idea is to configure the scheduler via policy classes. SchedulingPriorityQueue ^ | RegReductionPriorityQueue<Queue, Pressure, Sort> ^ ^ ^ /|\ | | ____/ | \____ | | | | | | | Queue 1 ... Queue N | | | | | | /|\ | ______/ | \_____ | | | | | Pressure 1 ... Pressure N | /|\ ___/ | \___ | | | Sort 1 ... Sort N I'll have to work out exactly how to allow the policy classes to communicate, for example to allow the Sort heuristic to contact the Pressure heuristic for information. There are a few ways to do that. One is to reintroduce RegReductionPQBase with some pure virtual functions and use multiple inheritance from RegReductionPriorityQueue to let the policy classes override them. A better way, I think, would be to use composition and pass pointers to each policy object to the others. The above design also should allow simple expansion in the future. If we identify more customization points we can introduce new policy classes pretty easily. Does the above look like a good design? If so, I believe it is easiest to build this from scratch (borrowing a lot of code from the current scheduler) rather than try to incrementally morph the curent scheduler to this design but I am open to guidance. Thanks for your help! -Dave
On Apr 20, 2012, at 10:31 AM, dag at cray.com wrote:> I'd like to begin a project to rework the scheduler to address some > problems we've discovered on this end. The goal is to get a more > configurable/flexible scheduler while simplifying maintenance by > separating policy from implementation to get independent and > interchangeable parts. > > This is going to be challenging because we are still stuck on LLVM 2.9. > We will be upgrading to LLVM 3.1 when it is released but I really need > to start this work now to meet work deadlines. But I want to make sure > I have something I can contribute upstream so I want to present the idea > and get some input on whether it is a good design and how best to move > it upstream. It looks like the fundamentals of the scheduler haven't > changed much from 2.9 to 3.1 so that makes it a bit easier.We plan to move to the MachineScheduler by 3.2. The structure is: ScheduleDAG: Abstract DAG of SUnits and SDeps | v ScheduleDAGInstrs: Build the DAG from MachineInstrs, each SUnit tied to an MI Delimit the current "region" of code being scheduled. | v ScheduleDAGMI: Concrete implementation that supports both top-down and bottom-up scheduling with live interval update. It divides the region into three zones: Top-scheduled, bottom-scheduled, and unscheduled. The ScheduleDAGMI constructor takes an instance of MachineSchedStrategy. This is currently a very simply interface that provides pickNode(), releaseTopNode(), releaseBottomNode(). The MachineScheduler is plugable at every level. 1. The pass itself is optional. Targets may disable or override it completely. For example, a target that implements global scheduling would need to override the standard pass. 2. Targets may create a MachineSchedRegistry instance that knows how to construct a custom ScheduleDAGInstrs. This is an easy way to reuse the standard scheduling pass and DAG builder. The driver will determine the scheduling region, but everything else is customizable. 3. New scheduling algorithms can be introduced by implementing new MachineSchedStrategies. We can add a target hook to select the strategy if needed. There's still plenty of opportunity to restructure the code and add customization points. Keep in mind that the goal for the target independent code is clarity and maintainability. Code reuse across custom schedulers is secondary and I'm not currently thinking of designing a template library for custom schedulers. As much as possible, the standard scheduling algorithm will be built from standalone utilities and data structures. The customizations that you describe would all be handled by providing a new MachineSchedStrategy. Start by composing your scheduler from the pieces that are available, e.g. HazardChecker, RegisterPressure... (There's not much value providing a scheduling queue abstraction on top of vector or priority_queue). From that point, we can improve the design. Here's what you can expect in the MachineScheduler: 1. Precise register pressure tracking with back off. I'm in the process of checking in the pieces for this. It should be usable sometime in the next couple of weeks. 2. Division of the target-defined resources into "interlocked" vs. "buffered". The HazardChecker will continue to handle the interlocks for the resources that the hardware handles in order. Buffered resources, which the hardware can handle out of order. These will be considered separately for scheduling priority. We will also make a distinction between minimum and expected operation latency. 3. A register pressure reducing scheduler that considers interlocks and register pressure. This will prioritize register reuse above the pressure limits and include a very simple heuristic for avoiding high register pressure situations. 4. A balanced scheduler that still prioritizes interlocks and register pressure limits, but balances buffered resource usage with register pressure avoidance. 5. Improved heuristics. For example, we can precompute register lineages and use that information to avoid high register pressure situations. The overarching goal of the standard scheduling algorithm is to handle the most important cases that the SDScheduler and PostRAScheduler handle today without unnecessarily shuffling instructions. This should result in more debugable generated code with more stable performance. The goal of the scheduler framework is to support the standard scheduler while providing a place for targets to introduce a custom scheduler, or simply any optimization that requires the scheduling DAG. -Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120423/e008fc18/attachment.html>
Andrew Trick <atrick at apple.com> writes:> We plan to move to the MachineScheduler by 3.2. The structure is:How hard will this be to backport to 3.1? Has woprk on this started yet?> ScheduleDAG: Abstract DAG of SUnits and SDeps > | > v > ScheduleDAGInstrs: Build the DAG from MachineInstrs, each SUnit tied to an MI > Delimit the current "region" of code being scheduled. > | > v > ScheduleDAGMI: Concrete implementation that supports both top-down and bottom-up scheduling > with live interval update. It divides the region into three zones: > Top-scheduled, bottom-scheduled, and unscheduled.So does this happen after SDNodes are converted to MachineInstrs? It's not clear to me given your description of ScheduleDAGInstrs. I assume it uses the current SDNode->SUnit mechanism but I just want to make sure.> 3. New scheduling algorithms can be introduced by implementing new > MachineSchedStrategies. We can add a target hook to select the > strategy if needed.This sounds like what we want and yes, it should be target-selectable.> There's still plenty of opportunity to restructure the code and add > customization points. Keep in mind that the goal for the target > independent code is clarity and maintainability. Code reuse across > custom schedulers is secondary and I'm not currently thinking of > designing a template library for custom schedulers.The two are not contradictory but for me I care less about the specific structure than I do about the flexibility to mix and match heuristics. It just struck me that templates and policy classes is the natural way to do it. I'm glad to hear the top-down scheduler will get some attention. We'll be wanting to use that.> As much as possible, the standard scheduling algorithm will be built > from standalone utilities and data structures. The customizations that > you describe would all be handled by providing a new > MachineSchedStrategy.Makes sense to me.> Start by composing your scheduler from the pieces that are available, > e.g. HazardChecker, RegisterPressure... (There's not much value > providing a scheduling queue abstraction on top of vector or > priority_queue).What do you mean by this last point? We absolutely want to be able to swap out different queue implementations. There is a significant compile time tradeoff to be made here.> 1. Precise register pressure tracking with back off. I'm in the > process of checking in the pieces for this. It should be usable > sometime in the next couple of weeks.Great!> 2. Division of the target-defined resources into "interlocked" > vs. "buffered". The HazardChecker will continue to handle the > interlocks for the resources that the hardware handles in > order.So by "interlocks" you mean hardware-implemented interlocks? So that the scheduler will attempt to avoid them. Not that we have a machine like this, but what about machines with no interlocks where software is responsible for correctness (VLIW, early MIPS, etc.)? I suppose they can be handled with a similar mechanism but the latter is much more strict than the former.> Buffered resources, which the hardware can handle out of order. These > will be considered separately for scheduling priority. We will also > make a distinction between minimum and expected operation latency.Does this mean you want to model the various queues, hardware scheduling windows, etc.? I'm just trying to get a clearer idea of what this means.> 3. A register pressure reducing scheduler that considers interlocks > and register pressure. This will prioritize register reuse above the > pressure limits and include a very simple heuristic for avoiding high > register pressure situations. > > 4. A balanced scheduler that still prioritizes interlocks and register > pressure limits, but balances buffered resource usage with register > pressure avoidance. > > 5. Improved heuristics. For example, we can precompute register > lineages and use that information to avoid high register pressure > situations.This all sounds great and is exactly what we're looking for. As I said, this is a time-critical thing for us. Is there any way I can help to move this along? Thanks! -Dave