Andrew Trick <atrick at apple.com> writes:

> We plan to move to the MachineScheduler by 3.2. The structure is:

How hard will this be to backport to 3.1? Has work on this started yet?

>  ScheduleDAG: Abstract DAG of SUnits and SDeps
>   |
>   v
>  ScheduleDAGInstrs: Build the DAG from MachineInstrs, each SUnit tied to an
>   MI. Delimit the current "region" of code being scheduled.
>   |
>   v
>  ScheduleDAGMI: Concrete implementation that supports both top-down and
>   bottom-up scheduling with live interval update. It divides the region
>   into three zones: top-scheduled, bottom-scheduled, and unscheduled.

So does this happen after SDNodes are converted to MachineInstrs? It's not
clear to me given your description of ScheduleDAGInstrs. I assume it uses the
current SDNode->SUnit mechanism, but I just want to make sure.

> 3. New scheduling algorithms can be introduced by implementing new
>    MachineSchedStrategies. We can add a target hook to select the strategy
>    if needed.

This sounds like what we want, and yes, it should be target-selectable.

> There's still plenty of opportunity to restructure the code and add
> customization points. Keep in mind that the goal for the target-independent
> code is clarity and maintainability. Code reuse across custom schedulers is
> secondary, and I'm not currently thinking of designing a template library
> for custom schedulers.

The two are not contradictory, but I care less about the specific structure
than I do about the flexibility to mix and match heuristics. It just struck
me that templates and policy classes are the natural way to do it.

I'm glad to hear the top-down scheduler will get some attention. We'll be
wanting to use that.

> As much as possible, the standard scheduling algorithm will be built from
> standalone utilities and data structures. The customizations that you
> describe would all be handled by providing a new MachineSchedStrategy.

Makes sense to me.

> Start by composing your scheduler from the pieces that are available, e.g.
> HazardChecker, RegisterPressure... (There's not much value in providing a
> scheduling queue abstraction on top of vector or priority_queue.)

What do you mean by this last point? We absolutely want to be able to swap
out different queue implementations. There is a significant compile-time
tradeoff to be made here.

> 1. Precise register pressure tracking with back off. I'm in the process of
>    checking in the pieces for this. It should be usable sometime in the
>    next couple of weeks.

Great!

> 2. Division of the target-defined resources into "interlocked"
>    vs. "buffered". The HazardChecker will continue to handle the interlocks
>    for the resources that the hardware handles in order.

So by "interlocks" you mean hardware-implemented interlocks, so that the
scheduler will attempt to avoid them? Not that we have a machine like this,
but what about machines with no interlocks, where software is responsible for
correctness (VLIW, early MIPS, etc.)? I suppose they can be handled with a
similar mechanism, but the latter is much more strict than the former.

> Buffered resources, which the hardware can handle out of order. These will
> be considered separately for scheduling priority. We will also make a
> distinction between minimum and expected operation latency.

Does this mean you want to model the various queues, hardware scheduling
windows, etc.? I'm just trying to get a clearer idea of what this means.

> 3. A register pressure reducing scheduler that considers interlocks and
>    register pressure. This will prioritize register reuse above the
>    pressure limits and include a very simple heuristic for avoiding high
>    register pressure situations.
>
> 4. A balanced scheduler that still prioritizes interlocks and register
>    pressure limits, but balances buffered resource usage with register
>    pressure avoidance.
>
> 5. Improved heuristics.
> For example, we can precompute register lineages and use that information
> to avoid high register pressure situations.

This all sounds great and is exactly what we're looking for. As I said, this
is a time-critical thing for us. Is there any way I can help to move this
along?

Thanks!

-Dave
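The "precise register pressure tracking with back off" in item 1 above can be
pictured with a toy model. This is a hand-rolled sketch, not LLVM's actual
RegisterPressure API: the `PressureTracker` class, its pressure limit, and the
candidate selection below are invented for illustration. The idea is that a
bottom-up scheduler estimates the net change in live values for each ready
candidate, prefers one that stays under the limit, and "backs off" to the
least damaging choice when none does.

```cpp
#include <cassert>
#include <vector>

// Toy pressure tracker for a bottom-up scheduler: scheduling an instruction
// kills its defs (they are now produced) and makes its not-yet-live uses
// live (they are needed earlier in the schedule).
struct PressureTracker {
    int live = 0;   // current number of live values
    int limit;      // target pressure limit (e.g. available registers)
    explicit PressureTracker(int lim) : limit(lim) {}

    // Net pressure change from scheduling an instruction with `defs`
    // definitions and `newUses` uses that are not yet live.
    int delta(int defs, int newUses) const { return newUses - defs; }

    bool wouldExceed(int defs, int newUses) const {
        return live + delta(defs, newUses) > limit;
    }
};

struct Candidate { int defs, newUses; };

// Return the index of the first candidate that stays under the limit;
// if every candidate would exceed it, back off to the one with the
// smallest pressure increase.
int pickCandidate(const PressureTracker &pt,
                  const std::vector<Candidate> &ready) {
    int best = 0;
    for (int i = 0; i < (int)ready.size(); ++i) {
        if (!pt.wouldExceed(ready[i].defs, ready[i].newUses))
            return i;
        if (pt.delta(ready[i].defs, ready[i].newUses) <
            pt.delta(ready[best].defs, ready[best].newUses))
            best = i;
    }
    return best; // back off: minimize the damage
}
```

The real tracker is considerably more involved (register classes, pressure
sets, lane masks), but the decision structure is the same.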

On Apr 24, 2012, at 8:59 AM, dag at cray.com wrote:

> Andrew Trick <atrick at apple.com> writes:
>
>> We plan to move to the MachineScheduler by 3.2. The structure is:
>
> How hard will this be to backport to 3.1? Has work on this started yet?

In my previous message I outlined the steps that I would take to bring up the
new scheduler. I'm about to check in the register pressure reducing
scheduler. The next step will be plugging in the target itinerary.

>>  ScheduleDAG: Abstract DAG of SUnits and SDeps
>>   |
>>   v
>>  ScheduleDAGInstrs: Build the DAG from MachineInstrs, each SUnit tied to
>>   an MI. Delimit the current "region" of code being scheduled.
>>   |
>>   v
>>  ScheduleDAGMI: Concrete implementation that supports both top-down and
>>   bottom-up scheduling with live interval update. It divides the region
>>   into three zones: top-scheduled, bottom-scheduled, and unscheduled.
>
> So does this happen after SDNodes are converted to MachineInstrs? It's not
> clear to me given your description of ScheduleDAGInstrs. I assume it uses
> the current SDNode->SUnit mechanism, but I just want to make sure.

Machine scheduling occurs in the vicinity of register allocation. It uses the
existing MachineInstr->SUnit mechanism.

> ...
> I'm glad to hear the top-down scheduler will get some attention. We'll be
> wanting to use that.

Out of curiosity, what about top-down works better for your target?

>> Start by composing your scheduler from the pieces that are available,
>> e.g. HazardChecker, RegisterPressure... (There's not much value in
>> providing a scheduling queue abstraction on top of vector or
>> priority_queue.)
>
> What do you mean by this last point? We absolutely want to be able to swap
> out different queue implementations. There is a significant compile-time
> tradeoff to be made here.

Use whatever data structure you like for your queue. I don't have plans to
make a reusable one yet. They're not complicated.

>> 2. Division of the target-defined resources into "interlocked" vs.
"buffered". The HazardChecker will continue to handle the >> interlocks for the resources that the hardware handles in >> order. > > So by "interlocks" you mean hardware-implemented interlocks? So that > the scheduler will attempt to avoid them. Not that we have a machine > like this, but what about machines with no interlocks where software is > responsible for correctness (VLIW, early MIPS, etc.)? I suppose they > can be handled with a similar mechanism but the latter is much more > strict than the former.I'm not designing a mechanism for inserting nops to pad latency. If someone needs that, it's easy to add.>> Buffered resources, which the hardware can handle out of order. These >> will be considered separately for scheduling priority. We will also >> make a distinction between minimum and expected operation latency. > Does this mean you want to model the various queues, hardware scheduling > windows, etc.? I'm just trying to get a clearer idea of what this > means.I don't intend to directly model microarchitectural features of OOO processors at the level of buffer sizes. Although that could be done by a highly motivated developer. I do intend to allow a target to identify arbitrary categories of resources, how many are available in each cycle on average, and indicate which resources are used by an operation. I'll initially piggyback on the existing functional units to avoid rewriting target itineraries.> As I said, this is a time-critical thing for us. Is there any way I can > help to move this along?In general, fixing LLVM bugs and keeping build bots green is always helpful. As far as the scheduler, pieces are starting to fall into place. People are starting to use those pieces and contribute. This is a pluggable pass, so there's no reason you can't develop your own machine scheduler in parallel and gradually take advantage of features as I add them. 
Please expect to do a little code copy-pasting into your target until the
infrastructure is mature enough to expose more target interfaces. I'm not
going to waste time redesigning APIs until we have working components.

It would be very useful to have people testing new features, finding bugs
early, and hopefully fixing those bugs. I would also like people to give me
interesting bits of code with performance issues that can work as test cases.
That's hard if I can't run your target.

-Andy
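Since the machine scheduler is a pluggable pass, a custom scheduler reduces
to implementing a pick/notify interface that a generic driver calls. The
sketch below is a self-contained mock of that shape; LLVM's real
MachineSchedStrategy has a different and larger interface, and the names here
(`SchedStrategy`, `pickNode`, `MaxPriorityStrategy`) are simplified
stand-ins, not the actual API.

```cpp
#include <cassert>
#include <vector>

// A stripped-down stand-in for a scheduling DAG node.
struct SUnit { int id; int priority; };

// Mock of a pluggable strategy: the driver asks it which ready node to
// issue next.
struct SchedStrategy {
    virtual ~SchedStrategy() = default;
    virtual int pickNode(const std::vector<SUnit> &ready) = 0;
};

// One interchangeable heuristic: always take the highest-priority node.
struct MaxPriorityStrategy : SchedStrategy {
    int pickNode(const std::vector<SUnit> &ready) override {
        int best = 0;
        for (int i = 1; i < (int)ready.size(); ++i)
            if (ready[i].priority > ready[best].priority) best = i;
        return best;
    }
};

// Generic driver: repeatedly asks the strategy for the next node. Swapping
// heuristics means swapping the strategy object, not the driver.
std::vector<int> schedule(SchedStrategy &s, std::vector<SUnit> ready) {
    std::vector<int> order;
    while (!ready.empty()) {
        int i = s.pickNode(ready);
        order.push_back(ready[i].id);
        ready.erase(ready.begin() + i);
    }
    return order;
}
```

This is also where the queue question lands: the ready list here is a plain
vector, and a strategy is free to keep its own priority_queue or any other
structure internally, which is consistent with "use whatever data structure
you like for your queue."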

Andrew Trick <atrick at apple.com> writes:

>> How hard will this be to backport to 3.1? Has work on this started yet?
>
> In my previous message I outlined the steps that I would take to bring up
> the new scheduler. I'm about to check in the register pressure reducing
> scheduler. The next step will be plugging in the target itinerary.

OK, but that doesn't really answer the question. Do you have a feel for
whether this can be backported relatively easily?

> Machine scheduling occurs in the vicinity of register allocation. It uses
> the existing MachineInstr->SUnit mechanism.

I'm not familiar with this mechanism. I thought that after ISel/ScheduleDAG
we no longer have SUnits.

>> I'm glad to hear the top-down scheduler will get some attention. We'll be
>> wanting to use that.
>
> Out of curiosity, what about top-down works better for your target?

It's easier to analyze certain constructs for heuristic purposes. For
example, I posted a while back about the problem that the current scheduler
moves calls around. It's easier to schedule those "in the right place" if you
don't have to look back through chains to find them.

The problem with bottom-up schedulers in general is that if you need some
node to be scheduled in a certain spot, you have to make sure all its uses
are scheduled "early" (that is, late in the static schedule) so you can even
_see_ the node that might cause a problem (like a call). You don't know you
have a potential problem until you've already scheduled a bunch of stuff in
the wrong place. With a top-down scheduler you can see problem nodes earlier.

>> What do you mean by this last point? We absolutely want to be able to
>> swap out different queue implementations. There is a significant
>> compile-time tradeoff to be made here.
>
> Use whatever data structure you like for your queue. I don't have plans to
> make a reusable one yet.
> They're not complicated.

That's fine, but there have to be some hooks to plug in a different data
structure.

> I do intend to allow a target to identify arbitrary categories of
> resources, how many are available in each cycle on average, and indicate
> which resources are used by an operation. I'll initially piggyback on the
> existing functional units to avoid rewriting target itineraries.

OK.

> As far as the scheduler goes, pieces are starting to fall into place.
> People are starting to use those pieces and contribute. This is a pluggable
> pass, so there's no reason you can't develop your own machine scheduler in
> parallel and gradually take advantage of features as I add them.

That's a bit disappointing. It sounds like a lot of throwaway work. I
understand it's a work in progress. I'll take a look at what's there and see
where I might be of some help.

> Please expect to do a little code copy-pasting into your target until the
> infrastructure is mature enough to expose more target interfaces. I'm not
> going to waste time redesigning APIs until we have working components.

Which APIs are you talking about?

> It would be very useful to have people testing new features, finding bugs
> early, and hopefully fixing those bugs. I would also like people to give
> me interesting bits of code with performance issues that can work as test
> cases. That's hard if I can't run your target.

We just use x86, so you can almost certainly run it. But to do testing I'll
need something I can backport to 3.1. We can't afford to develop against
buggy trunk. We have to work off a release.

-Dave
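The visibility argument for top-down scheduling made in this thread can be
seen in a toy list scheduler: a problem node like a call becomes schedulable
as soon as its predecessors are placed, rather than only after all of its
users have been. The DAG, the `Node` struct, and the calls-first heuristic
below are invented for illustration and are not LLVM code.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Node {
    std::string name;
    bool isCall;
    std::vector<int> preds; // indices of nodes that must be placed first
};

// Toy top-down list scheduler: among ready nodes (all predecessors placed),
// take a call immediately, so it lands "in the right place" the moment it
// becomes visible.
std::vector<std::string> scheduleTopDown(const std::vector<Node> &dag) {
    std::vector<bool> placed(dag.size(), false);
    std::vector<std::string> order;
    while (order.size() < dag.size()) {
        int pick = -1;
        for (int i = 0; i < (int)dag.size(); ++i) {
            if (placed[i]) continue;
            bool ready = true;
            for (int p : dag[i].preds) ready = ready && placed[p];
            if (!ready) continue;
            if (pick < 0 || (dag[i].isCall && !dag[pick].isCall)) pick = i;
        }
        placed[pick] = true;
        order.push_back(dag[pick].name);
    }
    return order;
}
```

A bottom-up scheduler walking the same DAG from the leaves would not reach
the call until everything that depends on it had already been placed, which
is exactly the late-discovery problem described above.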