Andrew Trick <atrick at apple.com> writes:

> We plan to move to the MachineScheduler by 3.2. The structure is:

How hard will this be to backport to 3.1? Has work on this started yet?

>  ScheduleDAG: Abstract DAG of SUnits and SDeps
>   |
>   v
>  ScheduleDAGInstrs: Build the DAG from MachineInstrs, each SUnit tied to an
>   MI. Delimit the current "region" of code being scheduled.
>   |
>   v
>  ScheduleDAGMI: Concrete implementation that supports both top-down and
>   bottom-up scheduling with live interval update. It divides the region
>   into three zones: top-scheduled, bottom-scheduled, and unscheduled.

So does this happen after SDNodes are converted to MachineInstrs? It's not
clear to me given your description of ScheduleDAGInstrs. I assume it uses the
current SDNode->SUnit mechanism, but I just want to make sure.

> 3. New scheduling algorithms can be introduced by implementing new
>    MachineSchedStrategies. We can add a target hook to select the strategy
>    if needed.

This sounds like what we want, and yes, it should be target-selectable.

> There's still plenty of opportunity to restructure the code and add
> customization points. Keep in mind that the goal for the target-independent
> code is clarity and maintainability. Code reuse across custom schedulers is
> secondary, and I'm not currently thinking of designing a template library
> for custom schedulers.

The two are not contradictory, but I care less about the specific structure
than I do about the flexibility to mix and match heuristics. It just struck
me that templates and policy classes are the natural way to do it.

I'm glad to hear the top-down scheduler will get some attention. We'll be
wanting to use that.

> As much as possible, the standard scheduling algorithm will be built from
> standalone utilities and data structures. The customizations that you
> describe would all be handled by providing a new MachineSchedStrategy.

Makes sense to me.

> Start by composing your scheduler from the pieces that are available, e.g.
> HazardChecker, RegisterPressure... (There's not much value in providing a
> scheduling queue abstraction on top of vector or priority_queue.)

What do you mean by this last point? We absolutely want to be able to swap
out different queue implementations. There is a significant compile-time
tradeoff to be made here.

> 1. Precise register pressure tracking with back off. I'm in the process of
>    checking in the pieces for this. It should be usable sometime in the
>    next couple of weeks.

Great!

> 2. Division of the target-defined resources into "interlocked"
>    vs. "buffered". The HazardChecker will continue to handle the interlocks
>    for the resources that the hardware handles in order.

So by "interlocks" you mean hardware-implemented interlocks, so that the
scheduler will attempt to avoid them? Not that we have a machine like this,
but what about machines with no interlocks, where software is responsible for
correctness (VLIW, early MIPS, etc.)? I suppose they can be handled with a
similar mechanism, but the latter is much more strict than the former.

> Buffered resources, which the hardware can handle out of order. These will
> be considered separately for scheduling priority. We will also make a
> distinction between minimum and expected operation latency.

Does this mean you want to model the various queues, hardware scheduling
windows, etc.? I'm just trying to get a clearer idea of what this means.

> 3. A register pressure reducing scheduler that considers interlocks and
>    register pressure. This will prioritize register reuse above the
>    pressure limits and include a very simple heuristic for avoiding high
>    register pressure situations.
>
> 4. A balanced scheduler that still prioritizes interlocks and register
>    pressure limits, but balances buffered resource usage with register
>    pressure avoidance.
>
> 5. Improved heuristics.
> For example, we can precompute register lineages and use that information
> to avoid high register pressure situations.

This all sounds great and is exactly what we're looking for. As I said, this
is a time-critical thing for us. Is there any way I can help to move this
along?

Thanks!

-Dave
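The "precise register pressure tracking with back off" in item 1 above can be
pictured with a toy model. This is a hand-rolled sketch, not LLVM's actual
RegisterPressure API: the `PressureTracker` class, its pressure limit, and the
candidate selection below are invented for illustration. The idea is that a
bottom-up scheduler estimates the net change in live values for each ready
candidate, prefers one that stays under the limit, and "backs off" to the
least damaging choice when none does.

```cpp
#include <cassert>
#include <vector>

// Toy pressure tracker for a bottom-up scheduler: scheduling an instruction
// kills its defs (they are now produced) and makes its not-yet-live uses
// live (they are needed earlier in the schedule).
struct PressureTracker {
    int live = 0;   // current number of live values
    int limit;      // target pressure limit (e.g. available registers)
    explicit PressureTracker(int lim) : limit(lim) {}

    // Net pressure change from scheduling an instruction with `defs`
    // definitions and `newUses` uses that are not yet live.
    int delta(int defs, int newUses) const { return newUses - defs; }

    bool wouldExceed(int defs, int newUses) const {
        return live + delta(defs, newUses) > limit;
    }
};

struct Candidate { int defs, newUses; };

// Return the index of the first candidate that stays under the limit;
// if every candidate would exceed it, back off to the one with the
// smallest pressure increase.
int pickCandidate(const PressureTracker &pt,
                  const std::vector<Candidate> &ready) {
    int best = 0;
    for (int i = 0; i < (int)ready.size(); ++i) {
        if (!pt.wouldExceed(ready[i].defs, ready[i].newUses))
            return i;
        if (pt.delta(ready[i].defs, ready[i].newUses) <
            pt.delta(ready[best].defs, ready[best].newUses))
            best = i;
    }
    return best; // back off: minimize the damage
}
```

The real tracker is considerably more involved (register classes, pressure
sets, lane masks), but the decision structure is the same.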

On Apr 24, 2012, at 8:59 AM, dag at cray.com wrote:

> Andrew Trick <atrick at apple.com> writes:
>
>> We plan to move to the MachineScheduler by 3.2. The structure is:
>
> How hard will this be to backport to 3.1? Has work on this started yet?

In my previous message I outlined the steps that I would take to bring up the
new scheduler. I'm about to check in the register pressure reducing
scheduler. The next step will be plugging in the target itinerary.

>>  ScheduleDAG: Abstract DAG of SUnits and SDeps
>>   |
>>   v
>>  ScheduleDAGInstrs: Build the DAG from MachineInstrs, each SUnit tied to
>>   an MI. Delimit the current "region" of code being scheduled.
>>   |
>>   v
>>  ScheduleDAGMI: Concrete implementation that supports both top-down and
>>   bottom-up scheduling with live interval update. It divides the region
>>   into three zones: top-scheduled, bottom-scheduled, and unscheduled.
>
> So does this happen after SDNodes are converted to MachineInstrs? It's not
> clear to me given your description of ScheduleDAGInstrs. I assume it uses
> the current SDNode->SUnit mechanism, but I just want to make sure.

Machine scheduling occurs in the vicinity of register allocation. It uses the
existing MachineInstr->SUnit mechanism.

> ...
> I'm glad to hear the top-down scheduler will get some attention. We'll be
> wanting to use that.

Out of curiosity, what about top-down works better for your target?

>> Start by composing your scheduler from the pieces that are available,
>> e.g. HazardChecker, RegisterPressure... (There's not much value in
>> providing a scheduling queue abstraction on top of vector or
>> priority_queue.)
>
> What do you mean by this last point? We absolutely want to be able to swap
> out different queue implementations. There is a significant compile-time
> tradeoff to be made here.

Use whatever data structure you like for your queue. I don't have plans to
make a reusable one yet. They're not complicated.

>> 2. Division of the target-defined resources into "interlocked" vs.
"buffered". The HazardChecker will continue to handle the >> interlocks for the resources that the hardware handles in >> order. > > So by "interlocks" you mean hardware-implemented interlocks? So that > the scheduler will attempt to avoid them. Not that we have a machine > like this, but what about machines with no interlocks where software is > responsible for correctness (VLIW, early MIPS, etc.)? I suppose they > can be handled with a similar mechanism but the latter is much more > strict than the former.I'm not designing a mechanism for inserting nops to pad latency. If someone needs that, it's easy to add.>> Buffered resources, which the hardware can handle out of order. These >> will be considered separately for scheduling priority. We will also >> make a distinction between minimum and expected operation latency. > Does this mean you want to model the various queues, hardware scheduling > windows, etc.? I'm just trying to get a clearer idea of what this > means.I don't intend to directly model microarchitectural features of OOO processors at the level of buffer sizes. Although that could be done by a highly motivated developer. I do intend to allow a target to identify arbitrary categories of resources, how many are available in each cycle on average, and indicate which resources are used by an operation. I'll initially piggyback on the existing functional units to avoid rewriting target itineraries.> As I said, this is a time-critical thing for us. Is there any way I can > help to move this along?In general, fixing LLVM bugs and keeping build bots green is always helpful. As far as the scheduler, pieces are starting to fall into place. People are starting to use those pieces and contribute. This is a pluggable pass, so there's no reason you can't develop your own machine scheduler in parallel and gradually take advantage of features as I add them. 
Please expect to do a little code copy-pasting into your target until the
infrastructure is mature enough to expose more target interfaces. I'm not
going to waste time redesigning APIs until we have working components.

It would be very useful to have people testing new features, finding bugs
early, and hopefully fixing those bugs. I would also like people to give me
interesting bits of code with performance issues that can work as test cases.
That's hard if I can't run your target.

-Andy
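Since the machine scheduler is a pluggable pass, a custom scheduler reduces
to implementing a pick/notify interface that a generic driver calls. The
sketch below is a self-contained mock of that shape; LLVM's real
MachineSchedStrategy has a different and larger interface, and the names here
(`SchedStrategy`, `pickNode`, `MaxPriorityStrategy`) are simplified
stand-ins, not the actual API.

```cpp
#include <cassert>
#include <vector>

// A stripped-down stand-in for a scheduling DAG node.
struct SUnit { int id; int priority; };

// Mock of a pluggable strategy: the driver asks it which ready node to
// issue next.
struct SchedStrategy {
    virtual ~SchedStrategy() = default;
    virtual int pickNode(const std::vector<SUnit> &ready) = 0;
};

// One interchangeable heuristic: always take the highest-priority node.
struct MaxPriorityStrategy : SchedStrategy {
    int pickNode(const std::vector<SUnit> &ready) override {
        int best = 0;
        for (int i = 1; i < (int)ready.size(); ++i)
            if (ready[i].priority > ready[best].priority) best = i;
        return best;
    }
};

// Generic driver: repeatedly asks the strategy for the next node. Swapping
// heuristics means swapping the strategy object, not the driver.
std::vector<int> schedule(SchedStrategy &s, std::vector<SUnit> ready) {
    std::vector<int> order;
    while (!ready.empty()) {
        int i = s.pickNode(ready);
        order.push_back(ready[i].id);
        ready.erase(ready.begin() + i);
    }
    return order;
}
```

This is also where the queue question lands: the ready list here is a plain
vector, and a strategy is free to keep its own priority_queue or any other
structure internally, which is consistent with "use whatever data structure
you like for your queue."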

Andrew Trick <atrick at apple.com> writes:

>> How hard will this be to backport to 3.1? Has work on this started yet?
>
> In my previous message I outlined the steps that I would take to bring up
> the new scheduler. I'm about to check in the register pressure reducing
> scheduler. The next step will be plugging in the target itinerary.

OK, but that doesn't really answer the question. Do you have a feel for
whether this can be backported relatively easily?

> Machine scheduling occurs in the vicinity of register allocation. It uses
> the existing MachineInstr->SUnit mechanism.

I'm not familiar with this mechanism. I thought that after ISel/ScheduleDAG
we no longer have SUnits.

>> I'm glad to hear the top-down scheduler will get some attention. We'll be
>> wanting to use that.
>
> Out of curiosity, what about top-down works better for your target?

It's easier to analyze certain constructs for heuristic purposes. For
example, I posted a while back about the problem that the current scheduler
moves calls around. It's easier to schedule those "in the right place" if you
don't have to look back through chains to find them.

The problem with bottom-up schedulers in general is that if you need some
node to be scheduled in a certain spot, you have to make sure all its uses
are scheduled "early" (that is, late in the static schedule) so you can even
_see_ the node that might cause a problem (like a call). You don't know you
have a potential problem until you've already scheduled a bunch of stuff in
the wrong place. With a top-down scheduler you can see problem nodes earlier.

>> What do you mean by this last point? We absolutely want to be able to
>> swap out different queue implementations. There is a significant
>> compile-time tradeoff to be made here.
>
> Use whatever data structure you like for your queue. I don't have plans to
> make a reusable one yet.
> They're not complicated.

That's fine, but there have to be some hooks to plug in a different data
structure.

> I do intend to allow a target to identify arbitrary categories of
> resources, how many are available in each cycle on average, and indicate
> which resources are used by an operation. I'll initially piggyback on the
> existing functional units to avoid rewriting target itineraries.

OK.

> As far as the scheduler goes, pieces are starting to fall into place.
> People are starting to use those pieces and contribute. This is a pluggable
> pass, so there's no reason you can't develop your own machine scheduler in
> parallel and gradually take advantage of features as I add them.

That's a bit disappointing. It sounds like a lot of throwaway work. I
understand it's a work in progress. I'll take a look at what's there and see
where I might be of some help.

> Please expect to do a little code copy-pasting into your target until the
> infrastructure is mature enough to expose more target interfaces. I'm not
> going to waste time redesigning APIs until we have working components.

Which APIs are you talking about?

> It would be very useful to have people testing new features, finding bugs
> early, and hopefully fixing those bugs. I would also like people to give
> me interesting bits of code with performance issues that can work as test
> cases. That's hard if I can't run your target.

We just use x86, so you can almost certainly run it. But to do testing I'll
need something I can backport to 3.1. We can't afford to develop against
buggy trunk. We have to work off a release.

-Dave
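The visibility argument for top-down scheduling made in this thread can be
seen in a toy list scheduler: a problem node like a call becomes schedulable
as soon as its predecessors are placed, rather than only after all of its
users have been. The DAG, the `Node` struct, and the calls-first heuristic
below are invented for illustration and are not LLVM code.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Node {
    std::string name;
    bool isCall;
    std::vector<int> preds; // indices of nodes that must be placed first
};

// Toy top-down list scheduler: among ready nodes (all predecessors placed),
// take a call immediately, so it lands "in the right place" the moment it
// becomes visible.
std::vector<std::string> scheduleTopDown(const std::vector<Node> &dag) {
    std::vector<bool> placed(dag.size(), false);
    std::vector<std::string> order;
    while (order.size() < dag.size()) {
        int pick = -1;
        for (int i = 0; i < (int)dag.size(); ++i) {
            if (placed[i]) continue;
            bool ready = true;
            for (int p : dag[i].preds) ready = ready && placed[p];
            if (!ready) continue;
            if (pick < 0 || (dag[i].isCall && !dag[pick].isCall)) pick = i;
        }
        placed[pick] = true;
        order.push_back(dag[pick].name);
    }
    return order;
}
```

A bottom-up scheduler walking the same DAG from the leaves would not reach
the call until everything that depends on it had already been placed, which
is exactly the late-discovery problem described above.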