thr3ads.net - llvm dev - [LLVMdev] specializing hybrid_ls_rr_sort (was: Re: Bottom-Up Scheduling?) [Dec 2011]

Hal Finkel

2011-Dec-19 14:51 UTC

[LLVMdev] specializing hybrid_ls_rr_sort (was: Re: Bottom-Up Scheduling?)

On Tue, 2011-10-25 at 21:00 -0700, Andrew Trick wrote:
> Now, to generate the best PPC schedules, there is one thing you may
> want to override. The scheduler's priority function has a
> HasReadyFilter attribute (enum). It can be overridden by specializing
> hybrid_ls_rr_sort. Setting this to "true" enables proper ILP
> scheduling, and maximizes the instructions that can issue in one
> group, regardless of register pressure. We still care about register
> pressure enough in ARM to avoid enabling this. I'm really not sure how
> much it will help on modern PPC implementations though.

Can this be done without modifying common code? It looks like
hybrid_ls_rr_sort is local to ScheduleDAGRRList.cpp.

Thanks again,
Hal

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory

Andrew Trick

2011-Dec-19 15:41 UTC

[LLVMdev] specializing hybrid_ls_rr_sort (was: Re: Bottom-Up Scheduling?)

On Dec 19, 2011, at 6:51 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> On Tue, 2011-10-25 at 21:00 -0700, Andrew Trick wrote:
>> Now, to generate the best PPC schedules, there is one thing you may
>> want to override. The scheduler's priority function has a
>> HasReadyFilter attribute (enum). It can be overridden by specializing
>> hybrid_ls_rr_sort. Setting this to "true" enables proper ILP
>> scheduling, and maximizes the instructions that can issue in one
>> group, regardless of register pressure. We still care about register
>> pressure enough in ARM to avoid enabling this. I'm really not sure how
>> much it will help on modern PPC implementations though.
>
> Can this be done without modifying common code? It looks like
> hybrid_ls_rr_sort is local to ScheduleDAGRRList.cpp.
>
> Thanks again,
> Hal

Right. You would need to specialize the priority queue logic, which
means changing a small amount of common code.

Andy
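
The pattern discussed above, a sort functor carrying a compile-time HasReadyFilter enum that a templated priority queue consults, can be sketched roughly as follows. This is a simplified, self-contained illustration and not the code in ScheduleDAGRRList.cpp: Node, HybridLikeSort, FilteredHybridLikeSort, ReadyQueue and pickAtCycle are hypothetical names invented for the example, and the priority comparison is a placeholder for the real heuristics.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a scheduling unit (not LLVM's SUnit).
struct Node {
  int id;
  unsigned readyCycle;  // earliest cycle at which the node can issue
  int priority;         // placeholder for the real priority heuristics
};

// Sort functor with the ready filter switched off at compile time.
struct HybridLikeSort {
  enum { HasReadyFilter = false };
  bool isReady(const Node &n, unsigned cycle) const {
    return n.readyCycle <= cycle;
  }
  // Returns true when b should be preferred over a.
  bool operator()(const Node &a, const Node &b) const {
    return a.priority < b.priority;
  }
};

// "Specialized" variant: same ordering, but the filter is enabled.
struct FilteredHybridLikeSort : HybridLikeSort {
  enum { HasReadyFilter = true };
};

// Queue parameterized on the sort functor. It only calls isReady()
// when the functor's HasReadyFilter enum is true.
template <class SF>
class ReadyQueue {
  SF picker;
  std::vector<Node> nodes;

public:
  void push(const Node &n) { nodes.push_back(n); }

  // Returns the id of the preferred eligible node, or -1 if none.
  int pick(unsigned cycle) const {
    const Node *best = nullptr;
    for (const Node &n : nodes) {
      if (SF::HasReadyFilter && !picker.isReady(n, cycle))
        continue;  // filtered out: cannot issue in this cycle
      if (!best || picker(*best, n))
        best = &n;
    }
    return best ? best->id : -1;
  }
};

// Builds the same two-node queue and picks at the given cycle.
template <class SF>
int pickAtCycle(unsigned cycle) {
  ReadyQueue<SF> q;
  q.push({0, 0, 1});  // ready immediately, low priority
  q.push({1, 5, 9});  // high priority, not ready until cycle 5
  return q.pick(cycle);
}

// Usage sketch: the filter changes which node wins at cycle 0.
inline void demoReadyFilter() {
  assert(pickAtCycle<HybridLikeSort>(0) == 1);          // priority only
  assert(pickAtCycle<FilteredHybridLikeSort>(0) == 0);  // node 1 filtered
  assert(pickAtCycle<FilteredHybridLikeSort>(5) == 1);  // both ready
}
```

Because the queue is a template over the functor, changing the filter means adding or editing a functor type next to the queue, which is why this cannot be done purely from target code when the functors are local to one source file.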

Hal Finkel

2011-Dec-19 23:19 UTC

[LLVMdev] specializing hybrid_ls_rr_sort (was: Re: Bottom-Up Scheduling?)

On Mon, 2011-12-19 at 07:41 -0800, Andrew Trick wrote:
> On Dec 19, 2011, at 6:51 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>
> > On Tue, 2011-10-25 at 21:00 -0700, Andrew Trick wrote:
> >> Now, to generate the best PPC schedules, there is one thing you may
> >> want to override. The scheduler's priority function has a
> >> HasReadyFilter attribute (enum). It can be overridden by specializing
> >> hybrid_ls_rr_sort. Setting this to "true" enables proper ILP
> >> scheduling, and maximizes the instructions that can issue in one
> >> group, regardless of register pressure. We still care about register
> >> pressure enough in ARM to avoid enabling this. I'm really not sure how
> >> much it will help on modern PPC implementations though.
> >
> > Can this be done without modifying common code? It looks like
> > hybrid_ls_rr_sort is local to ScheduleDAGRRList.cpp.
> >
> > Thanks again,
> > Hal
>
> Right. You would need to specialize the priority queue logic. A small
> amount of common code.
> Andy

Andy,

I played around with this some today for my PPC 440 chips. These are
embedded chips (multiple pipelines but in-order), and may be more
similar to your ARMs than to the PPC-970 style designs...

I was able to get reasonable PPC 440 code generation by using the ILP
scheduler pre-RA and then the post-RA scheduler with ANTIDEP_ALL (and my
load/store reordering patch). This worked significantly better than
using either hybrid or ilp alone (with or without setting
HasReadyFilter). I was looking at my primary use case which is
partially-unrolled loops with loads, stores and floating-point
calculations.

This seems to work because ILP first groups the instructions to extract
parallelism and then the post-RA scheduler breaks up the groups to avoid
stalls. This allows the scheduler to find its way out of what seems to
be a "local minimum" of sorts, whereby it wants to schedule each
unrolled iteration of the loop sequentially. The reason this seems to
occur is that the hybrid scheduler would prefer to suffer a large
data-dependency delay over a shorter full-pipeline delay. Do you know
why it would do this? (You can see PR11589 for an example if you'd
like.)

Regarding HasReadyFilter: does it do anything other than cause isReady()
to be used? Is there a reason that it is a compile-time constant? Both
Hybrid and ILP have isReady() functions. I can certainly propose a patch
to make them command-line options.
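
As a rough illustration of what turning the compile-time constant into a runtime switch could look like, here is a minimal, self-contained sketch. It is an assumption about the shape of such a change, not a proposed patch: EnableReadyFilter, Unit, RuntimeFilteredSort and isEligible are hypothetical names, and the plain global bool stands in for what would presumably be a cl::opt<bool> inside LLVM.

```cpp
#include <cassert>

// Stand-in for a command-line flag (hypothetical name). In LLVM this
// would presumably be a cl::opt<bool> rather than a plain global.
static bool EnableReadyFilter = false;

// Hypothetical stand-in for a scheduling unit (not LLVM's SUnit).
struct Unit {
  unsigned readyCycle;  // earliest cycle at which the unit can issue
};

// Sort functor that always provides isReady(); whether the queue
// consults it is now decided at run time, not by an enum.
struct RuntimeFilteredSort {
  bool isReady(const Unit &u, unsigned cycle) const {
    return u.readyCycle <= cycle;
  }
};

// A unit is eligible if the filter is off, or if the functor says it
// can issue in the given cycle.
template <class SF>
bool isEligible(const SF &picker, const Unit &u, unsigned cycle) {
  return !EnableReadyFilter || picker.isReady(u, cycle);
}

// Usage sketch: the same unit is accepted or rejected at cycle 0
// depending on the flag.
inline void demoRuntimeFilter() {
  RuntimeFilteredSort picker;
  Unit late{5};

  EnableReadyFilter = false;
  assert(isEligible(picker, late, 0));   // filter off: always eligible

  EnableReadyFilter = true;
  assert(!isEligible(picker, late, 0));  // filter on: not ready yet
  assert(isEligible(picker, late, 5));   // ready from cycle 5 onwards
}
```

The trade-off is that a compile-time enum lets the compiler drop the isReady() call entirely when the filter is off, whereas a runtime flag costs one extra branch per check but allows experimenting without rebuilding.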

Thanks again,
Hal

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
