thr3ads.net - llvm dev - [LLVMdev] scoreboard hazard det. and instruction groupings [Jun 2012]

Hal Finkel

2012-Jun-11 16:30 UTC

[LLVMdev] scoreboard hazard det. and instruction groupings

I'm considering writing more-detailed itineraries for some PowerPC CPUs
that use the 'traditional' instruction grouping scheme. In essence,
this means that multiple instructions will stall in some pipeline stage
until a complete group is formed, then all will continue.

I expect to provide CPU-specific code to help determine when the
currently-waiting instructions would form a group. Is there a
straightforward way that I could override the scoreboard hazard
detector (or layer on top of it) to insert this kind of logic?

Should I instead be looking to do something like what Hexagon does
for VLIW cores? I think the main difference between the PowerPC scheme
and a VLIW scheme is that the CPU itself can form groups internally;
it is just more efficient if the groups are provided pre-made. Maybe
this difference, if it is one, is insignificant.

Thanks again,
Hal

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory

Andrew Trick

2012-Jun-11 17:48 UTC

On Jun 11, 2012, at 9:30 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> I'm considering writing more-detailed itineraries for some PowerPC CPUs
> that use the 'traditional' instruction grouping scheme. In essence,
> this means that multiple instructions will stall in some pipeline stage
> until a complete group is formed, then all will continue.
> 
> I expect to provide CPU-specific code to help determine when the
> currently-waiting instructions would form a group. Is there a
> straightforward way that I could override the scoreboard hazard
> detector (or layer on top of it) to insert this kind of logic?
> 
> Should I instead be looking to do something like what Hexagon does
> for VLIW cores? I think the main difference between the PowerPC scheme
> and a VLIW scheme is that the CPU itself can form groups internally;
> it is just more efficient if the groups are provided pre-made. Maybe
> this difference, if it is one, is insignificant.
Hal, I think you're asking whether to use the ScheduleHazardRecognizer or
DFAPacketizer. I suggest sticking with the hazard recognizer unless you have an
important problem that can't be solved that way. It's the API used by
most targets and doesn't require a custom scheduler. Although I don't
want to stop you from generalizing the DFA work either if you feel compelled to
do that.

Ignoring compile time for a moment, I think an advantage of a DFA is modeling a
situation where the hardware can assign resources to best fit the entire group
rather than one instruction at a time. For example, if InstA requires either
Unit0 or Unit1, and InstB requires Unit0, is {InstA, InstB} a valid group?
Depending on your CPU, a DFA could either recognize that it's valid, or give
you a chance to reorder the instructions within a group once they've been
selected.
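
To make the {InstA, InstB} question concrete, here is a minimal standalone C++
sketch (not LLVM code). It assumes a hypothetical encoding in which each
instruction carries a bitmask of the units it may use, and it contrasts
assigning units one instruction at a time with checking the group as a whole.
The names UnitMask, greedyFits, and groupFits are made up for illustration; the
real DFAPacketizer is table-driven rather than a backtracking search.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoding: bit i set means the instruction may use Unit i.
using UnitMask = std::uint32_t;

// One instruction at a time: each instruction takes the lowest-numbered
// unit that is still free, with no knowledge of what comes later.
bool greedyFits(const std::vector<UnitMask> &Group) {
  UnitMask Busy = 0;
  for (UnitMask Allowed : Group) {
    UnitMask Free = Allowed & ~Busy;
    if (Free == 0)
      return false;                 // no acceptable unit is left
    Busy |= Free & (~Free + 1u);    // claim the lowest free unit
  }
  return true;
}

// Whole group: backtracking search over every unit choice.
static bool assignFrom(const std::vector<UnitMask> &Group, std::size_t Idx,
                       UnitMask Busy) {
  if (Idx == Group.size())
    return true;                    // every instruction has a unit
  UnitMask Candidates = Group[Idx] & ~Busy;
  while (Candidates != 0) {
    UnitMask Unit = Candidates & (~Candidates + 1u);
    if (assignFrom(Group, Idx + 1, Busy | Unit))
      return true;
    Candidates &= ~Unit;            // try the next candidate unit
  }
  return false;
}

bool groupFits(const std::vector<UnitMask> &Group) {
  return assignFrom(Group, 0, 0);
}
```

With InstA = 0x3 (Unit0 or Unit1) and InstB = 0x1 (Unit0 only), the greedy
check rejects the order {InstA, InstB} because InstA claims Unit0 first, while
the whole-group check accepts it by placing InstA on Unit1.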

Ideally, you can express your constraints using InstrStage itinerary entries. If
not, then you need to do your own bookkeeping by saving extra state during
EmitInstruction and checking for hazards in getHazardType. At this point, you
need to decide whether your custom logic can be easily generalized to either
top-down or bottom-up scheduling. If not, you can force MISched to schedule
in either direction. SD scheduling is stuck with bottom-up for the remainder of
its days, and postRA scheduling is top-down.
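
The bookkeeping described above might look roughly like the following
standalone C++ sketch. It does not use the LLVM headers: the class only mimics
the shape of a hazard recognizer (query for a hazard, record an emitted
instruction, advance a cycle), and the grouping rules in it (at most four
instructions per group, a branch closes the group) are assumptions chosen for
illustration, not a description of any particular PowerPC core. It is written
for top-down scheduling only.

```cpp
#include <cassert>
#include <vector>

enum class Hazard { None, Stall };           // simplified stand-in
enum class Kind { Integer, LoadStore, Branch };

class GroupingRecognizer {
  static constexpr unsigned MaxGroupSize = 4; // assumed group width
  unsigned SlotsUsed = 0;                     // extra state saved on emit
  bool GroupClosed = false;                   // set once a branch is emitted

public:
  // Would issuing another instruction now violate the grouping rules?
  Hazard getHazardType(Kind) const {
    if (GroupClosed || SlotsUsed == MaxGroupSize)
      return Hazard::Stall;
    return Hazard::None;
  }

  // Record the instruction in the group currently being formed.
  void emitInstruction(Kind K) {
    ++SlotsUsed;
    if (K == Kind::Branch)
      GroupClosed = true;                     // assumed: a branch ends the group
  }

  // Start a fresh group.
  void advanceCycle() {
    SlotsUsed = 0;
    GroupClosed = false;
  }
};

// Feed a fixed top-down order through the recognizer and count the groups.
unsigned countGroups(const std::vector<Kind> &Seq) {
  GroupingRecognizer R;
  unsigned Groups = Seq.empty() ? 0 : 1;
  for (Kind K : Seq) {
    if (R.getHazardType(K) == Hazard::Stall) {
      R.advanceCycle();
      ++Groups;
    }
    R.emitInstruction(K);
  }
  return Groups;
}
```

A real scheduler would use the hazard answer to pick a different ready
instruction rather than simply starting a new group; countGroups only shows how
the saved state drives the hazard query.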

-Andy

Anshuman Dasgupta

2012-Jun-11 19:02 UTC

Hal,

On 6/11/2012 12:48 PM, Andrew Trick wrote:
> Ignoring compile time for a moment, I think an advantage of a DFA is
> modeling a situation where the hardware can assign resources to best fit the
> entire group rather than one instruction at a time. For example, if InstA
> requires either Unit0 or Unit1, and InstB requires Unit0, is {InstA, InstB} a
> valid group? Depending on your CPU, a DFA could either recognize that it's
> valid, or give you a chance to reorder the instructions within a group once
> they've been selected.
>
I would recommend the DFA mechanism as well from what you've described. 
It considers all permutations of mapping instructions to functional 
units. To add to what Andrew said, note that the DFA answers the 
question of whether there exists a legal mapping of a group of 
instructions to functional units. It does not, however, return a legal 
mapping. Will that be sufficient for what you want?
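
One way to picture the "exists, but is not returned" property is the
standalone C++ sketch below. It assumes, for illustration only, that an
automaton state is the set of unit-occupancy bitmasks still reachable after the
instructions seen so far; the names State, transition, and groupIsLegal are
hypothetical and this is not the LLVM implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical encoding: bit i set means Unit i.
using UnitMask = std::uint32_t;

// A state is every combination of busy units that some legal assignment of
// the instructions seen so far could have produced.
using State = std::set<UnitMask>;

State initialState() { return {0u}; }       // nothing is busy yet

// Add one instruction: from each reachable occupancy, try each free unit the
// instruction allows and record the resulting occupancy.
State transition(const State &S, UnitMask Allowed) {
  State Next;
  for (UnitMask Busy : S)
    for (UnitMask Free = Allowed & ~Busy; Free != 0; Free &= Free - 1)
      Next.insert(Busy | (Free & (~Free + 1u)));
  return Next;
}

// The group is legal if the state never becomes empty.
bool groupIsLegal(const std::vector<UnitMask> &Group) {
  State S = initialState();
  for (UnitMask Allowed : Group) {
    S = transition(S, Allowed);
    if (S.empty())
      return false;
  }
  return true;
}
```

The state records only which units are busy, not which instruction holds which
unit, so a non-empty final state says that a legal mapping exists without
saying what it is.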

-Anshu

Hal Finkel

2012-Jun-11 19:07 UTC

On Mon, 11 Jun 2012 10:48:18 -0700
Andrew Trick <atrick at apple.com> wrote:
> On Jun 11, 2012, at 9:30 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> > I'm considering writing more-detailed itineraries for some PowerPC
> > CPUs that use the 'traditional' instruction grouping scheme. In
> > essence, this means that multiple instructions will stall in some
> > pipeline stage until a complete group is formed, then all will
> > continue.
> > 
> > I expect to provide CPU-specific code to help determine when the
> > currently-waiting instructions would form a group. Is there a
> > straightforward way that I could override the scoreboard hazard
> > detector (or layer on top of it) to insert this kind of logic?
> > 
> > Should I instead be looking to do something like what Hexagon does
> > for VLIW cores? I think the main difference between the PowerPC
> > scheme and a VLIW scheme is that the CPU itself can form groups
> > internally; it is just more efficient if the groups are provided
> > pre-made. Maybe this difference, if it is one, is insignificant.
> 
> Hal, I think you're asking whether to use the
> ScheduleHazardRecognizer or DFAPacketizer. I suggest sticking with
> the hazard recognizer unless you have an important problem that can't
> be solved that way. It's the API used by most targets and doesn't
> require a custom scheduler. Although I don't want to stop you from
> generalizing the DFA work either if you feel compelled to do that.
I don't yet feel compelled, and I don't know much about the
DFAPacketizer. I just want something that will work cleanly ;)

Looking at VLIWPacketizerList::PacketizeMIs, it seems like the
instructions are first scheduled (via some external scheme?), and then
packetized 'in order'. Is that correct?
> 
> Ignoring compile time for a moment, I think an advantage of a DFA is
> modeling a situation where the hardware can assign resources to best
> fit the entire group rather than one instruction at a time. For
> example, if InstA requires either Unit0 or Unit1, and InstB requires
> Unit0, is {InstA, InstB} a valid group? Depending on your CPU, a DFA
> could either recognize that it's valid, or give you a chance to
> reorder the instructions within a group once they've been selected.
In the PowerPC grouping scheme, resources are assigned on a group
basis (by the instruction dispatching stages). However, once the group
is dispatched to the appropriate functional units, 'bypass' is still
available on an instruction-by-instruction basis to instructions in
later groups. Final writeback waits until all members of the group
complete.
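
As a rough illustration of the timing this implies, the standalone C++ sketch
below separates the cycle at which one member's result could be bypassed from
the cycle at which the whole group could write back. It is a simplification
under stated assumptions: pipeline stages are ignored, each instruction is
reduced to a single latency, and the function names and latency values are
hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Bypass is per instruction: the result is usable by later groups as soon as
// this member finishes.
unsigned bypassReadyCycle(unsigned DispatchCycle, unsigned Latency) {
  return DispatchCycle + Latency;
}

// Writeback is per group: it waits for the slowest member.
unsigned groupWritebackCycle(unsigned DispatchCycle,
                             const std::vector<unsigned> &Latencies) {
  unsigned Slowest = 0;
  for (unsigned L : Latencies)
    Slowest = std::max(Slowest, L);
  return DispatchCycle + Slowest;
}
```

For a group dispatched at cycle 10 with member latencies 1, 3, and 2, the
first member's result is available for bypass at cycle 11, but the group does
not write back until cycle 13.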
> 
> Ideally, you can express your constraints using InstrStage itinerary
> entries. 
I don't see how, in the current scheme, to express that an instruction
must wait in FU0 until there are also waiting instructions in FU1, FU2
and FU3. Furthermore, there are certain constraints on what those
instructions can be, and which ones will move forward as the next
dispatched group, and I think we need to fall back to C++ to deal with
them.
> If not, then you need to do your own bookkeeping by saving
> extra state during EmitInstruction and checking for hazards in
> getHazardType. At this point, you need to decide whether your custom
> logic can be easily generalized to either top-down or bottom-up
> scheduling.
I think that it can be either. Within the current system, however, it
might need to be top-down. To schedule bottom-up, you'd need look-ahead
over the depths of the pipelines in order to emit grouping hazards
when considering the ends of the pipelines (although this may matter
only in corner cases; I think that normal dependency analysis should
catch most of these).
> If not, you can force MISched to schedule in either
> direction. SD scheduling is stuck with bottom-up for the remainder of
> its days, and postRA scheduling is top-down.
I would rather do something that will be easy to maintain going
forward, and so if that can be accomplished within the normal
framework, then that would be great.

I think that I'll try ignoring the issue for now: just use a normal
itinerary with bottom-up scheduling, followed by the existing top-down
pass, which attempts to enforce some of the ordering constraints
(these are most severe on the G5s). If that gives unsatisfactory
results, then we can think about something more involved.

Thanks again,
Hal
> 
> -Andy
-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
