Hal Finkel
2012-Jun-11 16:30 UTC
[LLVMdev] scoreboard hazard det. and instruction groupings
I'm considering writing more-detailed itineraries for some PowerPC CPUs that use the 'traditional' instruction grouping scheme. In essence, this means that multiple instructions will stall in some pipeline stage until a complete group is formed, then all will continue.

I expect to provide CPU-specific code to help determine when the currently-waiting instructions would form a group. Is there a straightforward way that I could override the scoreboard hazard detector (or layer on top of it) to insert this kind of logic?

Should I instead be looking to do something like what Hexagon does for VLIW cores? I think the main difference between the PowerPC scheme and a VLIW scheme, is that the CPU itself can form groups internally, it is just more efficient if the groups are provided pre-made. Maybe this difference, if it is one, is insignificant.

Thanks again,
Hal

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
Andrew Trick
2012-Jun-11 17:48 UTC
[LLVMdev] scoreboard hazard det. and instruction groupings
On Jun 11, 2012, at 9:30 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> I'm considering writing more-detailed itineraries for some PowerPC CPUs that use the 'traditional' instruction grouping scheme. In essence, this means that multiple instructions will stall in some pipeline stage until a complete group is formed, then all will continue.
>
> I expect to provide CPU-specific code to help determine when the currently-waiting instructions would form a group. Is there a straightforward way that I could override the scoreboard hazard detector (or layer on top of it) to insert this kind of logic?
>
> Should I instead be looking to do something like what Hexagon does for VLIW cores? I think the main difference between the PowerPC scheme and a VLIW scheme, is that the CPU itself can form groups internally, it is just more efficient if the groups are provided pre-made. Maybe this difference, if it is one, is insignificant.

Hal, I think you're asking whether to use the ScheduleHazardRecognizer or DFAPacketizer. I suggest sticking with the hazard recognizer unless you have an important problem that can't be solved that way. It's the API used by most targets and doesn't require a custom scheduler. Although I don't want to stop you from generalizing the DFA work either if you feel compelled to do that.

Ignoring compile time for a moment, I think an advantage of a DFA is modeling a situation where the hardware can assign resources to best fit the entire group rather than one instruction at a time. For example, if InstA requires either Unit0 or Unit1, and InstB requires Unit0, is {InstA, InstB} a valid group? Depending on your CPU, a DFA could either recognize that it's valid, or give you a chance to reorder the instructions within a group once they've been selected.

Ideally, you can express your constraints using InstrStage itinerary entries. If not, then you need to do your own bookkeeping by saving extra state during EmitInstruction and checking for hazards in getHazardType. At this point, you need to decide whether your custom logic can be easily generalized to either top-down or bottom-up scheduling. If not, you can force MISched to schedule in either direction. SD scheduling is stuck with bottom-up for the remainder of its days, and postRA scheduling is top-down.

-Andy
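
[Editor's sketch] The bookkeeping pattern Andy describes (save state when an instruction is emitted, consult that state when asked about a hazard) might look roughly like the toy model below. This is a standalone illustration, not LLVM code: `GroupTracker`, `InstrClass`, `Hazard`, the configurable group width, and the "one branch per group" rule are all hypothetical and are not taken from any PowerPC itinerary. The method names only echo getHazardType and EmitInstruction, and the model assumes top-down scheduling.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction classes; a real target would derive these from
// its own instruction descriptions.
enum class InstrClass { Integer, LoadStore, Branch };

// Hypothetical hazard kinds for this toy model.
enum class Hazard { None, GroupFull, BranchSlotTaken };

// Toy top-down model of dispatch-group bookkeeping. It does not derive from
// or use any LLVM type.
class GroupTracker {
  std::vector<InstrClass> Current; // instructions in the group being formed
  std::size_t MaxGroupSize;        // assumed group width

public:
  explicit GroupTracker(std::size_t MaxSize) : MaxGroupSize(MaxSize) {
    assert(MaxSize > 0 && "group width must be positive");
  }

  // Would adding an instruction of class C to the current group be a hazard?
  Hazard getHazardType(InstrClass C) const {
    if (Current.size() >= MaxGroupSize)
      return Hazard::GroupFull;
    // Assumed rule, for illustration only: at most one branch per group.
    if (C == InstrClass::Branch)
      for (InstrClass Prev : Current)
        if (Prev == InstrClass::Branch)
          return Hazard::BranchSlotTaken;
    return Hazard::None;
  }

  // Record an instruction that the scheduler has committed to the group.
  void emitInstruction(InstrClass C) { Current.push_back(C); }

  // Close the current group and start forming the next one.
  void advanceCycle() { Current.clear(); }

  std::size_t groupSize() const { return Current.size(); }
};
```

A scheduler loop would call `getHazardType` on each candidate, call `emitInstruction` on the one it picks, and call `advanceCycle` once no candidate fits. Adapting this to bottom-up scheduling would mean building groups from the end of the block, which is the generalization question raised above.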
Anshuman Dasgupta
2012-Jun-11 19:02 UTC
[LLVMdev] scoreboard hazard det. and instruction groupings
Hal,

On 6/11/2012 12:48 PM, Andrew Trick wrote:
> Ignoring compile time for a moment, I think an advantage of a DFA is modeling a situation where the hardware can assign resources to best fit the entire group rather than one instruction at a time. For example, if InstA requires either Unit0 or Unit1, and InstB requires Unit0, is {InstA, InstB} a valid group? Depending on your cpu, a DFA could either recognize that it's valid, or give you a chance to reorder the instructions within a group once they've been selected.

I would recommend the DFA mechanism as well from what you've described. It considers all permutations of mapping instructions to functional units. To add to what Andrew said, note that the DFA answers the question of whether there exists a legal mapping of a group of instructions to functional units. It does not, however, return a legal mapping. Will that be sufficient for what you want?

-Anshu
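
[Editor's sketch] The yes/no question discussed above (does any legal mapping of the group onto functional units exist?) can be illustrated with a small backtracking search. This is not how the DFAPacketizer is implemented; a DFA precomputes such answers as state transitions, whereas this sketch simply tries the assignments at query time. The name `canFormGroup` and the bitmask encoding of units are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoding: bit i set means the instruction may issue to unit i.
using UnitMask = std::uint32_t;

// Returns true if every instruction in Group can be given its own unit.
// Like the query described above, it reports only whether a legal mapping
// exists; it does not return the mapping it found.
bool canFormGroup(const std::vector<UnitMask> &Group, std::size_t Idx = 0,
                  UnitMask Busy = 0) {
  assert(Idx <= Group.size() && "index past end of group");
  if (Idx == Group.size())
    return true; // every instruction has been placed
  UnitMask Free = Group[Idx] & ~Busy; // acceptable units not yet taken
  while (Free) {
    UnitMask Unit = Free & (~Free + 1); // isolate the lowest set bit
    if (canFormGroup(Group, Idx + 1, Busy | Unit))
      return true;
    Free &= ~Unit; // that choice failed; try the next acceptable unit
  }
  return false;
}
```

With Unit0 as bit 0 and Unit1 as bit 1, the group {InstA, InstB} from Andrew's example is `{Unit0 | Unit1, Unit0}`. A one-at-a-time assignment that hands Unit0 to InstA first would reject InstB, while the search above backs up and places InstA on Unit1 instead.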
Hal Finkel
2012-Jun-11 19:07 UTC
[LLVMdev] scoreboard hazard det. and instruction groupings
On Mon, 11 Jun 2012 10:48:18 -0700 Andrew Trick <atrick at apple.com> wrote:
> On Jun 11, 2012, at 9:30 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>
> > I'm considering writing more-detailed itineraries for some PowerPC CPUs that use the 'traditional' instruction grouping scheme. In essence, this means that multiple instructions will stall in some pipeline stage until a complete group is formed, then all will continue.
> >
> > I expect to provide CPU-specific code to help determine when the currently-waiting instructions would form a group. Is there a straightforward way that I could override the scoreboard hazard detector (or layer on top of it) to insert this kind of logic?
> >
> > Should I instead be looking to do something like what Hexagon does for VLIW cores? I think the main difference between the PowerPC scheme and a VLIW scheme, is that the CPU itself can form groups internally, it is just more efficient if the groups are provided pre-made. Maybe this difference, if it is one, is insignificant.
>
> Hal, I think you're asking whether to use the ScheduleHazardRecognizer or DFAPacketizer. I suggest sticking with the hazard recognizer unless you have an important problem that can't be solved that way. It's the API used by most targets and doesn't require a custom scheduler. Although I don't want to stop you from generalizing the DFA work either if you feel compelled to do that.

I don't yet feel compelled, and I don't know much about the DFAPacketizer. I just want something that will work cleanly ;)

Looking at VLIWPacketizerList::PacketizeMIs, it seems like the instructions are first scheduled (via some external scheme?), and then packetized 'in order'. Is that correct?

> Ignoring compile time for a moment, I think an advantage of a DFA is modeling a situation where the hardware can assign resources to best fit the entire group rather than one instruction at a time. For example, if InstA requires either Unit0 or Unit1, and InstB requires Unit0, is {InstA, InstB} a valid group? Depending on your cpu, a DFA could either recognize that it's valid, or give you a chance to reorder the instructions within a group once they've been selected.

In the PowerPC grouping scheme, resources are assigned on a group basis (by the instruction dispatching stages). However, once the group is dispatched to the appropriate functional units, 'bypass' is still available on an instruction-by-instruction basis to instructions in later groups. Final writeback waits until all members of the group complete.

> Ideally, you can express your constraints using InstrStage itinerary entries.

I don't see how, in the current scheme, to express that an instruction must wait in FU0 until there are also waiting instructions in FU1, FU2 and FU3. Furthermore, there are certain constraints on what those instructions can be, and which ones will move forward as the next dispatched group, and I think we need to fall back to C++ to deal with them.

> If not, then you need to do your own bookkeeping by saving extra state during EmitInstruction and checking for hazards in getHazardType. At this point, you need to decide whether your custom logic can be easily generalized to either top-down or bottom-up scheduling.

I think that it can be either. Within the current system, however, it might need to be top-down. To do bottom-up, you'd need look-ahead over the depths of the pipelines in order to emit grouping hazards when considering the ends of the pipelines (although this may just be for corner cases; I think that normal dependency analysis should catch most of these).

> If not, you can force MISched to schedule in either direction. SD scheduling is stuck with bottom-up for the remainder of its days, and postRA scheduling is top-down.

I would rather do something that will be easy to maintain going forward, and so if that can be accomplished within the normal framework, then that would be great. I think that I'll try ignoring the issue for now, just use a normal itinerary with bottom-up scheduling, and then the existing top-down pass (which attempts to enforce some of the ordering constraints (which are most severe on the G5s)). If that gives unsatisfactory results, then we can think about something more involved.

Thanks again,
Hal

> -Andy

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory