Hal Finkel
2012-Jun-11 19:07 UTC
[LLVMdev] scoreboard hazard det. and instruction groupings
On Mon, 11 Jun 2012 10:48:18 -0700, Andrew Trick <atrick at apple.com> wrote:

> On Jun 11, 2012, at 9:30 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>
>> I'm considering writing more-detailed itineraries for some PowerPC
>> CPUs that use the 'traditional' instruction grouping scheme. In
>> essence, this means that multiple instructions will stall in some
>> pipeline stage until a complete group is formed, then all will
>> continue.
>>
>> I expect to provide CPU-specific code to help determine when the
>> currently-waiting instructions would form a group. Is there a
>> straightforward way that I could override the scoreboard hazard
>> detector (or layer on top of it) to insert this kind of logic?
>>
>> Should I instead be looking to do something like what Hexagon does
>> for VLIW cores? I think the main difference between the PowerPC
>> scheme and a VLIW scheme is that the CPU itself can form groups
>> internally; it is just more efficient if the groups are provided
>> pre-made. Maybe this difference, if it is one, is insignificant.
>
> Hal, I think you're asking whether to use the ScheduleHazardRecognizer
> or DFAPacketizer. I suggest sticking with the hazard recognizer unless
> you have an important problem that can't be solved that way. It's the
> API used by most targets and doesn't require a custom scheduler.
> Although I don't want to stop you from generalizing the DFA work
> either if you feel compelled to do that.

I don't yet feel compelled, and I don't know much about the
DFAPacketizer. I just want something that will work cleanly ;)

Looking at VLIWPacketizerList::PacketizeMIs, it seems like the
instructions are first scheduled (via some external scheme?), and then
packetized 'in order'. Is that correct?

> Ignoring compile time for a moment, I think an advantage of a DFA is
> modeling a situation where the hardware can assign resources to best
> fit the entire group rather than one instruction at a time. For
> example, if InstA requires either Unit0 or Unit1, and InstB requires
> Unit0, is {InstA, InstB} a valid group? Depending on your cpu, a DFA
> could either recognize that it's valid, or give you a chance to
> reorder the instructions within a group once they've been selected.

In the PowerPC grouping scheme, resources are assigned on a group
basis (by the instruction dispatching stages). However, once the group
is dispatched to the appropriate functional units, 'bypass' is still
available on an instruction-by-instruction basis to instructions in
later groups. Final writeback waits until all members of the group
complete.

> Ideally, you can express your constraints using InstrStage itinerary
> entries.

I don't see how, in the current scheme, to express that an instruction
must wait in FU0 until there are also waiting instructions in FU1, FU2,
and FU3. Furthermore, there are certain constraints on what those
instructions can be, and on which ones will move forward as the next
dispatched group, and I think we need to fall back into C++ to deal
with them.

> If not, then you need to do your own bookkeeping by saving extra
> state during EmitInstruction and checking for hazards in
> getHazardType. At this point, you need to decide whether your custom
> logic can be easily generalized to either top-down or bottom-up
> scheduling.

I think that it can be either. Within the current system, however, it
might need to be top-down. To do bottom-up, you'd need look-ahead to
the depths of the pipelines in order to emit grouping hazards when
considering the ends of the pipelines (although this may just be for
corner cases; I think that normal dependency analysis should catch
most of them).

> If not, you can force MISched to schedule in either direction. SD
> scheduling is stuck with bottom-up for the remainder of its days, and
> postRA scheduling is top-down.

I would rather do something that will be easy to maintain going
forward, so if that can be accomplished within the normal framework,
that would be great. I think that I'll try ignoring the issue for now:
just use a normal itinerary with bottom-up scheduling plus the
existing top-down pass (which attempts to enforce some of the ordering
constraints, which are most severe on the G5s). If that gives
unsatisfactory results, then we can think about something more
involved.

Thanks again,
Hal

> -Andy

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
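For reference, the bookkeeping approach Andy describes might take roughly the following shape. This is a minimal sketch, not code from the thread: only the ScoreboardHazardRecognizer hooks (getHazardType, EmitInstruction, AdvanceCycle) and its constructor are real LLVM API of that era, while PPCDispatchGroupRecognizer, canJoinCurrentGroup, and isCompleteGroup are hypothetical stand-ins for the CPU-specific PowerPC grouping rules.

    #include "llvm/CodeGen/ScoreboardHazardRecognizer.h"
    #include "llvm/CodeGen/ScheduleDAG.h"
    #include <vector>

    using namespace llvm;

    // Hypothetical recognizer layering PowerPC-style group formation on
    // top of the ordinary scoreboard checks. The class and helper names
    // are illustrative placeholders, not LLVM API.
    class PPCDispatchGroupRecognizer : public ScoreboardHazardRecognizer {
      std::vector<SUnit*> CurGroup; // instructions accepted into the forming group

    public:
      PPCDispatchGroupRecognizer(const InstrItineraryData *ItinData,
                                 const ScheduleDAG *DAG)
        : ScoreboardHazardRecognizer(ItinData, DAG, "ppc-dispatch-groups") {}

      virtual HazardType getHazardType(SUnit *SU, int Stalls) {
        // Defer to the scoreboard first for ordinary structural hazards.
        HazardType HT = ScoreboardHazardRecognizer::getHazardType(SU, Stalls);
        if (HT != NoHazard)
          return HT;
        // Then apply the CPU-specific grouping rules (e.g., one branch
        // per group, cracked instructions occupying two slots, ...).
        return canJoinCurrentGroup(SU) ? NoHazard : Hazard;
      }

      virtual void EmitInstruction(SUnit *SU) {
        ScoreboardHazardRecognizer::EmitInstruction(SU);
        CurGroup.push_back(SU);
        if (isCompleteGroup())
          CurGroup.clear(); // the group dispatches; start forming the next one
      }

      virtual void AdvanceCycle() {
        // A real implementation would also decide here whether a
        // partially formed group is forced to dispatch.
        ScoreboardHazardRecognizer::AdvanceCycle();
      }

    private:
      // Placeholders for the target-specific group-formation predicates;
      // a fixed group size of 4 is assumed purely for illustration.
      bool canJoinCurrentGroup(SUnit *SU) const { return CurGroup.size() < 4; }
      bool isCompleteGroup() const { return CurGroup.size() == 4; }
    };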
Andrew Trick
2012-Jun-11 19:23 UTC
[LLVMdev] scoreboard hazard det. and instruction groupings
On Jun 11, 2012, at 12:07 PM, Hal Finkel <hfinkel at anl.gov> wrote:

> Looking at VLIWPacketizerList::PacketizeMIs, it seems like the
> instructions are first scheduled (via some external scheme?), and then
> packetized 'in order'. Is that correct?

Anshu?

> In the PowerPC grouping scheme, resources are assigned on a group
> basis (by the instruction dispatching stages). However, once the group
> is dispatched to the appropriate functional units, 'bypass' is still
> available on an instruction-by-instruction basis to instructions in
> later groups. Final writeback waits until all members of the group
> complete.
>
>> Ideally, you can express your constraints using InstrStage itinerary
>> entries.
>
> I don't see how, in the current scheme, to express that an instruction
> must wait in FU0 until there are also waiting instructions in FU1, FU2,
> and FU3. Furthermore, there are certain constraints on what those
> instructions can be, and on which ones will move forward as the next
> dispatched group, and I think we need to fall back into C++ to deal
> with them.

Right. I should have mentioned that the static itinerary really can't
express dynamic constraints. You can play games by inventing new types
of FuncUnits, but there's no way to say that an instruction hogs a
pipeline until some future event at an unknown interval.

-Andy
Anshuman Dasgupta
2012-Jun-11 19:51 UTC
[LLVMdev] scoreboard hazard det. and instruction groupings
On 6/11/2012 2:23 PM, Andrew Trick wrote:

> On Jun 11, 2012, at 12:07 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> Looking at VLIWPacketizerList::PacketizeMIs, it seems like the
>> instructions are first scheduled (via some external scheme?), and then
>> packetized 'in order'. Is that correct?
>
> Anshu?

Yes, we schedule first and then packetize in order.

-Anshu
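To illustrate the schedule-then-packetize flow confirmed above, the core of the in-order loop looks roughly like the following. This is a simplified sketch rather than the actual body of VLIWPacketizerList::PacketizeMIs: only DFAPacketizer's canReserveResources, reserveResources, and clearResources are real API, while packetizeInOrder and its CurrentPacket bookkeeping are illustrative.

    #include "llvm/CodeGen/DFAPacketizer.h"
    #include "llvm/CodeGen/MachineBasicBlock.h"
    #include "llvm/CodeGen/MachineInstr.h"
    #include <vector>

    using namespace llvm;

    // Simplified paraphrase of in-order packetization: the scheduler has
    // already fixed the instruction order, so we walk the block once and
    // greedily fill packets until the DFA rejects an instruction.
    static void packetizeInOrder(MachineBasicBlock *MBB,
                                 DFAPacketizer *ResourceTracker) {
      std::vector<MachineInstr*> CurrentPacket;
      for (MachineBasicBlock::iterator I = MBB->begin(), E = MBB->end();
           I != E; ++I) {
        MachineInstr *MI = &*I;
        // Ask the DFA whether MI fits in the packet formed so far.
        if (!ResourceTracker->canReserveResources(MI)) {
          // It does not: close the current packet and start a new one.
          CurrentPacket.clear();
          ResourceTracker->clearResources();
        }
        ResourceTracker->reserveResources(MI);
        CurrentPacket.push_back(MI);
        // The real VLIWPacketizerList also vets dependencies between
        // packet members (isLegalToPacketizeTogether) before admitting MI.
      }
    }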
Hello everyone,

I am working on a release based on the branch 3.1 version of the code. Unfortunately it has enough differences that the exact rev does not apply. I am hitting an assert in the liveness update with seemingly trivial code (attached):

/local/mnt/workspace/slarin/tools/llvm-mainline-merged/lib/CodeGen/LiveIntervalAnalysis.cpp:1078:
void llvm::LiveIntervals::HMEditor::moveAllRangesFrom(llvm::MachineInstr*, llvm::SlotIndex):
Assertion `validator.rangesOk() && "moveAllOperandsFrom broke liveness."' failed.

The code being scheduled (function "push") is trivial:

# Machine code for function push: Post SSA
Function Live Outs: %R0

0B      BB#0: derived from LLVM BB %entry
16B         %vreg9<def> = TFRI_V4 <ga:@xx_stack>; IntRegs:%vreg9
        Successors according to CFG: BB#1

48B     BB#1: derived from LLVM BB %for.cond
        Predecessors according to CFG: BB#0 BB#1
80B         %vreg1<def> = COPY %vreg10<kill>; IntRegs:%vreg1,%vreg10
96B         %vreg10<def> = LDriw %vreg9<kill>, 0; mem:LD4[%stack.0.in] IntRegs:%vreg10,%vreg9
112B        %vreg9<def> = ADD_ri %vreg10, 8; IntRegs:%vreg9,%vreg10
128B        %vreg6<def> = CMPEQri %vreg10, 0; PredRegs:%vreg6 IntRegs:%vreg10
176B        JMP_cNot %vreg6<kill>, <BB#1>, %PC<imp-def>; PredRegs:%vreg6
192B        JMP <BB#2>
        Successors according to CFG: BB#2 BB#1

208B    BB#2: derived from LLVM BB %for.end
        Predecessors according to CFG: BB#1
224B        %vreg7<def> = LDriw %vreg1<kill>, 0; mem:LD4[%first1](tbaa=!"any pointer") IntRegs:%vreg7,%vreg1
240B        STriw_GP <ga:@yy_instr>, 0, %vreg7<kill>; mem:ST4[@yy_instr](tbaa=!"any pointer") IntRegs:%vreg7
256B        JMPR %PC<imp-def>, %R31<imp-use>, %R0<imp-use,undef>

The Hexagon MI scheduler is working with BB#1 and picks this load:

%vreg10<def> = LDriw %vreg9<kill>, 0; mem:LD4[%stack.0.in] IntRegs:%vreg10,%vreg9

to be scheduled first (!). Right thereafter:

7 clang 0x000000000226aece llvm::LiveIntervals::handleMove(llvm::MachineInstr*) + 378
8 clang 0x0000000001c2574f llvm::VLIWMachineScheduler::listScheduleTopDown() + 595
9 clang 0x0000000001c24cd5 llvm::VLIWMachineScheduler::schedule() + 505

It does not seem to happen on the trunk. My question is: does anyone recognize the issue, and what patch(es) do I need to address it? Since my release is based on 3.1, I have to cherry-pick them...

Any lead is highly appreciated. Thanks.

Sergei

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_live.o.ll
Type: application/octet-stream
Size: 2706 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120612/4c78b2fa/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_live.i
Type: application/octet-stream
Size: 639 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120612/4c78b2fa/attachment-0001.obj>