thr3ads.net - llvm dev - [LLVMdev] scoreboard hazard det. and instruction groupings [Jun 2012]

Hal Finkel

2012-Jun-11 19:07 UTC

[LLVMdev] scoreboard hazard det. and instruction groupings

On Mon, 11 Jun 2012 10:48:18 -0700
Andrew Trick <atrick at apple.com> wrote:
> On Jun 11, 2012, at 9:30 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> > I'm considering writing more-detailed itineraries for some PowerPC
> > CPUs that use the 'traditional' instruction grouping scheme. In
> > essence, this means that multiple instructions will stall in some
> > pipeline stage until a complete group is formed, then all will
> > continue.
> > 
> > I expect to provide CPU-specific code to help determine when the
> > currently-waiting instructions would form a group. Is there a
> > straightforward way that I could override the scoreboard hazard
> > detector (or layer on top of it) to insert this kind of logic?
> > 
> > Should I instead be looking to do something like what Hexagon does
> > for VLIW cores? I think the main difference between the PowerPC
> > scheme and a VLIW scheme, is that the CPU itself can form groups
> > internally, it is just more efficient if the groups are provided
> > pre-made. Maybe this difference, if it is one, is insignificant.
> 
> Hal, I think you're asking whether to use the
> ScheduleHazardRecognizer or DFAPacketizer. I suggest sticking with
> the hazard recognizer unless you have an important problem that can't
> be solved that way. It's the API used by most targets and doesn't
> require a custom scheduler. Although I don't want to stop you from
> generalizing the DFA work either if you feel compelled to do that.
I don't yet feel compelled, and I don't know much about the
DFAPacketizer. I just want something that will work cleanly ;)

Looking at VLIWPacketizerList::PacketizeMIs, it seems like the
instructions are first scheduled (via some external scheme?), and then
packetized 'in order'. Is that correct?
> 
> Ignoring compile time for a moment, I think an advantage of a DFA is
> modeling a situation where the hardware can assign resources to best
> fit the entire group rather than one instruction at a time. For
> example, if InstA requires either Unit0 or Unit1, and InstB requires
> Unit0, is {InstA, InstB} a valid group? Depending on your cpu, a DFA
> could either recognize that it's valid, or give you a chance to
> reorder the instructions within a group once they've been selected.
In the PowerPC grouping scheme, resources are assigned on a group
basis (by the instruction dispatching stages). However, once the group
is dispatched to the appropriate functional units, 'bypass' is still
available on an instruction-by-instruction basis to instructions in
later groups. Final writeback waits until all members of the group
complete.
> 
> Ideally, you can express your constraints using InstrStage itinerary
> entries. 
I don't see how, in the current scheme, to express that an instruction
must wait in FU0 until there are also waiting instructions in FU1, FU2
and FU3. Furthermore, there are certain constraints on what those
instructions can be, and which ones will move forward as the next
dispatched group, and I think we need to fall back to C++ to deal with
them.
> If not, then you need to do your own bookkeeping by saving
> extra state during EmitInstruction and checking for hazards in
> getHazardType. At this point, you need to decide whether your custom
> logic can be easily generalized to either top-down or bottom-up
> scheduling.
I think that it can be either. Within the current system, however, it
might need to be top down. To do bottom up, you'd need to have
look-ahead of the depths of the pipelines to emit grouping hazards
when considering the ends of the pipelines (although this may just be
for corner cases, I think that normal dependency analysis should catch
most of these).
> If not, you can force MISched to schedule in either
> direction. SD scheduling is stuck with bottom-up for the remainder of
> its days, and postRA scheduling is top-down.
I would rather do something that will be easy to maintain going
forward, and so if that can be accomplished within the normal
framework, then that would be great.

I think that I'll try ignoring the issue for now: just use a normal
itinerary with bottom-up scheduling, followed by the existing top-down
pass, which attempts to enforce some of the ordering constraints
(these are most severe on the G5s). If that gives unsatisfactory
results, then we can think about something more involved.

Thanks again,
Hal
> 
> -Andy
-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
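
Andy's {InstA, InstB} question in the message above can be made concrete with a small standalone sketch. It is not LLVM code: the bitmask encoding and the names `UnitMask`, `fitsGreedy` and `fitsAsGroup` are hypothetical, chosen only to contrast one-at-a-time unit assignment with assignment over the whole group.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoding (not LLVM's): bit N set means the instruction
// may execute on UnitN.
using UnitMask = std::uint32_t;

// One instruction at a time: each instruction claims the lowest-numbered
// free unit it accepts and never revisits that choice.
bool fitsGreedy(const std::vector<UnitMask> &Group) {
  UnitMask Busy = 0;
  for (UnitMask Allowed : Group) {
    UnitMask Free = Allowed & ~Busy;
    if (Free == 0)
      return false;             // no acceptable unit is left
    Busy |= Free & (~Free + 1); // claim the lowest set bit
  }
  return true;
}

// Whole-group assignment: backtrack over every acceptable unit so that an
// earlier instruction can give way to a later, more constrained one.
static bool assignFrom(const std::vector<UnitMask> &Group, std::size_t Idx,
                       UnitMask Busy) {
  if (Idx == Group.size())
    return true;
  UnitMask Free = Group[Idx] & ~Busy;
  while (Free != 0) {
    UnitMask Unit = Free & (~Free + 1);
    if (assignFrom(Group, Idx + 1, Busy | Unit))
      return true;
    Free &= ~Unit; // try the next acceptable unit
  }
  return false;
}

bool fitsAsGroup(const std::vector<UnitMask> &Group) {
  return assignFrom(Group, 0, 0);
}
```

With InstA = 0x3 (Unit0 or Unit1) and InstB = 0x1 (Unit0 only), `fitsGreedy({0x3, 0x1})` fails because InstA takes Unit0 first, while `fitsAsGroup({0x3, 0x1})` succeeds by placing InstA on Unit1. Reordering the group to {InstB, InstA} also lets the greedy check pass, which is the "reorder within a group" option Andy mentions.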

Andrew Trick

2012-Jun-11 19:23 UTC

[LLVMdev] scoreboard hazard det. and instruction groupings

On Jun 11, 2012, at 12:07 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> Looking at VLIWPacketizerList::PacketizeMIs, it seems like the
> instructions are first scheduled (via some external scheme?), and then
> packetized 'in order'. Is that correct?
Anshu?
> In the PowerPC grouping scheme, resources are assigned on a group
> basis (by the instruction dispatching stages). However, once the group
> is dispatched to the appropriate functional units, 'bypass' is still
> available on an instruction-by-instruction basis to instructions in
> later groups. Final writeback waits until all members of the group
> complete.
> 
>> 
>> Ideally, you can express your constraints using InstrStage itinerary
>> entries. 
> 
> I don't see how, in the current scheme, to express that an instruction
> must wait in FU0 until there are also waiting instructions in FU1, FU2
> and FU3. Furthermore, there are certain constraints on what those
> instructions can be, and which ones will move forward as the next
> dispatched group, and I think we need to fall back to C++ to deal with
> them.
Right. I should have mentioned that the static itinerary really can't
express dynamic constraints. You can play games by inventing new types of
FuncUnits. But there's no way to say that an instruction hogs a pipeline
until some future event at an unknown interval.

-Andy
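
The bookkeeping approach discussed above (save state when an instruction is emitted, consult it when asked about hazards) might look roughly like the following. This is a standalone toy, not derived from LLVM's ScheduleHazardRecognizer; `ToyInstr`, `ToyGroupRecognizer`, the four-slot group size and the "a branch closes the group" rule are all assumptions made up for illustration, not actual PowerPC constraints.

```cpp
#include <cassert>

// Toy stand-ins; none of these types come from LLVM.
enum class HazardType { NoHazard, Hazard };

struct ToyInstr {
  bool IsBranch;
};

// Assumed group size, matching the FU0..FU3 example in the thread.
constexpr unsigned kGroupSize = 4;

class ToyGroupRecognizer {
  unsigned SlotsUsed = 0;   // instructions already placed in the open group
  bool GroupClosed = false; // set once a group-ending instruction is emitted

public:
  // Query only: reports whether the candidate can still join the open group.
  HazardType getHazardType(const ToyInstr &) const {
    if (GroupClosed || SlotsUsed == kGroupSize)
      return HazardType::Hazard;
    return HazardType::NoHazard;
  }

  // Record the extra state that a static itinerary cannot express.
  void emitInstruction(const ToyInstr &I) {
    ++SlotsUsed;
    if (I.IsBranch)
      GroupClosed = true; // invented rule: a branch must be last in its group
  }

  // The open group dispatches; start collecting a new one.
  void advanceCycle() {
    SlotsUsed = 0;
    GroupClosed = false;
  }
};
```

In a real target the same state would live in a subclass that overrides the recognizer's hooks, and the C++ rules would encode whatever the CPU's dispatch stage actually requires.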

Anshuman Dasgupta

2012-Jun-11 19:51 UTC

[LLVMdev] scoreboard hazard det. and instruction groupings

On 6/11/2012 2:23 PM, Andrew Trick wrote:
> On Jun 11, 2012, at 12:07 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> Looking at VLIWPacketizerList::PacketizeMIs, it seems like the
>> instructions are first scheduled (via some external scheme?), and then
>> packetized 'in order'. Is that correct?
>
> Anshu?
Yes, we schedule first and then packetize in order.

-Anshu
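
As a rough illustration of "schedule first, then packetize in order", the sketch below walks an already-scheduled instruction list and closes the current packet whenever the next instruction cannot get a unit. It is a simplified stand-in, not the DFAPacketizer or VLIWPacketizerList implementation; `UnitMask` and `packetizeInOrder` are hypothetical names, and the lowest-free-unit policy is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoding: bit N set means the instruction may use UnitN.
using UnitMask = std::uint32_t;

// Takes instructions in their scheduled order and returns packets holding
// indices into that order. The order itself is never changed.
std::vector<std::vector<std::size_t>>
packetizeInOrder(const std::vector<UnitMask> &Scheduled) {
  std::vector<std::vector<std::size_t>> Packets;
  UnitMask Busy = 0;
  for (std::size_t I = 0; I < Scheduled.size(); ++I) {
    UnitMask Free = Scheduled[I] & ~Busy;
    if (Packets.empty() || Free == 0) {
      // The instruction does not fit: end the packet and open a new one.
      Packets.emplace_back();
      Busy = 0;
      Free = Scheduled[I];
    }
    Busy |= Free & (~Free + 1); // claim the lowest acceptable free unit
    Packets.back().push_back(I);
  }
  return Packets;
}
```

Because the packetizer only cuts the existing sequence into groups, the quality of the packets depends entirely on the order the earlier scheduling pass produced.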

Sergei Larin

2012-Jun-12 17:22 UTC

[LLVMdev] Assert in live update from MI scheduler.

Hello everyone,

  I am working on a release based on the 3.1 branch of the code.
Unfortunately, it has enough differences that an exact revision does not
apply. I am hitting an assert in the liveness update with seemingly
trivial code (attached).

/local/mnt/workspace/slarin/tools/llvm-mainline-merged/lib/CodeGen/LiveIntervalAnalysis.cpp:1078:
void llvm::LiveIntervals::HMEditor::moveAllRangesFrom(llvm::MachineInstr*, llvm::SlotIndex):
Assertion `validator.rangesOk() && "moveAllOperandsFrom broke liveness."' failed.

The code being scheduled (function "push") is trivial:

# Machine code for function push: Post SSA
Function Live Outs: %R0

0B  BB#0: derived from LLVM BB %entry
16B     %vreg9<def> = TFRI_V4 <ga:@xx_stack>; IntRegs:%vreg9
        Successors according to CFG: BB#1

48B BB#1: derived from LLVM BB %for.cond
        Predecessors according to CFG: BB#0 BB#1
80B     %vreg1<def> = COPY %vreg10<kill>; IntRegs:%vreg1,%vreg10
96B     %vreg10<def> = LDriw %vreg9<kill>, 0; mem:LD4[%stack.0.in] IntRegs:%vreg10,%vreg9
112B    %vreg9<def> = ADD_ri %vreg10, 8; IntRegs:%vreg9,%vreg10
128B    %vreg6<def> = CMPEQri %vreg10, 0; PredRegs:%vreg6 IntRegs:%vreg10
176B    JMP_cNot %vreg6<kill>, <BB#1>, %PC<imp-def>; PredRegs:%vreg6
192B    JMP <BB#2>
        Successors according to CFG: BB#2 BB#1

208B    BB#2: derived from LLVM BB %for.end
        Predecessors according to CFG: BB#1
224B    %vreg7<def> = LDriw %vreg1<kill>, 0; mem:LD4[%first1](tbaa=!"any pointer") IntRegs:%vreg7,%vreg1
240B    STriw_GP <ga:@yy_instr>, 0, %vreg7<kill>; mem:ST4[@yy_instr](tbaa=!"any pointer") IntRegs:%vreg7
256B    JMPR %PC<imp-def>, %R31<imp-use>, %R0<imp-use,undef>

The Hexagon MI scheduler is working on BB#1 and picks this load to be
scheduled first (!):

  %vreg10<def> = LDriw %vreg9<kill>, 0; mem:LD4[%stack.0.in] IntRegs:%vreg10,%vreg9

Right after that, the assert fires, with this backtrace:

7  clang           0x000000000226aece llvm::LiveIntervals::handleMove(llvm::MachineInstr*) + 378
8  clang           0x0000000001c2574f llvm::VLIWMachineScheduler::listScheduleTopDown() + 595
9  clang           0x0000000001c24cd5 llvm::VLIWMachineScheduler::schedule() + 505

It does not seem to happen on the trunk.

My question is: does anyone recognize the issue, and which patch(es) do I
need to address it? Since my release is based on 3.1, I have to cherry-pick
them.
Any lead is highly appreciated.

Thanks.

Sergei

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_live.o.ll
Type: application/octet-stream
Size: 2706 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20120612/4c78b2fa/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_live.i
Type: application/octet-stream
Size: 639 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20120612/4c78b2fa/attachment-0001.obj>
