From: Andrew Trick [mailto:atrick at apple.com]
Sent: 24 January 2014 21:52
To: Daniel Sanders
Cc: LLVM Developers Mailing List (llvmdev at cs.uiuc.edu)
Subject: Re: New machine model questions
On Jan 24, 2014, at 2:21 AM, Daniel Sanders <Daniel.Sanders at imgtec.com> wrote:
Hi Andrew,
I seem to be making good progress on the P5600 scheduler using the new machine
model but I've got a few questions about it.
Hi Daniel,
These are really good questions. For future reference, I might provide better
examples if you attach what you have so far for the model.
How would you represent an instruction that splits into two micro-ops and is
dispatched to two different reservation stations?
For example, I have two reservation stations (AGQ and FPQ). An FPU load
instruction is split into a load micro-op which is dispatched to AGQ and a
writeback micro-op which is dispatched to FPQ.
The AGQ micro-op is issued to a four-cycle latency pipeline called LDST. Three
cycles after issue, the LDST pipeline wakes up the FPQ micro-op, which writes
the result of the load back to the register file.
This question illustrates the primary difference between the per-operand machine
model and the itinerary. The itinerary directly models the stages of each
pipeline independently. Some backend maintainers may still want to use
itineraries if that level of precision is critical [1]. Another option is
extending the new model. [2]
I will assume that each queue is fully pipelined (4 AGQ ops can be in-flight).
Forcing all this information into a single SchedWriteRes def would look like
this:
def P5600FLD : SchedWriteRes <[P5600UnitAGQ, P5600UnitFP]> {
let Latency = 5; // 4 cycle load + 1 cycle FP writeback
let NumMicroOps = 2;
}
This is bad (for an in-order processor) because it prevents FPLoad + FPx from
being scheduled in the same cycle and fails to detect a conflict on FP ops 5
scheduled cycles ahead.
A better way to express it would be:
def P5600LD : SchedWriteRes<[P5600UnitAGQ]> { let Latency = 4; }
def P5600FP : SchedWriteRes<[P5600UnitFP]>;
def P5600FLD : WriteSequence<[P5600LD, P5600FP]>;
Unfortunately, the implementation currently aggregates the processor resources,
ignoring the fact that they are used on different cycles. This is totally
fixable [2]. However, I don't know why you would care, since an out-of-order
processor doing its job will make the stalls unpredictable either way.
Thanks. I'll start with the WriteSequence method and see if testing shows
that I need to go any further or not.
The two reservation stations don't seem to be completely independent of each
other for these split instructions. The wakeup signal used to wake up the second
micro-op seems to be a demand that the micro-op issues in that cycle rather than
permission to issue when it's convenient.
Is it possible to use other instructions already scheduled for the same cycle as
part of the evaluation of a SchedPredicate in a SchedVariant?
I've got a class of instructions (mostly simple addition) that can dispatch
to two different reservation stations (ALQ and AGQ), both of which have a
suitable pipeline with the same latency. The dispatch stage can dispatch two
instructions per cycle. When it has one instruction from this class it
dispatches it to ALQ (this isn't strictly true but I'll come back to
that), and when it has two it dispatches one to ALQ and the other to AGQ.
No. The machine model is used to form a scheduling DAG independent of the
original schedule. If it's important to be this precise, then I suggest you
plug in a new MachineSchedStrategy where you can model stalls for any special
cases during scheduling.
You need a super-resource:
def P5600A : ProcResource<2>;
def P5600AGQ : ProcResource<1> { let Super = P5600A; }
def P5600ALQ : ProcResource<1> { let Super = P5600A; }
I'll take a look at MachineSchedStrategy. I don't know how important
that precision is likely to be at the moment but I've generally found that
the more accurate the machine description is, the harder it is to find one of
the bad cases. That experience comes from a particular in-order scheduler in a
proprietary compiler so I don't know if I can expect similar things from
LLVM or not. I'm expecting out-of-order to help reduce the amount of
precision that's needed for a good result but I don't know how much of a
reduction I can expect at the moment.
I'm not sure I fully understand the super-resource suggestion. I've
attached my WIP so you can take a look at the code in context but the relevant
extracts are below.
def P5600IssueALU : ProcResource<1>;
def P5600IssueAL2 : ProcResource<1>;
def P5600ALQ : ProcResGroup<[P5600IssueALU]> { let BufferSize = 16; }
def P5600AGQ : ProcResGroup<[P5600IssueAL2, ...]> {
let BufferSize = 16;
}
def P5600WriteALU : SchedWriteRes<[P5600IssueALU]>;
def P5600WriteAL2 : SchedWriteRes<[P5600IssueAL2]>;
def P5600WriteEitherALU : SchedWriteVariant<
[SchedVar<SchedPredicate<[{1}]>, [P5600WriteALU]>, // FIXME: Predicate
SchedVar<SchedPredicate<[{0}]>, [P5600WriteAL2]> // FIXME: Predicate
]>;
I believe you are suggesting that I change this to:
def P5600IssueEitherALU : ProcResource<2>;
def P5600IssueALU : ProcResource<1> { let Super = P5600IssueEitherALU; }
def P5600IssueAL2 : ProcResource<1> { let Super = P5600IssueEitherALU; }
def P5600ALQ : ProcResGroup<[P5600IssueALU]> { let BufferSize = 16; }
def P5600AGQ : ProcResGroup<[P5600IssueAL2, ...]> {
let BufferSize = 16;
}
def P5600WriteALU : SchedWriteRes<[P5600IssueALU]>;
def P5600WriteAL2 : SchedWriteRes<[P5600IssueAL2]>;
def P5600WriteEitherALU : SchedWriteRes<[P5600IssueEitherALU]>;
Instructions can then use P5600WriteEitherALU to pick between the two
sub-resources at issue time. One curious consequence of this is that by allowing
it to pick which pipeline the instruction is issued to, it effectively allows
the instruction to pick which reservation station to be dispatched to at
issue-time (which is backwards: normally dispatch determines the available
subset of pipelines). That might not be a significant issue as far as the
scheduler output is concerned but it seemed strange to me and it makes me doubt
that I've fully understood it.
One thing about the attached WIP. I'm using ItinRW and InstRW at the moment
but I'm planning on migrating the ItinRW's to InstRW. The reason I'm
not using the Sched<> class on each instruction is that I'm not
confident that there is a common set of SchedReadWrite def's that would make
sense on the full range of MIPS processor implementations. I'm going to have
another think about this once I'm nearer a complete scheduler for P5600.
Is it possible to use historical scheduling decisions as part of the evaluation
of a SchedPredicate in a SchedVariant?
I'm fairly certain the answer to this one is 'no' (because
scheduling can be performed in both directions) but I'll ask anyway. In the
previous question, I said that when the dispatch stage has one instruction that
can be dispatched to either ALQ or AGQ it always picks ALQ. The truth of the
matter is that historical decisions are used to guess which one is most likely
to stall and the dispatch stage picks the other one. I haven't established
exactly what information it's using yet though so I can't give a good
example.
SchedVariant is really just for opcodes that can use different resources/latency
depending on the value of some immediate.
The kind of micro-architectural special rules/heuristics that you are describing
are exactly why we have a pluggable MachineSchedStrategy.
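As an illustration of the immediate-dependent case that SchedVariant is meant for, a minimal TableGen sketch might look like the following. Every Ex* name, the operand index, the threshold, and the latencies are hypothetical and not taken from the P5600 model; the include path assumes the tree layout of this period and is normally supplied by the target's top-level .td file.

```tablegen
// Normally pulled in by the target's top-level .td file.
include "llvm/Target/Target.td"

// Hypothetical model and resource, for illustration only.
def ExModel : SchedMachineModel;

let SchedModel = ExModel in {
def ExUnitALU : ProcResource<1>;

// Two candidate writes that differ only in latency (assumed values).
def ExWriteShiftFast : SchedWriteRes<[ExUnitALU]> { let Latency = 1; }
def ExWriteShiftSlow : SchedWriteRes<[ExUnitALU]> { let Latency = 2; }

// MI is the MachineInstr being classified. Operand 2 is assumed to be
// the immediate shift amount for this hypothetical opcode.
def ExIsSmallShift : SchedPredicate<[{MI->getOperand(2).getImm() < 4}]>;

// Variants are tried in order; NoSchedPred is the always-true fallback.
def ExWriteShift : SchedWriteVariant<[
  SchedVar<ExIsSmallShift, [ExWriteShiftFast]>,
  SchedVar<NoSchedPred,    [ExWriteShiftSlow]>
]>;
}
```

The predicate only inspects the instruction itself, which is why it cannot express decisions that depend on neighbouring instructions or on earlier scheduling choices.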
That makes sense.
Is there an easy way to check I've covered every valid instruction? I'm
thinking it would be helpful if I could get build warnings from tablegen about
valid instructions with no scheduling information. This would also prevent
someone adding an instruction later and forgetting to add it to the scheduler.
YES! Very good question.
When implementing a new model, it's important to run llvm-tblgen with
-debug-only=subtarget-emitter.
You should be able to touch your .td, then grab the command via make
TOOL_VERBOSE=1
This is the line from ARM:
llvm-tblgen -I /s/fix/lib/Target/ARM -I /s/fix/include -I /s/fix/include \
  -I /s/fix/lib/Target -gen-subtarget -o ARMGenSubtargetInfo.inc \
  /s/fix/lib/Target/ARM/ARM.td -debug-only=subtarget-emitter
It will list all instructions and print "No machine model for <subtarget>".
You will also get an assert in the scheduler, unless you add the following flag
to your model:
let CompleteModel = 0;
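For context, the flag is a field of the SchedMachineModel def. A minimal sketch, assuming a hypothetical model name and an issue width of 2 (based on the two-per-cycle dispatch described earlier in this thread), could be:

```tablegen
// Normally pulled in by the target's top-level .td file.
include "llvm/Target/Target.td"

// Hypothetical model def; the name and IssueWidth value are assumptions.
def ExP5600Model : SchedMachineModel {
  let IssueWidth = 2;     // two instructions dispatched per cycle
  let CompleteModel = 0;  // tolerate instructions with no scheduling info
}
```

Leaving CompleteModel at its default of 1 once the model covers every instruction keeps the assert active, so an instruction added later without scheduling information is caught.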
That's perfect, thanks.
Thanks
Daniel Sanders
Leading Software Design Engineer, MIPS Processor IP
Imagination Technologies Limited
www.imgtec.com
[1] I added support for the itineraries into the new MI scheduler because I
realized that some out-of-tree backend maintainers may still want that level of
precision. I'm not sure yet whether you fall into that category. The new
machine model was designed for out-of-order processors, but I also think it is
sufficient for most in-order models. I would like to establish the new machine
model as the preferred choice because it is simpler and more efficient, it will
be easier for most backend developers to bring up a new subtarget, and we will
then eventually have more consistency across targets. I also selfishly want more
good in-tree examples of the new model so it will effectively be better
documented and supported.
I believe it is possible to handle special cases requiring the itinerary's
precision without using an itinerary by either plugging custom logic into the
MachineSchedStrategy, or extending the new machine model...
[2] To model in-order pipeline resources we could:
- add a field to MCWriteProcResEntry
+ unsigned DelayCycles;
- Modify the table gen code in SubtargetEmitter to record the delay.
We already do this:
// If this resource is already used in this sequence, add the current
// entry's cycles so that the same resource appears to be used
// serially, rather than multiple parallel uses. This is important for
// in-order machine where the resource consumption is a hazard.
But we could also add a delay to the resource cycles when the
processor resource is unbuffered.
- The code in SchedBoundary::bumpNode and SchedBoundary::checkHazard
needs to be updated to increment the cycle accounting for DelayCycles.
-Andy
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MipsScheduleP5600.td
Type: application/octet-stream
Size: 12634 bytes
Desc: MipsScheduleP5600.td
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140128/76c69b5b/attachment.obj>