thr3ads.net - llvm dev - [llvm-dev] Scheduler: modelling long register reservations? [May 2017]

Jonas Paulsson via llvm-dev

2017-Apr-12 07:25 UTC

[llvm-dev] Scheduler: modelling long register reservations?

Hi Nick,

ScheduleDAGInstrs::addPhysRegDeps(SUnit *SU, unsigned OperIdx) is the
method that adds the edges, with their latencies, for output dependencies
(def -> def). Unfortunately, there currently doesn't seem to be a way to
specify the latency of output deps with computeOperandLatency() or similar.

One option might therefore be to add a DAG mutation (a
ScheduleDAGMutation) in which you manually set the latency of the output
edge to 25 after the DAG has been built.
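
As a rough illustration of that idea, here is a minimal sketch of what such
a mutation would do, written against toy data structures rather than the
real LLVM classes. The names Node, Edge, DepKind and raiseOutputLatency are
all hypothetical. In LLVM itself the equivalent logic would presumably live
in a ScheduleDAGMutation subclass that overrides apply() and calls
SDep::setLatency() on the relevant edges.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for scheduler DAG types; these are NOT the LLVM classes.
enum class DepKind { Data, Anti, Output };

struct Edge {
  int From;         // index of the earlier node
  int To;           // index of the later node
  DepKind Kind;     // kind of dependency carried by this edge
  unsigned Latency; // cycles the later node waits after the earlier one issues
};

struct Node {
  bool IsLongReservation; // true for an FXLV-like instruction
};

// Raise the latency of every output (def -> def) edge whose source is a
// long-reservation instruction. Returns the number of edges changed.
unsigned raiseOutputLatency(const std::vector<Node> &Nodes,
                            std::vector<Edge> &Edges,
                            unsigned ReservationCycles) {
  unsigned Changed = 0;
  for (Edge &E : Edges) {
    assert(E.From >= 0 && static_cast<std::size_t>(E.From) < Nodes.size());
    if (E.Kind != DepKind::Output)
      continue; // only def -> def edges are affected
    if (!Nodes[E.From].IsLongReservation)
      continue; // ordinary instructions keep their default latency
    if (E.Latency < ReservationCycles) {
      E.Latency = ReservationCycles;
      ++Changed;
    }
  }
  return Changed;
}
```

Data and anti edges are left untouched in this sketch, so the only
scheduling change is that a later writer of the reserved register is pushed
away from the long-latency def.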

If you have a problem with subregs, did you try modelling the stalling
subreg def as defining the whole vector reg, and then adjusting the
register operand text in the output, or similar?

/Jonas

On 2017-04-10 19:50, Johnson, Nicholas Paul via llvm-dev wrote:
> (Thank you Alex Bradbury for publicizing this thread in the weekly)
>
> I'll update the thread with my partial solution.  I have introduced a
> pseudo-instruction 'DontOverwriteFlexResults' as in Snippet 1 (below).
> That instruction has no effect.  Then, I updated some instruction
> selection patterns so that they wrap every occurrence of FXLV within a
> DontOverwriteFlexResults pseudo-instruction (Snippet 2, below).  The
> scheduler will attempt to schedule the pseudo-instruction to satisfy the
> long latency.  This extends the live interval of the FXLV's result vector
> register, and prevents the register allocator from prematurely
> overwriting subvectors of the result register.
>
> This solution works in some cases, but doesn't yet support the case in
> which the FXLV result is completely unused, since the
> 'DontOverwriteFlexResults' pseudo will get dead-code-eliminated.  I'm
> planning on marking the pseudo as side-effecting to inhibit dead code
> elimination, but still need a plan to prevent that from pessimizing the
> scheduler.
>
> Nick Johnson
> D. E. Shaw Research
>
>
> // Snippet 1
> // Here is a fancy fake instruction which prevents the compiler
> // from clobbering all or part of a flex api instruction's result.
> let hasNoSchedulingInfo = 1, mayLoad = 0, mayStore = 0, hasSideEffects = 0,
>     isAsCheapAsAMove = 1 in
> {
>    def DontOverwriteFlexResults :
>      DesGCv3PseudoInst<
>        (outs VecRegs:$rd),
>        (ins  VecRegs:$rs),
>        "# DontOverwriteFlexResults_v4i32\t$rd",
>        []>
>    {
>      let Constraints = "$rd = $rs";
>    }
> }
>
> // Snippet 2
> def : Pat<
>    (v4i32 (Aligned16LoadFromFlex (i32 DesGCv3RegPlusInt26:$ptr) )),
>    (DontOverwriteFlexResults
>      (v4i32 (FXLV_UNCOUNTED (i32 DesGCv3RegPlusInt26:$ptr) )))>;
>
>
>> -----Original Message-----
>> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
>> Johnson, Nicholas Paul via llvm-dev
>> Sent: Monday, April 03, 2017 3:38 PM
>> To: llvm-dev at lists.llvm.org
>> Subject: [llvm-dev] Scheduler: modelling long register reservations?
>>
>> Hello,
>>
>> My out-of-tree target features some high latency instructions (let's
>> call them FXLV).  When an FXLV issues, it reserves its destination
>> register and execution continues; if a subsequent instruction attempts
>> to read or write that register, the pipeline will stall until the FXLV
>> completes.  I have attempted to encode this constraint in the machine
>> scheduler (excerpt at bottom of email).  This solves half of the
>> problem: the scheduler moves any instruction that reads the FXLV result
>> register to a much later position.
>>
>> However, this doesn't solve all of the problem.  In particular, the
>> scheduler seems indifferent to an instruction which overwrites the
>> FXLV's result register---including instructions which overwrite only
>> one lane of the vector result.  Am I specifying the scheduling
>> constraints incorrectly?  Can llvm support this kind of constraint?
>>
>> Thank you,
>> Nick Johnson
>> D. E. Shaw Research
>>
>>
>> // Excerpted from lib/Target/MyTarget/MyTargetSchedule.td:
>> //
>> def DesGCv3GenericModel : SchedMachineModel
>> {
>>   let IssueWidth = 1;
>>   let MicroOpBufferSize = 0;
>>
>>   let CompleteModel = 1;
>> }
>> // ...
>> def FlexU        : ProcResource<64> { let BufferSize = 1; }
>> def : WriteRes<IIFlexRead, [FlexU]> { let Latency = 25; let ResourceCycles = [25]; }
>> // I apply this to the definition of the FXLV instruction
>> class SchedFlexRead : Sched< [IIFlexRead] >;
>> // ...
>>
>>
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Andrew Trick via llvm-dev

2017-May-22 22:29 UTC

[llvm-dev] Scheduler: modelling long register reservations?

Wow, this was in the digest and I still missed it! Anyway, for future reference…

The scheduler has bits and pieces of in-order support. In this case, the DAG
builder assumes that the WAW instructions are fully pipelined and take the same
latency, hence the one-cycle edge:

unsigned TargetSchedModel::
computeOutputLatency(const MachineInstr *DefMI, unsigned DefOperIdx,
                     const MachineInstr *DepMI) const {
  if (!SchedModel.isOutOfOrder())
    return 1;

It seems perfectly reasonable to me to use the difference in latency between the
two instructions (when the first instruction has higher latency), plus one
cycle.

Note that if the second, dependent instruction is also high latency, but uses
different resources, you don’t want to delay it.
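
To make the arithmetic concrete, here is a small sketch of that rule as a
standalone function. outputLatencyInOrder is a hypothetical name and is not
part of TargetSchedModel. It only encodes "difference in latency plus one
when the first def is slower, otherwise one cycle".

```cpp
#include <cassert>

// Hypothetical helper, not an LLVM function: latency to place on a WAW
// (def -> def) edge for an in-order machine.
//   DefLatency: latency of the first instruction writing the register
//   DepLatency: latency of the later instruction writing the same register
unsigned outputLatencyInOrder(unsigned DefLatency, unsigned DepLatency) {
  unsigned Result = 1; // fully pipelined, equal-latency case
  if (DefLatency > DepLatency)
    Result = DefLatency - DepLatency + 1; // keep the later write last
  assert(Result >= 1);
  return Result;
}
```

With the numbers from this thread, a 25-cycle FXLV followed by a 1-cycle
overwrite gets an edge latency of 25, while two 25-cycle instructions
writing the same register keep the one-cycle edge, so a second high-latency
instruction is not pushed back.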

-Andy