thr3ads.net - llvm dev - [llvm-dev] Instruction Itineraries: question about operand latencies [Jun 2016]

Phil Tomson via llvm-dev

2016-Jun-08 02:02 UTC

[llvm-dev] Instruction Itineraries: question about operand latencies

I overrode getInstrLatency and did some printing to see what is available
there. It looks like the registers are still virtual at that point when
getInstrLatency is called - is that correct? (we needed to make some
decisions based on actual registers that have been assigned since some
registers are reserved as address space pointers and we could vary the
latency based on which address space pointer register is being used - but
it looks like they're virtual there)

Phil

On Mon, Jun 6, 2016 at 3:10 PM, Ehsan Amiri <ehsanamiri at gmail.com> wrote:
> Hi Phil
>
> There are some comments in "include/llvm/Target/TargetItinerary.td", where
> class InstrItinData is defined.
>
> B is the number of cycles after issue at which the first operand of the
> instruction is defined. A is the number of cycles that the instruction will
> stay in that particular stage of the pipeline. So for simple cases, like
> your example, one would expect A and B to have the same value. But
> there are different APIs for accessing A and B.
>
> An example of accessing B in the source code can be found in
> PPCInstrInfo::getInstrLatency. You can also look at getStageLatency in
> include/llvm/MC/MCInstrItineraries.h. From these two you can probably find
> other relevant places.
>
> Hope this helps
> Ehsan
>
>
> On Mon, Jun 6, 2016 at 2:37 PM, Phil Tomson via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> In our architecture, loads from certain memory locations take a long time
>> to complete (on the order of 150 clock cycles). Since we don't have a way
>> to tell at compile time whether the address being loaded from lies in slow
>> or fast memory, I've gone ahead and made all of the load numbers high,
>> such as:
>>
>>   InstrItinData< II_LOAD1,     [InstrStage<150, [AGU]>]>,
>>
>> However, I see that there is another field which I haven't specified,
>> where operand latencies are specified.  Here's an example from
>> ARMScheduleA8.td:
>>
>>   InstrItinData<IIC_iALUi ,[InstrStage<1, [A8_Pipe0, A8_Pipe1]>], [2, 2]>,
>>
>> Now I'm wondering if, instead of what I had above, I should have
>> specified:
>>
>>   InstrItinData< II_LOAD1,     [InstrStage<150, [AGU]>],[150,1,1]>,
>>
>> ?
>>
>> But is that first '150' parameter there redundant, since it's specified
>> in the operand latency list ([150,1,1] - the first element of that array
>> being the latency for the output)?
>>
>>
>> To clarify, for values of 'A' and 'B' below:
>>
>>   InstrItinData< II_LOAD1,     [InstrStage<A, [AGU]>], [B,1,1]>,
>>
>> ...what is the difference in meaning between 'A' and 'B'? Are they
>> essentially the same value since only one functional unit is specified
>> ([AGU])?
>>
>> Phil
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
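
The difference between 'A' and 'B' discussed in the quoted messages above can be illustrated with a small toy model. This is only a sketch under stated assumptions: it reads 'A' as the number of cycles an instruction occupies its single functional unit and 'B' as the number of cycles after issue at which its result is defined, and it issues instructions strictly in order. ToyItinerary, ToyInstr and issueCycles are hypothetical names made up for this illustration; they are not LLVM classes, and a real scheduler is considerably more involved.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: hypothetical types, not part of LLVM.
struct ToyItinerary {
  unsigned StageCycles; // "A": cycles the instruction occupies the functional unit
  unsigned ResultCycle; // "B": cycles after issue at which the result is defined
};

struct ToyInstr {
  ToyItinerary Itin;
  int DependsOn; // index of an earlier producer instruction, or -1 for none
};

// Issue the instructions in program order on one shared functional unit and
// return the cycle at which each instruction issues.
std::vector<unsigned> issueCycles(const std::vector<ToyInstr> &Prog) {
  std::vector<unsigned> Issue(Prog.size(), 0);
  unsigned UnitFreeAt = 0; // first cycle at which the unit is available again
  for (std::size_t I = 0; I < Prog.size(); ++I) {
    unsigned Ready = UnitFreeAt;
    if (Prog[I].DependsOn >= 0) {
      std::size_t P = static_cast<std::size_t>(Prog[I].DependsOn);
      // A consumer cannot issue before its producer's result is defined.
      Ready = std::max(Ready, Issue[P] + Prog[P].Itin.ResultCycle);
    }
    Issue[I] = Ready;
    UnitFreeAt = Ready + Prog[I].Itin.StageCycles;
  }
  return Issue;
}
```

For a load followed by an independent instruction and then a consumer of the load, a load with A=1, B=150 gives issue cycles 0, 1, 150 in this model, while A=150, B=150 gives 0, 150, 151. Under these assumptions a large 'A' also delays instructions that do not depend on the load, whereas a large 'B' only delays its consumers.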

Ehsan Amiri via llvm-dev

2016-Jun-08 03:57 UTC

There are two scheduling passes: one before register allocation and one
after it. You were probably looking at the printouts from the first (pre-RA)
scheduling pass. Start from TargetPassConfig::addMachinePasses to find more
details about the codegen passes.
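
Since the same latency hook can be reached from both passes, one option is to make the hook check which kind of register it was given. The sketch below is a standalone toy, not LLVM code: it assumes an encoding in which the top bit of a register number marks a virtual register, and the register numbers, the fast latency of 4 and all names (isVirtualReg, loadLatencyFor, FastSpacePtr, SlowSpacePtr) are made up for illustration. Only the 150-cycle figure comes from the thread. In an LLVM 3.6 backend the check itself would presumably be done with TargetRegisterInfo::isVirtualRegister on the operand's register.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; none of these names come from LLVM or from the thread.
constexpr std::uint32_t VirtualRegFlag = 1u << 31; // assumed: top bit marks a virtual register

constexpr bool isVirtualReg(std::uint32_t Reg) {
  return (Reg & VirtualRegFlag) != 0;
}

// Made-up physical register numbers for two address-space pointer registers.
constexpr std::uint32_t FastSpacePtr = 10;
constexpr std::uint32_t SlowSpacePtr = 11;

constexpr unsigned FastLoadLatency = 4;   // made-up value for illustration
constexpr unsigned SlowLoadLatency = 150; // "on the order of 150 clock cycles"

// Pick a load latency from the base register of the address.
unsigned loadLatencyFor(std::uint32_t BaseReg) {
  if (isVirtualReg(BaseReg))
    return SlowLoadLatency; // pre-RA: the address space is unknown, stay conservative
  if (BaseReg == FastSpacePtr)
    return FastLoadLatency; // post-RA: known fast address space
  return SlowLoadLatency;   // post-RA: slow or unrecognised pointer register
}
```

With these assumptions, loadLatencyFor(VirtualRegFlag | 3) and loadLatencyFor(SlowSpacePtr) both return 150, while loadLatencyFor(FastSpacePtr) returns 4, so the pre-RA pass sees the conservative number and only the post-RA pass benefits from the assigned register.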


Phil Tomson via llvm-dev

2016-Jun-08 22:13 UTC

I did some looking around and found this in Passes.cpp:

// Temporary option to allow experimenting with MachineScheduler as a post-RA
// scheduler. Targets can "properly" enable this with
// substitutePass(&PostRASchedulerID, &PostMachineSchedulerID); Ideally it
// wouldn't be part of the standard pass pipeline, and the target would just add
// a PostRA scheduling pass wherever it wants.
static cl::opt<bool> MISchedPostRA("misched-postra", cl::Hidden,
  cl::desc("Run MachineScheduler post regalloc (independent of preRA sched)"));

So I added this to our target's passConfig subclass:

class XSTGPassConfig : public TargetPassConfig {
public:
    XSTGPassConfig(XSTGTargetMachine *TM, PassManagerBase &PM) :
        TargetPassConfig(TM, PM) {
           if (TM->getOptLevel() != CodeGenOpt::None)
              substitutePass(&PostRASchedulerID, &PostMachineSchedulerID);
        }


Then I built and ran clang on some code. I had added some couts to
getInstrLatency to display the UseInstr. This is an example of what I see
in the output:
>>> DefInstr: %vreg34<def> = LOADI32_RI %vreg3, 268; mem:LD4[%3](tbaa=<0x75a64d8>) R32C:%vreg34 GPRC:%vreg3 dbg:pipeline_routing_be.c:223:3 @[ pipeline_routing_be.c:1120:1 @[ pipeline_routing_be.c:1120:1 ] ]
 Latency: 142 for: DefIdx= 0 UseIdx= 1
    UseInstr: %vreg34<def> = LOADI32_RI %vreg3, 268; mem:LD4[%3](tbaa=<0x75a64d8>) R32C:%vreg34 GPRC:%vreg3 dbg:pipeline_routing_be.c:223:3 @[ pipeline_routing_be.c:1120:1 @[ pipeline_routing_be.c:1120:1 ] ]
%vreg35<def> = CVT_U32_TO_U64 %vreg34; GPRC:%vreg35 R32C:%vreg34 dbg:pipeline_routing_be.c:223:3 @[ pipeline_routing_be.c:1120:1 @[ pipeline_routing_be.c:1120:1 ] ]

...since I see vregs mentioned there, I'm assuming this didn't run post-RA
as I would have expected.

(Our code is based on LLVM 3.6 if that's relevant)

Phil
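
As a rough mental model of what a pass substitution does, the toy below keeps a map from a standard pass name to its replacement and consults it whenever a pass is added. It is a sketch only: ToyPassConfig and buildToyPipeline are hypothetical, the pass names are plain strings chosen for illustration, the three-pass pipeline is a deliberate simplification, and whether LLVM 3.6 behaves exactly this way internally is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model only: a hypothetical stand-in, not LLVM's TargetPassConfig.
class ToyPassConfig {
  std::map<std::string, std::string> Substitutions;
  std::vector<std::string> Pipeline;

public:
  // Remember that TargetID should be used wherever StandardID would be added.
  void substitutePass(const std::string &StandardID, const std::string &TargetID) {
    Substitutions[StandardID] = TargetID;
  }

  // Add a pass, applying a substitution if one was registered for it.
  void addPass(const std::string &ID) {
    auto It = Substitutions.find(ID);
    Pipeline.push_back(It == Substitutions.end() ? ID : It->second);
  }

  const std::vector<std::string> &pipeline() const { return Pipeline; }
};

// Build a simplified three-pass pipeline, optionally with the substitution.
std::vector<std::string> buildToyPipeline(bool SubstitutePostRA) {
  ToyPassConfig Config;
  if (SubstitutePostRA)
    Config.substitutePass("PostRAScheduler", "PostMachineScheduler");
  Config.addPass("MachineScheduler");  // pre-RA scheduling
  Config.addPass("RegisterAllocator");
  Config.addPass("PostRAScheduler");   // replaced if a substitution exists
  return Config.pipeline();
}
```

In this model buildToyPipeline(true) yields MachineScheduler, RegisterAllocator, PostMachineScheduler. The pre-RA scheduler is still in the pipeline, so a latency hook shared by both schedulers would still be called with virtual registers from the first one even when the substitution takes effect.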



