thr3ads.net - llvm dev - [LLVMdev] extra one cycle of getOperandLatency [Dec 2013]

Wei-cheng Wang

2013-Dec-20 06:35 UTC

[LLVMdev] extra one cycle of getOperandLatency

Hi llvm-dev,

I wonder why there is an extra cycle for getOperandLatency.
It doesn't seem intuitive.

  UseCycle = DefCycle - UseCycle + 1;

When I read the comments in TargetItinerary.td, it said

  OperandCycles are optional "cycle counts". They specify the cycle after
  instruction issue the values which correspond to specific operand indices
  are defined or read.

I thought that if an instruction reads its operands in the first cycle
and produces its result in the second cycle, InstrItinData should be
written something like this:

   InstrItinData<IIC_iALUr ,[InstrStage<1, [FU_x]>], [2, 1, 1]>

Therefore, the operand latency from an iALUr output to an iALUr input
should be 1.  However, with the implementation of getOperandLatency, the
latency for such a definition is 2.  That's not what I want.
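
To make the arithmetic concrete, here is a minimal sketch of the two
calculations for the [2, 1, 1] entry above.  It is a stand-alone model,
not the LLVM source, and the function names are made up for illustration.

```cpp
// Stand-alone model of the arithmetic only; NOT the LLVM implementation.
// Function names are hypothetical.

// The expression used by getOperandLatency, as quoted above.
constexpr int latencyWithPlusOne(int defCycle, int useCycle) {
  return defCycle - useCycle + 1;
}

// The plain difference, which is what I expected.
constexpr int latencyPlainDifference(int defCycle, int useCycle) {
  return defCycle - useCycle;
}

// OperandCycles [2, 1, 1]: result defined in cycle 2, sources read in cycle 1.
static_assert(latencyWithPlusOne(2, 1) == 2, "current formula gives 2");
static_assert(latencyPlainDifference(2, 1) == 1, "expected latency is 1");
```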

After some digging around, I found that the expression
"DefCycle - UseCycle + 1" first appeared in r79425, committed by David
Goodwin, and it seems OperandCycles was initially designed for the ARM
Cortex-A8 (see also r79247 and r79436).

Then I checked "Cortex-A8 Technical Reference Manual - Instruction
Cycle Timing".
There are tables for instructions, for example

   Data-processing instructions
   Source1    Source2    Result1
   Rn:E2      Rm:E2      Rd:E2

That means Rn and Rm are read at the beginning of the E2 stage and
Rd is produced at the end of E2, so there is a 1-cycle latency.

And that was implemented in llvm as such

  InstrItinData<IIC_iALUr ,[InstrStage<1, [A8_Pipe0, A8_Pipe1]>], [2, 2, 2]>,

Does that mean OperandCycles and getOperandLatency were simply designed
this way so that it is easier to use the tables from the Cortex-A8 TRM?
So OperandCycles do not actually refer to a "cycle":
for an input operand the value means the beginning of a stage,
and for an output operand it means the end of a stage?

If so, is there any other reason it should be designed this way?
Why not remove the +1 cycle and define the instruction as such?

  InstrItinData<IIC_iALUr ,[InstrStage<1, [A8_Pipe0, A8_Pipe1]>], [3, 2, 2]>,
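
As a rough sketch of why the two encodings would describe the same
hardware: assuming a value produced at the end of stage N is first usable
at the beginning of stage N + 1, both conventions give a latency of 1 for
the Cortex-A8 ALU entry.  This is a stand-alone model with hypothetical
names, not LLVM code.

```cpp
// Stand-alone model; NOT the LLVM implementation. Names are hypothetical.

// Assumption: a value produced at the end of stage N is first usable at
// the beginning of stage N + 1.
constexpr int endOfStageToCycle(int stage) { return stage + 1; }

// Current convention: the def number means "end of stage", the use number
// means "beginning of stage", and the formula adds 1.
constexpr int latencyStageConvention(int defStage, int useStage) {
  return defStage - useStage + 1;
}

// Proposed convention: both numbers are plain cycles and nothing is added.
constexpr int latencyCycleConvention(int defCycle, int useCycle) {
  return defCycle - useCycle;
}

// Cortex-A8 ALU entry from the table: Rn:E2, Rm:E2, Rd:E2.
static_assert(latencyStageConvention(2, 2) == 1, "[2, 2, 2] with +1");
static_assert(latencyCycleConvention(endOfStageToCycle(2), 2) == 1,
              "[3, 2, 2] without +1");
```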


Thanks

Wei-cheng Wang

Andrew Trick

2013-Dec-31 20:22 UTC

[LLVMdev] extra one cycle of getOperandLatency

On Dec 19, 2013, at 10:35 PM, Wei-cheng Wang <cole945 at gmail.com> wrote:
> Hi llvm-dev,
> 
> I wonder why there is an extra cycle for getOperandLatency.
> It doesn't seem intuitive.
> 
>  UseCycle = DefCycle - UseCycle + 1;
> 
> When I read the comments in TargetItinerary.td, it said
> 
>  OperandCycles are optional "cycle counts". They specify the cycle after
>  instruction issue the values which correspond to specific operand indices
>  are defined or read.
> 
> I thought that if an instruction reads its operands in the first cycle
> and produces its result in the second cycle, InstrItinData should be
> written something like this:
> 
>   InstrItinData<IIC_iALUr ,[InstrStage<1, [FU_x]>], [2, 1, 1]>
> 
> Therefore, the operand latency from an iALUr output to an iALUr input
> should be 1.  However, with the implementation of getOperandLatency, the
> latency for such a definition is 2.  That's not what I want.
> 
> After some digging around, I found that the expression
> "DefCycle - UseCycle + 1" first appeared in r79425, committed by David
> Goodwin, and it seems OperandCycles was initially designed for the ARM
> Cortex-A8 (see also r79247 and r79436).
> 
> Then I checked "Cortex-A8 Technical Reference Manual - Instruction
> Cycle Timing".
> There are tables for instructions, for example
> 
>   Data-processing instructions
>   Source1    Source2    Result1
>   Rn:E2      Rm:E2      Rd:E2
> 
> That means Rn and Rm are read at the beginning of the E2 stage and
> Rd is produced at the end of E2, so there is a 1-cycle latency.
> 
> And that was implemented in llvm as such
> 
>  InstrItinData<IIC_iALUr ,[InstrStage<1, [A8_Pipe0, A8_Pipe1]>], [2, 2, 2]>,
> 
> 
> 
> Does that mean OperandCycles and getOperandLatency were simply designed
> this way so that it is easier to use the tables from the Cortex-A8 TRM?
> So OperandCycles do not actually refer to a "cycle":
> for an input operand the value means the beginning of a stage,
> and for an output operand it means the end of a stage?
> 
> If so, is there any other reason it should be designed this way?
> Why not remove the +1 cycle and define the instruction as such?
> 
>  InstrItinData<IIC_iALUr ,[InstrStage<1, [A8_Pipe0, A8_Pipe1]>], [3, 2, 2]>,

I think it’s done this way so that if both def and use cycles are unspecified we
get a default of one cycle latency.
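
A rough sketch of that reasoning, assuming an unspecified operand cycle is
represented by the same placeholder value for both the def and the use
(0 here). That representation is an assumption for illustration only, not
taken from the LLVM source, and the names are hypothetical.

```cpp
// Stand-alone model; NOT the LLVM implementation. Names are hypothetical.

// Assumption: an operand cycle the itinerary leaves out is modeled as 0.
constexpr int kUnspecifiedCycle = 0;

// Formula with the extra cycle, as used by getOperandLatency.
constexpr int defaultLatencyWithPlusOne(int defCycle, int useCycle) {
  return defCycle - useCycle + 1;
}

// Formula without the extra cycle, as proposed in the question.
constexpr int defaultLatencyWithoutPlusOne(int defCycle, int useCycle) {
  return defCycle - useCycle;
}

// With both cycles unspecified, the +1 yields a one-cycle default, while
// the plain difference would yield zero.
static_assert(
    defaultLatencyWithPlusOne(kUnspecifiedCycle, kUnspecifiedCycle) == 1,
    "one-cycle default");
static_assert(
    defaultLatencyWithoutPlusOne(kUnspecifiedCycle, kUnspecifiedCycle) == 0,
    "zero without +1");
```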

At any rate, the itineraries have been around a long time with many out-of-tree
targets. I don’t think it’s a good idea to change that old API. New ports should
try to use the new machine model instead.

-Andy
> 
> Thanks
> 
> Wei-cheng Wang
