thr3ads.net - llvm dev - [llvm-dev] [llvm-mca] What's the difference between Rthroughput and "total cycles" in llvm-mca [Jun 2019]

Tom Chen via llvm-dev

2019-Jun-07 11:42 UTC

[llvm-dev] [llvm-mca] What's the difference between Rthroughput and "total cycles" in llvm-mca

Hi Andrea,
So does this definition make sense for basic blocks with more than one
instruction? E.g. how should one interpret a basic block with an
RThroughput of 2.3?

On Fri, Jun 7, 2019 at 7:39 AM Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
> Hi Tom,
>
> Field 'Total Cycles' from the summary view simply reports the elapsed
> number of cycles for the entire simulation.
>
> Rthroughput (from the "Instruction Info" view) is the reciprocal of the
> instruction throughput.
> Throughput is computed as the maximum number of instructions of the same
> type that can be executed per clock cycle in the absence of operand
> dependencies.
>
> Example (x86 - AMD Jaguar):
>    ADD EAX, ESI
>
> The integer unit in Jaguar has two ALU pipelines. An ADD instruction can
> issue to either of those pipelines. That means two independent ADDs can be
> issued during the same cycle. Therefore, throughput is 2 (instructions per
> cycle), and RThroughput (1/throughput) is 0.5.
>
> I hope it helps,
> -Andrea
>
> On Thu, Jun 6, 2019 at 10:11 PM Tom Chen via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> What is the difference between the two? I thought "Rthroughput" is
>> basically the number of cycles required to execute a single iteration at
>> steady state, but this does not seem to match with the schedule/timeline
>> generated by llvm-mca.
>> Thanks in advance,
>> Tom

Andrea Di Biagio via llvm-dev

2019-Jun-07 13:30 UTC

In the absence of data dependencies, the throughput of a block of code is
bounded above by the dispatch rate (i.e. our DispatchWidth) and by the
availability of hardware resources.

DispatchWidth is the maximum number of micro-opcodes that can be dispatched
to the out-of-order backend every cycle. That value inevitably affects the
block throughput. Example: if an input block decodes to 4 micro-opcodes in
total, and the processor can only dispatch up to 2 micro-opcodes per cycle,
then the block throughput cannot exceed 0.5 (i.e. one block every two
cycles).

Block throughput is also constrained by the availability of hardware
resources.
Example: if we have 4 ADD micro-opcodes, and each one consumes 1 cycle of
an ALU pipeline, then the block throughput is bounded above by N/4, where
N is the number of ALU pipelines available on the target, and 4 is the
number of ALU cycles consumed. So, if there is only 1 ALU pipeline, then
the block throughput is at most 1/4 = 0.25 (blocks per cycle).
Back to the computation of the "Block Throughput".
It is statically computed as the reciprocal of the block throughput. As with
the per-instruction throughput, the computation doesn't take operand
dependencies into account. Therefore, we could say that it is computed
as the MAX of:
 - #MicroOpcodes of a block / DispatchWidth
 - #Consumed resource cycles / #Resources   [ for every resource kind ].
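
As a rough illustration, the MAX described above could be written as follows. This is a sketch of the formula as stated in this message, not the actual llvm-mca implementation; the function name, the `resources` dictionary layout, and the resource names are assumptions made for the example.

```python
def block_rthroughput(num_uops, dispatch_width, resources):
    """Return the Block RThroughput in cycles per block.

    resources maps a resource kind to (consumed_cycles, num_units).
    """
    # Lower bound on cycles per block imposed by the dispatch rate.
    candidates = [num_uops / dispatch_width]
    # Lower bound imposed by each resource kind.
    for consumed_cycles, num_units in resources.values():
        candidates.append(consumed_cycles / num_units)
    return max(candidates)

# 4 ADD micro-opcodes, dispatch width 2, a single ALU pipeline.
print(block_rthroughput(4, 2, {"ALU": (4, 1)}))  # 4.0
# Same block with two ALU pipelines: dispatch and ALU both give 2 cycles.
print(block_rthroughput(4, 2, {"ALU": (4, 2)}))  # 2.0
```

In the first call the ALU is the bottleneck; in the second, the dispatch width and the ALU pipelines limit the block equally.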

In the absence of loop-carried dependencies between different iterations,
the observed ‘uOps Per Cycle’ tends to a theoretical maximum throughput
which can be computed by dividing the total number of uOps of a block by
the Block RThroughput.
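
That last relation is a single division; a minimal sketch, assuming the Block RThroughput has already been read from the llvm-mca summary view (the function name is hypothetical):

```python
def max_uops_per_cycle(num_uops, block_rthroughput):
    """Theoretical maximum uOps per cycle for a dependency-free block."""
    return num_uops / block_rthroughput

# A 4-uOp block with a Block RThroughput of 4.0 cycles.
print(max_uops_per_cycle(4, 4.0))  # 1.0
# The same block with a Block RThroughput of 2.0 cycles.
print(max_uops_per_cycle(4, 2.0))  # 2.0
```

An observed 'uOps Per Cycle' well below this value suggests that something other than dispatch width or resource pressure, such as data dependencies, is limiting the block.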

You can find more information about it in the llvm-mca docs under section
"How LLVM-MCA works".

I hope it helps!
-Andrea


Andrea Di Biagio via llvm-dev

2019-Jun-07 13:33 UTC

On Fri, Jun 7, 2019 at 2:30 PM Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
> Back to the computation of the "Block Throughput".
>
Sorry, I should have written "Block RThroughput" here.

