thr3ads.net - llvm dev - [llvm-dev] [EXTERNAL] Re: Simulation of load-store forwarding with MI scheduler on AArch64 [Sep 2020]


Evgeny Leviant via llvm-dev

2020-Sep-15 11:24 UTC

[llvm-dev] [EXTERNAL] Re: Simulation of load-store forwarding with MI scheduler on AArch64

Thanks for the prompt response, Andy.

This will work when the address is not modified. However, it doesn't seem to
work for pre/post-increment loads and stores.
Consider data-to-address forwarding:

$x0 = ldr x0, [x1]
$x0, $x2 = ldr x2, [x0, 16]!

The second instruction has its own latency for the address update (the $x0
register), so I don't see how we can use the ReadAdr approach here. Maybe
forwarding is not supposed to work in such cases on Arm CPUs? The Cortex-A55
Software Optimization Guide says this:

“load data from a limited set of load instructions can be forwarded from the
beginning of the wr pipeline stage to either the load or store AGU base operand”

However, nothing is said about the pre/post-indexed forms.

From: Andrew Trick <mailto:atrick at apple.com>
Sent: 15 September 2020 7:04
To: Evgeny Leviant <mailto:eleviant at accesssoftek.com>
Cc: llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
Subject: [EXTERNAL] Re: [llvm-dev] Simulation of load-store forwarding with MI
scheduler on AArch64


On Sep 14, 2020, at 9:40 AM, Evgeny Leviant via llvm-dev <llvm-dev at
lists.llvm.org> wrote:

> Hi list,
>
> Is it possible to simulate load-to-store forwarding with the MI scheduling
> model on AArch64?
> For instance, the $x0 data latency in the example below should be 1 cycle:
>
> ldr $x0, [$x1]
> str $x0, [$x2]
>
> But it should be 4 cycles if the consumer is a different instruction:
>
> ldr $x0, [$x1]
> add $x0, $x0, 4
>
> For ALU instructions it's possible to use either ReadAdvance or
> SchedReadAdvance, but I don't see how to do this with WriteLD or WriteST.
> Is there some workaround?

The main purpose of ReadAdvance is pipeline forwarding.

I think you just want a read resource in your subtarget like this:

  def ReadAdr : SchedReadAdvance<3, [WriteLD]>;
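The arithmetic behind this suggestion can be sketched as a small Python model. It assumes a 4-cycle WriteLD (the figure from the original question) and, to my understanding, mirrors how LLVM's TargetSchedModel computes operand latency: subtract a matching read advance from the write latency and clamp at zero. The class and function names are hypothetical, not LLVM API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ReadAdvanceModel:
    """Toy stand-in for a SchedReadAdvance: the consumer reads the operand
    `cycles` later, but only if the producer's SchedWrite is listed in
    `valid_writes` (an empty tuple means any producer qualifies)."""
    cycles: int
    valid_writes: Tuple[str, ...] = ()


def operand_latency(write: str, write_latency: int,
                    read: Optional[ReadAdvanceModel] = None) -> int:
    """Producer-to-consumer latency after applying an optional read advance."""
    if read is None:
        return write_latency
    if read.valid_writes and write not in read.valid_writes:
        return write_latency  # the advance does not apply to this producer
    return max(0, write_latency - read.cycles)  # clamp at zero


# Assumption: WriteLD has a 4-cycle latency in this hypothetical subtarget.
WRITE_LD_LATENCY = 4
# Models: def ReadAdr : SchedReadAdvance<3, [WriteLD]>;
ReadAdr = ReadAdvanceModel(cycles=3, valid_writes=("WriteLD",))

# ldr x0, [x1] ; str x0, [x2]   (the store operand carries ReadAdr)
print(operand_latency("WriteLD", WRITE_LD_LATENCY, ReadAdr))  # 1
# ldr x0, [x1] ; add x0, x0, 4  (plain ALU operand, no advance)
print(operand_latency("WriteLD", WRITE_LD_LATENCY))  # 4
```

The restriction to `[WriteLD]` matters: a producer with a different SchedWrite keeps its full latency even when the consumer operand carries ReadAdr.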

Glancing briefly at the AArch64 target, I see this for stores:

  Sched<[WriteST]>;

So it doesn't look like there's any existing name for the store’s
address operand. You could add a general ReadAdr SchedRead resource
in AArch64Schedule.td. Then you would need to change the ReadAdr line in your
subtarget to an override:

  def : ReadAdvance<ReadAdr, 3, [WriteLD]>;

Alternatively, you can add a rule in your subtarget that lists the opcodes (or
matches them with a regex) and uses the ReadAdr resource that you defined in the
same file:

  def : InstRW<[WriteST, ReadAdr], (instregex "ST(someregex)$")>;

Be careful about store-pair and vector stores.
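The warning about store-pair and vector stores is easy to see with a quick model of regex-based opcode selection. The opcode sample below is hand-picked rather than exhaustive, and treating instregex like Python's start-anchored `re.match` is an assumption; the exact matching rules are in TargetSchedule.td and the subtarget emitter.

```python
import re
from typing import List

# Hand-picked sample of AArch64 store opcode names (not an exhaustive list).
STORE_OPCODES = [
    "STRXui", "STRWui", "STRBBui", "STRHHui",  # scalar, scaled unsigned offset
    "STURXi",                                  # scalar, unscaled offset
    "STRXpre", "STRXpost",                     # pre/post-indexed (extra def)
    "STPXi",                                   # store pair (two data regs)
    "STRQui", "ST1Onev16b",                    # vector stores
]


def instregex_matches(pattern: str, opcodes: List[str]) -> List[str]:
    """Assumption: instregex behaves like a start-anchored re.match."""
    return [op for op in opcodes if re.match(pattern, op)]


broad = instregex_matches(r"ST.*$", STORE_OPCODES)
narrow = instregex_matches(r"STR(BB|HH|W|X)ui$", STORE_OPCODES)

# Opcodes the broad pattern catches that the narrow one leaves alone.
print(sorted(set(broad) - set(narrow)))
# ['ST1Onev16b', 'STPXi', 'STRQui', 'STRXpost', 'STRXpre', 'STURXi']
```

The extra matches (pair, pre/post-indexed, vector) have different operand lists from a plain STRXui, so a single `[WriteST, ReadAdr]` entry is unlikely to describe all of them correctly; a narrow pattern per operand shape is the safer starting point.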

Then you'll always want to debug your target's llvm-tblgen command by adding the
flag -debug-only=subtarget-emitter.

You can even trace the schedule for some simple cases with
-debug-only=machine-scheduler.

I haven't actually done any of this in several years; someone with more
recent experience may have better tips.

-Andy

Evgeny Leviant via llvm-dev

2020-Sep-15 12:01 UTC

[llvm-dev] [EXTERNAL] Re: Simulation of load-store forwarding with MI scheduler on AArch64

Sorry, it seems I have figured out the answer myself:

The instruction
$x0, $x2 = LDRXpre $x0, 1

has 4 operands, so it seems possible to assign both a SchedRead and a SchedWrite
to $x0, and the resulting sched list for LDRXpre would be:

[WriteAdr, WriteLD, ReadAdr]

It's strange that AArch64InstrFormats.td doesn't implement this.
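This observation rests on how a sched list lines up with operands. To my understanding, SchedWrites pair with register defs and SchedReads pair with register uses, each in operand order. The toy model below assumes that rule. It tells reads from writes by name prefix, which is a simplification since TableGen uses the record's class. It also uses LDRXpre's operand names ($wback, $Rt, $Rn, $offset) as I understand them from AArch64InstrFormats.td. `map_sched_rw` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# (operand name, is_def, is_register) in MachineInstr operand order.
Operand = Tuple[str, bool, bool]

# $x0, $x2 = LDRXpre $x0, 1  ->  outs: $wback, $Rt ; ins: $Rn, $offset
LDRXPRE: List[Operand] = [
    ("$wback", True, True),     # updated base register ($x0)
    ("$Rt", True, True),        # loaded data ($x2)
    ("$Rn", False, True),       # base register read ($x0)
    ("$offset", False, False),  # immediate: gets no SchedRead
]


def map_sched_rw(operands: List[Operand], sched: List[str]) -> Dict[str, str]:
    """Pair writes with register defs and reads with register uses, in order.
    Simplification: entries named Write* are writes, Read* are reads."""
    writes = [rw for rw in sched if rw.startswith("Write")]
    reads = [rw for rw in sched if rw.startswith("Read")]
    defs = [name for name, is_def, is_reg in operands if is_reg and is_def]
    uses = [name for name, is_def, is_reg in operands if is_reg and not is_def]
    mapping = dict(zip(defs, writes))
    mapping.update(zip(uses, reads))
    return mapping


print(map_sched_rw(LDRXPRE, ["WriteAdr", "WriteLD", "ReadAdr"]))
# {'$wback': 'WriteAdr', '$Rt': 'WriteLD', '$Rn': 'ReadAdr'}

# For comparison, the same operands with the two writes swapped.
print(map_sched_rw(LDRXPRE, ["WriteLD", "WriteAdr"]))
# {'$wback': 'WriteLD', '$Rt': 'WriteAdr'}
```

With the writes swapped, the model pairs WriteLD with the writeback register and WriteAdr with the loaded data. That is why the order of entries in the Sched list matters.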

Andrew Trick via llvm-dev

2020-Sep-15 15:41 UTC

[llvm-dev] [EXTERNAL] Simulation of load-store forwarding with MI scheduler on AArch64

> On Sep 15, 2020, at 5:01 AM, Evgeny Leviant <eleviant at accesssoftek.com> wrote:
> 
> Sorry, it seems I have figured out the answer myself:
>  
> Instruction
> $x0, $x2 = LDRXpre $x0, 1
>  
> will have 4 arguments, so it seems possible to assign both SchedRead and
> SchedWrite for $x0 and the result sched
> list for LDRXpre would be:
>  
> [WriteAdr, WriteLD, ReadAdr]
>  
> Strange that AArch64InstrFormats.td doesn’t implement this

Looking at AArch64InstrFormats.td, it lists the writeback operand's SchedWrite
in a different order:

  Sched<[WriteLD, WriteAdr]>;

If that’s incorrect, it might be worth fixing.

Otherwise, that looks like the right answer. The address writeback is not really
load forwarding. It's a totally separate scheduling resource with its own
latency.
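The point that the address writeback is a separate resource can be illustrated by listing the dependency edges around the pre-indexed load. This sketch assumes hypothetical latencies (4 cycles for WriteLD, 1 for WriteAdr) and the ReadAdr advance of 3 restricted to WriteLD producers. It models the idea; it is not output from llc.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical latencies for an in-order core model.
WRITE_LATENCY: Dict[str, int] = {"WriteLD": 4, "WriteAdr": 1}
# ReadAdr: 3-cycle advance, applied only when the producer is a WriteLD.
READ_ADVANCE: Dict[str, Tuple[int, Set[str]]] = {"ReadAdr": (3, {"WriteLD"})}


def edge_latency(write: str, read: Optional[str] = None) -> int:
    """Latency of one producer->consumer edge under the assumed tables."""
    latency = WRITE_LATENCY[write]
    if read in READ_ADVANCE:
        cycles, valid_writes = READ_ADVANCE[read]
        if not valid_writes or write in valid_writes:
            return max(0, latency - cycles)
    return latency


# ldr x0, [x1]          ; $x0 <- WriteLD
# ldr x2, [x0, 16]!     ; reads $Rn via ReadAdr, writes $wback and $Rt
# ...later instructions that use $x0 or $x2
edges = {
    "ldr $x0 -> LDRXpre $Rn": edge_latency("WriteLD", "ReadAdr"),
    "LDRXpre $wback -> next LDRXpre $Rn": edge_latency("WriteAdr", "ReadAdr"),
    "LDRXpre $Rt -> ALU use of $x2": edge_latency("WriteLD"),
}
for edge, cycles in edges.items():
    print(f"{edge}: {cycles} cycle(s)")
```

The forwarded load-to-address edge drops to 1 cycle. The writeback edge keeps WriteAdr's own latency because the advance is restricted to WriteLD producers. The two effects are modeled independently.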

-Andy