thr3ads.net - llvm dev - [llvm-dev] Wide load/store optimization question [Jun 2017]


Peter Bel via llvm-dev

2017-Jun-13 15:44 UTC

[llvm-dev] Wide load/store optimization question

Hi,

I'm trying to write an LLVM backend for the Epiphany architecture, and I
wonder if someone can give me some advice on how to implement load/store
optimization. The CPU itself is 32-bit, but it supports wider 64-bit loads
and stores. So the basic idea is to make use of those by combining narrow
ones.

I've checked how it is done in AArch64 and Hexagon, and my current code is
very close to the AArch64 one (I used it as a starting point). The problem
lies in the constraints imposed by the platform.

The main constraint is that the registers used must be sequential, and the
lower register must be even-numbered (or r0). The frame offsets must also
be adjacent for the accesses to be merged, and the offset used for the
lower register must be dword-aligned.

Because of those constraints I'm currently running this pass at pre-emit,
after RA and frame finalization. But at that point most of the choices (RA,
frame offsets) have already been made, and they are often suboptimal for
merging. The most common issue looks something like this:
    str r1, [fp, -4]
    str r2, [fp, -8]
Those two stores can't be merged because the lower reg (r1) is not even. To
merge them, r1 would have to be changed to r0, and r2 to r1. Sometimes the
same problem happens when the frame offset is misaligned, e.g. r0 ends up
at an offset that is word-aligned but not dword-aligned.
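
The constraints described above can be modelled in a few lines. This is
only a standalone Python sketch, not LLVM code: the names Store and
can_merge are hypothetical, and it assumes that fp itself is 8-byte aligned
and that the lower register of a pair is stored at the lower address
(neither is stated in the thread).

```python
from typing import NamedTuple

WORD = 4   # bytes moved by one 32-bit store
DWORD = 8  # bytes moved by one 64-bit store


class Store(NamedTuple):
    reg: int     # register number, e.g. 1 for r1
    offset: int  # byte offset from fp (assumed 8-byte aligned itself)


def can_merge(a: Store, b: Store) -> bool:
    """Return True if two word stores could become one dword store."""
    # Order the two stores by address so `lo` is the one at the lower offset.
    lo, hi = sorted((a, b), key=lambda s: s.offset)
    adjacent = hi.offset - lo.offset == WORD
    aligned = lo.offset % DWORD == 0
    # Assumption: the even register of the pair goes to the lower address.
    even_pair = lo.reg % 2 == 0 and hi.reg == lo.reg + 1
    return adjacent and aligned and even_pair


# The pair from the message above is rejected: r1/r2 is not an even pair.
print(can_merge(Store(1, -4), Store(2, -8)))
# An even pair at a dword-aligned offset is accepted under these assumptions.
print(can_merge(Store(0, -8), Store(1, -4)))
```

A check like this only says whether a pair is already mergeable; it does
not help with renaming registers or moving frame slots, which is the actual
question here.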

Can someone please point me in the right direction? Also, at which stage
should I apply such a pass? If pre-RA, how do I set register constraints
such as regsequence, as well as frame constraints? If before frame
finalization, how do I set frame constraints? If at pre-emit, as I'm doing
now, how do I optimize and rewrite frame offsets and registers?

Thanks,
Petr

陳韋任 via llvm-dev

2017-Jun-16 10:43 UTC


Hi Peter,

For i64, our custom backend supports only load/store instructions.
Following the Sparc backend, we lower an i64 load to a v2i32 load rather
than to two i32 loads (LLVM's default Expand action). I would be happy to
hear about others' experience with this, too.
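
As a small illustration of why an i64 load can be treated as a v2i32 load,
the following standalone Python sketch shows that the same eight bytes can
be read either as one 64-bit value or as two 32-bit lanes. It assumes
little-endian lane order, which the thread does not state, and split_i64 is
a hypothetical helper, not an LLVM function.

```python
import struct


def split_i64(value: int) -> tuple:
    """Split a 64-bit value into (low, high) 32-bit halves."""
    lo = value & 0xFFFFFFFF
    hi = (value >> 32) & 0xFFFFFFFF
    return lo, hi


value = 0x1122334455667788
lo, hi = split_i64(value)

# Assumption: little-endian layout, so lane 0 holds the low half.
as_i64 = struct.pack("<Q", value)     # one 64-bit store
as_v2i32 = struct.pack("<II", lo, hi)  # two 32-bit lanes
print(as_i64 == as_v2i32)
```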

HTH,
chenwj



-- 
Wei-Ren Chen (陳韋任)
Homepage: https://people.cs.nctu.edu.tw/~chenwj

Tom Stellard via llvm-dev

2017-Jun-16 11:00 UTC


One thing you can do is define a register class that is made up of register
tuples e.g. r0r1, r2r3, etc., and use that register class for the 64-bit
load/store instructions.  This will allow you to do the load/store
merging before register allocation without the register constraints.

The AMDGPU backend has similar alignment constraints for its
SGPR classes: if you are writing to N consecutive SGPRs,
then the lower register index must be divisible by N.
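
To make the tuple idea concrete, here is a standalone Python sketch (not
TableGen and not LLVM API) that enumerates the register tuples such a class
would contain when the first index must be divisible by N. The function
names and the register count of 8 are assumptions made for illustration
only.

```python
from typing import List, Tuple


def aligned_tuples(num_regs: int, n: int) -> List[Tuple[int, ...]]:
    """List n-register tuples whose first register index is divisible by n."""
    # Stepping the base by n guarantees both alignment and no overlap.
    return [tuple(range(base, base + n))
            for base in range(0, num_regs - n + 1, n)]


def tuple_name(regs: Tuple[int, ...]) -> str:
    """Format a tuple the way the message does, e.g. (0, 1) -> 'r0r1'."""
    return "".join(f"r{r}" for r in regs)


# Pairs for a hypothetical 8-register file: r0r1, r2r3, r4r5, r6r7.
pairs = aligned_tuples(8, 2)
print([tuple_name(p) for p in pairs])
```

Because only aligned tuples are members of the class, the allocator can
never pick a pair such as r1r2, so the even-register rule does not have to
be checked separately after allocation.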

-Tom

陳韋任 via llvm-dev

2017-Jun-16 19:03 UTC


>
> One thing you can do is define a register class that is made up of register
> tuples e.g. r0r1, r2r3, etc., and use that register class for the 64-bit
> load/store instructions.  This will allow you to do the load/store
> merging before register allocation without the register constraints.

Our backend supports only load/store for the i64 type, hence i64 is not
legal for us. I guess Peter's Epiphany arch is in a similar situation.

IIRC, LLVM expands an i64 load into two i32 loads. Right now, we have to
custom-lower the i64 load to a v2i32 load and then map v2i32 to the tuple
register (similar to the Sparc backend). How can we use the tuple register
for those two i32 loads? Is there an existing example?

Regards,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Homepage: https://people.cs.nctu.edu.tw/~chenwj