thr3ads.net - llvm dev - [llvm-dev] Wide load/store optimization question [Jun 2017]

Matthias Braun via llvm-dev

2017-Jun-17 00:05 UTC

[llvm-dev] Wide load/store optimization question

> On Jun 16, 2017, at 2:43 PM, 陳韋任 via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> 2017-06-17 4:36 GMT+08:00 upcfrost <upcfrost at gmail.com>:
> Hi,
>
> Same here, my backend only has 64-bit load/store. But I still use 64-bit
> virtual registers and expand/declare the missing instructions myself.
>
> I'll try looking into the Sparc backend, thanks. Also, only after writing
> this post did I find a bunch of built-in transforms. I'm still trying to
> understand how to use them.
>
> By the way, constraint-wise (alignment), is there any difference between
> a virtual register class and a register tuple?
That question makes no sense.
- Every virtual register has a register class assigned.
- You can construct special register classes that represent register tuples, so
that when the allocator chooses an entry from that register class it has really
chosen a tuple of machine registers (even though it looks like a single
register with funny aliasing as far as LLVM codegen is concerned).
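
As a rough illustration of the tuple idea, here is a toy model in standalone C++. It does not use LLVM's data structures or TableGen; `RegTuple`, `makePairClass`, and `aliases` are hypothetical names, and the even-aligned pairing (r0/r1, r2/r3, ...) is an assumption rather than something stated in this thread. Each entry of the "class" is a single allocatable unit that covers two machine registers.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One entry of a hypothetical pair register class: a single allocatable
// unit that covers two machine registers.
struct RegTuple {
    std::string name;
    int lo;  // first machine register in the tuple
    int hi;  // second machine register in the tuple
};

// Build a toy pair class over machine registers r0..r(numRegs-1).
// Assumption: only even-aligned pairs (r0,r1), (r2,r3), ... are legal.
std::vector<RegTuple> makePairClass(int numRegs) {
    assert(numRegs >= 0);
    std::vector<RegTuple> cls;
    for (int r = 0; r + 1 < numRegs; r += 2) {
        cls.push_back({"r" + std::to_string(r) + "_r" + std::to_string(r + 1),
                       r, r + 1});
    }
    return cls;
}

// A tuple aliases a plain machine register when it contains that register.
bool aliases(const RegTuple& t, int reg) {
    return reg == t.lo || reg == t.hi;
}
```

Picking the entry `r2_r3` from such a class reserves both r2 and r3 at once, which is the "single register with funny aliasing" view described above.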

- Matthias

陳韋任 via llvm-dev

2017-Jun-17 07:31 UTC

>
> That question makes no sense.
> - Every virtual register has a register class assigned.
> - You can construct special register classes that represent register
> tuples so that when the allocator chooses an entry from that register class
> it really has chosen a tuple of machine registers (even though it looks
> like a single register with funny aliasing as far as llvm codegen is
> concerned).
>
And we still have to lower load i64 to load v2i32, right?

-- 
Wei-Ren Chen (陳韋任)
Homepage: https://people.cs.nctu.edu.tw/~chenwj

Peter Bel via llvm-dev

2017-Jun-28 09:43 UTC

Hi,

I've looked through both the AMDGPU and Sparc backends, and it seems they also
do not do what I'm after. The only backend that does it
is AArch64, but it doesn't have register constraints.
So, here is an example. I have the following C code:

void test()
{
  int a = 1; int b = 2; int c = 3; int d = 4;
  a++; b++; c++; d++;
}

Without any frontend optimization, it compiles to the following IR.

define void @test(i32* %z) #0 {
  %1 = alloca i32*, align 4
  %a = alloca i32, align 4
  %b = alloca i32, align 4
  %c = alloca i32, align 4
  %d = alloca i32, align 4
  store i32* %z, i32** %1, align 4
  store i32 1, i32* %a, align 4
  store i32 2, i32* %b, align 4
  store i32 3, i32* %c, align 4
  store i32 4, i32* %d, align 4
  %2 = load i32, i32* %a, align 4
  %3 = add nsw i32 %2, 1
  store i32 %3, i32* %a, align 4
  %4 = load i32, i32* %b, align 4
  %5 = add nsw i32 %4, 1
  store i32 %5, i32* %b, align 4
  .....
}

This produces the following asm code.

        mov     r2, #1
        str     r2, [fp, #-2]
        mov     r3, #2
        mov     r2, #3
        str     r3, [fp, #-3]
        str     r2, [fp, #-4]
        mov     r3, #4
        ldr     r2, [fp, #-2]
        str     r3, [fp, #-5]
        .....

What I want to do is merge neighboring stores and loads. For example,
        mov     r3, #2
        mov     r2, #3
        str     r3, [fp, #-5]
        str     r2, [fp, #-4]
could be converted to
        mov     r3, #2
        mov     r2, #3
        strd    r2, [fp, #-4]
But the main problem is that in the actual output above, r3 is stored at
offset -3, not -5, so the two slots are not laid out the way the paired
store needs.
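
The pairing check described here can be sketched as a toy model in standalone C++ (no LLVM APIs). `Store`, `PairStore`, and `tryMerge` are made-up names. The rule that `strd rN` writes rN at the given offset and rN+1 at `offset + secondSlotDelta` is an assumption read off the example above (with a delta of -1), and the even-register requirement is likewise assumed.

```cpp
#include <cassert>

struct Store {
    int reg;     // source register number, e.g. 2 for r2
    int offset;  // frame-pointer-relative slot, e.g. -4 for [fp, #-4]
};

struct PairStore {
    int firstReg;  // lower register of the pair
    int offset;    // slot written by the lower register
};

// Try to merge two single stores into one paired store.
// Assumptions (hypothetical target rules, not taken from LLVM):
//  - the pair must be (rN, rN+1) with N even;
//  - rN lands at `offset`, rN+1 at `offset + secondSlotDelta`.
bool tryMerge(const Store& a, const Store& b, int secondSlotDelta,
              PairStore& out) {
    assert(secondSlotDelta != 0);  // the two halves need distinct slots
    const Store& lo = (a.reg < b.reg) ? a : b;
    const Store& hi = (a.reg < b.reg) ? b : a;
    if (hi.reg != lo.reg + 1) return false;  // registers not consecutive
    if (lo.reg % 2 != 0) return false;       // pair not even-aligned
    if (hi.offset != lo.offset + secondSlotDelta) return false;  // slots not adjacent in the required order
    out.firstReg = lo.reg;
    out.offset = lo.offset;
    return true;
}
```

Under these assumptions, stores of r3 to -5 and r2 to -4 merge into a pair based at r2 and -4, while the layout the compiler actually produced (r3 at -3, r2 at -4) is rejected by the offset check.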

Currently, I'm doing the following. Before register allocation (pre-RA), I
create a REG_SEQUENCE of the target class, assign the vregs in question as its
subregs, and create a load/store instruction for the sequence with the memory
references merged.
This solves the register constraint problem, but the frame allocation problem
still exists. I'll probably need to use fixed stack objects and manually
pre-allocate the frame, which I really don't want to do, as it could break
some other passes.
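
To make the frame allocation problem concrete, here is a toy model in standalone C++. It is not LLVM's frame lowering; `layoutFrame` and `canPair` are hypothetical names, and the assumptions are that every stack object is one slot wide, that slots are handed out downward in layout order, and that the second half of a pair must sit one slot below the first (as in the example above).

```cpp
#include <cassert>
#include <vector>

// Assign one slot per stack object, in the given layout order, growing
// downward from `firstOffset`. Returns offsetOf[object index].
// Assumption: `order` is a permutation of 0..order.size()-1.
std::vector<int> layoutFrame(const std::vector<int>& order, int firstOffset) {
    std::vector<int> offsetOf(order.size());
    int next = firstOffset;
    for (int obj : order) {
        assert(obj >= 0 && obj < static_cast<int>(order.size()));
        offsetOf[obj] = next--;
    }
    return offsetOf;
}

// A paired store of (first, second) is only possible if `second` was
// placed exactly one slot below `first`.
bool canPair(const std::vector<int>& offsetOf, int first, int second) {
    return offsetOf[second] == offsetOf[first] - 1;
}
```

With objects a, b, c, d numbered 0 to 3 and the default order starting at -2, b lands at -3 and c at -4, so the (c, b) pair is rejected. Laying out c before b makes the same pair legal, which is why the layout order matters to a merge decision made before the frame is finalized.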

Petr


