thr3ads.net - llvm dev - [LLVMdev] Possible missed optimization? [Sep 2010]

If this information is useful, please help other people find it:
Share via:

Borja Ferrer

2010-Sep-04 16:54 UTC

[LLVMdev] Possible missed optimization?

Hello, while testing trivial functions in my backend i noticed a suboptimal
way of assigning regs that had the following pattern, consider the following
function:

typedef unsigned short t;
t foo(t a, t b)
{
    t a4 = b^a^18;
    return a4;
}

Argument "a" is passed in R15:R14 and argument "b" is passed
in R13:R12, the
return value is stored in R15:R14.

Producing the following asm code:
    xor    r15, r13  ; xor top part
    mov    r8, r14
    xor    r8, r12   ; xor bottom part
    movi    r14, 18
    xor    r14, r8  ; xor bottom part with imm value

However it could have produced:
    xor     r15, r13
    xor     r14, r12
    movi   r8, 18
    xor     r14, r8

This saves the first move instruction by using any scratch register when
moving the imm value instead of using r14 which is used as an incoming reg
and as the return value. Is this a missed optimization from LLVM or did i
miss something out?
Changing the register allocation order didnt work.

Thanks
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20100904/a6db132b/attachment.html>

Jakob Stoklund Olesen

2010-Sep-04 17:52 UTC

head link

[LLVMdev] Possible missed optimization?

On Sep 4, 2010, at 9:54 AM, Borja Ferrer <borja.ferav at gmail.com> wrote:
> Producing the following asm code:
>     xor    r15, r13  ; xor top part
>     mov    r8, r14
>     xor    r8, r12   ; xor bottom part
>     movi    r14, 18
>     xor    r14, r8  ; xor bottom part with imm value
> 
> However it could have produced:
>     xor     r15, r13
>     xor     r14, r12
>     movi   r8, 18
>     xor     r14, r8
Only by commuting the xor operands. Have you marked xor as commutable? See the
other targets for how to do that.

Borja Ferrer

2010-Sep-04 18:00 UTC

head link

[LLVMdev] Possible missed optimization?

Indeed, i've marked it as commutable:

let isCommutable = 1,
isTwoAddress = 1 in
def XORRdRr : FRdRr<0b0010,
                    0b01,
                    (outs GPR8:$dst),
                    (ins GPR8:$src1, GPR8:$src2),
                    "xor\t$dst, $src2",
                    [(set GPR8:$dst, (xor GPR8:$src1, GPR8:$src2))]>;
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20100904/60166a0b/attachment.html>

Borja Ferrer

2010-Sep-04 18:21 UTC

head link

[LLVMdev] Possible missed optimization?

I've noticed this pattern happening with other operators aswell, but used
xor in this example. As i said before, i tried with different register
allocation orders, but it will produce always the same result. GCC is
emitting longer code, but since LLVM is so nearer to the optimal code
sequence i wanted to reach it.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20100904/ab1c47bd/attachment.html>

Anton Korobeynikov

2010-Sep-04 18:25 UTC

head link

[LLVMdev] Possible missed optimization?

Hello
> and as the return value. Is this a missed optimization from LLVM or did i
> miss something out?
> Changing the register allocation order didnt work.What are the patterns for xor / mov ?

-- 
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University

Borja Ferrer

2010-Sep-04 18:30 UTC

head link

[LLVMdev] Possible missed optimization?

The pattern for the copy integer to reg is:

let isReMaterializable = 1,
isAsCheapAsAMove = 1 in
def MOVIRdK : FRdK<0b1110,
                  (outs GPR8:$dst),
                  (ins i8imm:$src),
                  "movi\t$dst, $src",
                  [(set GPR8:$dst, imm:$src)]>;

The xor pattern is :

let isCommutable = 1,
isTwoAddress = 1 in
def XORRdRr : FRdRr<0b0010,
                    0b01,
                    (outs GPR8:$dst),
                    (ins GPR8:$src1, GPR8:$src2),
                    "xor\t$dst, $src2",
                    [(set GPR8:$dst, (xor GPR8:$src1, GPR8:$src2))]>;
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20100904/965e07e0/attachment.html>

Jeff Kunkel

2010-Sep-04 19:26 UTC

head link

[LLVMdev] Possible missed optimization?

Could it be that an optimization was actually missed?
It seems that there is some sort of look-ahead optimization is
happening because the constant value was moved into r14, when the
optimal choice was not to move r14 and keep the variable into r14. All
these R14 with different values is confusing. My mental SSA form looks
like:

t1 = b_high ^ a_low
t2 = b_low ^ a_low
t3 = 18
t4 = t2 ^ t3
5: return t1, t4

Now, when the registers are allocated, the t4 attempts to use a
register already used by either t2 or t3. However a constant value is
in t3. The allocator could attempt to re-use t3 instead of t2 because
it doesn't know about t3 being a constant. Thus it inserts a move on
t2 and re-uses t3. At the allocator standpoint consider:

1: t1 = b_high ^ a_high ; b_high and a_high are dead and t1 may be any
b_high or b_low.
2: t2 = b_low ^ a_low   ; b_low and a_low are also dead. t2 may be either.
3: t3 = 18                    ; 18 may be any register
4: t4 = t2 ^ t3               ; t4 may reuse t2 or t3.
5: return t1, t4

The optimal value at (1) is of course R14 or a_high's old value. This
is a simple choice. The other choice of t4 is a bit harder because it
may be t2 or t3. At first glance a large enumerative tree search would
be exponential. However, I don't know much of the theory behind
reusing registers like this.

Thanks,
Jeff Kunkel

On Sat, Sep 4, 2010 at 12:54 PM, Borja Ferrer <borja.ferav at gmail.com>
wrote:>
> Hello, while testing trivial functions in my backend i noticed a suboptimal
way of assigning regs that had the following pattern, consider the following
function:
>
> typedef unsigned short t;
> t foo(t a, t b)
> {
>     t a4 = b^a^18;
>     return a4;
> }
>
> Argument "a" is passed in R15:R14 and argument "b" is
passed in R13:R12, the return value is stored in R15:R14.
>
> Producing the following asm code:
>     xor    r15, r13  ; xor top part
>     mov    r8, r14
>     xor    r8, r12   ; xor bottom part
>     movi    r14, 18
>     xor    r14, r8  ; xor bottom part with imm value
>
> However it could have produced:
>     xor     r15, r13
>     xor     r14, r12
>     movi   r8, 18
>     xor     r14, r8
>
> This saves the first move instruction by using any scratch register when
moving the imm value instead of using r14 which is used as an incoming reg and
as the return value. Is this a missed optimization from LLVM or did i miss
something out?
> Changing the register allocation order didnt work.
>
> Thanks
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

Maybe Matching Threads

Search for more apparently analagous threads

llvm dev - Sep 2010 - [LLVMdev] Possible missed optimization?

[LLVMdev] Possible missed optimization?

[LLVMdev] Possible missed optimization?

[LLVMdev] Possible missed optimization?

[LLVMdev] Possible missed optimization?

[LLVMdev] Possible missed optimization?

[LLVMdev] Possible missed optimization?

[LLVMdev] Possible missed optimization?

Maybe Matching Threads