thr3ads.net - llvm dev - [LLVMdev] LICM promoting memory to scalar [Sep 2014]

If this information is useful, please help other people find it:
Share via:

Balaram Makam

2014-Sep-03 16:59 UTC

[LLVMdev] LICM promoting memory to scalar

Thanks for the background on the concurrent memory model.

So, is it sufficient that the loop entry is guarded by condition (cbz at
top) for preventing the race?
The loop entry will be guarded by condition if loop has been rotated by loop
rotate pass. 

Since LICM runs after loop rotate, we can use
ScalarEvolution::isLoopEntryGuardedByCond to check if we can speculatively
execute load without causing a race.
Is it heavy-handed solution for this problem? I can create a bug if you
would like to.

Thanks,
Balaram

From: Robin Morisset [mailto:morisset at google.com] 
Sent: Tuesday, September 02, 2014 7:40 PM
To: Filip Pizlo
Cc: Balaram Makam; LLVM Developers Mailing List
Subject: Re: [LLVMdev] LICM promoting memory to scalar

Ah right, I had missed the cbz, my bad. It is indeed sound under that
condition.
My point was that just hoisting/sinking unconditionally the memory accesses
is unsound. It is good news that gcc learnt how to do it carefully :)

On Tue, Sep 2, 2014 at 4:36 PM, Filip Pizlo <fpizlo at apple.com> wrote:
I think gcc is right. 

It inserted a branch for n == 0 (the cbz at the top), so that's not a
problem. 

In all other regards, this is safe: if you examine the sequence of loads and
stores, it eliminated all but the first load and all but the last store.
How's that unsafe?

If I had to guess, the bug here is that LLVM doesn't want to hoist the load
over the condition (which it is right to want to avoid) but it fails to
realize that the condition is true for the first iteration. 

-Filip

On Sep 2, 2014, at 4:23 PM, Robin Morisset <morisset at google.com> wrote:
The problem here is that doing this can introduce a race which is undefined
at the IR level.
In the example you gave above I suspect that this is a bug in GCC. I may
have missed something in the assembly, but it appears that gcc loads a copy
of globalvar in w3, does stuff with w3 and then stores w3 in globalvar. This
is unsound in the case where the loop is run with n = 0.
For an example, if you have a thread run foo(0,0) in a loop (so not doing
anything) and another thread doing:
for (int i = 0 ; i <1000000 ; ++i) {
    globalvar = i;
    assert(globalvar == i);
}
the code should never assert. But if you host the load/store out of the loop
(as GCC appears to do), then occasionaly there will be something like
thread 2: globalvar = i (= 42)
thread 1: w3 = globalvar (= 42)
thread 2: ++i
thread 2: globalvar = i (= 43)
thread 1: globalvar =w3 (= 42)
thread 2: assert(globalvar == i) 
And it will assert (42 == 43) and crash.

Shorter answer:
- speculatively executing stores is unsound because the value may have been
modified behind your back by another thread.
- speculatively executing loads might not be observable in some specific
case but may always introduce races, thus introducing undefined behaviour
and breaking assumptions that other passes may rely on.

Best regards,
Robin

On Tue, Sep 2, 2014 at 3:46 PM, Balaram Makam <bmakam at codeaurora.org>
wrote:
All,

If we can speculatively execute a load instruction, why isn’t it safe to
hoist it out by promoting it to a scalar in LICM pass?


There is a comment in LICM pass that if a load/store is conditional then it
is not safe because it would break the LLVM concurrency model (See commit
73bfa4a).
It has an IR test for checking this in
test/Transforms/LICM/scalar-promote-memmodel.ll

However, I have a sample code where GCC is able to promote the memory to
scalar and hoist/sink load/store out of loop but LLVM cannot.
Is GCC being aggressive here or LLVM missing out an opportunity?

Here is my sample code:

extern int globalvar;
void foo(int n , int incr) {
  unsigned int i;
  for (i = 0 ; i < n; i += incr ) {
    if (i < n/2)
      globalvar += incr;
  }
return;
}

GCC output:

$ aarch64-linux-gnu-g++ -S -o -  -O3  -ffast-math -march=armv8-a+simd
test.cpp
        .arch armv8-a+fp+simd
        .file   "test.cpp"
        .text
        .align  2
        .global _Z3fooii
        .type   _Z3fooii, %function
_Z3fooii:
.LFB0:
        .cfi_startproc
        cbz     w0, .L1
        adrp    x6, globalvar
        add     w5, w0, w0, lsr 31
        ldr     w3, [x6,#:lo12:globalvar]                        <== hoist
load of globalvar
        mov     w2, 0
        asr     w5, w5, 1
.L4:
        cmp     w5, w2
        add     w2, w2, w1
        add     w4, w3, w1
        csel    w3, w4, w3, hi
        cmp     w2, w0
        bcc     .L4
        str     w3, [x6,#:lo12:globalvar]                       <== sink
store of globalvar
.L1:
        ret
        .cfi_endproc
.LFE0:
        .size   _Z3fooii, .-_Z3fooii
        .ident  "GCC: (crosstool-NG linaro-1.13.1-4.8-2014.01 - Linaro GCC
2013.11) 4.9.0"

LLVM output:

$ clang-aarch64-x++ -S -o - -O3 -ffast-math -fslp-vectorize test.cpp
        .text
        .file   "test.cpp"
        .globl  _Z3fooii
        .align  2
        .type   _Z3fooii, at function
_Z3fooii:                               // @_Z3fooii
// BB#0:                                // %entry
        cbz     w0, .LBB0_5
// BB#1:                                // %for.body.lr.ph
        mov      w8, wzr
        cmp      w0, #0                 // =0
        cinc     w9, w0, lt
        asr     w9, w9, #1
        adrp    x10, globalvar
.LBB0_2:                                // %for.body
                                        // =>This Inner Loop Header: Depth=1
        cmp      w8, w9
        b.hs    .LBB0_4
// BB#3:                                // %if.then
                                        //   in Loop: Header=BB0_2 Depth=1
        ldr     w11, [x10, :lo12:globalvar]                     <===== load
inside loop
        add      w11, w11, w1
        str     w11, [x10, :lo12:globalvar]                      <==== store
inside loop
.LBB0_4:                                // %for.inc
                                        //   in Loop: Header=BB0_2 Depth=1
        add      w8, w8, w1
        cmp      w8, w0
        b.lo    .LBB0_2
.LBB0_5:                                // %for.end
        ret
.Ltmp1:
        .size   _Z3fooii, .Ltmp1-_Z3fooii


        .ident  "clang version 3.6.0 "

Thanks,
Balaram


_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

-------------- next part --------------
A non-text attachment was scrubbed...
Name: winmail.dat
Type: application/ms-tnef
Size: 5374 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140903/319bb074/attachment.bin>

Robin Morisset

2014-Sep-03 20:19 UTC

head link

[LLVMdev] LICM promoting memory to scalar

I do not know the loop infrastructure in LLVM well, so I cannot really give
advice about how to proceed.
It would be fine in this case to do the optimization as long as the loop is
entered, but the real condition is "would there be at least one load/store
to this variable ?".
For example, if instead of "i < n/2" there was "i % 10 ==
5" in your code,
the optimization would have to be guarded by a check for n >= 6.

Robin


On Wed, Sep 3, 2014 at 9:59 AM, Balaram Makam <bmakam at codeaurora.org>
wrote:
> Thanks for the background on the concurrent memory model.
>
> So, is it sufficient that the loop entry is guarded by condition (cbz at
> top) for preventing the race?
> The loop entry will be guarded by condition if loop has been rotated by
> loop
> rotate pass.
>
> Since LICM runs after loop rotate, we can use
> ScalarEvolution::isLoopEntryGuardedByCond to check if we can speculatively
> execute load without causing a race.
> Is it heavy-handed solution for this problem? I can create a bug if you
> would like to.
>
> Thanks,
> Balaram
>
> From: Robin Morisset [mailto:morisset at google.com]
> Sent: Tuesday, September 02, 2014 7:40 PM
> To: Filip Pizlo
> Cc: Balaram Makam; LLVM Developers Mailing List
> Subject: Re: [LLVMdev] LICM promoting memory to scalar
>
> Ah right, I had missed the cbz, my bad. It is indeed sound under that
> condition.
> My point was that just hoisting/sinking unconditionally the memory accesses
> is unsound. It is good news that gcc learnt how to do it carefully :)
>
> On Tue, Sep 2, 2014 at 4:36 PM, Filip Pizlo <fpizlo at apple.com>
wrote:
> I think gcc is right.
>
> It inserted a branch for n == 0 (the cbz at the top), so that's not a
> problem.
>
> In all other regards, this is safe: if you examine the sequence of loads
> and
> stores, it eliminated all but the first load and all but the last store.
> How's that unsafe?
>
> If I had to guess, the bug here is that LLVM doesn't want to hoist the
load
> over the condition (which it is right to want to avoid) but it fails to
> realize that the condition is true for the first iteration.
>
> -Filip
>
> On Sep 2, 2014, at 4:23 PM, Robin Morisset <morisset at google.com>
wrote:
> The problem here is that doing this can introduce a race which is undefined
> at the IR level.
> In the example you gave above I suspect that this is a bug in GCC. I may
> have missed something in the assembly, but it appears that gcc loads a copy
> of globalvar in w3, does stuff with w3 and then stores w3 in globalvar.
> This
> is unsound in the case where the loop is run with n = 0.
> For an example, if you have a thread run foo(0,0) in a loop (so not doing
> anything) and another thread doing:
> for (int i = 0 ; i <1000000 ; ++i) {
>     globalvar = i;
>     assert(globalvar == i);
> }
> the code should never assert. But if you host the load/store out of the
> loop
> (as GCC appears to do), then occasionaly there will be something like
> thread 2: globalvar = i (= 42)
> thread 1: w3 = globalvar (= 42)
> thread 2: ++i
> thread 2: globalvar = i (= 43)
> thread 1: globalvar =w3 (= 42)
> thread 2: assert(globalvar == i)
> And it will assert (42 == 43) and crash.
>
> Shorter answer:
> - speculatively executing stores is unsound because the value may have been
> modified behind your back by another thread.
> - speculatively executing loads might not be observable in some specific
> case but may always introduce races, thus introducing undefined behaviour
> and breaking assumptions that other passes may rely on.
>
> Best regards,
> Robin
>
> On Tue, Sep 2, 2014 at 3:46 PM, Balaram Makam <bmakam at
codeaurora.org>
> wrote:
> All,
>
> If we can speculatively execute a load instruction, why isn’t it safe to
> hoist it out by promoting it to a scalar in LICM pass?
>
>
> There is a comment in LICM pass that if a load/store is conditional then it
> is not safe because it would break the LLVM concurrency model (See commit
> 73bfa4a).
> It has an IR test for checking this in
> test/Transforms/LICM/scalar-promote-memmodel.ll
>
> However, I have a sample code where GCC is able to promote the memory to
> scalar and hoist/sink load/store out of loop but LLVM cannot.
> Is GCC being aggressive here or LLVM missing out an opportunity?
>
> Here is my sample code:
>
> extern int globalvar;
> void foo(int n , int incr) {
>   unsigned int i;
>   for (i = 0 ; i < n; i += incr ) {
>     if (i < n/2)
>       globalvar += incr;
>   }
> return;
> }
>
> GCC output:
>
> $ aarch64-linux-gnu-g++ -S -o -  -O3  -ffast-math -march=armv8-a+simd
> test.cpp
>         .arch armv8-a+fp+simd
>         .file   "test.cpp"
>         .text
>         .align  2
>         .global _Z3fooii
>         .type   _Z3fooii, %function
> _Z3fooii:
> .LFB0:
>         .cfi_startproc
>         cbz     w0, .L1
>         adrp    x6, globalvar
>         add     w5, w0, w0, lsr 31
>         ldr     w3, [x6,#:lo12:globalvar]                        <==
hoist
> load of globalvar
>         mov     w2, 0
>         asr     w5, w5, 1
> .L4:
>         cmp     w5, w2
>         add     w2, w2, w1
>         add     w4, w3, w1
>         csel    w3, w4, w3, hi
>         cmp     w2, w0
>         bcc     .L4
>         str     w3, [x6,#:lo12:globalvar]                       <== sink
> store of globalvar
> .L1:
>         ret
>         .cfi_endproc
> .LFE0:
>         .size   _Z3fooii, .-_Z3fooii
>         .ident  "GCC: (crosstool-NG linaro-1.13.1-4.8-2014.01 - Linaro
GCC
> 2013.11) 4.9.0"
>
> LLVM output:
>
> $ clang-aarch64-x++ -S -o - -O3 -ffast-math -fslp-vectorize test.cpp
>         .text
>         .file   "test.cpp"
>         .globl  _Z3fooii
>         .align  2
>         .type   _Z3fooii, at function
> _Z3fooii:                               // @_Z3fooii
> // BB#0:                                // %entry
>         cbz     w0, .LBB0_5
> // BB#1:                                // %for.body.lr.ph
>         mov      w8, wzr
>         cmp      w0, #0                 // =0
>         cinc     w9, w0, lt
>         asr     w9, w9, #1
>         adrp    x10, globalvar
> .LBB0_2:                                // %for.body
>                                         // =>This Inner Loop Header:
> Depth=1
>         cmp      w8, w9
>         b.hs    .LBB0_4
> // BB#3:                                // %if.then
>                                         //   in Loop: Header=BB0_2 Depth=1
>         ldr     w11, [x10, :lo12:globalvar]                     <=====
load
> inside loop
>         add      w11, w11, w1
>         str     w11, [x10, :lo12:globalvar]                     
<===> store
> inside loop
> .LBB0_4:                                // %for.inc
>                                         //   in Loop: Header=BB0_2 Depth=1
>         add      w8, w8, w1
>         cmp      w8, w0
>         b.lo    .LBB0_2
> .LBB0_5:                                // %for.end
>         ret
> .Ltmp1:
>         .size   _Z3fooii, .Ltmp1-_Z3fooii
>
>
>         .ident  "clang version 3.6.0 "
>
> Thanks,
> Balaram
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140903/bf6c159e/attachment.html>

Hal Finkel

2014-Sep-28 08:42 UTC

head link

[LLVMdev] LICM promoting memory to scalar

----- Original Message -----> From: "Balaram Makam" <bmakam at codeaurora.org>
> To: "Robin Morisset" <morisset at google.com>, "Filip
Pizlo" <fpizlo at apple.com>
> Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Wednesday, September 3, 2014 11:59:26 AM
> Subject: Re: [LLVMdev] LICM promoting memory to scalar
> 
> Thanks for the background on the concurrent memory model.
> 
> So, is it sufficient that the loop entry is guarded by condition (cbz
> at top) for preventing the race?
> The loop entry will be guarded by condition if loop has been rotated
> by loop rotate pass.
> 
> Since LICM runs after loop rotate, we can use
> ScalarEvolution::isLoopEntryGuardedByCond to check if we can
> speculatively execute load without causing a race.
> Is it heavy-handed solution for this problem? I can create a bug if
> you would like to.
If you did not file a bug report, please do.

 -Hal
> 
> Thanks,
> Balaram
> 
> From: Robin Morisset [mailto:morisset at google.com]
> Sent: Tuesday, September 02, 2014 7:40 PM
> To: Filip Pizlo
> Cc: Balaram Makam; LLVM Developers Mailing List
> Subject: Re: [LLVMdev] LICM promoting memory to scalar
> 
> Ah right, I had missed the cbz, my bad. It is indeed sound under that
> condition.
> My point was that just hoisting/sinking unconditionally the memory
> accesses is unsound. It is good news that gcc learnt how to do it
> carefully :)
> 
> On Tue, Sep 2, 2014 at 4:36 PM, Filip Pizlo <fpizlo at apple.com>
wrote:
> I think gcc is right.
> 
> It inserted a branch for n == 0 (the cbz at the top), so that's not a
> problem.
> 
> In all other regards, this is safe: if you examine the sequence of
> loads and stores, it eliminated all but the first load and all but
> the last store. How's that unsafe?
> 
> If I had to guess, the bug here is that LLVM doesn't want to hoist
> the load over the condition (which it is right to want to avoid) but
> it fails to realize that the condition is true for the first
> iteration.
> 
> -Filip
> 
> On Sep 2, 2014, at 4:23 PM, Robin Morisset <morisset at google.com>
> wrote:
> The problem here is that doing this can introduce a race which is
> undefined at the IR level.
> In the example you gave above I suspect that this is a bug in GCC. I
> may have missed something in the assembly, but it appears that gcc
> loads a copy of globalvar in w3, does stuff with w3 and then stores
> w3 in globalvar. This is unsound in the case where the loop is run
> with n = 0.
> For an example, if you have a thread run foo(0,0) in a loop (so not
> doing anything) and another thread doing:
> for (int i = 0 ; i <1000000 ; ++i) {
>     globalvar = i;
>     assert(globalvar == i);
> }
> the code should never assert. But if you host the load/store out of
> the loop (as GCC appears to do), then occasionaly there will be
> something like
> thread 2: globalvar = i (= 42)
> thread 1: w3 = globalvar (= 42)
> thread 2: ++i
> thread 2: globalvar = i (= 43)
> thread 1: globalvar =w3 (= 42)
> thread 2: assert(globalvar == i)
> And it will assert (42 == 43) and crash.
> 
> Shorter answer:
> - speculatively executing stores is unsound because the value may
> have been modified behind your back by another thread.
> - speculatively executing loads might not be observable in some
> specific case but may always introduce races, thus introducing
> undefined behaviour and breaking assumptions that other passes may
> rely on.
> 
> Best regards,
> Robin
> 
> On Tue, Sep 2, 2014 at 3:46 PM, Balaram Makam <bmakam at
codeaurora.org>
> wrote:
> All,
> 
> If we can speculatively execute a load instruction, why isn’t it safe
> to hoist it out by promoting it to a scalar in LICM pass?
> 
> 
> There is a comment in LICM pass that if a load/store is conditional
> then it is not safe because it would break the LLVM concurrency
> model (See commit 73bfa4a).
> It has an IR test for checking this in
> test/Transforms/LICM/scalar-promote-memmodel.ll
> 
> However, I have a sample code where GCC is able to promote the memory
> to scalar and hoist/sink load/store out of loop but LLVM cannot.
> Is GCC being aggressive here or LLVM missing out an opportunity?
> 
> Here is my sample code:
> 
> extern int globalvar;
> void foo(int n , int incr) {
>   unsigned int i;
>   for (i = 0 ; i < n; i += incr ) {
>     if (i < n/2)
>       globalvar += incr;
>   }
> return;
> }
> 
> GCC output:
> 
> $ aarch64-linux-gnu-g++ -S -o -  -O3  -ffast-math -march=armv8-a+simd
> test.cpp
>         .arch armv8-a+fp+simd
>         .file   "test.cpp"
>         .text
>         .align  2
>         .global _Z3fooii
>         .type   _Z3fooii, %function
> _Z3fooii:
> .LFB0:
>         .cfi_startproc
>         cbz     w0, .L1
>         adrp    x6, globalvar
>         add     w5, w0, w0, lsr 31
>         ldr     w3, [x6,#:lo12:globalvar]                        <=> 
hoist load of globalvar
>         mov     w2, 0
>         asr     w5, w5, 1
> .L4:
>         cmp     w5, w2
>         add     w2, w2, w1
>         add     w4, w3, w1
>         csel    w3, w4, w3, hi
>         cmp     w2, w0
>         bcc     .L4
>         str     w3, [x6,#:lo12:globalvar]                       <=>  
sink store of globalvar
> .L1:
>         ret
>         .cfi_endproc
> .LFE0:
>         .size   _Z3fooii, .-_Z3fooii
>         .ident  "GCC: (crosstool-NG linaro-1.13.1-4.8-2014.01 -
>         Linaro GCC 2013.11) 4.9.0"
> 
> LLVM output:
> 
> $ clang-aarch64-x++ -S -o - -O3 -ffast-math -fslp-vectorize test.cpp
>         .text
>         .file   "test.cpp"
>         .globl  _Z3fooii
>         .align  2
>         .type   _Z3fooii, at function
> _Z3fooii:                               // @_Z3fooii
> // BB#0:                                // %entry
>         cbz     w0, .LBB0_5
> // BB#1:                                // %for.body.lr.ph
>         mov      w8, wzr
>         cmp      w0, #0                 // =0
>         cinc     w9, w0, lt
>         asr     w9, w9, #1
>         adrp    x10, globalvar
> .LBB0_2:                                // %for.body
>                                         // =>This Inner Loop Header:
>                                         Depth=1
>         cmp      w8, w9
>         b.hs    .LBB0_4
> // BB#3:                                // %if.then
>                                         //   in Loop: Header=BB0_2
>                                         Depth=1
>         ldr     w11, [x10, :lo12:globalvar]
>                             <===== load inside loop
>         add      w11, w11, w1
>         str     w11, [x10, :lo12:globalvar]
>                              <==== store inside loop
> .LBB0_4:                                // %for.inc
>                                         //   in Loop: Header=BB0_2
>                                         Depth=1
>         add      w8, w8, w1
>         cmp      w8, w0
>         b.lo    .LBB0_2
> .LBB0_5:                                // %for.end
>         ret
> .Ltmp1:
>         .size   _Z3fooii, .Ltmp1-_Z3fooii
> 
> 
>         .ident  "clang version 3.6.0 "
> 
> Thanks,
> Balaram
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Balaram Makam

2014-Oct-09 19:29 UTC

head link

[LLVMdev] LICM promoting memory to scalar

-----Original Message-----> From: Hal Finkel [mailto:hfinkel at anl.gov] 
> Sent: Sunday, September 28, 2014 4:43 AM
> To: Balaram Makam
> Cc: LLVM Developers Mailing List; Robin Morisset; Filip Pizlo
> Subject: Re: [LLVMdev] LICM promoting memory to scalar
----- Original Message -----> From: "Balaram Makam" <bmakam at codeaurora.org>
> To: "Robin Morisset" <morisset at google.com>, "Filip
Pizlo"
> <fpizlo at apple.com>
> Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Wednesday, September 3, 2014 11:59:26 AM
> Subject: Re: [LLVMdev] LICM promoting memory to scalar
> 
> Thanks for the background on the concurrent memory model.
> 
> So, is it sufficient that the loop entry is guarded by condition (cbz 
> at top) for preventing the race?
> The loop entry will be guarded by condition if loop has been rotated 
> by loop rotate pass.
> 
> Since LICM runs after loop rotate, we can use 
> ScalarEvolution::isLoopEntryGuardedByCond to check if we can 
> speculatively execute load without causing a race.
> Is it heavy-handed solution for this problem? I can create a bug if 
> you would like to.
> If you did not file a bug report, please do.
 > -Hal

Sorry for the late response. 
Bug reported here:
http://llvm.org/bugs/show_bug.cgi?id=21229

Thanks,
Balaram

llvm dev - Sep 2014 - [LLVMdev] LICM promoting memory to scalar

[LLVMdev] LICM promoting memory to scalar

[LLVMdev] LICM promoting memory to scalar

[LLVMdev] LICM promoting memory to scalar

[LLVMdev] LICM promoting memory to scalar