thr3ads.net - llvm dev - [llvm-dev] Comparing Clang and GCC: only clang stores updated value in each iteration. [Sep 2018]

Jonas Paulsson via llvm-dev

2018-Sep-21 07:15 UTC

[llvm-dev] Comparing Clang and GCC: only clang stores updated value in each iteration.

Hi Philip and Eli,

> I think your example may be a bit over reduced.  Unless I'm misreading 
> this, a starts at 1, is incremented one each iteration, and then is 
> tested against zero.  The only way this loop can exit is if a has 
> wrapped around and C++ states that signed integers are assumed to not 
> overflow.  We can/should be replacing the whole loop with an unreachable.
>
> Do we still fail to optimize if either a) you use an unsigned which 
> has defined overflow or b) you use a non-zero exit test? That is, 
> change the example to something like:
> int a = 1;
> void b() {
>   do
>     if (a)
>       a++;
>   while (a != 500);
> }
Yes, in both cases: whether I change 'a' to unsigned or replace the exit
test with 500, clang stores in each iteration while gcc does not.

> (Eli) Your testcase is a bit weird because the condition of the while
> loop is the same as the condition of the if statement.  Is that really
> what the original loop looks like?

No, not really, the reduced one just shows the difference between gcc 
and clang. There were some variations to this, but I chose this since it 
gave a very small output. Sorry if it was confusing.
>
> If so, then yes, this is probably a case where the aggressive LoopPRE 
> mentioned in the other thread that Eli linked to would be useful.  
> Once we'd done the PRE, then everything else should collapse.

Thanks for the link, it's good to know this issue is recognized. If I
understand it correctly, the reason clang is storing in each iteration
is due to concurrency. As a newbie I wonder how this works in practice,
since even if the value is stored in each iteration, two threads could
still do this simultaneously unless some sort of atomic operation is
doing it, right? What happens here is that the value of 'a' is loaded
once before the loop, then incremented and stored in each iteration. How
does that help with multiple threads compared to storing it after the loop?

Is there an option to change this behavior in gcc or clang? It seems 
that gcc is assuming a single thread, while clang is not. It would be 
nice to have the same setting here when comparing them. Or am I missing 
something?

Thanks

Jonas
>
>>
>> bin/clang -O3 -march=z13 -mllvm -unroll-count=1
>>
>>         .text
>>         .file   "testfun.i"
>>         .globl  b                       # -- Begin function b
>>         .p2align        4
>>         .type   b, @function
>> b:                                      # @b
>> # %bb.0:                                # %entry
>>         lrl     %r0, a
>> .LBB0_1:                                # %do.body
>>                                         # =>This Inner Loop Header: Depth=1
>>         cije    %r0, 0, .LBB0_3
>> # %bb.2:                                # %if.then
>>                                         #   in Loop: Header=BB0_1 Depth=1
>>         ahi     %r0, 1
>>         strl    %r0, a
>> .LBB0_3:                                # %do.cond
>>                                         #   in Loop: Header=BB0_1 Depth=1
>>         cijlh   %r0, 0, .LBB0_1
>> # %bb.4:                                # %do.end
>>         br      %r14
>> .Lfunc_end0:
>>         .size   b, .Lfunc_end0-b
>>                                         # -- End function
>>         .type   a, @object               # @a
>>         .data
>>         .globl  a
>>         .p2align        2
>> a:
>>         .long   1                       # 0x1
>>         .size   a, 4
>>
>>
>> gcc -O3 -march=z13:
>>
>>         .file   "testfun.i"
>>         .machinemode zarch
>>         .machine "z13"
>> .text
>>         .align  8
>> .globl b
>>         .type   b, @function
>> b:
>> .LFB0:
>>         .cfi_startproc
>>         larl    %r1,a
>>         lt      %r1,0(%r1)
>>         je      .L1
>>         larl    %r1,a
>>         mvhi    0(%r1),0
>> .L1:
>>         br      %r14
>>         .cfi_endproc
>> .LFE0:
>>         .size   b, .-b
>> .globl a
>> .data
>>         .align  4
>>         .type   a, @object
>>         .size   a, 4
>> a:
>>         .long   1
>>         .ident  "GCC: (GNU) 8.0.1 20180324 (Red Hat 8.0.1-0.20)"
>>         .section        .note.GNU-stack,"",@progbits
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

Friedman, Eli via llvm-dev

2018-Sep-21 18:23 UTC

[llvm-dev] Comparing Clang and GCC: only clang stores updated value in each iteration.

On 9/21/2018 12:15 AM, Jonas Paulsson wrote:
>>
>> If so, then yes, this is probably a case where the aggressive LoopPRE 
>> mentioned in the other thread that Eli linked to would be useful.  
>> Once we'd done the PRE, then everything else should collapse.
> Thanks for the link, it's good to know this issue is recognized. If I 
> understand it correctly, the reason clang is storing in each iteration 
> is due to concurrency.
Yes, basically... IIRC LLVM did the wrong thing a long time ago, but we 
fixed it as part of implementing the C++11 atomics model.
> As a newbie I wonder how this works in practice since even if the 
> value is stored in each iteration two threads could still do this 
> simultaneously if not some sort of atomic operation is doing it, 
> right? What happens here is that the value of 'a' is loaded once 
> before the loop, then incremented and stored in each iteration. How 
> does that help with multiple threads compared to storing it after the 
> loop?
The interesting case is the case where the store is dynamically dead 
(not executed in any iteration of the loop); we have to make sure we 
don't introduce a race in that case.  As you note, if the store is 
executed in any iteration, and there isn't any synchronization inside 
the loop, we can ignore the possibility of a race.
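
A small C sketch of the "dynamically dead store" case may help here. It is written at the source level purely for illustration (a compiler would do this on its own intermediate representation), and every name in it (`g`, `store_g`, `cond`, the three loop variants) is hypothetical rather than taken from the thread. The point it tries to show: hoisting the value into a local and writing it back unconditionally adds a store on a path where the source had none, which is where a new race with another thread could appear; guarding the write-back avoids that.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared global, standing in for 'a' from the thread. */
int g = 1;
/* Counts writes to g so the difference between variants is observable. */
unsigned stores_to_g = 0;

static void store_g(int v) { g = v; stores_to_g++; }

/* Source semantics: g is written only in iterations where cond[i] is set. */
void source_loop(const int *cond, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (cond[i])
            store_g(g + 1);
}

/* Unsafe promotion: keep g in a local and write it back unconditionally.
   If no cond[i] is set, this store did not exist in the source program. */
void unsafe_promoted(const int *cond, size_t n) {
    int tmp = g;
    for (size_t i = 0; i < n; i++)
        if (cond[i])
            tmp++;
    store_g(tmp);
}

/* Safe promotion: write back only if the source would have stored. */
void safe_promoted(const int *cond, size_t n) {
    int tmp = g;
    int dirty = 0;
    for (size_t i = 0; i < n; i++)
        if (cond[i]) {
            tmp++;
            dirty = 1;
        }
    if (dirty)
        store_g(tmp);
}

/* Resets the state, runs one variant, and returns how many stores it made. */
unsigned count_stores(void (*fn)(const int *, size_t),
                      const int *cond, size_t n) {
    g = 1;
    stores_to_g = 0;
    fn(cond, n);
    return stores_to_g;
}

/* Single-threaded check of the store counts for each variant. */
void self_check(void) {
    const int none[3] = {0, 0, 0};
    const int some[3] = {0, 1, 1};
    assert(count_stores(source_loop, none, 3) == 0);
    assert(count_stores(unsafe_promoted, none, 3) == 1); /* introduced store */
    assert(count_stores(safe_promoted, none, 3) == 0);
    assert(count_stores(source_loop, some, 3) == 2);
    assert(count_stores(safe_promoted, some, 3) == 1);
}
```

When at least one iteration stores, both promoted variants leave the same final value in `g` as the source loop; they differ only when the guard is never true.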
> Is there an option to change this behavior in gcc or clang? It seems 
> that gcc is assuming a single thread, while clang is not. It would be 
> nice to have the same setting here when comparing them. Or am I 
> missing something?
There is no option to control it; theoretically, we could add one, I 
guess, but it's a minor optimization in most cases, and most non-trivial 
programs are concurrent anyway.

For your loop, the condition of the if statement is a comparison of a 
constant and an induction variable, so it's possible to prove the store 
is always executed.  I assume gcc is proving that (either directly, or 
by performing some other transform which makes the condition trivial).
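
For illustration, here is how the gcc output quoted earlier in the thread reads to me when written back as C: load and test `a`, return if it is zero, otherwise store 0. This is a sketch under assumptions, not a statement of what gcc does internally. To keep the check cheap and free of signed-overflow questions, it uses a hypothetical 8-bit unsigned stand-in `a8` for the global, so wraparound is well defined and all starting values can be tried.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical narrowed stand-in for the global 'a' from the thread. */
uint8_t a8;

/* Loop shape of the reduced test case, with wrapping unsigned arithmetic. */
void loop_form(void) {
    do {
        if (a8)
            a8++;
    } while (a8);
}

/* Collapsed form: a single guarded store of zero, no loop. */
void collapsed_form(void) {
    if (a8)
        a8 = 0;
}

/* Runs both forms from every possible starting value and compares results. */
void check_forms_agree(void) {
    for (int start = 0; start < 256; start++) {
        a8 = (uint8_t)start;
        loop_form();
        uint8_t from_loop = a8;

        a8 = (uint8_t)start;
        collapsed_form();
        assert(a8 == from_loop);
    }
}
```

Note that the collapsed form still skips the store when the value starts at zero, so it does not write to memory on a path where the loop would not have.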

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project

JF Bastien via llvm-dev

2018-Sep-21 18:27 UTC

[llvm-dev] Comparing Clang and GCC: only clang stores updated value in each iteration.

> On Sep 21, 2018, at 11:23 AM, Friedman, Eli via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> On 9/21/2018 12:15 AM, Jonas Paulsson wrote:
>>> 
>>> If so, then yes, this is probably a case where the aggressive LoopPRE
>>> mentioned in the other thread that Eli linked to would be useful.  Once
>>> we'd done the PRE, then everything else should collapse.
>> Thanks for the link, it's good to know this issue is recognized. If I
>> understand it correctly, the reason clang is storing in each iteration is
>> due to concurrency.
> 
> Yes, basically... IIRC LLVM did the wrong thing a long time ago, but we
> fixed it as part of implementing the C++11 atomics model.
> 
>> As a newbie I wonder how this works in practice since even if the value
>> is stored in each iteration two threads could still do this simultaneously
>> if not some sort of atomic operation is doing it, right? What happens here
>> is that the value of 'a' is loaded once before the loop, then incremented
>> and stored in each iteration. How does that help with multiple threads
>> compared to storing it after the loop?
> 
> The interesting case is the case where the store is dynamically dead (not
> executed in any iteration of the loop); we have to make sure we don't
> introduce a race in that case.  As you note, if the store is executed in any
> iteration, and there isn't any synchronization inside the loop, we can
> ignore the possibility of a race.

See:
https://www.di.ens.fr/~zappa/readings/pldi13.pdf
https://www.di.ens.fr/~zappa/readings/c11comp.pdf

>> Is there an option to change this behavior in gcc or clang? It seems that
>> gcc is assuming a single thread, while clang is not. It would be nice to
>> have the same setting here when comparing them. Or am I missing something?
> 
> There is no option to control it; theoretically, we could add one, I guess,
> but it's a minor optimization in most cases, and most non-trivial programs
> are concurrent anyway.
> 
> For your loop, the condition of the if statement is a comparison of a
> constant and an induction variable, so it's possible to prove the store is
> always executed.  I assume gcc is proving that (either directly, or by
> performing some other transform which makes the condition trivial).
> 
> -Eli
> 
> -- 
> Employee of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
> Foundation Collaborative Project