thr3ads.net - llvm dev - [LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer) [Nov 2013]

If this information is useful, please help other people find it:
Share via:

Kostya Serebryany

2013-Nov-19 17:58 UTC

[LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)

On Tue, Nov 19, 2013 at 9:56 PM, Kuperstein, Michael M <
michael.m.kuperstein at intel.com> wrote:
>  What I’m trying to say is that according to my understanding of the
> C++11 memory model, even in that small reproducer, the store to g and the
> load from g are in fact a data race.
>
> (This is regardless of the fact the load is protected by a branch that is
> not taken.)
>
My understanding of the standard is quite the opposite.

>
>
> *From:* Kostya Serebryany [mailto:kcc at google.com]
> *Sent:* Tuesday, November 19, 2013 19:46
> *To:* Kuperstein, Michael M
> *Cc:* Evgeniy Stepanov; LLVM Developers Mailing List
>
> *Subject:* Re: [LLVMdev] Curiosity about transform changes under
> Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)
>
>
>
>
>
>
>
> On Tue, Nov 19, 2013 at 9:05 PM, Kuperstein, Michael M <
> michael.m.kuperstein at intel.com> wrote:
>
> My $0.02 - I'm not sure the transformation introduces a data race.
>
> To the best of my understanding, the point of the C++11/C11 memory model
> is to allow a wide array of compiler transformations - including
> speculative loads - for non-atomic variables.
> I believe what's most likely happening (without looking at the Mozilla
> source) is that the original program contains a C++ data race, and the
> transformation exposes it to TSan.
>
>
>
> The original program is race-free.
>
> I've posted a minimized reproducer that actually triggers a tsan false
> positive at O1 here:
>
> https://code.google.com/p/thread-sanitizer/issues/detail?id=40#c5
>
>
>
>
> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at
cs.uiuc.edu] On
> Behalf Of Evgeniy Stepanov
> Sent: Tuesday, November 19, 2013 18:55
> To: Kostya Serebryany
> Cc: LLVM Developers Mailing List
> Subject: Re: [LLVMdev] Curiosity about transform changes under Sanitizers
> (Was: [PATCH] Disable branch folding with MemorySanitizer)
>
> The root cause of those issues is the fact that sanitizers verify
> C++-level semantics with LLVM IR level instrumentation. For example,
> speculative loads are OK in IR if it can be proved that the load won't
> trap, but in C++ it would be a data race.
>
>
> On Tue, Nov 19, 2013 at 8:38 PM, Kostya Serebryany <kcc at
google.com> wrote:
> >
> >
> >
> > On Tue, Nov 19, 2013 at 8:25 PM, David Blaikie <dblaikie at
gmail.com>
> wrote:
> >>
> >> Just moving this branch of the thread out of the review because I
> >> don't want to derail the review  thread...
> >>
> >> Kostya - why are these two cases not optimization bugs in general?
> >> (why do they only affect sanitizers?)
> >
> >
> > The recent case from mozilla
> > (https://code.google.com/p/thread-sanitizer/issues/detail?id=40#c2) is
> > a legal optimization -- it hoists a safe load (i.e. a load which is
> > known not to
> > fail) out of conditional branch.
> > It reduces the number of basic blocks and branches, and so I think
> > it's good in general.
> > I can't imagine a case where this optimization will break a valid
> program.
> > Which is the second one you are referring to?
> >
> > --kcc
> >
> >>
> >>
> >>
> >>
> >> On Mon, Nov 18, 2013 at 8:37 PM, Kostya Serebryany <kcc at
google.com>
> wrote:
> >>>
> >>> And we've been just informed by the mozilla folks about
yet another
> >>> case of optimization being hostile to sanitizers:
> >>> hoisting a safe load out of conditional branch introduces a
race
> >>> which tsan happily reports.
> >>>
https://code.google.com/p/thread-sanitizer/issues/detail?id=40#c2
> >>>
> >>> --kcc
> >>>
> >>>
> >>> On Tue, Nov 19, 2013 at 8:06 AM, Kostya Serebryany <kcc at
google.com>
> >>> wrote:
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Nov 19, 2013 at 1:27 AM, David Blaikie
<dblaikie at gmail.com>
> >>>> wrote:
> >>>>>
> >>>>> Do we have precedence for this kind of change (where
sanitizers
> >>>>> affect optimizations in arbitrary internal ways - not
simply by
> >>>>> enabling/disabling certain passes)?
> >>>>
> >>>>
> >>>> Yes. AddressSanitizer and ThreadSanitizer disable load
widening
> >>>> that would create a partially out-of-bounds or a racy
access.
> >>>> See lib/Analysis/MemoryDependenceAnalysis.cpp (search for
> >>>> Attribute::SanitizeAddress and Attribute::SanitizeThread).
> >>>> This case with MemorySanitizer is slightly different
because we are
> >>>> not fighting a false positive, but rather a
debug-info-damaging
> optimization.
> >>>>
> >>>> --kcc
> >>>>
> >>>>>
> >>>>> If not, does this need some deeper discussion about
alternatives
> >>>>> (is it important that we be able to produce equivalent
code
> >>>>> without the sanitizers enabled?)?
> >>>>>
> >>>>>
> >>>>> On Mon, Nov 18, 2013 at 7:02 AM, Evgeniy Stepanov
> >>>>> <eugenis at google.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> Branch folding optimization often leads to
confusing MSan reports
> >>>>>> due to lost debug info.
> >>>>>> For example,
> >>>>>> 1: if (x < 0)
> >>>>>> 2:   if (y < 0)
> >>>>>> 3:    do_something();
> >>>>>> is transformed into something like
> >>>>>>   %0 = and i32 %y, %x
> >>>>>>   %1 = icmp slt i32 %0, 0
> >>>>>>   br i1 %1, label %if.then2, label %if.end3 where
all 3
> >>>>>> instructions are associated with line 1.
> >>>>>>
> >>>>>> This patch disables folding of conditional
branches in functions
> >>>>>> with sanitize_memory attribute.
> >>>>>>
> >>>>>> http://llvm-reviews.chandlerc.com/D2214
> >>>>>>
> >>>>>> Files:
> >>>>>>   lib/Transforms/Utils/SimplifyCFG.cpp
> >>>>>>   test/Transforms/SimplifyCFG/branch-fold-msan.ll
> >>>>>>
> >>>>>> Index: lib/Transforms/Utils/SimplifyCFG.cpp
> >>>>>>
================================================================>
>>>>>> => >>>>>> ---
lib/Transforms/Utils/SimplifyCFG.cpp
> >>>>>> +++ lib/Transforms/Utils/SimplifyCFG.cpp
> >>>>>> @@ -1967,6 +1967,13 @@
> >>>>>>  bool llvm::FoldBranchToCommonDest(BranchInst *BI)
{
> >>>>>>    BasicBlock *BB = BI->getParent();
> >>>>>>
> >>>>>> +  // This optimization results in confusing
MemorySanitizer
> reports.
> >>>>>> Use of
> >>>>>> +  // uninitialized value in this branch
instruction is reported
> >>>>>> + with
> >>>>>> the
> >>>>>> +  // predecessor's debug location.
> >>>>>> +  if
(BB->getParent()->hasFnAttribute(Attribute::SanitizeMemory) &&
> >>>>>> +      BI->isConditional())
> >>>>>> +    return false;
> >>>>>> +
> >>>>>>    Instruction *Cond = 0;
> >>>>>>    if (BI->isConditional())
> >>>>>>      Cond =
dyn_cast<Instruction>(BI->getCondition());
> >>>>>> Index:
test/Transforms/SimplifyCFG/branch-fold-msan.ll
> >>>>>>
================================================================>
>>>>>> => >>>>>> ---
test/Transforms/SimplifyCFG/branch-fold-msan.ll
> >>>>>> +++
test/Transforms/SimplifyCFG/branch-fold-msan.ll
> >>>>>> @@ -0,0 +1,29 @@
> >>>>>> +; RUN: opt < %s -simplifycfg -S | FileCheck %s
> >>>>>> +
> >>>>>> +declare void @callee()
> >>>>>> +
> >>>>>> +; Test that conditional branches are not folded
with
> sanitize_memory.
>
> >>>>>> +define void @caller(i32 %x, i32 %y)
sanitize_memory { ; CHECK:
> >>>>>> +define void @caller(i32 [[X:%.*]], i32 [[Y:%.*]])
; CHECK: icmp
> >>>>>> +slt i32 {{.*}}[[X]] ; CHECK: icmp slt i32
{{.*}}[[Y]] ; CHECK:
> >>>>>> +ret void
>
> >>>>>> +
> >>>>>> +entry:
> >>>>>> +  %cmp = icmp slt i32 %x, 0
> >>>>>> +  br i1 %cmp, label %if.then, label %if.end3
> >>>>>> +
> >>>>>> +if.then:                                         
; preds = %entry
> >>>>>> +  %cmp1 = icmp slt i32 %y, 0
> >>>>>> +  br i1 %cmp1, label %if.then2, label %if.end
> >>>>>> +
> >>>>>> +if.then2:                                        
; preds > %if.then
> >>>>>> +  call void @callee()
> >>>>>> +  br label %if.end
> >>>>>> +
> >>>>>> +if.end:                                          
; preds > >>>>>> %if.then2, %if.then
> >>>>>> +  br label %if.end3
> >>>>>> +
> >>>>>> +if.end3:                                         
; preds > %if.end,
> >>>>>> %entry
> >>>>>> +  ret void
> >>>>>> +}
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> llvm-commits mailing list
> >>>>>> llvm-commits at cs.uiuc.edu
> >>>>>>
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>>>>>
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> llvm-commits mailing list
> >>>>> llvm-commits at cs.uiuc.edu
> >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>>>>
> >>>>
> >>>
> >>
> >
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>
>
>
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131119/cc6e567d/attachment.html>

David Chisnall

2013-Nov-19 18:20 UTC

head link

[LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)

On 19 Nov 2013, at 17:58, Kostya Serebryany <kcc at google.com> wrote:
> On Tue, Nov 19, 2013 at 9:56 PM, Kuperstein, Michael M
<michael.m.kuperstein at intel.com> wrote:
>> What I’m trying to say is that according to my understanding of the
C++11 memory model, even in that small reproducer, the store to g and the load
from g are in fact a data race.
>> 
>> (This is regardless of the fact the load is protected by a branch that
is not taken.)
> 
> My understanding of the standard is quite the opposite. 
I agree.  It is a potential data race, if the branch is taken then it definitely
is a race because the standard explicitly does not define an ordering on loads
and stores of non-atomic variables unless some happens-before relationship is
established by some other operation (which does not occur in this example).  In
the case where the branch is not taken, then there is no data race because in
the C++ abstract machine there is no load, whether there is one in the
underlying concrete machine or not.

I believe that this transform is valid, because there are only two possible
cases here:

- Some other code has established a happens-before relationship between the load
and the stores, giving a well-defined ordering, however this code is not visible
in the function foo() and so the load may happen anywhere.

- No other code establishes a happens-before relationship, and therefore the
order of the load and the store is completely undefined and as long as the load
doesn't trap it is completely safe to hoist it.

HOWEVER, I would dispute describing this transform as an optimisation in this
case.  If the two threads are running on different cores, then we have likely
introduced some false sharing and have forced the two cores to exchange some bus
traffic for cache coherency, even though is is completely valid in the C++
abstract machine model for the load of g to return a stale cache value.  This is
only a real problem on strongly ordered  architectures (e.g. x86), but given the
relative cost of a cache shootdown and everything else in this test case (with
the exception of the thread creation), I wouldn't be surprised if it ended
up slowing things down.  Especially given that a modern superscalar CPU will
speculatively execute the load ANYWAY if it can do so from cache, and if it
can't then the performance improvement from doing it before the branch will
likely be negligible.

For single-core, in-order, single-issue architectures, or multicore, weakly
ordered, in-order, single-issue architectures, it's probably a win...

David

Kuperstein, Michael M

2013-Nov-19 18:21 UTC

head link

[LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)

Quoting the C11 standard, section 5.1.2.4:

(4) “Two expression evaluations conﬂict if one of them modiﬁes a memory location
and the other one reads or modiﬁes the same memory location”

(25) “The execution of a program contains a data race if it contains two
conﬂicting actions in different threads, at least one of which is not atomic,
and neither happens before the other. Any such data race results in undeﬁned
behavior”

(18) “An evaluation A happens before an evaluation B if A is sequenced before B
or A inter-thread happens before B”

The load and the store are in conflict (one is a load and the other is a store
to the same location) and neither is atomic. So the question is whether they
happen-before each other.
Since they are not in the same thread, they are not sequenced before one
another, so the only option is inter-thread happens before.

I can’t see an inter-thread happens-before relation here.

From: Kostya Serebryany [mailto:kcc at google.com]
Sent: Tuesday, November 19, 2013 19:59
To: Kuperstein, Michael M
Cc: Evgeniy Stepanov; LLVM Developers Mailing List
Subject: Re: [LLVMdev] Curiosity about transform changes under Sanitizers (Was:
[PATCH] Disable branch folding with MemorySanitizer)



On Tue, Nov 19, 2013 at 9:56 PM, Kuperstein, Michael M <michael.m.kuperstein
at intel.com<mailto:michael.m.kuperstein at intel.com>> wrote:
What I’m trying to say is that according to my understanding of the C++11 memory
model, even in that small reproducer, the store to g and the load from g are in
fact a data race.
(This is regardless of the fact the load is protected by a branch that is not
taken.)

My understanding of the standard is quite the opposite.


From: Kostya Serebryany [mailto:kcc at google.com<mailto:kcc at
google.com>]
Sent: Tuesday, November 19, 2013 19:46
To: Kuperstein, Michael M
Cc: Evgeniy Stepanov; LLVM Developers Mailing List

Subject: Re: [LLVMdev] Curiosity about transform changes under Sanitizers (Was:
[PATCH] Disable branch folding with MemorySanitizer)



On Tue, Nov 19, 2013 at 9:05 PM, Kuperstein, Michael M <michael.m.kuperstein
at intel.com<mailto:michael.m.kuperstein at intel.com>> wrote:
My $0.02 - I'm not sure the transformation introduces a data race.

To the best of my understanding, the point of the C++11/C11 memory model is to
allow a wide array of compiler transformations - including speculative loads -
for non-atomic variables.
I believe what's most likely happening (without looking at the Mozilla
source) is that the original program contains a C++ data race, and the
transformation exposes it to TSan.

The original program is race-free.
I've posted a minimized reproducer that actually triggers a tsan false
positive at O1 here:
https://code.google.com/p/thread-sanitizer/issues/detail?id=40#c5


-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu<mailto:llvmdev-bounces at
cs.uiuc.edu> [mailto:llvmdev-bounces at cs.uiuc.edu<mailto:llvmdev-bounces
at cs.uiuc.edu>] On Behalf Of Evgeniy Stepanov
Sent: Tuesday, November 19, 2013 18:55
To: Kostya Serebryany
Cc: LLVM Developers Mailing List
Subject: Re: [LLVMdev] Curiosity about transform changes under Sanitizers (Was:
[PATCH] Disable branch folding with MemorySanitizer)

The root cause of those issues is the fact that sanitizers verify
C++-level semantics with LLVM IR level instrumentation. For example,
speculative loads are OK in IR if it can be proved that the load won't trap,
but in C++ it would be a data race.


On Tue, Nov 19, 2013 at 8:38 PM, Kostya Serebryany <kcc at
google.com<mailto:kcc at google.com>> wrote:>
>
>
> On Tue, Nov 19, 2013 at 8:25 PM, David Blaikie <dblaikie at
gmail.com<mailto:dblaikie at gmail.com>> wrote:
>>
>> Just moving this branch of the thread out of the review because I
>> don't want to derail the review  thread...
>>
>> Kostya - why are these two cases not optimization bugs in general?
>> (why do they only affect sanitizers?)
>
>
> The recent case from mozilla
> (https://code.google.com/p/thread-sanitizer/issues/detail?id=40#c2) is
> a legal optimization -- it hoists a safe load (i.e. a load which is
> known not to
> fail) out of conditional branch.
> It reduces the number of basic blocks and branches, and so I think
> it's good in general.
> I can't imagine a case where this optimization will break a valid
program.
> Which is the second one you are referring to?
>
> --kcc
>
>>
>>
>>
>>
>> On Mon, Nov 18, 2013 at 8:37 PM, Kostya Serebryany <kcc at
google.com<mailto:kcc at google.com>> wrote:
>>>
>>> And we've been just informed by the mozilla folks about yet
another
>>> case of optimization being hostile to sanitizers:
>>> hoisting a safe load out of conditional branch introduces a race
>>> which tsan happily reports.
>>> https://code.google.com/p/thread-sanitizer/issues/detail?id=40#c2
>>>
>>> --kcc
>>>
>>>
>>> On Tue, Nov 19, 2013 at 8:06 AM, Kostya Serebryany <kcc at
google.com<mailto:kcc at google.com>>
>>> wrote:
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Nov 19, 2013 at 1:27 AM, David Blaikie <dblaikie at
gmail.com<mailto:dblaikie at gmail.com>>
>>>> wrote:
>>>>>
>>>>> Do we have precedence for this kind of change (where
sanitizers
>>>>> affect optimizations in arbitrary internal ways - not
simply by
>>>>> enabling/disabling certain passes)?
>>>>
>>>>
>>>> Yes. AddressSanitizer and ThreadSanitizer disable load widening
>>>> that would create a partially out-of-bounds or a racy access.
>>>> See lib/Analysis/MemoryDependenceAnalysis.cpp (search for
>>>> Attribute::SanitizeAddress and Attribute::SanitizeThread).
>>>> This case with MemorySanitizer is slightly different because we
are
>>>> not fighting a false positive, but rather a debug-info-damaging
optimization.
>>>>
>>>> --kcc
>>>>
>>>>>
>>>>> If not, does this need some deeper discussion about
alternatives
>>>>> (is it important that we be able to produce equivalent code
>>>>> without the sanitizers enabled?)?
>>>>>
>>>>>
>>>>> On Mon, Nov 18, 2013 at 7:02 AM, Evgeniy Stepanov
>>>>> <eugenis at google.com<mailto:eugenis at
google.com>>
>>>>> wrote:
>>>>>>
>>>>>> Branch folding optimization often leads to confusing
MSan reports
>>>>>> due to lost debug info.
>>>>>> For example,
>>>>>> 1: if (x < 0)
>>>>>> 2:   if (y < 0)
>>>>>> 3:    do_something();
>>>>>> is transformed into something like
>>>>>>   %0 = and i32 %y, %x
>>>>>>   %1 = icmp slt i32 %0, 0
>>>>>>   br i1 %1, label %if.then2, label %if.end3 where all 3
>>>>>> instructions are associated with line 1.
>>>>>>
>>>>>> This patch disables folding of conditional branches in
functions
>>>>>> with sanitize_memory attribute.
>>>>>>
>>>>>> http://llvm-reviews.chandlerc.com/D2214
>>>>>>
>>>>>> Files:
>>>>>>   lib/Transforms/Utils/SimplifyCFG.cpp
>>>>>>   test/Transforms/SimplifyCFG/branch-fold-msan.ll
>>>>>>
>>>>>> Index: lib/Transforms/Utils/SimplifyCFG.cpp
>>>>>>
================================================================>>>>>>
=>>>>>> --- lib/Transforms/Utils/SimplifyCFG.cpp
>>>>>> +++ lib/Transforms/Utils/SimplifyCFG.cpp
>>>>>> @@ -1967,6 +1967,13 @@
>>>>>>  bool llvm::FoldBranchToCommonDest(BranchInst *BI) {
>>>>>>    BasicBlock *BB = BI->getParent();
>>>>>>
>>>>>> +  // This optimization results in confusing
MemorySanitizer reports.
>>>>>> Use of
>>>>>> +  // uninitialized value in this branch instruction is
reported
>>>>>> + with
>>>>>> the
>>>>>> +  // predecessor's debug location.
>>>>>> +  if
(BB->getParent()->hasFnAttribute(Attribute::SanitizeMemory) &&
>>>>>> +      BI->isConditional())
>>>>>> +    return false;
>>>>>> +
>>>>>>    Instruction *Cond = 0;
>>>>>>    if (BI->isConditional())
>>>>>>      Cond =
dyn_cast<Instruction>(BI->getCondition());
>>>>>> Index: test/Transforms/SimplifyCFG/branch-fold-msan.ll
>>>>>>
================================================================>>>>>>
=>>>>>> --- test/Transforms/SimplifyCFG/branch-fold-msan.ll
>>>>>> +++ test/Transforms/SimplifyCFG/branch-fold-msan.ll
>>>>>> @@ -0,0 +1,29 @@
>>>>>> +; RUN: opt < %s -simplifycfg -S | FileCheck %s
>>>>>> +
>>>>>> +declare void @callee()
>>>>>> +
>>>>>> +; Test that conditional branches are not folded with
sanitize_memory.
>>>>>> +define void @caller(i32 %x, i32 %y) sanitize_memory {
; CHECK:
>>>>>> +define void @caller(i32 [[X:%.*]], i32 [[Y:%.*]]) ;
CHECK: icmp
>>>>>> +slt i32 {{.*}}[[X]] ; CHECK: icmp slt i32 {{.*}}[[Y]]
; CHECK:
>>>>>> +ret void
>>>>>> +
>>>>>> +entry:
>>>>>> +  %cmp = icmp slt i32 %x, 0
>>>>>> +  br i1 %cmp, label %if.then, label %if.end3
>>>>>> +
>>>>>> +if.then:                                          ;
preds = %entry
>>>>>> +  %cmp1 = icmp slt i32 %y, 0
>>>>>> +  br i1 %cmp1, label %if.then2, label %if.end
>>>>>> +
>>>>>> +if.then2:                                         ;
preds = %if.then
>>>>>> +  call void @callee()
>>>>>> +  br label %if.end
>>>>>> +
>>>>>> +if.end:                                           ;
preds >>>>>> %if.then2, %if.then
>>>>>> +  br label %if.end3
>>>>>> +
>>>>>> +if.end3:                                          ;
preds = %if.end,
>>>>>> %entry
>>>>>> +  ret void
>>>>>> +}
>>>>>>
>>>>>> _______________________________________________
>>>>>> llvm-commits mailing list
>>>>>> llvm-commits at cs.uiuc.edu<mailto:llvm-commits at
cs.uiuc.edu>
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> llvm-commits mailing list
>>>>> llvm-commits at cs.uiuc.edu<mailto:llvm-commits at
cs.uiuc.edu>
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>
>>>>
>>>
>>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu<mailto:LLVMdev at cs.uiuc.edu>        
http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu<mailto:LLVMdev at cs.uiuc.edu>        
http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131119/e2d5401b/attachment.html>

Kuperstein, Michael M

2013-Nov-19 19:00 UTC

head link

[LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)

Sorry, you're right, I've read the spec more carefully, and there is no
evaluation in the original code.

In fact, C11 addresses this case specifically:
Note 14 to 5.1.24: 
"Transformations that introduce a speculative read of a potentially shared
memory location may not preserve the semantics of the program as deﬁned in this
standard, since they potentially introduce a data race. However, they are
typically valid in the context of an optimizing compiler that targets a speciﬁc
machine with well-deﬁned semantics for data races. They would be invalid for a
hypothetical machine that is not tolerant of races or provides hardware race
detection."

So, I guess, under C/C++ semantics, the transformation is "illegal unless
you know what you're doing for the specific target", and having TSan
would make it illegal.
Or does that interpretation sound wrong too? :-)

-----Original Message-----
From: Dr D. Chisnall [mailto:dc552 at hermes.cam.ac.uk] On Behalf Of David
Chisnall
Sent: Tuesday, November 19, 2013 20:21
To: Kostya Serebryany
Cc: Kuperstein, Michael M; LLVM Developers Mailing List
Subject: Re: [LLVMdev] Curiosity about transform changes under Sanitizers (Was:
[PATCH] Disable branch folding with MemorySanitizer)

On 19 Nov 2013, at 17:58, Kostya Serebryany <kcc at google.com> wrote:
> On Tue, Nov 19, 2013 at 9:56 PM, Kuperstein, Michael M
<michael.m.kuperstein at intel.com> wrote:
>> What I’m trying to say is that according to my understanding of the
C++11 memory model, even in that small reproducer, the store to g and the load
from g are in fact a data race.
>> 
>> (This is regardless of the fact the load is protected by a branch that
is not taken.)
> 
> My understanding of the standard is quite the opposite. 
I agree.  It is a potential data race, if the branch is taken then it definitely
is a race because the standard explicitly does not define an ordering on loads
and stores of non-atomic variables unless some happens-before relationship is
established by some other operation (which does not occur in this example).  In
the case where the branch is not taken, then there is no data race because in
the C++ abstract machine there is no load, whether there is one in the
underlying concrete machine or not.

I believe that this transform is valid, because there are only two possible
cases here:

- Some other code has established a happens-before relationship between the load
and the stores, giving a well-defined ordering, however this code is not visible
in the function foo() and so the load may happen anywhere.

- No other code establishes a happens-before relationship, and therefore the
order of the load and the store is completely undefined and as long as the load
doesn't trap it is completely safe to hoist it.

HOWEVER, I would dispute describing this transform as an optimisation in this
case.  If the two threads are running on different cores, then we have likely
introduced some false sharing and have forced the two cores to exchange some bus
traffic for cache coherency, even though is is completely valid in the C++
abstract machine model for the load of g to return a stale cache value.  This is
only a real problem on strongly ordered  architectures (e.g. x86), but given the
relative cost of a cache shootdown and everything else in this test case (with
the exception of the thread creation), I wouldn't be surprised if it ended
up slowing things down.  Especially given that a modern superscalar CPU will
speculatively execute the load ANYWAY if it can do so from cache, and if it
can't then the performance improvement from doing it before the branch will
likely be negligible.

For single-core, in-order, single-issue architectures, or multicore, weakly
ordered, in-order, single-issue architectures, it's probably a win...

David

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

Kostya Serebryany

2013-Nov-20 05:01 UTC

head link

[LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)

On Tue, Nov 19, 2013 at 10:20 PM, David Chisnall <
David.Chisnall at cl.cam.ac.uk> wrote:
> On 19 Nov 2013, at 17:58, Kostya Serebryany <kcc at google.com>
wrote:
>
> > On Tue, Nov 19, 2013 at 9:56 PM, Kuperstein, Michael M <
> michael.m.kuperstein at intel.com> wrote:
> >> What I’m trying to say is that according to my understanding of
the
> C++11 memory model, even in that small reproducer, the store to g and the
> load from g are in fact a data race.
> >>
> >> (This is regardless of the fact the load is protected by a branch
that
> is not taken.)
> >
> > My understanding of the standard is quite the opposite.
>
> I agree.  It is a potential data race, if the branch is taken then it
> definitely is a race because the standard explicitly does not define an
> ordering on loads and stores of non-atomic variables unless some
> happens-before relationship is established by some other operation (which
> does not occur in this example).  In the case where the branch is not
> taken, then there is no data race because in the C++ abstract machine there
> is no load, whether there is one in the underlying concrete machine or not.
>
> I believe that this transform is valid, because there are only two
> possible cases here:
>
> - Some other code has established a happens-before relationship between
> the load and the stores, giving a well-defined ordering, however this code
> is not visible in the function foo() and so the load may happen anywhere.
>
> - No other code establishes a happens-before relationship, and therefore
> the order of the load and the store is completely undefined and as long as
> the load doesn't trap it is completely safe to hoist it.
>
> HOWEVER, I would dispute describing this transform as an optimisation in
> this case.

You have a good point: evaluating such transformation on single-threaded
benchmarks (e.g. SPEC) makes no sense
if we care about multi-threaded code. And evaluating performance of
anything like this on a multi-threaded code is hard too.
And it's very easy to prepare a synthetic test case where this optimization
will cause 4x slowdown (see below).
But if we disable this transformation someone surely will come and complain
:)
Another data point: gcc 4.6.3 does this too.


==> if-bench.cc <=#include <pthread.h>
#include <unistd.h>

extern int foo(int);
extern void bar(int);

#ifndef N
#define N (1UL << 30)
#endif

void *Thread(void *p) {
  for (long i = 0; i < N; i++)
     foo(0);
  return 0;
}

int main() {
  pthread_t t[3];
  pthread_create(&t[0], 0, Thread, 0);
  pthread_create(&t[1], 0, Thread, 0);
  pthread_create(&t[2], 0, Thread, 0);
  for (long i = 0; i < N; i++)
    bar(i);
  pthread_join(t[0], 0);
  pthread_join(t[1], 0);
  pthread_join(t[2], 0);
}

==> g.cc <=int g;

int foo(int cond) {
  if (cond)
    return g;
  // If we replace 'g' with g*g*g*g, the benchmark becomes 4x faster.
  return 0;
}

void bar(int i) {  g = i; }


% clang if-bench.cc  g.cc  -lpthread -O1  && time ./a.out


--kcc


>  If the two threads are running on different cores, then we have likely
> introduced some false sharing and have forced the two cores to exchange
> some bus traffic for cache coherency, even though is is completely valid in
> the C++ abstract machine model for the load of g to return a stale cache
> value.  This is only a real problem on strongly ordered  architectures
> (e.g. x86), but given the relative cost of a cache shootdown and everything
> else in this test case (with the exception of the thread creation), I
> wouldn't be surprised if it ended up slowing things down.  Especially
given
> that a modern superscalar CPU will speculatively execute the load ANYWAY if
> it can do so from cache, and if it can't then the performance
improvement
> from doing it before the branch will likely be negligible.
>
> For single-core, in-order, single-issue architectures, or multicore,
> weakly ordered, in-order, single-issue architectures, it's probably a
win...
>
> David
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131120/c953f891/attachment.html>

Possibly Parallel Threads

Search for more apparently analagous threads

llvm dev - Nov 2013 - [LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)

[LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)

[LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)

[LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)

[LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)

[LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)

Possibly Parallel Threads