Kostya Serebryany
2013-Nov-19 17:58 UTC
[LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)
On Tue, Nov 19, 2013 at 9:56 PM, Kuperstein, Michael M < michael.m.kuperstein at intel.com> wrote:> What I’m trying to say is that according to my understanding of the > C++11 memory model, even in that small reproducer, the store to g and the > load from g are in fact a data race. > > (This is regardless of the fact the load is protected by a branch that is > not taken.) >My understanding of the standard is quite the opposite.> > > *From:* Kostya Serebryany [mailto:kcc at google.com] > *Sent:* Tuesday, November 19, 2013 19:46 > *To:* Kuperstein, Michael M > *Cc:* Evgeniy Stepanov; LLVM Developers Mailing List > > *Subject:* Re: [LLVMdev] Curiosity about transform changes under > Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer) > > > > > > > > On Tue, Nov 19, 2013 at 9:05 PM, Kuperstein, Michael M < > michael.m.kuperstein at intel.com> wrote: > > My $0.02 - I'm not sure the transformation introduces a data race. > > To the best of my understanding, the point of the C++11/C11 memory model > is to allow a wide array of compiler transformations - including > speculative loads - for non-atomic variables. > I believe what's most likely happening (without looking at the Mozilla > source) is that the original program contains a C++ data race, and the > transformation exposes it to TSan. > > > > The original program is race-free. > > I've posted a minimized reproducer that actually triggers a tsan false > positive at O1 here: > > https://code.google.com/p/thread-sanitizer/issues/detail?id=40#c5 > > > > > -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On > Behalf Of Evgeniy Stepanov > Sent: Tuesday, November 19, 2013 18:55 > To: Kostya Serebryany > Cc: LLVM Developers Mailing List > Subject: Re: [LLVMdev] Curiosity about transform changes under Sanitizers > (Was: [PATCH] Disable branch folding with MemorySanitizer) > > The root cause of those issues is the fact that sanitizers verify > C++-level semantics with LLVM IR level instrumentation. For example, > speculative loads are OK in IR if it can be proved that the load won't > trap, but in C++ it would be a data race. > > > On Tue, Nov 19, 2013 at 8:38 PM, Kostya Serebryany <kcc at google.com> wrote: > > > > > > > > On Tue, Nov 19, 2013 at 8:25 PM, David Blaikie <dblaikie at gmail.com> > wrote: > >> > >> Just moving this branch of the thread out of the review because I > >> don't want to derail the review thread... > >> > >> Kostya - why are these two cases not optimization bugs in general? > >> (why do they only affect sanitizers?) > > > > > > The recent case from mozilla > > (https://code.google.com/p/thread-sanitizer/issues/detail?id=40#c2) is > > a legal optimization -- it hoists a safe load (i.e. a load which is > > known not to > > fail) out of conditional branch. > > It reduces the number of basic blocks and branches, and so I think > > it's good in general. > > I can't imagine a case where this optimization will break a valid > program. > > Which is the second one you are referring to? > > > > --kcc > > > >> > >> > >> > >> > >> On Mon, Nov 18, 2013 at 8:37 PM, Kostya Serebryany <kcc at google.com> > wrote: > >>> > >>> And we've been just informed by the mozilla folks about yet another > >>> case of optimization being hostile to sanitizers: > >>> hoisting a safe load out of conditional branch introduces a race > >>> which tsan happily reports. > >>> https://code.google.com/p/thread-sanitizer/issues/detail?id=40#c2 > >>> > >>> --kcc > >>> > >>> > >>> On Tue, Nov 19, 2013 at 8:06 AM, Kostya Serebryany <kcc at google.com> > >>> wrote: > >>>> > >>>> > >>>> > >>>> > >>>> On Tue, Nov 19, 2013 at 1:27 AM, David Blaikie <dblaikie at gmail.com> > >>>> wrote: > >>>>> > >>>>> Do we have precedence for this kind of change (where sanitizers > >>>>> affect optimizations in arbitrary internal ways - not simply by > >>>>> enabling/disabling certain passes)? > >>>> > >>>> > >>>> Yes. AddressSanitizer and ThreadSanitizer disable load widening > >>>> that would create a partially out-of-bounds or a racy access. > >>>> See lib/Analysis/MemoryDependenceAnalysis.cpp (search for > >>>> Attribute::SanitizeAddress and Attribute::SanitizeThread). > >>>> This case with MemorySanitizer is slightly different because we are > >>>> not fighting a false positive, but rather a debug-info-damaging > optimization. > >>>> > >>>> --kcc > >>>> > >>>>> > >>>>> If not, does this need some deeper discussion about alternatives > >>>>> (is it important that we be able to produce equivalent code > >>>>> without the sanitizers enabled?)? > >>>>> > >>>>> > >>>>> On Mon, Nov 18, 2013 at 7:02 AM, Evgeniy Stepanov > >>>>> <eugenis at google.com> > >>>>> wrote: > >>>>>> > >>>>>> Branch folding optimization often leads to confusing MSan reports > >>>>>> due to lost debug info. > >>>>>> For example, > >>>>>> 1: if (x < 0) > >>>>>> 2: if (y < 0) > >>>>>> 3: do_something(); > >>>>>> is transformed into something like > >>>>>> %0 = and i32 %y, %x > >>>>>> %1 = icmp slt i32 %0, 0 > >>>>>> br i1 %1, label %if.then2, label %if.end3 where all 3 > >>>>>> instructions are associated with line 1. > >>>>>> > >>>>>> This patch disables folding of conditional branches in functions > >>>>>> with sanitize_memory attribute. > >>>>>> > >>>>>> http://llvm-reviews.chandlerc.com/D2214 > >>>>>> > >>>>>> Files: > >>>>>> lib/Transforms/Utils/SimplifyCFG.cpp > >>>>>> test/Transforms/SimplifyCFG/branch-fold-msan.ll > >>>>>> > >>>>>> Index: lib/Transforms/Utils/SimplifyCFG.cpp > >>>>>> ================================================================> >>>>>> => >>>>>> --- lib/Transforms/Utils/SimplifyCFG.cpp > >>>>>> +++ lib/Transforms/Utils/SimplifyCFG.cpp > >>>>>> @@ -1967,6 +1967,13 @@ > >>>>>> bool llvm::FoldBranchToCommonDest(BranchInst *BI) { > >>>>>> BasicBlock *BB = BI->getParent(); > >>>>>> > >>>>>> + // This optimization results in confusing MemorySanitizer > reports. > >>>>>> Use of > >>>>>> + // uninitialized value in this branch instruction is reported > >>>>>> + with > >>>>>> the > >>>>>> + // predecessor's debug location. > >>>>>> + if (BB->getParent()->hasFnAttribute(Attribute::SanitizeMemory) && > >>>>>> + BI->isConditional()) > >>>>>> + return false; > >>>>>> + > >>>>>> Instruction *Cond = 0; > >>>>>> if (BI->isConditional()) > >>>>>> Cond = dyn_cast<Instruction>(BI->getCondition()); > >>>>>> Index: test/Transforms/SimplifyCFG/branch-fold-msan.ll > >>>>>> ================================================================> >>>>>> => >>>>>> --- test/Transforms/SimplifyCFG/branch-fold-msan.ll > >>>>>> +++ test/Transforms/SimplifyCFG/branch-fold-msan.ll > >>>>>> @@ -0,0 +1,29 @@ > >>>>>> +; RUN: opt < %s -simplifycfg -S | FileCheck %s > >>>>>> + > >>>>>> +declare void @callee() > >>>>>> + > >>>>>> +; Test that conditional branches are not folded with > sanitize_memory. > > >>>>>> +define void @caller(i32 %x, i32 %y) sanitize_memory { ; CHECK: > >>>>>> +define void @caller(i32 [[X:%.*]], i32 [[Y:%.*]]) ; CHECK: icmp > >>>>>> +slt i32 {{.*}}[[X]] ; CHECK: icmp slt i32 {{.*}}[[Y]] ; CHECK: > >>>>>> +ret void > > >>>>>> + > >>>>>> +entry: > >>>>>> + %cmp = icmp slt i32 %x, 0 > >>>>>> + br i1 %cmp, label %if.then, label %if.end3 > >>>>>> + > >>>>>> +if.then: ; preds = %entry > >>>>>> + %cmp1 = icmp slt i32 %y, 0 > >>>>>> + br i1 %cmp1, label %if.then2, label %if.end > >>>>>> + > >>>>>> +if.then2: ; preds > %if.then > >>>>>> + call void @callee() > >>>>>> + br label %if.end > >>>>>> + > >>>>>> +if.end: ; preds > >>>>>> %if.then2, %if.then > >>>>>> + br label %if.end3 > >>>>>> + > >>>>>> +if.end3: ; preds > %if.end, > >>>>>> %entry > >>>>>> + ret void > >>>>>> +} > >>>>>> > >>>>>> _______________________________________________ > >>>>>> llvm-commits mailing list > >>>>>> llvm-commits at cs.uiuc.edu > >>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits > >>>>>> > >>>>> > >>>>> > >>>>> _______________________________________________ > >>>>> llvm-commits mailing list > >>>>> llvm-commits at cs.uiuc.edu > >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits > >>>>> > >>>> > >>> > >> > > > > > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > --------------------------------------------------------------------- > Intel Israel (74) Limited > > This e-mail and any attachments may contain confidential material for > the sole use of the intended recipient(s). Any review or distribution > by others is strictly prohibited. If you are not the intended > recipient, please contact the sender and delete all copies. > > > > --------------------------------------------------------------------- > Intel Israel (74) Limited > > This e-mail and any attachments may contain confidential material for > the sole use of the intended recipient(s). Any review or distribution > by others is strictly prohibited. If you are not the intended > recipient, please contact the sender and delete all copies. >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131119/cc6e567d/attachment.html>
David Chisnall
2013-Nov-19 18:20 UTC
[LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)
On 19 Nov 2013, at 17:58, Kostya Serebryany <kcc at google.com> wrote:> On Tue, Nov 19, 2013 at 9:56 PM, Kuperstein, Michael M <michael.m.kuperstein at intel.com> wrote: >> What I’m trying to say is that according to my understanding of the C++11 memory model, even in that small reproducer, the store to g and the load from g are in fact a data race. >> >> (This is regardless of the fact the load is protected by a branch that is not taken.) > > My understanding of the standard is quite the opposite.I agree. It is a potential data race, if the branch is taken then it definitely is a race because the standard explicitly does not define an ordering on loads and stores of non-atomic variables unless some happens-before relationship is established by some other operation (which does not occur in this example). In the case where the branch is not taken, then there is no data race because in the C++ abstract machine there is no load, whether there is one in the underlying concrete machine or not. I believe that this transform is valid, because there are only two possible cases here: - Some other code has established a happens-before relationship between the load and the stores, giving a well-defined ordering, however this code is not visible in the function foo() and so the load may happen anywhere. - No other code establishes a happens-before relationship, and therefore the order of the load and the store is completely undefined and as long as the load doesn't trap it is completely safe to hoist it. HOWEVER, I would dispute describing this transform as an optimisation in this case. If the two threads are running on different cores, then we have likely introduced some false sharing and have forced the two cores to exchange some bus traffic for cache coherency, even though is is completely valid in the C++ abstract machine model for the load of g to return a stale cache value. This is only a real problem on strongly ordered architectures (e.g. x86), but given the relative cost of a cache shootdown and everything else in this test case (with the exception of the thread creation), I wouldn't be surprised if it ended up slowing things down. Especially given that a modern superscalar CPU will speculatively execute the load ANYWAY if it can do so from cache, and if it can't then the performance improvement from doing it before the branch will likely be negligible. For single-core, in-order, single-issue architectures, or multicore, weakly ordered, in-order, single-issue architectures, it's probably a win... David
Kuperstein, Michael M
2013-Nov-19 18:21 UTC
[LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)
Quoting the C11 standard, section 5.1.2.4: (4) “Two expression evaluations conflict if one of them modifies a memory location and the other one reads or modifies the same memory location” (25) “The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior” (18) “An evaluation A happens before an evaluation B if A is sequenced before B or A inter-thread happens before B” The load and the store are in conflict (one is a load and the other is a store to the same location) and neither is atomic. So the question is whether they happen-before each other. Since they are not in the same thread, they are not sequenced before one another, so the only option is inter-thread happens before. I can’t see an inter-thread happens-before relation here. From: Kostya Serebryany [mailto:kcc at google.com] Sent: Tuesday, November 19, 2013 19:59 To: Kuperstein, Michael M Cc: Evgeniy Stepanov; LLVM Developers Mailing List Subject: Re: [LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer) On Tue, Nov 19, 2013 at 9:56 PM, Kuperstein, Michael M <michael.m.kuperstein at intel.com<mailto:michael.m.kuperstein at intel.com>> wrote: What I’m trying to say is that according to my understanding of the C++11 memory model, even in that small reproducer, the store to g and the load from g are in fact a data race. (This is regardless of the fact the load is protected by a branch that is not taken.) My understanding of the standard is quite the opposite. From: Kostya Serebryany [mailto:kcc at google.com<mailto:kcc at google.com>] Sent: Tuesday, November 19, 2013 19:46 To: Kuperstein, Michael M Cc: Evgeniy Stepanov; LLVM Developers Mailing List Subject: Re: [LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer) On Tue, Nov 19, 2013 at 9:05 PM, Kuperstein, Michael M <michael.m.kuperstein at intel.com<mailto:michael.m.kuperstein at intel.com>> wrote: My $0.02 - I'm not sure the transformation introduces a data race. To the best of my understanding, the point of the C++11/C11 memory model is to allow a wide array of compiler transformations - including speculative loads - for non-atomic variables. I believe what's most likely happening (without looking at the Mozilla source) is that the original program contains a C++ data race, and the transformation exposes it to TSan. The original program is race-free. I've posted a minimized reproducer that actually triggers a tsan false positive at O1 here: https://code.google.com/p/thread-sanitizer/issues/detail?id=40#c5 -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu<mailto:llvmdev-bounces at cs.uiuc.edu> [mailto:llvmdev-bounces at cs.uiuc.edu<mailto:llvmdev-bounces at cs.uiuc.edu>] On Behalf Of Evgeniy Stepanov Sent: Tuesday, November 19, 2013 18:55 To: Kostya Serebryany Cc: LLVM Developers Mailing List Subject: Re: [LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer) The root cause of those issues is the fact that sanitizers verify C++-level semantics with LLVM IR level instrumentation. For example, speculative loads are OK in IR if it can be proved that the load won't trap, but in C++ it would be a data race. On Tue, Nov 19, 2013 at 8:38 PM, Kostya Serebryany <kcc at google.com<mailto:kcc at google.com>> wrote:> > > > On Tue, Nov 19, 2013 at 8:25 PM, David Blaikie <dblaikie at gmail.com<mailto:dblaikie at gmail.com>> wrote: >> >> Just moving this branch of the thread out of the review because I >> don't want to derail the review thread... >> >> Kostya - why are these two cases not optimization bugs in general? >> (why do they only affect sanitizers?) > > > The recent case from mozilla > (https://code.google.com/p/thread-sanitizer/issues/detail?id=40#c2) is > a legal optimization -- it hoists a safe load (i.e. a load which is > known not to > fail) out of conditional branch. > It reduces the number of basic blocks and branches, and so I think > it's good in general. > I can't imagine a case where this optimization will break a valid program. > Which is the second one you are referring to? > > --kcc > >> >> >> >> >> On Mon, Nov 18, 2013 at 8:37 PM, Kostya Serebryany <kcc at google.com<mailto:kcc at google.com>> wrote: >>> >>> And we've been just informed by the mozilla folks about yet another >>> case of optimization being hostile to sanitizers: >>> hoisting a safe load out of conditional branch introduces a race >>> which tsan happily reports. >>> https://code.google.com/p/thread-sanitizer/issues/detail?id=40#c2 >>> >>> --kcc >>> >>> >>> On Tue, Nov 19, 2013 at 8:06 AM, Kostya Serebryany <kcc at google.com<mailto:kcc at google.com>> >>> wrote: >>>> >>>> >>>> >>>> >>>> On Tue, Nov 19, 2013 at 1:27 AM, David Blaikie <dblaikie at gmail.com<mailto:dblaikie at gmail.com>> >>>> wrote: >>>>> >>>>> Do we have precedence for this kind of change (where sanitizers >>>>> affect optimizations in arbitrary internal ways - not simply by >>>>> enabling/disabling certain passes)? >>>> >>>> >>>> Yes. AddressSanitizer and ThreadSanitizer disable load widening >>>> that would create a partially out-of-bounds or a racy access. >>>> See lib/Analysis/MemoryDependenceAnalysis.cpp (search for >>>> Attribute::SanitizeAddress and Attribute::SanitizeThread). >>>> This case with MemorySanitizer is slightly different because we are >>>> not fighting a false positive, but rather a debug-info-damaging optimization. >>>> >>>> --kcc >>>> >>>>> >>>>> If not, does this need some deeper discussion about alternatives >>>>> (is it important that we be able to produce equivalent code >>>>> without the sanitizers enabled?)? >>>>> >>>>> >>>>> On Mon, Nov 18, 2013 at 7:02 AM, Evgeniy Stepanov >>>>> <eugenis at google.com<mailto:eugenis at google.com>> >>>>> wrote: >>>>>> >>>>>> Branch folding optimization often leads to confusing MSan reports >>>>>> due to lost debug info. >>>>>> For example, >>>>>> 1: if (x < 0) >>>>>> 2: if (y < 0) >>>>>> 3: do_something(); >>>>>> is transformed into something like >>>>>> %0 = and i32 %y, %x >>>>>> %1 = icmp slt i32 %0, 0 >>>>>> br i1 %1, label %if.then2, label %if.end3 where all 3 >>>>>> instructions are associated with line 1. >>>>>> >>>>>> This patch disables folding of conditional branches in functions >>>>>> with sanitize_memory attribute. >>>>>> >>>>>> http://llvm-reviews.chandlerc.com/D2214 >>>>>> >>>>>> Files: >>>>>> lib/Transforms/Utils/SimplifyCFG.cpp >>>>>> test/Transforms/SimplifyCFG/branch-fold-msan.ll >>>>>> >>>>>> Index: lib/Transforms/Utils/SimplifyCFG.cpp >>>>>> ================================================================>>>>>> =>>>>>> --- lib/Transforms/Utils/SimplifyCFG.cpp >>>>>> +++ lib/Transforms/Utils/SimplifyCFG.cpp >>>>>> @@ -1967,6 +1967,13 @@ >>>>>> bool llvm::FoldBranchToCommonDest(BranchInst *BI) { >>>>>> BasicBlock *BB = BI->getParent(); >>>>>> >>>>>> + // This optimization results in confusing MemorySanitizer reports. >>>>>> Use of >>>>>> + // uninitialized value in this branch instruction is reported >>>>>> + with >>>>>> the >>>>>> + // predecessor's debug location. >>>>>> + if (BB->getParent()->hasFnAttribute(Attribute::SanitizeMemory) && >>>>>> + BI->isConditional()) >>>>>> + return false; >>>>>> + >>>>>> Instruction *Cond = 0; >>>>>> if (BI->isConditional()) >>>>>> Cond = dyn_cast<Instruction>(BI->getCondition()); >>>>>> Index: test/Transforms/SimplifyCFG/branch-fold-msan.ll >>>>>> ================================================================>>>>>> =>>>>>> --- test/Transforms/SimplifyCFG/branch-fold-msan.ll >>>>>> +++ test/Transforms/SimplifyCFG/branch-fold-msan.ll >>>>>> @@ -0,0 +1,29 @@ >>>>>> +; RUN: opt < %s -simplifycfg -S | FileCheck %s >>>>>> + >>>>>> +declare void @callee() >>>>>> + >>>>>> +; Test that conditional branches are not folded with sanitize_memory. >>>>>> +define void @caller(i32 %x, i32 %y) sanitize_memory { ; CHECK: >>>>>> +define void @caller(i32 [[X:%.*]], i32 [[Y:%.*]]) ; CHECK: icmp >>>>>> +slt i32 {{.*}}[[X]] ; CHECK: icmp slt i32 {{.*}}[[Y]] ; CHECK: >>>>>> +ret void >>>>>> + >>>>>> +entry: >>>>>> + %cmp = icmp slt i32 %x, 0 >>>>>> + br i1 %cmp, label %if.then, label %if.end3 >>>>>> + >>>>>> +if.then: ; preds = %entry >>>>>> + %cmp1 = icmp slt i32 %y, 0 >>>>>> + br i1 %cmp1, label %if.then2, label %if.end >>>>>> + >>>>>> +if.then2: ; preds = %if.then >>>>>> + call void @callee() >>>>>> + br label %if.end >>>>>> + >>>>>> +if.end: ; preds >>>>>> %if.then2, %if.then >>>>>> + br label %if.end3 >>>>>> + >>>>>> +if.end3: ; preds = %if.end, >>>>>> %entry >>>>>> + ret void >>>>>> +} >>>>>> >>>>>> _______________________________________________ >>>>>> llvm-commits mailing list >>>>>> llvm-commits at cs.uiuc.edu<mailto:llvm-commits at cs.uiuc.edu> >>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits >>>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> llvm-commits mailing list >>>>> llvm-commits at cs.uiuc.edu<mailto:llvm-commits at cs.uiuc.edu> >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits >>>>> >>>> >>> >> > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu<mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >_______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu<mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev --------------------------------------------------------------------- Intel Israel (74) Limited This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. --------------------------------------------------------------------- Intel Israel (74) Limited This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. --------------------------------------------------------------------- Intel Israel (74) Limited This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131119/e2d5401b/attachment.html>
Kuperstein, Michael M
2013-Nov-19 19:00 UTC
[LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)
Sorry, you're right, I've read the spec more carefully, and there is no evaluation in the original code. In fact, C11 addresses this case specifically: Note 14 to 5.1.24: "Transformations that introduce a speculative read of a potentially shared memory location may not preserve the semantics of the program as defined in this standard, since they potentially introduce a data race. However, they are typically valid in the context of an optimizing compiler that targets a specific machine with well-defined semantics for data races. They would be invalid for a hypothetical machine that is not tolerant of races or provides hardware race detection." So, I guess, under C/C++ semantics, the transformation is "illegal unless you know what you're doing for the specific target", and having TSan would make it illegal. Or does that interpretation sound wrong too? :-) -----Original Message----- From: Dr D. Chisnall [mailto:dc552 at hermes.cam.ac.uk] On Behalf Of David Chisnall Sent: Tuesday, November 19, 2013 20:21 To: Kostya Serebryany Cc: Kuperstein, Michael M; LLVM Developers Mailing List Subject: Re: [LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer) On 19 Nov 2013, at 17:58, Kostya Serebryany <kcc at google.com> wrote:> On Tue, Nov 19, 2013 at 9:56 PM, Kuperstein, Michael M <michael.m.kuperstein at intel.com> wrote: >> What I’m trying to say is that according to my understanding of the C++11 memory model, even in that small reproducer, the store to g and the load from g are in fact a data race. >> >> (This is regardless of the fact the load is protected by a branch that is not taken.) > > My understanding of the standard is quite the opposite.I agree. It is a potential data race, if the branch is taken then it definitely is a race because the standard explicitly does not define an ordering on loads and stores of non-atomic variables unless some happens-before relationship is established by some other operation (which does not occur in this example). In the case where the branch is not taken, then there is no data race because in the C++ abstract machine there is no load, whether there is one in the underlying concrete machine or not. I believe that this transform is valid, because there are only two possible cases here: - Some other code has established a happens-before relationship between the load and the stores, giving a well-defined ordering, however this code is not visible in the function foo() and so the load may happen anywhere. - No other code establishes a happens-before relationship, and therefore the order of the load and the store is completely undefined and as long as the load doesn't trap it is completely safe to hoist it. HOWEVER, I would dispute describing this transform as an optimisation in this case. If the two threads are running on different cores, then we have likely introduced some false sharing and have forced the two cores to exchange some bus traffic for cache coherency, even though is is completely valid in the C++ abstract machine model for the load of g to return a stale cache value. This is only a real problem on strongly ordered architectures (e.g. x86), but given the relative cost of a cache shootdown and everything else in this test case (with the exception of the thread creation), I wouldn't be surprised if it ended up slowing things down. Especially given that a modern superscalar CPU will speculatively execute the load ANYWAY if it can do so from cache, and if it can't then the performance improvement from doing it before the branch will likely be negligible. For single-core, in-order, single-issue architectures, or multicore, weakly ordered, in-order, single-issue architectures, it's probably a win... David --------------------------------------------------------------------- Intel Israel (74) Limited This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
Kostya Serebryany
2013-Nov-20 05:01 UTC
[LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)
On Tue, Nov 19, 2013 at 10:20 PM, David Chisnall < David.Chisnall at cl.cam.ac.uk> wrote:> On 19 Nov 2013, at 17:58, Kostya Serebryany <kcc at google.com> wrote: > > > On Tue, Nov 19, 2013 at 9:56 PM, Kuperstein, Michael M < > michael.m.kuperstein at intel.com> wrote: > >> What I’m trying to say is that according to my understanding of the > C++11 memory model, even in that small reproducer, the store to g and the > load from g are in fact a data race. > >> > >> (This is regardless of the fact the load is protected by a branch that > is not taken.) > > > > My understanding of the standard is quite the opposite. > > I agree. It is a potential data race, if the branch is taken then it > definitely is a race because the standard explicitly does not define an > ordering on loads and stores of non-atomic variables unless some > happens-before relationship is established by some other operation (which > does not occur in this example). In the case where the branch is not > taken, then there is no data race because in the C++ abstract machine there > is no load, whether there is one in the underlying concrete machine or not. > > I believe that this transform is valid, because there are only two > possible cases here: > > - Some other code has established a happens-before relationship between > the load and the stores, giving a well-defined ordering, however this code > is not visible in the function foo() and so the load may happen anywhere. > > - No other code establishes a happens-before relationship, and therefore > the order of the load and the store is completely undefined and as long as > the load doesn't trap it is completely safe to hoist it. > > HOWEVER, I would dispute describing this transform as an optimisation in > this case.You have a good point: evaluating such transformation on single-threaded benchmarks (e.g. SPEC) makes no sense if we care about multi-threaded code. And evaluating performance of anything like this on a multi-threaded code is hard too. And it's very easy to prepare a synthetic test case where this optimization will cause 4x slowdown (see below). But if we disable this transformation someone surely will come and complain :) Another data point: gcc 4.6.3 does this too. ==> if-bench.cc <=#include <pthread.h> #include <unistd.h> extern int foo(int); extern void bar(int); #ifndef N #define N (1UL << 30) #endif void *Thread(void *p) { for (long i = 0; i < N; i++) foo(0); return 0; } int main() { pthread_t t[3]; pthread_create(&t[0], 0, Thread, 0); pthread_create(&t[1], 0, Thread, 0); pthread_create(&t[2], 0, Thread, 0); for (long i = 0; i < N; i++) bar(i); pthread_join(t[0], 0); pthread_join(t[1], 0); pthread_join(t[2], 0); } ==> g.cc <=int g; int foo(int cond) { if (cond) return g; // If we replace 'g' with g*g*g*g, the benchmark becomes 4x faster. return 0; } void bar(int i) { g = i; } % clang if-bench.cc g.cc -lpthread -O1 && time ./a.out --kcc> If the two threads are running on different cores, then we have likely > introduced some false sharing and have forced the two cores to exchange > some bus traffic for cache coherency, even though is is completely valid in > the C++ abstract machine model for the load of g to return a stale cache > value. This is only a real problem on strongly ordered architectures > (e.g. x86), but given the relative cost of a cache shootdown and everything > else in this test case (with the exception of the thread creation), I > wouldn't be surprised if it ended up slowing things down. Especially given > that a modern superscalar CPU will speculatively execute the load ANYWAY if > it can do so from cache, and if it can't then the performance improvement > from doing it before the branch will likely be negligible. > > For single-core, in-order, single-issue architectures, or multicore, > weakly ordered, in-order, single-issue architectures, it's probably a win... > > David > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131120/c953f891/attachment.html>
Maybe Matching Threads
- [LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)
- [LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)
- [LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)
- [LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)
- [LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)