Nicolai Hähnle via llvm-dev
2020-Aug-09 15:15 UTC
[llvm-dev] _mm_lfence in both pathes of an if/else are hoisted by SimplfyCFG potentially breaking use as a speculation barrier
Hi Craig,

The review for the similar GPU problem is now up here:
https://reviews.llvm.org/D85603 (+ some other patches on the
Phabricator stack).

From a pragmatic perspective, the constraints added to program
transforms there are sufficient for what you need. You'd produce IR
such as:

  %token = call token @llvm.experimental.convergence.anchor()
  br i1 %c, label %then, label %else

then:
  call void @llvm.x86.sse2.lfence() convergent [ "convergencectrl"(token %token) ]
  ...

else:
  call void @llvm.x86.sse2.lfence() convergent [ "convergencectrl"(token %token) ]
  ...

... and this would prevent the hoisting of the lfences.

The puzzle to me is whether one can justify this use of the
convergence tokens from a theoretical point of view. We describe
convergence control in terms of threads that communicate, which is a
faithful description of what's happening in the GPU use case. I wonder
whether for the speculative execution problem, one could justify the
use of the same convergence control machinery by arguing about the
existence of "potential speculative threads of execution" and
communication between them. Basically, the argument would be somewhere
along the lines that the lfence can only proceed execution once all
speculative threads of execution that it _cannot_ communicate with
according to the convergence token are killed off. I suspect that
somebody would have to go off and do some deep thinking for a while to
figure out whether that really makes sense.

Cheers,
Nicolai

On Wed, Jul 29, 2020 at 11:14 AM Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>
> Hi Craig,
>
> that's an interesting problem.
>
> We have a superficially similar problem in GPU programming models
> where there are cross-thread communication operations that are
> sensitive to control flow, as in:
>
>   if (c) {
>     b = subgroupAdd(a);
>     bar(b);
>   } else {
>     b = subgroupAdd(a);
>     baz(b);
>   }
>
> LLVM will merge those, even though it changes the behavior
> (potentially summing over a larger set of threads than in the original
> program). Merging them is inherently correct for LLVM's semantics.
> It's the same underlying problem as what you describe: LLVM IR simply
> doesn't have a way of describing these semantics that fall somewhat
> outside of a purely deterministic single-threaded execution model. For
> our needs, we're currently working around this by essentially adding a
> unique ID to each of these operations so that they all appear
> different to LLVM. I suspect that the same could work for you.
>
> Still, it's a bit of an awkward workaround and a better solution would
> be great. I've been wondering whether we could perhaps have token
> values produced by branch instructions to express certain kinds of
> dependencies. In your case, you'd end up with something like:
>
>   %token = br i1 %c, label %then, label %else
>
> then:
>   call void @llvm.x86.sse2.lfence() [ "some-bundle"(%token) ]
>   ...
>
> else:
>   call void @llvm.x86.sse2.lfence() [ "some-bundle"(%token) ]
>   ...
>
> The token indicates an essential control dependency on the branch
> instruction. I've previously rejected this idea as too invasive, and
> there are alternatives for our particular use case, but if there are
> multiple use cases for this kind of dependency -- and it kind of looks
> like it from where I stand -- then perhaps this is something to
> consider more seriously?
>
> Cheers,
> Nicolai
>
> On Wed, Jul 29, 2020 at 1:30 AM Craig Topper via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> >
> > _mm_lfence was originally documented as a load fence.
> > But in light of speculative execution vulnerabilities it has started
> > being advertised as a way to prevent speculative execution. The current
> > Intel Software Developer's Manual documents it as "Specifically, LFENCE
> > does not execute until all prior instructions have completed locally,
> > and no later instruction begins execution until LFENCE completes".
> >
> > For the following test, my intention was to ensure that the body of
> > either the if or the else would not proceed until any speculation of the
> > branch had resolved. But SimplifyCFG saw that both control paths started
> > with an lfence so hoisted it into a single lfence intrinsic before the
> > branch. https://godbolt.org/z/qMc446 The intrinsic in IR has no
> > properties so it should be assumed to read/write any memory. But that's
> > not enough to specify this control flow dependency. gcc also exhibits a
> > similar behavior.
> >
> > #include <x86intrin.h>
> >
> > void bar();
> > void baz();
> >
> > void foo(int c) {
> >   if (c) {
> >     _mm_lfence();
> >     bar();
> >   } else {
> >     _mm_lfence();
> >     baz();
> >   }
> > }
> >
> > Alternatively, I also tried replacing the intrinsics with inline
> > assembly. SimplifyCFG still merged those. But gcc did not.
> > https://godbolt.org/z/acnPxY
> >
> > void bar();
> > void baz();
> >
> > void foo(int c) {
> >   if (c) {
> >     __asm__ __volatile ("lfence");
> >     bar();
> >   } else {
> >     __asm__ __volatile ("lfence");
> >     baz();
> >   }
> > }
> >
> > I believe the [[clang::nomerge]] attribute was recently extended to
> > inline assembly, which can be used to prevent the inline assembly from
> > being hoisted by SimplifyCFG: https://reviews.llvm.org/D84225 It also
> > appears to work for the intrinsic version, but I think it's limited to
> > C++ only.
> >
> > Is there some existing property we can put on the intrinsic to prevent
> > SimplifyCFG from hoisting like this? Are we more aggressive than we
> > should be about hoisting inline assembly?
> >
> > Thanks,
> > ~Craig

-- 
Learn how the world really is,
but never forget how it should be.
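The "unique ID" workaround mentioned in the quoted message can be sketched in C with inline assembly. This is only an illustration under stated assumptions: the macro SPEC_BARRIER, its tag strings, and the counters are hypothetical names that do not appear in the thread, and the non-x86 fallback is a compiler-only barrier added so that the sketch builds elsewhere. The idea is that each site gets a different asm string (the tag is only an assembler comment), so the two statements no longer look identical to the optimizer. Whether a given compiler version then leaves them in place is not guaranteed and should be checked in the generated code.

```c
#include <assert.h>

/* Hypothetical helper: an lfence whose asm text carries a per-site tag.
   The tag is an assembler comment, so the emitted instruction is the same
   at every site; only the asm string seen by the compiler differs. */
#if defined(__x86_64__)
#  define SPEC_BARRIER(tag) __asm__ __volatile__("lfence # site " tag ::: "memory")
#else
/* Assumption: on other targets, fall back to a compiler-only barrier. */
#  define SPEC_BARRIER(tag) __asm__ __volatile__("" ::: "memory")
#endif

/* Stand-ins for the bar()/baz() of the original test case. */
static int bar_calls, baz_calls;
static void bar(void) { ++bar_calls; }
static void baz(void) { ++baz_calls; }

void foo(int c) {
    if (c) {
        SPEC_BARRIER("then");   /* distinct string from the else path */
        bar();
    } else {
        SPEC_BARRIER("else");
        baz();
    }
}

/* Small self-check: both paths still dispatch as before. */
void foo_self_check(void) {
    bar_calls = baz_calls = 0;
    foo(1);
    foo(0);
    assert(bar_calls == 1 && baz_calls == 1);
}
```

The tags only need to differ between sites; they carry no meaning of their own. As the quoted message says, this is an awkward workaround, since nothing in the IR records why the two barriers must stay on their own paths.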
Craig Topper via llvm-dev
2020-Aug-10 19:25 UTC
[llvm-dev] _mm_lfence in both pathes of an if/else are hoisted by SimplfyCFG potentially breaking use as a speculation barrier
Thanks Nicolai. I'll try to take a look at the review.

The user is the one calling _mm_lfence on a particular path. Would we need some IR transform to turn it into the IR you showed if it is used on two paths?

~Craig
Nicolai Hähnle via llvm-dev
2020-Aug-14 13:29 UTC
[llvm-dev] _mm_lfence in both pathes of an if/else are hoisted by SimplfyCFG potentially breaking use as a speculation barrier
Hi Craig,

On Mon, Aug 10, 2020 at 9:25 PM Craig Topper <craig.topper at gmail.com> wrote:
> Thanks Nicolai. I'll try to take a look at the review.
>
> The user is the one calling _mm_lfence on a particular path. Would we need some IR transform to turn it into the IR you showed if it is used on two paths?

Sorry, it took me a while to get around to this, and I'm still not sure how to answer this. Ideally, we'd have a good understanding of how to model these speculative execution side effects semantically in a way that can be exposed in a high-level language like C or C++, but I don't know how to do that.

Pragmatically, you could just make _mm_lfence `convergent`. With D85603 that would make it an "uncontrolled convergent operation", which would prevent the transform in practice. The document in that review calls those deprecated, because I didn't find a good way of describing what an uncontrolled convergent operation means. So there's a (small) risk that some future change based on a changing understanding of those operations would make the transform reappear. You could probably just add a test case to try to catch that.

A more thorough but also more involved option would be for the Clang frontend to generate convergence control intrinsics using an approach similar to the ConvergenceControlHeuristic of D85609, but based on the C++ control flow constructs, in those functions that contain calls to convergent intrinsics.

Technically it's true that you strictly speaking only need to do any of this if the intrinsic appears on two different paths. The question is whether you can reliably detect that, given inlining and other transforms. So it's probably best to just always make it `convergent`.
The question is whether perhaps you'd want two variants of the intrinsic, one that's `convergent` and another that's not, where the latter is intended to be used by programs that only care about the architectural memory model effects of the intrinsic and not about its impact on speculative execution.

Cheers,
Nicolai
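The "two variants" idea can be approximated at the source level, as a rough sketch rather than the intrinsic change discussed above. Everything named here (load_fence, speculation_barrier, pick, SPEC_CONVERGENT) is hypothetical and not from the thread. The sketch assumes Clang's `convergent` function attribute is available and falls back to nothing on compilers that lack it. It also assumes that keeping the wrapper out of line is what lets the attribute stay attached to each call. An out-of-line call has its own cost and its own speculation behavior, so this is an illustration of the split, not a vetted mitigation.

```c
#include <assert.h>

/* Use Clang's `convergent` function attribute when the compiler knows it. */
#if defined(__has_attribute)
#  if __has_attribute(convergent)
#    define SPEC_CONVERGENT __attribute__((convergent))
#  endif
#endif
#ifndef SPEC_CONVERGENT
#  define SPEC_CONVERGENT /* assumption: no equivalent on this compiler */
#endif

#if defined(__x86_64__)
#  define LFENCE_ASM() __asm__ __volatile__("lfence" ::: "memory")
#else
/* Assumption: compiler-only barrier so the sketch builds on other targets. */
#  define LFENCE_ASM() __asm__ __volatile__("" ::: "memory")
#endif

/* Variant 1 (hypothetical): for code that only cares about the architectural
   memory-ordering effect. The optimizer may place it freely. */
static inline void load_fence(void) { LFENCE_ASM(); }

/* Variant 2 (hypothetical): meant as a speculation barrier tied to its path.
   Kept out of line so calls to it carry the convergent attribute. */
SPEC_CONVERGENT __attribute__((noinline)) void speculation_barrier(void) {
    LFENCE_ASM();
}

/* Same shape as the original test: a barrier at the start of each path. */
int pick(int c) {
    if (c) {
        speculation_barrier();
        return 1;
    } else {
        speculation_barrier();
        return 2;
    }
}

/* Memory-ordering use: fence, then read. */
int load_then_read(const int *p) {
    load_fence();
    return *p;
}

/* Small self-check of the observable behavior only. */
void variants_self_check(void) {
    int value = 7;
    assert(pick(1) == 1);
    assert(pick(0) == 2);
    assert(load_then_read(&value) == 7);
}
```

As suggested earlier in this message, a regression test that inspects the generated IR or assembly for pick() is the only way to confirm that both barriers remain on their own paths with a particular compiler version.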