thr3ads.net - llvm dev - [llvm-dev] _mm_lfence in both pathes of an if/else are hoisted by SimplfyCFG potentially breaking use as a speculation barrier [Jul 2020]

Craig Topper via llvm-dev

2020-Jul-28 23:30 UTC

[llvm-dev] _mm_lfence in both pathes of an if/else are hoisted by SimplfyCFG potentially breaking use as a speculation barrier

_mm_lfence was originally documented as a load fence. But in light of
speculative execution vulnerabilities, it has started being advertised as a
way to prevent speculative execution. The current Intel Software Developer's
Manual describes it as follows: "Specifically, LFENCE does not execute until
all prior instructions have completed locally, and no later instruction
begins execution until LFENCE completes".

For the following test, my intention was to ensure that the body of either
the if or the else would not proceed until any speculation of the branch
had resolved. But SimplifyCFG saw that both control paths started with an
lfence, so it hoisted them into a single lfence intrinsic call before the
branch. https://godbolt.org/z/qMc446    The intrinsic in IR has no
properties, so it should be assumed to read/write any memory. But that's not
enough to express this control flow dependency. gcc exhibits similar
behavior.

#include <x86intrin.h>

void bar();
void baz();

void foo(int c) {
  if (c) {
      _mm_lfence();
      bar();
  } else {
      _mm_lfence();
      baz();
  }
}
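
At the source level, the hoisting described above amounts to roughly the following rewrite. This is only a sketch: the names FENCE, foo_as_written, foo_after_hoist and the call counters are hypothetical, bar and baz are given trivial bodies so the example links, and the non-SSE2 fallback is an assumption added so the sketch builds on other targets.

```c
#if defined(__SSE2__)
#include <emmintrin.h>            /* declares _mm_lfence */
#define FENCE() _mm_lfence()
#else
/* Assumption: compiler-only barrier so the sketch still builds elsewhere. */
#define FENCE() __asm__ __volatile__("" ::: "memory")
#endif

/* Hypothetical stand-ins for the undefined bar() and baz() in the test. */
int bar_calls = 0;
int baz_calls = 0;
void bar(void) { ++bar_calls; }
void baz(void) { ++baz_calls; }

/* As written: each path starts with its own fence, after the branch. */
void foo_as_written(int c) {
  if (c) {
    FENCE();
    bar();
  } else {
    FENCE();
    baz();
  }
}

/* What the hoisting effectively produces: one fence, before the branch.
   The two functions behave the same in the abstract machine, but the
   fence no longer sits after the branch on either path. */
void foo_after_hoist(int c) {
  FENCE();
  if (c) {
    bar();
  } else {
    baz();
  }
}
```

Both functions call the same callee for a given c, which is why the transform is legal under the IR semantics. The only difference is where the fence sits relative to the branch.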


Alternatively, I also tried replacing the intrinsics with inline assembly.
SimplifyCFG still merged those. But gcc did not.
https://godbolt.org/z/acnPxY

void bar();
void baz();

void foo(int c) {
  if (c) {
      __asm__ __volatile ("lfence");
      bar();
  } else {
      __asm__ __volatile ("lfence");
      baz();
  }
}

I believe the [[clang::nomerge]] attribute was recently extended to inline
assembly (https://reviews.llvm.org/D84225), and it can be used to prevent
the inline assembly from being hoisted by SimplifyCFG. It also appears to
work for the intrinsic version, but I think it's limited to C++ only.

Is there some existing property we can put on the intrinsic to prevent
SimplifyCFG from hoisting like this? Are we more aggressive than we should
be about hoisting inline assembly?

Thanks,
~Craig

Nicolai Hähnle via llvm-dev

2020-Jul-29 09:14 UTC

Hi Craig,

That's an interesting problem.

We have a superficially similar problem in GPU programming models
where there are cross-thread communication operations that are
sensitive to control flow, as in:

  if (c) {
    b = subgroupAdd(a);
    bar(b);
  } else {
    b = subgroupAdd(a);
    baz(b);
  }

LLVM will merge those, even though it changes the behavior
(potentially summing over a larger set of threads than in the original
program). Merging them is inherently correct for LLVM's semantics.
It's the same underlying problem as what you describe: LLVM IR simply
doesn't have a way of describing these semantics that fall somewhat
outside of a purely deterministic single-threaded execution model. For
our needs, we're currently working around this by essentially adding a
unique ID to each of these operations so that they all appear
different to LLVM. I suspect that the same could work for you.
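
A minimal sketch of how the unique-ID idea might be applied to the inline-assembly variant, assuming that an otherwise unused immediate input operand is enough to make the two asm statements non-identical to the optimizer. The macro name LFENCE_WITH_ID and the call counters are hypothetical, bar and baz are given trivial bodies so the example links, and whether this actually defeats hoisting in a given compiler version has not been verified here.

```c
#if defined(__x86_64__) || defined(__i386__)
/* The "i" operand is never referenced in the template. It exists only so
   that each use of the macro yields a distinct asm statement. */
#define LFENCE_WITH_ID(id) \
  __asm__ __volatile__("lfence" : : "i"(id) : "memory")
#else
/* Assumption: compiler-only barrier so the sketch builds on other targets. */
#define LFENCE_WITH_ID(id) \
  __asm__ __volatile__("" : : "i"(id) : "memory")
#endif

/* Hypothetical stand-ins for bar() and baz(). */
int bar_calls = 0;
int baz_calls = 0;
void bar(void) { ++bar_calls; }
void baz(void) { ++baz_calls; }

void foo(int c) {
  if (c) {
    LFENCE_WITH_ID(1);   /* ID 1: fence on the "then" path */
    bar();
  } else {
    LFENCE_WITH_ID(2);   /* ID 2: fence on the "else" path */
    baz();
  }
}
```

The IDs carry no meaning at run time. They only have to differ between the two sites, which also means every new fence site needs its own ID, and that is part of what makes the workaround awkward.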

Still, it's a bit of an awkward workaround and a better solution would
be great. I've been wondering whether we could perhaps have token
values produced by branch instructions to express certain kinds of
dependencies. In your case, you'd end up with something like:

    %token = br i1 %c, label %then, label %else

  then:
    call void @llvm.x86.sse2.lfence() [ "some-bundle"(%token) ]
     ...

  else:
    call void @llvm.x86.sse2.lfence() [ "some-bundle"(%token) ]
     ...

The token indicates an essential control dependency on the branch
instruction. I've previously rejected this idea as too invasive, and
there are alternatives for our particular use case, but if there are
multiple use cases for this kind of dependency -- and it kind of looks
like it from where I stand -- then perhaps this is something to
consider more seriously?

Cheers,
Nicolai


-- 
Learn how the world really is,
but never forget how it should be.

Nicolai Hähnle via llvm-dev

2020-Aug-09 15:15 UTC

Hi Craig,

The review for the similar GPU problem is now up here:
https://reviews.llvm.org/D85603 (+ some other patches on the
Phabricator stack).
From a pragmatic perspective, the constraints added to program
transforms there are sufficient for what you need. You'd produce IR
such as:

    %token = call token @llvm.experimental.convergence.anchor()
    br i1 %c, label %then, label %else

  then:
    call void @llvm.x86.sse2.lfence() convergent [ "convergencectrl"(token %token) ]
     ...

  else:
    call void @llvm.x86.sse2.lfence() convergent [ "convergencectrl"(token %token) ]
     ...

... and this would prevent the hoisting of the lfences.

The puzzle to me is whether one can justify this use of the
convergence tokens from a theoretical point of view. We describe
convergence control in terms of threads that communicate, which is a
faithful description of what's happening in the GPU use case. I wonder
whether for the speculative execution problem, one could justify the
use of the same convergence control machinery by arguing about the
existence of "potential speculative threads of execution" and
communication between them. Basically, the argument would be somewhere
along the lines that the lfence can only proceed execution once all
speculative threads of execution that it _cannot_ communicate with
according to the convergence token are killed off. I suspect that
somebody would have to go off and do some deep thinking for a while to
figure out whether that really makes sense.

Cheers,
Nicolai

