thr3ads.net - llvm dev - [llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics") [Feb 2016]

Sanjoy Das via llvm-dev

2016-Feb-24 23:57 UTC

[llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")

Hi all,

This is something that came up in the "RFC: Add guard intrinsics to
LLVM" thread; and while I'm not exactly blocked on this, figuring out
a path forward here will be helpful in deciding if we can use the
available_externally linkage type to express certain semantic
properties guard intrinsics will have.

Let's start with an example that shows that we have a problem (direct
copy/paste from the guard intrinsics thread). Say we have:

```
void foo() available_externally {
  %t0 = load atomic %ptr
  %t1 = load atomic %ptr
  if (%t0 != %t1) print("X");
}
void main() {
  foo();
  print("Y");
}
```

The possible behaviors of the above program are {print("X"),
print("Y")} or {print("Y")}.  But if we run opt then we have

```
void foo() available_externally readnone nounwind {
  ;; After CSE'ing the two loads and folding the condition
}
void main() {
  foo();
  print("Y");
}
```

and some generic reordering

```
void foo() available_externally readnone nounwind {
  ;; After CSE'ing the two loads and folding the condition
}
void main() {
  print("Y");
  foo();  // legal since we're moving a readnone nounwind function that
          // was guaranteed to execute (hence can't have UB)
}
```

If we do not inline @foo(), and instead re-link the call site in @main
to some non-optimized copy (or differently optimized copy) of @foo,
then it is possible for the program to have the behavior {print("Y");
print ("X")}, which was disallowed in the earlier program.

In other words, opt refined the semantics of @foo() (i.e. reduced the
set of behaviors it may have) in ways that would make later
optimizations invalid if we de-refine the implementation of @foo().
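
As a rough illustration of the refinement argument above, the following C++ sketch models each program as the set of output traces it may produce. The names `Behaviors`, `sequence`, `refines`, and the trace strings are hypothetical and are not part of LLVM; they only encode the example from this message, under the assumption that a trace is just the concatenation of what gets printed.

```cpp
#include <set>
#include <string>

// A trace is the text a run prints; a program is the set of traces it allows.
using Trace = std::string;
using Behaviors = std::set<Trace>;

// All traces of running `first` and then `second`.
Behaviors sequence(const Behaviors& first, const Behaviors& second) {
    Behaviors out;
    for (const Trace& a : first)
        for (const Trace& b : second)
            out.insert(a + b);
    return out;
}

// True if every trace of `candidate` is also allowed by `spec`.
bool refines(const Behaviors& candidate, const Behaviors& spec) {
    for (const Trace& t : candidate)
        if (spec.count(t) == 0) return false;
    return true;
}

// Unrefined @foo may print "X" or nothing; the CSE'd @foo prints nothing.
const Behaviors foo_unrefined = {"", "X"};
const Behaviors foo_refined = {""};
const Behaviors print_y = {"Y"};

// main() as written: foo(); print("Y");
Behaviors original_main() { return sequence(foo_unrefined, print_y); }

// main() after reordering, still calling the refined @foo.
Behaviors reordered_main_refined() { return sequence(print_y, foo_refined); }

// main() after reordering, but re-linked against the unrefined @foo.
Behaviors reordered_main_relinked() { return sequence(print_y, foo_unrefined); }
```

In this model `refines(reordered_main_refined(), original_main())` holds, while `refines(reordered_main_relinked(), original_main())` does not, because the trace "YX" is not among the traces of the original program.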

The above example is clearly fabricated, but such cases can come up
even if everything is optimized to the same level.  E.g. one of the
atomic loads in the unrefined implementation of @foo() could have been
hidden behind a function call, whose body existed in only one module.
That module would then be able to refine @foo() to `ret void` but
other modules won't.

The only solution I can think of is to redefine available_externally
to mean "the only kind of IPO/IPA you can do over a call to this
function is to inline it".  Redefining available_externally this way
will also let us soundly use it to represent calls to functions that
have guard intrinsics, since a failed guard intrinsic basically
replaces the function with a "very de-refined" implementation (the
interpreter).

What do you think?  I don't think implementing the above will be
very difficult, but needless to say, it will still be a fairly
non-trivial semantic change (hence I'm not directly jumping to
implementation).


-- Sanjoy

Duncan P. N. Exon Smith via llvm-dev

2016-Feb-25 02:51 UTC

> On 2016-Feb-24, at 15:57, Sanjoy Das via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Hi all,
> 
> This is something that came up in the "RFC: Add guard intrinsics to
> LLVM" thread; and while I'm not exactly blocked on this, figuring out
> a path forward here will be helpful in deciding if we can use the
> available_externally linkage type to express certain semantic
> properties guard intrinsics will have.
> 
> Let's start with an example that shows that we have a problem (direct
> copy/paste from the guard intrinsics thread). Say we have:
> 
> ```
> void foo() available_externally {
>  %t0 = load atomic %ptr
>  %t1 = load atomic %ptr
>  if (%t0 != %t1) print("X");
> }
> void main() {
>  foo();
>  print("Y");
> }
> ```
> 
> The possible behaviors of the above program are {print("X"),
> print("Y")} or {print("Y")}.  But if we run opt then we have
> 
> ```
> void foo() available_externally readnone nounwind {
>  ;; After CSE'ing the two loads and folding the condition
> }
> void main() {
>  foo();
>  print("Y");
> }
> ```
> 
> and some generic reordering
> 
> ```
> void foo() available_externally readnone nounwind {
>  ;; After CSE'ing the two loads and folding the condition
> }
> void main() {
>  print("Y");
>  foo();  // legal since we're moving a readnone nounwind function that
>          // was guaranteed to execute (hence can't have UB)
> }
> ```
> 
> If we do not inline @foo(), and instead re-link the call site in @main
> to some non-optimized copy (or differently optimized copy) of @foo,
> then it is possible for the program to have the behavior {print("Y");
> print ("X")}, which was disallowed in the earlier program.
> 
> In other words, opt refined the semantics of @foo() (i.e. reduced the
> set of behaviors it may have) in ways that would make later
> optimizations invalid if we de-refine the implementation of @foo().

I'm probably missing something obvious here.  How could the result of
`%t0 != %t1` be different at optimization time in one file than from
runtime in the "real" implementation?  Doesn't this make the CSE
invalid?

Does linkonce_odr linkage have the same problem?
- If so, do you want to change it too?
- Else, why not?

> The above example is clearly fabricated, but such cases can come up
> even if everything is optimized to the same level.  E.g. one of the
> atomic loads in the unrefined implementation of @foo() could have been
> hidden behind a function call, whose body existed in only one module.
> That module would then be able to refine @foo() to `ret void` but
> other modules won't.
> 
> The only solution I can think of is to redefine available_externally
> to mean "the only kind of IPO/IPA you can do over a call to this
> function is to inline it".  Redefining available_externally this way
> will also let us soundly use it to represent calls to functions that
> have guard intrinsics, since a failed guard intrinsic basically
> replaces the function with a "very de-refined" implementation (the
> interpreter).
> 
> What do you think?  I don't think implementing the above will be
> very difficult, but needless to say, it will still be a fairly
> non-trivial semantic change (hence I'm not directly jumping to
> implementation).

This linkage is used in three places (that I know of) by clang:

  1. C-style `inline` functions.
  2. Functions defined in C++ template classes with external explicit
     instantiations, e.g. S::foo() in:

         template <class T> struct S { void foo() {} };
         void bar() { S<int>().foo(); }
         extern template struct S<int>;

  3. -flto=thin cross-module function importing.

(No comment on (1); its exact semantics are a little fuzzy to me.)
For (2) and (3), the current behaviour seems correct, and I'd be
hesitant to lose optimizing power.  (2) is under the "ODR" rule, and
I think we've been applying the same logic to (3).  Unless, are you
saying ODR isn't enough?

Assuming you need this new definition (but under ODR, the semantics
are correct), I would rather split the linkage than change it.  E.g.,
use a name like available_externally_odr for (2) and (3).

Sanjoy Das via llvm-dev

2016-Feb-25 03:09 UTC

On Wed, Feb 24, 2016 at 6:51 PM, Duncan P. N. Exon Smith
<dexonsmith at apple.com> wrote:
>> If we do not inline @foo(), and instead re-link the call site in @main
>> to some non-optimized copy (or differently optimized copy) of @foo,
>> then it is possible for the program to have the behavior {print("Y");
>> print ("X")}, which was disallowed in the earlier program.
>>
>> In other words, opt refined the semantics of @foo() (i.e. reduced the
>> set of behaviors it may have) in ways that would make later
>> optimizations invalid if we de-refine the implementation of @foo().
>
> I'm probably missing something obvious here.  How could the result of
> `%t0 != %t1` be different at optimization time in one file than from
> runtime in the "real" implementation?  Doesn't this make the CSE
> invalid?

`%t0` and `%t1` are "allowed" to "always be the same", i.e. an
implementation of @foo that always feeds in the same value for `%t0`
and `%t1` is a valid implementation (which is why the CSE was valid);
but it is not the *only* valid implementation.  If I don't CSE the two
load instructions (also a valid thing to do), and there is a second
thread writing to `%ptr`, then the two values loaded can be different,
and you could end up printing `"X"` in `@foo`.
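
To make the point about the two loads concrete, here is a small single-threaded C++ sketch. It is only a simulation under stated assumptions: the `load` callback stands in for `load atomic %ptr`, and a callback that returns a changing value plays the role of the second thread writing to `%ptr`. The function names `foo_two_loads` and `foo_cse` are hypothetical.

```cpp
#include <functional>

// Unrefined @foo: two separate loads, compared afterwards.
// Returns true when "X" would be printed.
bool foo_two_loads(const std::function<int()>& load) {
    int t0 = load();
    int t1 = load();
    return t0 != t1;
}

// Refined @foo: the two loads were CSE'd into one, so both names hold
// the same value and the comparison always folds to false.
bool foo_cse(const std::function<int()>& load) {
    int t0 = load();
    int t1 = t0;
    return t0 != t1;
}
```

With a loader whose value changes between calls, `foo_two_loads` can report a difference while `foo_cse` never does. Both are valid implementations of the same source, which is why choosing one at optimization time and another at link time is a problem.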

Did that make sense?

> Does linkonce_odr linkage have the same problem?
> - If so, do you want to change it too?
> - Else, why not?

Going by the specification in the LangRef, I'd say it depends on how
you define "definitive".  If you're allowed to replace the body of a
function with a differently optimized body, then the above problem
exists.

>> The above example is clearly fabricated, but such cases can come up
>> even if everything is optimized to the same level.  E.g. one of the
>> atomic loads in the unrefined implementation of @foo() could have been
>> hidden behind a function call, whose body existed in only one module.
>> That module would then be able to refine @foo() to `ret void` but
>> other modules won't.
>>
>> The only solution I can think of is to redefine available_externally
>> to mean "the only kind of IPO/IPA you can do over a call to this
>> function is to inline it".  Redefining available_externally this way
>> will also let us soundly use it to represent calls to functions that
>> have guard intrinsics, since a failed guard intrinsic basically
>> replaces the function with a "very de-refined" implementation (the
>> interpreter).
>>
>> What do you think?  I don't think implementing the above will be
>> very difficult, but needless to say, it will still be a fairly
>> non-trivial semantic change (hence I'm not directly jumping to
>> implementation).
>
> This linkage is used in three places (that I know of) by clang:
>
>   1. C-style `inline` functions.
>   2. Functions defined in C++ template classes with external explicit
>      instantiations, e.g. S::foo() in:
>
>          template <class T> struct S { void foo() {} };
>          void bar() { S<int>().foo(); }
>          extern template struct S<int>;
>
>   3. -flto=thin cross-module function importing.
>
> (No comment on (1); its exact semantics are a little fuzzy to me.)
> For (2) and (3), the current behaviour seems correct, and I'd be
> hesitant to lose optimizing power.  (2) is under the "ODR" rule, and
> I think we've been applying the same logic to (3).  Unless, are you
> saying ODR isn't enough?

By ODR, do you mean you only have one definition of the function in
the whole link (i.e. across all modules you'll link together)?
Then yes, ODR should be enough to avoid this.  But in any place where
the linker sees two differently optimized definitions for a function
and picks one as the definitive version all non-inlined calls link to,
we have this problem.

> Assuming you need this new definition (but under ODR, the semantics
> are correct), I would rather split the linkage than change it.  E.g.,
> use a name like available_externally_odr for (2) and (3).

If what I said above is correct (i.e. ODR == OD across everything
you're linking into your final executable) then splitting the linkage
/ adding a new one is probably the best alternative.

-- Sanjoy

Xinliang David Li via llvm-dev

2016-Feb-25 06:23 UTC

We have seen similar issues with COMDAT in our production environment --
basically we cannot safely mix object files compiled with different -mxxx
options. One scenario is that a user puts target-specific
(multi-versioned) functions in one file and builds it with an option such as
-mavx. Such functions are called behind a runtime guard, so there is no issue.
However, COMDAT functions brought in from common headers can be problematic:
if the AVX version of such a function gets picked by the linker, the
program will crash when running on hardware without AVX. One proposal to
avoid this is to do function privatization.
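
A source-level analogue of the idea can be sketched in C++ as follows. This is an assumption about what privatization would achieve, not a description of the proposal itself, which presumably refers to a compiler-side transformation; `sum_shared` and `sum_private` are hypothetical helpers imagined to live in a common header.

```cpp
// Imagine this header is included both by a generic translation unit and
// by one that is built with -mavx (hypothetical setup).

// External linkage: on common ELF toolchains an inline function like this
// is typically emitted into a COMDAT group, and the linker keeps a single
// copy for the whole program, whichever object file it came from.
inline int sum_shared(const int* data, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) total += data[i];
    return total;
}

// Internal linkage: every translation unit gets its own private copy, so
// a copy compiled with -mavx cannot replace the one used by generic code.
namespace {
int sum_private(const int* data, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) total += data[i];
    return total;
}
}  // namespace
```

The trade-off is code size: each translation unit that uses `sum_private` carries its own copy instead of sharing one.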

thanks,

David

James Y Knight via llvm-dev

2016-Feb-25 19:41 UTC

While we're talking about this, I'd just mention again that the same issue
arises for *normal* functions too, when linked into a shared library:
   int foo() { return 1; }
   int bar() { return foo(); }

Now, compare:
  clang -fPIC -O1 -S -o - test.c
  gcc -fPIC -O1 -S -o - test.c

GCC will refuse to inline foo into bar, or use any information about foo in
compiling bar, because foo is exported in the dynamic symbol table, and
thus replaceable via symbol interposition.

Clang assumes that you won't do that, or that you don't care what happens
if you do. It will happily inline. And, in the absence of inlining (e.g. if foo
is too long to inline), clang will deduce function attributes about foo and
rely on those in bar -- even though the call goes through the PLT and
could in fact be an entirely different unrelated implementation (or, for
that matter, a differently-optimized version of the same implementation).

Is that *really* okay?
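
For readers who want to see the distinction in source form, here is a minimal C++ sketch using the GCC/Clang `visibility` attribute. It assumes an ELF target and a shared library built with -fPIC; the names `foo_exported` and `foo_hidden` are made up for illustration. GCC also has a `-fno-semantic-interposition` option that relaxes its default for exported functions.

```cpp
// Default visibility: exported in the dynamic symbol table of a shared
// library, so on ELF a different definition may be bound at load time.
__attribute__((visibility("default"))) int foo_exported() { return 1; }

// Hidden visibility: not exported from the shared library, so it cannot
// be interposed from outside and either compiler may use its body freely.
__attribute__((visibility("hidden"))) int foo_hidden() { return 1; }

// Whether the call to foo_exported() is inlined here is exactly the point
// on which the two compilers differ by default.
int bar() { return foo_exported() + foo_hidden(); }
```

Comparing the assembly for `bar` from both compilers, as in the commands above, is a reasonable way to check how each call is treated on a given toolchain.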


Hal Finkel via llvm-dev

2016-Feb-27 01:50 UTC

----- Original Message -----
> From: "James Y Knight via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "Sanjoy Das" <sanjoy at playingwithpointers.com>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Thursday, February 25, 2016 1:41:43 PM
> Subject: Re: [llvm-dev] Possible soundness issue with
> available_externally (split from "RFC: Add guard intrinsics")
>
> While we're talking about this, I'd just mention again that the same
> issue arises for *normal* functions too, when linked into a shared
> library:
>
>    int foo() { return 1; }
>    int bar() { return foo(); }
>
> Now, compare:
>
>   clang -fPIC -O1 -S -o - test.c
>   gcc -fPIC -O1 -S -o - test.c
>
> GCC will refuse to inline foo into bar, or use any information about
> foo in compiling bar, because foo is exported in the dynamic symbol
> table, and thus replaceable via symbol interposition.
>
> Clang assumes that you won't do that, or that you don't care what
> happens if you do. It will happily inline. And, in the absence of
> inlining (e.g. if foo is too long to inline), clang will deduce
> function attributes about foo and rely on those in bar -- despite
> that the call goes through the PLT and could in fact be an entirely
> different unrelated implementation (or, for that matter, a
> differently-optimized version of the same implementation).
>
> Is that *really* okay?

I'm comfortable with saying that symbol interposition falls outside of the
model we have for the targeted system (at least by default), and thus, this is
okay. We also don't model the possibility of someone hex-editing the binary ;)

-Hal 
-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 

Gerolf Hoflehner via llvm-dev

2016-Feb-27 06:16 UTC

> On Feb 25, 2016, at 11:41 AM, James Y Knight via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> While we're talking about this, I'd just mention again that the same
> issue arises for *normal* functions too, when linked into a shared library:
>    int foo() { return 1; }
>    int bar() { return foo(); }
> 
> Now, compare:
>   clang -fPIC -O1 -S -o - test.c
>   gcc -fPIC -O1 -S -o - test.c
> 
> GCC will refuse to inline foo into bar, or use any information about foo
> in compiling bar, because foo is exported in the dynamic symbol table,
> and thus replaceable via symbol interposition.
> 
> Clang assumes that you won't do that, or that you don't care what
> happens if you do. It will happily inline. And, in the absence of inlining
> (e.g. if foo is too long to inline), clang will deduce function attributes
> about foo and rely on those in bar -- despite that the call goes through
> the PLT and could in fact be an entirely different unrelated
> implementation (or, for that matter, a differently-optimized version of
> the same implementation).
> 
> Is that *really* okay?

+1

I agree. The problem goes deeper than just dealing with function attributes. The
question is which optimizations are allowed under an OS-specific preemption model.
The function attributes add a further need for clarification.

I think at the heart of this difference are assumptions about the OS preemption
model. Linux by default assumes that global data/functions are preemptable, so
in your example based on that model foo could not be inlined. You should also
see gp saves and restores around global calls for similar reasons, extra levels
of indirection when loading global data, etc.  An alternative model is to invert
the default by requiring preemptable data/functions to be marked. This is the
path that e.g. Windows has chosen with dllimport directives.

FWIW, my reading of available_externally is that although the function is/can be
preempted it still can be inlined, since the code of the external function will
match the definition in the module. The question about the legality of other
optimizations is similar to the question of which optimization is allowed in which
preemption model even w/o the attribute. However, I don’t have much experience
with the function attributes.

-Gerolf