thr3ads.net - llvm dev - [llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics") [Feb 2016]

If this information is useful, please help other people find it:
Share via:

Hal Finkel via llvm-dev

2016-Feb-25 05:34 UTC

[llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")

----- Original Message ----- 
> From: "Chandler Carruth via llvm-dev" <llvm-dev at
lists.llvm.org>
> To: "Philip Reames" <listmail at philipreames.com>,
"Duncan P. N. Exon
> Smith" <dexonsmith at apple.com>, "Sanjoy Das"
> <sanjoy at playingwithpointers.com>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Wednesday, February 24, 2016 10:29:23 PM
> Subject: Re: [llvm-dev] Possible soundness issue with
> available_externally (split from "RFC: Add guard intrinsics")
> Yea, I'm pretty sad about all of this. I'm also not seeing a lot of
> awesome paths forward.
> Here is the least bad strategy I can come up with. Curious if folks
> think this is sufficient:
I may not completely understand the problem, but this seems like overkill. The
underlying restriction is that, if the compiler makes a
non-determinism-collapsing choice when optimizing a function, it must make the
same choice for all definitions of that function (undefined behavior excluded).
Thus, with an externally_available function, the CSE in Sanjoy's original
example should be forbidden. Richard's example again demonstrates this
principle, although in this case the non-determinism is in the choice of a
globally-visible implementation technique rather than non-determinism from
memory-subsystem reordering.

There is a complication, which you imply in your proposal, that such
optimizations need to be forbidden not just in the externally_available
functions themselves, but in any local function transitively called by one.
This, however, we can take care of with an (easily-deduced) attribute.

In short, it is not clear to me that the number of problematic optimizations is
large (seems likely restricted to things involving atomics in practice), and
while I understand the auditing difficulties here, we should just restrict these
in appropriate contexts instead of trying to restrict all information flow into
or out of comdats.

 -Hal
> 1) Stop deducing function attributes within comdats by examining the
> bodies of the functions (so that we remain free to transform the
> bodies of functions).
> 2) Teach frontends to emit (even at O0!!!) trivially deduced function
> attributes for comdats so that we continue to catch easy cases.
> 3) Ensure and specify that we never hoist code *into* a comdat group
> in which it would not have been executed previously. I don't know of
> anything in LLVM that does this today, but it would become an
> important invariant.
> 4) Work a lot harder to do internalizing and removing of this
> restriction.
> Pretty horrible. But I think it is correct.
> As a slight modification to #1 and #2, we could have a very carefully
> crafted deduction rule where we only deduce function attributes for
> functions prior to any modification of their function bodies. Such
> attributes should be conservatively correct because we would never
> lift new code into the function bodies. This would at least allow us
> to do bottom-up deduction to catch interprocedural cases. But it
> would become incredibly subtle that this is only valid prior to
> *any* transformations of the comdat-containing functions.
> I'm starting to think this subtle rule might be worth it. But I'm
> frankly terrified by the implications.
> On Wed, Feb 24, 2016 at 8:13 PM Philip Reames via llvm-dev <
> llvm-dev at lists.llvm.org > wrote:
> > On 02/24/2016 08:10 PM, Duncan P. N. Exon Smith via llvm-dev wrote:
> 
> > >> On 2016-Feb-24, at 19:46, Sanjoy Das <
> > >> sanjoy at playingwithpointers.com > wrote:
> 
> > >>
> 
> > >> On Wed, Feb 24, 2016 at 7:38 PM, Chandler Carruth <
> > >> chandlerc at google.com > wrote:
> 
> > >>> On Wed, Feb 24, 2016 at 7:34 PM Duncan P. N. Exon Smith
> 
> > >>> < dexonsmith at apple.com > wrote:
> 
> > >>>>
> 
> > >>>>> On 2016-Feb-24, at 19:17, Chandler Carruth <
> > >>>>> chandlerc at google.com > wrote:
> 
> > >>>>>
> 
> > >>>>> On Wed, Feb 24, 2016 at 7:10 PM Sanjoy Das via
llvm-dev
> 
> > >>>>> < llvm-dev at lists.llvm.org > wrote:
> 
> > >>>>> On Wed, Feb 24, 2016 at 6:51 PM, Duncan P. N.
Exon Smith
> 
> > >>>>> < dexonsmith at apple.com > wrote:
> 
> > >>>>>>> If we do not inline @foo(), and instead
re-link the call
> > >>>>>>> site
> > >>>>>>> in
> 
> > >>>>>>> @main
> 
> > >>>>>>> to some non-optimized copy (or
differently optimized copy)
> > >>>>>>> of
> > >>>>>>> @foo,
> 
> > >>>>>>> then it is possible for the program to
have the behavior
> > >>>>>>> {print("Y");
> 
> > >>>>>>> print ("X")}, which was
disallowed in the earlier program.
> 
> > >>>>>>>
> 
> > >>>>>>> In other words, opt refined the semantics
of @foo() (i.e.
> > >>>>>>> reduced the
> 
> > >>>>>>> set of behaviors it may have) in ways
that would make later
> 
> > >>>>>>> optimizations invalid if we de-refine the
implementation of
> > >>>>>>> @foo().
> 
> > >>>>>> I'm probably missing something obvious
here. How could the
> > >>>>>> result of
> 
> > >>>>>> `%t0 != %t1` be different at optimization
time in one file
> > >>>>>> than from
> 
> > >>>>>> runtime in the "real"
implementation? Doesn't this make the
> > >>>>>> CSE
> 
> > >>>>>> invalid?
> 
> > >>>>> `%t0` and `%t1` are "allowed" to
"always be the same", i.e.
> > >>>>> an
> 
> > >>>>> implementation of @foo that always feeds in the
same
> 
> > >>>>> value for `%t0` and `%t1` is a valid
implementation (which is
> > >>>>> why the
> 
> > >>>>> CSE was valid); but it is not the *only* valid
> > >>>>> implementation.
> > >>>>> If I
> 
> > >>>>> don't CSE the two load instructions (also a
valid thing to
> > >>>>> do),
> > >>>>> and
> 
> > >>>>> this is a second thread writing to `%par`, then
the two
> > >>>>> values
> > >>>>> loaded
> 
> > >>>>> can be different, and you could end up printing
`"X"` in
> > >>>>> `@foo`.
> 
> > >>>>>
> 
> > >>>>> Did that make sense?
> 
> > >>>> Yes. To be sure I understand the scope: this is only
a problem
> > >>>> for
> 
> > >>>> atomics, correct? (Because multi-threaded behaviour
with other
> > >>>> globals
> 
> > >>>> is UB?)
> 
> > >>>>
> 
> > >>>>>> Does linkonce_odr linkage have the same
problem?
> 
> > >>>>>> - If so, do you want to change it too?
> 
> > >>>>>> - Else, why not?
> 
> > >>>>> Going by the specification in the LangRef,
I'd say it depends
> > >>>>> on how
> 
> > >>>>> you define "definitive". If you're
allowed to replace the
> > >>>>> body
> > >>>>> of a
> 
> > >>>>> function with a differently optimized body, then
the above
> > >>>>> problem
> 
> > >>>>> exists.
> 
> > >>>>>
> 
> > >>>>> I believe that is the case, and I strongly
believe the
> > >>>>> problem
> > >>>>> you
> 
> > >>>>> outline exists for linkonce_odr exactly as it
does for
> > >>>>> available_externally.
> 
> > >>>>>
> 
> > >>>>> Which is what makes this scary: every C++ inline
function
> > >>>>> today
> > >>>>> can
> 
> > >>>>> trigger this.
> 
> > >>>> Every C/C++ inline or template function. But only the
ones
> > >>>> that
> > >>>> use
> 
> > >>>> atomics, right?
> 
> > >>>
> 
> > >>> Well, with *this* example...
> 
> > >> Atomic are one source of non-determinism that compilers can
> > >> reason
> 
> > >> about. I don't know if the following snippet is well
defined or
> > >> not,
> 
> > >> but you could have similar issues with
> 
> > >>
> 
> > >>
> 
> > >> void foo() {
> 
> > >> int *p = malloc(sizeof(int));
> 
> > >> if (*p < 10) print("X");
> 
> > >> }
> 
> > >>
> 
> > >> or (again, I don't know if this is actually well defined)
> 
> > >>
> 
> > >> void foo() {
> 
> > >> int t; // it is probably reasonable to fold compares with
> 
> > >> ptrtoint(alloca) to undef
> 
> > >> if ((intptr_t)(&t) < 10) print("X");
> 
> > >> }
> 
> > >>
> 
> > > The first one at least is UB, but as Richard pointed out the
> > > scope
> 
> > > is certainly broader than atomics (it's not even just
> > > well-defined
> 
> > > non-deterministism).
> 
> > >
> 
> > > I'm kind of terrified by the implications.
> 
> > Me too. :(
> 
> > >
> 
> > >> -- Sanjoy
> 
> > >>
> 
> > >>>>
> 
> > >>>> Not that I'm sure that will end up being a
helpful
> > >>>> distinction.
> 
> > >>>
> 
> > >>> Right. See Richard's comment. I think that sums up
the real
> > >>> issue
> > >>> here. =/
> 
> > > _______________________________________________
> 
> > > LLVM Developers mailing list
> 
> > > llvm-dev at lists.llvm.org
> 
> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 
> > _______________________________________________
> 
> > LLVM Developers mailing list
> 
> > llvm-dev at lists.llvm.org
> 
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-- 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Chandler Carruth via llvm-dev

2016-Feb-25 05:41 UTC

head link

[llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")

On Wed, Feb 24, 2016 at 9:35 PM Hal Finkel <hfinkel at anl.gov> wrote:
> ----- Original Message -----
>
> > From: "Chandler Carruth via llvm-dev" <llvm-dev at
lists.llvm.org>
> > To: "Philip Reames" <listmail at philipreames.com>,
"Duncan P. N. Exon
> > Smith" <dexonsmith at apple.com>, "Sanjoy Das"
> > <sanjoy at playingwithpointers.com>
> > Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
> > Sent: Wednesday, February 24, 2016 10:29:23 PM
> > Subject: Re: [llvm-dev] Possible soundness issue with
> > available_externally (split from "RFC: Add guard
intrinsics")
>
> > Yea, I'm pretty sad about all of this. I'm also not seeing a
lot of
> > awesome paths forward.
>
> > Here is the least bad strategy I can come up with. Curious if folks
> > think this is sufficient:
>
> I may not completely understand the problem, but this seems like overkill.
> The underlying restriction is that, if the compiler makes a
> non-determinism-collapsing choice when optimizing a function, it must make
> the same choice for all definitions of that function (undefined behavior
> excluded).

This isn't enough, because some definition in some other module may *not be
optimized at all*, and yet may get selected at link time.

Put another way, it must *prove* that the same choice will *always* be made
for all definitions. This is akin to proving that the optimizer is run over
all translation units for C++ linkonce_odr functions, which you can't do.

The result would be failing to optimize the bodies of linkonce_odr
functions in any way which was externally detectable such as this. I think
that would be *much* worse than losing the ability to do function attribute
deduction for such functions?

Thus, with an externally_available function, the CSE in Sanjoy's
original> example should be forbidden. Richard's example again demonstrates this
> principle, although in this case the non-determinism is in the choice of a
> globally-visible implementation technique rather than non-determinism from
> memory-subsystem reordering.
>
> There is a complication, which you imply in your proposal, that such
> optimizations need to be forbidden not just in the externally_available
> functions themselves, but in any local function transitively called by one.
> This, however, we can take care of with an (easily-deduced) attribute.
>
> In short, it is not clear to me that the number of problematic
> optimizations is large (seems likely restricted to things involving atomics
> in practice), and while I understand the auditing difficulties here, we
> should just restrict these in appropriate contexts instead of trying to
> restrict all information flow into or out of comdats.
>
>  -Hal
>
> > 1) Stop deducing function attributes within comdats by examining the
> > bodies of the functions (so that we remain free to transform the
> > bodies of functions).
> > 2) Teach frontends to emit (even at O0!!!) trivially deduced function
> > attributes for comdats so that we continue to catch easy cases.
> > 3) Ensure and specify that we never hoist code *into* a comdat group
> > in which it would not have been executed previously. I don't know
of
> > anything in LLVM that does this today, but it would become an
> > important invariant.
> > 4) Work a lot harder to do internalizing and removing of this
> > restriction.
>
> > Pretty horrible. But I think it is correct.
>
> > As a slight modification to #1 and #2, we could have a very carefully
> > crafted deduction rule where we only deduce function attributes for
> > functions prior to any modification of their function bodies. Such
> > attributes should be conservatively correct because we would never
> > lift new code into the function bodies. This would at least allow us
> > to do bottom-up deduction to catch interprocedural cases. But it
> > would become incredibly subtle that this is only valid prior to
> > *any* transformations of the comdat-containing functions.
>
> > I'm starting to think this subtle rule might be worth it. But
I'm
> > frankly terrified by the implications.
>
> > On Wed, Feb 24, 2016 at 8:13 PM Philip Reames via llvm-dev <
> > llvm-dev at lists.llvm.org > wrote:
>
> > > On 02/24/2016 08:10 PM, Duncan P. N. Exon Smith via llvm-dev
wrote:
> >
> > > >> On 2016-Feb-24, at 19:46, Sanjoy Das <
> > > >> sanjoy at playingwithpointers.com > wrote:
> >
> > > >>
> >
> > > >> On Wed, Feb 24, 2016 at 7:38 PM, Chandler Carruth <
> > > >> chandlerc at google.com > wrote:
> >
> > > >>> On Wed, Feb 24, 2016 at 7:34 PM Duncan P. N. Exon
Smith
> >
> > > >>> < dexonsmith at apple.com > wrote:
> >
> > > >>>>
> >
> > > >>>>> On 2016-Feb-24, at 19:17, Chandler Carruth
<
> > > >>>>> chandlerc at google.com > wrote:
> >
> > > >>>>>
> >
> > > >>>>> On Wed, Feb 24, 2016 at 7:10 PM Sanjoy Das
via llvm-dev
> >
> > > >>>>> < llvm-dev at lists.llvm.org > wrote:
> >
> > > >>>>> On Wed, Feb 24, 2016 at 6:51 PM, Duncan P.
N. Exon Smith
> >
> > > >>>>> < dexonsmith at apple.com > wrote:
> >
> > > >>>>>>> If we do not inline @foo(), and
instead re-link the call
> > > >>>>>>> site
> > > >>>>>>> in
> >
> > > >>>>>>> @main
> >
> > > >>>>>>> to some non-optimized copy (or
differently optimized copy)
> > > >>>>>>> of
> > > >>>>>>> @foo,
> >
> > > >>>>>>> then it is possible for the program
to have the behavior
> > > >>>>>>> {print("Y");
> >
> > > >>>>>>> print ("X")}, which was
disallowed in the earlier program.
> >
> > > >>>>>>>
> >
> > > >>>>>>> In other words, opt refined the
semantics of @foo() (i.e.
> > > >>>>>>> reduced the
> >
> > > >>>>>>> set of behaviors it may have) in
ways that would make later
> >
> > > >>>>>>> optimizations invalid if we
de-refine the implementation of
> > > >>>>>>> @foo().
> >
> > > >>>>>> I'm probably missing something
obvious here. How could the
> > > >>>>>> result of
> >
> > > >>>>>> `%t0 != %t1` be different at
optimization time in one file
> > > >>>>>> than from
> >
> > > >>>>>> runtime in the "real"
implementation? Doesn't this make the
> > > >>>>>> CSE
> >
> > > >>>>>> invalid?
> >
> > > >>>>> `%t0` and `%t1` are "allowed" to
"always be the same", i.e.
> > > >>>>> an
> >
> > > >>>>> implementation of @foo that always feeds in
the same
> >
> > > >>>>> value for `%t0` and `%t1` is a valid
implementation (which is
> > > >>>>> why the
> >
> > > >>>>> CSE was valid); but it is not the *only*
valid
> > > >>>>> implementation.
> > > >>>>> If I
> >
> > > >>>>> don't CSE the two load instructions
(also a valid thing to
> > > >>>>> do),
> > > >>>>> and
> >
> > > >>>>> this is a second thread writing to `%par`,
then the two
> > > >>>>> values
> > > >>>>> loaded
> >
> > > >>>>> can be different, and you could end up
printing `"X"` in
> > > >>>>> `@foo`.
> >
> > > >>>>>
> >
> > > >>>>> Did that make sense?
> >
> > > >>>> Yes. To be sure I understand the scope: this is
only a problem
> > > >>>> for
> >
> > > >>>> atomics, correct? (Because multi-threaded
behaviour with other
> > > >>>> globals
> >
> > > >>>> is UB?)
> >
> > > >>>>
> >
> > > >>>>>> Does linkonce_odr linkage have the same
problem?
> >
> > > >>>>>> - If so, do you want to change it too?
> >
> > > >>>>>> - Else, why not?
> >
> > > >>>>> Going by the specification in the LangRef,
I'd say it depends
> > > >>>>> on how
> >
> > > >>>>> you define "definitive". If
you're allowed to replace the
> > > >>>>> body
> > > >>>>> of a
> >
> > > >>>>> function with a differently optimized body,
then the above
> > > >>>>> problem
> >
> > > >>>>> exists.
> >
> > > >>>>>
> >
> > > >>>>> I believe that is the case, and I strongly
believe the
> > > >>>>> problem
> > > >>>>> you
> >
> > > >>>>> outline exists for linkonce_odr exactly as
it does for
> > > >>>>> available_externally.
> >
> > > >>>>>
> >
> > > >>>>> Which is what makes this scary: every C++
inline function
> > > >>>>> today
> > > >>>>> can
> >
> > > >>>>> trigger this.
> >
> > > >>>> Every C/C++ inline or template function. But
only the ones
> > > >>>> that
> > > >>>> use
> >
> > > >>>> atomics, right?
> >
> > > >>>
> >
> > > >>> Well, with *this* example...
> >
> > > >> Atomic are one source of non-determinism that compilers
can
> > > >> reason
> >
> > > >> about. I don't know if the following snippet is well
defined or
> > > >> not,
> >
> > > >> but you could have similar issues with
> >
> > > >>
> >
> > > >>
> >
> > > >> void foo() {
> >
> > > >> int *p = malloc(sizeof(int));
> >
> > > >> if (*p < 10) print("X");
> >
> > > >> }
> >
> > > >>
> >
> > > >> or (again, I don't know if this is actually well
defined)
> >
> > > >>
> >
> > > >> void foo() {
> >
> > > >> int t; // it is probably reasonable to fold compares
with
> >
> > > >> ptrtoint(alloca) to undef
> >
> > > >> if ((intptr_t)(&t) < 10) print("X");
> >
> > > >> }
> >
> > > >>
> >
> > > > The first one at least is UB, but as Richard pointed out the
> > > > scope
> >
> > > > is certainly broader than atomics (it's not even just
> > > > well-defined
> >
> > > > non-deterministism).
> >
> > > >
> >
> > > > I'm kind of terrified by the implications.
> >
> > > Me too. :(
> >
> > > >
> >
> > > >> -- Sanjoy
> >
> > > >>
> >
> > > >>>>
> >
> > > >>>> Not that I'm sure that will end up being a
helpful
> > > >>>> distinction.
> >
> > > >>>
> >
> > > >>> Right. See Richard's comment. I think that sums
up the real
> > > >>> issue
> > > >>> here. =/
> >
> > > > _______________________________________________
> >
> > > > LLVM Developers mailing list
> >
> > > > llvm-dev at lists.llvm.org
> >
> > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >
>
> > > _______________________________________________
> >
> > > LLVM Developers mailing list
> >
> > > llvm-dev at lists.llvm.org
> >
> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >
>
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
> --
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160225/09a736c4/attachment.html>

Hal Finkel via llvm-dev

2016-Feb-25 06:00 UTC

head link

[llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")

----- Original Message ----- 
> From: "Chandler Carruth" <chandlerc at google.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "Philip
Reames"
> <listmail at philipreames.com>, "Duncan P. N. Exon Smith"
> <dexonsmith at apple.com>, "Sanjoy Das"
> <sanjoy at playingwithpointers.com>
> Sent: Wednesday, February 24, 2016 11:41:59 PM
> Subject: Re: [llvm-dev] Possible soundness issue with
> available_externally (split from "RFC: Add guard intrinsics")
> On Wed, Feb 24, 2016 at 9:35 PM Hal Finkel < hfinkel at anl.gov >
wrote:
> > ----- Original Message -----
> 
> > > From: "Chandler Carruth via llvm-dev" < llvm-dev at
lists.llvm.org >
> 
> > > To: "Philip Reames" < listmail at philipreames.com
>, "Duncan P. N.
> > > Exon
> 
> > > Smith" < dexonsmith at apple.com >, "Sanjoy
Das"
> 
> > > < sanjoy at playingwithpointers.com >
> 
> > > Cc: "llvm-dev" < llvm-dev at lists.llvm.org >
> 
> > > Sent: Wednesday, February 24, 2016 10:29:23 PM
> 
> > > Subject: Re: [llvm-dev] Possible soundness issue with
> 
> > > available_externally (split from "RFC: Add guard
intrinsics")
> 
> > > Yea, I'm pretty sad about all of this. I'm also not
seeing a lot
> > > of
> 
> > > awesome paths forward.
> 
> > > Here is the least bad strategy I can come up with. Curious if
> > > folks
> 
> > > think this is sufficient:
> 
> > I may not completely understand the problem, but this seems like
> > overkill. The underlying restriction is that, if the compiler makes
> > a non-determinism-collapsing choice when optimizing a function, it
> > must make the same choice for all definitions of that function
> > (undefined behavior excluded).
> 
> This isn't enough, because some definition in some other module may
> *not be optimized at all*, and yet may get selected at link time.
> Put another way, it must *prove* that the same choice will *always*
> be made for all definitions. This is akin to proving that the
> optimizer is run over all translation units for C++ linkonce_odr
> functions, which you can't do.
Sure; which is way I said that we should not perform those optimizations
(instead of saying that we just need to make sure that the same choice will be
made everywhere - as you say, LTO aside, we can't do that).
> The result would be failing to optimize the bodies of linkonce_odr
> functions in any way which was externally detectable such as this. I
> think that would be *much* worse than losing the ability to do
> function attribute deduction for such functions?
But it is not all optimizations that are the problem. Rather, it seems like a
select few (e.g. things involving collapsing allowed non-determinism in
atomics), and losing those optimizations seems better than generally losing
function-attribute deduction.

 -Hal
> > Thus, with an externally_available function, the CSE in Sanjoy's
> > original example should be forbidden. Richard's example again
> > demonstrates this principle, although in this case the
> > non-determinism is in the choice of a globally-visible
> > implementation technique rather than non-determinism from
> > memory-subsystem reordering.
> 
> > There is a complication, which you imply in your proposal, that
> > such
> > optimizations need to be forbidden not just in the
> > externally_available functions themselves, but in any local
> > function
> > transitively called by one. This, however, we can take care of with
> > an (easily-deduced) attribute.
> 
> > In short, it is not clear to me that the number of problematic
> > optimizations is large (seems likely restricted to things involving
> > atomics in practice), and while I understand the auditing
> > difficulties here, we should just restrict these in appropriate
> > contexts instead of trying to restrict all information flow into or
> > out of comdats.
> 
> > -Hal
> 
> > > 1) Stop deducing function attributes within comdats by examining
> > > the
> 
> > > bodies of the functions (so that we remain free to transform the
> 
> > > bodies of functions).
> 
> > > 2) Teach frontends to emit (even at O0!!!) trivially deduced
> > > function
> 
> > > attributes for comdats so that we continue to catch easy cases.
> 
> > > 3) Ensure and specify that we never hoist code *into* a comdat
> > > group
> 
> > > in which it would not have been executed previously. I don't
know
> > > of
> 
> > > anything in LLVM that does this today, but it would become an
> 
> > > important invariant.
> 
> > > 4) Work a lot harder to do internalizing and removing of this
> 
> > > restriction.
> 
> > > Pretty horrible. But I think it is correct.
> 
> > > As a slight modification to #1 and #2, we could have a very
> > > carefully
> 
> > > crafted deduction rule where we only deduce function attributes
> > > for
> 
> > > functions prior to any modification of their function bodies.
> > > Such
> 
> > > attributes should be conservatively correct because we would
> > > never
> 
> > > lift new code into the function bodies. This would at least allow
> > > us
> 
> > > to do bottom-up deduction to catch interprocedural cases. But it
> 
> > > would become incredibly subtle that this is only valid prior to
> 
> > > *any* transformations of the comdat-containing functions.
> 
> > > I'm starting to think this subtle rule might be worth it. But
I'm
> 
> > > frankly terrified by the implications.
> 
> > > On Wed, Feb 24, 2016 at 8:13 PM Philip Reames via llvm-dev <
> 
> > > llvm-dev at lists.llvm.org > wrote:
> 
> > > > On 02/24/2016 08:10 PM, Duncan P. N. Exon Smith via llvm-dev
> > > > wrote:
> 
> > >
> 
> > > > >> On 2016-Feb-24, at 19:46, Sanjoy Das <
> 
> > > > >> sanjoy at playingwithpointers.com > wrote:
> 
> > >
> 
> > > > >>
> 
> > >
> 
> > > > >> On Wed, Feb 24, 2016 at 7:38 PM, Chandler Carruth
<
> 
> > > > >> chandlerc at google.com > wrote:
> 
> > >
> 
> > > > >>> On Wed, Feb 24, 2016 at 7:34 PM Duncan P. N.
Exon Smith
> 
> > >
> 
> > > > >>> < dexonsmith at apple.com > wrote:
> 
> > >
> 
> > > > >>>>
> 
> > >
> 
> > > > >>>>> On 2016-Feb-24, at 19:17, Chandler
Carruth <
> 
> > > > >>>>> chandlerc at google.com > wrote:
> 
> > >
> 
> > > > >>>>>
> 
> > >
> 
> > > > >>>>> On Wed, Feb 24, 2016 at 7:10 PM Sanjoy
Das via llvm-dev
> 
> > >
> 
> > > > >>>>> < llvm-dev at lists.llvm.org >
wrote:
> 
> > >
> 
> > > > >>>>> On Wed, Feb 24, 2016 at 6:51 PM, Duncan
P. N. Exon Smith
> 
> > >
> 
> > > > >>>>> < dexonsmith at apple.com >
wrote:
> 
> > >
> 
> > > > >>>>>>> If we do not inline @foo(), and
instead re-link the
> > > > >>>>>>> call
> 
> > > > >>>>>>> site
> 
> > > > >>>>>>> in
> 
> > >
> 
> > > > >>>>>>> @main
> 
> > >
> 
> > > > >>>>>>> to some non-optimized copy (or
differently optimized
> > > > >>>>>>> copy)
> 
> > > > >>>>>>> of
> 
> > > > >>>>>>> @foo,
> 
> > >
> 
> > > > >>>>>>> then it is possible for the
program to have the
> > > > >>>>>>> behavior
> 
> > > > >>>>>>> {print("Y");
> 
> > >
> 
> > > > >>>>>>> print ("X")}, which
was disallowed in the earlier
> > > > >>>>>>> program.
> 
> > >
> 
> > > > >>>>>>>
> 
> > >
> 
> > > > >>>>>>> In other words, opt refined the
semantics of @foo()
> > > > >>>>>>> (i.e.
> 
> > > > >>>>>>> reduced the
> 
> > >
> 
> > > > >>>>>>> set of behaviors it may have)
in ways that would make
> > > > >>>>>>> later
> 
> > >
> 
> > > > >>>>>>> optimizations invalid if we
de-refine the
> > > > >>>>>>> implementation
> > > > >>>>>>> of
> 
> > > > >>>>>>> @foo().
> 
> > >
> 
> > > > >>>>>> I'm probably missing something
obvious here. How could
> > > > >>>>>> the
> 
> > > > >>>>>> result of
> 
> > >
> 
> > > > >>>>>> `%t0 != %t1` be different at
optimization time in one
> > > > >>>>>> file
> 
> > > > >>>>>> than from
> 
> > >
> 
> > > > >>>>>> runtime in the "real"
implementation? Doesn't this make
> > > > >>>>>> the
> 
> > > > >>>>>> CSE
> 
> > >
> 
> > > > >>>>>> invalid?
> 
> > >
> 
> > > > >>>>> `%t0` and `%t1` are "allowed"
to "always be the same",
> > > > >>>>> i.e.
> 
> > > > >>>>> an
> 
> > >
> 
> > > > >>>>> implementation of @foo that always
feeds in the same
> 
> > >
> 
> > > > >>>>> value for `%t0` and `%t1` is a valid
implementation
> > > > >>>>> (which
> > > > >>>>> is
> 
> > > > >>>>> why the
> 
> > >
> 
> > > > >>>>> CSE was valid); but it is not the
*only* valid
> 
> > > > >>>>> implementation.
> 
> > > > >>>>> If I
> 
> > >
> 
> > > > >>>>> don't CSE the two load instructions
(also a valid thing
> > > > >>>>> to
> 
> > > > >>>>> do),
> 
> > > > >>>>> and
> 
> > >
> 
> > > > >>>>> this is a second thread writing to
`%par`, then the two
> 
> > > > >>>>> values
> 
> > > > >>>>> loaded
> 
> > >
> 
> > > > >>>>> can be different, and you could end up
printing `"X"` in
> 
> > > > >>>>> `@foo`.
> 
> > >
> 
> > > > >>>>>
> 
> > >
> 
> > > > >>>>> Did that make sense?
> 
> > >
> 
> > > > >>>> Yes. To be sure I understand the scope:
this is only a
> > > > >>>> problem
> 
> > > > >>>> for
> 
> > >
> 
> > > > >>>> atomics, correct? (Because multi-threaded
behaviour with
> > > > >>>> other
> 
> > > > >>>> globals
> 
> > >
> 
> > > > >>>> is UB?)
> 
> > >
> 
> > > > >>>>
> 
> > >
> 
> > > > >>>>>> Does linkonce_odr linkage have the
same problem?
> 
> > >
> 
> > > > >>>>>> - If so, do you want to change it
too?
> 
> > >
> 
> > > > >>>>>> - Else, why not?
> 
> > >
> 
> > > > >>>>> Going by the specification in the
LangRef, I'd say it
> > > > >>>>> depends
> 
> > > > >>>>> on how
> 
> > >
> 
> > > > >>>>> you define "definitive". If
you're allowed to replace the
> 
> > > > >>>>> body
> 
> > > > >>>>> of a
> 
> > >
> 
> > > > >>>>> function with a differently optimized
body, then the
> > > > >>>>> above
> 
> > > > >>>>> problem
> 
> > >
> 
> > > > >>>>> exists.
> 
> > >
> 
> > > > >>>>>
> 
> > >
> 
> > > > >>>>> I believe that is the case, and I
strongly believe the
> 
> > > > >>>>> problem
> 
> > > > >>>>> you
> 
> > >
> 
> > > > >>>>> outline exists for linkonce_odr exactly
as it does for
> 
> > > > >>>>> available_externally.
> 
> > >
> 
> > > > >>>>>
> 
> > >
> 
> > > > >>>>> Which is what makes this scary: every
C++ inline function
> 
> > > > >>>>> today
> 
> > > > >>>>> can
> 
> > >
> 
> > > > >>>>> trigger this.
> 
> > >
> 
> > > > >>>> Every C/C++ inline or template function.
But only the ones
> 
> > > > >>>> that
> 
> > > > >>>> use
> 
> > >
> 
> > > > >>>> atomics, right?
> 
> > >
> 
> > > > >>>
> 
> > >
> 
> > > > >>> Well, with *this* example...
> 
> > >
> 
> > > > >> Atomic are one source of non-determinism that
compilers can
> 
> > > > >> reason
> 
> > >
> 
> > > > >> about. I don't know if the following snippet is
well defined
> > > > >> or
> 
> > > > >> not,
> 
> > >
> 
> > > > >> but you could have similar issues with
> 
> > >
> 
> > > > >>
> 
> > >
> 
> > > > >>
> 
> > >
> 
> > > > >> void foo() {
> 
> > >
> 
> > > > >> int *p = malloc(sizeof(int));
> 
> > >
> 
> > > > >> if (*p < 10) print("X");
> 
> > >
> 
> > > > >> }
> 
> > >
> 
> > > > >>
> 
> > >
> 
> > > > >> or (again, I don't know if this is actually
well defined)
> 
> > >
> 
> > > > >>
> 
> > >
> 
> > > > >> void foo() {
> 
> > >
> 
> > > > >> int t; // it is probably reasonable to fold
compares with
> 
> > >
> 
> > > > >> ptrtoint(alloca) to undef
> 
> > >
> 
> > > > >> if ((intptr_t)(&t) < 10)
print("X");
> 
> > >
> 
> > > > >> }
> 
> > >
> 
> > > > >>
> 
> > >
> 
> > > > > The first one at least is UB, but as Richard pointed
out the
> 
> > > > > scope
> 
> > >
> 
> > > > > is certainly broader than atomics (it's not even
just
> 
> > > > > well-defined
> 
> > >
> 
> > > > > non-deterministism).
> 
> > >
> 
> > > > >
> 
> > >
> 
> > > > > I'm kind of terrified by the implications.
> 
> > >
> 
> > > > Me too. :(
> 
> > >
> 
> > > > >
> 
> > >
> 
> > > > >> -- Sanjoy
> 
> > >
> 
> > > > >>
> 
> > >
> 
> > > > >>>>
> 
> > >
> 
> > > > >>>> Not that I'm sure that will end up
being a helpful
> 
> > > > >>>> distinction.
> 
> > >
> 
> > > > >>>
> 
> > >
> 
> > > > >>> Right. See Richard's comment. I think that
sums up the real
> 
> > > > >>> issue
> 
> > > > >>> here. =/
> 
> > >
> 
> > > > > _______________________________________________
> 
> > >
> 
> > > > > LLVM Developers mailing list
> 
> > >
> 
> > > > > llvm-dev at lists.llvm.org
> 
> > >
> 
> > > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 
> > >
> 
> > > > _______________________________________________
> 
> > >
> 
> > > > LLVM Developers mailing list
> 
> > >
> 
> > > > llvm-dev at lists.llvm.org
> 
> > >
> 
> > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 
> > >
> 
> > > _______________________________________________
> 
> > > LLVM Developers mailing list
> 
> > > llvm-dev at lists.llvm.org
> 
> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 
> > --
> 
> > --
> 
> > Hal Finkel
> 
> > Assistant Computational Scientist
> 
> > Leadership Computing Facility
> 
> > Argonne National Laboratory
> 
-- 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Mehdi Amini via llvm-dev

2016-Feb-25 06:02 UTC

head link

[llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")

> On Feb 24, 2016, at 9:41 PM, Chandler Carruth via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> On Wed, Feb 24, 2016 at 9:35 PM Hal Finkel <hfinkel at anl.gov
<mailto:hfinkel at anl.gov>> wrote:
> ----- Original Message -----
> 
> > From: "Chandler Carruth via llvm-dev" <llvm-dev at
lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>
> > To: "Philip Reames" <listmail at philipreames.com
<mailto:listmail at philipreames.com>>, "Duncan P. N. Exon
> > Smith" <dexonsmith at apple.com <mailto:dexonsmith at
apple.com>>, "Sanjoy Das"
> > <sanjoy at playingwithpointers.com <mailto:sanjoy at
playingwithpointers.com>>
> > Cc: "llvm-dev" <llvm-dev at lists.llvm.org
<mailto:llvm-dev at lists.llvm.org>>
> > Sent: Wednesday, February 24, 2016 10:29:23 PM
> > Subject: Re: [llvm-dev] Possible soundness issue with
> > available_externally (split from "RFC: Add guard
intrinsics")
> 
> > Yea, I'm pretty sad about all of this. I'm also not seeing a
lot of
> > awesome paths forward.
> 
> > Here is the least bad strategy I can come up with. Curious if folks
> > think this is sufficient:
> 
> I may not completely understand the problem, but this seems like overkill.
The underlying restriction is that, if the compiler makes a
non-determinism-collapsing choice when optimizing a function, it must make the
same choice for all definitions of that function (undefined behavior excluded).
> 
> This isn't enough, because some definition in some other module may
*not be optimized at all*, and yet may get selected at link time.
> 
> Put another way, it must *prove* that the same choice will *always* be made
for all definitions. This is akin to proving that the optimizer is run over all
translation units for C++ linkonce_odr functions, which you can't do.
Even if the optimizer is ran, it could take different decision because the
context would be different:

linkonce_odr foo() {
  bar();
}

If bar() is present in the TU it can gets inlined into foo(). So the optimizer
would optimize differently foo().

> 
> The result would be failing to optimize the bodies of linkonce_odr
functions in any way which was externally detectable such as this. I think that
would be *much* worse than losing the ability to do function attribute deduction
for such functions?
> 
> Thus, with an externally_available function, the CSE in Sanjoy's
original example should be forbidden. Richard's example again demonstrates
this principle, although in this case the non-determinism is in the choice of a
globally-visible implementation technique rather than non-determinism from
memory-subsystem reordering.
> 
> There is a complication, which you imply in your proposal, that such
optimizations need to be forbidden not just in the externally_available
functions themselves, but in any local function transitively called by one.
This, however, we can take care of with an (easily-deduced) attribute.
I'm not sure why " such optimizations need to be forbidden [...] in any
local function transitively called by one", can you illustrate with an
example?

-- 
Mehdi
> 
> In short, it is not clear to me that the number of problematic
optimizations is large (seems likely restricted to things involving atomics in
practice),
> and while I understand the auditing difficulties here, we should just
restrict these in appropriate contexts instead of trying to restrict all
information flow into or out of comdats.
> 
>  -Hal
> 
> > 1) Stop deducing function attributes within comdats by examining the
> > bodies of the functions (so that we remain free to transform the
> > bodies of functions).
> > 2) Teach frontends to emit (even at O0!!!) trivially deduced function
> > attributes for comdats so that we continue to catch easy cases.
> > 3) Ensure and specify that we never hoist code *into* a comdat group
> > in which it would not have been executed previously. I don't know
of
> > anything in LLVM that does this today, but it would become an
> > important invariant.
> > 4) Work a lot harder to do internalizing and removing of this
> > restriction.
> 
> > Pretty horrible. But I think it is correct.
> 
> > As a slight modification to #1 and #2, we could have a very carefully
> > crafted deduction rule where we only deduce function attributes for
> > functions prior to any modification of their function bodies. Such
> > attributes should be conservatively correct because we would never
> > lift new code into the function bodies. This would at least allow us
> > to do bottom-up deduction to catch interprocedural cases. But it
> > would become incredibly subtle that this is only valid prior to
> > *any* transformations of the comdat-containing functions.
> 
> > I'm starting to think this subtle rule might be worth it. But
I'm
> > frankly terrified by the implications.
> 
> > On Wed, Feb 24, 2016 at 8:13 PM Philip Reames via llvm-dev <
> > llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> wrote:
> 
> > > On 02/24/2016 08:10 PM, Duncan P. N. Exon Smith via llvm-dev
wrote:
> >
> > > >> On 2016-Feb-24, at 19:46, Sanjoy Das <
> > > >> sanjoy at playingwithpointers.com <mailto:sanjoy at
playingwithpointers.com> > wrote:
> >
> > > >>
> >
> > > >> On Wed, Feb 24, 2016 at 7:38 PM, Chandler Carruth <
> > > >> chandlerc at google.com <mailto:chandlerc at
google.com> > wrote:
> >
> > > >>> On Wed, Feb 24, 2016 at 7:34 PM Duncan P. N. Exon
Smith
> >
> > > >>> < dexonsmith at apple.com <mailto:dexonsmith
at apple.com> > wrote:
> >
> > > >>>>
> >
> > > >>>>> On 2016-Feb-24, at 19:17, Chandler Carruth
<
> > > >>>>> chandlerc at google.com <mailto:chandlerc
at google.com> > wrote:
> >
> > > >>>>>
> >
> > > >>>>> On Wed, Feb 24, 2016 at 7:10 PM Sanjoy Das
via llvm-dev
> >
> > > >>>>> < llvm-dev at lists.llvm.org
<mailto:llvm-dev at lists.llvm.org> > wrote:
> >
> > > >>>>> On Wed, Feb 24, 2016 at 6:51 PM, Duncan P.
N. Exon Smith
> >
> > > >>>>> < dexonsmith at apple.com
<mailto:dexonsmith at apple.com> > wrote:
> >
> > > >>>>>>> If we do not inline @foo(), and
instead re-link the call
> > > >>>>>>> site
> > > >>>>>>> in
> >
> > > >>>>>>> @main
> >
> > > >>>>>>> to some non-optimized copy (or
differently optimized copy)
> > > >>>>>>> of
> > > >>>>>>> @foo,
> >
> > > >>>>>>> then it is possible for the program
to have the behavior
> > > >>>>>>> {print("Y");
> >
> > > >>>>>>> print ("X")}, which was
disallowed in the earlier program.
> >
> > > >>>>>>>
> >
> > > >>>>>>> In other words, opt refined the
semantics of @foo() (i.e.
> > > >>>>>>> reduced the
> >
> > > >>>>>>> set of behaviors it may have) in
ways that would make later
> >
> > > >>>>>>> optimizations invalid if we
de-refine the implementation of
> > > >>>>>>> @foo().
> >
> > > >>>>>> I'm probably missing something
obvious here. How could the
> > > >>>>>> result of
> >
> > > >>>>>> `%t0 != %t1` be different at
optimization time in one file
> > > >>>>>> than from
> >
> > > >>>>>> runtime in the "real"
implementation? Doesn't this make the
> > > >>>>>> CSE
> >
> > > >>>>>> invalid?
> >
> > > >>>>> `%t0` and `%t1` are "allowed" to
"always be the same", i.e.
> > > >>>>> an
> >
> > > >>>>> implementation of @foo that always feeds in
the same
> >
> > > >>>>> value for `%t0` and `%t1` is a valid
implementation (which is
> > > >>>>> why the
> >
> > > >>>>> CSE was valid); but it is not the *only*
valid
> > > >>>>> implementation.
> > > >>>>> If I
> >
> > > >>>>> don't CSE the two load instructions
(also a valid thing to
> > > >>>>> do),
> > > >>>>> and
> >
> > > >>>>> this is a second thread writing to `%par`,
then the two
> > > >>>>> values
> > > >>>>> loaded
> >
> > > >>>>> can be different, and you could end up
printing `"X"` in
> > > >>>>> `@foo`.
> >
> > > >>>>>
> >
> > > >>>>> Did that make sense?
> >
> > > >>>> Yes. To be sure I understand the scope: this is
only a problem
> > > >>>> for
> >
> > > >>>> atomics, correct? (Because multi-threaded
behaviour with other
> > > >>>> globals
> >
> > > >>>> is UB?)
> >
> > > >>>>
> >
> > > >>>>>> Does linkonce_odr linkage have the same
problem?
> >
> > > >>>>>> - If so, do you want to change it too?
> >
> > > >>>>>> - Else, why not?
> >
> > > >>>>> Going by the specification in the LangRef,
I'd say it depends
> > > >>>>> on how
> >
> > > >>>>> you define "definitive". If
you're allowed to replace the
> > > >>>>> body
> > > >>>>> of a
> >
> > > >>>>> function with a differently optimized body,
then the above
> > > >>>>> problem
> >
> > > >>>>> exists.
> >
> > > >>>>>
> >
> > > >>>>> I believe that is the case, and I strongly
believe the
> > > >>>>> problem
> > > >>>>> you
> >
> > > >>>>> outline exists for linkonce_odr exactly as
it does for
> > > >>>>> available_externally.
> >
> > > >>>>>
> >
> > > >>>>> Which is what makes this scary: every C++
inline function
> > > >>>>> today
> > > >>>>> can
> >
> > > >>>>> trigger this.
> >
> > > >>>> Every C/C++ inline or template function. But
only the ones
> > > >>>> that
> > > >>>> use
> >
> > > >>>> atomics, right?
> >
> > > >>>
> >
> > > >>> Well, with *this* example...
> >
> > > >> Atomic are one source of non-determinism that compilers
can
> > > >> reason
> >
> > > >> about. I don't know if the following snippet is well
defined or
> > > >> not,
> >
> > > >> but you could have similar issues with
> >
> > > >>
> >
> > > >>
> >
> > > >> void foo() {
> >
> > > >> int *p = malloc(sizeof(int));
> >
> > > >> if (*p < 10) print("X");
> >
> > > >> }
> >
> > > >>
> >
> > > >> or (again, I don't know if this is actually well
defined)
> >
> > > >>
> >
> > > >> void foo() {
> >
> > > >> int t; // it is probably reasonable to fold compares
with
> >
> > > >> ptrtoint(alloca) to undef
> >
> > > >> if ((intptr_t)(&t) < 10) print("X");
> >
> > > >> }
> >
> > > >>
> >
> > > > The first one at least is UB, but as Richard pointed out the
> > > > scope
> >
> > > > is certainly broader than atomics (it's not even just
> > > > well-defined
> >
> > > > non-deterministism).
> >
> > > >
> >
> > > > I'm kind of terrified by the implications.
> >
> > > Me too. :(
> >
> > > >
> >
> > > >> -- Sanjoy
> >
> > > >>
> >
> > > >>>>
> >
> > > >>>> Not that I'm sure that will end up being a
helpful
> > > >>>> distinction.
> >
> > > >>>
> >
> > > >>> Right. See Richard's comment. I think that sums
up the real
> > > >>> issue
> > > >>> here. =/
> >
> > > > _______________________________________________
> >
> > > > LLVM Developers mailing list
> >
> > > > llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>
> >
> > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
> >
> 
> > > _______________________________________________
> >
> > > LLVM Developers mailing list
> >
> > > llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>
> >
> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
> >
> 
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
> 
> --
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160224/669118b0/attachment-0001.html>

Pete Cooper via llvm-dev

2016-Feb-25 06:07 UTC

head link

[llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")

Sent from my iPhone
> On Feb 24, 2016, at 9:41 PM, Chandler Carruth via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
>> On Wed, Feb 24, 2016 at 9:35 PM Hal Finkel <hfinkel at anl.gov>
wrote:
>> ----- Original Message -----
>> 
>> > From: "Chandler Carruth via llvm-dev" <llvm-dev at
lists.llvm.org>
>> > To: "Philip Reames" <listmail at
philipreames.com>, "Duncan P. N. Exon
>> > Smith" <dexonsmith at apple.com>, "Sanjoy
Das"
>> > <sanjoy at playingwithpointers.com>
>> > Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
>> > Sent: Wednesday, February 24, 2016 10:29:23 PM
>> > Subject: Re: [llvm-dev] Possible soundness issue with
>> > available_externally (split from "RFC: Add guard
intrinsics")
>> 
>> > Yea, I'm pretty sad about all of this. I'm also not seeing
a lot of
>> > awesome paths forward.
>> 
>> > Here is the least bad strategy I can come up with. Curious if
folks
>> > think this is sufficient:
>> 
>> I may not completely understand the problem, but this seems like
overkill. The underlying restriction is that, if the compiler makes a
non-determinism-collapsing choice when optimizing a function, it must make the
same choice for all definitions of that function (undefined behavior excluded).
> 
> This isn't enough, because some definition in some other module may
*not be optimized at all*, and yet may get selected at link time.
> 
> Put another way, it must *prove* that the same choice will *always* be made
for all definitions. This is akin to proving that the optimizer is run over all
translation units for C++ linkonce_odr functions, which you can't do.
> 
> The result would be failing to optimize the bodies of linkonce_odr
functions in any way which was externally detectable such as this. I think that
would be *much* worse than losing the ability to do function attribute deduction
for such functions?
> If you are prior to optimising the function in any way (which I think is a
reasonable thing to require as you said before) then any attributes added there
will at least be conservatively correct. That will still allow a bunch of
optimisation, I hope!

So for the following function:

int foo(int *i) {
  if (*i == 1)
    return 0;
  return 1;
}

You can legitimately add readonly and propagate to callers that the returned
value has known zero bits > 1.

And for this similar function:

int foo(int *i, bool b) {
  if (b) {
    *i = 1;
    return 0;
  }
  return 1;
}

You could legitimately optimise a call 'foo(p, false)' so that the call
site propagates in the constants and knows that this call returns 1 for sure.

But could you constant propagate (without inlining) in to the call and add
readonly to the call instruction here? I think you can because you know that
passing false for b means that the function will not take the path that loads
memory.

Pete>> Thus, with an externally_available function, the CSE in Sanjoy's
original example should be forbidden. Richard's example again demonstrates
this principle, although in this case the non-determinism is in the choice of a
globally-visible implementation technique rather than non-determinism from
memory-subsystem reordering.
>> 
>> There is a complication, which you imply in your proposal, that such
optimizations need to be forbidden not just in the externally_available
functions themselves, but in any local function transitively called by one.
This, however, we can take care of with an (easily-deduced) attribute.
>> 
>> In short, it is not clear to me that the number of problematic
optimizations is large (seems likely restricted to things involving atomics in
practice), and while I understand the auditing difficulties here, we should just
restrict these in appropriate contexts instead of trying to restrict all
information flow into or out of comdats.
>> 
>>  -Hal
>> 
>> > 1) Stop deducing function attributes within comdats by examining
the
>> > bodies of the functions (so that we remain free to transform the
>> > bodies of functions).
>> > 2) Teach frontends to emit (even at O0!!!) trivially deduced
function
>> > attributes for comdats so that we continue to catch easy cases.
>> > 3) Ensure and specify that we never hoist code *into* a comdat
group
>> > in which it would not have been executed previously. I don't
know of
>> > anything in LLVM that does this today, but it would become an
>> > important invariant.
>> > 4) Work a lot harder to do internalizing and removing of this
>> > restriction.
>> 
>> > Pretty horrible. But I think it is correct.
>> 
>> > As a slight modification to #1 and #2, we could have a very
carefully
>> > crafted deduction rule where we only deduce function attributes
for
>> > functions prior to any modification of their function bodies. Such
>> > attributes should be conservatively correct because we would never
>> > lift new code into the function bodies. This would at least allow
us
>> > to do bottom-up deduction to catch interprocedural cases. But it
>> > would become incredibly subtle that this is only valid prior to
>> > *any* transformations of the comdat-containing functions.
>> 
>> > I'm starting to think this subtle rule might be worth it. But
I'm
>> > frankly terrified by the implications.
>> 
>> > On Wed, Feb 24, 2016 at 8:13 PM Philip Reames via llvm-dev <
>> > llvm-dev at lists.llvm.org > wrote:
>> 
>> > > On 02/24/2016 08:10 PM, Duncan P. N. Exon Smith via llvm-dev
wrote:
>> >
>> > > >> On 2016-Feb-24, at 19:46, Sanjoy Das <
>> > > >> sanjoy at playingwithpointers.com > wrote:
>> >
>> > > >>
>> >
>> > > >> On Wed, Feb 24, 2016 at 7:38 PM, Chandler Carruth
<
>> > > >> chandlerc at google.com > wrote:
>> >
>> > > >>> On Wed, Feb 24, 2016 at 7:34 PM Duncan P. N.
Exon Smith
>> >
>> > > >>> < dexonsmith at apple.com > wrote:
>> >
>> > > >>>>
>> >
>> > > >>>>> On 2016-Feb-24, at 19:17, Chandler
Carruth <
>> > > >>>>> chandlerc at google.com > wrote:
>> >
>> > > >>>>>
>> >
>> > > >>>>> On Wed, Feb 24, 2016 at 7:10 PM Sanjoy
Das via llvm-dev
>> >
>> > > >>>>> < llvm-dev at lists.llvm.org >
wrote:
>> >
>> > > >>>>> On Wed, Feb 24, 2016 at 6:51 PM, Duncan
P. N. Exon Smith
>> >
>> > > >>>>> < dexonsmith at apple.com > wrote:
>> >
>> > > >>>>>>> If we do not inline @foo(), and
instead re-link the call
>> > > >>>>>>> site
>> > > >>>>>>> in
>> >
>> > > >>>>>>> @main
>> >
>> > > >>>>>>> to some non-optimized copy (or
differently optimized copy)
>> > > >>>>>>> of
>> > > >>>>>>> @foo,
>> >
>> > > >>>>>>> then it is possible for the
program to have the behavior
>> > > >>>>>>> {print("Y");
>> >
>> > > >>>>>>> print ("X")}, which
was disallowed in the earlier program.
>> >
>> > > >>>>>>>
>> >
>> > > >>>>>>> In other words, opt refined the
semantics of @foo() (i.e.
>> > > >>>>>>> reduced the
>> >
>> > > >>>>>>> set of behaviors it may have) in
ways that would make later
>> >
>> > > >>>>>>> optimizations invalid if we
de-refine the implementation of
>> > > >>>>>>> @foo().
>> >
>> > > >>>>>> I'm probably missing something
obvious here. How could the
>> > > >>>>>> result of
>> >
>> > > >>>>>> `%t0 != %t1` be different at
optimization time in one file
>> > > >>>>>> than from
>> >
>> > > >>>>>> runtime in the "real"
implementation? Doesn't this make the
>> > > >>>>>> CSE
>> >
>> > > >>>>>> invalid?
>> >
>> > > >>>>> `%t0` and `%t1` are "allowed"
to "always be the same", i.e.
>> > > >>>>> an
>> >
>> > > >>>>> implementation of @foo that always feeds
in the same
>> >
>> > > >>>>> value for `%t0` and `%t1` is a valid
implementation (which is
>> > > >>>>> why the
>> >
>> > > >>>>> CSE was valid); but it is not the *only*
valid
>> > > >>>>> implementation.
>> > > >>>>> If I
>> >
>> > > >>>>> don't CSE the two load instructions
(also a valid thing to
>> > > >>>>> do),
>> > > >>>>> and
>> >
>> > > >>>>> this is a second thread writing to
`%par`, then the two
>> > > >>>>> values
>> > > >>>>> loaded
>> >
>> > > >>>>> can be different, and you could end up
printing `"X"` in
>> > > >>>>> `@foo`.
>> >
>> > > >>>>>
>> >
>> > > >>>>> Did that make sense?
>> >
>> > > >>>> Yes. To be sure I understand the scope: this
is only a problem
>> > > >>>> for
>> >
>> > > >>>> atomics, correct? (Because multi-threaded
behaviour with other
>> > > >>>> globals
>> >
>> > > >>>> is UB?)
>> >
>> > > >>>>
>> >
>> > > >>>>>> Does linkonce_odr linkage have the
same problem?
>> >
>> > > >>>>>> - If so, do you want to change it
too?
>> >
>> > > >>>>>> - Else, why not?
>> >
>> > > >>>>> Going by the specification in the
LangRef, I'd say it depends
>> > > >>>>> on how
>> >
>> > > >>>>> you define "definitive". If
you're allowed to replace the
>> > > >>>>> body
>> > > >>>>> of a
>> >
>> > > >>>>> function with a differently optimized
body, then the above
>> > > >>>>> problem
>> >
>> > > >>>>> exists.
>> >
>> > > >>>>>
>> >
>> > > >>>>> I believe that is the case, and I
strongly believe the
>> > > >>>>> problem
>> > > >>>>> you
>> >
>> > > >>>>> outline exists for linkonce_odr exactly
as it does for
>> > > >>>>> available_externally.
>> >
>> > > >>>>>
>> >
>> > > >>>>> Which is what makes this scary: every
C++ inline function
>> > > >>>>> today
>> > > >>>>> can
>> >
>> > > >>>>> trigger this.
>> >
>> > > >>>> Every C/C++ inline or template function. But
only the ones
>> > > >>>> that
>> > > >>>> use
>> >
>> > > >>>> atomics, right?
>> >
>> > > >>>
>> >
>> > > >>> Well, with *this* example...
>> >
>> > > >> Atomic are one source of non-determinism that
compilers can
>> > > >> reason
>> >
>> > > >> about. I don't know if the following snippet is
well defined or
>> > > >> not,
>> >
>> > > >> but you could have similar issues with
>> >
>> > > >>
>> >
>> > > >>
>> >
>> > > >> void foo() {
>> >
>> > > >> int *p = malloc(sizeof(int));
>> >
>> > > >> if (*p < 10) print("X");
>> >
>> > > >> }
>> >
>> > > >>
>> >
>> > > >> or (again, I don't know if this is actually well
defined)
>> >
>> > > >>
>> >
>> > > >> void foo() {
>> >
>> > > >> int t; // it is probably reasonable to fold compares
with
>> >
>> > > >> ptrtoint(alloca) to undef
>> >
>> > > >> if ((intptr_t)(&t) < 10)
print("X");
>> >
>> > > >> }
>> >
>> > > >>
>> >
>> > > > The first one at least is UB, but as Richard pointed out
the
>> > > > scope
>> >
>> > > > is certainly broader than atomics (it's not even
just
>> > > > well-defined
>> >
>> > > > non-deterministism).
>> >
>> > > >
>> >
>> > > > I'm kind of terrified by the implications.
>> >
>> > > Me too. :(
>> >
>> > > >
>> >
>> > > >> -- Sanjoy
>> >
>> > > >>
>> >
>> > > >>>>
>> >
>> > > >>>> Not that I'm sure that will end up being
a helpful
>> > > >>>> distinction.
>> >
>> > > >>>
>> >
>> > > >>> Right. See Richard's comment. I think that
sums up the real
>> > > >>> issue
>> > > >>> here. =/
>> >
>> > > > _______________________________________________
>> >
>> > > > LLVM Developers mailing list
>> >
>> > > > llvm-dev at lists.llvm.org
>> >
>> > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> >
>> 
>> > > _______________________________________________
>> >
>> > > LLVM Developers mailing list
>> >
>> > > llvm-dev at lists.llvm.org
>> >
>> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> >
>> 
>> > _______________________________________________
>> > LLVM Developers mailing list
>> > llvm-dev at lists.llvm.org
>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> 
>> --
>> 
>> --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160224/d3b5f398/attachment.html>

Seemingly Similar Threads

Search for more maybe matching threads

llvm dev - Feb 2016 - Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")

[llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")

[llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")

[llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")

[llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")

[llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")

Seemingly Similar Threads