thr3ads.net - llvm dev - [llvm-dev] The nsw story revisited [Jun 2017]

If this information is useful, please help other people find it:
Share via:

Hal Finkel via llvm-dev

2017-Jun-28 19:34 UTC

[llvm-dev] The nsw story revisited

On 06/28/2017 02:18 PM, Chandler Carruth wrote:> On Wed, Jun 28, 2017 at 12:09 PM Peter Lawrence 
> <peterl95124 at sbcglobal.net <mailto:peterl95124 at
sbcglobal.net>> wrote:
>
>     Chandler,
>                     Please give some citations, I’ve search the
>     llvm-dev archives and didn't find any.
>
>
> They are all in the discussions from Nuno, Sanjoy, John and others 
> around poison semantics. I won't be able to find the citations any 
> easier than you, I simply wanted to point out that rather than 
> conclude this is novel, you should probably ask folks who have 
> participated in these discussions for reference points that they may have.
Peter, I'll also point out that, in general, information on the subject 
of our UB handling is also spread throughout numerous bug reports, patch 
reviews, and in addition, has been the subject of many, many hours of 
in-person discussions. The fact that there's no one place to get a good 
in-depth overview is something that we, as a community, should strive to 
improve. This is a very important topic. However, I can assure you that 
what you've found by searching llvm-dev is only part of the relevant 
history.

  -Hal
>
>     Peter Lawrence.
>
>
>>     On Jun 28, 2017, at 12:01 PM, Chandler Carruth
>>     <chandlerc at gmail.com <mailto:chandlerc at
gmail.com>> wrote:
>>
>>     On Wed, Jun 28, 2017 at 9:39 AM Peter Lawrence
>>     <peterl95124 at sbcglobal.net <mailto:peterl95124 at
sbcglobal.net>> wrote:
>>
>>
>>         Preface: This paper shows that "poison" was never
actually
>>         necessary
>>         in the first place. “Poison"s existence is based on
incorrect
>>         assumptions
>>         that are being explored for the first time.
>>
>>
>>     Just so you are aware, there have been numerous people in the
>>     community who have been exploring the issues you bring up for
>>     many years. So I don't think it is correct to say that they are
>>     being explored for the first time.
>>
>>
>>
>>
>>         I have been re-reading Dan Gohman's original post "the
nsw
>>         story" [1]
>>         and have come to the conclusion that Dan got it wrong in some
>>         respects.
>>
>>         He came up with "no signed wrap" ("nsw")
which in his words means
>>         "The purpose of the nsw flag is to indicate instructions
>>         which are
>>         known to have no overflow".  The key operative word here
is
>>         "known".
>>
>>         This means literally an operation with an additional
>>         "llvm.assume".
>>         For example    (a +nsw b)    when a and b are i32   is really
>>
>>         ((llvm.assume((INT_MIN <= ((i64)a+(i64)b) <= INT_MAX) ,
(a + b))
>>
>>         It is this "assume" that justifies the transformation
that
>>         Dan does
>>         to optimize sign-extension of i32 induction variables out of
>>         loops
>>         on LP64 targets.  So far so good.
>>
>>         Note that there is no "undef" in the IR either before
or
>>         after the
>>         transform, this doesn't just fall out because of a clever
>>         definition
>>         of IR "undef".
>>
>>         Note that even the concept of "undefined" never
enters into the
>>         justification, only that "nsw" ==>
"assume" ==> the loop
>>         iteration
>>         bounds don't wrap ==> i64 arithmetic will generate the
exact same
>>         iterations as i32 arithmetic ==> the induction variable can
be
>>         promoted ==> there is no longer any sign-extend inside the
loop.
>>
>>         Note that clang can generate "+nsw" for signed
“+" regardless
>>         of whether the precise C standard wording is "undefined
behavior"
>>         or more simply "unspecified value".
>>
>>
>>
>>         Where Dan goes wrong is in thinking that "+nsw" is an
operation
>>         rather than an operation with an assume, and therefore he feels
>>         the need to define what the result of this operation is when it
>>         overflows, which then seems to require a new "poison"
instruction
>>         because "undef" isn't good enough.
>>
>>         But there is no need to ask what the result of overflow is
>>         because "+nsw" is like a "+" inside of an
if-statement whose
>>         condition precludes overflow, and if it can't overflow then
>>         asking about it is a non sequitor.
>>
>>         And speculatively hoisting the "+nsw" doesn't
cause problems
>>         because hoisting a "+nsw" is like taking a
"+" outside of the
>>         if-statement that guarantees no overflow, it is then simply
>>         a plain old un-attributed "+" operation which has no
undefined
>>         behavior.
>>
>>         Dan's follow on email "nsw is still inconsistent"
[2] shows by
>>         example why it is illegal to hoist the "nsw"
attribute along
>>         with the "+" operation.
>>
>>         It therefore makes no sense to discuss the result of
"+nsw" as
>>         ever being either "undef" or "poison", and
so the need for
>>         "poison"
>>         is gone.
>>
>>
>>
>>         Here's what Dan thought at the time about this
"poison" creation
>>
>>                 "I wrote up a description of this concept, and
it's
>>         been in
>>                 LangRef ever since. It sticks out though, because it
>>         got pretty
>>                 big and complex, especially in view of its relative
>>         obscurity.
>>                 Realistically speaking, it's probably not fully
>>         watertight yet."
>>
>>         I agree with Dan here "it's probably not fully
watertight
>>         yet", and
>>         apparently other folks agree because yet another instruction,
>>         "freeze", is being proposed to fix
"poison"s problems. My guess
>>         is that "freeze is probably not fully watertight yet"
either, but
>>         since "poison" isn't needed it is time to delete
it from the
>>         LangRef,
>>         and we can therefore stop considering "freeze".
>>
>>
>>         Peter Lawrence.
>>
>>
>>         References
>>
>>         [1. llvm-dev, Dan Gohman, Tue Nov 29 15:21:58 PST 2011 ]
>>         [2. llvm-dev, Dan Gohman, Mon Dec 12 12:58:31 PST 2011 ]
>>
>
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170628/3e901af0/attachment.html>

Peter Lawrence via llvm-dev

2017-Jun-28 19:42 UTC

head link

[llvm-dev] The nsw story revisited

Hal,
Chandler,
               Can we get back on topic, IE if you’ve actually read the paper
I’d like to
hear some feedback on its technical content

Peter Lawrence.

> On Jun 28, 2017, at 12:34 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> 
> On 06/28/2017 02:18 PM, Chandler Carruth wrote:
>> On Wed, Jun 28, 2017 at 12:09 PM Peter Lawrence <peterl95124 at
sbcglobal.net <mailto:peterl95124 at sbcglobal.net>> wrote:
>> Chandler,
>>                 Please give some citations, I’ve search the llvm-dev
archives and didn't find any.
>> 
>> They are all in the discussions from Nuno, Sanjoy, John and others
around poison semantics. I won't be able to find the citations any easier
than you, I simply wanted to point out that rather than conclude this is novel,
you should probably ask folks who have participated in these discussions for
reference points that they may have.
> 
> Peter, I'll also point out that, in general, information on the subject
of our UB handling is also spread throughout numerous bug reports, patch
reviews, and in addition, has been the subject of many, many hours of in-person
discussions. The fact that there's no one place to get a good in-depth
overview is something that we, as a community, should strive to improve. This is
a very important topic. However, I can assure you that what you've found by
searching llvm-dev is only part of the relevant history.
> 
>  -Hal
> 
>>  
>> 
>> Peter Lawrence.
>> 
>> 
>>> On Jun 28, 2017, at 12:01 PM, Chandler Carruth <chandlerc at
gmail.com <mailto:chandlerc at gmail.com>> wrote:
>>> 
>>> On Wed, Jun 28, 2017 at 9:39 AM Peter Lawrence <peterl95124 at
sbcglobal.net <mailto:peterl95124 at sbcglobal.net>> wrote:
>>> 
>>> Preface: This paper shows that "poison" was never
actually necessary
>>> in the first place. “Poison"s existence is based on incorrect
assumptions
>>> that are being explored for the first time.
>>> 
>>> Just so you are aware, there have been numerous people in the
community who have been exploring the issues you bring up for many years. So I
don't think it is correct to say that they are being explored for the first
time.
>>>  
>>> 
>>> 
>>> 
>>> I have been re-reading Dan Gohman's original post "the nsw
story" [1]
>>> and have come to the conclusion that Dan got it wrong in some
respects.
>>> 
>>> He came up with "no signed wrap" ("nsw") which
in his words means
>>> "The purpose of the nsw flag is to indicate instructions which
are
>>> known to have no overflow".  The key operative word here is
"known".
>>> 
>>> This means literally an operation with an additional
"llvm.assume".
>>> For example    (a +nsw b)    when a and b are i32   is really
>>> 
>>>   ((llvm.assume((INT_MIN <= ((i64)a+(i64)b) <= INT_MAX) , (a
+ b))
>>> 
>>> It is this "assume" that justifies the transformation
that Dan does
>>> to optimize sign-extension of i32 induction variables out of loops
>>> on LP64 targets.  So far so good.
>>> 
>>> Note that there is no "undef" in the IR either before or
after the
>>> transform, this doesn't just fall out because of a clever
definition
>>> of IR "undef".
>>> 
>>> Note that even the concept of "undefined" never enters
into the
>>> justification, only that "nsw" ==> "assume"
==> the loop iteration
>>> bounds don't wrap ==> i64 arithmetic will generate the exact
same
>>> iterations as i32 arithmetic ==> the induction variable can be
>>> promoted ==> there is no longer any sign-extend inside the loop.
>>> 
>>> Note that clang can generate "+nsw" for signed “+"
regardless
>>> of whether the precise C standard wording is "undefined
behavior"
>>> or more simply "unspecified value".
>>> 
>>> 
>>> 
>>> Where Dan goes wrong is in thinking that "+nsw" is an
operation
>>> rather than an operation with an assume, and therefore he feels
>>> the need to define what the result of this operation is when it
>>> overflows, which then seems to require a new "poison"
instruction
>>> because "undef" isn't good enough.
>>> 
>>> But there is no need to ask what the result of overflow is
>>> because "+nsw" is like a "+" inside of an
if-statement whose
>>> condition precludes overflow, and if it can't overflow then
>>> asking about it is a non sequitor.
>>> 
>>> And speculatively hoisting the "+nsw" doesn't cause
problems
>>> because hoisting a "+nsw" is like taking a "+"
outside of the
>>> if-statement that guarantees no overflow, it is then simply
>>> a plain old un-attributed "+" operation which has no
undefined
>>> behavior.
>>> 
>>> Dan's follow on email "nsw is still inconsistent" [2]
shows by
>>> example why it is illegal to hoist the "nsw" attribute
along
>>> with the "+" operation.
>>> 
>>> It therefore makes no sense to discuss the result of
"+nsw" as
>>> ever being either "undef" or "poison", and so
the need for "poison"
>>> is gone.
>>> 
>>> 
>>> 
>>> Here's what Dan thought at the time about this
"poison" creation
>>> 
>>>         "I wrote up a description of this concept, and
it's been in
>>>         LangRef ever since. It sticks out though, because it got
pretty
>>>         big and complex, especially in view of its relative
obscurity.
>>>         Realistically speaking, it's probably not fully
watertight yet."
>>> 
>>> I agree with Dan here "it's probably not fully watertight
yet", and
>>> apparently other folks agree because yet another instruction,
>>> "freeze", is being proposed to fix "poison"s
problems.  My guess
>>> is that "freeze is probably not fully watertight yet"
either, but
>>> since "poison" isn't needed it is time to delete it
from the LangRef,
>>> and we can therefore stop considering "freeze".
>>> 
>>> 
>>> Peter Lawrence.
>>> 
>>> 
>>> References
>>> 
>>> [1. llvm-dev, Dan Gohman, Tue Nov 29 15:21:58 PST 2011 ]
>>> [2. llvm-dev, Dan Gohman, Mon Dec 12 12:58:31 PST 2011 ]
>>> 
>> 
> 
> -- 
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170628/80535a98/attachment-0001.html>

Chandler Carruth via llvm-dev

2017-Jun-29 01:07 UTC

head link

[llvm-dev] The nsw story revisited

On Wed, Jun 28, 2017 at 12:42 PM Peter Lawrence <peterl95124 at
sbcglobal.net>
wrote:
> Hal,
> Chandler,
>                Can we get back on topic, IE if you’ve actually read the
> paper I’d like to
> hear some feedback on its technical content
>
Sure, I was mostly mentioning prior discussion because I don't think this
is going to add a lot to the discussion around `nsw` and the `llvm.assume`
intrinsic back when Hal and I started discussing modeling assumptions as
first class entities in the IR.

>>> I have been re-reading Dan Gohman's original post "the nsw
story" [1]
>>> and have come to the conclusion that Dan got it wrong in some
respects.
>>>
>>> He came up with "no signed wrap" ("nsw") which
in his words means
>>> "The purpose of the nsw flag is to indicate instructions which
are
>>> known to have no overflow".  The key operative word here is
"known".
>>>
>>> This means literally an operation with an additional
"llvm.assume".
>>> For example    (a +nsw b)    when a and b are i32   is really
>>>
>>>   ((llvm.assume((INT_MIN <= ((i64)a+(i64)b) <= INT_MAX) , (a
+ b))
>>>
>>> It is this "assume" that justifies the transformation
that Dan does
>>> to optimize sign-extension of i32 induction variables out of loops
>>> on LP64 targets.  So far so good.
>>>
>>I just want to point out that this isn't complete.

*Before* an `add nsw` instruction is hoisted, it could be thought of as a
normal `add` plus `llvm.assume`. But the `llvm.assume` cannot be hoisted
above predicates. So what this would end up doing is allowing the hoisting
of the `add` by separating the invariant and keeping the invariant below
the predicate. This is a fairly minor point, but I didn't want it to be
unstated.
>
>>> Where Dan goes wrong is in thinking that "+nsw" is an
operation
>>> rather than an operation with an assume, and therefore he feels
>>> the need to define what the result of this operation is when it
>>> overflows, which then seems to require a new "poison"
instruction
>>> because "undef" isn't good enough.
>>>
>>A meta comment. I think it would be helpful for you to phrase your
discussion in a less personalized way. Rather than talking about "So-and-so
went wrong when they thought ...", I would instead merely discuss the
situation as it is today. This helps keep the discussion from sliding into
inappropriate areas, which has already been a concern, so it seems worth
pointing out as a suggestion for future emails.

To the actual point, it seems backwards to say that we made a mistake in
specifying `nsw` as an operation rather than an `assume`. At the time we
specified `nsw` there was no such construct as `llvm.assume`. Perhaps
introducing `llvm.assume` would have been better than introducing `nsw`,
perhaps not. But it doesn't actually matter. This kind of historical
critique doesn't seem like a useful way to frame the discussion.

The point is, we have `llvm.assume` now, and we can and should consider
whether it is a better tool for the job. Hopefully I have understood the
point you are trying to make?

My response is that I don't think this is clearly a superior tool for the
job. We did actually discuss things like `nsw` when designing
`llvm.assume`. One serious issue with `llvm.assume` is that it *cannot be
speculated*. Even though we gain the ability to speculate the `add` by
decoupling the invariant, we would retain a large amount of IR that could
not be speculated and would cause us to retain control flow and be a
significant barrier to optimization.

In fact, `llvm.assume` is *incredibly* expensive already. We talked at
length when introducing it about how it should be relatively sparingly used
and used with care to provide the most impactful hints to the optimizer
rather than trying to encode everything that might possibly be assumed to
be true.

On the flip side, `add nsw` is extremely common in the IR produced by many
frontends. So it would seem like a poor fit to use a very expensive and
heavyweight tool to represent something that is very common.

I'm happy to view `add nsw` as an `add` with an implicit `llvm.assume`
inside of control flow where the result is "used" (for a complex
definition
of "used" that should be discussed on the `undef` thread and not here)
if
that is a useful notional or theoretical model. But the *representation
optimization* of `add nsw` still seems quite useful.

It therefore makes no sense to discuss the result of "+nsw"
as>>> ever being either "undef" or "poison", and so
the need for "poison"
>>> is gone.
>>>
>> Note that there have been examples which show that the `undef`
semanticsalone as used to model reading from uninitialized memory are effectively
equivalent to `poison`. So I don't believe that removing `nsw` would
necessarily be sufficient here. I think it would also require other
changes. And in your emails about `undef` you seem to agree that we would
have to change `undef` itself.

I think it would be more productive to focus on that thread. If we decide
to get rid of poison entirely (including changing `undef`), *then* we can
have a discussion about how to model `nsw`.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170629/a2c69f28/attachment-0001.html>

llvm dev - Jun 2017 - The nsw story revisited

[llvm-dev] The nsw story revisited

[llvm-dev] The nsw story revisited

[llvm-dev] The nsw story revisited