thr3ads.net - llvm dev - [llvm-dev] Potential issue with noalias @malloc and @realloc [Apr 2017]

If this information is useful, please help other people find it:
Share via:

Daniel Berlin via llvm-dev

2017-Apr-12 19:19 UTC

[llvm-dev] Potential issue with noalias @malloc and @realloc

>
>
> It seems to me that there are two ways of thinking about this: either the
> value of a pointer in IR is richer than its bit sequence, in which case
> replacing p1 with p0 in a block predicated by p0 == p1 is an incorrect
> transformation if you cannot prove that one pointer was based on the other,
>
Which would be a non-starter just from the cost of doing so, not to mention
the allowable optimization you lose :)
> or the value of a pointer in IR is exactly its bit sequence, in which case
> the code performing the transformation incorrectly updated the IR and a
> correct transformation would need to somehow remove the noalias from the
> malloc calls.
>
Sure, but noalias is just a symptom here. You can make the same thing occur
in other ways.  It's fundamentally an issue of being able to express when
the abstract identity of a pointer has changed even when the bit-value has
not.

IE there are enough transformation updates that need to occur that we
probably need to do something different than try to band-aid/patch all the
places that will have this issue.


The C++ object model formally takes the former standpoint; its
pointers> notionally point to objects, which are abstract entities occupying storage,
> rather than pointing to the storage itself.
>

Which, i get why they do (in fact i would do the same), but saying the
abstract objects have an identity outside of the bit values of the pointers
means that the IR's need to be able to represent identity changes
separately from value changes (this is what i meant by "add support for
describing lifetimes that has semantic meaning").
I'm not aware of any compiler that does this effectively, and it's a
fairly
large semantic change.

They all pretty much hack it and hope they don't break too much shit.

Separately, changing noalias would just band-aid it. You can make the same
thing occur with TBAA, placement new, or really, any way we have where the
abstract identity may change but llvm doesn't express it.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170412/b39afb4c/attachment.html>

Daniel Berlin via llvm-dev

2017-Apr-12 19:38 UTC

head link

[llvm-dev] Potential issue with noalias @malloc and @realloc

On Wed, Apr 12, 2017 at 12:19 PM, Daniel Berlin <dberlin at dberlin.org>
wrote:
>
>> It seems to me that there are two ways of thinking about this: either
the
>> value of a pointer in IR is richer than its bit sequence, in which case
>> replacing p1 with p0 in a block predicated by p0 == p1 is an incorrect
>> transformation if you cannot prove that one pointer was based on the
other,
>>
>
> Which would be a non-starter just from the cost of doing so, not to
> mention the allowable optimization you lose :)
>
>> or the value of a pointer in IR is exactly its bit sequence, in which
>> case the code performing the transformation incorrectly updated the IR
and
>> a correct transformation would need to somehow remove the noalias from
the
>> malloc calls.
>>
>
> Also note, btw, the transformation did not incorrectly update the IR.The store and the load were never noalias in the first place, becuase they
were *always* talking about different abstract identities than the one that
the malloc calls had.  It's just not expressed in the IR, and it  was not
used to optimize until the other transformations occurred.

That is, with a smart enough optimization code, I could right now,
 directly go from the start code to the end code with the IR as it appeared
in the first example.
We just don't happen to.

Given good enough optimizers, there is no transformation update point to
fix, there is only the missing semantic information that the abstract
identity of the memory being stored and loaded is not the same as the
abstract identity of the pointer returned by malloc.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170412/216ee4f0/attachment.html>

Richard Smith via llvm-dev

2017-Apr-12 20:24 UTC

head link

[llvm-dev] Potential issue with noalias @malloc and @realloc

On 12 April 2017 at 12:19, Daniel Berlin <dberlin at dberlin.org> wrote:
>
>> It seems to me that there are two ways of thinking about this: either
the
>> value of a pointer in IR is richer than its bit sequence, in which case
>> replacing p1 with p0 in a block predicated by p0 == p1 is an incorrect
>> transformation if you cannot prove that one pointer was based on the
other,
>>
>
> Which would be a non-starter just from the cost of doing so, not to
> mention the allowable optimization you lose :)
>
It may be possible to modify the optimization rather than losing it, such
as by replacing p1 with barrier(p0) rather than with simply p0, where
barrier(p) returns some pointer that is bitwise identical to p but may
carry different ancillary data.

The way I'm thinking of this is that we have a semilattice of virtual
values for each bit-pattern, where it's generally correct to move a value
up in the lattice but not down; the barrier would return the top element of
the lattice. Viewed this way, the problem is that replacing p1 with p0 may
change to a value that is not at or above the original value in the
semilattice. We could preserve more information when doing these value
replacements by inserting an intrinsic representing join(p0, p1) instead of
barrier(p0), but that seems like it would hinder further optimization
rather than help it.

or the value of a pointer in IR is exactly its bit sequence, in which
case>> the code performing the transformation incorrectly updated the IR and a
>> correct transformation would need to somehow remove the noalias from
the
>> malloc calls.
>>
>
> Sure, but noalias is just a symptom here. You can make the same thing
> occur in other ways.  It's fundamentally an issue of being able to
express
> when the abstract identity of a pointer has changed even when the bit-value
> has not.
>
I agree. Is it reasonable and feasible to require every optimization that
replaces one (pointer) value with another value that is known to be
bitwise-equal but not known to be semantically equivalent to -- somehow --
express that in the resulting IR?

IE there are enough transformation updates that need to occur that
we> probably need to do something different than try to band-aid/patch all the
> places that will have this issue.
>
>
> The C++ object model formally takes the former standpoint; its pointers
>> notionally point to objects, which are abstract entities occupying
storage,
>> rather than pointing to the storage itself.
>>
>
>
> Which, i get why they do (in fact i would do the same), but saying the
> abstract objects have an identity outside of the bit values of the pointers
> means that the IR's need to be able to represent identity changes
> separately from value changes (this is what i meant by "add support
for
> describing lifetimes that has semantic meaning").
> I'm not aware of any compiler that does this effectively, and it's
a
> fairly large semantic change.
>
> They all pretty much hack it and hope they don't break too much shit.
>
> Separately, changing noalias would just band-aid it. You can make the same
> thing occur with TBAA, placement new, or really, any way we have where the
> abstract identity may change but llvm doesn't express it.
>
Right. Even just for the noalias case, allocas would seem have the same
problem as malloc:

int *f() { int n; return &n; };
int g(int *p) {
  int n, v;
  for (int k = 0; k != 1; ++k) {
    n = 20;
    v = n;
    if (false) // obscured, as per Sanjoy's example
      if (&n == p) // noalias but may evaluate to true
        a();
      else
        b();
  }
}
void h() { g(f()); }
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170412/554f4f8a/attachment-0001.html>

Daniel Berlin via llvm-dev

2017-Apr-12 22:51 UTC

head link

[llvm-dev] Potential issue with noalias @malloc and @realloc

On Wed, Apr 12, 2017 at 1:24 PM, Richard Smith <richard at metafoo.co.uk>
wrote:
> On 12 April 2017 at 12:19, Daniel Berlin <dberlin at dberlin.org>
wrote:
>
>>
>>> It seems to me that there are two ways of thinking about this:
either
>>> the value of a pointer in IR is richer than its bit sequence, in
which case
>>> replacing p1 with p0 in a block predicated by p0 == p1 is an
incorrect
>>> transformation if you cannot prove that one pointer was based on
the other,
>>>
>>
>> Which would be a non-starter just from the cost of doing so, not to
>> mention the allowable optimization you lose :)
>>
>
> It may be possible to modify the optimization rather than losing it, such
> as by replacing p1 with barrier(p0) rather than with simply p0, where
> barrier(p) returns some pointer that is bitwise identical to p but may
> carry different ancillary data.
>
> The way I'm thinking of this is that we have a semilattice of virtual
> values for each bit-pattern, where it's generally correct to move a
value
> up in the lattice but not down; the barrier would return the top element of
> the lattice. Viewed this way, the problem is that replacing p1 with p0 may
> change to a value that is not at or above the original value in the
> semilattice. We could preserve more information when doing these value
> replacements by inserting an intrinsic representing join(p0, p1) instead of
> barrier(p0), but that seems like it would hinder further optimization
> rather than help it.
>
> or the value of a pointer in IR is exactly its bit sequence, in which case
>>> the code performing the transformation incorrectly updated the IR
and a
>>> correct transformation would need to somehow remove the noalias
from the
>>> malloc calls.
>>>
>>
>> Sure, but noalias is just a symptom here. You can make the same thing
>> occur in other ways.  It's fundamentally an issue of being able to
express
>> when the abstract identity of a pointer has changed even when the
bit-value
>> has not.
>>
>
> I agree. Is it reasonable and feasible to require every optimization that
> replaces one (pointer) value with another value that is known to be
> bitwise-equal but not known to be semantically equivalent to -- somehow --
> express that in the resulting IR?
>

Yes.
It's not *practically* different than requiring them to do something like
 "update memoryssa" (ie in terms of how you'd do it or where
you'd do it)
or in some cases "update ssa".


>
> IE there are enough transformation updates that need to occur that we
>> probably need to do something different than try to band-aid/patch all
the
>> places that will have this issue.
>>
>>
>> The C++ object model formally takes the former standpoint; its pointers
>>> notionally point to objects, which are abstract entities occupying
storage,
>>> rather than pointing to the storage itself.
>>>
>>
>>
>> Which, i get why they do (in fact i would do the same), but saying the
>> abstract objects have an identity outside of the bit values of the
pointers
>> means that the IR's need to be able to represent identity changes
>> separately from value changes (this is what i meant by "add
support for
>> describing lifetimes that has semantic meaning").
>> I'm not aware of any compiler that does this effectively, and
it's a
>> fairly large semantic change.
>>
>> They all pretty much hack it and hope they don't break too much
shit.
>>
>> Separately, changing noalias would just band-aid it. You can make the
>> same thing occur with TBAA, placement new, or really, any way we have
where
>> the abstract identity may change but llvm doesn't express it.
>>
>
> Right. Even just for the noalias case, allocas would seem have the same
> problem as malloc:
>

Yeah, though until we want to do a lot of work, i fear we should just be
ignoring these issues until we have real world programs we are breaking :)

I hate answers like that, but i feel like ATM it's more in the category of
"something we should be cognizant of" than "something we should
cut a wide
swath through the compiler destroying optimizations to make work".

Maybe others feel differently, however.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170412/a5666dae/attachment.html>

Sanjoy Das via llvm-dev

2017-Apr-13 07:27 UTC

head link

[llvm-dev] Potential issue with noalias @malloc and @realloc

Hi Richard,

On April 12, 2017 at 1:24:55 PM, Richard Smith via llvm-dev
(llvm-dev at lists.llvm.org) wrote:> >> It seems to me that there are two ways of thinking about this:
either the
> >> value of a pointer in IR is richer than its bit sequence, in which
case
> >> replacing p1 with p0 in a block predicated by p0 == p1 is an
incorrect
> >> transformation if you cannot prove that one pointer was based on
the other,
> >>
> >
> > Which would be a non-starter just from the cost of doing so, not to
> > mention the allowable optimization you lose :)
> >
>
> It may be possible to modify the optimization rather than losing it, such
> as by replacing p1 with barrier(p0) rather than with simply p0, where
> barrier(p) returns some pointer that is bitwise identical to p but may
> carry different ancillary data.
>
> The way I'm thinking of this is that we have a semilattice of virtual
> values for each bit-pattern, where it's generally correct to move a
value
> up in the lattice but not down; the barrier would return the top element of
> the lattice. Viewed this way, the problem is that replacing p1 with p0 may
> change to a value that is not at or above the original value in the
> semilattice. We could preserve more information when doing these value
> replacements by inserting an intrinsic representing join(p0, p1) instead of
> barrier(p0), but that seems like it would hinder further optimization
> rather than help it.
That's an interesting idea!  Is it correct to rephrase what you said
as "barrier(x) returns the most recent abstract object inhabiting the
physical address x"?

The two caveats I see are:

 - This won't allow some pointer to integer conversions, since you're
   relying on the IR types to decided when to insert barrier calls.

 - malloc's return value won't be noalias with values that may have
   been produced by other threads.

To elaborate the second point, I mean in

T0:
  a = malloc();
  b = relaxed_load ptr

a has to MayAlias b, because you could have a second thread do:

T1:
  x = malloc()
  free(x)
  t = barrier(x)
  relaxed store t to ptr

since a possible execution is:

  x = malloc()
  free(x)
  a = malloc();
  // Now the newest value inhabiting the physical address x
  // is `a`, because malloc reused the storage
  t = barrier(x) = a
  relaxed store a to ptr
  b = relaxed_load ptr = a

However, I've not thought about how to make T1 happen from a well
defined C/C++ program -- perhaps that's impossible (maybe we can
somehow rule it out in the IR level too).  Secondly, I'm not convinced
that I'll actually be able to show the above problem (if it is one)
manifest as a miscompile.

Another way to look at it is: barrier provides a way for threads to
"guess" abstract heap locations in some cases, which may be
problematic.

-- Sanjoy

llvm dev - Apr 2017 - Potential issue with noalias @malloc and @realloc

[llvm-dev] Potential issue with noalias @malloc and @realloc

[llvm-dev] Potential issue with noalias @malloc and @realloc

[llvm-dev] Potential issue with noalias @malloc and @realloc

[llvm-dev] Potential issue with noalias @malloc and @realloc

[llvm-dev] Potential issue with noalias @malloc and @realloc