thr3ads.net - llvm dev - [llvm-dev] RFC: Devirtualization v2 [Mar 2018]

If this information is useful, please help other people find it:
Share via:

Piotr Padlewski via llvm-dev

2018-Mar-19 23:27 UTC

[llvm-dev] RFC: Devirtualization v2

Hi folks,

here is a link to the proposal that we've been working on lately:
https://docs.google.com/document/d/16GVtCpzK8sIHNc2qZz6RN8amICNBt
vjWUod2SujZVEo/edit?usp=sharing

But for the record, I also paste it here. Feedback will be really
appreciated!









































































































































































































































































*RFC: C++ Devirtualization v2Piotr Padlewski - piotr.padlewski at gmail.com
<piotr.padlewski at gmail.com>Krzysztof Pszeniczny -
krzysztof.pszeniczny at gmail.com <krzysztof.pszeniczny at gmail.com>Jakub
Kuderski - kubakuderski at gmail.com <kubakuderski at gmail.com>Richard
Smith -
RichardSmith at google.com <RichardSmith at google.com>This proposal
describes a
new model of representing pointers to dynamic objects as “fat pointers”. It
is designed to solve the hole in the previous devirtualization model that
could cause miscompilation. We believe that solving this is very important,
especially with the mitigation of the recent Spectre vulnerability -
retpolines.Introduction to previous devirtualizationIn the previous model
we introduced invariant.group to LLVM as a way of saying “this load will
produce the same value if given the same argument”, which was  applied on
loads from virtual pointer to communicate that virtual pointer (or in other
words, dynamic type) does not change during the lifetime of an object.
Because virtual pointer might be set multiple times during construction and
destruction of derived types, and because it is possible to change the
dynamic type of an object using placement new, invariant.group.barrier was
introduced. It is used in every place where the dynamic type might change.
To turn devirtualization on user had to specify the
-fstrict-vtable-pointers flag for clang. If you want to learn more about
the previous model, check out [0][1][2][3]. Although the previous model is
not sound for C++, languages like Java or Scala can safely use it. This is
because virtual pointer in Java is set only once for every dynamic type,
and you can’t do things like placement new, which means 2 aliasing pointers
will always have the same dynamic type (also because every pointer still
points to a valid object).The problemThe old model miscompiles on the
following example:A *a = new A;a->foo(); A *b = new(a) B;if (a == b)
 b->foo(); // This call could be devirtualized to A::foo()The problem is
that GVN and other pass replaces the SSA value of b with a based on the
a==b comparison, which is a totally legal optimization in LLVM and we
surely still want to do it. In C++ however, If you replaced b with a inside
the if statement’s body, then you would introduce UB. The problem arises
because after changing %b to %a in LLVM, the load from virtual pointer can
be devirtualized to different type. %vtable_a = load %a,
!invariant.group;;;; %a == %b%bool = icmp %a, %bbr %bool, %if, %afterif:;
if this will be changed to %a, then the optimizer will be able to replace ;
%vtable_b with %vtable_a%vtable_b = load %b, !invariant.groupFor other
corner cases check out appendix.SolutionThe proposed solution is to model
pointers to dynamic types as “fat pointers”. We can think of them as
pointers that also store the current dynamic type. Virtual calls would
consist of virtual pointer load from fat pointer. We will still use
invariant.groups for the devirtualization. The difference is that accessing
class field, or comparing pointers to dynamic objects would require a call
to a new intrinsic - i8* llvm.strip.invariant.group(i8* ) to firstly get
pointer without information about dynamic type. Creation (or reloading) of
fat pointer would be done by call to new intrinsic - i8*
llvm.launder.invariant.group(i8*) that replaces
llvm.invariant.group.barrier.The pointer comparison will now not be an
issue, because the optimizer will not be able to replace the operand of a
virtual pointer load with another pointer:%vtable_a = load i8* %a,
!invariant.group !{};;;; %a == %b%addr_a = call i8*
@llvm.strip.invariant.group(i8* %a)%addr_b = call i8*
@llvm.strip.invariant.group(i8* %b)%bool = icmp %addr_a, %addr_bbr %bool,
%if, %afterif:; This will not be able to change %b to %a%vtable_b = load
i8* %b, !invariant.group !{}It is important to notice that vtable load of
%b cannot be replaced with a load of %addr_b Although the optimizer could
potentially figure out that %b and %addr_b are aliasing, the aliasing rules
are not strong enough to make this transformation valid. One counterexample
could be that although both pointers are pointing to the same memory, one
could be a pointer to mmap’ed memory with no write/load permission.The
Solution formalizedFormally, to virtually mark pointers as optionally
belonging to invariant groups subject to the following rules: - alloca and
library malloc-like functions return pointers belonging to fresh invariant
groups- belonging to an invariant group is preserved by bitcasts- an
intrinsic, i8* llvm.strip.invariant.group(i8*) returns its argument with
the invariant group virtual metadata stripped- an intrinsic, i8*
llvm.launder.invariant.group(i8*) returns its argument with a fresh
invariant group virtual metadata, i.e. it starts a new invariant group.
This is similar to C++'s std::launder.- every load and store marked
!invariant.group from/to pointers belonging to the same invariant group
must load/store the same value- the behaviour of a load or store marked
!invariant.group from/to pointers not belonging to any invariant group
(e.g. obtained from llvm.strip.invariant.group) is undefined- constructors
may assume the this pointer passed to them belongs to a fresh invariant
groupFrom those rules one easily gets: - llvm.strip.invariant.group is a
pure function, i.e. its value depends only on its argument. Its llvm
attributes include at least: readnone speculatable nounwind. Its results
must alias its argument.- llvm.launder.invariant.group is not a pure
function: it creates a fresh invariant group each time it's called. One may
mentally model this as getting a fresh invariant group identifier somewhere
from a 'magic' memory inaccessible to no-one else, so its llvm
attributes
include at least: inaccessiblememonly speculatable nounwind. Its results
must alias its argument.- strip({strip,launder}(X)) = strip(X). This is
because we do not care which particular invariant group metadata is
stripped.- launder({strip,launder}(X)) may be replaced with launder(X) by
the optimiser. Note that this not mean that launder(launder(X)) launder(X)in the
IR itself, but only on a metalanguage level: launder is
inherently nondeterministic, so no two invocations of it ever return the
same value.This allows to compile the following C++ constructs: - vtable
loads required to perform virtual calls are marked with !invariant.group,
as before- constructors of derived classes need to launder the this pointer
before passing it to the constructors of base classes: they may
subsequently operate on the original this pointer- likewise, destructors of
derived classes need to launder the this pointer before passing it to the
destructors of base classes- placement new and std::launder need to call
launder- accesses to union members need to call launder, because the active
member of the union may have changed since the last visible access-
whenever two pointers are to be compared, they must be stripped first,
because we want the comparison to provide information about the address
equality, and not about invariant group equality- likewise, whenever a
pointer is to be cast to an integer type, it must be stripped first-
whenever an integer is cast to a pointer type, the result must be laundered
before it is used, to prevent reasoning about the equality of integers to
provide any information about equality of invariant groupsNote that adding
calls to strip and launder related to pointer comparisons and
integer<->pointer conversions will not cause any semantic information to
be
lost: if any piece of information could be inferred by the optimiser about
some collection of variables (e.g. that two pointers are equal) can be
inferred now about their stripped versions, no matter how many strip and
launder calls have been made to obtain them in the IR. As an example, the
C++ expression ptr == std::launder(ptr) will be optimised to true, because
it will compare strip(ptr) with strip(launder(ptr)), which are indeed equal
according to our rules.Semantics of
strip.invariant.groupllvm.strip.invariant.group is a readnone speculatable
nounwind function that takes a fat pointer as argument and returns a
pointer with stripped information about dynamic type. The returned value
must alias with its argument.Semantics of
launder.invariant.groupllvm.launder.invariant.group is a
inaccessiblememonly speculatable nounwind function that takes a pointer
argument and a returns pointer that represents concept of a fat pointer
containing dynamic information. The returned value must alias with its
argument.We can think of it, as a function that uses magic memory (virtual
inaccessible bits in the pointer) for generating new identifier for dynamic
type.Why we need another ‘barrier’?Because stripping the information from
the same pointer returns the same result, which is not the case if we would
use launder.invariant.group instead:%a = strip(%x)%b = strip(%x)cmp eq %a,
%b => cmp eq %a, %aNote also that stripping dynamic type information from
laundered pointer is the same as stripping information from the pointer
itself:strip(launder(%x)) => strip(%x)Strip is also idempotent, because
stripping 2 times gives the same as stripping one time:strip(strip(%x)) =>
strip(%x) Examples with code snippetsHere are a couple of examples of
emitted LLVM, assuming definition of type A and B as below. Note that no
optimizations has been applied to this examples. If you want to see full
code code check this
snippet:https://gist.github.com/prazek/109c388d175a0114cf8a5e10787104ca
<https://gist.github.com/prazek/109c388d175a0114cf8a5e10787104ca>and this
one to see how it is optimized by current
pipeline:https://gist.github.com/prazek/a0215c056821931136fef6f2b78f4962
<https://gist.github.com/prazek/a0215c056821931136fef6f2b78f4962>struct A
{
  virtual void foo();   int field;};struct B : A {  void foo() override;
 int field2;};Constructor:Before:define linkonce_odr void @A_A(%struct.A *)
{  %2 = getelementptr inbounds %struct.A, %struct.A * %0, i64 0, i32 0
 store i32 (...)** bitcast (i8** getelementptr inbounds ({ [3 x i8*] }, {
[3 x i8*] }* @vtable_for_A, i64 0, inrange i32 0, i64 2) to i32 (...)**),
i32 (...)** * %2, align 8, !invariant.group !0  %3 = getelementptr inbounds
%struct.A, %struct.A* %0, i64 0, i32 1  store i32 0, i32* %3  ret
void}define linkonce_odr void @B_B(%struct.B *) {  %2 = bitcast %struct.B *
%0 to i8 *  %3 = tail call i8 * @llvm.invariant.group.barrier.p0i8(i8 * %2)
 %4 = bitcast i8 * %3 to %struct.A *    tail call void @A_A(%struct.A * %4)
 %5 = getelementptr inbounds %struct.B, %struct.B * %0, i64 0, i32 0, i32 0
 store i32 (...)** bitcast (i8** getelementptr inbounds ({ [3 x i8*] }, {
[3 x i8*] }* @vtable_for_B, i64 0, inrange i32 0, i64 2) to i32 (...)**),
i32 (...)** * %5, align 8, !invariant.group !0  ret void}After: ctors and
dtors take fat pointer as argument and uses @llvm.strip.invariant.group for
setting fields.define linkonce_odr void @A_A(%struct.A *) {  %2 getelementptr
inbounds %struct.A, %struct.A * %0, i64 0, i32 0  store i32
(...)** bitcast (i8** getelementptr inbounds ({ [3 x i8*] }, { [3 x i8*] }*
@vtable_for_A, i64 0, inrange i32 0, i64 2) to i32 (...)**), i32 (...)** *
%2, align 8, !invariant.group !0    %fatcast = bitcast %struct.A* %0 to i8*
 %base = call i8* @llvm.strip.invariant.group(i8* %fatcast)  %ptr = bitcast
i8* %base to %struct.A*      %3 = getelementptr inbounds %struct.A,
%struct.A* %ptr, i64 0, i32 1  store i32 0, i32* %3  ret void}define
linkonce_odr void @B_B(%struct.B *) {  %2 = bitcast %struct.B * %0 to i8 *
 %3 = tail call i8 * @llvm.launder.invariant.group(i8 * %2)  %4 = bitcast
i8 * %3 to %struct.A *    tail call void @A_A(%struct.A * %4)  %5 getelementptr
inbounds %struct.B, %struct.B * %0, i64 0, i32 0, i32 0
 store i32 (...)** bitcast (i8** getelementptr inbounds ({ [3 x i8*] }, {
[3 x i8*] }* @vtable_for_B, i64 0, inrange i32 0, i64 2) to i32 (...)**),
i32 (...)** * %5, align 8, !invariant.group !0  ret void}Virtual call,
placement new and fields:int foo(A * a) {  a->field = 32;  a->foo();  int
p
= a->field;  a->foo();  return p;}void bar() {  A *a = new A;  foo(a);  A
*
b = new (a) B;  if (a == b)    b->foo();}Before:define i32 @foo(%struct.A
*) local_unnamed_addr #0 {  %2 = getelementptr inbounds %struct.A,
%struct.A* %0, i64 0, i32 1  store i32 32, i32* %2, align 8  %3 = bitcast
%struct.A * %0 to void (%struct.A *) ** *  %4 = load void (%struct.A *) **,
void (%struct.A *) *** %3, !invariant.group !0  %5 = load void (%struct.A
*)*, void (%struct.A *) ** %4, !invariant.load !0  tail call void
%5(%struct.A * %0)      %gep2 = getelementptr inbounds %struct.A,
%struct.A* %0, i64 0, i32 1  %6 = load i32, i32* %gep2, align 8  %7 = load
void (%struct.A *)**, void (%struct.A *)** * %3, !invariant.group !0  %8 load
void (%struct.A *)*, void (%struct.A *)** %7, !invariant.load !0  tail
call void %8(%struct.A * %0)  ret i32 %6}define void @bar()
local_unnamed_addr #0 {  %1 = tail call i8* @operator_new(i64 16)  %2 bitcast
i8* %1 to %struct.A*  tail call void @A_A(%struct.A * nonnull %2)
 %3 = tail call i32 @foo(%struct.A * nonnull %2)    %4 = bitcast %struct.A
* %2 to i8 *  %5 = tail call i8 * @llvm.invariant.group.barrier.p0i8(i8 *
%4)  %6 = bitcast i8 * %5 to %struct.B *    tail call void @B_B(%struct.B *
nonnull %6) #5    %c = bitcast %struct.B * %6 to %struct.A *  %bool = icmp
eq %struct.A* %2, %c  br i1 %bool, label %if, label %endif:  %7 = bitcast
%struct.B * %6 to %struct.A *  %8 = bitcast %struct.A * %7 to void
(%struct.A *)** *  %9 = load void (%struct.A *)**, void (%struct.A *)** *
%8, !invariant.group !0  %10 = load void (%struct.A *)*, void (%struct.A
*)** %9, !invariant.load !0  tail call void %10(%struct.A * nonnull %7)  br
label %endend:  ret void}After:define i32 @foo(%struct.A *)
local_unnamed_addr #0 {  %fatcast = bitcast %struct.A* %0 to i8*  %base call i8*
@llvm.strip.invariant.group(i8* %fatcast)  %ptr = bitcast i8*
%base to %struct.A*    %2 = getelementptr inbounds %struct.A, %struct.A*
%ptr, i64 0, i32 1  store i32 32, i32* %2, align 8  %3 = bitcast %struct.A
* %0 to void (%struct.A *) ** *  %4 = load void (%struct.A *) **, void
(%struct.A *) *** %3, !invariant.group !0  %5 = load void (%struct.A *)*,
void (%struct.A *) ** %4, !invariant.load !0  tail call void %5(%struct.A *
%0)     %fatcast2 = bitcast %struct.A* %0 to i8*  %base2 = call i8*
@llvm.strip.invariant.group(i8* %fatcast)  %ptr2 = bitcast i8* %base to
%struct.A*  %gep2 = getelementptr inbounds %struct.A, %struct.A* %ptr, i64
0, i32 1     %6 = load i32, i32* %gep2, align 8  %7 = load void (%struct.A
*)**, void (%struct.A *)** * %3, !invariant.group !0  %8 = load void
(%struct.A *)*, void (%struct.A *)** %7, !invariant.load !0  tail call void
%8(%struct.A * %0)  ret i32 %6}define void @bar() local_unnamed_addr #0 {
 %1 = tail call i8* @operator_new(i64 16)  ; note that we don’t need this
launder  %fat = call i8* @llvm.launder.invariant.group(i8* %1)  %2 bitcast i8*
%fat to %struct.A*  tail call void @A_A(%struct.A * nonnull %2)
 %3 = tail call i32 @foo(%struct.A * nonnull %2)    %4 = bitcast %struct.A
* %2 to i8 *  %5 = tail call i8 * @llvm.launder.invariant.group(i8 * %4)
 %6 = bitcast i8 * %5 to %struct.B *    tail call void @B_B(%struct.B *
nonnull %6) #5    %fatcast_a = bitcast %struct.A* %2 to i8*  %a1 = call i8*
@llvm.strip.invariant.group(i8* %fatcast_a)  %fatcast_b = bitcast
%struct.B* %6 to i8*  %b2 = call i8* @llvm.strip.invariant.group(i8*
%fatcast_b)  %bool = icmp eq i8* %a1, %b2  br i1 %bool, label %if, label
%endif:  %7 = bitcast %struct.B * %6 to %struct.A *  %8 = bitcast %struct.A
* %7 to void (%struct.A *)** *  %9 = load void (%struct.A *)**, void
(%struct.A *)** * %8, !invariant.group !0  %10 = load void (%struct.A *)*,
void (%struct.A *)** %9, !invariant.load !0  tail call void %10(%struct.A *
nonnull %7)  br label %endend:  ret void}Required changesLLVMBecause LTO
between a module with and without devirtualization will be invalid, we will
need to break LLVM level ABI. This is however already implemented, because
LTO between modules with invariant.group.barriers and without is also
invalid. This also means that if we don’t want to break ABI between modules
with and without optimizations, we will need to have invariant.barriers and
fatpointer.create/strip turned on all the time.  For the users it will
means that when switching to new compiler, they will have to recompile all
of the generated object files for LTO builds.ClangClang will require a
 couple of minor changes in CodeGen for constructors and
destructors.AcknowledgmentSpecial thanks for the initiators of this idea -
Richard Smith, Chandler Carruth and Sanjoy Das and also for all the other
people who took a part in helping with devirtualization v1 including:
Daniel Berlin, Reid Kleckner, David Majnemer, John McCall.References[0] -
Devirtualization in LLVM, Proceeding SPLASH Companion 2017 Proceedings
Companion of the 2017 ACM SIGPLAN International Conference on Systems,
Programming, Languages, and Applications: Software for Humanity -
https://dl.acm.org/citation.cfm?id=3135947
<https://dl.acm.org/citation.cfm?id=3135947>[1] - RFC: Devirtualization in
LLVM (ver 1) -
https://docs.google.com/document/d/1f2SGa4TIPuBGm6y6YO768GrQsA8awNfGEJSBFukLhYA/edit?usp=sharing
<https://docs.google.com/document/d/1f2SGa4TIPuBGm6y6YO768GrQsA8awNfGEJSBFukLhYA/edit?usp=sharing>[2]
- Devirtualization in LLVM - LLVM Dev meeting 2016 -
https://www.youtube.com/watch?v=qMhV6d3B1Vk
<https://www.youtube.com/watch?v=qMhV6d3B1Vk>[3] - Devirtualization in
LLVM
and Clang - llvm blog-
http://blog.llvm.org/2017/03/devirtualization-in-llvm-and-clang.html
<http://blog.llvm.org/2017/03/devirtualization-in-llvm-and-clang.html>AppendixHere
are some other corner cases that we found.UnionsBecause union can have
dynamic types as members, this means that they would have the same address.
To solve this we emit launder.invariant.group before getting any dynamic
type. It is possible that we could somehow avoid it, but because virtually
no one is using unions this way, this should not cause any problems.struct
A {  virtual void foo();};struct B : A { void foo() override;};union U {
  A a;   B b;   U() : b() {}};void do_call(A &a) {
  a.foo();}__attribute__((noinline)) void init_B(U &u) {   new(&u.b)
B;}void union_test(U &u) {   do_call(u.b);   new(&u.a) A;  
do_call(u.a);
  init_B(u);   do_call(u.b);}int main() {   U u;   union_test(u);}Ptr to
intIn this example, instead of comparing the pointers we compare the
addresses stored in integers. Because ptrtoint conversion will also require
call to strip.invariant.group, this will also work.void ptr_to_int() {   A
*a = new A;   a->foo();   A *b = new(a) B;   auto av = (uintptr_t)a;   auto
bv = (uintptr_t)b;   if (av == bv)       b->foo();}int to ptrHere because
we are creating a fat pointer from integer, we need to use launder:void
foo(Base *base) {  uintptr_t base_int = (uintptr_t)base;  /* base_int ptrtoint
(strip base) */  Base *base_int_ptr = (Base*)base_int;  /*
base_int_ptr = inttoptr base_int = inttoptr (ptrtoint (strip base)) = strip
base */  base_int_ptr->vfun();  ...  Derived *derived = std::launder(base);
 uintptr_t derived_int = (uintptr_t)derived;  /* derived_int = ptrtoint
(strip derived) */  /* Note: base_int == derived_int */  ....  Base
*derived_int_ptr = (Base*)derived_int;  /* derived_int_ptr = inttoptr
(ptrtoint (strip derived)) */  /* Note: base_int_ptr == derived_int_ptr */
 derived_int_ptr->vfun();  /* BUG: But it's different than
base_int_ptr->vfun()! */}*
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180320/de07168b/attachment-0001.html>

Reid Kleckner via llvm-dev

2018-Mar-20 21:14 UTC

head link

[llvm-dev] [cfe-dev] RFC: Devirtualization v2

Sounds like it works!

Basically, adding these extra strip.invariant.group calls before pointer
comparisons breaks the transform that was problematic. Presumably, clang
would only strip invariant groups from pointers to dynamic types before
casting them or using them in a comparison, so that the value equivalence
optimization still works in the general case. The proposal trades value
equivalence power for more devirtualization power.

One downside is that it may add many new IR instructions, but I don't see
how to avoid them, as the whole point is to create distinct Value objects
for use in the pointer equality comparison.

On Mon, Mar 19, 2018 at 4:31 PM Piotr Padlewski via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
> Hi folks,
>
> here is a link to the proposal that we've been working on lately:
>
>
https://docs.google.com/document/d/16GVtCpzK8sIHNc2qZz6RN8amICNBtvjWUod2SujZVEo/edit?usp=sharing
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180320/762ec145/attachment.html>

Piotr Padlewski via llvm-dev

2018-Mar-21 10:40 UTC

head link

[llvm-dev] [cfe-dev] RFC: Devirtualization v2

2018-03-20 22:14 GMT+01:00 Reid Kleckner <rnk at google.com>:
> Sounds like it works!
>
> Basically, adding these extra strip.invariant.group calls before pointer
> comparisons breaks the transform that was problematic. Presumably, clang
> would only strip invariant groups from pointers to dynamic types before
> casting them or using them in a comparison, so that the value equivalence
> optimization still works in the general case. The proposal trades value
> equivalence power for more devirtualization power.
>
> If optimizer was able to infer some equivalence of pointers before, itshould be also able to do it right now, and that is thanks to the
strip.invariant.group semantics. Compared to previous proposal, we do not
get any more devirtualization
power (assuming we didn't care about the full correctness) and hopefully we
do not loose any other optimizations. But ofc comparing to the -O2 we
definitelly get more devirtualization :)

> One downside is that it may add many new IR instructions, but I don't
see
> how to avoid them, as the whole point is to create distinct Value objects
> for use in the pointer equality comparison.
>
> That is correct, we do get more instructions (especially that the calls to{strip,launder}.invariant.group usually need to have bitcast to i8* and
back from i8*) and I also don't see how we could avoid them (exept unifying
pointer types which I hope will happen some day), but the good part is that
because of the semantics of strip (especially readnone attribute) the
optimizer should be able to CSE all the calls to strip.invariant.group to
one.

> On Mon, Mar 19, 2018 at 4:31 PM Piotr Padlewski via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> Hi folks,
>>
>> here is a link to the proposal that we've been working on lately:
>> https://docs.google.com/document/d/16GVtCpzK8sIHNc2qZz6RN8amICNBt
>> vjWUod2SujZVEo/edit?usp=sharing
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180321/b6f1bc80/attachment.html>

John McCall via llvm-dev

2018-Mar-28 21:23 UTC

head link

[llvm-dev] [cfe-dev] RFC: Devirtualization v2

> On Mar 19, 2018, at 7:27 PM, Piotr Padlewski via cfe-dev <cfe-dev at
lists.llvm.org> wrote:
> 
> Hi folks,
> 
> here is a link to the proposal that we've been working on lately:
>
https://docs.google.com/document/d/16GVtCpzK8sIHNc2qZz6RN8amICNBtvjWUod2SujZVEo/edit?usp=sharing
<https://docs.google.com/document/d/16GVtCpzK8sIHNc2qZz6RN8amICNBtvjWUod2SujZVEo/edit?usp=sharing>
> 
> But for the record, I also paste it here. Feedback will be really
appreciated!
> 
> 
> RFC: C++ Devirtualization v2
> Piotr Padlewski - piotr.padlewski at gmail.com <mailto:piotr.padlewski
at gmail.com>
> Krzysztof Pszeniczny - krzysztof.pszeniczny at gmail.com
<mailto:krzysztof.pszeniczny at gmail.com>
> Jakub Kuderski - kubakuderski at gmail.com <mailto:kubakuderski at
gmail.com>
> Richard Smith - RichardSmith at google.com <mailto:RichardSmith at
google.com>
> 
> 
> This proposal describes a new model of representing pointers to dynamic
objects as “fat pointers”. It is designed to solve the hole in the previous
devirtualization model that could cause miscompilation. We believe that solving
this is very important, especially with the mitigation of the recent Spectre
vulnerability - retpolines.
> 
> Introduction to previous devirtualization
> In the previous model we introduced invariant.group to LLVM as a way of
saying “this load will produce the same value if given the same argument”, which
was  applied on loads from virtual pointer to communicate that virtual pointer
(or in other words, dynamic type) does not change during the lifetime of an
object. Because virtual pointer might be set multiple times during construction
and destruction of derived types, and because it is possible to change the
dynamic type of an object using placement new, invariant.group.barrier was
introduced. It is used in every place where the dynamic type might change. To
turn devirtualization on user had to specify the -fstrict-vtable-pointers flag
for clang.
> If you want to learn more about the previous model, check out [0][1][2][3].
Although the previous model is not sound for C++, languages like Java or Scala
can safely use it. This is because virtual pointer in Java is set only once for
every dynamic type, and you can’t do things like placement new, which means 2
aliasing pointers will always have the same dynamic type (also because every
pointer still points to a valid object).
> 
> The problem
> The old model miscompiles on the following example:
> 
> A *a = new A;
> a->foo(); 
> A *b = new(a) B;
> if (a == b)
>   b->foo(); // This call could be devirtualized to A::foo()
> 
> The problem is that GVN and other pass replaces the SSA value of b with a
based on the a==b comparison, which is a totally legal optimization in LLVM and
we surely still want to do it. In C++ however, If you replaced b with a inside
the if statement’s body, then you would introduce UB. The problem arises because
after changing %b to %a in LLVM, the load from virtual pointer can be
devirtualized to different type.
> 
> %vtable_a = load %a, !invariant.group
> ;;;
> ; %a == %b
> %bool = icmp %a, %b
> br %bool, %if, %after
> if:
> ; if this will be changed to %a, then the optimizer will be able to replace
> ; %vtable_b with %vtable_a
> %vtable_b = load %b, !invariant.group
> 
> For other corner cases check out appendix.
> Solution
> The proposed solution is to model pointers to dynamic types as “fat
pointers”. We can think of them as pointers that also store the current dynamic
type. Virtual calls would consist of virtual pointer load from fat pointer. We
will still use invariant.groups for the devirtualization. The difference is that
accessing class field, or comparing pointers to dynamic objects would require a
call to a new intrinsic - i8* llvm.strip.invariant.group(i8* ) to firstly get
pointer without information about dynamic type. Creation (or reloading) of fat
pointer would be done by call to new intrinsic - i8*
llvm.launder.invariant.group(i8*) that replaces llvm.invariant.group.barrier.
> 
> The pointer comparison will now not be an issue, because the optimizer will
not be able to replace the operand of a virtual pointer load with another
pointer:
> 
> 
> %vtable_a = load i8* %a, !invariant.group !{}
> ;;;
> ; %a == %b
> %addr_a = call i8* @llvm.strip.invariant.group(i8* %a)
> %addr_b = call i8* @llvm.strip.invariant.group(i8* %b)
> %bool = icmp %addr_a, %addr_b
> br %bool, %if, %after
> if:
> ; This will not be able to change %b to %a
> %vtable_b = load i8* %b, !invariant.group !{}
> 
> It is important to notice that vtable load of %b cannot be replaced with a
load of %addr_b Although the optimizer could potentially figure out that %b and
%addr_b are aliasing, the aliasing rules are not strong enough to make this
transformation valid. One counterexample could be that although both pointers
are pointing to the same memory, one could be a pointer to mmap’ed memory with
no write/load permission.
> The Solution formalized
> Formally, to virtually mark pointers as optionally belonging to invariant
groups subject to the following rules:
> alloca and library malloc-like functions return pointers belonging to fresh
invariant groups
> belonging to an invariant group is preserved by bitcasts
> an intrinsic, i8* llvm.strip.invariant.group(i8*) returns its argument with
the invariant group virtual metadata stripped
> an intrinsic, i8* llvm.launder.invariant.group(i8*) returns its argument
with a fresh invariant group virtual metadata, i.e. it starts a new invariant
group. This is similar to C++'s std::launder.
> every load and store marked !invariant.group from/to pointers belonging to
the same invariant group must load/store the same value
> the behaviour of a load or store marked !invariant.group from/to pointers
not belonging to any invariant group (e.g. obtained from
llvm.strip.invariant.group) is undefined
> constructors may assume the this pointer passed to them belongs to a fresh
invariant group
> 
> From those rules one easily gets:
> llvm.strip.invariant.group is a pure function, i.e. its value depends only
on its argument. Its llvm attributes include at least: readnone speculatable
nounwind. Its results must alias its argument.
> llvm.launder.invariant.group is not a pure function: it creates a fresh
invariant group each time it's called. One may mentally model this as
getting a fresh invariant group identifier somewhere from a 'magic'
memory inaccessible to no-one else, so its llvm attributes include at least:
inaccessiblememonly speculatable nounwind. Its results must alias its argument.
> strip({strip,launder}(X)) = strip(X). This is because we do not care which
particular invariant group metadata is stripped.
> launder({strip,launder}(X)) may be replaced with launder(X) by the
optimiser. Note that this not mean that launder(launder(X)) = launder(X)in the
IR itself, but only on a metalanguage level: launder is inherently
nondeterministic, so no two invocations of it ever return the same value.
> 
> This allows to compile the following C++ constructs:
> vtable loads required to perform virtual calls are marked with
!invariant.group, as before
> constructors of derived classes need to launder the this pointer before
passing it to the constructors of base classes: they may subsequently operate on
the original this pointer
> likewise, destructors of derived classes need to launder the this pointer
before passing it to the destructors of base classes
> placement new and std::launder need to call launder
> accesses to union members need to call launder, because the active member
of the union may have changed since the last visible access
> whenever two pointers are to be compared, they must be stripped first,
because we want the comparison to provide information about the address
equality, and not about invariant group equality
> likewise, whenever a pointer is to be cast to an integer type, it must be
stripped first
> whenever an integer is cast to a pointer type, the result must be laundered
before it is used, to prevent reasoning about the equality of integers to
provide any information about equality of invariant groups
> 
> Note that adding calls to strip and launder related to pointer comparisons
and integer<->pointer conversions will not cause any semantic information
to be lost: if any piece of information could be inferred by the optimiser about
some collection of variables (e.g. that two pointers are equal) can be inferred
now about their stripped versions, no matter how many strip and launder calls
have been made to obtain them in the IR. As an example, the C++ expression ptr
== std::launder(ptr) will be optimised to true, because it will compare
strip(ptr) with strip(launder(ptr)), which are indeed equal according to our
rules.
This proposal sounds great, even if it still doesn't solve some of the
problems I personally need to solve with invariant loads. :)

I take it that the actual devirtualization here is ultimately still done by
forwarding a visible store of the v-table pointer to an invariant load, just by
noticing that they occur to the same laundered pointer and therefore must
involve the same value.  There's no way of saying "I know what the
value of the v-table pointer is even if you can't see a store" when
creating a laundered pointer.  For example, in Swift we have constructor
functions that are known to return a complete object of a specific type, even if
we can't necessarily see the implementation of that function; there's no
way for us to say anything about that function pointer> LLVM
> 
> 
> Because LTO between a module with and without devirtualization will be
invalid, we will need to break LLVM level ABI. This is however already
implemented, because LTO between modules with invariant.group.barriers and
without is also invalid. This also means that if we don’t want to break ABI
between modules with and without optimizations, we will need to have
invariant.barriers and fatpointer.create/strip turned on all the time.  For the
users it will means that when switching to new compiler, they will have to
recompile all of the generated object files for LTO builds.
Is there really no way to have this degrade more gracefully?  I continue to be
very concerned about frontend interworking here, either between different
versions of a single frontend (e.g. clang 6 vs. clang 8), or between different
invocations of a single frontend with different language options set (e.g. clang
vs. clang++), or even between different frontends that produce IR that gets
linked together (e.g. clang vs. swift).

How about this approach:
  - Instead of taking a meaningless !{} argument, invariant.group takes a string
argument which identifies a metadata-dependent optimization.  In your case, it
would be something like !"clang.cxx_devirtualization".
  - Functions have a "supported optimizations" list which declares all
the metadata-reliant optimizations they promise to have correct metadata for. 
So e.g. clang++ would list "clang.cxx_devirtualization" on every
single function it compiled, regardless of whether that function actually needed
any metadata.  I'm pretty sure metadata are optimized so that identical
lists of options like this don't take up more space just because they're
added to every single function in the module.
 - Interprocedural optimizations — which mostly means inlining — are required to
be aware of the supported-optimizations list.  The inliner would intersect the
supported-optimizations lists and then strip metadata/intrinsics that don't
belong anymore.

But the idea that every single metadata-dependent optimization is going to
create a new "IR ABI break" just seems unacceptable to me.  Compiler
optimization IRs are not stable things; compiler engineers constantly find new
things that they want to express.

John.> Clang
> Clang will require a  couple of minor changes in CodeGen for constructors
and destructors.
> 
> 
> Acknowledgment
> Special thanks for the initiators of this idea - Richard Smith, Chandler
Carruth and Sanjoy Das and also for all the other people who took a part in
helping with devirtualization v1 including: Daniel Berlin, Reid Kleckner, David
Majnemer, John McCall.
> 
> References
> [0] - Devirtualization in LLVM, Proceeding SPLASH Companion 2017
Proceedings Companion of the 2017 ACM SIGPLAN International Conference on
Systems, Programming, Languages, and Applications: Software for Humanity -
https://dl.acm.org/citation.cfm?id=3135947
<https://dl.acm.org/citation.cfm?id=3135947>
> [1] - RFC: Devirtualization in LLVM (ver 1) -
https://docs.google.com/document/d/1f2SGa4TIPuBGm6y6YO768GrQsA8awNfGEJSBFukLhYA/edit?usp=sharing
<https://docs.google.com/document/d/1f2SGa4TIPuBGm6y6YO768GrQsA8awNfGEJSBFukLhYA/edit?usp=sharing>
> [2] - Devirtualization in LLVM - LLVM Dev meeting 2016 -
https://www.youtube.com/watch?v=qMhV6d3B1Vk
<https://www.youtube.com/watch?v=qMhV6d3B1Vk>
> [3] - Devirtualization in LLVM and Clang - llvm blog-
http://blog.llvm.org/2017/03/devirtualization-in-llvm-and-clang.html
<http://blog.llvm.org/2017/03/devirtualization-in-llvm-and-clang.html>
> 
> 
> Appendix
> Here are some other corner cases that we found.
> Unions
> Because union can have dynamic types as members, this means that they would
have the same address. To solve this we emit launder.invariant.group before
getting any dynamic type. It is possible that we could somehow avoid it, but
because virtually no one is using unions this way, this should not cause any
problems.
> 
> struct A {
>   virtual void foo();
> };
> 
> struct B : A {
>  void foo() override;
> };
> 
> union U {
>    A a;
>    B b;
>    U() : b() {}
> };
> 
> void do_call(A &a) {
>    a.foo();
> }
> 
> __attribute__((noinline)) void init_B(U &u) {
>    new(&u.b) B;
> }
> 
> void union_test(U &u) {
>    do_call(u.b);
>    new(&u.a) A;
>    do_call(u.a);
> 
>    init_B(u);
>    do_call(u.b);
> 
> }
> 
> int main() {
>    U u;
>    union_test(u);
> }
> Ptr to int
> In this example, instead of comparing the pointers we compare the addresses
stored in integers. Because ptrtoint conversion will also require call to
strip.invariant.group, this will also work.
> 
> 
> void ptr_to_int() {
>    A *a = new A;
>    a->foo();
>    A *b = new(a) B;
>    auto av = (uintptr_t)a;
>    auto bv = (uintptr_t)b;
>    if (av == bv)
>        b->foo();
> }
> 
> 
> int to ptr
> Here because we are creating a fat pointer from integer, we need to use
launder:
> 
> void foo(Base *base) {
>   uintptr_t base_int = (uintptr_t)base;
>   /* base_int = ptrtoint (strip base) */
>   Base *base_int_ptr = (Base*)base_int;
>   /* base_int_ptr = inttoptr base_int = inttoptr (ptrtoint (strip base)) =
strip base */
>   base_int_ptr->vfun();
>   ...
>   Derived *derived = std::launder(base);
>   uintptr_t derived_int = (uintptr_t)derived;
>   /* derived_int = ptrtoint (strip derived) */
>   /* Note: base_int == derived_int */
>   ....
>   Base *derived_int_ptr = (Base*)derived_int;
>   /* derived_int_ptr = inttoptr (ptrtoint (strip derived)) */
>   /* Note: base_int_ptr == derived_int_ptr */
>   derived_int_ptr->vfun();
>   /* BUG: But it's different than base_int_ptr->vfun()! */
> }
> 
> 
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180328/eb1a04ff/attachment-0001.html>

Piotr Padlewski via llvm-dev

2018-Mar-29 13:12 UTC

head link

[llvm-dev] [cfe-dev] RFC: Devirtualization v2

Hi John,

2018-03-28 23:23 GMT+02:00 John McCall <rjmccall at apple.com>:
>
>
> On Mar 19, 2018, at 7:27 PM, Piotr Padlewski via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
> Hi folks,
>
> here is a link to the proposal that we've been working on lately:
> https://docs.google.com/document/d/16GVtCpzK8sIHNc2qZz6RN8am
> ICNBtvjWUod2SujZVEo/edit?usp=sharing
>
> But for the record, I also paste it here. Feedback will be really
> appreciated!
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> *RFC: C++ Devirtualization v2Piotr Padlewski - piotr.padlewski at gmail.com
> <piotr.padlewski at gmail.com>Krzysztof Pszeniczny -
> krzysztof.pszeniczny at gmail.com <krzysztof.pszeniczny at
gmail.com>Jakub
> Kuderski - kubakuderski at gmail.com <kubakuderski at
gmail.com>Richard Smith -
> RichardSmith at google.com <RichardSmith at google.com>This proposal
describes a
> new model of representing pointers to dynamic objects as “fat pointers”. It
> is designed to solve the hole in the previous devirtualization model that
> could cause miscompilation. We believe that solving this is very important,
> especially with the mitigation of the recent Spectre vulnerability -
> retpolines.Introduction to previous devirtualizationIn the previous model
> we introduced invariant.group to LLVM as a way of saying “this load will
> produce the same value if given the same argument”, which was  applied on
> loads from virtual pointer to communicate that virtual pointer (or in other
> words, dynamic type) does not change during the lifetime of an object.
> Because virtual pointer might be set multiple times during construction and
> destruction of derived types, and because it is possible to change the
> dynamic type of an object using placement new, invariant.group.barrier was
> introduced. It is used in every place where the dynamic type might change.
> To turn devirtualization on user had to specify the
> -fstrict-vtable-pointers flag for clang. If you want to learn more about
> the previous model, check out [0][1][2][3]. Although the previous model is
> not sound for C++, languages like Java or Scala can safely use it. This is
> because virtual pointer in Java is set only once for every dynamic type,
> and you can’t do things like placement new, which means 2 aliasing pointers
> will always have the same dynamic type (also because every pointer still
> points to a valid object).The problemThe old model miscompiles on the
> following example:A *a = new A;a->foo(); A *b = new(a) B;if (a == b)
>  b->foo(); // This call could be devirtualized to A::foo()The problem is
> that GVN and other pass replaces the SSA value of b with a based on the
> a==b comparison, which is a totally legal optimization in LLVM and we
> surely still want to do it. In C++ however, If you replaced b with a inside
> the if statement’s body, then you would introduce UB. The problem arises
> because after changing %b to %a in LLVM, the load from virtual pointer can
> be devirtualized to different type. %vtable_a = load %a,
> !invariant.group;;;; %a == %b%bool = icmp %a, %bbr %bool, %if, %afterif:;
> if this will be changed to %a, then the optimizer will be able to replace ;
> %vtable_b with %vtable_a%vtable_b = load %b, !invariant.groupFor other
> corner cases check out appendix.SolutionThe proposed solution is to model
> pointers to dynamic types as “fat pointers”. We can think of them as
> pointers that also store the current dynamic type. Virtual calls would
> consist of virtual pointer load from fat pointer. We will still use
> invariant.groups for the devirtualization. The difference is that accessing
> class field, or comparing pointers to dynamic objects would require a call
> to a new intrinsic - i8* llvm.strip.invariant.group(i8* ) to firstly get
> pointer without information about dynamic type. Creation (or reloading) of
> fat pointer would be done by call to new intrinsic - i8*
> llvm.launder.invariant.group(i8*) that replaces
> llvm.invariant.group.barrier.The pointer comparison will now not be an
> issue, because the optimizer will not be able to replace the operand of a
> virtual pointer load with another pointer:%vtable_a = load i8* %a,
> !invariant.group !{};;;; %a == %b%addr_a = call i8*
> @llvm.strip.invariant.group(i8* %a)%addr_b = call i8*
> @llvm.strip.invariant.group(i8* %b)%bool = icmp %addr_a, %addr_bbr %bool,
> %if, %afterif:; This will not be able to change %b to %a%vtable_b = load
> i8* %b, !invariant.group !{}It is important to notice that vtable load of
> %b cannot be replaced with a load of %addr_b Although the optimizer could
> potentially figure out that %b and %addr_b are aliasing, the aliasing rules
> are not strong enough to make this transformation valid. One counterexample
> could be that although both pointers are pointing to the same memory, one
> could be a pointer to mmap’ed memory with no write/load permission.The
> Solution formalizedFormally, to virtually mark pointers as optionally
> belonging to invariant groups subject to the following rules: - alloca and
> library malloc-like functions return pointers belonging to fresh invariant
> groups- belonging to an invariant group is preserved by bitcasts- an
> intrinsic, i8* llvm.strip.invariant.group(i8*) returns its argument with
> the invariant group virtual metadata stripped- an intrinsic, i8*
> llvm.launder.invariant.group(i8*) returns its argument with a fresh
> invariant group virtual metadata, i.e. it starts a new invariant group.
> This is similar to C++'s std::launder.- every load and store marked
> !invariant.group from/to pointers belonging to the same invariant group
> must load/store the same value- the behaviour of a load or store marked
> !invariant.group from/to pointers not belonging to any invariant group
> (e.g. obtained from llvm.strip.invariant.group) is undefined- constructors
> may assume the this pointer passed to them belongs to a fresh invariant
> groupFrom those rules one easily gets: - llvm.strip.invariant.group is a
> pure function, i.e. its value depends only on its argument. Its llvm
> attributes include at least: readnone speculatable nounwind. Its results
> must alias its argument.- llvm.launder.invariant.group is not a pure
> function: it creates a fresh invariant group each time it's called. One
may
> mentally model this as getting a fresh invariant group identifier somewhere
> from a 'magic' memory inaccessible to no-one else, so its llvm
attributes
> include at least: inaccessiblememonly speculatable nounwind. Its results
> must alias its argument.- strip({strip,launder}(X)) = strip(X). This is
> because we do not care which particular invariant group metadata is
> stripped.- launder({strip,launder}(X)) may be replaced with launder(X) by
> the optimiser. Note that this not mean that launder(launder(X)) >
launder(X)in the IR itself, but only on a metalanguage level: launder is
> inherently nondeterministic, so no two invocations of it ever return the
> same value.This allows to compile the following C++ constructs: - vtable
> loads required to perform virtual calls are marked with !invariant.group,
> as before- constructors of derived classes need to launder the this pointer
> before passing it to the constructors of base classes: they may
> subsequently operate on the original this pointer- likewise, destructors of
> derived classes need to launder the this pointer before passing it to the
> destructors of base classes- placement new and std::launder need to call
> launder- accesses to union members need to call launder, because the active
> member of the union may have changed since the last visible access-
> whenever two pointers are to be compared, they must be stripped first,
> because we want the comparison to provide information about the address
> equality, and not about invariant group equality- likewise, whenever a
> pointer is to be cast to an integer type, it must be stripped first-
> whenever an integer is cast to a pointer type, the result must be laundered
> before it is used, to prevent reasoning about the equality of integers to
> provide any information about equality of invariant groupsNote that adding
> calls to strip and launder related to pointer comparisons and
> integer<->pointer conversions will not cause any semantic information
to be
> lost: if any piece of information could be inferred by the optimiser about
> some collection of variables (e.g. that two pointers are equal) can be
> inferred now about their stripped versions, no matter how many strip and
> launder calls have been made to obtain them in the IR. As an example, the
> C++ expression ptr == std::launder(ptr) will be optimised to true, because
> it will compare strip(ptr) with strip(launder(ptr)), which are indeed equal
> according to our rules.*
>
>
> This proposal sounds great, even if it still doesn't solve some of the
> problems I personally need to solve with invariant loads. :)
>
> I take it that the actual devirtualization here is ultimately still done
> by forwarding a visible store of the v-table pointer to an invariant load,
> just by noticing that they occur to the same laundered pointer and
> therefore must involve the same value.  There's no way of saying
"I know
> what the value of the v-table pointer is even if you can't see a
store"
> when creating a laundered pointer.  For example, in Swift we have
> constructor functions that are known to return a complete object of a
> specific type, even if we can't necessarily see the implementation of
that
> function; there's no way for us to say anything about that function
pointer
>
> I think we have already solved that problem with calls to llvm.assumeintrinsic. After calling the constructor, we load virtual pointer (with
invariant group) and compare it with the vtable it should point to and then
pass it to the assume.

  call void @_ZN1AC1Ev(%struct.A* %a) ; call ctor
  %3 = load {...} %a                  ; Load vptr
  %4 = icmp eq %3, @_ZTV1A      ; compare vptr with vtable
  call void @llvm.assume(i1 %4)

(from http://blog.llvm.org/2017/03/devirtualization-in-llvm-and-clang.html )

If I understand it correctly, you should be able to use the same technique
for the constructor-like functions in Swift :)
>
>
> *LLVMBecause LTO between a module with and without devirtualization will
> be invalid, we will need to break LLVM level ABI. This is however already
> implemented, because LTO between modules with invariant.group.barriers and
> without is also invalid. This also means that if we don’t want to break ABI
> between modules with and without optimizations, we will need to have
> invariant.barriers and fatpointer.create/strip turned on all the time.  For
> the users it will means that when switching to new compiler, they will have
> to recompile all of the generated object files for LTO builds.*
>
>
> Is there really no way to have this degrade more gracefully?  I continue
> to be very concerned about frontend interworking here, either between
> different versions of a single frontend (e.g. clang 6 vs. clang 8), or
> between different invocations of a single frontend with different language
> options set (e.g. clang vs. clang++), or even between different frontends
> that produce IR that gets linked together (e.g. clang vs. swift).
>
> How about this approach:
>   - Instead of taking a meaningless !{} argument, invariant.group takes a
> string argument which identifies a metadata-dependent optimization.  In
> your case, it would be something like
!"clang.cxx_devirtualization".
>   - Functions have a "supported optimizations" list which
declares all the
> metadata-reliant optimizations they promise to have correct metadata for.
> So e.g. clang++ would list "clang.cxx_devirtualization" on every
single
> function it compiled, regardless of whether that function actually needed
> any metadata.  I'm pretty sure metadata are optimized so that identical
> lists of options like this don't take up more space just because
they're
> added to every single function in the module.
>  - Interprocedural optimizations — which mostly means inlining — are
> required to be aware of the supported-optimizations list.  The inliner
> would intersect the supported-optimizations lists and then strip
> metadata/intrinsics that don't belong anymore.
>
> But the idea that every single metadata-dependent optimization is going to
> create a new "IR ABI break" just seems unacceptable to me. 
Compiler
> optimization IRs are not stable things; compiler engineers constantly find
> new things that they want to express.
>
> John.
>
I haven't thought about LTO between different languages, thanks for
bringing that!
Can you actually use C++ objects without going through C interface? If it
is possible, then that is heavy.
To clarify how it works right now - if you would do LTO between IR compiled
with -fstrict-vtable-pointers and without, then the linker would throw an
error. I can see it right now, that it pretty much stops you from doing any
LTO between different languages.
The other idea that we had, was to actually strip all the invariant.groups
when linking with module that does not have them. This, opposed to the
first idea would let us link whatever we want, but we could silently loose
some optimizations.

I like the idea that you proposed - it is somewhere between these two
ideas, as you limit the potential loses to only some functions and in the
same time you can link whaterver IR you like.
However, if you agree that the option 2 - stripping invariant.groups from
whole modules - addresses all of your concerns, then I would propose to
firstly go with this idea and then optimize it if we would find a problem
with it.
I feel that it might be an overkill to implement it on the first go,
especially that we are not even in the point of thinking about turing
-fstrict-vtable-pointers on by default.

What do you think about that?

Piotr

>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> *ClangClang will require a  couple of minor changes in CodeGen for
> constructors and destructors.AcknowledgmentSpecial thanks for the
> initiators of this idea - Richard Smith, Chandler Carruth and Sanjoy Das
> and also for all the other people who took a part in helping with
> devirtualization v1 including: Daniel Berlin, Reid Kleckner, David
> Majnemer, John McCall.References[0] - Devirtualization in LLVM, Proceeding
> SPLASH Companion 2017 Proceedings Companion of the 2017 ACM SIGPLAN
> International Conference on Systems, Programming, Languages, and
> Applications: Software for Humanity -
> https://dl.acm.org/citation.cfm?id=3135947
> <https://dl.acm.org/citation.cfm?id=3135947>[1] - RFC:
Devirtualization in
> LLVM (ver 1) -
>
https://docs.google.com/document/d/1f2SGa4TIPuBGm6y6YO768GrQsA8awNfGEJSBFukLhYA/edit?usp=sharing
>
<https://docs.google.com/document/d/1f2SGa4TIPuBGm6y6YO768GrQsA8awNfGEJSBFukLhYA/edit?usp=sharing>[2]
> - Devirtualization in LLVM - LLVM Dev meeting 2016 -
> https://www.youtube.com/watch?v=qMhV6d3B1Vk
> <https://www.youtube.com/watch?v=qMhV6d3B1Vk>[3] - Devirtualization
in LLVM
> and Clang - llvm blog-
> http://blog.llvm.org/2017/03/devirtualization-in-llvm-and-clang.html
>
<http://blog.llvm.org/2017/03/devirtualization-in-llvm-and-clang.html>AppendixHere
> are some other corner cases that we found.UnionsBecause union can have
> dynamic types as members, this means that they would have the same address.
> To solve this we emit launder.invariant.group before getting any dynamic
> type. It is possible that we could somehow avoid it, but because virtually
> no one is using unions this way, this should not cause any problems.struct
> A {  virtual void foo();};struct B : A { void foo() override;};union U {
>   A a;   B b;   U() : b() {}};void do_call(A &a) {
>   a.foo();}__attribute__((noinline)) void init_B(U &u) {  
new(&u.b)
> B;}void union_test(U &u) {   do_call(u.b);   new(&u.a) A;  
do_call(u.a);
>   init_B(u);   do_call(u.b);}int main() {   U u;   union_test(u);}Ptr to
> intIn this example, instead of comparing the pointers we compare the
> addresses stored in integers. Because ptrtoint conversion will also require
> call to strip.invariant.group, this will also work.void ptr_to_int() {   A
> *a = new A;   a->foo();   A *b = new(a) B;   auto av = (uintptr_t)a;  
auto
> bv = (uintptr_t)b;   if (av == bv)       b->foo();}int to ptrHere
because
> we are creating a fat pointer from integer, we need to use launder:void
> foo(Base *base) {  uintptr_t base_int = (uintptr_t)base;  /* base_int >
ptrtoint (strip base) */  Base *base_int_ptr = (Base*)base_int;  /*
> base_int_ptr = inttoptr base_int = inttoptr (ptrtoint (strip base)) = strip
> base */  base_int_ptr->vfun();  ...  Derived *derived =
std::launder(base);
>  uintptr_t derived_int = (uintptr_t)derived;  /* derived_int = ptrtoint
> (strip derived) */  /* Note: base_int == derived_int */  ....  Base
> *derived_int_ptr = (Base*)derived_int;  /* derived_int_ptr = inttoptr
> (ptrtoint (strip derived)) */  /* Note: base_int_ptr == derived_int_ptr */
>  derived_int_ptr->vfun();  /* BUG: But it's different than
> base_int_ptr->vfun()! */}*
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180329/8e7ddd82/attachment-0001.html>

Apparently Analagous Threads

Search for more seemingly similar threads

llvm dev - Mar 2018 - RFC: Devirtualization v2

[llvm-dev] RFC: Devirtualization v2

[llvm-dev] [cfe-dev] RFC: Devirtualization v2

[llvm-dev] [cfe-dev] RFC: Devirtualization v2

[llvm-dev] [cfe-dev] RFC: Devirtualization v2

[llvm-dev] [cfe-dev] RFC: Devirtualization v2

Apparently Analagous Threads