thr3ads.net - llvm dev - [llvm-dev] RFC [ThinLTO]: An embedded summary encoding to support CFI and vtable opt [Jun 2016]

Peter Collingbourne via llvm-dev

2016-May-04 22:02 UTC

[llvm-dev] RFC [ThinLTO]: An embedded summary encoding to support CFI and vtable opt

Hi all,

I'd like to propose extending ThinLTO to allow a bitcode module to embed
another bitcode module containing summary information. The purpose of
doing so is to support the CFI and whole-program devirtualization
optimizations under ThinLTO.

Overview

The CFI and whole-program devirtualization optimizations work by
transforming vtables according to the class hierarchy. For example, if a
class A has two derived classes B and C, CFI will lay out the vtables for
A, B and C consecutively, so that the generated checks can verify that a
vtable belongs to A or a class derived from A by performing simple
arithmetic on the vtable pointer. For more details, see [1].
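To make the layout idea concrete, here is a simplified Python sketch, assuming every vtable occupies the same fixed number of bytes and the hierarchy is a tree. Laying out vtables in depth-first pre-order makes each class's subtree contiguous, so the membership check reduces to a range-and-alignment test. `layout_vtables` and `is_derived_vtable` are hypothetical names; the actual scheme described in [1] is more elaborate.

```python
def layout_vtables(children, root, vtable_size=8):
    """Assign vtable offsets in depth-first pre-order.

    Returns (offsets, ranges): offsets[cls] is the address point of cls's
    vtable, and ranges[cls] is the half-open byte range covering the vtables
    of cls and every class derived from it.
    """
    offsets, ranges = {}, {}

    def visit(cls):
        start = len(offsets) * vtable_size
        offsets[cls] = start
        for child in children.get(cls, []):
            visit(child)
        # Everything laid out since `start` belongs to cls's subtree.
        ranges[cls] = (start, len(offsets) * vtable_size)

    visit(root)
    return offsets, ranges


def is_derived_vtable(vptr, class_range, vtable_size=8):
    """CFI-style check: the vtable pointer must fall inside the class's
    range and sit on a vtable boundary."""
    start, end = class_range
    return start <= vptr < end and (vptr - start) % vtable_size == 0


# Class A with derived classes B and C, as in the example above.
offsets, ranges = layout_vtables({"A": ["B", "C"]}, "A")
print(offsets)                                        # {'A': 0, 'B': 8, 'C': 16}
print(is_derived_vtable(offsets["C"], ranges["A"]))   # True
print(is_derived_vtable(offsets["A"], ranges["B"]))   # False
```

Pre-order placement is what lets one contiguous range per class stand in for "this class or any class derived from it".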

Both CFI and vtable opt rely on bitset metadata [2] in order to know where
the address points for the vtables are located. This is currently encoded
using module-level metadata.

In order to lay out the vtables correctly, all vtables need to be visible
at once. This is the only part of the process that requires full LTO. The
rest of the process can just rely on a set of summary metadata that
contains information about how to perform CFI checks for a particular
class, or how to devirtualize a particular virtual call. This information
could be made part of the ThinLTO summary.
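The split between the whole-program step and the per-module step can be sketched as follows, assuming the per-class byte ranges have already been computed by a whole-program layout pass. `CfiResolution`, `export_cfi` and `check_with_summary` are hypothetical names; the point is only that, once the layout is done, a backend needs a few integers per type rather than the vtables themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CfiResolution:
    """Hypothetical per-type summary entry for CFI checks."""
    start: int   # offset of the first compatible vtable
    size: int    # size in bytes of the compatible range
    align: int   # distance between consecutive address points


def export_cfi(ranges, align):
    """Whole-program step: needs every vtable's placement (the full-LTO part)."""
    return {t: CfiResolution(s, e - s, align) for t, (s, e) in ranges.items()}


def check_with_summary(res, vptr):
    """Per-module step: a ThinLTO backend can emit this check from the
    summary entry alone, without seeing any vtable."""
    offset = vptr - res.start
    return 0 <= offset < res.size and offset % res.align == 0


summary = export_cfi({"A": (0, 24), "B": (8, 16)}, align=8)
print(check_with_summary(summary["A"], 16))  # True
print(check_with_summary(summary["B"], 16))  # False
```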

Implementation

The idea is to allow bitcode to contain embedded summary blobs. For
example, in our scenario, the bitcode file would contain a section with
an embedded blob: a bitcode file containing definitions of the vtables
defined by that translation unit, plus the bitset metadata for CFI and
vtable opt. The "top-level" bitcode would contain everything
else.
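A rough Python model of this split, assuming a toy representation in which a module is just a map from global names to kinds plus a list of bitset entries. The `Module` class and `split_for_summary` are hypothetical, the symbol names are illustrative Itanium-mangled names, and how the top-level module refers to vtables that now live in the embedded blob is not modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Toy stand-in for an IR module (hypothetical)."""
    # Global name -> kind, e.g. "vtable", "function", "variable".
    globals: dict = field(default_factory=dict)
    # Bitset entries: (type identifier, vtable name, address point offset).
    bitsets: list = field(default_factory=list)


def split_for_summary(tu):
    """Return (embedded_summary, top_level) for one translation unit.

    The embedded summary module gets the vtable definitions and all bitset
    metadata; the top-level module gets everything else.
    """
    summary = Module(
        globals={n: k for n, k in tu.globals.items() if k == "vtable"},
        bitsets=list(tu.bitsets),
    )
    top_level = Module(
        globals={n: k for n, k in tu.globals.items() if k != "vtable"},
    )
    return summary, top_level


tu = Module(
    globals={"_ZTV1A": "vtable", "_ZN1A1fEv": "function"},
    bitsets=[("_ZTS1A", "_ZTV1A", 16)],
)
summary, top = split_for_summary(tu)
print(summary.globals)  # {'_ZTV1A': 'vtable'}
print(top.globals)      # {'_ZN1A1fEv': 'function'}
```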

Summaries would be merged by linking the embedded summary bitcode files
into a single module using the IRMover, much as regular LTO does. This
would move all the necessary vtables and metadata into a single module
where they can be processed by the existing LowerBitSets and
WholeProgramDevirt passes, which would be extended to export summary
metadata. This summary metadata would then be copied into the regular
summary information, where individual ThinLTO backends can use it.
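The merge-then-export flow might look roughly like this in a toy model, assuming each embedded summary module is represented as a pair (vtable definitions, bitset entries). The same vtable can be emitted by several translation units, so the stand-in for the IRMover keeps one copy of each. `merge_summary_modules` and `export_type_members` are hypothetical stand-ins for the IRMover link and for the export step the passes would gain.

```python
def merge_summary_modules(modules):
    """Stand-in for linking the embedded summary modules with the IRMover."""
    vtables, bitsets = {}, set()
    for module_vtables, module_bitsets in modules:
        for name, contents in module_vtables.items():
            # Keep the first definition of a vtable that several TUs define.
            vtables.setdefault(name, contents)
        bitsets.update(module_bitsets)
    return vtables, bitsets


def export_type_members(bitsets):
    """Stand-in for the export step: map each type identifier to the
    (vtable, address point offset) pairs compatible with it."""
    members = {}
    for type_id, vtable, offset in sorted(bitsets):
        members.setdefault(type_id, []).append((vtable, offset))
    return members


tu1 = ({"_ZTV1A": ["_ZN1A1fEv"]}, [("_ZTS1A", "_ZTV1A", 16)])
tu2 = ({"_ZTV1A": ["_ZN1A1fEv"], "_ZTV1B": ["_ZN1B1fEv"]},
       [("_ZTS1A", "_ZTV1A", 16), ("_ZTS1A", "_ZTV1B", 16),
        ("_ZTS1B", "_ZTV1B", 16)])
vtables, bitsets = merge_summary_modules([tu1, tu2])
combined_summary = export_type_members(bitsets)
print(combined_summary["_ZTS1A"])  # [('_ZTV1A', 16), ('_ZTV1B', 16)]
```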

In the future, we could also consider representing importing summaries as
metadata. That would also make the summary loading process very
straightforward.

Alternatives

1) We could use a native object file, with one section named ".llvmbc"
containing the summary module with the vtables and CFI metadata, and
another section ".llvmbc.thin" containing "everything else". This would be
my preferred option, as it would make things even simpler. For example, the
linker could handle the top-level sections as it reads them, and it would
allow the individual sections to be extracted (e.g. using objcopy) and
inspected by normal tools, such as llvm-as and llvm-dis. The native object
format could also be the container for native code; see my earlier proposal
[3].
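One way a linker could dispatch on these sections as it reads them, sketched in Python under the assumption that the object file has already been parsed into (section name, contents) pairs and that both sections hold raw bitcode. `classify_sections` is hypothetical; the only real detail relied on is that a raw LLVM bitcode stream starts with the bytes `BC` 0xC0 0xDE.

```python
BITCODE_MAGIC = b"BC\xc0\xde"


def classify_sections(sections):
    """Split an object's sections into full-LTO bitcode, ThinLTO bitcode and
    ordinary native sections."""
    full_lto, thin_lto, native = [], [], []
    for name, data in sections:
        if name in (".llvmbc", ".llvmbc.thin"):
            # Reject sections that do not look like a raw bitcode stream.
            if not data.startswith(BITCODE_MAGIC):
                raise ValueError(f"{name}: not a bitcode stream")
            (full_lto if name == ".llvmbc" else thin_lto).append(data)
        else:
            native.append((name, data))
    return full_lto, thin_lto, native


full, thin, native = classify_sections([
    (".llvmbc", BITCODE_MAGIC + b"summary module"),
    (".llvmbc.thin", BITCODE_MAGIC + b"everything else"),
    (".text", b"\x90"),
])
print(len(full), len(thin), len(native))  # 1 1 1
```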

The implementation in lld is very simple (about 10 lines in my prototype),
but I can accept that it may be more difficult in other linkers, so those
linkers may want to use bitcode as the top-level format. In that case, we
would probably want to go with what I described in "Implementation".

2) We could emit the vtables and CFI metadata directly into the top-level
bitcode. However, this would create a need for a mechanism to distinguish
vtables from non-vtables for when we link the LTO parts of the module. In
order to do this, we could add a new bitcode record type for bitset
metadata that could also act as an index for vtables in a similar way to
how ThinLTO importing summaries already work. However, this would add even
more complexity to the bitcode format, when I feel that we should really be
going the other way with a simpler bitcode format.

Thanks,
-- 
-- 
Peter

[1] http://clang.llvm.org/docs/ControlFlowIntegrityDesign.html
[2] http://llvm.org/docs/BitSets.html
[3] http://lists.llvm.org/pipermail/llvm-dev/2016-April/098081.html

Xinliang David Li via llvm-dev

2016-May-05 01:19 UTC

On Wed, May 4, 2016 at 3:02 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> Alternatives
>
> 1) We could use a native object file, with one section named ".llvmbc"
> containing the summary module with the vtables and CFI metadata, and
> another section ".llvmbc.thin" containing "everything else". This would be
> my preferred option, as it would make things even simpler. For example, the
> linker could handle the top-level sections as it reads them, and it would
> allow the individual sections to be extracted (e.g. using objcopy) and
> inspected by normal tools, such as llvm-as and llvm-dis. The native object
> format could also be the container for native code; see my earlier proposal
> [3].
>
> The implementation in lld is very simple (about 10 lines in my prototype),
> but I can accept that it may be more difficult in other linkers, so those
> linkers may want to use bitcode as the top-level format. In that case, we
> would probably want to go with what I described in "Implementation".
>

Using the native object format for ThinLTO was what was originally
proposed (for flexibility with binutils tools, etc.). We have not
revisited the issue since. As ThinLTO matures and gains new use cases,
maybe it is time to revisit this, now that we have gathered more
experience.

David

Petr Pavlu via llvm-dev

2016-May-13 07:37 UTC

On 05/05/16 02:19, Xinliang David Li via llvm-dev wrote:
> On Wed, May 4, 2016 at 3:02 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> Using the native object format for ThinLTO was what was originally
> proposed (for flexibility with binutils tools, etc.). We have not
> revisited the issue since. As ThinLTO matures and gains new use cases,
> maybe it is time to revisit this, now that we have gathered more
> experience.

I would also like to voice interest in having an option to
generate LTO bitcode wrapped in a native object file in the
.llvmbc section (even for "full" LTO). The reason is the same as
described on the LLVM Bitcode File Format page [1]. In my case,
the interesting metadata that should be carried with the file are
the ARM build attributes. This is useful for the linker I work on
(the ARM Compiler linker), as it allows the linker to select
appropriate libraries by consulting these attributes. The linker
needs this information in Phase 2 of the LTO process [2], before
it starts searching the libraries, but the LTO codegen provides
the attributes only in Phase 3, in which the final object file is
produced.

For my purposes, I have implemented this wrapped format in clang
in a similar way to how llgo does it. At the point where the LTO
bitcode would be written out, the module is written as bitcode
into a new global variable in the module. The section of this
variable is set to .llvmbc and the rest of the module is cleaned
up. The normal emit phases are then run on this converted module.
This is a very simple solution, but not the best one: it makes the
content of the original module opaque to the emit phases, so, for
example, the symbol table does not contain any symbols from the
original module.
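The wrapping approach described above can be modelled in a few lines, which also shows the drawback: after wrapping, the only global the emit phases see is the wrapper variable. The toy `Module`, the `wrap_in_llvmbc` helper, the variable name `embedded_bitcode` and the `serialize` callback (standing in for the bitcode writer) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalVariable:
    name: str
    initializer: bytes = b""
    section: str = ""


@dataclass
class Module:
    functions: list = field(default_factory=list)   # function names
    variables: list = field(default_factory=list)   # GlobalVariable objects


def wrap_in_llvmbc(module, serialize):
    """Serialize the module into a single global placed in .llvmbc and drop
    everything else, so the emit phases only see the wrapper variable."""
    blob = serialize(module)
    return Module(variables=[GlobalVariable("embedded_bitcode", blob, ".llvmbc")])


def symbol_names(module):
    """Global names visible to the emit phases (a rough proxy for what can
    end up in the object's symbol table)."""
    return module.functions + [v.name for v in module.variables]


original = Module(functions=["foo"], variables=[GlobalVariable("bar")])
wrapped = wrap_in_llvmbc(original, serialize=lambda m: b"BC\xc0\xde...")
print(symbol_names(original))  # ['foo', 'bar']
print(symbol_names(wrapped))   # ['embedded_bitcode']
```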

[1] http://llvm.org/docs/BitCodeFormat.html#native-object-file-wrapper-format
[2] http://llvm.org/docs/LinkTimeOptimization.html#multi-phase-communication-between-liblto-and-linker

Thanks,
Petr

Mehdi Amini via llvm-dev

2016-Jun-07 04:09 UTC

Hi,
> On May 4, 2016, at 3:02 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> The mechanism for merging summaries would be to link the embedded summary
> bitcode files into a single module using the IRMover, with a mechanism very
> similar to regular LTO. This would move all the necessary vtables and
> metadata into a single module where they can be processed using the
> existing LowerBitSets and WholeProgramDevirt passes, which would be
> extended to export summary metadata. This summary metadata would be copied
> into the regular summary information, where it can be used by individual
> ThinLTO backends.

It is not clear to me how this would play with our current way of handling
ThinLTO importing. You mention that the existing WholeProgramDevirt pass is
supposed to handle a module that would contain only the vtables and the
metadata: it seems to me that it currently relies on seeing the call sites.

I'd expect the devirtualization information to be available as "first
class" in the summary-based call graph, so that we can perform the
devirtualization without touching any IR and in a way that can be used to
drive accurate importing decisions.

-- 
Mehdi

Peter Collingbourne via llvm-dev

2016-Jun-07 17:16 UTC

On Mon, Jun 6, 2016 at 9:09 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> It is not clear to me how this would play with our current way of handling
> ThinLTO importing. You mention that the existing WholeProgramDevirt pass is
> supposed to handle a module that would contain only the vtables and the
> metadata: it seems to me that it currently relies on seeing the call sites.
>
> I'd expect the devirtualization information to be available as "first
> class" in the summary-based call graph, so that we can perform the
> devirtualization without touching any IR and in a way that can be used to
> drive accurate importing decisions.
>
Yes, one thing I did not cover was what the summary information would look
like in the individual modules, and what it would look like in the combined
summary.

The summary information in the individual modules would be stored in the
FunctionSummary. For CFI this would consist of the set of type identifiers
that are used to check pointers at call sites, and for devirtualization it
would be the set of (type identifier, offset of virtual function from
address point) pairs used at virtual call sites. In the latter case I would
use the routines I moved into lib/Analysis/BitSetUtils.cpp (to be renamed
TypeMetadataUtils.cpp) to summarize the function.

The combined summary would look very similar, except that instead of sets
we would have maps from either identifiers or (identifier, offset) pairs to
the "resolution" for that key (e.g. for a successful single-implementation
devirtualization this would name the single possible callee). If we
successfully do single-implementation devirtualization, we would add an
edge to the call graph for the associated FunctionSummary.
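The summary shapes described above can be sketched in Python as follows, restricted to single-implementation devirtualization. The `FunctionSummary` here is a toy stand-in, not LLVM's actual class; `Resolution`, `resolve_virtual_calls` and the resolution kinds are hypothetical, vtable slots are assumed to be 8 bytes with offsets measured from the address point, and a real resolution would carry more kinds than the two shown.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FunctionSummary:
    """Toy stand-in for the per-function summary (not LLVM's class)."""
    name: str
    # CFI: type identifiers used to check pointers at call sites.
    type_tests: set = field(default_factory=set)
    # Devirtualization: (type identifier, byte offset from address point).
    virtual_calls: set = field(default_factory=set)
    # Call graph edges, by callee name.
    calls: set = field(default_factory=set)


@dataclass(frozen=True)
class Resolution:
    kind: str                     # "single_impl" or "unknown"
    callee: Optional[str] = None  # set only for "single_impl"


def resolve_virtual_calls(summaries, vtables, members, slot_size=8):
    """Build the combined map (type id, offset) -> Resolution and add a call
    graph edge for every successful single-implementation devirtualization.

    vtables maps a vtable name to its function slots, starting at the
    address point; members maps a type id to its compatible vtables.
    """
    resolutions = {}
    for fs in summaries:
        for key in fs.virtual_calls:
            if key not in resolutions:
                type_id, offset = key
                callees = {vtables[vt][offset // slot_size]
                           for vt in members[type_id]}
                if len(callees) == 1:
                    resolutions[key] = Resolution("single_impl", callees.pop())
                else:
                    resolutions[key] = Resolution("unknown")
            if resolutions[key].kind == "single_impl":
                fs.calls.add(resolutions[key].callee)
    return resolutions


vtables = {"_ZTV1A": ["A::f", "A::g"], "_ZTV1B": ["A::f", "B::g"]}
members = {"_ZTS1A": ["_ZTV1A", "_ZTV1B"]}
caller = FunctionSummary("caller", virtual_calls={("_ZTS1A", 0), ("_ZTS1A", 8)})
combined = resolve_virtual_calls([caller], vtables, members)
print(combined[("_ZTS1A", 0)])  # Resolution(kind='single_impl', callee='A::f')
print(caller.calls)             # {'A::f'}
```

Only the offset-0 call is resolved here, because both compatible vtables hold the same function in that slot; the offset-8 call stays unresolved and adds no edge.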

Peter