Teresa Johnson via llvm-dev
2019-Dec-11 14:21 UTC
[llvm-dev] RFC: Safe Whole Program Devirtualization Enablement
Please send any comments. As mentioned at the end, I will follow up with some patches as soon as they are cleaned up and I create some test cases.

RFC: Safe Whole Program Devirtualization Enablement
==================================================

High Level Summary
------------------

The goal of the changes described in this RFC is to support aggressive Whole Program Devirtualization without requiring -fvisibility=hidden at compile time, by pre-enabling bitcode for whole program devirtualization, but delaying the decision on whether to apply devirtualization until LTO link time. This is needed both because we may not know whether the link mode is safe for hidden LTO visibility until link time, and also to allow bitcode objects to be shared between links of targets with differing valid LTO visibility. This utilizes the !vcall_visibility metadata added for Dead Virtual Function Elimination.

The required changes, in summary, are (these are described in more detail later):

1) When -fwhole-program-vtables is specified, always insert type test assumes for virtual calls, and additionally add !vcall_visibility metadata to vtable definitions (which will be summarized in the ThinLTO index).

2) At LTO link time, apply hidden LTO visibility to vtable definition vcall_visibility metadata (or summary) when specified by a new link option (-lto-whole-program-visibility).

3) During the LTO link time Whole Program Devirtualization analysis, only allow devirtualization when the associated vtable definitions have hidden LTO visibility, as derived from the !vcall_visibility metadata (summarized in the index for index-only WPD).

4) Modify the Virtual Function Elimination application in GlobalDCE to ignore vtables with !vcall_visibility when they are associated with type tests (and not just type checked loads).

Background
----------

Whole Program Devirtualization is supported for LTO (both regular and Thin) via the -fwhole-program-vtables option.
However, it can only be safely applied to classes for which LTO can analyze the entire class hierarchy, and therefore is restricted to those classes with hidden LTO visibility. See https://clang.llvm.org/docs/LTOVisibility.html for more information.

The LTO visibility of a class is derived at compile time from the class’s symbol visibility. Generally, only classes that are internal at the source level (e.g. declared in an anonymous namespace) receive hidden LTO visibility. Compiling with -fvisibility=hidden tells the compiler that, unless otherwise marked, symbols are assumed to have hidden visibility, which also implies that all classes have hidden LTO visibility (unless decorated with a public visibility attribute). This results in much more aggressive devirtualization.

However, compiling with -fvisibility=hidden is only safe when we know we are LTO linking with full view of the class hierarchy. Specifically, this is true when a binary is being LTO linked with either all sources being bitcode (so that the LTO unit is the same as the linkage unit), or when the only translation units being linked as native code are known to not derive any classes defined in the LTO unit (e.g. system libraries). Additionally, the binary may not dlopen any libraries at runtime that contain classes derived from those defined in the main binary.

Assuming we are building and linking a binary that satisfies the above constraints (we are LTO linking all translation units as bitcode, except certain (e.g. system) libraries or other native objects known to be safe by the user or build system, and the binary will not dlopen any libraries deriving from the binary’s classes), then it should be safe to compile with -fvisibility=hidden, along with -fwhole-program-vtables.

However, there are cases where it is unknown until link time whether we are building a target that meets the above constraints.
Additionally, we may want to build additional targets that do not meet the criteria for safe application of -fvisibility=hidden during the same build invocation (specifically, because subsets of the code will be linked into shared libraries instead of linking all code directly into the binary). Even if it were possible to build two sets of bitcode object files (one with default visibility for the unsafely linked targets and one with hidden visibility for the safely linked targets), this causes duplication in both time and space, which is prohibitive in an environment where it is common to build targets with tens of thousands of sources, and multiple targets with different link modes simultaneously.

The goals of the changes described in this RFC are to essentially delay the application of -fvisibility=hidden until LTO link time, and allow bitcode objects to be shared between links of targets with differing link modes and therefore differing valid LTO visibility.

Type Information for Devirtualization
-------------------------------------

LTO whole program devirtualization is driven off of type information in the IR. This includes type metadata (on vtable definitions), as well as type test intrinsics before virtual calls. The former is safe to emit into the IR in all cases, but the latter is currently not. The virtual call sites are decorated with an llvm.assume(llvm.type.test(ptr, typeid)) sequence, which drives the LTO analysis of virtual calls. This sequence is an assertion that the given pointer is associated with the given type identifier (https://llvm.org/docs/LangRef.html#llvm-type-test-intrinsic). It is currently inserted only for classes with hidden LTO visibility, as the implication of this sequence is that we have full visibility of that type’s class hierarchy, and may devirtualize the call based on that knowledge. This assumption is not valid if the class does not have hidden LTO visibility.
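To make the call-site decoration above a little more concrete, here is a small C++ sketch. All class and function names are made up for illustration and do not come from the RFC, and the IR shape in the comments is only approximate; the exact IR Clang emits depends on the compiler version and flags.

```cpp
#include <memory>

// Hypothetical example classes; none of these names appear in the RFC.
namespace {
// Declared in an anonymous namespace, so the class is internal at the
// source level and receives hidden LTO visibility even without
// -fvisibility=hidden.
struct LocalShape {
  virtual ~LocalShape() = default;
  virtual int sides() const { return 0; }
};
struct LocalSquare : LocalShape {
  int sides() const override { return 4; }
};
} // namespace

// An ordinary class with default symbol visibility. It has public LTO
// visibility unless the translation unit is compiled with
// -fvisibility=hidden.
struct Shape {
  virtual ~Shape() = default;
  virtual int sides() const { return 0; }
};
struct Triangle : Shape {
  int sides() const override { return 3; }
};

// With -fwhole-program-vtables, a virtual call on a class with hidden LTO
// visibility is preceded by a sequence roughly of the form
//   %vtable = <load of the vtable pointer from the object>
//   %ok     = call i1 @llvm.type.test(<%vtable>, metadata !"<type id>")
//   call void @llvm.assume(i1 %ok)
// Today that sequence is emitted for the LocalShape call below but not
// for the Shape call; the RFC proposes emitting it for both.
int countSides(const Shape &s) { return s.sides(); }

int countLocalSides() {
  std::unique_ptr<LocalShape> s = std::make_unique<LocalSquare>();
  return s->sides();
}
```

The point of the sketch is only that the two call sites look identical at the source level; what differs is whether the compiler can currently justify the type test assume for them.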
In order to drive later devirtualization, we still need the type compatibility information provided by the llvm.type.test, but want to delay a decision on whether it is valid to assume that we have full class hierarchy visibility, and thus whether devirtualization of that target can be safely applied.

Specifically, what we want to know at LTO time is whether the vtable has hidden LTO visibility or not, and use that to guide the application of devirtualization to the type tested virtual call sites. By default, only those with statically guaranteed hidden LTO visibility should be marked as such. And as described later, at LTO link time we can optionally decide to convert vtables to hidden LTO visibility for more aggressive devirtualization when appropriate.

There is already a mechanism in the compiler to describe the vtable visibility, which was recently added for Dead Virtual Function Elimination (D63932): !vcall_visibility metadata, documented at https://llvm.org/docs/TypeMetadata.html#vcall-visibility-metadata. This metadata is attached to vtable definitions, currently only when VFE is enabled. As described in the documentation, because this is currently only used for VFE, it also requires that the corresponding function pointer loads use the llvm.type.checked.load intrinsic. This would not be required for devirtualization (although the VFE support in GlobalDCE will need modification to ignore the metadata when type checked loads are not used, more on that later).

This RFC proposes adding the !vcall_visibility metadata to vtable definitions when -fwhole-program-vtables is specified. Unlike for VFE, the function pointer loads can still use normal loads with corresponding type test assume sequences (better for optimization).

Additional changes to the LTO compilation steps are detailed below.
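As a toy model of the visibility levels that !vcall_visibility can express, consider the C++ sketch below. The numeric values follow the documented metadata encoding (0 public, 1 linkage unit, 2 translation unit). The ClassInfo struct and the chooseVCallVisibility helper are hypothetical and show just one plausible conservative mapping, not what the frontend actually implements.

```cpp
#include <cstdint>

// Mirrors the three levels documented for !vcall_visibility metadata.
enum class VCallVisibility : std::uint8_t {
  Public = 0,          // calls through this vtable may come from anywhere
  LinkageUnit = 1,     // all such calls are within the LTO unit
  TranslationUnit = 2, // all such calls are within the current module
};

// Hypothetical description of a class as seen at compile time.
struct ClassInfo {
  bool inAnonymousNamespace; // internal at the source level
  bool hiddenLTOVisibility;  // e.g. the class has hidden symbol visibility
};

// Hypothetical helper (an assumption of this sketch): mark a vtable as
// non-public only when that is statically guaranteed, and fall back to
// Public in every other case.
inline VCallVisibility chooseVCallVisibility(const ClassInfo &c) {
  if (c.inAnonymousNamespace)
    return VCallVisibility::TranslationUnit;
  if (c.hiddenLTOVisibility)
    return VCallVisibility::LinkageUnit;
  return VCallVisibility::Public;
}
```

With this mapping, a class compiled with default visibility stays Public at compile time, which is what leaves room for a later, link-time decision to treat it as hidden.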
Pre-Link LTO Compile
--------------------

First, type test assume sequences will be inserted when -fwhole-program-vtables is specified, and not just for classes with hidden LTO visibility.

Second, as mentioned earlier, the !vcall_visibility metadata will be inserted under -fwhole-program-vtables. For the purposes of index-only WPD, a single-bit flag indicating whether or not the vtable def has hidden LTO visibility is added to the GVarFlags on the GlobalVarSummary. Note that we can collapse the 3 enum values of the metadata down to a single bit, because for the purposes of devirtualization, both VCallVisibilityLinkageUnit and VCallVisibilityTranslationUnit can be treated the same (we only need to have at least VCallVisibilityLinkageUnit to devirtualize). The ModuleSummaryIndex builder will set this new flag from the !vcall_visibility metadata on vtable definitions.

Finally, the VFE support in GlobalDCE (which is enabled by default and currently triggers automatically in the presence of this metadata) will need to be modified to ignore !vcall_visibility metadata inserted for devirtualization only, i.e. when there are any type test assume sequences for that Type ID. This should be straightforward, as we can scan the type tests and remove any vtables decorated with compatible type ids from VFESafeVTables. Note that this change will affect the invocation of GlobalDCE both here in the pre-link LTO compile as well as later in the LTO Backend (where it is applied to a broader set of vtables).

LTO Link Handling
-----------------

During Whole Program Devirtualization analysis, when looking at the vtables corresponding to the summarized virtual calls during tryFindVirtualCallTargets, we must consult the vcall_visibility information. For hybrid (regular+thin) LTO, the vtable definitions are in the regular LTO partition and so the IR can be consulted directly. For index-only WPD, we instead consult the flag on the vtable’s GlobalVarSummary.
If any of the vtable definitions compatible with a given virtual call have public LTO visibility, the devirtualization must be skipped. By default, only classes that have statically determined hidden LTO visibility would be allowed to devirtualize.

However, as noted earlier, we want to enable more aggressive devirtualization at LTO link time when we know that the linking mode guarantees full LTO visibility of any code that may derive classes from the bitcode being linked. To do so, we will add a new linker option. For lld, the proposed option is -lto-whole-program-visibility. For gold, the corresponding plugin option would be “whole-program-visibility”.

When this option is set, LTO will convert all vtable definitions to have hidden LTO visibility before invoking Whole Program Devirtualization. In the hybrid LTO case this would mean changing the metadata on the IR. In the index-only case this would be done in the summaries.

LTO Backend Handling
--------------------

No changes are required in the LTO backend’s invocation of Whole Program Devirtualization, since any visibility constraints are enforced at LTO link time, and the loosening of visibility under the new link option only needs to affect the LTO WPD invocation.

As mentioned earlier when describing the pre-link LTO compile changes, GlobalDCE will be changed to ignore vtables with !vcall_visibility metadata corresponding to type tests (and not just type checked loads).

Status
------

These changes have been prototyped and tested with index-only WPD (with the exception of the proposed changes to GlobalDCE; at the moment I have been testing with -enable-vfe=false). I will be cleaning up the changes and sending patches for review in the coming days.

--
Teresa Johnson | Software Engineer | tejohnson at google.com |
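As a rough illustration of the link-time rule described in the RFC above, the following C++ sketch models the single-bit summary flag, the effect of the proposed whole-program-visibility option, and the check applied before devirtualizing a call. VTableSummary, toHiddenFlag, applyWholeProgramVisibility and mayDevirtualize are hypothetical names chosen for this sketch; they are not the LLVM data structures or functions.

```cpp
#include <string>
#include <vector>

// The three documented !vcall_visibility levels.
enum class VCallVisibility { Public = 0, LinkageUnit = 1, TranslationUnit = 2 };

// Hypothetical stand-in for the per-vtable summary entry.
struct VTableSummary {
  std::string name;
  bool hiddenLTOVisibility; // the proposed single-bit flag
};

// Collapse the three levels to one bit: anything at least as restrictive
// as LinkageUnit counts as hidden for devirtualization purposes.
inline bool toHiddenFlag(VCallVisibility v) {
  return v != VCallVisibility::Public;
}

// Model of the proposed link option: when set, every vtable definition is
// treated as having hidden LTO visibility before WPD runs.
inline void applyWholeProgramVisibility(std::vector<VTableSummary> &vtables,
                                        bool wholeProgramVisibility) {
  if (!wholeProgramVisibility)
    return;
  for (VTableSummary &vt : vtables)
    vt.hiddenLTOVisibility = true;
}

// A call may be devirtualized only if every compatible vtable is hidden.
// Treating an empty candidate set as "do not devirtualize" is an
// assumption of this sketch.
inline bool mayDevirtualize(const std::vector<VTableSummary> &compatible) {
  if (compatible.empty())
    return false;
  for (const VTableSummary &vt : compatible)
    if (!vt.hiddenLTOVisibility)
      return false;
  return true;
}
```

In this model, a hierarchy containing even one public vtable blocks devirtualization by default, and the same summaries permit it once the link is known to be safe and the option is applied, which is what lets one set of bitcode objects serve links with different valid LTO visibility.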
Evgeny Leviant via llvm-dev
2019-Dec-13 16:56 UTC
[llvm-dev] Safe Whole Program Devirtualization Enablement
> Specifically, what we want to know at LTO time is whether the vtable has hidden LTO visibility or not

I may be missing something, but why can't we use type metadata instead of !vcall_visibility to identify vtable pointers? We can skip emission of !type for vtables having the [[clang::lto_visibility_public]] attribute and postpone the decision on other vtables in the way you suggested.

________________________________
From: Teresa Johnson <tejohnson at google.com>
Sent: December 11, 2019 17:21
To: llvm-dev
Cc: Peter Collingbourne; Steven Wu; Evgeny Leviant; David Li
Subject: RFC: Safe Whole Program Devirtualization Enablement
Teresa Johnson via llvm-dev
2019-Dec-13 17:06 UTC
[llvm-dev] Safe Whole Program Devirtualization Enablement
On Fri, Dec 13, 2019 at 8:56 AM Evgeny Leviant <eleviant at accesssoftek.com> wrote:> > Specifically, what we want to know at LTO time is whether the vtable > has hidden LTO visibility or not > > > I can be missing something, but why can't we use type metadata instead of > !vcall_visibility to identify vtable pointers? We can skip emission of > !type for vtables having [[clang::lto_visibility_public]] attribute and > postpone decision on other vtables in the way you suggested. >I'm not sure if you mean the vtables that have received this attribute manually, or just the ones that by default would get public LTO visibility (the latter is the vast bulk of the interesting case). Regardless, it is the same reason. At LTO link time we want to optionally treat these as hidden (i.e. delay the effects of what would have been done at compile time under -fvisibility=hidden). If we don't emit the !type metadata, then we cannot do this as we lose the class hierarchy info necessary for WPD. The vcall_visibility attribute just tells us which vtables we must treat conservatively as public without the LTO link time assertion provided by the proposed new link option that we can safely treat public classes as hidden due to the link mode. Teresa> > > ------------------------------ > *От:* Teresa Johnson <tejohnson at google.com> > *Отправлено:* 11 декабря 2019 г. 17:21 > *Кому:* llvm-dev > *Копия:* Peter Collingbourne; Steven Wu; Evgeny Leviant; David Li > *Тема:* RFC: Safe Whole Program Devirtualization Enablement > > CAUTION: This email originated from outside of the organization. Do not > click links or open attachments unless you recognize the sender and know > the content is safe. If you suspect potential phishing or spam email, > report it to ReportSpam at accesssoftek.com > Please send any comments. As mentioned at the end I will follow up with > some patches as soon as they are cleaned up and I create some test cases. 
> > RFC: Safe Whole Program Devirtualization Enablement > ==================================================> > High Level Summary > ------------------ > > The goal of the changes described in this RFC is to support aggressive > Whole Program Devirtualization without requiring -fvisibility=hidden at > compile time, by pre-enabling bitcode for whole program devirtualization, > but delaying the decision on whether to apply devirtualization until LTO > link time. This is needed both because we may not know whether the link > mode is safe for hidden LTO visibility until link time, and also to allow > bitcode objects to be shared between links of targets with differing valid > LTO visibility. This utilizes the !vcall_visibility metadata added for Dead > Virtual Function Elimination. > > The summary of changes required are (these are described in more detail > later): > > 1) When -fwhole-program-vtables is specified, always insert type test > assumes for virtual calls, and additionally add !vcall_visibility metadata > to vtable definitions (which will be summarized in the ThinLTO index). > > 2) At LTO link time, apply hidden LTO visibility to vtable definition > vcall_visibility metadata (or summary) when specified by a new link option > (-lto-whole-program-visibility). > > 3) During the LTO link time Whole Program Devirtualization analysis, only > allow devirtualization when the associated vtable definitions have hidden > LTO visibility, as derived from the !vcall_visibility metadata (summarized > in the index for index-only WPD). > > 4) Modify the Virtual Function Elimination application in GlobalDCE to > ignore vtables with !vcall_visibility when they are associated with type > tests (and not just type checked loads). > > Background > ---------- > > Whole Program Devirtualization is supported for LTO (both regular and > Thin) via the -fwhole-program-vtables option. 
However, it can only be > safely applied to classes for which LTO can analyze the entire class > hierarchy, and therefore is restricted to those classes with hidden LTO > visibility. See https://clang.llvm.org/docs/LTOVisibility.html for more > information. > > The LTO visibility of a class is derived at compile time from the class’s > symbol visibility. Generally, only classes that are internal at the source > level (e.g. declared in an anonymous namespace) receive hidden LTO > visibility. Compiling with -fvisibility=hidden tells the compiler that, > unless otherwise marked, symbols are assumed to have hidden visibility, > which also implies that all classes have hidden LTO visibility (unless > decorated with a public visibility attribute). This results in much more > aggressive devirtualization. > > However, compiling with -fvisibility=hidden is only safe when we know we > are LTO linking with full view of the class hierarchy. Specifically, this > is true when a binary is being LTO linked with either all sources being > bitcode (so that the LTO unit is the same as the linkage unit), or when the > only translation units being linked as native code are known to not derive > any classes defined in the LTO unit (e.g. system libraries). Additionally, > the binary may not dlopen any libraries at runtime that contain classes > derived from those defined in the main binary. > > Assuming we are building and linking a binary that satisfies the above > constraints (we are LTO linking all translation units as bitcode, except > certain (e.g. system) libraries or other native objects known to be safe by > the user or build system, and the binary will not dlopen any libraries > deriving from the binary’s classes), then it should be safe to compile with > -fvisibility=hidden, along with -fwhole-program-vtables. > > However, there are cases where it is unknown until link time whether we > are building a target that meets the above constraints. 
Additionally, we > may want to build additional targets that do not meet the criteria for safe > application of -fvisibility=hidden during the same build invocation > (specifically, because subsets of the code will be linked into shared > libraries instead of linking all code directly into the binary). Even if > possible to build two sets of bitcode object files (one with default > visibility for the unsafely linked targets and one with hidden visibility > for the safely linked targets), this causes duplication in both time and > space, which is prohibitive in an environment where it is common to build > targets with tens of thousands of sources, and multiple targets with > different link modes simultaneously. > > The goals of the changes described in this RFC are to essentially delay > the application of -fvisibility=hidden until LTO link time, and allow > bitcode objects to be shared between links of targets with differing link > modes and therefore differing valid LTO visibility. > > Type Information for Devirtualization > ------------------------------------- > > LTO whole program devirtualization is driven off of type information in > the IR. This includes type metadata (on vtable definitions), as well as > type test intrinsics before virtual calls. The former is safe to emit into > the IR in all cases, but the latter is currently not. The virtual call > sites are decorated with an llvm.assume(llvm.type.test(ptr, typeid)) > sequence, which drives the LTO analysis of virtual calls. This sequence is > an assertion that the given pointer is associated with the given type > identifier (https://llvm.org/docs/LangRef.html#llvm-type-test-intrinsic). > It is currently inserted only for classes with hidden LTO visibility as the > implication of this sequence is that we have full visibility of that type’s > class hierarchy, and may devirtualize the call based on that knowledge. > This assumption is not valid if the class does not have hidden LTO > visibility. 

In order to drive later devirtualization, we still need the type compatibility information provided by the llvm.type.test, but want to delay a decision on whether it is valid to assume that we have full class hierarchy visibility, and thus whether devirtualization of that target can be safely applied.

Specifically, what we want to know at LTO time is whether the vtable has hidden LTO visibility or not, and use that to guide the application of devirtualization to the type tested virtual call sites. By default, only those with statically guaranteed hidden LTO visibility should be marked as such. And as described later, at LTO link time we can optionally decide to convert vtables to hidden LTO visibility for more aggressive devirtualization when appropriate.

There is already a mechanism in the compiler to describe the vtable visibility, which was recently added for Dead Virtual Function Elimination (D63932): !vcall_visibility metadata, documented at https://llvm.org/docs/TypeMetadata.html#vcall-visibility-metadata. This metadata is attached to vtable definitions, currently only when VFE is enabled. As described in the documentation, because this is currently only used for VFE, it also requires that the corresponding function pointer loads use the llvm.type.checked.load intrinsic. This would not be required for devirtualization (although the VFE support in GlobalDCE will need modification to ignore the metadata when type checked loads are not used, more on that later).

This RFC proposes adding the !vcall_visibility metadata to vtable definitions when -fwhole-program-vtables is specified. Unlike for VFE, the function pointer loads can still use normal loads with corresponding type test assume sequences (better for optimization).

Additional changes to the LTO compilation steps are detailed below.
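
As a reading aid, the three levels that !vcall_visibility can record can be modeled as a plain enum. The numeric values follow the TypeMetadata documentation linked above; the VTableDef struct and the hasHiddenLTOVisibility helper are hypothetical names for this sketch, not LLVM API.

```cpp
#include <string>

// Mirrors the values documented for !vcall_visibility (0, 1, 2).
enum class VCallVisibility { Public = 0, LinkageUnit = 1, TranslationUnit = 2 };

// Hypothetical stand-in for a vtable definition carrying the metadata.
struct VTableDef {
  std::string name;
  VCallVisibility visibility;
};

// For devirtualization only one question matters: is the vtable at least
// linkage-unit visible, i.e. does it have hidden LTO visibility?
bool hasHiddenLTOVisibility(const VTableDef &vt) {
  return vt.visibility != VCallVisibility::Public;
}
```

Because only this yes/no answer is needed, a single bit per vtable is enough to carry the information, even though the metadata itself distinguishes three levels.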

Pre-Link LTO Compile
--------------------

First, type test assume sequences will be inserted when -fwhole-program-vtables is specified, and not just for classes with hidden LTO visibility.

Second, as mentioned earlier, the !vcall_visibility metadata will be inserted under -fwhole-program-vtables. For the purposes of index-only WPD, a single-bit flag indicating whether or not the vtable def has hidden LTO visibility is added to the GVarFlags on the GlobalVarSummary. Note that we can collapse the 3 enum values of the metadata down to a single bit, because for the purposes of devirtualization, both VCallVisibilityLinkageUnit and VCallVisibilityTranslationUnit can be treated the same (we only need to have at least VCallVisibilityLinkageUnit to devirtualize). The ModuleSummaryIndex builder will set this new flag from the !vcall_visibility metadata on vtable definitions.

Finally, the VFE support in GlobalDCE (which is enabled by default and currently triggers automatically in the presence of this metadata) will need to be modified to ignore !vcall_visibility metadata inserted for devirtualization only, i.e. when there are any type test assume sequences for that Type ID. This should be straightforward, as we can scan the type tests and remove any vtables decorated with compatible type ids from VFESafeVTables. Note that this change will affect the invocation of GlobalDCE both here in the pre-link LTO compile as well as later in the LTO Backend (where it is applied to a broader set of vtables).

LTO Link Handling
-----------------

During Whole Program Devirtualization analysis, when looking at the vtables corresponding to the summarized virtual calls during tryFindVirtualCallTargets, we must consult the vcall_visibility information. For hybrid (regular+thin) LTO, the vtable definitions are in the regular LTO partition and so the IR can be consulted directly.
For index-only WPD, we instead consult the flag on the vtable’s GlobalVarSummary.

If any of the vtable definitions compatible with a given virtual call have public LTO visibility, the devirtualization must be skipped.

By default, only classes that have statically determined hidden LTO visibility would be allowed to devirtualize. However, as noted earlier, we want to enable more aggressive devirtualization at LTO link time when we know that the linking mode guarantees full LTO visibility of any code that may derive classes from the bitcode being linked. To do so, we will add a new linker option:

For lld, the proposed option is: -lto-whole-program-visibility.
For gold, the corresponding plugin option would be “whole-program-visibility”.

When this option is set, LTO will convert all vtable definitions to have hidden LTO visibility before invoking Whole Program Devirtualization. In the hybrid LTO case this would mean changing the metadata on the IR. In the index-only case this would be done in the summaries.

LTO Backend Handling
--------------------

No changes are required in the LTO backend’s invocation of Whole Program Devirtualization, since any visibility constraints are enforced at LTO link time, and the loosening of visibility under the new link option only needs to affect the LTO WPD invocation.

As mentioned earlier when describing the pre-link LTO compile changes, GlobalDCE will be changed to ignore vtables with !vcall_visibility metadata corresponding to type tests (and not just type checked loads).

Status
------

These changes have been prototyped and tested with index-only WPD (with the exception of the proposed changes to GlobalDCE, at the moment I have been testing with -enable-vfe=false). I will be cleaning up the changes and sending patches for review in the coming days.
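
To make the link-time rule from the LTO Link Handling section concrete, here is a minimal model of it in C++. It is a sketch under stated assumptions, not the actual WPD code: VTableSummary, applyWholeProgramVisibility and mayDevirtualize are hypothetical names, and the single bool stands in for the summary flag described in the RFC.

```cpp
#include <vector>

// Hypothetical stand-in for the per-vtable summary flag described above.
struct VTableSummary {
  bool hiddenLTOVisibility; // single bit derived from !vcall_visibility
};

// Models the effect of the proposed link option: every vtable definition is
// converted to hidden LTO visibility before WPD runs.
void applyWholeProgramVisibility(std::vector<VTableSummary> &vtables) {
  for (VTableSummary &vt : vtables)
    vt.hiddenLTOVisibility = true;
}

// A virtual call may be devirtualized only if every compatible vtable is
// hidden; one public vtable means the hierarchy may not be fully visible.
// An empty list is treated as "nothing to devirtualize" (an assumption of
// this sketch).
bool mayDevirtualize(const std::vector<VTableSummary> &compatibleVTables) {
  for (const VTableSummary &vt : compatibleVTables)
    if (!vt.hiddenLTOVisibility)
      return false;
  return !compatibleVTables.empty();
}
```

In this model, a call whose compatible vtables include a public one is skipped by default, and becomes eligible only after applyWholeProgramVisibility has been run, which corresponds to linking with the new option.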

-- 
Teresa Johnson | Software Engineer | tejohnson at google.com |
Iurii Gribov via llvm-dev
2019-Dec-17 15:36 UTC
[llvm-dev] RFC: Safe Whole Program Devirtualization Enablement
(cc list this time)

Hi Teresa,

Apologies if this has been discussed before but ...

> The LTO visibility of a class is derived at compile time from the class’s symbol visibility.
> Generally, only classes that are internal at the source level (e.g. declared in an anonymous namespace) receive hidden LTO visibility.
> Compiling with -fvisibility=hidden tells the compiler that, unless otherwise marked, symbols are assumed to have hidden visibility, which also implies that all classes have hidden LTO visibility (unless decorated with a public visibility attribute).
> This results in much more aggressive devirtualization.

Note that by default, unlike GCC, LLVM is liberal on visibility-constrained optimizations. In particular it freely performs inlining, IPA and cloning on them (see https://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html which also suggested adding -fsemantic-interposition to actually respect visibility in optimizations). It's unclear why devirtualization should behave differently than other optimizations (at least by default).

-I
Teresa Johnson via llvm-dev
2019-Dec-17 16:32 UTC
[llvm-dev] RFC: Safe Whole Program Devirtualization Enablement
On Tue, Dec 17, 2019 at 7:36 AM Iurii Gribov <Iurii.Gribov at ceva-dsp.com> wrote:

> Note that by default, unlike GCC, LLVM is liberal on visibility-constrained optimizations. In particular it freely performs inlining, IPA and cloning on them (see https://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html which also suggested adding -fsemantic-interposition to actually respect visibility in optimizations). It's unclear why devirtualization should behave differently than other optimizations (at least by default).

Are you suggesting that we should be more aggressive by default (i.e. without -fvisibility=hidden or any new options)? I believe that will be too aggressive for class LTO visibility. It is common to override virtual functions across shared library boundaries (e.g. a test may override a virtual function from a shared library with a mock class). But with what I am proposing we will assume it is safe under the proposed LTO link option, which should be applied when everything other than e.g. system libraries is linked statically.

Thanks,
Teresa

-- 
Teresa Johnson | Software Engineer | tejohnson at google.com |
Teresa Johnson via llvm-dev
2019-Dec-26 19:55 UTC
[llvm-dev] RFC: Safe Whole Program Devirtualization Enablement
FYI I mailed 3 patches this morning that together implement the RFC. PTAL:

D71907: [WPD/VFE] Always emit vcall_visibility metadata for -fwhole-program-vtables
D71911: [ThinLTO] Summarize vcall_visibility metadata
D71913: [LTO/WPD] Enable aggressive WPD under LTO option

Teresa

-- 
Teresa Johnson | Software Engineer | tejohnson at google.com |