Chandler Carruth via llvm-dev
2017-Oct-27 03:14 UTC
[llvm-dev] RFC: We need to explicitly state that some functions are reserved by LLVM
I've gotten a fantastic bug report. Consider the LLVM IR:

    target triple = "x86_64-unknown-linux-gnu"

    define internal i8* @access({ i8* }* %arg, i64) {
      ret i8* undef
    }

    define i8* @g({ i8* }* %arg) {
    bb:
      %tmp = alloca { i8* }*, align 8
      store { i8* }* %arg, { i8* }** %tmp, align 8
      br i1 undef, label %bb4, label %bb1

    bb1:
      %tmp2 = load { i8* }*, { i8* }** %tmp, align 8
      %tmp3 = call i8* @access({ i8* }* %tmp2, i64 undef)
      br label %bb4

    bb4:
      ret i8* undef
    }

This IR, if compiled with `opt -passes='cgscc(inline,argpromotion)' -disable-output`, hits a bunch of asserts in the LazyCallGraph.

The problem here is that `argpromotion` takes a normal looking function `i8* @access({ i8* }* %arg, i64)` and turns it into a magical function `i8* @access(i8* %arg, i64)`. This latter signature is the POSIX `access` function that LLVM's `TargetLibraryInfo` knows magical things about.

Because *some* library functions known to `TargetLibraryInfo` can have *calls* to them introduced at arbitrary points of optimization (consider vectorized variants of math functions), the new pass manager, and the graph it uses to provide ordering over the module, get Very Unhappy when you *introduce* a definition of a library function in the middle of the compilation pipeline.

And really, we do *not* want `argpromotion` to do this. We don't want it to turn some random function by the name of `@free` into the actual `@free` function and suddenly change how LLVM handles it.

So what do we do?

One option is to make `argpromotion` and every other pass that mutates a function's signature rename the function (or add a `nobuiltin` attribute to it). However, this seems brittle and somewhat complicated.

My proposal is that we admit that certain names of functions are reserved in LLVM's IR. For these names, in some cases *any* function with that name will be treated specially by the optimizer. We can still check the signatures when transforming code based on LLVM's semantic understanding of that function, but this avoids any need to check things when mutating the signature of the function.

This would require frontends to avoid emitting functions by these names unless they should have these special semantics. However, even if they do, everything should remain conservatively correct. But I'll send an email to cfe-dev suggesting that Clang start "mangling" internal functions that collide with target names. I think this is important as I've found a quite surprising number of cases where this happens in real code.

There is no need to auto-upgrade here, because again, LLVM's handling will remain conservatively correct.

Does this seem reasonable? If so, I'll send patches to update the LangRef with these restrictions. I'll also take a quick stab at generating some example tables of such names from the .td files used by `TargetLibraryInfo` already. These can't be authoritative because the set of names is platform-specific, but they should help people understand how this area works.

One alternative that seems appealing but doesn't actually help would be to make `TargetLibraryInfo` ignore internal functions. That is how the C++ spec seems to handle this, for example (C library function names are reserved only when they have linkage). But this doesn't work well for LLVM because we want to be able to LTO an internalized C library. So I think we need the rule for LLVM function names to not rely on linkage here.

Thanks,
-Chandler
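To make the proposal concrete, here is a minimal sketch of the difference between recognizing a library function by name plus signature and reserving it by name alone. Everything in it is a toy model invented for illustration: `ToyFunction`, `ParamKind`, `isReservedName`, `matchesAccessShape`, and `promoteFirstArg` are hypothetical names, the reserved set is a made-up sample, and the prototype check is an assumption rather than what `TargetLibraryInfo` actually tests.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Toy stand-ins for IR types; hypothetical, not LLVM's API.
enum class ParamKind { StructPtr, CharPtr, Int };

struct ToyFunction {
  std::string name;
  std::vector<ParamKind> params;
};

// Assumed sample of reserved names; the real set is platform-specific.
inline bool isReservedName(const std::string &name) {
  static const std::unordered_set<std::string> reserved = {"access", "free"};
  return reserved.count(name) != 0;
}

// Assumed prototype check for `access`: (char pointer, integer).
// Consulted only when a transform wants to rely on library semantics.
inline bool matchesAccessShape(const ToyFunction &f) {
  return f.name == "access" && f.params.size() == 2 &&
         f.params[0] == ParamKind::CharPtr && f.params[1] == ParamKind::Int;
}

// Toy stand-in for argpromotion: replace a struct-pointer first argument
// with the pointer it wraps.
inline ToyFunction promoteFirstArg(ToyFunction f) {
  if (!f.params.empty() && f.params[0] == ParamKind::StructPtr)
    f.params[0] = ParamKind::CharPtr;
  return f;
}
```

In this model, `matchesAccessShape` flips from false to true across `promoteFirstArg`, which is the surprise described in the bug report, while `isReservedName` gives the same answer before and after. That stability is what lets a signature-mutating pass skip any check of its own.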
Chandler Carruth via llvm-dev
2017-Oct-27 03:14 UTC
[llvm-dev] RFC: We need to explicitly state that some functions are reserved by LLVM
Oh, I forgot to say, thanks to Hal Finkel <hfinkel at anl.gov> as he helped a lot by suggesting that `argpromotion` was simply wrong here and with discussing these ideas. =]
Chris Lattner via llvm-dev
2017-Oct-27 03:53 UTC
[llvm-dev] RFC: We need to explicitly state that some functions are reserved by LLVM
On Oct 26, 2017, at 8:14 PM, Chandler Carruth via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> This IR, if compiled with `opt -passes='cgscc(inline,argpromotion)' -disable-output` hits a bunch of asserts in the LazyCallGraph.
>
> The problem here is that `argpromotion` takes a normal looking function `i8* @access({ i8* }* %arg, i64)` and turns it into a magical function `i8* @access(i8* %arg, i64)`. This latter signature is the POSIX `access` function that LLVM's `TargetLibraryInfo` knows magical things about.

I don’t think this is an argpromotion bug. We don’t want any transformation that changes internal functions to know about these sorts of things. I think the right fix is to make TargetLibraryInfo treat non-externally visible functions as unknown.

-Chris
Chris Lattner via llvm-dev
2017-Oct-27 03:56 UTC
[llvm-dev] RFC: We need to explicitly state that some functions are reserved by LLVM
On Oct 26, 2017, at 8:14 PM, Chandler Carruth via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> One alternative that seems appealing but doesn't actually help would be to make `TargetLibraryInfo` ignore internal functions. That is how the C++ spec seems to handle this for example (C library function names are reserved only when they have linkage). But this doesn't work well for LLVM because we want to be able to LTO an internalized C library. So I think we need the rule for LLVM function names to not rely on linkage here.

Oh sorry, (almost) TLDR I didn’t get to this part. I don’t see how this is applicable. If you’re statically linking in a libc, I think it is fine to forgo the optimizations that TargetLibraryInfo is all about.

If these transformations are important to use in this case, we should invent a new attribute, and the thing that turns libc symbols into internal ones should add the attribute to the (now internal) libc symbols.

-Chris
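As a rough sketch of the rule suggested here, the following toy model treats a known library name as special only when the function is externally visible, unless an opt-in marker was attached when the symbol was internalized. The `ToyFunction` type, the `libcallOptIn` field, and the sample name table are all hypothetical and invented for this illustration; no such attribute is named in the thread.

```cpp
#include <string>
#include <unordered_set>

// Toy model only; none of these types are LLVM's.
enum class Linkage { External, Internal };

struct ToyFunction {
  std::string name;
  Linkage linkage;
  bool libcallOptIn;  // hypothetical attribute added by whatever internalizes libc
};

// Assumed sample of names the optimizer knows about.
inline bool isKnownLibraryName(const std::string &name) {
  static const std::unordered_set<std::string> known = {"access", "free", "printf"};
  return known.count(name) != 0;
}

// Internal functions are unknown by default, even if the name collides
// with a library function, unless they carry the opt-in marker.
inline bool treatAsLibraryFunction(const ToyFunction &f) {
  if (!isKnownLibraryName(f.name))
    return false;
  if (f.linkage == Linkage::External)
    return true;
  return f.libcallOptIn;
}
```

Under this model the internal `@access` from the bug report is never special, and an LTO'd libc keeps its optimizations only if the internalizing step remembers to mark every symbol, which is the cost being debated.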
Hal Finkel via llvm-dev
2017-Oct-27 04:01 UTC
[llvm-dev] RFC: We need to explicitly state that some functions are reserved by LLVM
On 10/26/2017 10:56 PM, Chris Lattner via llvm-dev wrote:
>> On Oct 26, 2017, at 8:14 PM, Chandler Carruth via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> One alternative that seems appealing but doesn't actually help would be to make `TargetLibraryInfo` ignore internal functions. That is how the C++ spec seems to handle this for example (C library function names are reserved only when they have linkage). But this doesn't work well for LLVM because we want to be able to LTO an internalized C library. So I think we need the rule for LLVM function names to not rely on linkage here.
>
> Oh sorry, (almost) TLDR I didn’t get to this part. I don’t see how this is applicable. If you’re statically linking in a libc, I think it is fine to forgo the optimizations that TargetLibraryInfo is all about.
>
> If these transformations are important to use in this case, we should invent a new attribute, and the thing that turns libc symbols into internal ones should add the attribute to the (now internal) libc symbols.

I'm not sure; some of the transformations are somewhat special (e.g., based on mathematical properties, or things like printf -> puts translation). LTO alone certainly won't give you those kinds of things via normal IPA, and I doubt we want attributes for all of them. Also, having LTO essentially disable optimizations isn't good either.

-Hal

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
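For readers unfamiliar with the printf -> puts translation mentioned above, here is a small sketch of the idea on plain strings. It assumes the simplest case only: a `printf` call with a constant format string, no further arguments, no `%` conversions, and a trailing newline. The helper `printfToPuts` is a hypothetical name, and the conditions LLVM actually checks may differ.

```cpp
#include <string>

// Returns true and fills `putsArg` when printf(fmt), called with no further
// arguments, could be replaced by puts(putsArg). puts appends the newline
// itself, so the trailing '\n' is dropped from the argument.
inline bool printfToPuts(const std::string &fmt, std::string &putsArg) {
  if (fmt.empty() || fmt.back() != '\n')
    return false;  // puts always writes a newline; printf would not
  if (fmt.find('%') != std::string::npos)
    return false;  // conversions (or "%%") need real printf handling
  putsArg = fmt.substr(0, fmt.size() - 1);
  return true;
}
```

The rewrite is only valid if `printf` and `puts` really are the C library functions, and that fact comes from the name rather than from anything interprocedural analysis could infer from a function body.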
David Chisnall via llvm-dev
2017-Oct-27 08:50 UTC
[llvm-dev] RFC: We need to explicitly state that some functions are reserved by LLVM
This seems slightly inverted. As I understand it, the root of the problem is that some standards, such as C, C++, and POSIX, define some functions as special and we rely on their specialness when optimising. Unfortunately, the specialness is a property of the source language and, possibly, environment and not necessarily of the target. The knowledge of which functions are special seems like it ought to belong in the front end, so a C++ compiler might tag a function called _Znwm as special, but to a C or Fortran front end this is just another function and shouldn’t be treated specially.

Would it not be cleaner to have the front end (and any optimisations that are aware of special behaviour of functions) add metadata indicating that these functions are special? If the metadata is lost, then this inhibits later optimisations but shouldn’t affect the semantics of the code (it’s always valid to treat the special functions as non-special functions) and optimisations then don’t need to mark them. This would also give us a free mechanism of specifying functions that are semantically equivalent but have different spellings.

David
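A toy sketch of the front-end tagging idea follows. The enums, `frontendTag`, and `mayAssumeFreshMemory` are hypothetical names made up for illustration, and the choice of which languages tag which symbols is an assumption, not a description of any real front end. `_Znwm` is the Itanium-mangled name of `operator new(unsigned long)`.

```cpp
#include <string>

// Toy model only; not LLVM metadata or any real front-end API.
enum class SourceLanguage { C, CXX, Fortran };
enum class Semantics { None, Allocator };

// What a front end might attach to a declaration it knows is special.
inline Semantics frontendTag(SourceLanguage lang, const std::string &symbol) {
  const bool cFamily =
      lang == SourceLanguage::C || lang == SourceLanguage::CXX;
  if (cFamily && symbol == "malloc")
    return Semantics::Allocator;
  if (lang == SourceLanguage::CXX && symbol == "_Znwm")
    return Semantics::Allocator;
  return Semantics::None;  // just another function to this front end
}

// Optimizer side: only the tag matters, never the spelling. A dropped tag
// degrades to None, which blocks the optimization but stays correct.
inline bool mayAssumeFreshMemory(Semantics tag) {
  return tag == Semantics::Allocator;
}
```

Here `malloc` and `_Znwm` share one tag despite their different spellings, and a C front end that sees `_Znwm` produces no tag at all, so the optimizer treats it as an ordinary function.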
Xinliang David Li via llvm-dev
2017-Oct-27 18:10 UTC
[llvm-dev] RFC: We need to explicitly state that some functions are reserved by LLVM
On Fri, Oct 27, 2017 at 1:50 AM, David Chisnall via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> This seems slightly inverted. As I understand it, the root of the problem is that some standards, such as C, C++, and POSIX, define some functions as special and we rely on their specialness when optimising. Unfortunately, the specialness is a property of the source language and, possibly, environment and not necessarily of the target. The knowledge of which functions are special seems like it ought to belong in the front end, so a C++ compiler might tag a function called _Znwm as special, but to a C or Fortran front end this is just another function and shouldn’t be treated specially.
>
> Would it not be cleaner to have the front end (and any optimisations that are aware of special behaviour of functions) add metadata indicating that these functions are special?

Ideally many of these functions should be annotated as builtin in the system headers. A hacky solution is for the frontend to check whether the declarations come from system headers to decide whether the metadata needs to be applied.

David