Nick Desaulniers via llvm-dev
2021-Jun-24 18:52 UTC
[llvm-dev] [RFC] inlining and mismatched function attributes
Hey folks, It was suggested that I bring up a peculiarity from an individual code review to the list. We have a couple function attributes (in C) where inlining can produce unexpected or surprising results. Let's take for example, stack protectors, which help me prevent stack buffer overruns from rewriting a return address pushed on the stack (sometimes). Now let's say I'm an operating system kernel and I need to restore register state after suspend (ie. during resume). That register state would conflict with the generated stack protector code, so I use a function attribute like __attribute__((no_stack_protector) to prevent my compiler from inserting such code in my resume handler which has to initialize a new stack canary. But I wasn't careful to check that all of the functions I call from said handler have a matching function attribute, and due to inlining, I wind up with a stack protector, even though I requested a function not have a stack protector! Suddenly I load the incorrect stack canary value from garbage register state, fail the stack check guard, and now I fail to resume due to a kernel panic. (True story) Another stack protector issue that came up recently has to do with forking "zygote-like" processes, and having unique stack protector values per process. Point being, there are a few cases where we want stack protectors generally, but need the fine grain control provided by function attributes. Moving code into separate translation units in order to use different flags, is tedious and commonly runs into issues with LTO in LLVM when flags that aren't encoded in IR are dropped by LTO. Similar problems crop up again with coverage, and profiling instrumentation in "delicate" code. Sanitizers, shadow call stack, and a few others run into similar situations. As a developer without knowledge of my toolchain, how do I go about debugging this myself? (Debugging the suspend/resume issue was not fun). What are some ways developers can fix this? Well, if I know my call chain, I can go through and start marking my callees __attribute__((no_stack_protector)) or __attribute__((noinline)) recursively. Eventually, I should get to the point where none of my callees have stack protectors or are inlined. But that strips off more stack protectors perhaps than I'd like, and leaves a lot of code unprotected. This begins to feel like "what color is your function?" "Red" functions may only call other "red" functions; "blue" functions may only call other "blue" functions. https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/ Maybe my callees have different callers with different constraints. Aliases can be used for callees. Example: // has a stack protector void callee_bad(void) { ... } __attribute__((alias("callee_bad"),noinline)) void callee_good(void); // requires no stack protector __attribute__((no_stack_protector)) void caller(void) { callee_good(); } However, caller() could be inlined into yet another (not defined above) call site that does have/need a stack protector. And other call sites (that don't care about stack protectors) can just call callee_bad like normal. If I don't have too many callees, it's not too bad. But very quickly it can become difficult as a developer to tell which callee was problematic. Another possibility is that the compiler could have knowledge of certain conflicting function attributes, and skip inline substitution of callee into caller upon mismatch. We provide "remarks" to help developers understand why inline substitution failed, if they care to know why. ie. $ clang -O1 -O1 -Rpass-missed=inline foo.c foo.c:6:10: remark: foo not inlined into bar because it should never be inlined (cost=never): <really good reason why here> Indeed, that's exactly the tact I took with no_stack_protector in commit bc044a88ee3c ("[Inline] prevent inlining on stack protector mismatch"). It's what GCC is proposing for __attribute__((no_profile_instrument_function)) (coverage and profiling) in: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573511.html https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223 and I'm looking to match in https://reviews.llvm.org/D104810 . We already have precedence for this in LLVM; checkout `CompatRule` in llvm/include/llvm/IR/Attributes.td and `llvm::getAttributeBasedInliningDecision` in llvm/lib/Analysis/InlineCost.cpp. I'd argue it's also less surprising for a function not to be inlined than for a stack protector or profile/coverage instrumentation to show up when a function is explicitly attributed to not have those things. These attributes describe a desired property of the function. If inlined, we wish the new copies have the same property. However, function attribute applies to the whole function, not a subset of instructions. But, now we're making IR function attributes that could have been orthogonal (nossp, noinline) entangled. That hurts the composition of such attributes. (I will note, nossp does not unconditionally imply noinline; it implies "noinline for mismatched callers, and noinline on callees that are mismatched, on a per call site basis." __attribute__((always_inline)) will override this exception, and we don't provide helpful diagnostics in such case; good luck! :-/ ) We likely will have such conflicts with additional function attributes in the future. Should the inliner scan the caller and callee at every call site for such attribute lists? Should the LangRef document such inlining behavior? Another concern is diverging from GCC here; while we're both discussing no_profile_instrument_function it would be good to gather more feedback. Sometimes, I wish in C we had the ability to express at a given callsite that inlining the callee should not occur. You can see more discussion https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223 . Surely there are additional solutions we have not yet conceived of. At this point, I don't particularly care what color we paint this bikeshed; I just need a bikeshed built so that my kernel boots. We all want all software to be better, and can have a gentle-person's disagreement. Thoughts?
Chris Wailes via llvm-dev
2021-Jul-07 18:47 UTC
[llvm-dev] [RFC] inlining and mismatched function attributes
Nick, Thanks for bringing up this topic. Another issue to consider is the performance impact of the interaction between `no_stack_protector` and inlining. If the translation unit is compiled with stack protectors and you want to remove them from a specific function you have to prevent all inlining into the function in question or recursively annotate all called functions with `no_stack_protector`. This can lead to disastrous consequences for performance if it is a hot function. Sometimes, I wish in C we had the ability to express at a given callsite> that inlining the callee should not occur. >This would solve the usability problem that you are discussing (you wouldn't have to go chasing down all callees that are marked as `alwaysinline`) but it wouldn't solve the performance problem. Another solution might be to have an annotation that says "my `no_stack_protector` or `no_profile_instrument_function` annotation overrides the values from inlined functions". This would allow the inlining to occur but gives the callee control over their own behavior. If we had had this annotation it would have made one of the several solutions to stack protectors in the Zygote that we tried actually viable. - Chris On Thu, Jun 24, 2021 at 11:52 AM Nick Desaulniers <ndesaulniers at google.com> wrote:> Hey folks, > It was suggested that I bring up a peculiarity from an individual code > review to the list. > > We have a couple function attributes (in C) where inlining can produce > unexpected or surprising results. > > Let's take for example, stack protectors, which help me prevent stack > buffer overruns from rewriting a return address pushed on the stack > (sometimes). Now let's say I'm an operating system kernel and I need > to restore register state after suspend (ie. during resume). That > register state would conflict with the generated stack protector code, > so I use a function attribute like __attribute__((no_stack_protector) > to prevent my compiler from inserting such code in my resume handler > which has to initialize a new stack canary. But I wasn't careful to > check that all of the functions I call from said handler have a > matching function attribute, and due to inlining, I wind up with a > stack protector, even though I requested a function not have a stack > protector! Suddenly I load the incorrect stack canary value from > garbage register state, fail the stack check guard, and now I fail to > resume due to a kernel panic. (True story) > > Another stack protector issue that came up recently has to do with > forking "zygote-like" processes, and having unique stack protector > values per process. Point being, there are a few cases where we want > stack protectors generally, but need the fine grain control provided > by function attributes. Moving code into separate translation units > in order to use different flags, is tedious and commonly runs into > issues with LTO in LLVM when flags that aren't encoded in IR are > dropped by LTO. > > Similar problems crop up again with coverage, and profiling > instrumentation in "delicate" code. Sanitizers, shadow call stack, > and a few others run into similar situations. > > As a developer without knowledge of my toolchain, how do I go about > debugging this myself? (Debugging the suspend/resume issue was not > fun). > > What are some ways developers can fix this? Well, if I know my call > chain, I can go through and start marking my callees > __attribute__((no_stack_protector)) or __attribute__((noinline)) > recursively. Eventually, I should get to the point where none of my > callees have stack protectors or are inlined. But that strips off more > stack protectors perhaps than I'd like, and leaves a lot of code > unprotected. This begins to feel like "what color is your function?" > "Red" functions may only call other "red" functions; "blue" functions > may only call other "blue" functions. > https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/ > > Maybe my callees have different callers with different constraints. > Aliases can be used for callees. Example: > > // has a stack protector > void callee_bad(void) { ... } > > __attribute__((alias("callee_bad"),noinline)) > void callee_good(void); > > // requires no stack protector > __attribute__((no_stack_protector)) > void caller(void) { > callee_good(); > } > > However, caller() could be inlined into yet another (not defined > above) call site that does have/need a stack protector. > > And other call sites (that don't care about stack protectors) can just > call callee_bad like normal. If I don't have too many callees, it's > not too bad. But very quickly it can become difficult as a developer > to tell which callee was problematic. > > Another possibility is that the compiler could have knowledge of > certain conflicting function attributes, and skip inline substitution > of callee into caller upon mismatch. We provide "remarks" to help > developers understand why inline substitution failed, if they care to > know why. ie. > $ clang -O1 -O1 -Rpass-missed=inline foo.c > foo.c:6:10: remark: foo not inlined into bar because it should never > be inlined (cost=never): <really good reason why here> > > Indeed, that's exactly the tact I took with no_stack_protector in > commit bc044a88ee3c ("[Inline] prevent inlining on stack protector > mismatch"). > > It's what GCC is proposing for > __attribute__((no_profile_instrument_function)) (coverage and > profiling) in: > https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573511.html > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223 > and I'm looking to match in > https://reviews.llvm.org/D104810 . > > We already have precedence for this in LLVM; checkout `CompatRule` in > llvm/include/llvm/IR/Attributes.td and > `llvm::getAttributeBasedInliningDecision` in > llvm/lib/Analysis/InlineCost.cpp. > > I'd argue it's also less surprising for a function not to be inlined > than for a stack protector or profile/coverage instrumentation to show > up when a function is explicitly attributed to not have those things. > These attributes describe a desired property of the function. If > inlined, we wish the new copies have the same property. However, > function attribute applies to the whole function, not a subset of > instructions. > > But, now we're making IR function attributes that could have been > orthogonal (nossp, noinline) entangled. That hurts the composition of > such attributes. (I will note, nossp does not unconditionally imply > noinline; it implies "noinline for mismatched callers, and noinline on > callees that are mismatched, on a per call site basis." > __attribute__((always_inline)) will override this exception, and we > don't provide helpful diagnostics in such case; good luck! :-/ ) We > likely will have such conflicts with additional function attributes in > the future. Should the inliner scan the caller and callee at every > call site for such attribute lists? Should the LangRef document such > inlining behavior? > > Another concern is diverging from GCC here; while we're both > discussing no_profile_instrument_function it would be good to gather > more feedback. > > Sometimes, I wish in C we had the ability to express at a given > callsite that inlining the callee should not occur. > > You can see more discussion > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223 . Surely there are > additional solutions we have not yet conceived of. At this point, I > don't particularly care what color we paint this bikeshed; I just need > a bikeshed built so that my kernel boots. We all want all software to > be better, and can have a gentle-person's disagreement. Thoughts? >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210707/f7d488db/attachment.html>
Martin Liška via llvm-dev
2021-Aug-16 13:26 UTC
[llvm-dev] [RFC] inlining and mismatched function attributes
On 6/24/21 8:52 PM, Nick Desaulniers wrote:> Hey folks, > It was suggested that I bring up a peculiarity from an individual code > review to the list.Hello. You raise bunch of very interesting question I must say.> > We have a couple function attributes (in C) where inlining can produce > unexpected or surprising results. > > Let's take for example, stack protectors, which help me prevent stack > buffer overruns from rewriting a return address pushed on the stack > (sometimes). Now let's say I'm an operating system kernel and I need > to restore register state after suspend (ie. during resume). That > register state would conflict with the generated stack protector code, > so I use a function attribute like __attribute__((no_stack_protector) > to prevent my compiler from inserting such code in my resume handler > which has to initialize a new stack canary. But I wasn't careful to > check that all of the functions I call from said handler have a > matching function attribute, and due to inlining, I wind up with a > stack protector, even though I requested a function not have a stack > protector! Suddenly I load the incorrect stack canary value from > garbage register state, fail the stack check guard, and now I fail to > resume due to a kernel panic. (True story)Just adding a note, where Jakub disagrees with blocking of the inlining for the stact protector attribute: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94722#c9> > Another stack protector issue that came up recently has to do with > forking "zygote-like" processes, and having unique stack protector > values per process. Point being, there are a few cases where we want > stack protectors generally, but need the fine grain control provided > by function attributes. Moving code into separate translation units > in order to use different flags, is tedious and commonly runs into > issues with LTO in LLVM when flags that aren't encoded in IR are > dropped by LTO. > > Similar problems crop up again with coverage, and profiling > instrumentation in "delicate" code. Sanitizers, shadow call stack, > and a few others run into similar situations. > > As a developer without knowledge of my toolchain, how do I go about > debugging this myself? (Debugging the suspend/resume issue was not > fun). > > What are some ways developers can fix this? Well, if I know my call > chain, I can go through and start marking my callees > __attribute__((no_stack_protector)) or __attribute__((noinline)) > recursively. Eventually, I should get to the point where none of my > callees have stack protectors or are inlined. But that strips off more > stack protectors perhaps than I'd like, and leaves a lot of code > unprotected. This begins to feel like "what color is your function?" > "Red" functions may only call other "red" functions; "blue" functions > may only call other "blue" functions. > https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/ > > Maybe my callees have different callers with different constraints. > Aliases can be used for callees. Example: > > // has a stack protector > void callee_bad(void) { ... } > > __attribute__((alias("callee_bad"),noinline)) > void callee_good(void); > > // requires no stack protector > __attribute__((no_stack_protector)) > void caller(void) { > callee_good(); > } > > However, caller() could be inlined into yet another (not defined > above) call site that does have/need a stack protector. > > And other call sites (that don't care about stack protectors) can just > call callee_bad like normal. If I don't have too many callees, it's > not too bad. But very quickly it can become difficult as a developer > to tell which callee was problematic. > > Another possibility is that the compiler could have knowledge of > certain conflicting function attributes, and skip inline substitution > of callee into caller upon mismatch. We provide "remarks" to help > developers understand why inline substitution failed, if they care to > know why. ie. > $ clang -O1 -O1 -Rpass-missed=inline foo.c > foo.c:6:10: remark: foo not inlined into bar because it should never > be inlined (cost=never): <really good reason why here> > > Indeed, that's exactly the tact I took with no_stack_protector in > commit bc044a88ee3c ("[Inline] prevent inlining on stack protector mismatch"). > > It's what GCC is proposing for > __attribute__((no_profile_instrument_function)) (coverage and > profiling) in: > https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573511.html > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223 > and I'm looking to match in > https://reviews.llvm.org/D104810 . > > We already have precedence for this in LLVM; checkout `CompatRule` in > llvm/include/llvm/IR/Attributes.td and > `llvm::getAttributeBasedInliningDecision` in > llvm/lib/Analysis/InlineCost.cpp. > > I'd argue it's also less surprising for a function not to be inlined > than for a stack protector or profile/coverage instrumentation to show > up when a function is explicitly attributed to not have those things. > These attributes describe a desired property of the function. If > inlined, we wish the new copies have the same property. However, > function attribute applies to the whole function, not a subset of > instructions. > > But, now we're making IR function attributes that could have been > orthogonal (nossp, noinline) entangled. That hurts the composition of > such attributes. (I will note, nossp does not unconditionally imply > noinline; it implies "noinline for mismatched callers, and noinline on > callees that are mismatched, on a per call site basis." > __attribute__((always_inline)) will override this exception, and we > don't provide helpful diagnostics in such case; good luck! :-/ ) We > likely will have such conflicts with additional function attributes in > the future. Should the inliner scan the caller and callee at every > call site for such attribute lists? Should the LangRef document such > inlining behavior? > > Another concern is diverging from GCC here; while we're both > discussing no_profile_instrument_function it would be good to gather > more feedback.About this attribute, it's definitely something different as no_stack_protector is a release feature, while no_profile_instrument_function is used for PGO. My understanding of the attribute is that it's used for super-busy functions which would slow down a training run significantly. So GCC approach would be to allow inlining of a function after PGO instrumentation. Martin> > Sometimes, I wish in C we had the ability to express at a given > callsite that inlining the callee should not occur. > > You can see more discussion > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223 . Surely there are > additional solutions we have not yet conceived of. At this point, I > don't particularly care what color we paint this bikeshed; I just need > a bikeshed built so that my kernel boots. We all want all software to > be better, and can have a gentle-person's disagreement. Thoughts? >