David Blaikie via llvm-dev
2021-Apr-22 02:29 UTC
[llvm-dev] Revisiting/refining the definition of optnone with interprocedural transformations
Implemented a first-pass at adding noipa IR/bitcode and the basic functionality, noipa implying "may be unrefined"/not is-definition-exact. https://reviews.llvm.org/D101011 On Mon, Apr 19, 2021 at 4:32 PM Johannes Doerfert <johannesdoerfert at gmail.com> wrote:> > > On 4/19/21 4:41 PM, David Blaikie wrote: > > On Sun, Apr 18, 2021 at 10:30 PM Johannes Doerfert > > <johannesdoerfert at gmail.com> wrote: > >> On 4/18/21 10:51 PM, David Blaikie wrote: > >>> On Sun, Apr 18, 2021 at 8:29 PM Johannes Doerfert < > >>> johannesdoerfert at gmail.com> wrote: > >>> > >>>> I'm very much in favor of `noipa`. It comes up every few months > >>>> and it would be widely useful. > >>> Out of curiosity, what sort of uses do you have in mind for it? > >> Most times people basically want `noinline` to also mean "no > >> interprocedural optimization", but without `optnone`. So, your > >> function is optimized but actually called and the call result > >> is used, no constants are propagated etc. > >> > >> Example: > >> > >> ``` > >> __attribute__((noipa)) > >> void foo() { return 1 + 2; } > >> void bar() { return foo(); } > >> ``` > >> should become > >> > >> ``` > >> __attribute__((noipa)) > >> void foo() { return 3; } > >> void bar() { return foo(); } > >> ``` > >> which it does not right now. > > I'm curious what the use case is you've come across (the justification > > for the GCC implementation of noipa was mostly for compiler testing - > > which is my interest in having these semantics (under optnone or > > otherwise) - so just curious what other use cases I should have in > > mind, etc) > > I looked for `noipa` in my inbox, here are some results that > show different use cases people brought up since March 2020: > > https://reviews.llvm.org/D75815#1939277 > https://bugs.llvm.org/show_bug.cgi?id=46463 > https://reviews.llvm.org/D93838#2472155 > https://reviews.llvm.org/D97971#2608302 > > Another use case is runtime call detection in the presence of definitions. > So, we detect `malloc` and also various OpenMP runtime calls, which works > fine because those are usually declarations. However, sometimes they are > not and then we can easily end up with signatures that do not match what we > expect anymore. At least that happens if we link in the OpenMP GPU runtime > into an application.Ah, thanks for all the links/context!> >>>> I'd expose it via Clang and -O0 could > >>>> set it as well (for the LTO case). > >>>> > >>>> When it comes to inexact definitions, optnone functions, and existing > >>>> attributes, > >>>> I'd be in favor of 1) always allowing the use of existing attributes, > >>>> > >>> I'm not sure what you mean by this ^ - could you rephrase/elaborate? > >>> > >>> > >>>> and 2) not deriving new ones for an inexact or optnone definition. > >>>> > >>> Also this ^ I'm similarly confused/unclear about. > >> So if you have a call of F, and F has attribute A, we can use > >> that fact at the call site, regardless of the definition of F. > >> F could be `optnone` or with non-exact linkage, but the information > >> attached to it is still usable. > > +1 SGTM. > > > >> If we go for the above we can never derive/attach information > >> for a non-exact linkage definitions. That way we prevent IPO from > >> using information that might be invalid if the definition is replaced. > > Yup, sounds good. > > > >> It is all about where you disturb the ipo deduction in this case, I think > >> it is more beneficial to not attach new things but an argument could be > >> made to allow that but no propagation. > > Allow adding them, but never using them? Yeah, that doesn't seem > > especially helpful/useful - the attributes are entirely for IPO, so if > > you want to block IPO it seems best not to add them. > > We could use them *inside* the function, but we can make that work > differently as well. IPO seems the more important target.Ah, right. Yeah, agreed.> >> Both have benefits, its' not 100% > >> clear what is more desirable at the end of the day. > >> > >> > >>>> This is how the Attributor determines if it a function level attribute > >>>> could > >>>> be derived or if we should only stick with the existing information: > >>>> > >>>> /// Determine whether the function \p F is IPO amendable > >>>> /// > >>>> /// If a function is exactly defined or it has alwaysinline attribute > >>>> /// and is viable to be inlined, we say it is IPO amendable > >>>> bool isFunctionIPOAmendable(const Function &F) { > >>>> return F.hasExactDefinition() || > >>>> InfoCache.InlineableFunctions.count(&F); > >>>> } > >>>> > >>>> So, if the above check doesn't hold we will not add new attributes but > >>>> we will > >>>> still use existing ones. This seems to me the right way to allow > >>>> users/frontends > >>>> to provide information selectively. > >>>> > >>> Yep, that sounds right to me (if you put attributes on an optnone/noipa > >>> function, they should be usable/used - but none should be discovered/added > >>> later by inspection of the implementation of such a function) - currently > >>> doesn't seem to be the case for the (old pass manager?) FunctionAttrs pass, > >>> so I have to figure some things out there. > >> That is what I tried to say above, I think. > >> > >> In the end, I want to know that foo does not access memory but > >> bar could for all we know: > >> > >> ``` > >> __attribute__((pure, optnone)) // or non-exact linkage > >> void pure_optnone() { /* empty */ } > >> > >> __attribute__((optnone)) // or non-exact linkage > >> void optnone() { /* empty */ } > >> > >> void foo() { pure_optnone(); } > >> > >> void bar() { optnone(); } > >> ``` > > Got it, > > > > I'll see about posting an implementation of noipa and switching > > __attribute__((optnone)) over to lower to LLVM's optnone+noipa rather > > than optnone+noinline. > > FWIW, I think `noipa` should not imply `noinline`, unsure if you > had that in mind or not.Do you think it should require that noipa also carries noinline? (the way optnone currently requires noinline) Or should we let the non-inlining fall out naturally from the non-exact definition property?> > Happy if someone wants to add clang support for an > > __attribute__((noipa)) lowering to that LLVM noipa once it's in (maybe > > I'll do it, guess it's probably fairly cheap/easy). > > Agreed, I won't volunteer right now, I doubt that I'll get to it > anytime soon. That said, I actually would like to use `noipa`, see > above. > > ~ Johannes > > > > - Dave > > > >> ~ Johannes > >> > >> > >>>> That said, right now the Attributor will not propagate any information > >>>> from an > >>>> optnone function or derive new information. Nevertheless, I'd be in > >>>> favor to allow > >>>> existing information to be used for IPO. > >>>> > >>> *nod* I think I'm with you there. > >>> > >>> - Dave > >>> > >>> > >>>> ~ Johannes > >>>> > >>>> > >>>> On 4/18/21 8:40 PM, David Blaikie via llvm-dev wrote: > >>>>> Prototyping the idea of "isDefinitionExact" returning false for optnone > >>>>> (whether or not we split it out into noipo or not) I've tripped over > >>>>> something it seems I created 5 years ago: > >>>>> > >>>>> I added some IPC support for optnone to GlobalsModRef: > >>>>> > >>>> https://github.com/llvm/llvm-project/commit/c662b501508200076e581beb9345a7631173a1d8#diff-55664e96a7ce3533b46f12c6906acecb2bd9a599e2b79c97506af4b1b4873fa1 > >>>>> - so it wouldn't conclude properties of an optnone function. > >>>>> > >>>>> But I then made a follow-up commit (without a lot of context as to why, > >>>>> unfortunately :/ ) that allowed GlobasModRef to use existing attributes > >>>> on > >>>>> an optnone function: > >>>>> > >>>> https://github.com/llvm/llvm-project/commit/7a9b788830da0a426fb0ff0a4cec6d592bb026e9#diff-55664e96a7ce3533b46f12c6906acecb2bd9a599e2b79c97506af4b1b4873fa1 > >>>>> But it seems making the function definition inexact, breaks the unit > >>>>> testing added in the latter commit. I suppose then it's an open question > >>>>> whether existing attributes on an inexact definition should be used at > >>>> all? > >>>>> (I don't know what motivated me to support them for optnone) > >>>>> > >>>>> Oh, and here's a change from Chandler around the same time similarly > >>>>> blocking some ipo for optnone: > >>>>> > >>>> https://github.com/llvm/llvm-project/commit/0fb998110abcf3d67495d12f854a1576b182d811#diff-cc618a9485181a9246c4e0367dc9f1a19d3cb6811d1e488713f53a753d3da60c > >>>>> - in this case preventing FunctionAttrs from deriving the attributes for > >>>> an > >>>>> optnone function. That functionality looks like it can be subsumed by the > >>>>> inexact approach - applying inexact to optnone and removing the change in > >>>>> Chandler's patch still passes the tests. (hmm, tested - not quite, but > >>>> more > >>>>> work to do there) > >>>>> > >>>>> On Sun, Apr 18, 2021 at 10:06 AM David Blaikie<dblaikie at gmail.com> > >>>> wrote: > >>>>>> On Sun, Apr 18, 2021 at 9:43 AM Roman Lebedev<lebedev.ri at gmail.com> > >>>>>> wrote: > >>>>>> > >>>>>>> There's 'noipa' attribute in GCC, currently it is not supported by > >>>> clang. > >>>>>>> Theoretically, how would one implement it? > >>>>>>> > >>>>>> If we wanted to do this really robustly, I guess we might have to > >>>>>> introduce some sort of "here's the usual way to check if this is a > >>>>>> definition/get the body of the function" (which for noipa it says > >>>> "there is > >>>>>> no body/don't look here") and "no, really, I need the definition" (for > >>>>>> actual code generation). > >>>>>> > >>>>>> Though I'm not advocating for that - I'm OK with a more > >>>> ad-hoc/best-effort > >>>>>> implementation targeting the -O0/debugging assistant > >>>>>> __attribute__((optnone)) kind of use case - happy to fix cases as they > >>>> come > >>>>>> up to improve the user experience for these situations. > >>>>>> > >>>>>> Maybe we could get away with generalizing this by having an optnone (or > >>>>>> noipa) function appear "interposable" even though it doesn't have a real > >>>>>> interposable linkage? That should hinder/disable any IPA. > >>>>>> > >>>>>> Hmm, looks like GlobalValue::isDefinitionExact would be best to return > >>>>>> false in this case (whatever we end up naming it) /maybe/ > >>>>>> mayBeDerefined should return false too. > >>>>>> > >>>>>> Yeah, I guess if we can implement such a robust generalization, then > >>>> it'd > >>>>>> probably be OK/easy enough to implement both noipa and optnone implies > >>>>>> noipa the same as it implies noinline (well, I guess noipa would subsume > >>>>>> the noinline implication - if the function isn't exact, the inliner > >>>> won't > >>>>>> inline it so there wouldn't be any need for the explicit noinline) > >>>>>> > >>>>>> > >>>>>>> With your proposal, clang `noipa` attribute could be lowered > >>>>>>> to `optnone` on the whole function, To me that seems like > >>>>>>> too much of a hammer, should that be the path forward. > >>>>>>> > >>>>>> I agree that lowering noipa to optnone would be a very aggressive form > >>>> of > >>>>>> noipa - likely if we want to support noipa it would be to support it > >>>>>> separately and maybe either lower -O0 (& maybe > >>>> __attribute__((optnone))) to > >>>>>> both optnone+noipa+noinline (since optnone already implies noinline) or > >>>>>> make optnone imply ipa/be a superset of it implicitly (if we do have > >>>> noipa > >>>>>> it's probably best to have "optnone requires noipa" the same way > >>>> "optnone > >>>>>> requires noinline" rather than an implicit superset sort of thing). > >>>>>> > >>>>>> I think that'd certainly be appropriate for -O0, and I'd argue it'd be > >>>>>> appropriate for __attribute__((optnone)) because I think it'd be what > >>>>>> people expect/is consistent with the motivation for the attribute (for > >>>>>> debuggability - so you wouldn't want a caller to not fill in > >>>>>> parameters/pass in garbage because it knows the implementation doesn't > >>>>>> matter, or not use the result because it knows what the result should > >>>> be). > >>>>>>> Would it not be best to not conflate the two, > >>>>>>> and just introduce the `noipa` attribute? > >>>>>>> > >>>>>> I think we'd still want to conflate them for user-facing functionality, > >>>>>> even if they were separable at the IR level. > >>>>>> > >>>>>> - Dave > >>>>>> > >>>>>> > >>>>>>> Roman > >>>>>>> > >>>>>>> On Sun, Apr 18, 2021 at 7:37 PM David Blaikie<dblaikie at gmail.com> > >>>> wrote: > >>>>>>>> While trying to reproduce some debug info thing (I don't have the > >>>> exact > >>>>>>> example at the moment - but I think it was more aggressive than the > >>>> example > >>>>>>> I have now, but something like this: > >>>>>>>> __attribute__((optnone)) int f1() { > >>>>>>>> return 3; > >>>>>>>> } > >>>>>>>> int main() { > >>>>>>>> return f1(); > >>>>>>>> } > >>>>>>>> > >>>>>>>> > >>>>>>>> (actually I think in my case I had a variable to hold the return value > >>>>>>> from f1, with the intent that this variable's location couldn't use a > >>>>>>> constant - a load from a volatile variable would probably have provided > >>>>>>> similar functionality in this case) > >>>>>>>> LLVM (& specifically Sparse Conditional Constant Propagation, > >>>>>>> llvm/lib/Transforms/Scalar/SCCP.cpp) optimizes this code noting that f1 > >>>>>>> always returns 3, so rather than using the return value from the call > >>>> to > >>>>>>> f1, it ends up hardcoding the return value: > >>>>>>>> define dso_local i32 @main() local_unnamed_addr #1 { > >>>>>>>> > >>>>>>>> entry: > >>>>>>>> > >>>>>>>> %call = tail call i32 @_Z2f1v() > >>>>>>>> > >>>>>>>> ret i32 3 > >>>>>>>> > >>>>>>>> } > >>>>>>>> > >>>>>>>> > >>>>>>>> I consider this a bug - in that optnone is used to implement -O0 for > >>>>>>> LTO, so it seemed to me that the correct behavior is for an optnone > >>>>>>> function to behave as though it were compiled in another object file > >>>>>>> outside the purview of optimizations - interprocedural or > >>>> intraprocedural. > >>>>>>>> So I senthttps://reviews.llvm.org/D100353 to fix that. > >>>>>>>> > >>>>>>>> Florian pointed out that this wasn't quite specified in the LangRef, > >>>>>>> which says this about optnone: > >>>>>>>> This function attribute indicates that most optimization passes will > >>>>>>> skip this function, with the exception of interprocedural optimization > >>>>>>> passes. Code generation defaults to the “fast” instruction selector. > >>>> This > >>>>>>> attribute cannot be used together with the alwaysinline attribute; this > >>>>>>> attribute is also incompatible with the minsize attribute and the > >>>> optsize > >>>>>>> attribute. > >>>>>>>> This attribute requires the noinline attribute to be specified on the > >>>>>>> function as well, so the function is never inlined into any caller. > >>>> Only > >>>>>>> functions with the alwaysinline attribute are valid candidates for > >>>> inlining > >>>>>>> into the body of this function. > >>>>>>>> So the spec of optnone is unclear (or arguably explicitly disallows) > >>>>>>> whether interprocedural optimizations should treat optnone functions > >>>> in any > >>>>>>> particular way. > >>>>>>>> So I was going to update the wording to rephrase this to say > >>>>>>> "Interprocedural optimizations should treat this function as though it > >>>> were > >>>>>>> defined in an isolated module/object." (perhaps "interprocedural > >>>>>>> optimizations should treat optnone functions as opaque" or "as though > >>>> they > >>>>>>> were only declarations") > >>>>>>>> The choice of this direction was based on my (possibly incorrect or > >>>>>>> debatable) understanding of optnone, that it was equivalent to the > >>>> function > >>>>>>> being in a separate/non-lto object. (this seems consistent with the way > >>>>>>> optnone is used to implement -O0 under lto - you could imagine a user > >>>>>>> debugging a binary, using -O0 for the code they're interested in > >>>> debugging, > >>>>>>> and potentially using an interactive debugger to change some state in > >>>> the > >>>>>>> function causing it to return a different value - which would get quite > >>>>>>> confusing if the return value was effectively hardcoded into the > >>>> caller) > >>>>>>>> What're folks thoughts on this? > >>>>>>>> > >>>>>>>> - Dave > >>>>> _______________________________________________ > >>>>> LLVM Developers mailing list > >>>>> llvm-dev at lists.llvm.org > >>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Johannes Doerfert via llvm-dev
2021-Apr-22 04:08 UTC
[llvm-dev] Revisiting/refining the definition of optnone with interprocedural transformations
On 4/21/21 9:29 PM, David Blaikie wrote:> >>> I'll see about posting an implementation of noipa and switching >>> __attribute__((optnone)) over to lower to LLVM's optnone+noipa rather >>> than optnone+noinline. >> FWIW, I think `noipa` should not imply `noinline`, unsure if you >> had that in mind or not. > Do you think it should require that noipa also carries noinline? (the > way optnone currently requires noinline) Or should we let the > non-inlining fall out naturally from the non-exact definition > property? >So, non-exact definitions do not prevent inlining. You can even create an internal copy and use that for IPO, think of it as "inline-then-outline". That said, I believe it is a mistake that `optnone` requires `noinline`. There is no reason for it to do so on the IR level. If you argue C-level `optnone` should imply `noinline`, that is a something worth discussing, though on the IR level we can decouple them. Use case, for example, the not-optimized version is called from functions that are `optnone` themselves while other call sites are inlined and your function is optimized. So you can use the two attributes to do context sensitive `optnone`. Circling back to `noipa`, I'm very much in favor of letting it compose freely with the others, at least in the IR. So, it does not require, nor imply `noinline` or `optnone`. Similarly, `noinline` does not imply `noipa`, neither does `optnone`. The latter might be surprising but I imagine I can use function attributes of an `optnone` function at the call site but I will not if the function is `noipa`. Others might have different opinions though. ~ Johannes