thr3ads.net - llvm dev - [llvm-dev] Revisiting/refining the definition of optnone with interprocedural transformations [Apr 2021]

If this information is useful, please help other people find it:
Share via:

Johannes Doerfert via llvm-dev

2021-Apr-22 17:32 UTC

[llvm-dev] Revisiting/refining the definition of optnone with interprocedural transformations

On 4/22/21 12:21 PM, David Blaikie wrote:> On Thu, Apr 22, 2021 at 10:04 AM Johannes Doerfert
> <johannesdoerfert at gmail.com> wrote:
>>
>> On 4/22/21 11:49 AM, David Blaikie wrote:
>>> There seems to be a bunch of confusion and probably some
>>> conflation/ambiguity about whether we're talking about IR
constructs
>>> on the C attributes.
>>>
>>> Johannes - I assume your claim is restricted mostly to the IR? That
>>> having optnone not require or imply noinline improves orthogonality
of
>>> features and that there are reasonable use cases where one might
want
>>> optnone while allowing inlining (of the optnone function) or
optnone
>>> while disallowing inlining (of the optnone function)
>> Yes, IR level is my concern. I'd be in favor of matching it in C
>> but I don't care enough to go through the discussions.
>>
>> My latest use case: `s/optnone//g` should not result in a verifier
>> error when you just want to try out an optimization on a single
function.
> I guess you meant adding optnone, rather than removing it? (removing
> optnone shouldn't cause any verifier errors, does it?)
Yes, right, my bad. Removing `noinline` or adding `optnone` can
get you in trouble.

>>> Paul - I think you're mostly thinking about/interested in the
specific
>>> source level/end user use case that motivated the initial
>>> implementation of optnone. Where, I tend to agree - inlining an
>>> optnone function is not advantageous to the user. Though it's
possible
>>> Johannes 's argument could be generalized from IR to C and
still
>>> apply: orthogonal features are more powerful and the user can
always
>>> compose them together to get what they want. (good chance
they're
>>> using attributes behind macros for ease of use anyway - they're
a bit
>>> verbose to write by hand all the time)
>> I'd give the user the capability building blocks and let them work
>> with that, but again, this is not my main concern right now.
>>
>>
>>> There's also the -O0 use of optnone these days (clang puts
optnone on
>>> all functions when compiling with -O0 - the intent being to treat
such
>>> functions as though they were compiled in a separate object file
>>> without optimizations (that's me projecting what I /think/ the
mental
>>> model should be) - which, similarly, I think will probably want to
>>> keep the current behavior (no ipa/inlining and no optimization -
>>> however that's phrased).
>>>
>>> Essentially the high level use cases of optnone all look like
"imagine
>>> if I compiled this in a separate object file without LTO" -
apparently
>>> even noipa+optnone isn't enough for that, maybe (I need to test
that
>>> more, based on a comment from Johannes earlier that inexact
>>> definitions don't stop the inliner... )? Sounds like maybe
it's more
>>> like what I'm thinking of is "what if this function had
weak linkage"
>>> (ie: could be replaced by a totally different one)?
>> Yes, weak should do the trick. That said, if you want "separate
>> object file without LTO", go with `noinline` + `noipa`. This will
>> make the call edges optimization barriers. If you want to also not
>> optimize the function, add `optnone` as required.
> Starts to feel like a long list to get what seems like one wholistic
> concept ("treat it as though it were a separate non-LTO object
file" /
> "treat it as though this function had weak linkage"), but
it's
> probably OK/a fine thing to do (not like there's a high cost to having
> multiple attributes since the attribute lists are shared, etc) - just
> psychologically for me, there seems to be one core concept and
> stitching it together from several attributes makes me worry that
> there are gaps (as there have been/what's motivating this discussion -
> though it certainly sounds like there will be fewer gaps after this
> work, for sure).
If you want "separate non-LTO" behavior by design, put it in a
different file and don't compile that file with LTO ;)

Inside a single TU there is no "separate non-LTO" idea, it is
a single TU after all. To get the same effect we build it from
blocks that have a meaning in the single TU case. Sure, there
might be gaps left, that usually means there is something missing
in the single TU case as well so a new attribute is in order.

>> FWIW, if all functions are `optnone`, `noinline` and `noipa` are
>> not needed, that is the -O0 case. More specifically, if your caller
>> is `optnone`, `noinline` and `noipa` are not needed (for that caller).
> Right - though LTO is the case that motivated adding optnone for -O0,
> so it would be respected even under LTO - so in that case we'd want
> noinline and noipa, by the sounds of it.
If you run only one file with -O0 and then LTO it with other files
that do not have -O0, you probably want to add `noinline` + `noipa`
at the "entry points" that you want to debug. I think -O0 could
reasonably add all three arguments anyway, if you say O0 they all
make sense (to me). If you want more control, you need to seed them
manually.

~ Johannes

>
> - Dave
>
>> ~ Johannes
>>
>>
>>> On Thu, Apr 22, 2021 at 9:23 AM Johannes Doerfert
>>> <johannesdoerfert at gmail.com> wrote:
>>>> On 4/22/21 11:05 AM, paul.robinson at sony.com wrote:
>>>>>> On 4/22/21 9:43 AM, paul.robinson at sony.com wrote:
>>>>>>>> That said, I believe it is a mistake that
`optnone` requires
>>>>>>>> `noinline`. There is no reason for it to do so
on the IR level.
>>>>>>>> If you argue C-level `optnone` should imply
`noinline`, that is
>>>>>>>> a something worth discussing, though on the IR
level we can
>>>>>>>> decouple them. Use case, for example, the
not-optimized version
>>>>>>>> is called from functions that are `optnone`
themselves while
>>>>>>>> other call sites are inlined and your function
is optimized. So
>>>>>>>> you can use the two attributes to do context
sensitive `optnone`.
>>>>>>> The original intent for `optnone` was to imitate
the -O0 pipeline
>>>>>>> to the extent that was feasible.  The -O0 pipeline
(as constructed
>>>>>>> by Clang) runs just the always-inliner, not the
regular inliner;
>>>>>>> so, functions marked `optnone` should not be
inlined.  The way
>>>>>>> to achieve that effect most simply is to have
`optnone` require
>>>>>>> `noinline` and that's what we did.
>>>>>>>
>>>>>>> If we have `optnone` stop requiring `noinline` and
teach the
>>>>>>> inliner to inline an `optnone` callee only into an
`optnone` caller,
>>>>>>> then we are violating the intent that `optnone`
imitate -O0, because
>>>>>>> that inlining would not have happened at -O0.
>>>>>> I think I initially read this wrong, hence the part
below.
>>>>>> After reading it again, I have one question: Why would
the
>>>>>> inliner inline something that is not `always_inline`
into
>>>>>> an `optnone` caller? That would violate the idea of
`optnone`,
>>>>>> IMHO, regardless if the callee is `optnone` or not.
That is
>>>>>> why I don't believe `noinline` on the callee is
necessary
>>>>>> for your use case.
>>>>> The inliner should be ignoring `optnone` callers, so it
would
>>>>> never inline *anything* into an `optnone` caller.  (Other
than
>>>>> an `alwaysinline` function.)
>>>> My point is, it already does ignore `optnone` callers
>>>> and inlines only `alwaysinline` calls into them:
>>>>
>>>> https://clang.godbolt.org/z/fznbjTEd5
>>>>
>>>>
>>>>> I had read this:
>>>>>
>>>>>>>> I believe it is a mistake that `optnone`
requires `noinline`.
>>>>> and the case that came to mind is inlining an `optnone`
callee
>>>>> into a not-`optnone` caller.  The inlined copy would then
be
>>>>> treated to further optimization, which violates the idea of
>>>>> `optnone`.
>>>> But that is a composition issue. If you do not want to
>>>> inline a `optnone` callee into an non-`optnone` caller,
>>>> then add `noinline` to the callee. If you don't mind if
>>>> it is inlined into non-`optnone` callers and optimized
>>>> in there, then don't. My last email contained an example
>>>> to show the different cases, you can mix and match IR
>>>> attributes to get what you want. Requiring them to be
>>>> tied is not improving anything but just restricting the
>>>> options.
>>>>
>>>>
>>>>> Now, the inliner already knows to avoid `noinline` callees,
so
>>>>> attaching `noinline` to `optnone` functions was (at the
time)
>>>>> considered an optimal way to avoid the problematic case. 
We
>>>>> could instead teach the inliner to skip `optnone` callees,
and
>>>>> that would allow us to eliminate the requirement that
`optnone`
>>>>> functions must also be `noinline`.  I am unclear why
redefining
>>>>> `optnone` to _imply_ `noinline` (rather than _require_
`noinline`)
>>>>> is better, but then I don't work much with attributes.
>>>> The inliner will not inline calls into an `optnone` caller
>>>> if it is not necessary. As said before, that would violate
>>>> the `optnone` idea for the caller, no matter what the callee
>>>> looks like. So requiring  `noinline` on the callee seems
>>>> to me like a workaround or an oversight.
>>>>
>>>> It is better to not require them together because you can
>>>> actually describe more distinct scenarios. Please take
>>>> another look at my example in the last email, it shows
>>>> what is possible if you split them. Furthermore, `optnone`
>>>> does by design imply `noinline` for the call sites in the
>>>> caller, or at least nobody argued that it shouldn't. Thus,
>>>> requiring `noinline` on the callee is simply unnecessary
>>>> as it does not add any value.
>>>>
>>>>
>>>>> The notion of allowing an `optnone` caller to inline an
`optnone`
>>>>> callee sounds like it would also violate the intent of
`optnone`
>>>>> in that it should imitate -O0, where inlining is confined
to
>>>>> `alwaysinline` callees, and `optnone` is defined to
conflict with
>>>>> `alwaysinline` (because if you always inline something, you
are
>>>>> allowing it to have subsequent optimizations same as the
caller,
>>>>> which conflicts with `optnone`).
>>>> Nobody said `optnone` callers should inline calls that are
>>>> not always_inline, at least so far I have not seen that
>>>> argument be made anywhere. I'll just skip this paragraph.
>>>>
>>>>
>>>>> So, if you want to undo the _requirement_ that `optnone`
must
>>>>> have `noinline`, but then redefine `optnone` such that it
can't
>>>>> be inlined anywhere, you've done something that seems
to have no
>>>>> practical effect.  Maybe that helps Attributor in some way,
but
>>>>> I don't see any other reason to be making this change.
>>>> I do not want to say `optnone` cannot be inlined. `noinline`
>>>> says it cannot be inlined. If you want it to not be inlined,
>>>> use `noinline`, if you want it to not be optimized in it's
>>>> own function, use `optnone`, if you want both, use both.
>>>>
>>>> The practical effect was literally showcased in my last email,
>>>> please go back and look at the example.
>>>>
>>>> I don't know why the Attributor has to do with this,
I'm happy
>>>> to hear your thoughts on that though :)
>>>>
>>>> ~ Johannes
>>>>
>>>>
>>>>> --paulr
>>>>>

via llvm-dev

2021-Apr-22 18:41 UTC

head link

[llvm-dev] Revisiting/refining the definition of optnone with interprocedural transformations

I'll just say,
> If you want "separate non-LTO" behavior by design, put it in a
> different file and don't compile that file with LTO ;)
that is impractical in many build systems, and an excessive amount
of work if you want to selectively build a some code with -O0
because you're debugging something at the moment.
--paulr

llvm dev - Apr 2021 - Revisiting/refining the definition of optnone with interprocedural transformations

[llvm-dev] Revisiting/refining the definition of optnone with interprocedural transformations

[llvm-dev] Revisiting/refining the definition of optnone with interprocedural transformations