thr3ads.net - llvm dev - [llvm-dev] Metadata in LLVM back-end [Sep 2020]

Lorenzo Casalino via llvm-dev

2020-Sep-15 09:31 UTC

[llvm-dev] Metadata in LLVM back-end

On 08/09/20 at 17:57, David Greene wrote:
> Lorenzo Casalino <lorenzo.casalino93 at gmail.com> writes:
>
>> On 31/08/20 at 14:10, David Greene wrote:
>>> Lorenzo Casalino via llvm-dev <llvm-dev at lists.llvm.org> writes:
>>>
>>>> Furthermore, after register allocation there is a non-negligible effort
>>>> to properly annotate instructions which share the same output register...
>>>>
>>>> Concerning the usage of the live ranges to tie annotated instruction and
>>>> intrinsic, I have some doubts:
>>>>
>>>>  1. After register allocation, since metadata intrinsics are skipped
>>>>     (otherwise, they would be involved in the register allocation
>>>>     process, increasing the register pressure), the instruction stream
>>>>     would present both virtual and physical registers, which I am not
>>>>     sure is totally ok.
>>> They would have to participate in register allocation.
>> Should they? I mean: the register allocation "simply" creates a map
>> (VirtReg -> PhysReg), and actual register re-writing takes place in a
>> subsequent machine pass.
> Maybe they could be skipped?  I don't know if there's any precedent for
> that.

I think that they could be neglected, since they just carry information;
there's no point in allocating physical registers for their unused output.

>> So, we could avoid their participation in register allocation,
>> reducing register pressure and spill/reload work. As a downside, we
>> would have intrinsics with virtual registers as outputs, but it is not
>> a problem, since they do not perform any real computation.
> If we can get that to work, yes I guess having no-op intrinsics with
> virtual registers would be ok.  I don't know how the backend post-RA
> would cope with that though.  There might be lots of asserts that assume
> physical registers.

Yes, if I recall correctly, there are a lot of checks on the register
type.
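
To make the "map first, rewrite later" idea concrete, here is a minimal toy model in Python. It is only a sketch and not LLVM code: the `Instr` class, the `is_annotation` flag, and the `allocate`/`rewrite` helpers are hypothetical names, the allocator models no liveness or spilling, and the metadata intrinsic is assumed to read `%res`. It shows that skipping the annotation during allocation leaves its output virtual while everything else becomes physical, which is exactly the mixed stream discussed above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instr:
    opcode: str
    defs: List[str]
    uses: List[str]
    is_annotation: bool = False  # hypothetical marker for a metadata pseudo-instruction


def allocate(instrs, phys_regs):
    """Toy allocator: build a VirtReg -> PhysReg map, ignoring annotation pseudos.

    No liveness, reuse, or spilling is modeled; each virtual register simply
    takes the next free physical register.
    """
    free = list(phys_regs)
    vreg_map = {}
    for ins in instrs:
        if ins.is_annotation:
            continue  # annotation outputs never consume a physical register
        for reg in ins.uses + ins.defs:
            if reg not in vreg_map:
                vreg_map[reg] = free.pop(0)
    return vreg_map


def rewrite(instrs, vreg_map):
    """Separate rewriting step: substitute every register that has a mapping."""
    return [
        Instr(ins.opcode,
              [vreg_map.get(r, r) for r in ins.defs],
              [vreg_map.get(r, r) for r in ins.uses],
              ins.is_annotation)
        for ins in instrs
    ]


stream = [
    Instr("t2ANDrr", ["%res"], ["%src_1", "%src_2"]),
    Instr("llvm.metadata", ["%null"], ["%res"], is_annotation=True),
    Instr("t2STRi12", [], ["%res", "%slot"]),
]
vreg_map = allocate(stream, ["r0", "r1", "r2", "r3"])
rewritten = rewrite(stream, vreg_map)
for ins in rewritten:
    print(ins.opcode, ins.defs, ins.uses)
```

In this model the annotation ends up with a physical input and a virtual output, so any post-RA code that asserts "all registers are physical" would trip on it.
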
>>> If the intrinsics really shadow "real" instructions then it should be
>>> possible to place them such that this is not an issue; for example, you
>>> could place them immediately before the "real" instruction.
>> I do not think this would be possible: before register allocation, code is
>> in SSA form, thus the annotated instruction *must* precede the intrinsic
>> annotating it.
> Oh yes of course.  Duh.  :)
>
>> An alternative is to place the annotating intrinsic before the
>> instruction that ends the specific live-range (not necessarily as an
>> immediate predecessor).
> I'm not sure exactly what you mean

I mean, to avoid artificial extension of the live-range, place the annotating
intrinsic (I) before the instruction (K) that kills the live-range (but the
intrinsic (I) does not have to be an *immediate* predecessor of (K) in the
instruction stream).

For instance, assume we have the following SSA stream (I am using the ARM
Thumb2 MIR since I've been working mainly on that backend):

 #i %res = t2ANDrr %src_1_i, %src_2_i
    ...
 #j %null = llvm.metadata %a, (some metadata)
    ...
 #l %c = t2STRi12 %res, %stack_slot_res

Where instruction #l kills the live-range representing %res, and instruction
#j is covered by the live-range of %res, which spans from #i to #l.

Giving a total ordering to the stream of instructions, #i <= #j <= #l.
As you can infer, the intrinsic represented by instruction #j does not have
to be the immediate predecessor of #l (that is, there can exist an instruction
#k such that #j < #k < #l).

In this way, the live-range won't be extended (at least, in this trivial
case...)
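
The placement argument can be checked with a small Python sketch. It is a toy model under stated assumptions, not LLVM code: the stream is straight-line, the `Instr` tuple and the `killer` helper are hypothetical, the `other` instruction stands in for #k, and the metadata intrinsic is assumed to read `%res`. The live range of `%res` ends at its last reader, so the question is simply which instruction that is.

```python
from collections import namedtuple

Instr = namedtuple("Instr", "opcode defs uses")


def killer(stream, vreg):
    """Opcode of the last instruction reading vreg, i.e. the end of its
    live range in a straight-line stream."""
    readers = [ins for ins in stream if vreg in ins.uses]
    return readers[-1].opcode


AND = Instr("t2ANDrr", ("%res",), ("%src_1", "%src_2"))    # instruction #i
META = Instr("llvm.metadata", ("%null",), ("%res",))       # instruction #j, assumed to read %res
OTHER = Instr("other", ("%t",), ("%src_1",))               # hypothetical instruction #k
STORE = Instr("t2STRi12", (), ("%res", "%slot"))           # instruction #l

before_kill = [AND, META, OTHER, STORE]   # intrinsic placed anywhere before the kill
after_kill = [AND, OTHER, STORE, META]    # intrinsic placed after the kill

print(killer(before_kill, "%res"))
print(killer(after_kill, "%res"))
```

With the intrinsic before the store, the store still ends the live range; with the intrinsic after it, the intrinsic itself becomes the last reader and the range grows.
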
>  but it strikes me just now that if
> the intrinsic is connected to the target instruction via the target
> instruction's output value, then putting the intrinsic right after the
> target instruction should not have any live range issues, unless the
> target instruction were truly dead, in which case the intrinsic would
> keep it alive.  But since the intrinsic would eventually go away, I
> assume we could eliminate the target instruction at the same time.
>
> If the target instruction output is used *somewhere* it has a live range
> and adding another use just after the def should not affect register
> allocation appreciably.
Yes! :D

> It could of course affect spill choice
> heuristics like number of uses of a value but that's probably in the
> noise.
>
> It could, however, affect folding (e.g. mem operands) because a single
> use of a load would turn into two uses, preventing folding.  It's not
> clear to me whether you would *want* folding in your use-case since you
> apparently need to do something special with the load anyway.

Uhm...yes, folding requires particular attention; but, in my project, I
avoided the problem by "disabling" folding, so I didn't really care much
about that aspect.

>> Just to point out a problem to cope with: instruction scheduling must be
>> aware of this particular positioning of annotation intrinsics.
> Probably true.  This is a difficult problem, one I have dealt with.  If
> you want to keep two instructions "close" during scheduling it is a real
> pain.  ScheduleDAG has a concept for "glue" nodes but it's pretty hacky
> and difficult to maintain in the presence of upstream churn.  My initial
> attempt to avoid the need for codegen metadata took this approach and it
> was quite infeasible.  My second approach to hack in the information in
> other ways wasn't much more successful.  :(

It is just an idea, but could MI Bundles be profitably employed?

> I think we've uncovered a number of tricky issues when trying to encode
> metadata via intrinsics.  To me, at least, they clearly point to the
> need for a first-class solution and I think you agree with that too.
> Chris also seemed to at least give tentative support to the idea.

Yep!

> I wonder if we're at the point of drafting an initial RFC for review.

Uh, this is a good question. To be honest, it would be the first time for me.
For sure, we could start by pinpointing the main problems and challenges
-- that we identified -- that the employment of intrinsics would face.

>                      -David

David Greene via llvm-dev

2020-Sep-15 14:58 UTC

[llvm-dev] Metadata in LLVM back-end

Lorenzo Casalino <lorenzo.casalino93 at gmail.com> writes:
>>> An alternative is to place the annotating intrinsic before the
>>> instruction that ends the specific live-range (not necessarily as an
>>> immediate predecessor).
>>
>> I'm not sure exactly what you mean
>
> I mean, to avoid artificial extension of the live-range, place the
> annotating intrinsic (I) before the instruction (K) that kills the
> live-range (but the intrinsic (I) does not have to be an *immediate*
> predecessor of (K) in the instruction stream).
Ok, got it.  Thanks!
>> It could, however, affect folding (e.g. mem operands) because a single
>> use of a load would turn into two uses, preventing folding.  It's not
>> clear to me whether you would *want* folding in your use-case since you
>> apparently need to do something special with the load anyway.
>
> Uhm...yes, folding requires particular attention; but, in my project, I
> avoided the problem by "disabling" folding, so I didn't really care much
> about that aspect.
That makes sense for your project but it is another case of intrinsics
causing problems for general use.
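
The folding concern comes down to a use count, which a short Python sketch can illustrate. This is a toy model and not the actual instruction-selection logic: the "exactly one use" rule is a simplification, and `Instr`, `use_count`, `can_fold`, and the opcode names are all hypothetical.

```python
from collections import namedtuple

Instr = namedtuple("Instr", "opcode defs uses")


def use_count(stream, vreg):
    """Number of times vreg is read anywhere in the stream."""
    return sum(ins.uses.count(vreg) for ins in stream)


def can_fold(stream, load):
    """Toy rule: a load may be folded into its user as a memory operand
    only when the loaded value has exactly one use."""
    return use_count(stream, load.defs[0]) == 1


LOAD = Instr("load", ("%v",), ("%addr",))
ADD = Instr("add", ("%sum",), ("%v", "%x"))
META = Instr("llvm.metadata", ("%null",), ("%v",))  # annotation reading the loaded value

plain = [LOAD, ADD]
annotated = [LOAD, META, ADD]

print(can_fold(plain, LOAD))
print(can_fold(annotated, LOAD))
```

The annotation is a no-op, yet it counts as a second reader of `%v`, so under this rule the load can no longer be folded into the add.
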
>>> Just to point out a problem to cope with: instruction scheduling must be
>>> aware of this particular positioning of annotation intrinsics.
>>
>> Probably true.  This is a difficult problem, one I have dealt with.  If
>> you want to keep two instructions "close" during scheduling it is a real
>> pain.  ScheduleDAG has a concept for "glue" nodes but it's pretty hacky
>> and difficult to maintain in the presence of upstream churn.  My initial
>> attempt to avoid the need for codegen metadata took this approach and it
>> was quite infeasible.  My second approach to hack in the information in
>> other ways wasn't much more successful.  :(
>
> It is just an idea, but could MI Bundles be profitably employed?
Possibly.  Those didn't exist when I did my work.
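
As a rough illustration of why bundles might help, the Python sketch below models only one property: a bundle moves through scheduling as a single unit, so its members stay adjacent. It does not use or mirror LLVM's bundle API. The `schedule` function, the priority table, and the instruction names are hypothetical, and dependencies between instructions are ignored.

```python
def schedule(units, priority):
    """Toy list scheduler: reorder whole units by priority.

    A unit is either a single instruction name (str) or a bundle (tuple of
    names). A bundle is ordered by its first member and is never split.
    """
    ordered = sorted(
        units,
        key=lambda u: priority[u[0] if isinstance(u, tuple) else u],
    )
    flat = []
    for unit in ordered:
        flat.extend(unit if isinstance(unit, tuple) else [unit])
    return flat


# The annotated instruction and its annotation travel together as one bundle.
units = ["load", ("and", "meta(and)"), "store"]
priority = {"and": 0, "load": 1, "store": 2}  # lower value is scheduled earlier

result = schedule(units, priority)
print(result)
```

Even though the scheduler moves "and" ahead of "load", the annotation is still found immediately after the instruction it describes.
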
>> I wonder if we're at the point of drafting an initial RFC for review.
>
> Uh, this is a good question. To be honest, it would be the first time for
> me.  For sure, we could start by pinpointing the main problems and
> challenges -- that we identified -- that the employment of intrinsics
> would face.
That's the place to start, I think.  Gather a list of requirements/use
cases along with the challenges we've discussed.  Then it's a matter of
engineering a solution that fulfills the requirements while hitting as
few of the challenges as possible.  Let's start by simply gathering some
lists.  I'll take a quick stab and you and others can add to/edit it.

Requirements
------------
- Convey information not readily available in existing IR constructs to
  very late-stage codegen (after regalloc/scheduling, right through
  asm/object emission)

- Flexible format - it should be as simple as possible to express the
  desired information while minimizing changes to APIs

- Preserve information by default, only drop if explicitly told (I'm
  trying to capture the requirements for your use-case here and this
  differs from IR-level metadata)

- No bifurcation between "well-known"/"built-in" information and things
  added later/locally

- Should not impact compile time excessively (what is "excessive?")

Challenges of using intrinsics and other alternatives
-----------------------------------------------------
- Post-SSA annotation/how to associate intrinsics with
  instructions/registers/types

- Instruction selection fallout (inhibiting folding, etc.)

- Register allocation impacts (extending live ranges, etc.)

- Scheduling challenges (ensuring intrinsics can be found
  post-scheduling, etc.)

- Extending existing constructs (which ones?) requires hard-coding
  aspects of information, reducing flexibility

This is currently rather weasily-worded, because I didn't want to impose
too many restrictions right off the bat.

                  -David

Lorenzo Casalino via llvm-dev

2020-Oct-10 11:13 UTC

[llvm-dev] Metadata in LLVM back-end

> That's the place to start, I think.  Gather a list of requirements/use
> cases along with the challenges we've discussed.  Then it's a matter of
> engineering a solution that fulfills the requirements while hitting as
> few of the challenges as possible.  Let's start by simply gathering some
> lists.  I'll take a quick stab and you and others can add to/edit it.
>
> Requirements
> ------------
> - Convey information not readily available in existing IR constructs to
>   very late-stage codegen (after regalloc/scheduling, right through
>   asm/object emission)
I see this more as the GOAL of the RFC, rather than a requirement.
> - Flexible format - it should be as simple as possible to express the
>   desired information while minimizing changes to APIs

I do not want to raise a philosophical discussion (although I would
find it quite interesting), but "flexible" does not necessarily mean
"simple".

We could split this requirement as:

- Flexible format - the format should be expressive enough to enable modeling
  of *virtually* any kind of information type.

- Simple interface - expressing information and attaching it to MIR elements
  (e.g., instructions) should be "easy" (what does *easy* mean?)
> - Preserve information by default, only drop if explicitly told (I'm
>   trying to capture the requirements for your use-case here and this
>   differs from IR-level metadata)

What about giving end-users the possibility to define a custom default policy,
as well as the possibility to define different types of policies?

Further, we must cope with the combination of instructions: how is the
information associated with two instructions eligible for combination
combined?

- Information transformation - the information associated with two
  instructions A, B, which are combined into an instruction C, should be
  properly transformed according to a user-specific policy.

  A default policy may be "assign the information of both A and B to C"
  (gather-all/assign-all policy?)
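
One possible shape for such a policy hook, sketched in Python. Everything here is hypothetical and only meant to illustrate the requirement: instructions are plain dictionaries, `gather_all` is the default policy suggested above, `drop_all` is an example of a policy a user might opt into explicitly, and the opcode and annotation names are made up.

```python
def gather_all(meta_a, meta_b):
    """Default policy: C receives every annotation of A and of B
    (duplicates dropped, original order kept)."""
    merged = list(meta_a)
    merged.extend(m for m in meta_b if m not in merged)
    return merged


def drop_all(meta_a, meta_b):
    """Explicit opt-out policy: the combined instruction carries nothing."""
    return []


def combine(instr_a, instr_b, opcode, policy=gather_all):
    """Combine A and B into a new instruction C, transforming their
    annotations according to the given policy."""
    return {"opcode": opcode, "meta": policy(instr_a["meta"], instr_b["meta"])}


a = {"opcode": "mul", "meta": ["secret"]}
b = {"opcode": "add", "meta": ["secret", "tainted"]}

c_default = combine(a, b, "mla")
c_dropped = combine(a, b, "mla", policy=drop_all)
print(c_default["meta"])
print(c_dropped["meta"])
```

Making the policy a parameter with a preserving default matches the "preserve by default, drop only if explicitly told" requirement: losing information takes a deliberate choice by the caller.
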
> - No bifurcation between "well-known"/"built-in" information and things
>   added later/locally

May I ask you to elaborate a bit more about this point?

> - Should not impact compile time excessively (what is "excessive?")

Probably, such estimation should be performed on
 

What about the granularity level?

- Granularity level - metadata information should be attachable at
  different levels of granularity:

  - *Coarse*: MachineFunction level
  - *Medium*: MachineBasicBlock level
  - *Fine*:   MachineInstruction level

Clearly, there are other degrees of granularity and/or dimensions to be
considered (e.g., LiveInterval, MIBundles, Loops, ...).
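
A small Python sketch of what attaching information at the three levels could look like. It is purely illustrative: the `Node` class, the `find` helper, and the annotation keys are hypothetical, and a real design would attach to the actual function, basic block, and instruction objects instead of a generic tree.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    level: str                      # "function", "block" or "instr"
    name: str
    meta: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def find(node, key, path=()):
    """Return [(path, value)] for every node, at any level, that carries key."""
    here = path + (node.name,)
    hits = [(here, node.meta[key])] if key in node.meta else []
    for child in node.children:
        hits.extend(find(child, key, here))
    return hits


fn = Node("function", "f", {"sensitive": True}, [
    Node("block", "bb.0", {}, [
        Node("instr", "t2ANDrr", {"sensitive": True}),
        Node("instr", "t2STRi12"),
    ]),
    Node("block", "bb.1", {"loop-header": True}),
])

print(find(fn, "sensitive"))
print(find(fn, "loop-header"))
```

The same key can appear at several levels at once, which raises a design question the list above leaves open: whether a coarse annotation implicitly applies to everything nested inside it.
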
> Challenges of using intrinsics and other alternatives
> -----------------------------------------------------
> - Post-SSA annotation/how to associate intrinsics with
>   instructions/registers/types
>
> - Instruction selection fallout (inhibiting folding, etc.)
>
> - Register allocation impacts (extending live ranges, etc.)
>
> - Scheduling challenges (ensuring intrinsics can be found
>   post-scheduling, etc.)
>
> - Extending existing constructs (which ones?) requires hard-coding
>   aspects of information, reducing flexibility
>
> This is currently rather weasily-worded, because I didn't want to impose
> too many restrictions right off the bat.
>
>                   -David

Sorry for the long delay!

-- Lorenzo
