thr3ads.net - llvm dev - [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca. [Dec 2018]


Andrea Di Biagio via llvm-dev

2018-Nov-27 17:08 UTC

[llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

Thanks for clarifying it Matt.

In general, I quite like your suggested design.

My only concern is about the semantics of the two new intrinsics. Your design
doesn't allow mca ranges to span multiple basic blocks. That
constraint is acceptable for now, since llvm-mca doesn't know how to deal
with control flow.
However, I am a bit concerned about what might happen in future if we
decide to let users specify code regions that span multiple basic
blocks. Basically, I don't particularly like the idea of changing the
semantics of already existing intrinsics. A design that already accounts for
that particular scenario/future work would be ideal. That being said,
marking those new intrinsics as 'experimental' may be a good compromise (at
least for now).

So, I am quite happy overall with the direction of this RFC.
However, I am interested to hear from other developers about your
suggested design.

> This initial patch only targets ELF object files, and does not handle
> relocatable addresses. Since the start of a code region is represented as an
> assembly label, and referenced in the .mca_code_regions section, that
> address is relocatable.

This may be okay for now. However, it would be nice to remove that
constraint in future and add support for generic object files.
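
To make the quoted constraint concrete, here is a minimal sketch of how a consumer might decode the .mca_code_regions section. It assumes each entry is exactly the three 64-bit values shown in the RFC's assembly listing (identifier, start address, size in bytes) and that the bytes are in the host's byte order. The type and function names are illustrative only and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One entry of the proposed .mca_code_regions section: three 64-bit
   values, matching the three .quad directives in the RFC's listing.
   The struct name and field names are hypothetical. */
typedef struct {
    uint64_t id;     /* user-chosen region identifier, e.g. 42 */
    uint64_t start;  /* address of the region's first instruction */
    uint64_t size;   /* bytes between the start and end labels */
} mca_region;

/* Decode up to max_out entries from the raw section bytes.
   Assumes host byte order. Returns the number of entries decoded,
   or -1 if len is not a multiple of the entry size. */
long mca_decode_regions(const unsigned char *data, size_t len,
                        mca_region *out, size_t max_out)
{
    const size_t entry_size = 3 * sizeof(uint64_t);
    assert(data != NULL && out != NULL);
    if (len % entry_size != 0)
        return -1;
    size_t count = len / entry_size;
    if (count > max_out)
        count = max_out;
    for (size_t i = 0; i < count; ++i) {
        uint64_t fields[3];
        /* memcpy avoids unaligned reads from the section buffer. */
        memcpy(fields, data + i * entry_size, sizeof fields);
        out[i].id = fields[0];
        out[i].start = fields[1];
        out[i].size = fields[2];
    }
    return (long)count;
}
```

In a relocatable object the `start` field would still be an unresolved reference to the label, so a decoder like this is only meaningful on a linked file, which is the limitation discussed here.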

-Andrea

On Thu, Nov 22, 2018 at 7:21 PM <Matthew.Davis at sony.com> wrote:
> I want to clarify a few restrictions of llvm-mca code regions that this
> RFC proposes:
>
> 1) All llvm-mca code regions must start with an llvm.mca.code.region.start
> intrinsic and end with
> an llvm.mca.code.region.end intrinsic.  This rule is enforced at the IR
> level in the IR verifier.
>
> 2) llvm-mca code regions cannot nest.  This restriction implies that an
> llvm.mca.code.region.start
> must have a llvm.mca.code.region.end intrinsic without any other llvm.mca
> start intrinsics
> between the two. The current implementation in the patch enforces this
> restriction at the
> IR level via the IR Verifier.
>
> 3) An llvm-mca code region cannot span multiple basic blocks.  llvm-mca
> does not follow
> branches (yet).  Instead, a branch instruction is treated by llvm-mca like
> any other instruction.
> The current patch associated with this RFC does not enforce this
> restriction.  I plan on updating
> the patch to enforce that a code region can only belong to a single basic
> block.  This is a simple
> check, ensuring that both the llvm.mca.code.region.start and accompanying
> end intrinsics live
> in the same basic block. I imagine adding this check at the IR level when
> we also verify points 1 and 2
> above.  That will keep the code-region verification logic isolated to the
> IR verifier.  The start/end
> intrinsics should not have any uses, so I'm not sure that they would be
> moved/sunk on behalf
> of any other instruction.  In other words, I do not imagine that a start
> and end would be split
> apart due to later MI optimizations.  If I discover that such a case
> occurs, then I might add the
> basic-block check prior to emitting the code region data to the object
> file.  Once llvm-mca is
> updated to handle branches, then we can remove this constraint.
>
> -Matt
>
> > -----Original Message-----
> > From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Matt Davis via llvm-dev
> > Sent: Wednesday, November 21, 2018 8:47 AM
> > To: Andrea Di Biagio <andrea.dibiagio at gmail.com>
> > Cc: llvm-dev <llvm-dev at lists.llvm.org>; Di Biagio, Andrea
> > <Andrea.Dibiagio at sony.com>; cfe-dev at lists.llvm.org
> > Subject: Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.
> >
> > Hi Andrea,
> >
> > Thanks for your input.
> >
> > On Wed, Nov 21, 2018 at 12:43:52PM +0000, Andrea Di Biagio wrote:
> > [... snip ...]
> > > About the suggested design:
> > > I like the idea of being able to identify code regions using a numeric
> > > identifier.
> > > However, what happens if a code region spans multiple basic blocks?
> >
> > The current patch does not take into consideration cases where the
> > region start and end intrinsics are placed in different basic blocks.
> > Such would be the case if a region is defined to span multiple blocks.
> > This would be similar to the current case where a user places a
> > #LLVM-MCA-BEGIN assembly comment in one block and an #LLVM-MCA-END in
> > another.  However, as you point out below, if the user does this in the
> > source code via intrinsics (just what this patch is proposing), then
> > there is a chance that optimizations might change the layout of the
> > instructions and confuse the ordering of the MCA intrinsics.
> >
> > Since MCA does not follow branches (MCA just treats a branch as it would
> > a non-branching instruction), it seems that a user should be aware that
> > defining MCA code regions that span multiple blocks might result in an
> > unexpected analysis.  While we do not discourage this, it seems like
> > such a case will probably not produce an expected result for the user.
> > We could introduce a warning, or automatically divide the regions so
> > that a single region can only contain a single block.
> >
> > > My understanding is that code regions are not allowed to overlap. So, it
> > > makes sense if `__mca_code_region_end()` doesn't take an ID as input.
> > > However, what if `__mca_code_region_end()` ends in a different basic block?
> > >
> > > `__mca_code_region_start()` has to always dominate
> > > `__mca_code_region_end()`. This is trivial to verify when both calls are in
> > > the same basic block; however, we need to make sure that the relationship is
> > > still the same when the `end()` call is in a different basic block.
> > > That would not be enough. I think we should also verify that
> > > `__mca_code_region_end()` always post-dominates the call to
> > > `__mca_code_region_start()`.
> >
> > In any case this patch should probably check dominance of the
> > intrinsics, even though MCA does not follow branches and MCA does not
> > explicitly forbid a region from containing multiple blocks.
> >
> > >
> > > My question is: what happens with basic block reordering? We don't know the
> > > layout of basic blocks until we reach code emission. How does it work for
> > > regions that span multiple basic blocks? I think your RFC should
> > > clarify this aspect.
> > >
> > > As a side note: at the moment, llvm-mca doesn't know how to deal with
> > > branches. So, for simplicity we could force code regions to only contain
> > > instructions from a single basic block.
> > >
> > > However, in future we may want to teach llvm-mca how to analyze branchy
> > > code too. For example, we could introduce a simple control-flow analysis in
> > > llvm-mca, and use external "branch trace" information (for example, a
> > > perf trace generated by an external tool) to decorate branches with
> > > branch probabilities (similarly to what we currently do in LLVM with PGO).
> > > We could then use that knowledge to model branch prediction and simulate
> > > what happens in the presence of multiple branches.
> > >
> > > So, the idea of having regions that potentially span multiple basic blocks
> > > is not bad in general. However, I think you should better clarify what
> > > the constraints are (at least, you should answer my questions from before).
> >
> > I agree! Thanks for pointing that out.
> >
> > > If we decide to use those new intrinsics, then those should be experimental
> > > (at least to start).
> >
> > Agreed.
> >
> > -Matt
> >
> > > On Thu, Nov 15, 2018 at 11:07 PM via llvm-dev <llvm-dev at lists.llvm.org>
> > > wrote:
> > >
> > > > Introduction
> > > > -----------------
> > > > > Currently llvm-mca only accepts assembly code as input. We would like to
> > > > > extend llvm-mca to support object files, allowing users to analyze the
> > > > > performance of binaries. The proposed changes (which involve both
> > > > > clang and llvm) optionally introduce an object file section, but this can
> > > > > be stripped out if desired.
> > > > >
> > > > > For the llvm-mca binary support feature to be useful, a user needs to tell
> > > > > llvm-mca which portions of their code they would like analyzed. Currently,
> > > > > this is accomplished via assembly comments. However, assembly comments are
> > > > > not preserved in object files, and this has motivated this RFC. For the
> > > > > proposed binary support, we need to introduce changes to clang and llvm to
> > > > > allow the user's object code to be recognized by llvm-mca:
> > > > >
> > > > > * We need a way for a user to identify a region/block of code they want
> > > > >    analyzed by llvm-mca.
> > > > > * We need the information defining the user's region of code to be
> > > > >    maintained in the object file so that llvm-mca can analyze the desired
> > > > >    region(s) from the object file.
> > > > >
> > > > > We define a "code region" as a subset of a user's program that is to be
> > > > > analyzed via llvm-mca. The sequence of instructions to be analyzed is
> > > > > represented as a pair: <start, end> where 'start' marks the beginning
> > > > > of the user's source code and 'end' terminates the sequence. The
> > > > > instructions between 'start' and 'end' form the region that can be
> > > > > analyzed by llvm-mca at a later time.
> > > >
> > > > Example
> > > > -----------
> > > > > Before we go into the details of this proposed change, let's first look at
> > > > > a simple example:
> > > >
> > > > // example.c -- Analyze a dot-product expression.
> > > > double test(double x, double y) {
> > > >    double result = 0.0;
> > > >    __mca_code_region_start(42);
> > > >    result += x * y;
> > > >    __mca_code_region_end();
> > > >    return result;
> > > > }
> > > >
> > > > > In the example above, we have identified a code region, in this case a
> > > > > single dot-product expression. For the sake of brevity and simplicity,
> > > > > we've chosen a very simple example, but in reality a more complicated
> > > > > example could use multiple expressions. We have also denoted this region
> > > > > as number 42. That identifier is only for the user, and simplifies reading
> > > > > an llvm-mca analysis report later.
> > > > >
> > > > > When this code is compiled, the region markers (the mca_code_region
> > > > > markers) are transformed into assembly labels. While the markers are
> > > > > presented as function calls, in reality they are no-ops.
> > > >
> > > > test:
> > > > pushq   %rbp
> > > > movq    %rsp, %rbp
> > > > movsd   %xmm0, -8(%rbp)
> > > > movsd   %xmm1, -16(%rbp)
> > > > .Lmca_code_region_start_0: # LLVM-MCA-START ID: 42
> > > > xorps   %xmm0, %xmm0
> > > > movsd   %xmm0, -24(%rbp)
> > > > movsd   -8(%rbp), %xmm0
> > > > mulsd   -16(%rbp), %xmm0
> > > > addsd   -24(%rbp), %xmm0
> > > > movsd   %xmm0, -24(%rbp)
> > > > .Lmca_code_region_end_0: # LLVM-MCA-END ID: 42
> > > > movsd   -24(%rbp), %xmm0
> > > > popq    %rbp
> > > > retq
> > > > > .section        .mca_code_regions,"",@progbits
> > > > .quad   42
> > > > .quad   .Lmca_code_region_start_0
> > > > .quad   .Lmca_code_region_end_0-.Lmca_code_region_start_0
> > > >
> > > > > The assembly has been trimmed to show the portions relevant to this RFC.
> > > > > Notice the labels enclose the user's defined region, and that they
> > > > > preserve the user's arbitrary region identifier, the ever-so-important
> > > > > region 42.
> > > > >
> > > > > In the object file section .mca_code_regions, we have noted the user's
> > > > > region identifier (.quad 42), start address, and region size. A more
> > > > > complicated example can have multiple regions defined within a single
> > > > > .mca_code_regions section. This section can be read by llvm-mca, allowing
> > > > > llvm-mca to take object files as input instead of assembly source.
> > > >
> > > > Details
> > > > ---------
> > > > > We need a way for a user to identify a region/block of code they want
> > > > > analyzed by llvm-mca. We solve this problem by introducing two intrinsics
> > > > > that a user can specify, for identifying regions of code for analysis.
> > > > >
> > > > > The two intrinsics are: llvm.mca.code.region.start and
> > > > > llvm.mca.code.region.end. A user can identify a code region by inserting
> > > > > the mca_code_region_start and mca_code_region_end markers. These are simply
> > > > > clang builtins and are transformed into the aforementioned intrinsics
> > > > > during compilation. The code between the intrinsics is what we call a
> > > > > "code region" and is to be easily identifiable by llvm-mca; any code
> > > > > between a start/end pair can be analyzed by llvm-mca at a later time. A
> > > > > user can define multiple non-overlapping code regions within their program.
> > > > >
> > > > > The llvm.mca.code.region.start intrinsic takes an integer constant as its
> > > > > only argument. This argument is implemented as a metadata i32, and is only
> > > > > used when generating llvm-mca reports. This value allows a user to more
> > > > > easily identify a specific code region. llvm.mca.code.region.end takes no
> > > > > arguments. Since we disallow nesting of regions, the first 'end' intrinsic
> > > > > lexically following a 'start' intrinsic represents the end of that code
> > > > > region.
> > > >
> > > > > Now that we have a solution for identifying regions for analysis, we
> > > > > need a way of preserving that information to be read at a later time. To
> > > > > accomplish this we propose adding a new section (.mca_code_regions) to the
> > > > > object file generated by llvm. During code generation, the start/end
> > > > > intrinsics described above will be transformed into start/end labels in
> > > > > assembly. When llvm generates the object file from the user's code, these
> > > > > start/end labels form a pair of values identifying the start of the user's
> > > > > code region, and its size. The size represents the number of bytes between
> > > > > the start and end address of the labels. Note that the labels are emitted
> > > > > during assembly printing. We hope that these labels have no influence on
> > > > > code generation or basic-block placement. However, the target assembler
> > > > > strategy for handling labels is outside of our control.
> > > >
> > > > > This proposed change affects the size of a binary, but only if the user
> > > > > calls the start/end builtins mentioned above. The additional size of the
> > > > > .mca_code_regions section, which we imagine to be very small (on the order
> > > > > of a few bytes), can trivially be stripped by tools like 'strip' or
> > > > > 'objcopy'.
> > > >
> > > > Implementation Status
> > > > ------------------------------
> > > > > We currently have the proposed changes implemented at the URL posted below.
> > > > > This initial patch only targets ELF object files, and does not handle
> > > > > relocatable addresses. Since the start of a code region is represented as
> > > > > an assembly label, and referenced in the .mca_code_regions section, that
> > > > > address is relocatable. That value can be represented as a section-relative
> > > > > relocatable symbol (.text + addend), but we are not handling that case yet.
> > > > > Instead, the proposed changes only handle linked/executable object files.
> > > >
> > > > > For purposes of review and to communicate the idea, the change is
> > > > > presented as a monolithic patch here:
> > > >
> > > > https://reviews.llvm.org/D54603
> > > >
> > > > > The change is presented as a monolithic patch; however, if accepted
> > > > > the patch will be split into three smaller patches:
> > > > 1. The introduction of the builtins to clang.
> > > > 2. The llvm portion (the added intrinsics).
> > > > 3. The llvm-mca portion.
> > > >
> > > > Thanks!
> > > >
> > > > -Matt
> > > > _______________________________________________
> > > > LLVM Developers mailing list
> > > > llvm-dev at lists.llvm.org
> > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > > >

Andrea Di Biagio via llvm-dev

2018-Dec-03 13:21 UTC

[llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

So, I have been thinking a bit more about this whole design.

The more I think about your suggested design, the more I am convinced that
we should do something more to support ranges in binary object files too.
My understanding is that the reason why we don't support object files in
general, is because of the presence of relocations. That is because a
region start marker is effectively symbol relative, and the symbol (a
function) would be relocated in the final executable.
You mentioned to me that resolving even a 'simple' symbol-relative
relocation is not trivial, because it requires specific knowledge about the
binary format, and the target (i.e. how relocations are encoded is target
specific). I am surprised that there is not a utility library for resolving
relocations... but I am not familiar with that part of the compiler. I was
hoping that there was a target specific interface to use in this case...

An alternative approach would require that you define your own
"symbol-relative" reference. After all, ranges are just sequences of
instructions in a function. If a function symbol is described by the symbol
table, then you should be able to obtain its offset in the .text section.
So, you could potentially encode your own symbol+offset. However, the
linker would not be able to understand your "custom relocation", and
information about regions in the final ELF would be basically broken.
So, that would not be a solution...
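
For illustration only, the symbol+offset idea above could look roughly like the sketch below. The record layout, the simplified symbol-table entry, and the function name are all hypothetical; a real ELF symbol table has more fields, and nothing here comes from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified symbol-table entry: only what the sketch needs. */
typedef struct {
    const char *name;
    uint64_t value;  /* offset of the function within .text */
    uint64_t size;   /* size of the function in bytes */
} sym_entry;

/* Hypothetical region record that refers to its function by symbol index
   instead of storing an absolute address. */
typedef struct {
    uint64_t id;
    uint32_t sym_index;
    uint64_t offset;  /* start of the region, relative to the symbol */
    uint64_t size;    /* length of the region in bytes */
} sym_region;

/* Compute the region's start offset within .text.
   Returns 0 on success, -1 if the symbol index is out of range or the
   region does not fit inside the function. */
int resolve_region_start(const sym_entry *syms, size_t nsyms,
                         const sym_region *r, uint64_t *out)
{
    assert(syms != NULL && r != NULL && out != NULL);
    if (r->sym_index >= nsyms)
        return -1;
    const sym_entry *s = &syms[r->sym_index];
    /* The whole region must lie inside the function body. */
    if (r->offset > s->size || r->size > s->size - r->offset)
        return -1;
    *out = s->value + r->offset;
    return 0;
}
```

The lookup itself is simple; the problem described above remains that a linker which does not know about such records would not keep the symbol index or offset consistent in the final executable.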

I honestly don't know what the best approach is in this case.
As a compromise, it would not be a bad idea to add the ability to specify
ranges from the command line. What do you think?
Still, from a user point of view, the idea that we don't support object
files in general sounds like a big limitation.
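
As a rough sketch of the command-line idea, a range could be passed as a single "START:END" string and parsed as below. The string format and the function name are assumptions made for illustration; this is not an existing llvm-mca option.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Half-open address range [start, end). Hypothetical type. */
typedef struct {
    uint64_t start;
    uint64_t end;
} addr_range;

/* Parse "START:END". strtoull with base 0 accepts 0x-prefixed hex,
   octal with a leading 0, or decimal.
   Returns 0 on success, -1 on malformed input or when END <= START. */
int parse_addr_range(const char *text, addr_range *out)
{
    char *rest = NULL;
    assert(text != NULL && out != NULL);

    errno = 0;
    unsigned long long start = strtoull(text, &rest, 0);
    if (rest == text || *rest != ':' || errno != 0)
        return -1;

    const char *end_text = rest + 1;
    unsigned long long end = strtoull(end_text, &rest, 0);
    if (rest == end_text || *rest != '\0' || errno != 0)
        return -1;

    if (end <= start)
        return -1;
    out->start = start;
    out->end = end;
    return 0;
}
```

A range given this way needs no section in the object file at all, but the user has to find the addresses by hand (for example from a disassembly), so it is a fallback rather than a replacement for the markers.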

About the new experimental intrinsics: those would definitely work well for
the simple case where instructions are from the same basic block.
However, some/most of the constraints that you plan to add will have to
change if in future we decide to allow ranges that potentially cross
multiple basic blocks. How will the rules/constraints on those new
intrinsics change? I just want to make sure that the suggested design is
future-proof.
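
To pin down what the current rules amount to, here is a small sketch of the three restrictions from earlier in the thread (paired start/end, no nesting, both markers in the same basic block) as a linear scan. It is plain C over a hypothetical instruction list, not LLVM's IR verifier API, and every name in it is made up for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { INST_OTHER, INST_REGION_START, INST_REGION_END } inst_kind;

/* Hypothetical stand-in for an instruction: its kind and the index of
   the basic block that contains it. */
typedef struct {
    inst_kind kind;
    int block;
} inst;

/* Returns 1 if every region is a start/end pair inside one basic block
   with no nesting, 0 otherwise. */
int regions_well_formed(const inst *insts, size_t n)
{
    int open = 0;         /* is a region currently open? */
    int open_block = -1;  /* block of the open region's start marker */
    assert(insts != NULL || n == 0);

    for (size_t i = 0; i < n; ++i) {
        switch (insts[i].kind) {
        case INST_REGION_START:
            if (open)
                return 0;  /* restriction 2: regions cannot nest */
            open = 1;
            open_block = insts[i].block;
            break;
        case INST_REGION_END:
            if (!open)
                return 0;  /* restriction 1: end without a start */
            if (insts[i].block != open_block)
                return 0;  /* restriction 3: region spans blocks */
            open = 0;
            break;
        case INST_OTHER:
            break;
        }
    }
    return !open;          /* restriction 1: start without an end */
}
```

Allowing multi-block regions would mean replacing the same-block comparison with dominance and post-dominance checks, which is exactly the kind of rule change I am asking about.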

-Andrea

On Tue, Nov 27, 2018 at 5:08 PM Andrea Di Biagio <andrea.dibiagio at gmail.com>
wrote:
> Thanks for clarifying it Matt.
>
> In general, I quite like your suggested design.
>
> My only concern is about the semantics of the two new intrinsics. Your
> design doesn't allow mca ranges to span multiple basic blocks. That
> constraint is acceptable for now, since llvm-mca doesn't know how to deal
> with control flow.
> However, I am a bit concerned about what might happen in future if we
> decide to let users specify code regions that span multiple basic
> blocks. Basically, I don't particularly like the idea of changing the
> semantics of already existing intrinsics. A design that already accounts for
> that particular scenario/future work would be ideal. That being said,
> marking those new intrinsics as 'experimental' may be a good compromise (at
> least for now).
>
> So, I am quite happy overall with the direction of this RFC.
> However, I am interested to hear from other developers about your
> suggested design.
>
> > This initial patch only targets ELF object files, and does not handle
> > relocatable addresses. Since the start of a code region is represented as
> > an assembly label, and referenced in the .mca_code_regions section, that
> > address is relocatable.
>
> This may be okay for now. However, it would be nice to remove that
> constraint in future and add support for generic object files.
>
> -Andrea
>
> On Thu, Nov 22, 2018 at 7:21 PM <Matthew.Davis at sony.com> wrote:
>
>> I want to clarify a few restrictions of llvm-mca code regions that this
>> RFC proposes:
>>
>> 1) All llvm-mca code regions must start with an
>> llvm.mca.code.region.start intrinsic and end with
>> an llvm.mca.code.region.end intrinsic.  This rule is enforced at the IR
>> level in the IR verifier.
>>
>> 2) llvm-mca code regions cannot nest.  This restriction implies that an
>> llvm.mca.code.region.start
>> must have a llvm.mca.code.region.end intrinsic without any other
llvm.mca
>> start intrinsics
>> between the two. The current implementation in the patch enforces this
>> restriction at the
>> IR level via the IR Verifier.
>>
>> 3) An llvm-mca code region cannot span multiple basic blocks.  llvm-mca
>> does not follow
>> branches (yet).  Instead, a branch instruction is treated by llvm-mca
>> like any other instruction.
>> The current patch associated with this RFC does not enforce this
>> restriction.  I plan on updating
>> the patch to enforce that a code region can only belong to a single
basic
>> block.  This is a simple
>> check, ensuring that both the llvm.mca.code.region.start and
accompanying
>> end intrinsics live
>> in the same basic block. I imagine adding this check at the IR level
when
>> we also verify points 1 and 2
>> above.  That will keep the code-region verification logic isolated to
the
>> IR verifier.  The start/end
>> intrinsics should not have any uses, so I'm not sure that they
would be
>> moved/sunk on behalf
>> of any other instruction.  In other words, I do not imagine that a
start
>> and end would be split
>> apart due to later MI optimizations.  If I discover that such a case
>> occurs, then I might add the
>> basic-block check prior to emitting the code region data to the object
>> file.    Once  llvm-mca  is
>> updated to handle branches, then we can remove this constraint.
>>
>> -Matt
>>
>> > -----Original Message-----
>> > From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On
Behalf Of Matt
>> Davis via llvm-
>> > dev
>> > Sent: Wednesday, November 21, 2018 8:47 AM
>> > To: Andrea Di Biagio <andrea.dibiagio at gmail.com>
>> > Cc: llvm-dev <llvm-dev at lists.llvm.org>; Di Biagio, Andrea
>> > <Andrea.Dibiagio at sony.com>; cfe-dev at lists.llvm.org
>> > Subject: Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to
>> llvm-mca.
>> >
>> > Hi Andrea,
>> >
>> > Thanks for your input.
>> >
>> > On Wed, Nov 21, 2018 at 12:43:52PM +0000, Andrea Di Biagio wrote:
>> > [... snip ...]
>> > > About the suggested design:
>> > > I like the idea of being able to identify code regions using
a numeric
>> > > identifier.
>> > > However, what happens if a code region spans through multiple
basic
>> blocks?
>> >
>> > The current patch does not take into consideration cases where the
>> > region start and end intrinsics are placed in different basic
blocks.
>> > Such would be the case if a region is defined to span multiple
blocks.
>> > This would be similar to the current case where a user places a
>> > #LLVM-MCA-BEGIN assembly comment in one block and an #LLVM-MCA-END
in
>> > another.  However, as you point out below, if the user does this
in the
>> > source code via intrinsics (just what this patch is proposing),
then
>> > there is a chance that optimizations might change the layout of
the
>> > instructions and confuse the ordering of the MCA intrinsics.
>> >
>> > Since MCA does not follow branches (MCA just treats a branch as it
would
>> > a non-branching instruction), it seems that a user should be aware
that
>> > defining MCA code regions that span multiple blocks might result
in an
>> > unexpected analysis.  While we do not discourage this, it seems
like
>> > such a case will probably not produce an expected result for the
user.
>> > We could introduce a warning, or automatically divide the regions
so
>> > that a single region can only contain a single block.
>> >
>> > > My understanding is that code regions are not allowed to
overlap. So,
>> it
>> > > makes sense if ` __mca_code_region_end()` doesn't take an
ID as input.
>> > > However, what if ` __mca_code_region_end()` ends in a
different basic
>> block?
>> > >
>> > > `__mca_code_region_start()` has to always dominate `
>> > > __mca_code_region_end()`. This is trivial to verify when both
calls
>> are in
>> > > a same basic block; however, we need to make sure that the
>> relationship is
>> > > still the same when the `end()` call is in a different basic
block.
>> > > That would not be enough. I think we should also verify  that
`
>> > > __mca_code_region_end()` always post-dominates the call to
>> > > `__mca_code_region_start()`.
>> >
>> > In any case this patch should probably check dominance of the
>> > intrinsics, even though MCA does not follow branches and MCA does not
>> > explicitly forbid a region from containing multiple blocks.
>> >
>> > >
>> > > My question is: what happens with basic block reordering? We
don't
>> know the
>> > > layout of basic blocks until we reach code emission. How does
it work
>> for
>> > > regions that span through multiple basic blocks?. I think
your RFC
>> should
>> > > clarify this aspect.
>> > >
>> > > As a side note: at the moment, llvm-mca doesn't know how
to deal with
>> > > branches. So, for simplicity we could force code regions to
only
>> contain
>> > > instructions from a single basic block.
>> > >
>> > > However, In future we may want to teach llvm-mca how to
analyze
>> branchy
>> > > code too. For example, we could introduce a simple
control-flow
>> analysis in
>> > > llvm-mca, and use an external "branch trace"
information (for
>> example, a
>> > > perf trace generated by an external tool) to decorate
branches with
>> with
>> > > branch probabilities (similarly to what we currently do in
LLVM with
>> PGO).
>> > > We could then use that knowledge to model branch prediction
and
>> simulate
>> > > what happens in the presence of multiple branches.
>> > >
>> > > So, the idea of having regions that potentially span multiple
basic
>> blocks
>> > > is not bad in general. However, I think you should better
clarify
>> what are
>> > > the constraints (at least, you should answer to my questions
from
>> before).
>> >
>> > I agree! Thanks for pointing that out.
>> >
>> > > If we decide to use those new intrinsics, then those should
be
>> experimental
>> > > (at least to start).
>> >
>> > Agreed.
>> >
>> > -Matt
>> >
>> > > On Thu, Nov 15, 2018 at 11:07 PM via llvm-dev <
>> llvm-dev at lists.llvm.org>
>> > > wrote:
>> > >
>> > > > Introduction
>> > > > -----------------
>> > > > Currently llvm-mca only accepts assembly code as input.
We would
>> like to
>> > > > extend llvm-mca to support object files, allowing users
to analyze
>> the
>> > > > performance of binaries. The proposed changes (which
involve both
>> > > > clang and llvm) optionally introduce an object file
section, but
>> this can
>> > > > be
>> > > > stripped-out if desired.
>> > > >
>> > > > For the llvm-mca binary support feature to be useful, a
user needs
>> to tell
>> > > > llvm-mca which portions of their code they would like
analyzed.
>> Currently,
>> > > > this is accomplished via assembly comments. However,
assembly
>> comments are
>> > > > not
>> > > > preserved in object files, and this has encouraged this
RFC. For the
>> > > > proposed
>> > > > binary support, we need to introduce changes to clang
and llvm to
>> allow the
>> > > > user's object code to be recognized by llvm-mca:
>> > > >
>> > > > * We need a way for a user to identify a region/block of
code they
>> want
>> > > >    analyzed by llvm-mca.
>> > > > * We need the information defining the user's region
of code to be
>> > > > maintained
>> > > >    in the object file so that llvm-mca can analyze the
desired
>> region(s)
>> > > > from the
>> > > >    object file.
>> > > >
>> > > > We define a "code region" as a subset of a
user's program that is
>> to be
>> > > > analyzed via llvm-mca. The sequence of instructions to
be analyzed
>> is
>> > > > represented as a pair: <start, end> where the
'start' marks the
>> beginning
>> > > > of
>> > > > the user's source code and 'end' terminates
the sequence. The
>> instructions
>> > > > between 'start' and 'end' form the
region that can be analyzed by
>> llvm-mca
>> > > > at a
>> > > > later time.
>> > > >
>> > > > Example
>> > > > -----------
>> > > > Before we go into the details of this proposed change,
let's first
>> look at
>> > > > a
>> > > > simple example:
>> > > >
>> > > > // example.c -- Analyze a dot-product expression.
>> > > > double test(double x, double y) {
>> > > >    double result = 0.0;
>> > > >    __mca_code_region_start(42);
>> > > >    result += x * y;
>> > > >    __mca_code_region_end();
>> > > >    return result;
>> > > > }
>> > > >
>> > > > In the example above, we have identified a code region,
in this
>> case a
>> > > > single
>> > > > dot-product expression. For the sake of brevity and
simplicity,
>> we've
>> > > > chosen
>> > > > a very simple example, but in reality a more complicated
example
>> could use
>> > > > multiple expressions. We have also denoted this region
as number
>> 42. That
>> > > > identifier is only for the user, and simplifies reading
an llvm-mca
>> > > > analysis
>> > > > report later.
>> > > >
>> > > > When this code is compiled, the region markers (the
mca_code_region
>> > > > markers)
>> > > > are transformed into assembly labels. While the markers
are
>> presented as
>> > > > function calls, in reality they are no-ops.
>> > > >
>> > > > test:
>> > > > pushq   %rbp
>> > > > movq    %rsp, %rbp
>> > > > movsd   %xmm0, -8(%rbp)
>> > > > movsd   %xmm1, -16(%rbp)
>> > > > .Lmca_code_region_start_0: # LLVM-MCA-START ID: 42
>> > > > xorps   %xmm0, %xmm0
>> > > > movsd   %xmm0, -24(%rbp)
>> > > > movsd   -8(%rbp), %xmm0
>> > > > mulsd   -16(%rbp), %xmm0
>> > > > addsd   -24(%rbp), %xmm0
>> > > > movsd   %xmm0, -24(%rbp)
>> > > > .Lmca_code_region_end_0: # LLVM-MCA-END ID: 42
>> > > > movsd   -24(%rbp), %xmm0
>> > > > popq    %rbp
>> > > > retq
>> > > > .section        .mca_code_regions,"",@progbits
>> > > > .quad   42
>> > > > .quad   .Lmca_code_region_start_0
>> > > > .quad   .Lmca_code_region_end_0-.Lmca_code_region_start_0
>> > > >
>> > > > The assembly has been trimmed to show the portions
relevant to this
>> RFC.
>> > > > Notice the labels enclose the user's defined region,
and that they
>> > > > preserve the
>> > > > user's arbitrary region identifier, the
ever-so-important region 42.
>> > > >
>> > > > In the object file section .mca_code_regions, we have
noted the
>> user's
>> > > > region
>> > > > identifier (.quad 42), start address, and region size. A
more
>> complicated
>> > > > example can have multiple regions defined within a
single
>> .mca_code_regions
>> > > > section. This section can be read by llvm-mca, allowing
llvm-mca to
>> take
>> > > > object files as input instead of assembly source.
>> > > >
>> > > > Details
>> > > > ---------
>> > > > We need a way for a user to identify a region/block of
code they
>> want
>> > > > analyzed
>> > > > by llvm-mca. We solve this problem by introducing two
intrinsics
>> that a
>> > > > user can
>> > > > specify, for identifying regions of code for analysis.
>> > > >
>> > > > The two intrinsics are: llvm.mca.code.region.start and
>> > > > llvm.mca.code.region.end. A user can identify a code region by inserting
>> > > > the mca_code_region_start and mca_code_region_end markers. These are simply
>> > > > clang builtins and are transformed into the aforementioned intrinsics
>> > > > during compilation. The code between the intrinsics is what we call a
>> > > > "code region" and is to be easily identifiable by llvm-mca; any code
>> > > > between a start/end pair can be analyzed by llvm-mca at a later time. A
>> > > > user can define multiple non-overlapping code regions within their program.
>> > > >
>> > > > The llvm.mca.code.region.start intrinsic takes an integer constant as its
>> > > > only argument. This argument is implemented as a metadata i32, and is only
>> > > > used when generating llvm-mca reports. This value allows a user to more
>> > > > easily identify a specific code region. llvm.mca.code.region.end takes no
>> > > > arguments. Since we disallow nesting of regions, the first 'end' intrinsic
>> > > > lexically following a 'start' intrinsic represents the end of that code
>> > > > region.
>> > > >
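The pairing rule above, together with the single-basic-block restriction discussed elsewhere in this thread, can be sketched as a linear scan over the markers of one function. This is only an illustration: the marker representation and all names below are hypothetical, and the real checks in the patch live in the IR verifier and operate on LLVM IR, not on an array like this.

```c
#include <stddef.h>

/* Hypothetical flattened view of the markers found in one function,
 * listed in lexical order. */
enum mca_marker_kind { MCA_START, MCA_END };

struct mca_marker {
    enum mca_marker_kind kind;
    unsigned block; /* index of the basic block containing the marker */
};

enum mca_check {
    MCA_OK = 0,
    MCA_NESTED_START,  /* a start appeared before the previous region ended */
    MCA_UNMATCHED_END, /* an end appeared with no open region */
    MCA_CROSS_BLOCK,   /* start and end live in different basic blocks */
    MCA_UNTERMINATED   /* a start was never closed */
};

/* Walk the markers once; the first end after a start closes that region. */
enum mca_check mca_check_markers(const struct mca_marker *m, size_t n) {
    int region_open = 0;
    unsigned open_block = 0;
    for (size_t i = 0; i < n; ++i) {
        if (m[i].kind == MCA_START) {
            if (region_open)
                return MCA_NESTED_START;
            region_open = 1;
            open_block = m[i].block;
        } else {
            if (!region_open)
                return MCA_UNMATCHED_END;
            if (m[i].block != open_block)
                return MCA_CROSS_BLOCK;
            region_open = 0;
        }
    }
    return region_open ? MCA_UNTERMINATED : MCA_OK;
}
```

Because nesting is disallowed, a single flag is enough to track state; lifting that restriction would require a stack of open regions and an identifier on the end marker.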
>> > > > Now that we have a solution for identifying regions for analysis, we
>> > > > need a way to preserve that information so it can be read at a later
>> > > > time. To accomplish this we propose adding a new section
>> > > > (.mca_code_regions) to the object file generated by llvm. During code
>> > > > generation, the start/end intrinsics described above will be transformed
>> > > > into start/end labels in assembly. When llvm generates the object file
>> > > > from the user's code, these start/end labels form a pair of values
>> > > > identifying the start of the user's code region and its size. The size
>> > > > represents the number of bytes between the start and end address of the
>> > > > labels. Note that the labels are emitted during assembly printing. We hope
>> > > > that these labels have no influence on code generation or basic-block
>> > > > placement. However, the target assembler strategy for handling labels is
>> > > > outside of our control.
>> > > >
>> > > > This proposed change affects the size of a binary, but only if the user
>> > > > calls the start/end builtins mentioned above. The additional size of the
>> > > > .mca_code_regions section, which we imagine to be very small (on the
>> > > > order of a few bytes), can trivially be stripped by tools like 'strip' or
>> > > > 'objcopy'.
>> > > >
>> > > > Implementation Status
>> > > > ------------------------------
>> > > > We currently have the proposed changes implemented at the URL posted below.
>> > > > This initial patch only targets ELF object files, and does not handle
>> > > > relocatable addresses. Since the start of a code region is represented as
>> > > > an assembly label, and referenced in the .mca_code_regions section, that
>> > > > address is relocatable. That value can be represented as a
>> > > > section-relative relocatable symbol (.text + addend), but we are not
>> > > > handling that case yet. Instead, the proposed changes only handle
>> > > > linked/executable object files.
>> > > >
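For illustration only, the arithmetic behind a section-relative reference (.text + addend) might look like the sketch below. It assumes the caller has already obtained the section's address and size and the addend by some other means; real relocation records are target- and format-specific, which is exactly the part the patch does not handle. The function name and parameters are hypothetical.

```c
#include <stdint.h>

/*
 * Resolve a section-relative reference (section address + addend) and check
 * that the whole region [start, start + size) stays inside the section.
 * Returns 0 on success and stores the address in *out; returns -1 otherwise.
 */
int mca_resolve_region_start(uint64_t section_addr, uint64_t section_size,
                             uint64_t addend, uint64_t region_size,
                             uint64_t *out) {
    if (addend > section_size)
        return -1; /* start lies outside the section */
    if (region_size > section_size - addend)
        return -1; /* region runs past the end of the section */
    if (section_addr > UINT64_MAX - addend)
        return -1; /* address computation would overflow */
    *out = section_addr + addend;
    return 0;
}
```

In an unlinked object file the section address is typically zero, so the addend alone would serve as an offset into the section contents; that reading is an assumption here, not something the RFC states.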
>> > > > For purposes of review and to communicate the idea, the change is
>> > > > presented as a monolithic patch here:
>> > > >
>> > > > https://reviews.llvm.org/D54603
>> > > >
>> > > > If accepted, the patch will be split into three smaller patches:
>> > > > 1. The introduction of the builtins to clang.
>> > > > 2. The llvm portion (the added intrinsics).
>> > > > 3. The llvm-mca portion.
>> > > >
>> > > > Thanks!
>> > > >
>> > > > -Matt
>> > > > _______________________________________________
>> > > > LLVM Developers mailing list
>> > > > llvm-dev at lists.llvm.org
>> > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> > > >
>>

Matt Davis via llvm-dev

2018-Dec-03 22:38 UTC

[llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

Hi Andrea,

On Mon, Dec 03, 2018 at 01:21:33PM +0000, Andrea Di Biagio wrote:
> So, I have been thinking a bit more about this whole design.
> 
> The more I think about your suggested design, the more I am convinced that
> we should do something more to support ranges in binary object files too.
> My understanding is that the reason we don't support object files in
> general is the presence of relocations. That is because a
> region start marker is effectively symbol relative, and the symbol (a
> function) would be relocated in the final executable.
> You mentioned to me that resolving even a 'simple' symbol-relative
> relocation is not trivial, because it requires specific knowledge about the
> binary format and the target (i.e. how relocations are encoded is target
> specific). I am surprised that there is not a utility library for resolving
> relocations, but I am not familiar with that part of the compiler. I was
> hoping that there was a target specific interface to use in this case...
There might be a better way of resolving the relocs, but from what I saw
looking at llvm-objdump and other related tools, it seems that resolving
the relocated symbol is a target-specific effort.  I also spent some time
sniffing around ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp, which also
performs the reloc resolution.  I should clarify that I too am not an
expert in llvm's utilities for performing symbol/reloc resolution, and
perhaps someone in the community can point me in the right direction.  I
can clearly see the reloc data in the object file via tools like
objdump; however, accessing the relocs via
llvm::object::ObjectFile::relocations() did not produce address values
that we could use (the values were zero).

I was hoping that, for a first pass at this patch, supporting just
executables would be okay.  That keeps this initial patch set simple,
and hopefully will encourage others to take a peek at it, since it's
less daunting than what it might otherwise be.  Of course, there is the
concern that this initial patch will lock us into a design that will be
more complicated to unravel later.
> An alternative approach would require that you define your own
> "symbol-relative" reference. After all, ranges are just sequences of
> instructions in a function. If a function symbol is described by the symbol
> table, then you should be able to obtain its offset in the .text section.
> So, you could potentially encode your own symbol+offset. However, the
> linker would not be able to understand your "custom relocation", and
> information about regions in the final ELF would be basically broken.
> So, that would not be a solution...
> 
> Honestly, I don't know what the best approach is in this case.
> As a compromise, it would not be a bad idea to add the ability to specify
> ranges from the command line. What do you think?
> Still, from a user point of view, the idea that we don't support object
> files in general sounds like a big limitation.
I agree, only supporting executables is a limitation.  However, I'd
like to land the base support now and add in the additional
features/support after this large patch set lands.  But I can see
where landing the whole thing entirely also makes sense.
> About the new experimental intrinsics: those would definitely work well for
> the simple case where instructions are from the same basic block.
> However, some/most of the constraints that you plan to add will have to
> change if in future we decide to allow ranges that potentially cross
> multiple basic blocks. How will the rules/constraints on those new
> intrinsics change? I just want to make sure that the suggested design is
> future-proof.
Since the llvm/clang parts of the code are just responsible for
collecting where a range starts/ends, I hope that we can remove some
of the baked-in constraints that are specified in IR/Verifier.cpp.
As you pointed out earlier in this thread, we might want to
introduce a dominance check if/when we lift the one-basic-block
restriction. 
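
As a rough illustration of what such a dominance check computes, here is a small iterative data-flow sketch over a toy CFG. It is not LLVM's DominatorTree and not code from the patch; the CFG representation (predecessor bitmasks, at most 32 blocks, block 0 as entry) and every name below are assumptions made for the example.

```c
#include <stdint.h>

#define MCA_MAX_BLOCKS 32

/*
 * Hypothetical CFG: preds[b] is a bitmask of the predecessors of block b,
 * and block 0 is the entry block. Requires 1 <= num_blocks <= 32.
 */
struct mca_cfg {
    unsigned num_blocks;
    uint32_t preds[MCA_MAX_BLOCKS];
};

/* Fill dom[b] with the bitmask of blocks that dominate block b. */
void mca_compute_dominators(const struct mca_cfg *g, uint32_t *dom) {
    uint32_t all = g->num_blocks >= 32
                       ? UINT32_MAX
                       : ((UINT32_C(1) << g->num_blocks) - 1);
    dom[0] = UINT32_C(1); /* the entry block is dominated only by itself */
    for (unsigned b = 1; b < g->num_blocks; ++b)
        dom[b] = all;     /* start from "everything" and shrink */

    int changed = 1;
    while (changed) {
        changed = 0;
        for (unsigned b = 1; b < g->num_blocks; ++b) {
            /* Intersect the dominator sets of all predecessors of b. */
            uint32_t meet = all;
            for (unsigned p = 0; p < g->num_blocks; ++p)
                if (g->preds[b] & (UINT32_C(1) << p))
                    meet &= dom[p];
            uint32_t next = meet | (UINT32_C(1) << b);
            if (next != dom[b]) {
                dom[b] = next;
                changed = 1;
            }
        }
    }
}

/* Nonzero if block a dominates block b. */
int mca_dominates(const uint32_t *dom, unsigned a, unsigned b) {
    return (int)((dom[b] >> a) & 1u);
}
```

A verifier built on this idea would require that the block holding the start marker dominates the block holding the end marker. The post-dominance condition raised earlier in the thread could be checked the same way on the reversed CFG, with the exit block playing the role of the entry.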

-Matt
> 
> -Andrea
> 
