thr3ads.net - llvm dev - [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca. [Dec 2018]

If this information is useful, please help other people find it:
Share via:

Simon Pilgrim via llvm-dev

2018-Dec-10 21:40 UTC

[llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

Hi Matt,

I can see a near future where perf-analysis tooling uses branch history 
profiler captures to determine how often loops/branches are taken and 
feeds that into llvm-mca, especially for hot/branchy loop analysis 
reports etc. Are you confident that your approach will be easily 
extendable for this?

Similarly, being able to generally embed the profile markers in object 
libraries for reuse is going to be important for some people - I'd like 
to see more of a plan of how this will be achieved. I understand that it 
might not be easy for some exe formats.

Sorry if I'm being too critical, but I'm a bit worried that we end up 
with an initial implementation that will take a lot of reworking to meet 
our final aims.

Thanks, Simon.

On 10/12/2018 19:32, Matt Davis wrote:> Thanks for the feedback Guillaume and Clement!
>
> In response to Clement:
>
>>> In terms of future-proofness of only allowing regions within a
basic
>>> block, are we confident we can actually ever simulate branches
apart from
>>> "always taken, perfectly predicated" loop ? Even this
simple need requires
>>> knowing quite a few details on the frontend. The current design
could
>>> handle this use case with the addition of an external "loop
mode" option to
>>> MCA. If there are no other strong use cases, I would advocate for
>>> experimental intrinsics unless people can contribute other example
use
>>> cases.
> In short, I am in agreement and think that handling of branching or loop
> constructs should be isolated to the llvm-mca driver/front-end.  The
> only thing the code regions should be concerned with is identifying
> blocks of instructions that will later be used by the front end.
>
> We can place limitations to how those blocks are formed. For example the
> current implementation forces regions to be isolated to a single basic
> block.  However, we anticipate lifting this restriction once branching
> is handled.
>
> -Matt
>
>
> On Mon, Dec 10, 2018 at 04:15:46PM +0100, Guillaume Chatelet wrote:
>> +1 to what Clement said.
>> I believe the intrinsics are a better design to support many
architectures.
>>
>> IACA users are probably decorating their code with IACA_START /
IACA_END
>> macros. One possibility is to provide a header that define these macros
in
>> terms of the new intrinsics.
>>
>> On Mon, Dec 10, 2018 at 3:59 PM Clement Courbet <courbet at
google.com> wrote:
>>
>>> Hi Matt/Andrea,
>>>
>>> I see pros and cons for IACA-style markers vs intrinsics.
>>> On the one hand, IACA-style markers are very magical, and not very
visible
>>> in both the source and object code. Using IACA-style markers has
the
>>> advantage that you can use llvm-mca as a drop-in replacement for
IACA, or
>>> even to compare their outputs on the exact same binary. They also
do not
>>> require tooling on the compiler side and allow comparing the output
of
>>> several compilers.
>>>
>> On the other hand, IACA-style markers do not have a equivalent on other
>>> architectures, and I'm not sure inventing new ones is a good
idea :) I
>>> think the latter makes them pretty much a no-go for llvm-mca as I
don't
>>> think we'll want to teach each target how to parse code
regions. That's
>>> much better handled in a target-agnostic way by the object. Intel
got away
>>> with them because they only had to support one architecture.
>>>
>>> tl;dr: In the case of llvm-mca, I like your design better than the
markers.
>>>
>>> In terms of future-proofness of only allowing regions within a
basic
>>> block, are we confident we can actually ever simulate branches
apart from
>>> "always taken, perfectly predicated" loop ? Even this
simple need requires
>>> knowing quite a few details on the frontend. The current design
could
>>> handle this use case with the addition of an external "loop
mode" option to
>>> MCA. If there are no other strong use cases, I would advocate for
>>> experimental intrinsics unless people can contribute other example
use
>>> cases.
>>>
>>> On Mon, Dec 3, 2018 at 11:38 PM Matt Davis <matthew.davis at
sony.com> wrote:
>>>
>>>> Hi Andrea,
>>>>
>>>> On Mon, Dec 03, 2018 at 01:21:33PM +0000, Andrea Di Biagio
wrote:
>>>>> So, I have been thinking a bit more about this whole
design.
>>>>>
>>>>> The more I think about your suggested design, the more I am
convinced
>>>> that
>>>>> we should do something more to support ranges in binary
object files
>>>> too.
>>>>> My understanding is that the reason why we don't
support object files in
>>>>> general, is because of the presence of relocations. That is
because a
>>>>> region start marker is effectively symbol relative, and the
symbol (a
>>>>> function) would be relocated in the final executable.
>>>>> You mentioned to me that resolving even a 'simple'
symbol-relative
>>>>> relocation is not trivial, beause it requires specific
knowledge about
>>>> the
>>>>> binary format, and the target (i.e. how relocations are
encoded is
>>>> target
>>>>> specific). I am surprised that there is not a utility
library for
>>>> resolving
>>>>> relocations.. but I am not familiar with that part of the
compiler. I
>>>> was
>>>>> hoping that there was a target specific interface to use in
this case...
>>>> There might be a better way of resolving the relocs, but from
what I saw
>>>> looking at llvm-objdump and other related tools, it seems that
resolving
>>>> the relocated symbol is a target specific effort.  I also spent
sometime
>>>> sniffing around ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp
which also
>>>> performs the reloc resolution.  I should clarify that I too am
not an
>>>> expert in llvm's utilities for performing symbol/reloc
resolution, and
>>>> perhaps someone in the community can point me in the right
direction.  I
>>>> can clearly see the reloc data in the object file via tools
like
>>>> objdump; however, accessing the relocs via
>>>> llvm::object::ObjectFile::relocations() did not produce address
values
>>>> that we could use (values of zero).
>>>>
>>>> I was hoping that, for a first pass at this patch, supporting
just
>>>> executables would be okay.  That keeps this initial patch set
simple,
>>>> and hopefully will encourage others to take a peek at it, since
it's
>>>> less daunting than what it might otherwise be.  Of course,
there is the
>>>> concern that this initial patch will lock us into a design that
will be
>>>> more complicated to unravel later.
>>>>
>>>>> An alternative approach would require that you define your
own
>>>>> "symbol-relative" reference. After all, ranges
are just a sequences of
>>>>> instructions in a function. If a function symbol is
described by the
>>>> symbol
>>>>> table, then you should be able to obtain its offset in the
.text
>>>> section.
>>>>> So, you could potentially encode your own symbol+offset.
However, the
>>>>> linker would not be able to understand your "custom
relocation", and
>>>>> information about regions in the final elf would be
basically broken.
>>>>> So,that would not be a solution...
>>>>>
>>>>> I don't know honestly what is the best approach to use
in this case.
>>>>> As a compromise, it would not be a bad idea to add the
ability to
>>>> specify
>>>>> ranges from command line. What do you think?
>>>>> Still, from a user point of view, the idea that we
don't support object
>>>>> files in general sounds like a big limitation.
>>>> I agree, only supporting executables is a limitation.  However,
I'd
>>>> like to land the base support now and add in the additional
>>>> features/support after this large patch set lands.  But I can
see
>>>> where landing the whole thing entirely also makes sense.
>>>>
>>>>> About the new experimental intrinsics: those would
definitely work well
>>>> for
>>>>> the simple case where instructions are from the same basic
block.
>>>>> However, some/most of the constraints that you plan to add
will have to
>>>>> change if in future we decide to allow ranges that
potentially cross
>>>>> multiple basic blocks. How will the rules/constraints on
those new
>>>>> intrinsics change? I just want to make sure that the
suggested design is
>>>>> future-proof.
>>>> Since the llvm/clang parts of the code are just responsible for
>>>> collecting where a range starts/ends, I hope that we can remove
some
>>>> of the baked-in constraints that are specified in
IR/Verifier.cpp.
>>>> As you pointed out earlier in this thread, we might want to
>>>> introduce a dominance check if/when we lift the one-basic-block
>>>> restriction.
>>>>
>>>> -Matt
>>>>
>>>>> -Andrea
>>>>>
>>>>> On Tue, Nov 27, 2018 at 5:08 PM Andrea Di Biagio <
>>>> andrea.dibiagio at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks for clarifying it Matt.
>>>>>>
>>>>>> In general, I quite like your suggested design.
>>>>>>
>>>>>> My only concern is about the semantic of the two new
intrinsics. You
>>>>>> design doesn't allow mca ranges to span through
multiple basic
>>>> blocks. That
>>>>>> constraint is acceptable for now, since llvm-mca
doesn't know how to
>>>> deal
>>>>>> with control flow.
>>>>>> However, I am a bit concerned about what might happen
in future if we
>>>>>> decide to let users specify code regions that span
through multiple
>>>> basic
>>>>>> blocks. Basically, I don't particularly like the
idea of changing the
>>>>>> semantic of already existing intrinsic. A design that
already
>>>> accounts for
>>>>>> that particular scenario/future work would be ideal.
That being said,
>>>>>> marking those new intrinsics as 'experimental'
may be a good
>>>> compromise (at
>>>>>> least for now).
>>>>>>
>>>>>> So, I am quite happy overall with the direction of this
RFC.
>>>>>> However, I am interesting to hear from other developers
about your
>>>>>> suggested design.
>>>>>>
>>>>>>> This initial patch only targets ELF object files,
and does not
>>>> handle
>>>>>> relocatable addresses. Since the start of a code region
is
>>>> represented as
>>>>>> an
>>>>>> assembly label, and referenced in the .mca_code_regions
section, that
>>>>>> address
>>>>>> is relocatable.
>>>>>>
>>>>>> This may be okay for now. However, it would be nice to
remove that
>>>>>> constraint in future and add support to generic object
files.
>>>>>>
>>>>>> -Andrea
>>>>>>
>>>>>> On Thu, Nov 22, 2018 at 7:21 PM <Matthew.Davis at
sony.com> wrote:
>>>>>>
>>>>>>> I want to clarify a few restrictions of llvm-mca
code regions that
>>>> this
>>>>>>> RFC proposes:
>>>>>>>
>>>>>>> 1) All llvm-mca code regions must start with an
>>>>>>> llvm.mca.code.region.start intrinsic and end with
>>>>>>> an llvm.mca.code.region.end intrinsic.  This rule
is enforced at the
>>>> IR
>>>>>>> level in the IR verifier.
>>>>>>>
>>>>>>> 2) llvm-mca code regions cannot nest.  This
restriction implies that
>>>> an
>>>>>>> llvm.mca.code.region.start
>>>>>>> must have a llvm.mca.code.region.end intrinsic
without any other
>>>> llvm.mca
>>>>>>> start intrinsics
>>>>>>> between the two. The current implementation in the
patch enforces
>>>> this
>>>>>>> restriction at the
>>>>>>> IR level via the IR Verifier.
>>>>>>>
>>>>>>> 3) An llvm-mca code region cannot span multiple
basic blocks.
>>>> llvm-mca
>>>>>>> does not follow
>>>>>>> branches (yet).  Instead, a branch instruction is
treated by llvm-mca
>>>>>>> like any other instruction.
>>>>>>> The current patch associated with this RFC does not
enforce this
>>>>>>> restriction.  I plan on updating
>>>>>>> the patch to enforce that a code region can only
belong to a single
>>>> basic
>>>>>>> block.  This is a simple
>>>>>>> check, ensuring that both the
llvm.mca.code.region.start and
>>>> accompanying
>>>>>>> end intrinsics live
>>>>>>> in the same basic block. I imagine adding this
check at the IR level
>>>> when
>>>>>>> we also verify points 1 and 2
>>>>>>> above.  That will keep the code-region verification
logic isolated
>>>> to the
>>>>>>> IR verifier.  The start/end
>>>>>>> intrinsics should not have any uses, so I'm not
sure that they would
>>>> be
>>>>>>> moved/sunk on behalf
>>>>>>> of any other instruction.  In other words, I do not
imagine that a
>>>> start
>>>>>>> and end would be split
>>>>>>> apart due to later MI optimizations.  If I discover
that such a case
>>>>>>> occurs, then I might add the
>>>>>>> basic-block check prior to emitting the code region
data to the
>>>> object
>>>>>>> file.    Once  llvm-mca  is
>>>>>>> updated to handle branches, then we can remove this
constraint.
>>>>>>>
>>>>>>> -Matt
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: llvm-dev <llvm-dev-bounces at
lists.llvm.org> On Behalf Of Matt
>>>>>>> Davis via llvm-
>>>>>>>> dev
>>>>>>>> Sent: Wednesday, November 21, 2018 8:47 AM
>>>>>>>> To: Andrea Di Biagio <andrea.dibiagio at
gmail.com>
>>>>>>>> Cc: llvm-dev <llvm-dev at
lists.llvm.org>; Di Biagio, Andrea
>>>>>>>> <Andrea.Dibiagio at sony.com>; cfe-dev at
lists.llvm.org
>>>>>>>> Subject: Re: [llvm-dev] [RFC][llvm-mca] Adding
binary support to
>>>>>>> llvm-mca.
>>>>>>>> Hi Andrea,
>>>>>>>>
>>>>>>>> Thanks for your input.
>>>>>>>>
>>>>>>>> On Wed, Nov 21, 2018 at 12:43:52PM +0000,
Andrea Di Biagio wrote:
>>>>>>>> [... snip ...]
>>>>>>>>> About the suggested design:
>>>>>>>>> I like the idea of being able to identify
code regions using a
>>>> numeric
>>>>>>>>> identifier.
>>>>>>>>> However, what happens if a code region
spans through multiple
>>>> basic
>>>>>>> blocks?
>>>>>>>> The current patch does not take into
consideration cases where the
>>>>>>>> region start and end intrinsics are placed in
different basic
>>>> blocks.
>>>>>>>> Such would be the case if a region is defined
to span multiple
>>>> blocks.
>>>>>>>> This would be similar to the current case where
a user places a
>>>>>>>> #LLVM-MCA-BEGIN assembly comment in one block
and an #LLVM-MCA-END
>>>> in
>>>>>>>> another.  However, as you point out below, if
the user does this
>>>> in the
>>>>>>>> source code via intrinsics (just what this
patch is proposing),
>>>> then
>>>>>>>> there is a chance that optimizations might
change the layout of the
>>>>>>>> instructions and confuse the ordering of the
MCA intrinsics.
>>>>>>>>
>>>>>>>> Since MCA does not follow branches (MCA just
treats a branch as it
>>>> would
>>>>>>>> a non-branching instruction), it seems that a
user should be aware
>>>> that
>>>>>>>> defining MCA code regions that span multiple
blocks might result
>>>> in an
>>>>>>>> unexpected analysis.  While we do not
discourage this, it seems
>>>> like
>>>>>>>> such a case will probably not produce an
expected result for the
>>>> user.
>>>>>>>> We could introduce a warning, or automatically
divide the regions
>>>> so
>>>>>>>> that a single region can only contain a single
block.
>>>>>>>>
>>>>>>>>> My understanding is that code regions are
not allowed to
>>>> overlap. So,
>>>>>>> it
>>>>>>>>> makes sense if ` __mca_code_region_end()`
doesn't take an ID as
>>>> input.
>>>>>>>>> However, what if ` __mca_code_region_end()`
ends in a different
>>>> basic
>>>>>>> block?
>>>>>>>>> `__mca_code_region_start()` has to always
dominate `
>>>>>>>>> __mca_code_region_end()`. This is trivial
to verify when both
>>>> calls
>>>>>>> are in
>>>>>>>>> a same basic block; however, we need to
make sure that the
>>>>>>> relationship is
>>>>>>>>> still the same when the `end()` call is in
a different basic
>>>> block.
>>>>>>>>> That would not be enough. I think we should
also verify  that `
>>>>>>>>> __mca_code_region_end()` always
post-dominates the call to
>>>>>>>>> `__mca_code_region_start()`.
>>>>>>>> In any case this patch should probably check
dominance of the
>>>>>>>> intrinsics, even though MCA does not follow
branches and MCA does
>>>> not
>>>>>>>> not explicitly forbid a region from containing
multiple blocks.
>>>>>>>>
>>>>>>>>> My question is: what happens with basic
block reordering? We
>>>> don't
>>>>>>> know the
>>>>>>>>> layout of basic blocks until we reach code
emission. How does it
>>>> work
>>>>>>> for
>>>>>>>>> regions that span through multiple basic
blocks?. I think your
>>>> RFC
>>>>>>> should
>>>>>>>>> clarify this aspect.
>>>>>>>>>
>>>>>>>>> As a side note: at the moment, llvm-mca
doesn't know how to deal
>>>> with
>>>>>>>>> branches. So, for simplicity we could force
code regions to only
>>>>>>> contain
>>>>>>>>> instructions from a single basic block.
>>>>>>>>>
>>>>>>>>> However, In future we may want to teach
llvm-mca how to analyze
>>>>>>> branchy
>>>>>>>>> code too. For example, we could introduce a
simple control-flow
>>>>>>> analysis in
>>>>>>>>> llvm-mca, and use an external "branch
trace" information (for
>>>>>>> example, a
>>>>>>>>> perf trace generated by an external tool)
to decorate branches
>>>> with
>>>>>>> with
>>>>>>>>> branch probabilities (similarly to what we
currently do in LLVM
>>>> with
>>>>>>> PGO).
>>>>>>>>> We could then use that knowledge to model
branch prediction and
>>>>>>> simulate
>>>>>>>>> what happens in the presence of multiple
branches.
>>>>>>>>>
>>>>>>>>> So, the idea of having regions that
potentially span multiple
>>>> basic
>>>>>>> blocks
>>>>>>>>> is not bad in general. However, I think you
should better clarify
>>>>>>> what are
>>>>>>>>> the constraints (at least, you should
answer to my questions from
>>>>>>> before).
>>>>>>>> I agree! Thanks for pointing that out.
>>>>>>>>
>>>>>>>>> If we decide to use those new intrinsics,
then those should be
>>>>>>> experimental
>>>>>>>>> (at least to start).
>>>>>>>> Agreed.
>>>>>>>>
>>>>>>>> -Matt
>>>>>>>>
>>>>>>>>> On Thu, Nov 15, 2018 at 11:07 PM via
llvm-dev <
>>>>>>> llvm-dev at lists.llvm.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Introduction
>>>>>>>>>> -----------------
>>>>>>>>>> Currently llvm-mca only accepts
assembly code as input. We
>>>> would
>>>>>>> like to
>>>>>>>>>> extend llvm-mca to support object
files, allowing users to
>>>> analyze
>>>>>>> the
>>>>>>>>>> performance of binaries. The proposed
changes (which involve
>>>> both
>>>>>>>>>> clang and llvm) optionally introduce an
object file section,
>>>> but
>>>>>>> this can
>>>>>>>>>> be
>>>>>>>>>> stripped-out if desired.
>>>>>>>>>>
>>>>>>>>>> For the llvm-mca binary support feature
to be useful, a user
>>>> needs
>>>>>>> to tell
>>>>>>>>>> llvm-mca which portions of their code
they would like analyzed.
>>>>>>> Currently,
>>>>>>>>>> this is accomplished via assembly
comments. However, assembly
>>>>>>> comments are
>>>>>>>>>> not
>>>>>>>>>> preserved in object files, and this has
encouraged this RFC.
>>>> For the
>>>>>>>>>> proposed
>>>>>>>>>> binary support, we need to introduce
changes to clang and llvm
>>>> to
>>>>>>> allow the
>>>>>>>>>> user's object code to be recognized
by llvm-mca:
>>>>>>>>>>
>>>>>>>>>> * We need a way for a user to identify
a region/block of code
>>>> they
>>>>>>> want
>>>>>>>>>>     analyzed by llvm-mca.
>>>>>>>>>> * We need the information defining the
user's region of code
>>>> to be
>>>>>>>>>> maintained
>>>>>>>>>>     in the object file so that llvm-mca
can analyze the desired
>>>>>>> region(s)
>>>>>>>>>> from the
>>>>>>>>>>     object file.
>>>>>>>>>>
>>>>>>>>>> We define a "code region" as
a subset of a user's program that
>>>> is
>>>>>>> to be
>>>>>>>>>> analyzed via llvm-mca. The sequence of
instructions to be
>>>> analyzed
>>>>>>> is
>>>>>>>>>> represented as a pair: <start,
end> where the 'start' marks the
>>>>>>> beginning
>>>>>>>>>> of
>>>>>>>>>> the user's source code and
'end' terminates the sequence. The
>>>>>>> instructions
>>>>>>>>>> between 'start' and
'end' form the region that can be analyzed
>>>> by
>>>>>>> llvm-mca
>>>>>>>>>> at a
>>>>>>>>>> later time.
>>>>>>>>>>
>>>>>>>>>> Example
>>>>>>>>>> -----------
>>>>>>>>>> Before we go into the details of this
proposed change, let's
>>>> first
>>>>>>> look at
>>>>>>>>>> a
>>>>>>>>>> simple example:
>>>>>>>>>>
>>>>>>>>>> // example.c -- Analyze a dot-product
expression.
>>>>>>>>>> double test(double x, double y) {
>>>>>>>>>>     double result = 0.0;
>>>>>>>>>>     __mca_code_region_start(42);
>>>>>>>>>>     result += x * y;
>>>>>>>>>>     __mca_code_region_end();
>>>>>>>>>>     return result;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> In the example above, we have
identified a code region, in this
>>>>>>> case a
>>>>>>>>>> single
>>>>>>>>>> dot-product expression. For the sake of
brevity and simplicity,
>>>>>>> we've
>>>>>>>>>> chosen
>>>>>>>>>> a very simple example, but in reality a
more complicated
>>>> example
>>>>>>> could use
>>>>>>>>>> multiple expressions. We have also
denoted this region as
>>>> number
>>>>>>> 42. That
>>>>>>>>>> identifier is only for the user, and
simplifies reading an
>>>> llvm-mca
>>>>>>>>>> analysis
>>>>>>>>>> report later.
>>>>>>>>>>
>>>>>>>>>> When this code is compiled, the region
markers (the
>>>> mca_code_region
>>>>>>>>>> markers)
>>>>>>>>>> are transformed into assembly labels.
While the markers are
>>>>>>> presented as
>>>>>>>>>> function calls, in reality they are
no-ops.
>>>>>>>>>>
>>>>>>>>>> test:
>>>>>>>>>> pushq   %rbp
>>>>>>>>>> movq    %rsp, %rbp
>>>>>>>>>> movsd   %xmm0, -8(%rbp)
>>>>>>>>>> movsd   %xmm1, -16(%rbp)
>>>>>>>>>> .Lmca_code_region_start_0: #
LLVM-MCA-START ID: 42
>>>>>>>>>> xorps   %xmm0, %xmm0
>>>>>>>>>> movsd   %xmm0, -24(%rbp)
>>>>>>>>>> movsd   -8(%rbp), %xmm0
>>>>>>>>>> mulsd   -16(%rbp), %xmm0
>>>>>>>>>> addsd   -24(%rbp), %xmm0
>>>>>>>>>> movsd   %xmm0, -24(%rbp)
>>>>>>>>>> .Lmca_code_region_end_0: # LLVM-MCA-END
ID: 42
>>>>>>>>>> movsd   -24(%rbp), %xmm0
>>>>>>>>>> popq    %rbp
>>>>>>>>>> retq
>>>>>>>>>> .section       
.mca_code_regions,"", at progbits
>>>>>>>>>> .quad   42
>>>>>>>>>> .quad   .Lmca_code_region_start_0
>>>>>>>>>> .quad  
.Lmca_code_region_end_0-.Lmca_code_region_start_0
>>>>>>>>>>
>>>>>>>>>> The assembly has been trimmed to show
the portions relevant to
>>>> this
>>>>>>> RFC.
>>>>>>>>>> Notice the labels enclose the
user's defined region, and that
>>>> they
>>>>>>>>>> preserve the
>>>>>>>>>> user's arbitrary region identifier,
the ever-so-important
>>>> region 42.
>>>>>>>>>> In the object file section
.mca_code_regions, we have noted the
>>>>>>> user's
>>>>>>>>>> region
>>>>>>>>>> identifier (.quad 42), start address,
and region size. A more
>>>>>>> complicated
>>>>>>>>>> example can have multiple regions
defined within a single
>>>>>>> .mca_code_regions
>>>>>>>>>> section. This section can be read by
llvm-mca, allowing
>>>> llvm-mca to
>>>>>>> take
>>>>>>>>>> object files as input instead of
assembly source.
>>>>>>>>>>
>>>>>>>>>> Details
>>>>>>>>>> ---------
>>>>>>>>>> We need a way for a user to identify a
region/block of code
>>>> they
>>>>>>> want
>>>>>>>>>> analyzed
>>>>>>>>>> by llvm-mca. We solve this problem by
introducing two
>>>> intrinsics
>>>>>>> that a
>>>>>>>>>> user can
>>>>>>>>>> specify, for identifying regions of
code for analysis.
>>>>>>>>>>
>>>>>>>>>> The two intrinsics are:
llvm.mca.code.regions.start and
>>>>>>>>>> llvm.mca.code.regions.end. A user can
identify a code region by
>>>>>>> inserting
>>>>>>>>>> the
>>>>>>>>>> mca_code_region_start and
mca_code_region_end markers. These
>>>> are
>>>>>>> simply
>>>>>>>>>> clang builtins and are transformed into
the aforementioned
>>>>>>> intrinsics
>>>>>>>>>> during
>>>>>>>>>> compilation. The code between the
intrinsics are what we call
>>>> "code
>>>>>>>>>> regions"
>>>>>>>>>> and are to be easily identifiable by
llvm-mca; any code
>>>> between a
>>>>>>> start/end
>>>>>>>>>> pair can be analyzed by llvm-mca at a
later time. A user can
>>>> define
>>>>>>>>>> multiple
>>>>>>>>>> non-overlapping code regions within
their program.
>>>>>>>>>>
>>>>>>>>>> The llvm.mca.code.region.start
intrinsic takes an integer
>>>> constant
>>>>>>> as its
>>>>>>>>>> only
>>>>>>>>>> argument. This argument is implemented
as a metadata i32, and
>>>> is
>>>>>>> only used
>>>>>>>>>> when generating llvm-mca reports. This
value allows a user to
>>>> more
>>>>>>> easily
>>>>>>>>>> identify a specific code region.
llvm.mca.code.region.end
>>>> takes no
>>>>>>>>>> arguments.
>>>>>>>>>> Since we disallow nesting of regions,
the first 'end' intrinsic
>>>>>>> lexically
>>>>>>>>>> following a 'start' intrinsic
represents the end of that code
>>>>>>> region.
>>>>>>>>>> Now that we have a solution for
identifying regions for
>>>> analysis,
>>>>>>> we now
>>>>>>>>>> need a
>>>>>>>>>> way for preserving that information to
be read at a later
>>>> time. To
>>>>>>>>>> accomplish
>>>>>>>>>> this we propose adding a new section
(.mca_code_regions) to the
>>>>>>> object file
>>>>>>>>>> generated by llvm. During code
generation, the start/end
>>>> intrinsics
>>>>>>>>>> described
>>>>>>>>>> above will be transformed into
start/end labels in assembly.
>>>> When
>>>>>>> llvm
>>>>>>>>>> generates the object file from the
user's code, these start/end
>>>>>>> labels
>>>>>>>>>> form a
>>>>>>>>>> pair of values identifying the start of
the user's code
>>>> region, and
>>>>>>> size.
>>>>>>>>>> The
>>>>>>>>>> size represents the number of bytes
between the start and end
>>>>>>> address of
>>>>>>>>>> the
>>>>>>>>>> labels. Note that the labels are
emitted during assembly
>>>> printing.
>>>>>>> We hope
>>>>>>>>>> that these labels have no influence on
code generation or
>>>>>>> basic-block
>>>>>>>>>> placement. However, the target
assembler strategy for handling
>>>>>>> labels is
>>>>>>>>>> outside of our control.
>>>>>>>>>>
>>>>>>>>>> This proposed change affects the size
of a binary, but only if
>>>> the
>>>>>>> user
>>>>>>>>>> calls
>>>>>>>>>> the start/end builtins mentioned above.
The additional size of
>>>> the
>>>>>>>>>> .mca_code_regions section, which we
imagine to be very small
>>>> (to
>>>>>>> the order
>>>>>>>>>> of a
>>>>>>>>>> few bytes), can trivially be stripped
by tools like 'strip' or
>>>>>>> 'objcopy'.
>>>>>>>>>> Implementation Status
>>>>>>>>>> ------------------------------
>>>>>>>>>> We currently have the proposed changes
implemented at the url
>>>>>>> posted below.
>>>>>>>>>> This initial patch only targets ELF
object files, and does not
>>>>>>> handle
>>>>>>>>>> relocatable addresses. Since the start
of a code region is
>>>>>>> represented as
>>>>>>>>>> an
>>>>>>>>>> assembly label, and referenced in the
.mca_code_regions
>>>> section,
>>>>>>> that
>>>>>>>>>> address
>>>>>>>>>> is relocatable. That value can be
represented as
>>>> section-relative
>>>>>>>>>> relocatable
>>>>>>>>>> symbol (.text + addend), but we are not
handling that case yet.
>>>>>>> Instead,
>>>>>>>>>> the
>>>>>>>>>> proposed changes only handle
linked/executable object files.
>>>>>>>>>>
>>>>>>>>>> For purposes of review and to
communicate the idea, the change
>>>> is
>>>>>>>>>> presented as a monolithic patch here:
>>>>>>>>>>
>>>>>>>>>> https://reviews.llvm.org/D54603
>>>>>>>>>>
>>>>>>>>>> The change is presented as a monolithic
patch; however, if
>>>> accepted
>>>>>>>>>> the patch will be split into three
smaller patches:
>>>>>>>>>> 1. The introduction of the builtins to
clang.
>>>>>>>>>> 2. The llvm portion (the added
intrinsics).
>>>>>>>>>> 3. The llvm-mca portion.
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>> -Matt
>>>>>>>>>>
_______________________________________________
>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>>>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> LLVM Developers mailing list
>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

via llvm-dev

2018-Dec-11 19:22 UTC

head link

[llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

Thanks for the response Simon.  My reply is inline:
> From: Simon Pilgrim <llvm-dev at redking.me.uk>
> Sent: Monday, December 10, 2018 1:40 PM
>
> Hi Matt,
> 
> I can see a near future where perf-analysis tooling uses branch history
> profiler captures to determine how often loops/branches are taken and
> feeds that into llvm-mca, especially for hot/branchy loop analysis
> reports etc. Are you confident that your approach will be easily
> extendable for this?
That is a very interesting use case.  The restriction of a code-region to a
single block
is a limitation for any tools that want to analyze branches.  However, I believe
that it will be easy to lift this restriction (it's just a check in
IR/Verifier).  This limitation is not
expressed in the llvm-mca driver.

If the information is coming from a profile report, then we'd most
likely need to extend the llvm-mca driver to accept profile reports.  Currently,
code regions, from the perspective of the  llvm-mca driver, are very simple.
They are
just a collection of MCInst.  The binary support in this RFC+patch 
disassembles just the address range from marker start address for
some specified number of bytes.  It might be useful to add another driver
argument
so that a user (or tool) can specify, from the command line, a range of
instructions to
analyze.  I recently added a class for handling inputs to
llvm::mca::CodeRegionGenerator,
which is just responsible for taking some input and creating a list of MCInst
that llvm-mca uses.
We could subclass this to handle profile reports.
 > Similarly, being able to generally embed the profile markers in object
> libraries for reuse is going to be important for some people - I'd like
> to see more of a plan of how this will be achieved. I understand that it
> might not be easy for some exe formats.
That is definitely a limitation.  This initial patch+RFC only handles linked
executables (i.e., the llvm-mca marker symbol addresses are resolved).
I'm working on a better solution so that this will not be a restriction. 
In fact, I'll probably delay trying to land any patches until I solve
relocations
(or use a different solution for identifying start/end addresses for llvm-mca
code regions).
> Sorry if I'm being too critical, but I'm a bit worried that we end
up
> with an initial implementation that will take a lot of reworking to meet
> our final aims.
> 
> Thanks, Simon.
I understand your criticisms and value your input. Thanks a ton!

-Matt

 > On 10/12/2018 19:32, Matt Davis wrote:
> > Thanks for the feedback Guillaume and Clement!
> >
> > In response to Clement:
> >
> >>> In terms of future-proofness of only allowing regions within a
basic
> >>> block, are we confident we can actually ever simulate branches
apart from
> >>> "always taken, perfectly predicated" loop ? Even
this simple need requires
> >>> knowing quite a few details on the frontend. The current
design could
> >>> handle this use case with the addition of an external
"loop mode" option to
> >>> MCA. If there are no other strong use cases, I would advocate
for
> >>> experimental intrinsics unless people can contribute other
example use
> >>> cases.
> > In short, I am in agreement and think that handling of branching or
loop
> > constructs should be isolated to the llvm-mca driver/front-end.  The
> > only thing the code regions should be concerned with is identifying
> > blocks of instructions that will later be used by the front end.
> >
> > We can place limitations to how those blocks are formed. For example
the
> > current implementation forces regions to be isolated to a single basic
> > block.  However, we anticipate lifting this restriction once branching
> > is handled.
> >
> > -Matt
> >
> >
> > On Mon, Dec 10, 2018 at 04:15:46PM +0100, Guillaume Chatelet wrote:
> >> +1 to what Clement said.
> >> I believe the intrinsics are a better design to support many
architectures.
> >>
> >> IACA users are probably decorating their code with IACA_START /
IACA_END
> >> macros. One possibility is to provide a header that define these
macros in
> >> terms of the new intrinsics.
> >>
> >> On Mon, Dec 10, 2018 at 3:59 PM Clement Courbet <courbet at
google.com>
> wrote:
> >>
> >>> Hi Matt/Andrea,
> >>>
> >>> I see pros and cons for IACA-style markers vs intrinsics.
> >>> On the one hand, IACA-style markers are very magical, and not
very visible
> >>> in both the source and object code. Using IACA-style markers
has the
> >>> advantage that you can use llvm-mca as a drop-in replacement
for IACA, or
> >>> even to compare their outputs on the exact same binary. They
also do not
> >>> require tooling on the compiler side and allow comparing the
output of
> >>> several compilers.
> >>>
> >> On the other hand, IACA-style markers do not have a equivalent on
other
> >>> architectures, and I'm not sure inventing new ones is a
good idea :) I
> >>> think the latter makes them pretty much a no-go for llvm-mca
as I don't
> >>> think we'll want to teach each target how to parse code
regions. That's
> >>> much better handled in a target-agnostic way by the object.
Intel got away
> >>> with them because they only had to support one architecture.
> >>>
> >>> tl;dr: In the case of llvm-mca, I like your design better than
the markers.
> >>>
> >>> In terms of future-proofness of only allowing regions within a
basic
> >>> block, are we confident we can actually ever simulate branches
apart from
> >>> "always taken, perfectly predicated" loop ? Even
this simple need requires
> >>> knowing quite a few details on the frontend. The current
design could
> >>> handle this use case with the addition of an external
"loop mode" option to
> >>> MCA. If there are no other strong use cases, I would advocate
for
> >>> experimental intrinsics unless people can contribute other
example use
> >>> cases.
> >>>
> >>> On Mon, Dec 3, 2018 at 11:38 PM Matt Davis <matthew.davis
at sony.com>
> wrote:
> >>>
> >>>> Hi Andrea,
> >>>>
> >>>> On Mon, Dec 03, 2018 at 01:21:33PM +0000, Andrea Di Biagio
wrote:
> >>>>> So, I have been thinking a bit more about this whole
design.
> >>>>>
> >>>>> The more I think about your suggested design, the more
I am convinced
> >>>> that
> >>>>> we should do something more to support ranges in
binary object files
> >>>> too.
> >>>>> My understanding is that the reason why we don't
support object files in
> >>>>> general, is because of the presence of relocations.
That is because a
> >>>>> region start marker is effectively symbol relative,
and the symbol (a
> >>>>> function) would be relocated in the final executable.
> >>>>> You mentioned to me that resolving even a
'simple' symbol-relative
> >>>>> relocation is not trivial, beause it requires specific
knowledge about
> >>>> the
> >>>>> binary format, and the target (i.e. how relocations
are encoded is
> >>>> target
> >>>>> specific). I am surprised that there is not a utility
library for
> >>>> resolving
> >>>>> relocations.. but I am not familiar with that part of
the compiler. I
> >>>> was
> >>>>> hoping that there was a target specific interface to
use in this case...
> >>>> There might be a better way of resolving the relocs, but
from what I saw
> >>>> looking at llvm-objdump and other related tools, it seems
that resolving
> >>>> the relocated symbol is a target specific effort.  I also
spent sometime
> >>>> sniffing around
ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp which also
> >>>> performs the reloc resolution.  I should clarify that I
too am not an
> >>>> expert in llvm's utilities for performing symbol/reloc
resolution, and
> >>>> perhaps someone in the community can point me in the right
direction.  I
> >>>> can clearly see the reloc data in the object file via
tools like
> >>>> objdump; however, accessing the relocs via
> >>>> llvm::object::ObjectFile::relocations() did not produce
address values
> >>>> that we could use (values of zero).
> >>>>
> >>>> I was hoping that, for a first pass at this patch,
supporting just
> >>>> executables would be okay.  That keeps this initial patch
set simple,
> >>>> and hopefully will encourage others to take a peek at it,
since it's
> >>>> less daunting than what it might otherwise be.  Of course,
there is the
> >>>> concern that this initial patch will lock us into a design
that will be
> >>>> more complicated to unravel later.
> >>>>
> >>>>> An alternative approach would require that you define
your own
> >>>>> "symbol-relative" reference. After all,
ranges are just a sequences of
> >>>>> instructions in a function. If a function symbol is
described by the
> >>>> symbol
> >>>>> table, then you should be able to obtain its offset in
the .text
> >>>> section.
> >>>>> So, you could potentially encode your own
symbol+offset. However, the
> >>>>> linker would not be able to understand your
"custom relocation", and
> >>>>> information about regions in the final elf would be
basically broken.
> >>>>> So,that would not be a solution...
> >>>>>
> >>>>> I don't know honestly what is the best approach to
use in this case.
> >>>>> As a compromise, it would not be a bad idea to add the
ability to
> >>>> specify
> >>>>> ranges from command line. What do you think?
> >>>>> Still, from a user point of view, the idea that we
don't support object
> >>>>> files in general sounds like a big limitation.
> >>>> I agree, only supporting executables is a limitation. 
However, I'd
> >>>> like to land the base support now and add in the
additional
> >>>> features/support after this large patch set lands.  But I
can see
> >>>> where landing the whole thing entirely also makes sense.
> >>>>
> >>>>> About the new experimental intrinsics: those would
definitely work well
> >>>> for
> >>>>> the simple case where instructions are from the same
basic block.
> >>>>> However, some/most of the constraints that you plan to
add will have to
> >>>>> change if in future we decide to allow ranges that
potentially cross
> >>>>> multiple basic blocks. How will the rules/constraints
on those new
> >>>>> intrinsics change? I just want to make sure that the
suggested design is
> >>>>> future-proof.
> >>>> Since the llvm/clang parts of the code are just
responsible for
> >>>> collecting where a range starts/ends, I hope that we can
remove some
> >>>> of the baked-in constraints that are specified in
IR/Verifier.cpp.
> >>>> As you pointed out earlier in this thread, we might want
to
> >>>> introduce a dominance check if/when we lift the
one-basic-block
> >>>> restriction.
> >>>>
> >>>> -Matt
> >>>>
> >>>>> -Andrea
> >>>>>
> >>>>> On Tue, Nov 27, 2018 at 5:08 PM Andrea Di Biagio <
> >>>> andrea.dibiagio at gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Thanks for clarifying it Matt.
> >>>>>>
> >>>>>> In general, I quite like your suggested design.
> >>>>>>
> >>>>>> My only concern is about the semantic of the two
new intrinsics. You
> >>>>>> design doesn't allow mca ranges to span
through multiple basic
> >>>> blocks. That
> >>>>>> constraint is acceptable for now, since llvm-mca
doesn't know how to
> >>>> deal
> >>>>>> with control flow.
> >>>>>> However, I am a bit concerned about what might
happen in future if we
> >>>>>> decide to let users specify code regions that span
through multiple
> >>>> basic
> >>>>>> blocks. Basically, I don't particularly like
the idea of changing the
> >>>>>> semantic of already existing intrinsic. A design
that already
> >>>> accounts for
> >>>>>> that particular scenario/future work would be
ideal. That being said,
> >>>>>> marking those new intrinsics as
'experimental' may be a good
> >>>> compromise (at
> >>>>>> least for now).
> >>>>>>
> >>>>>> So, I am quite happy overall with the direction of
this RFC.
> >>>>>> However, I am interesting to hear from other
developers about your
> >>>>>> suggested design.
> >>>>>>
> >>>>>>> This initial patch only targets ELF object
files, and does not
> >>>> handle
> >>>>>> relocatable addresses. Since the start of a code
region is
> >>>> represented as
> >>>>>> an
> >>>>>> assembly label, and referenced in the
.mca_code_regions section, that
> >>>>>> address
> >>>>>> is relocatable.
> >>>>>>
> >>>>>> This may be okay for now. However, it would be
nice to remove that
> >>>>>> constraint in future and add support to generic
object files.
> >>>>>>
> >>>>>> -Andrea
> >>>>>>
> >>>>>> On Thu, Nov 22, 2018 at 7:21 PM <Matthew.Davis
at sony.com> wrote:
> >>>>>>
> >>>>>>> I want to clarify a few restrictions of
llvm-mca code regions that
> >>>> this
> >>>>>>> RFC proposes:
> >>>>>>>
> >>>>>>> 1) All llvm-mca code regions must start with
an
> >>>>>>> llvm.mca.code.region.start intrinsic and end
with
> >>>>>>> an llvm.mca.code.region.end intrinsic.  This
rule is enforced at the
> >>>> IR
> >>>>>>> level in the IR verifier.
> >>>>>>>
> >>>>>>> 2) llvm-mca code regions cannot nest.  This
restriction implies that
> >>>> an
> >>>>>>> llvm.mca.code.region.start
> >>>>>>> must have a llvm.mca.code.region.end intrinsic
without any other
> >>>> llvm.mca
> >>>>>>> start intrinsics
> >>>>>>> between the two. The current implementation in
the patch enforces
> >>>> this
> >>>>>>> restriction at the
> >>>>>>> IR level via the IR Verifier.
> >>>>>>>
> >>>>>>> 3) An llvm-mca code region cannot span
multiple basic blocks.
> >>>> llvm-mca
> >>>>>>> does not follow
> >>>>>>> branches (yet).  Instead, a branch instruction
is treated by llvm-mca
> >>>>>>> like any other instruction.
> >>>>>>> The current patch associated with this RFC
does not enforce this
> >>>>>>> restriction.  I plan on updating
> >>>>>>> the patch to enforce that a code region can
only belong to a single
> >>>> basic
> >>>>>>> block.  This is a simple
> >>>>>>> check, ensuring that both the
llvm.mca.code.region.start and
> >>>> accompanying
> >>>>>>> end intrinsics live
> >>>>>>> in the same basic block. I imagine adding this
check at the IR level
> >>>> when
> >>>>>>> we also verify points 1 and 2
> >>>>>>> above.  That will keep the code-region
verification logic isolated
> >>>> to the
> >>>>>>> IR verifier.  The start/end
> >>>>>>> intrinsics should not have any uses, so
I'm not sure that they would
> >>>> be
> >>>>>>> moved/sunk on behalf
> >>>>>>> of any other instruction.  In other words, I
do not imagine that a
> >>>> start
> >>>>>>> and end would be split
> >>>>>>> apart due to later MI optimizations.  If I
discover that such a case
> >>>>>>> occurs, then I might add the
> >>>>>>> basic-block check prior to emitting the code
region data to the
> >>>> object
> >>>>>>> file.    Once  llvm-mca  is
> >>>>>>> updated to handle branches, then we can remove
this constraint.
> >>>>>>>
> >>>>>>> -Matt
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: llvm-dev <llvm-dev-bounces at
lists.llvm.org> On Behalf Of Matt
> >>>>>>> Davis via llvm-
> >>>>>>>> dev
> >>>>>>>> Sent: Wednesday, November 21, 2018 8:47 AM
> >>>>>>>> To: Andrea Di Biagio <andrea.dibiagio
at gmail.com>
> >>>>>>>> Cc: llvm-dev <llvm-dev at
lists.llvm.org>; Di Biagio, Andrea
> >>>>>>>> <Andrea.Dibiagio at sony.com>;
cfe-dev at lists.llvm.org
> >>>>>>>> Subject: Re: [llvm-dev] [RFC][llvm-mca]
Adding binary support to
> >>>>>>> llvm-mca.
> >>>>>>>> Hi Andrea,
> >>>>>>>>
> >>>>>>>> Thanks for your input.
> >>>>>>>>
> >>>>>>>> On Wed, Nov 21, 2018 at 12:43:52PM +0000,
Andrea Di Biagio wrote:
> >>>>>>>> [... snip ...]
> >>>>>>>>> About the suggested design:
> >>>>>>>>> I like the idea of being able to
identify code regions using a
> >>>> numeric
> >>>>>>>>> identifier.
> >>>>>>>>> However, what happens if a code region
spans through multiple
> >>>> basic
> >>>>>>> blocks?
> >>>>>>>> The current patch does not take into
consideration cases where the
> >>>>>>>> region start and end intrinsics are placed
in different basic
> >>>> blocks.
> >>>>>>>> Such would be the case if a region is
defined to span multiple
> >>>> blocks.
> >>>>>>>> This would be similar to the current case
where a user places a
> >>>>>>>> #LLVM-MCA-BEGIN assembly comment in one
block and an #LLVM-MCA-
> END
> >>>> in
> >>>>>>>> another.  However, as you point out below,
if the user does this
> >>>> in the
> >>>>>>>> source code via intrinsics (just what this
patch is proposing),
> >>>> then
> >>>>>>>> there is a chance that optimizations might
change the layout of the
> >>>>>>>> instructions and confuse the ordering of
the MCA intrinsics.
> >>>>>>>>
> >>>>>>>> Since MCA does not follow branches (MCA
just treats a branch as it
> >>>> would
> >>>>>>>> a non-branching instruction), it seems
that a user should be aware
> >>>> that
> >>>>>>>> defining MCA code regions that span
multiple blocks might result
> >>>> in an
> >>>>>>>> unexpected analysis.  While we do not
discourage this, it seems
> >>>> like
> >>>>>>>> such a case will probably not produce an
expected result for the
> >>>> user.
> >>>>>>>> We could introduce a warning, or
automatically divide the regions
> >>>> so
> >>>>>>>> that a single region can only contain a
single block.
> >>>>>>>>
> >>>>>>>>> My understanding is that code regions
are not allowed to
> >>>> overlap. So,
> >>>>>>> it
> >>>>>>>>> makes sense if `
__mca_code_region_end()` doesn't take an ID as
> >>>> input.
> >>>>>>>>> However, what if `
__mca_code_region_end()` ends in a different
> >>>> basic
> >>>>>>> block?
> >>>>>>>>> `__mca_code_region_start()` has to
always dominate `
> >>>>>>>>> __mca_code_region_end()`. This is
trivial to verify when both
> >>>> calls
> >>>>>>> are in
> >>>>>>>>> a same basic block; however, we need
to make sure that the
> >>>>>>> relationship is
> >>>>>>>>> still the same when the `end()` call
is in a different basic
> >>>> block.
> >>>>>>>>> That would not be enough. I think we
should also verify  that `
> >>>>>>>>> __mca_code_region_end()` always
post-dominates the call to
> >>>>>>>>> `__mca_code_region_start()`.
> >>>>>>>> In any case this patch should probably
check dominance of the
> >>>>>>>> intrinsics, even though MCA does not
follow branches and MCA does
> >>>> not
> >>>>>>>> not explicitly forbid a region from
containing multiple blocks.
> >>>>>>>>
> >>>>>>>>> My question is: what happens with
basic block reordering? We
> >>>> don't
> >>>>>>> know the
> >>>>>>>>> layout of basic blocks until we reach
code emission. How does it
> >>>> work
> >>>>>>> for
> >>>>>>>>> regions that span through multiple
basic blocks?. I think your
> >>>> RFC
> >>>>>>> should
> >>>>>>>>> clarify this aspect.
> >>>>>>>>>
> >>>>>>>>> As a side note: at the moment,
llvm-mca doesn't know how to deal
> >>>> with
> >>>>>>>>> branches. So, for simplicity we could
force code regions to only
> >>>>>>> contain
> >>>>>>>>> instructions from a single basic
block.
> >>>>>>>>>
> >>>>>>>>> However, In future we may want to
teach llvm-mca how to analyze
> >>>>>>> branchy
> >>>>>>>>> code too. For example, we could
introduce a simple control-flow
> >>>>>>> analysis in
> >>>>>>>>> llvm-mca, and use an external
"branch trace" information (for
> >>>>>>> example, a
> >>>>>>>>> perf trace generated by an external
tool) to decorate branches
> >>>> with
> >>>>>>> with
> >>>>>>>>> branch probabilities (similarly to
what we currently do in LLVM
> >>>> with
> >>>>>>> PGO).
> >>>>>>>>> We could then use that knowledge to
model branch prediction and
> >>>>>>> simulate
> >>>>>>>>> what happens in the presence of
multiple branches.
> >>>>>>>>>
> >>>>>>>>> So, the idea of having regions that
potentially span multiple
> >>>> basic
> >>>>>>> blocks
> >>>>>>>>> is not bad in general. However, I
think you should better clarify
> >>>>>>> what are
> >>>>>>>>> the constraints (at least, you should
answer to my questions from
> >>>>>>> before).
> >>>>>>>> I agree! Thanks for pointing that out.
> >>>>>>>>
> >>>>>>>>> If we decide to use those new
intrinsics, then those should be
> >>>>>>> experimental
> >>>>>>>>> (at least to start).
> >>>>>>>> Agreed.
> >>>>>>>>
> >>>>>>>> -Matt
> >>>>>>>>
> >>>>>>>>> On Thu, Nov 15, 2018 at 11:07 PM via
llvm-dev <
> >>>>>>> llvm-dev at lists.llvm.org>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Introduction
> >>>>>>>>>> -----------------
> >>>>>>>>>> Currently llvm-mca only accepts
assembly code as input. We
> >>>> would
> >>>>>>> like to
> >>>>>>>>>> extend llvm-mca to support object
files, allowing users to
> >>>> analyze
> >>>>>>> the
> >>>>>>>>>> performance of binaries. The
proposed changes (which involve
> >>>> both
> >>>>>>>>>> clang and llvm) optionally
introduce an object file section,
> >>>> but
> >>>>>>> this can
> >>>>>>>>>> be
> >>>>>>>>>> stripped-out if desired.
> >>>>>>>>>>
> >>>>>>>>>> For the llvm-mca binary support
feature to be useful, a user
> >>>> needs
> >>>>>>> to tell
> >>>>>>>>>> llvm-mca which portions of their
code they would like analyzed.
> >>>>>>> Currently,
> >>>>>>>>>> this is accomplished via assembly
comments. However, assembly
> >>>>>>> comments are
> >>>>>>>>>> not
> >>>>>>>>>> preserved in object files, and
this has encouraged this RFC.
> >>>> For the
> >>>>>>>>>> proposed
> >>>>>>>>>> binary support, we need to
introduce changes to clang and llvm
> >>>> to
> >>>>>>> allow the
> >>>>>>>>>> user's object code to be
recognized by llvm-mca:
> >>>>>>>>>>
> >>>>>>>>>> * We need a way for a user to
identify a region/block of code
> >>>> they
> >>>>>>> want
> >>>>>>>>>>     analyzed by llvm-mca.
> >>>>>>>>>> * We need the information defining
the user's region of code
> >>>> to be
> >>>>>>>>>> maintained
> >>>>>>>>>>     in the object file so that
llvm-mca can analyze the desired
> >>>>>>> region(s)
> >>>>>>>>>> from the
> >>>>>>>>>>     object file.
> >>>>>>>>>>
> >>>>>>>>>> We define a "code
region" as a subset of a user's program that
> >>>> is
> >>>>>>> to be
> >>>>>>>>>> analyzed via llvm-mca. The
sequence of instructions to be
> >>>> analyzed
> >>>>>>> is
> >>>>>>>>>> represented as a pair: <start,
end> where the 'start' marks the
> >>>>>>> beginning
> >>>>>>>>>> of
> >>>>>>>>>> the user's source code and
'end' terminates the sequence. The
> >>>>>>> instructions
> >>>>>>>>>> between 'start' and
'end' form the region that can be analyzed
> >>>> by
> >>>>>>> llvm-mca
> >>>>>>>>>> at a
> >>>>>>>>>> later time.
> >>>>>>>>>>
> >>>>>>>>>> Example
> >>>>>>>>>> -----------
> >>>>>>>>>> Before we go into the details of
this proposed change, let's
> >>>> first
> >>>>>>> look at
> >>>>>>>>>> a
> >>>>>>>>>> simple example:
> >>>>>>>>>>
> >>>>>>>>>> // example.c -- Analyze a
dot-product expression.
> >>>>>>>>>> double test(double x, double y) {
> >>>>>>>>>>     double result = 0.0;
> >>>>>>>>>>     __mca_code_region_start(42);
> >>>>>>>>>>     result += x * y;
> >>>>>>>>>>     __mca_code_region_end();
> >>>>>>>>>>     return result;
> >>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>> In the example above, we have
identified a code region, in this
> >>>>>>> case a
> >>>>>>>>>> single
> >>>>>>>>>> dot-product expression. For the
sake of brevity and simplicity,
> >>>>>>> we've
> >>>>>>>>>> chosen
> >>>>>>>>>> a very simple example, but in
reality a more complicated
> >>>> example
> >>>>>>> could use
> >>>>>>>>>> multiple expressions. We have also
denoted this region as
> >>>> number
> >>>>>>> 42. That
> >>>>>>>>>> identifier is only for the user,
and simplifies reading an
> >>>> llvm-mca
> >>>>>>>>>> analysis
> >>>>>>>>>> report later.
> >>>>>>>>>>
> >>>>>>>>>> When this code is compiled, the
region markers (the
> >>>> mca_code_region
> >>>>>>>>>> markers)
> >>>>>>>>>> are transformed into assembly
labels. While the markers are
> >>>>>>> presented as
> >>>>>>>>>> function calls, in reality they
are no-ops.
> >>>>>>>>>>
> >>>>>>>>>> test:
> >>>>>>>>>> pushq   %rbp
> >>>>>>>>>> movq    %rsp, %rbp
> >>>>>>>>>> movsd   %xmm0, -8(%rbp)
> >>>>>>>>>> movsd   %xmm1, -16(%rbp)
> >>>>>>>>>> .Lmca_code_region_start_0: #
LLVM-MCA-START ID: 42
> >>>>>>>>>> xorps   %xmm0, %xmm0
> >>>>>>>>>> movsd   %xmm0, -24(%rbp)
> >>>>>>>>>> movsd   -8(%rbp), %xmm0
> >>>>>>>>>> mulsd   -16(%rbp), %xmm0
> >>>>>>>>>> addsd   -24(%rbp), %xmm0
> >>>>>>>>>> movsd   %xmm0, -24(%rbp)
> >>>>>>>>>> .Lmca_code_region_end_0: #
LLVM-MCA-END ID: 42
> >>>>>>>>>> movsd   -24(%rbp), %xmm0
> >>>>>>>>>> popq    %rbp
> >>>>>>>>>> retq
> >>>>>>>>>> .section       
.mca_code_regions,"", at progbits
> >>>>>>>>>> .quad   42
> >>>>>>>>>> .quad   .Lmca_code_region_start_0
> >>>>>>>>>> .quad  
.Lmca_code_region_end_0-.Lmca_code_region_start_0
> >>>>>>>>>>
> >>>>>>>>>> The assembly has been trimmed to
show the portions relevant to
> >>>> this
> >>>>>>> RFC.
> >>>>>>>>>> Notice the labels enclose the
user's defined region, and that
> >>>> they
> >>>>>>>>>> preserve the
> >>>>>>>>>> user's arbitrary region
identifier, the ever-so-important
> >>>> region 42.
> >>>>>>>>>> In the object file section
.mca_code_regions, we have noted the
> >>>>>>> user's
> >>>>>>>>>> region
> >>>>>>>>>> identifier (.quad 42), start
address, and region size. A more
> >>>>>>> complicated
> >>>>>>>>>> example can have multiple regions
defined within a single
> >>>>>>> .mca_code_regions
> >>>>>>>>>> section. This section can be read
by llvm-mca, allowing
> >>>> llvm-mca to
> >>>>>>> take
> >>>>>>>>>> object files as input instead of
assembly source.
> >>>>>>>>>>
> >>>>>>>>>> Details
> >>>>>>>>>> ---------
> >>>>>>>>>> We need a way for a user to
identify a region/block of code
> >>>> they
> >>>>>>> want
> >>>>>>>>>> analyzed
> >>>>>>>>>> by llvm-mca. We solve this problem
by introducing two
> >>>> intrinsics
> >>>>>>> that a
> >>>>>>>>>> user can
> >>>>>>>>>> specify, for identifying regions
of code for analysis.
> >>>>>>>>>>
> >>>>>>>>>> The two intrinsics are:
llvm.mca.code.regions.start and
> >>>>>>>>>> llvm.mca.code.regions.end. A user
can identify a code region by
> >>>>>>> inserting
> >>>>>>>>>> the
> >>>>>>>>>> mca_code_region_start and
mca_code_region_end markers. These
> >>>> are
> >>>>>>> simply
> >>>>>>>>>> clang builtins and are transformed
into the aforementioned
> >>>>>>> intrinsics
> >>>>>>>>>> during
> >>>>>>>>>> compilation. The code between the
intrinsics are what we call
> >>>> "code
> >>>>>>>>>> regions"
> >>>>>>>>>> and are to be easily identifiable
by llvm-mca; any code
> >>>> between a
> >>>>>>> start/end
> >>>>>>>>>> pair can be analyzed by llvm-mca
at a later time. A user can
> >>>> define
> >>>>>>>>>> multiple
> >>>>>>>>>> non-overlapping code regions
within their program.
> >>>>>>>>>>
> >>>>>>>>>> The llvm.mca.code.region.start
intrinsic takes an integer
> >>>> constant
> >>>>>>> as its
> >>>>>>>>>> only
> >>>>>>>>>> argument. This argument is
implemented as a metadata i32, and
> >>>> is
> >>>>>>> only used
> >>>>>>>>>> when generating llvm-mca reports.
This value allows a user to
> >>>> more
> >>>>>>> easily
> >>>>>>>>>> identify a specific code region.
llvm.mca.code.region.end
> >>>> takes no
> >>>>>>>>>> arguments.
> >>>>>>>>>> Since we disallow nesting of
regions, the first 'end' intrinsic
> >>>>>>> lexically
> >>>>>>>>>> following a 'start'
intrinsic represents the end of that code
> >>>>>>> region.
> >>>>>>>>>> Now that we have a solution for
identifying regions for
> >>>> analysis,
> >>>>>>> we now
> >>>>>>>>>> need a
> >>>>>>>>>> way for preserving that
information to be read at a later
> >>>> time. To
> >>>>>>>>>> accomplish
> >>>>>>>>>> this we propose adding a new
section (.mca_code_regions) to the
> >>>>>>> object file
> >>>>>>>>>> generated by llvm. During code
generation, the start/end
> >>>> intrinsics
> >>>>>>>>>> described
> >>>>>>>>>> above will be transformed into
start/end labels in assembly.
> >>>> When
> >>>>>>> llvm
> >>>>>>>>>> generates the object file from the
user's code, these start/end
> >>>>>>> labels
> >>>>>>>>>> form a
> >>>>>>>>>> pair of values identifying the
start of the user's code
> >>>> region, and
> >>>>>>> size.
> >>>>>>>>>> The
> >>>>>>>>>> size represents the number of
bytes between the start and end
> >>>>>>> address of
> >>>>>>>>>> the
> >>>>>>>>>> labels. Note that the labels are
emitted during assembly
> >>>> printing.
> >>>>>>> We hope
> >>>>>>>>>> that these labels have no
influence on code generation or
> >>>>>>> basic-block
> >>>>>>>>>> placement. However, the target
assembler strategy for handling
> >>>>>>> labels is
> >>>>>>>>>> outside of our control.
> >>>>>>>>>>
> >>>>>>>>>> This proposed change affects the
size of a binary, but only if
> >>>> the
> >>>>>>> user
> >>>>>>>>>> calls
> >>>>>>>>>> the start/end builtins mentioned
above. The additional size of
> >>>> the
> >>>>>>>>>> .mca_code_regions section, which
we imagine to be very small
> >>>> (to
> >>>>>>> the order
> >>>>>>>>>> of a
> >>>>>>>>>> few bytes), can trivially be
stripped by tools like 'strip' or
> >>>>>>> 'objcopy'.
> >>>>>>>>>> Implementation Status
> >>>>>>>>>> ------------------------------
> >>>>>>>>>> We currently have the proposed
changes implemented at the url
> >>>>>>> posted below.
> >>>>>>>>>> This initial patch only targets
ELF object files, and does not
> >>>>>>> handle
> >>>>>>>>>> relocatable addresses. Since the
start of a code region is
> >>>>>>> represented as
> >>>>>>>>>> an
> >>>>>>>>>> assembly label, and referenced in
the .mca_code_regions
> >>>> section,
> >>>>>>> that
> >>>>>>>>>> address
> >>>>>>>>>> is relocatable. That value can be
represented as
> >>>> section-relative
> >>>>>>>>>> relocatable
> >>>>>>>>>> symbol (.text + addend), but we
are not handling that case yet.
> >>>>>>> Instead,
> >>>>>>>>>> the
> >>>>>>>>>> proposed changes only handle
linked/executable object files.
> >>>>>>>>>>
> >>>>>>>>>> For purposes of review and to
communicate the idea, the change
> >>>> is
> >>>>>>>>>> presented as a monolithic patch
here:
> >>>>>>>>>>
> >>>>>>>>>> https://reviews.llvm.org/D54603
> >>>>>>>>>>
> >>>>>>>>>> The change is presented as a
monolithic patch; however, if
> >>>> accepted
> >>>>>>>>>> the patch will be split into three
smaller patches:
> >>>>>>>>>> 1. The introduction of the
builtins to clang.
> >>>>>>>>>> 2. The llvm portion (the added
intrinsics).
> >>>>>>>>>> 3. The llvm-mca portion.
> >>>>>>>>>>
> >>>>>>>>>> Thanks!
> >>>>>>>>>>
> >>>>>>>>>> -Matt
> >>>>>>>>>>
_______________________________________________
> >>>>>>>>>> LLVM Developers mailing list
> >>>>>>>>>> llvm-dev at lists.llvm.org
> >>>>>>>>>>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>>>>>>>>>
> >>>>>>>>
_______________________________________________
> >>>>>>>> LLVM Developers mailing list
> >>>>>>>> llvm-dev at lists.llvm.org
> >>>>>>>>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

via llvm-dev

2018-Dec-17 17:48 UTC

head link

[llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

Adding the llvm-dev list, because my email client decides to remove certain
lists when I reply-all... including the list that I intend to respond to.
> -----Original Message-----
> From: Davis, Matthew <Matthew.Davis at sony.com>
> Sent: Monday, December 17, 2018 9:47 AM
> To: Davis, Matthew <Matthew.Davis at sony.com>; llvm-dev at
redking.me.uk
> Cc: cfe-dev at lists.llvm.org
> Subject: RE: [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.
> 
> Just an update to this RFC.  (I thought this was going to be a short
email... my
> apologies).
> 
> One of the primary limitations described in this RFC (earlier in this
thread), is that the
> patch for this RFC only handles linked executables.  This restriction is
due to myself
> wanting to avoid handling relocations in a target specific manner, at least
in the
> initial patch.  Especially so, since I want to keep the initial patch
simple.  However,
> sometimes simple and practical are at odds with each other.  I envision the
main use-
> case of llvm-mca, with binary support, is for analyzing .o files.  Probably
analyzing .o
> more commonly than fully linked executables.  Without handling relocated
objects,
> this patch seems rather useless.
> 
> I started exploring an alternative solution this weekend to the
aforementioned
> problem.  This alternative solution avoids having to handle relocations,
but does give
> us support for object files (with relocated symbols) and linked
executables.  The
> change is quite simple, and seems to be effective.  In short, we still
generate
> intrinsics as discussed in the RFC, one to mark the start of a code region,
and
> another to mark the end.  These intrinsics get lowered into local symbols. 
The
> symbols are already  encoded with address information about their position
in the
> object file.  What is different is that we ensure that these symbols have
unique
> names and also encode the user provided ID value.
> 
> Previously the labels  were named like:
.Lmca_code_region_start_<number>, and
> similar for .Lmca_code_region_end.  The user id number and region size were
> encoded in the .mca_code_regions object file section.  Previously mca never
looked
> at the symbol table.  But, In reality we can calculate the region size by
using the
> symbols in the symbol table (look for the mca symbols), instead of relying
on the
> information encoded in .mca_code_regions.  The alternative approach gets
rid of
> that section entirely but achieves the same functionality by encoding the
information
> in the symbol name. In short the alternative approach just parses the
symbol table
> for MCA symbols, and the symbol names are encoded with the data we need.
> 
> The newly proposed name is formatted
> as: .Lmca_code_region_start.<id>.<function>.<number>. 
Similar for
> mca_code_region_end.  'function' is the function that the marker
appears in, 'ID' is
> the user-specified ID (this is a value that users specify for easily
identifying the code
> region under analysis... just cosmetic), and 'number' is a unique
number to avoid any
> duplicate name conflicts.  The benefit of this alternative solution is that
we can get
> rid of .mca_code_regions, and gather all of the information llvm-mca needs
by
> parsing the symbol table looking for any symbols with the
'mca_code_region_start'
> and 'mca_code_region_end' format discussed above.  Of course, if
the string table is
> stripped, then we will lose this data.  The main drawback from this
alternative
> approach is that it relies on encoding symbol names and string processing
on those
> names.   I'm somewhat biased against doing string parsing, but the code
to perform
> this is simple and small, and more importantly it allows llvm-mca to handle
linked or
> relocated object files.
> 
> -Matt
> 
> 
> 
> > -----Original Message-----
> > From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of
via llvm-dev
> > Sent: Tuesday, December 11, 2018 11:23 AM
> > To: llvm-dev at redking.me.uk; llvm-dev at lists.llvm.org
> > Cc: cfe-dev at lists.llvm.org
> > Subject: Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to
llvm-mca.
> >
> > Thanks for the response Simon.  My reply is inline:
> >
> > > From: Simon Pilgrim <llvm-dev at redking.me.uk>
> > > Sent: Monday, December 10, 2018 1:40 PM
> > >
> > > Hi Matt,
> > >
> > > I can see a near future where perf-analysis tooling uses branch
history
> > > profiler captures to determine how often loops/branches are taken
and
> > > feeds that into llvm-mca, especially for hot/branchy loop
analysis
> > > reports etc. Are you confident that your approach will be easily
> > > extendable for this?
> >
> > That is a very interesting use case.  The restriction of a code-region
to a single
> block
> > is a limitation for any tools that want to analyze branches.  However,
I believe
> > that it will be easy to lift this restriction (it's just a check
in IR/Verifier).  This
> > limitation is not
> > expressed in the llvm-mca driver.
> >
> > If the information is coming from a profile report, then we'd most
> > likely need to extend the llvm-mca driver to accept profile reports. 
Currently,
> > code regions, from the perspective of the  llvm-mca driver, are very
simple. They
> are
> > just a collection of MCInst.  The binary support in this RFC+patch
> > disassembles just the address range from marker start address for
> > some specified number of bytes.  It might be useful to add another
driver argument
> > so that a user (or tool) can specify, from the command line, a range
of instructions
> > to
> > analyze.  I recently added a class for handling inputs to
> > llvm::mca::CodeRegionGenerator,
> > which is just responsible for taking some input and creating a list of
MCInst that
> llvm-
> > mca uses.
> > We could subclass this to handle profile reports.
> >
> > > Similarly, being able to generally embed the profile markers in
object
> > > libraries for reuse is going to be important for some people -
I'd like
> > > to see more of a plan of how this will be achieved. I understand
that it
> > > might not be easy for some exe formats.
> >
> > That is definitely a limitation.  This initial patch+RFC only handles
linked
> > executables (i.e., the llvm-mca marker symbol addresses are resolved).
> > I'm working on a better solution so that this will not be a
restriction.
> > In fact, I'll probably delay trying to land any patches until I
solve relocations
> > (or use a different solution for identifying start/end addresses for
llvm-mca code
> > regions).
> >
> > > Sorry if I'm being too critical, but I'm a bit worried
that we end up
> > > with an initial implementation that will take a lot of reworking
to meet
> > > our final aims.
> > >
> > > Thanks, Simon.
> >
> > I understand your criticisms and value your input. Thanks a ton!
> >
> > -Matt
> >
> >
> > > On 10/12/2018 19:32, Matt Davis wrote:
> > > > Thanks for the feedback Guillaume and Clement!
> > > >
> > > > In response to Clement:
> > > >
> > > >>> In terms of future-proofness of only allowing
regions within a basic
> > > >>> block, are we confident we can actually ever
simulate branches apart from
> > > >>> "always taken, perfectly predicated" loop
? Even this simple need requires
> > > >>> knowing quite a few details on the frontend. The
current design could
> > > >>> handle this use case with the addition of an
external "loop mode" option to
> > > >>> MCA. If there are no other strong use cases, I would
advocate for
> > > >>> experimental intrinsics unless people can contribute
other example use
> > > >>> cases.
> > > > In short, I am in agreement and think that handling of
branching or loop
> > > > constructs should be isolated to the llvm-mca
driver/front-end.  The
> > > > only thing the code regions should be concerned with is
identifying
> > > > blocks of instructions that will later be used by the front
end.
> > > >
> > > > We can place limitations to how those blocks are formed. For
example the
> > > > current implementation forces regions to be isolated to a
single basic
> > > > block.  However, we anticipate lifting this restriction once
branching
> > > > is handled.
> > > >
> > > > -Matt
> > > >
> > > >
> > > > On Mon, Dec 10, 2018 at 04:15:46PM +0100, Guillaume Chatelet
wrote:
> > > >> +1 to what Clement said.
> > > >> I believe the intrinsics are a better design to support
many architectures.
> > > >>
> > > >> IACA users are probably decorating their code with
IACA_START / IACA_END
> > > >> macros. One possibility is to provide a header that
define these macros in
> > > >> terms of the new intrinsics.
> > > >>
> > > >> On Mon, Dec 10, 2018 at 3:59 PM Clement Courbet
<courbet at google.com>
> > > wrote:
> > > >>
> > > >>> Hi Matt/Andrea,
> > > >>>
> > > >>> I see pros and cons for IACA-style markers vs
intrinsics.
> > > >>> On the one hand, IACA-style markers are very
magical, and not very visible
> > > >>> in both the source and object code. Using IACA-style
markers has the
> > > >>> advantage that you can use llvm-mca as a drop-in
replacement for IACA, or
> > > >>> even to compare their outputs on the exact same
binary. They also do not
> > > >>> require tooling on the compiler side and allow
comparing the output of
> > > >>> several compilers.
> > > >>>
> > > >> On the other hand, IACA-style markers do not have a
equivalent on other
> > > >>> architectures, and I'm not sure inventing new
ones is a good idea :) I
> > > >>> think the latter makes them pretty much a no-go for
llvm-mca as I don't
> > > >>> think we'll want to teach each target how to
parse code regions. That's
> > > >>> much better handled in a target-agnostic way by the
object. Intel got away
> > > >>> with them because they only had to support one
architecture.
> > > >>>
> > > >>> tl;dr: In the case of llvm-mca, I like your design
better than the markers.
> > > >>>
> > > >>> In terms of future-proofness of only allowing
regions within a basic
> > > >>> block, are we confident we can actually ever
simulate branches apart from
> > > >>> "always taken, perfectly predicated" loop
? Even this simple need requires
> > > >>> knowing quite a few details on the frontend. The
current design could
> > > >>> handle this use case with the addition of an
external "loop mode" option to
> > > >>> MCA. If there are no other strong use cases, I would
advocate for
> > > >>> experimental intrinsics unless people can contribute
other example use
> > > >>> cases.
> > > >>>
> > > >>> On Mon, Dec 3, 2018 at 11:38 PM Matt Davis
<matthew.davis at sony.com>
> > > wrote:
> > > >>>
> > > >>>> Hi Andrea,
> > > >>>>
> > > >>>> On Mon, Dec 03, 2018 at 01:21:33PM +0000, Andrea
Di Biagio wrote:
> > > >>>>> So, I have been thinking a bit more about
this whole design.
> > > >>>>>
> > > >>>>> The more I think about your suggested
design, the more I am convinced
> > > >>>> that
> > > >>>>> we should do something more to support
ranges in binary object files
> > > >>>> too.
> > > >>>>> My understanding is that the reason why we
don't support object files in
> > > >>>>> general, is because of the presence of
relocations. That is because a
> > > >>>>> region start marker is effectively symbol
relative, and the symbol (a
> > > >>>>> function) would be relocated in the final
executable.
> > > >>>>> You mentioned to me that resolving even a
'simple' symbol-relative
> > > >>>>> relocation is not trivial, beause it
requires specific knowledge about
> > > >>>> the
> > > >>>>> binary format, and the target (i.e. how
relocations are encoded is
> > > >>>> target
> > > >>>>> specific). I am surprised that there is not
a utility library for
> > > >>>> resolving
> > > >>>>> relocations.. but I am not familiar with
that part of the compiler. I
> > > >>>> was
> > > >>>>> hoping that there was a target specific
interface to use in this case...
> > > >>>> There might be a better way of resolving the
relocs, but from what I saw
> > > >>>> looking at llvm-objdump and other related tools,
it seems that resolving
> > > >>>> the relocated symbol is a target specific
effort.  I also spent sometime
> > > >>>> sniffing around
ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp which also
> > > >>>> performs the reloc resolution.  I should clarify
that I too am not an
> > > >>>> expert in llvm's utilities for performing
symbol/reloc resolution, and
> > > >>>> perhaps someone in the community can point me in
the right direction.  I
> > > >>>> can clearly see the reloc data in the object
file via tools like
> > > >>>> objdump; however, accessing the relocs via
> > > >>>> llvm::object::ObjectFile::relocations() did not
produce address values
> > > >>>> that we could use (values of zero).
> > > >>>>
> > > >>>> I was hoping that, for a first pass at this
patch, supporting just
> > > >>>> executables would be okay.  That keeps this
initial patch set simple,
> > > >>>> and hopefully will encourage others to take a
peek at it, since it's
> > > >>>> less daunting than what it might otherwise be. 
Of course, there is the
> > > >>>> concern that this initial patch will lock us
into a design that will be
> > > >>>> more complicated to unravel later.
> > > >>>>
> > > >>>>> An alternative approach would require that
you define your own
> > > >>>>> "symbol-relative" reference. After
all, ranges are just a sequences of
> > > >>>>> instructions in a function. If a function
symbol is described by the
> > > >>>> symbol
> > > >>>>> table, then you should be able to obtain its
offset in the .text
> > > >>>> section.
> > > >>>>> So, you could potentially encode your own
symbol+offset. However, the
> > > >>>>> linker would not be able to understand your
"custom relocation", and
> > > >>>>> information about regions in the final elf
would be basically broken.
> > > >>>>> So,that would not be a solution...
> > > >>>>>
> > > >>>>> I don't know honestly what is the best
approach to use in this case.
> > > >>>>> As a compromise, it would not be a bad idea
to add the ability to
> > > >>>> specify
> > > >>>>> ranges from command line. What do you think?
> > > >>>>> Still, from a user point of view, the idea
that we don't support object
> > > >>>>> files in general sounds like a big
limitation.
> > > >>>> I agree, only supporting executables is a
limitation.  However, I'd
> > > >>>> like to land the base support now and add in the
additional
> > > >>>> features/support after this large patch set
lands.  But I can see
> > > >>>> where landing the whole thing entirely also
makes sense.
> > > >>>>
> > > >>>>> About the new experimental intrinsics: those
would definitely work well
> > > >>>> for
> > > >>>>> the simple case where instructions are from
the same basic block.
> > > >>>>> However, some/most of the constraints that
you plan to add will have to
> > > >>>>> change if in future we decide to allow
ranges that potentially cross
> > > >>>>> multiple basic blocks. How will the
rules/constraints on those new
> > > >>>>> intrinsics change? I just want to make sure
that the suggested design is
> > > >>>>> future-proof.
> > > >>>> Since the llvm/clang parts of the code are just
responsible for
> > > >>>> collecting where a range starts/ends, I hope
that we can remove some
> > > >>>> of the baked-in constraints that are specified
in IR/Verifier.cpp.
> > > >>>> As you pointed out earlier in this thread, we
might want to
> > > >>>> introduce a dominance check if/when we lift the
one-basic-block
> > > >>>> restriction.
> > > >>>>
> > > >>>> -Matt
> > > >>>>
> > > >>>>> -Andrea
> > > >>>>>
> > > >>>>> On Tue, Nov 27, 2018 at 5:08 PM Andrea Di
Biagio <
> > > >>>> andrea.dibiagio at gmail.com>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> Thanks for clarifying it Matt.
> > > >>>>>>
> > > >>>>>> In general, I quite like your suggested
design.
> > > >>>>>>
> > > >>>>>> My only concern is about the semantic of
the two new intrinsics. You
> > > >>>>>> design doesn't allow mca ranges to
span through multiple basic
> > > >>>> blocks. That
> > > >>>>>> constraint is acceptable for now, since
llvm-mca doesn't know how to
> > > >>>> deal
> > > >>>>>> with control flow.
> > > >>>>>> However, I am a bit concerned about what
might happen in future if we
> > > >>>>>> decide to let users specify code regions
that span through multiple
> > > >>>> basic
> > > >>>>>> blocks. Basically, I don't
particularly like the idea of changing the
> > > >>>>>> semantic of already existing intrinsic.
A design that already
> > > >>>> accounts for
> > > >>>>>> that particular scenario/future work
would be ideal. That being said,
> > > >>>>>> marking those new intrinsics as
'experimental' may be a good
> > > >>>> compromise (at
> > > >>>>>> least for now).
> > > >>>>>>
> > > >>>>>> So, I am quite happy overall with the
direction of this RFC.
> > > >>>>>> However, I am interesting to hear from
other developers about your
> > > >>>>>> suggested design.
> > > >>>>>>
> > > >>>>>>> This initial patch only targets ELF
object files, and does not
> > > >>>> handle
> > > >>>>>> relocatable addresses. Since the start
of a code region is
> > > >>>> represented as
> > > >>>>>> an
> > > >>>>>> assembly label, and referenced in the
.mca_code_regions section, that
> > > >>>>>> address
> > > >>>>>> is relocatable.
> > > >>>>>>
> > > >>>>>> This may be okay for now. However, it
would be nice to remove that
> > > >>>>>> constraint in future and add support to
generic object files.
> > > >>>>>>
> > > >>>>>> -Andrea
> > > >>>>>>
> > > >>>>>> On Thu, Nov 22, 2018 at 7:21 PM
<Matthew.Davis at sony.com> wrote:
> > > >>>>>>
> > > >>>>>>> I want to clarify a few restrictions
of llvm-mca code regions that
> > > >>>> this
> > > >>>>>>> RFC proposes:
> > > >>>>>>>
> > > >>>>>>> 1) All llvm-mca code regions must
start with an
> > > >>>>>>> llvm.mca.code.region.start intrinsic
and end with
> > > >>>>>>> an llvm.mca.code.region.end
intrinsic.  This rule is enforced at the
> > > >>>> IR
> > > >>>>>>> level in the IR verifier.
> > > >>>>>>>
> > > >>>>>>> 2) llvm-mca code regions cannot
nest.  This restriction implies that
> > > >>>> an
> > > >>>>>>> llvm.mca.code.region.start
> > > >>>>>>> must have a llvm.mca.code.region.end
intrinsic without any other
> > > >>>> llvm.mca
> > > >>>>>>> start intrinsics
> > > >>>>>>> between the two. The current
implementation in the patch enforces
> > > >>>> this
> > > >>>>>>> restriction at the
> > > >>>>>>> IR level via the IR Verifier.
> > > >>>>>>>
> > > >>>>>>> 3) An llvm-mca code region cannot
span multiple basic blocks.
> > > >>>> llvm-mca
> > > >>>>>>> does not follow
> > > >>>>>>> branches (yet).  Instead, a branch
instruction is treated by llvm-mca
> > > >>>>>>> like any other instruction.
> > > >>>>>>> The current patch associated with
this RFC does not enforce this
> > > >>>>>>> restriction.  I plan on updating
> > > >>>>>>> the patch to enforce that a code
region can only belong to a single
> > > >>>> basic
> > > >>>>>>> block.  This is a simple
> > > >>>>>>> check, ensuring that both the
llvm.mca.code.region.start and
> > > >>>> accompanying
> > > >>>>>>> end intrinsics live
> > > >>>>>>> in the same basic block. I imagine
adding this check at the IR level
> > > >>>> when
> > > >>>>>>> we also verify points 1 and 2
> > > >>>>>>> above.  That will keep the
code-region verification logic isolated
> > > >>>> to the
> > > >>>>>>> IR verifier.  The start/end
> > > >>>>>>> intrinsics should not have any uses,
so I'm not sure that they would
> > > >>>> be
> > > >>>>>>> moved/sunk on behalf
> > > >>>>>>> of any other instruction.  In other
words, I do not imagine that a
> > > >>>> start
> > > >>>>>>> and end would be split
> > > >>>>>>> apart due to later MI optimizations.
If I discover that such a case
> > > >>>>>>> occurs, then I might add the
> > > >>>>>>> basic-block check prior to emitting
the code region data to the
> > > >>>> object
> > > >>>>>>> file.    Once  llvm-mca  is
> > > >>>>>>> updated to handle branches, then we
can remove this constraint.
> > > >>>>>>>
> > > >>>>>>> -Matt
> > > >>>>>>>
> > > >>>>>>>> -----Original Message-----
> > > >>>>>>>> From: llvm-dev
<llvm-dev-bounces at lists.llvm.org> On Behalf Of Matt
> > > >>>>>>> Davis via llvm-
> > > >>>>>>>> dev
> > > >>>>>>>> Sent: Wednesday, November 21,
2018 8:47 AM
> > > >>>>>>>> To: Andrea Di Biagio
<andrea.dibiagio at gmail.com>
> > > >>>>>>>> Cc: llvm-dev <llvm-dev at
lists.llvm.org>; Di Biagio, Andrea
> > > >>>>>>>> <Andrea.Dibiagio at
sony.com>; cfe-dev at lists.llvm.org
> > > >>>>>>>> Subject: Re: [llvm-dev]
[RFC][llvm-mca] Adding binary support to
> > > >>>>>>> llvm-mca.
> > > >>>>>>>> Hi Andrea,
> > > >>>>>>>>
> > > >>>>>>>> Thanks for your input.
> > > >>>>>>>>
> > > >>>>>>>> On Wed, Nov 21, 2018 at
12:43:52PM +0000, Andrea Di Biagio wrote:
> > > >>>>>>>> [... snip ...]
> > > >>>>>>>>> About the suggested design:
> > > >>>>>>>>> I like the idea of being
able to identify code regions using a
> > > >>>> numeric
> > > >>>>>>>>> identifier.
> > > >>>>>>>>> However, what happens if a
code region spans through multiple
> > > >>>> basic
> > > >>>>>>> blocks?
> > > >>>>>>>> The current patch does not take
into consideration cases where the
> > > >>>>>>>> region start and end intrinsics
are placed in different basic
> > > >>>> blocks.
> > > >>>>>>>> Such would be the case if a
region is defined to span multiple
> > > >>>> blocks.
> > > >>>>>>>> This would be similar to the
current case where a user places a
> > > >>>>>>>> #LLVM-MCA-BEGIN assembly comment
in one block and an #LLVM-
> > MCA-
> > > END
> > > >>>> in
> > > >>>>>>>> another.  However, as you point
out below, if the user does this
> > > >>>> in the
> > > >>>>>>>> source code via intrinsics (just
what this patch is proposing),
> > > >>>> then
> > > >>>>>>>> there is a chance that
optimizations might change the layout of the
> > > >>>>>>>> instructions and confuse the
ordering of the MCA intrinsics.
> > > >>>>>>>>
> > > >>>>>>>> Since MCA does not follow
branches (MCA just treats a branch as it
> > > >>>> would
> > > >>>>>>>> a non-branching instruction), it
seems that a user should be aware
> > > >>>> that
> > > >>>>>>>> defining MCA code regions that
span multiple blocks might result
> > > >>>> in an
> > > >>>>>>>> unexpected analysis.  While we
do not discourage this, it seems
> > > >>>> like
> > > >>>>>>>> such a case will probably not
produce an expected result for the
> > > >>>> user.
> > > >>>>>>>> We could introduce a warning, or
automatically divide the regions
> > > >>>> so
> > > >>>>>>>> that a single region can only
contain a single block.
> > > >>>>>>>>
> > > >>>>>>>>> My understanding is that
code regions are not allowed to
> > > >>>> overlap. So,
> > > >>>>>>> it
> > > >>>>>>>>> makes sense if `
__mca_code_region_end()` doesn't take an ID as
> > > >>>> input.
> > > >>>>>>>>> However, what if `
__mca_code_region_end()` ends in a different
> > > >>>> basic
> > > >>>>>>> block?
> > > >>>>>>>>> `__mca_code_region_start()`
has to always dominate `
> > > >>>>>>>>> __mca_code_region_end()`.
This is trivial to verify when both
> > > >>>> calls
> > > >>>>>>> are in
> > > >>>>>>>>> a same basic block; however,
we need to make sure that the
> > > >>>>>>> relationship is
> > > >>>>>>>>> still the same when the
`end()` call is in a different basic
> > > >>>> block.
> > > >>>>>>>>> That would not be enough. I
think we should also verify  that `
> > > >>>>>>>>> __mca_code_region_end()`
always post-dominates the call to
> > > >>>>>>>>> `__mca_code_region_start()`.
> > > >>>>>>>> In any case this patch should
probably check dominance of the
> > > >>>>>>>> intrinsics, even though MCA does
not follow branches and MCA does
> > > >>>> not
> > > >>>>>>>> not explicitly forbid a region
from containing multiple blocks.
> > > >>>>>>>>
> > > >>>>>>>>> My question is: what happens
with basic block reordering? We
> > > >>>> don't
> > > >>>>>>> know the
> > > >>>>>>>>> layout of basic blocks until
we reach code emission. How does it
> > > >>>> work
> > > >>>>>>> for
> > > >>>>>>>>> regions that span through
multiple basic blocks?. I think your
> > > >>>> RFC
> > > >>>>>>> should
> > > >>>>>>>>> clarify this aspect.
> > > >>>>>>>>>
> > > >>>>>>>>> As a side note: at the
moment, llvm-mca doesn't know how to deal
> > > >>>> with
> > > >>>>>>>>> branches. So, for simplicity
we could force code regions to only
> > > >>>>>>> contain
> > > >>>>>>>>> instructions from a single
basic block.
> > > >>>>>>>>>
> > > >>>>>>>>> However, In future we may
want to teach llvm-mca how to analyze
> > > >>>>>>> branchy
> > > >>>>>>>>> code too. For example, we
could introduce a simple control-flow
> > > >>>>>>> analysis in
> > > >>>>>>>>> llvm-mca, and use an
external "branch trace" information (for
> > > >>>>>>> example, a
> > > >>>>>>>>> perf trace generated by an
external tool) to decorate branches
> > > >>>> with
> > > >>>>>>> with
> > > >>>>>>>>> branch probabilities
(similarly to what we currently do in LLVM
> > > >>>> with
> > > >>>>>>> PGO).
> > > >>>>>>>>> We could then use that
knowledge to model branch prediction and
> > > >>>>>>> simulate
> > > >>>>>>>>> what happens in the presence
of multiple branches.
> > > >>>>>>>>>
> > > >>>>>>>>> So, the idea of having
regions that potentially span multiple
> > > >>>> basic
> > > >>>>>>> blocks
> > > >>>>>>>>> is not bad in general.
However, I think you should better clarify
> > > >>>>>>> what are
> > > >>>>>>>>> the constraints (at least,
you should answer to my questions from
> > > >>>>>>> before).
> > > >>>>>>>> I agree! Thanks for pointing
that out.
> > > >>>>>>>>
> > > >>>>>>>>> If we decide to use those
new intrinsics, then those should be
> > > >>>>>>> experimental
> > > >>>>>>>>> (at least to start).
> > > >>>>>>>> Agreed.
> > > >>>>>>>>
> > > >>>>>>>> -Matt
> > > >>>>>>>>
> > > >>>>>>>>> On Thu, Nov 15, 2018 at
11:07 PM via llvm-dev <
> > > >>>>>>> llvm-dev at lists.llvm.org>
> > > >>>>>>>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> Introduction
> > > >>>>>>>>>> -----------------
> > > >>>>>>>>>> Currently llvm-mca only
accepts assembly code as input. We
> > > >>>> would
> > > >>>>>>> like to
> > > >>>>>>>>>> extend llvm-mca to
support object files, allowing users to
> > > >>>> analyze
> > > >>>>>>> the
> > > >>>>>>>>>> performance of binaries.
The proposed changes (which involve
> > > >>>> both
> > > >>>>>>>>>> clang and llvm)
optionally introduce an object file section,
> > > >>>> but
> > > >>>>>>> this can
> > > >>>>>>>>>> be
> > > >>>>>>>>>> stripped-out if desired.
> > > >>>>>>>>>>
> > > >>>>>>>>>> For the llvm-mca binary
support feature to be useful, a user
> > > >>>> needs
> > > >>>>>>> to tell
> > > >>>>>>>>>> llvm-mca which portions
of their code they would like analyzed.
> > > >>>>>>> Currently,
> > > >>>>>>>>>> this is accomplished via
assembly comments. However, assembly
> > > >>>>>>> comments are
> > > >>>>>>>>>> not
> > > >>>>>>>>>> preserved in object
files, and this has encouraged this RFC.
> > > >>>> For the
> > > >>>>>>>>>> proposed
> > > >>>>>>>>>> binary support, we need
to introduce changes to clang and llvm
> > > >>>> to
> > > >>>>>>> allow the
> > > >>>>>>>>>> user's object code
to be recognized by llvm-mca:
> > > >>>>>>>>>>
> > > >>>>>>>>>> * We need a way for a
user to identify a region/block of code
> > > >>>> they
> > > >>>>>>> want
> > > >>>>>>>>>>     analyzed by
llvm-mca.
> > > >>>>>>>>>> * We need the
information defining the user's region of code
> > > >>>> to be
> > > >>>>>>>>>> maintained
> > > >>>>>>>>>>     in the object file
so that llvm-mca can analyze the desired
> > > >>>>>>> region(s)
> > > >>>>>>>>>> from the
> > > >>>>>>>>>>     object file.
> > > >>>>>>>>>>
> > > >>>>>>>>>> We define a "code
region" as a subset of a user's program that
> > > >>>> is
> > > >>>>>>> to be
> > > >>>>>>>>>> analyzed via llvm-mca.
The sequence of instructions to be
> > > >>>> analyzed
> > > >>>>>>> is
> > > >>>>>>>>>> represented as a pair:
<start, end> where the 'start' marks the
> > > >>>>>>> beginning
> > > >>>>>>>>>> of
> > > >>>>>>>>>> the user's source
code and 'end' terminates the sequence. The
> > > >>>>>>> instructions
> > > >>>>>>>>>> between 'start'
and 'end' form the region that can be analyzed
> > > >>>> by
> > > >>>>>>> llvm-mca
> > > >>>>>>>>>> at a
> > > >>>>>>>>>> later time.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Example
> > > >>>>>>>>>> -----------
> > > >>>>>>>>>> Before we go into the
details of this proposed change, let's
> > > >>>> first
> > > >>>>>>> look at
> > > >>>>>>>>>> a
> > > >>>>>>>>>> simple example:
> > > >>>>>>>>>>
> > > >>>>>>>>>> // example.c -- Analyze
a dot-product expression.
> > > >>>>>>>>>> double test(double x,
double y) {
> > > >>>>>>>>>>     double result = 0.0;
> > > >>>>>>>>>>    
__mca_code_region_start(42);
> > > >>>>>>>>>>     result += x * y;
> > > >>>>>>>>>>    
__mca_code_region_end();
> > > >>>>>>>>>>     return result;
> > > >>>>>>>>>> }
> > > >>>>>>>>>>
> > > >>>>>>>>>> In the example above, we
have identified a code region, in this
> > > >>>>>>> case a
> > > >>>>>>>>>> single
> > > >>>>>>>>>> dot-product expression.
For the sake of brevity and simplicity,
> > > >>>>>>> we've
> > > >>>>>>>>>> chosen
> > > >>>>>>>>>> a very simple example,
but in reality a more complicated
> > > >>>> example
> > > >>>>>>> could use
> > > >>>>>>>>>> multiple expressions. We
have also denoted this region as
> > > >>>> number
> > > >>>>>>> 42. That
> > > >>>>>>>>>> identifier is only for
the user, and simplifies reading an
> > > >>>> llvm-mca
> > > >>>>>>>>>> analysis
> > > >>>>>>>>>> report later.
> > > >>>>>>>>>>
> > > >>>>>>>>>> When this code is
compiled, the region markers (the
> > > >>>> mca_code_region
> > > >>>>>>>>>> markers)
> > > >>>>>>>>>> are transformed into
assembly labels. While the markers are
> > > >>>>>>> presented as
> > > >>>>>>>>>> function calls, in
reality they are no-ops.
> > > >>>>>>>>>>
> > > >>>>>>>>>> test:
> > > >>>>>>>>>> pushq   %rbp
> > > >>>>>>>>>> movq    %rsp, %rbp
> > > >>>>>>>>>> movsd   %xmm0, -8(%rbp)
> > > >>>>>>>>>> movsd   %xmm1, -16(%rbp)
> > > >>>>>>>>>>
.Lmca_code_region_start_0: # LLVM-MCA-START ID: 42
> > > >>>>>>>>>> xorps   %xmm0, %xmm0
> > > >>>>>>>>>> movsd   %xmm0, -24(%rbp)
> > > >>>>>>>>>> movsd   -8(%rbp), %xmm0
> > > >>>>>>>>>> mulsd   -16(%rbp), %xmm0
> > > >>>>>>>>>> addsd   -24(%rbp), %xmm0
> > > >>>>>>>>>> movsd   %xmm0, -24(%rbp)
> > > >>>>>>>>>> .Lmca_code_region_end_0:
# LLVM-MCA-END ID: 42
> > > >>>>>>>>>> movsd   -24(%rbp), %xmm0
> > > >>>>>>>>>> popq    %rbp
> > > >>>>>>>>>> retq
> > > >>>>>>>>>> .section       
.mca_code_regions,"", at progbits
> > > >>>>>>>>>> .quad   42
> > > >>>>>>>>>> .quad  
.Lmca_code_region_start_0
> > > >>>>>>>>>> .quad  
.Lmca_code_region_end_0-.Lmca_code_region_start_0
> > > >>>>>>>>>>
> > > >>>>>>>>>> The assembly has been
trimmed to show the portions relevant to
> > > >>>> this
> > > >>>>>>> RFC.
> > > >>>>>>>>>> Notice the labels
enclose the user's defined region, and that
> > > >>>> they
> > > >>>>>>>>>> preserve the
> > > >>>>>>>>>> user's arbitrary
region identifier, the ever-so-important
> > > >>>> region 42.
> > > >>>>>>>>>> In the object file
section .mca_code_regions, we have noted the
> > > >>>>>>> user's
> > > >>>>>>>>>> region
> > > >>>>>>>>>> identifier (.quad 42),
start address, and region size. A more
> > > >>>>>>> complicated
> > > >>>>>>>>>> example can have
multiple regions defined within a single
> > > >>>>>>> .mca_code_regions
> > > >>>>>>>>>> section. This section
can be read by llvm-mca, allowing
> > > >>>> llvm-mca to
> > > >>>>>>> take
> > > >>>>>>>>>> object files as input
instead of assembly source.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Details
> > > >>>>>>>>>> ---------
> > > >>>>>>>>>> We need a way for a user
to identify a region/block of code
> > > >>>> they
> > > >>>>>>> want
> > > >>>>>>>>>> analyzed
> > > >>>>>>>>>> by llvm-mca. We solve
this problem by introducing two
> > > >>>> intrinsics
> > > >>>>>>> that a
> > > >>>>>>>>>> user can
> > > >>>>>>>>>> specify, for identifying
regions of code for analysis.
> > > >>>>>>>>>>
> > > >>>>>>>>>> The two intrinsics are:
llvm.mca.code.regions.start and
> > > >>>>>>>>>>
llvm.mca.code.regions.end. A user can identify a code region by
> > > >>>>>>> inserting
> > > >>>>>>>>>> the
> > > >>>>>>>>>> mca_code_region_start
and mca_code_region_end markers. These
> > > >>>> are
> > > >>>>>>> simply
> > > >>>>>>>>>> clang builtins and are
transformed into the aforementioned
> > > >>>>>>> intrinsics
> > > >>>>>>>>>> during
> > > >>>>>>>>>> compilation. The code
between the intrinsics are what we call
> > > >>>> "code
> > > >>>>>>>>>> regions"
> > > >>>>>>>>>> and are to be easily
identifiable by llvm-mca; any code
> > > >>>> between a
> > > >>>>>>> start/end
> > > >>>>>>>>>> pair can be analyzed by
llvm-mca at a later time. A user can
> > > >>>> define
> > > >>>>>>>>>> multiple
> > > >>>>>>>>>> non-overlapping code
regions within their program.
> > > >>>>>>>>>>
> > > >>>>>>>>>> The
llvm.mca.code.region.start intrinsic takes an integer
> > > >>>> constant
> > > >>>>>>> as its
> > > >>>>>>>>>> only
> > > >>>>>>>>>> argument. This argument
is implemented as a metadata i32, and
> > > >>>> is
> > > >>>>>>> only used
> > > >>>>>>>>>> when generating llvm-mca
reports. This value allows a user to
> > > >>>> more
> > > >>>>>>> easily
> > > >>>>>>>>>> identify a specific code
region. llvm.mca.code.region.end
> > > >>>> takes no
> > > >>>>>>>>>> arguments.
> > > >>>>>>>>>> Since we disallow
nesting of regions, the first 'end' intrinsic
> > > >>>>>>> lexically
> > > >>>>>>>>>> following a
'start' intrinsic represents the end of that code
> > > >>>>>>> region.
> > > >>>>>>>>>> Now that we have a
solution for identifying regions for
> > > >>>> analysis,
> > > >>>>>>> we now
> > > >>>>>>>>>> need a
> > > >>>>>>>>>> way for preserving that
information to be read at a later
> > > >>>> time. To
> > > >>>>>>>>>> accomplish
> > > >>>>>>>>>> this we propose adding a
new section (.mca_code_regions) to the
> > > >>>>>>> object file
> > > >>>>>>>>>> generated by llvm.
During code generation, the start/end
> > > >>>> intrinsics
> > > >>>>>>>>>> described
> > > >>>>>>>>>> above will be
transformed into start/end labels in assembly.
> > > >>>> When
> > > >>>>>>> llvm
> > > >>>>>>>>>> generates the object
file from the user's code, these start/end
> > > >>>>>>> labels
> > > >>>>>>>>>> form a
> > > >>>>>>>>>> pair of values
identifying the start of the user's code
> > > >>>> region, and
> > > >>>>>>> size.
> > > >>>>>>>>>> The
> > > >>>>>>>>>> size represents the
number of bytes between the start and end
> > > >>>>>>> address of
> > > >>>>>>>>>> the
> > > >>>>>>>>>> labels. Note that the
labels are emitted during assembly
> > > >>>> printing.
> > > >>>>>>> We hope
> > > >>>>>>>>>> that these labels have
no influence on code generation or
> > > >>>>>>> basic-block
> > > >>>>>>>>>> placement. However, the
target assembler strategy for handling
> > > >>>>>>> labels is
> > > >>>>>>>>>> outside of our control.
> > > >>>>>>>>>>
> > > >>>>>>>>>> This proposed change
affects the size of a binary, but only if
> > > >>>> the
> > > >>>>>>> user
> > > >>>>>>>>>> calls
> > > >>>>>>>>>> the start/end builtins
mentioned above. The additional size of
> > > >>>> the
> > > >>>>>>>>>> .mca_code_regions
section, which we imagine to be very small
> > > >>>> (to
> > > >>>>>>> the order
> > > >>>>>>>>>> of a
> > > >>>>>>>>>> few bytes), can
trivially be stripped by tools like 'strip' or
> > > >>>>>>> 'objcopy'.
> > > >>>>>>>>>> Implementation Status
> > > >>>>>>>>>>
------------------------------
> > > >>>>>>>>>> We currently have the
proposed changes implemented at the url
> > > >>>>>>> posted below.
> > > >>>>>>>>>> This initial patch only
targets ELF object files, and does not
> > > >>>>>>> handle
> > > >>>>>>>>>> relocatable addresses.
Since the start of a code region is
> > > >>>>>>> represented as
> > > >>>>>>>>>> an
> > > >>>>>>>>>> assembly label, and
referenced in the .mca_code_regions
> > > >>>> section,
> > > >>>>>>> that
> > > >>>>>>>>>> address
> > > >>>>>>>>>> is relocatable. That
value can be represented as
> > > >>>> section-relative
> > > >>>>>>>>>> relocatable
> > > >>>>>>>>>> symbol (.text + addend),
but we are not handling that case yet.
> > > >>>>>>> Instead,
> > > >>>>>>>>>> the
> > > >>>>>>>>>> proposed changes only
handle linked/executable object files.
> > > >>>>>>>>>>
> > > >>>>>>>>>> For purposes of review
and to communicate the idea, the change
> > > >>>> is
> > > >>>>>>>>>> presented as a
monolithic patch here:
> > > >>>>>>>>>>
> > > >>>>>>>>>>
https://reviews.llvm.org/D54603
> > > >>>>>>>>>>
> > > >>>>>>>>>> The change is presented
as a monolithic patch; however, if
> > > >>>> accepted
> > > >>>>>>>>>> the patch will be split
into three smaller patches:
> > > >>>>>>>>>> 1. The introduction of
the builtins to clang.
> > > >>>>>>>>>> 2. The llvm portion (the
added intrinsics).
> > > >>>>>>>>>> 3. The llvm-mca portion.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Thanks!
> > > >>>>>>>>>>
> > > >>>>>>>>>> -Matt
> > > >>>>>>>>>>
_______________________________________________
> > > >>>>>>>>>> LLVM Developers mailing
list
> > > >>>>>>>>>> llvm-dev at
lists.llvm.org
> > > >>>>>>>>>>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > > >>>>>>>>>>
> > > >>>>>>>>
_______________________________________________
> > > >>>>>>>> LLVM Developers mailing list
> > > >>>>>>>> llvm-dev at lists.llvm.org
> > > >>>>>>>>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

via llvm-dev

2018-Dec-18 04:37 UTC

head link

[llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

Hi Andrea,

I’ve updated the patch here:
https://reviews.llvm.org/D54603/new/

I have not modified the description in the Phabricator.  Of course, if this
solution is the way-forward, then I will be happy to update the change
description.
I also modified the format of the symbol names (from what I previously describe
in my last response below), to save room: 
“.mca_code_region.<id>.<number>”.  Where “id” is the user’s
specified identifier they want associated to the region (cosmetic), and “number”
is a module unique number for the region.  I have tested this with
non-executable objects, executable files, and multiple objects linking into a
single executable.

Also, I am trimming this thread up (sorry, but it’s starting to get unwieldy and
I think the mailing list bot is grumpy).

-Matt


From: Andrea Di Biagio <andrea.dibiagio at gmail.com>
Sent: Monday, December 17, 2018 10:12 AM
To: Davis, Matthew <Matthew.Davis at sony.com>
Cc: Simon Pilgrim <llvm-dev at redking.me.uk>; llvm-dev <llvm-dev at
lists.llvm.org>; cfe-dev at lists.llvm.org
Subject: Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

Hi Matt,

could you please update the associated patch too?

Thanks
-Andrea

On Mon, Dec 17, 2018 at 5:48 PM via llvm-dev <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:
Adding the llvm-dev list, because my email client decides to remove certain
lists when I reply-all... including the list that I intend to respond to.
> -----Original Message-----
> From: Davis, Matthew <Matthew.Davis at sony.com<mailto:Matthew.Davis
at sony.com>>
> Sent: Monday, December 17, 2018 9:47 AM
> To: Davis, Matthew <Matthew.Davis at sony.com<mailto:Matthew.Davis at
sony.com>>; llvm-dev at redking.me.uk<mailto:llvm-dev at
redking.me.uk>
> Cc: cfe-dev at lists.llvm.org<mailto:cfe-dev at lists.llvm.org>
> Subject: RE: [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.
>
> Just an update to this RFC.  (I thought this was going to be a short
email... my
> apologies).
>
> One of the primary limitations described in this RFC (earlier in this
thread), is that the
> patch for this RFC only handles linked executables.  This restriction is
due to myself
> wanting to avoid handling relocations in a target specific manner, at least
in the
> initial patch.  Especially so, since I want to keep the initial patch
simple.  However,
> sometimes simple and practical are at odds with each other.  I envision the
main use-
> case of llvm-mca, with binary support, is for analyzing .o files.  Probably
analyzing .o
> more commonly than fully linked executables.  Without handling relocated
objects,
> this patch seems rather useless.
>
> I started exploring an alternative solution this weekend to the
aforementioned
> problem.  This alternative solution avoids having to handle relocations,
but does give
> us support for object files (with relocated symbols) and linked
executables.  The
> change is quite simple, and seems to be effective.  In short, we still
generate
> intrinsics as discussed in the RFC, one to mark the start of a code region,
and
> another to mark the end.  These intrinsics get lowered into local symbols. 
The
> symbols are already  encoded with address information about their position
in the
> object file.  What is different is that we ensure that these symbols have
unique
> names and also encode the user provided ID value.
>
> Previously the labels  were named like:
.Lmca_code_region_start_<number>, and
> similar for .Lmca_code_region_end.  The user id number and region size were
> encoded in the .mca_code_regions object file section.  Previously mca never
looked
> at the symbol table.  But, In reality we can calculate the region size by
using the
> symbols in the symbol table (look for the mca symbols), instead of relying
on the
> information encoded in .mca_code_regions.  The alternative approach gets
rid of
> that section entirely but achieves the same functionality by encoding the
information
> in the symbol name. In short the alternative approach just parses the
symbol table
> for MCA symbols, and the symbol names are encoded with the data we need.
>
> The newly proposed name is formatted
> as: .Lmca_code_region_start.<id>.<function>.<number>. 
Similar for
> mca_code_region_end.  'function' is the function that the marker
appears in, 'ID' is
> the user-specified ID (this is a value that users specify for easily
identifying the code
> region under analysis... just cosmetic), and 'number' is a unique
number to avoid any
> duplicate name conflicts.  The benefit of this alternative solution is that
we can get
> rid of .mca_code_regions, and gather all of the information llvm-mca needs
by
> parsing the symbol table looking for any symbols with the
'mca_code_region_start'
> and 'mca_code_region_end' format discussed above.  Of course, if
the string table is
> stripped, then we will lose this data.  The main drawback from this
alternative
> approach is that it relies on encoding symbol names and string processing
on those
> names.   I'm somewhat biased against doing string parsing, but the code
to perform
> this is simple and small, and more importantly it allows llvm-mca to handle
linked or
> relocated object files.
>
> -Matt
>
>
>
> > -----Original Message-----
> > From: llvm-dev <llvm-dev-bounces at
lists.llvm.org<mailto:llvm-dev-bounces at lists.llvm.org>> On Behalf Of
via llvm-dev
> > Sent: Tuesday, December 11, 2018 11:23 AM
> > To: llvm-dev at redking.me.uk<mailto:llvm-dev at redking.me.uk>;
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
> > Cc: cfe-dev at lists.llvm.org<mailto:cfe-dev at lists.llvm.org>
> > Subject: Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to
llvm-mca.
> >
> > Thanks for the response Simon.  My reply is inline:
> >
> > > From: Simon Pilgrim <llvm-dev at
redking.me.uk<mailto:llvm-dev at redking.me.uk>>
> > > Sent: Monday, December 10, 2018 1:40 PM
> > >
> > > Hi Matt,
> > >
> > > I can see a near future where perf-analysis tooling uses branch
history
> > > profiler captures to determine how often loops/branches are taken
and
> > > feeds that into llvm-mca, especially for hot/branchy loop
analysis
> > > reports etc. Are you confident that your approach will be easily
> > > extendable for this?
> >
> > That is a very interesting use case.  The restriction of a code-region
to a single
> block
> > is a limitation for any tools that want to analyze branches.  However,
I believe
> > that it will be easy to lift this restriction (it's just a check
in IR/Verifier).  This
> > limitation is not
> > expressed in the llvm-mca driver.
> >
> > If the information is coming from a profile report, then we'd most
> > likely need to extend the llvm-mca driver to accept profile reports. 
Currently,
> > code regions, from the perspective of the  llvm-mca driver, are very
simple. They
> are
> > just a collection of MCInst.  The binary support in this RFC+patch
> > disassembles just the address range from marker start address for
> > some specified number of bytes.  It might be useful to add another
driver argument
> > so that a user (or tool) can specify, from the command line, a range
of instructions
> > to
> > analyze.  I recently added a class for handling inputs to
> > llvm::mca::CodeRegionGenerator,
> > which is just responsible for taking some input and creating a list of
MCInst that
> llvm-
> > mca uses.
> > We could subclass this to handle profile reports.
> >
> > > Similarly, being able to generally embed the profile markers in
object
> > > libraries for reuse is going to be important for some people -
I'd like
> > > to see more of a plan of how this will be achieved. I understand
that it
> > > might not be easy for some exe formats.
> >
> > That is definitely a limitation.  This initial patch+RFC only handles
linked
> > executables (i.e., the llvm-mca marker symbol addresses are resolved).
> > I'm working on a better solution so that this will not be a
restriction.
> > In fact, I'll probably delay trying to land any patches until I
solve relocations
> > (or use a different solution for identifying start/end addresses for
llvm-mca code
> > regions).
> >
> > > Sorry if I'm being too critical, but I'm a bit worried
that we end up
> > > with an initial implementation that will take a lot of reworking
to meet
> > > our final aims.
> > >
> > > Thanks, Simon.
> >
> > I understand your criticisms and value your input. Thanks a ton!
> >
> > -Matt,-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20181218/f4859425/attachment-0001.html>

via llvm-dev

2019-Jan-02 20:09 UTC

head link

[llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

Forgot to cc the dev list (my email client seems to drop certain lists when
reply-all is selected).

From: Davis, Matthew <Matthew.Davis at sony.com>
Sent: Wednesday, January 2, 2019 12:08 PM
To: Davis, Matthew <Matthew.Davis at sony.com>; andrea.dibiagio at
gmail.com
Cc: cfe-dev at lists.llvm.org; Simon Pilgrim <llvm-dev at redking.me.uk>
Subject: RE: [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

I wanted to write a response to describe the current state of this RFC and
accompanying patch.  I’ll _try_ not to make this a wall of text.

A bit deeper in the thread we discussed a few limitations of the patch:

1)     The patch only worked on linked objects (relocations are resolved).

2)     The patch limited code regions to a single basic block.

a.      This was a decision made because llvm-mca does not ‘follow’ branch
instructions.

The most recent version of this patch (December 17th (minor clean up on the
23rd)) addresses #1.  This means that code-regions can be defined in an object
file (unlinked) and can be analyzed by llvm-mca. The solution implemented relies
solely on the symbol table.  llvm-mca can locate each code region based on
address values specified by the symbol table, but it means that we have to name
the start and end symbols in a way such that llvm-mca can locate them in the
symbol table.  The format is discussed earlier in this thread, but it relies on
llvm-mca scanning the symbol table looking for symbols with the name
“.mca_code_region_{start,end}.” format.

#2 Is trickier to solve, in the sense that there’s no right way.  I removed the
error checker (IR/Verifier.cpp) that rejected programs if a code region spans
multiple blocks.  In short, code regions are just labels (start,end) and llvm
does not verify their depth (if they span multiple blocks).  Instead, llvm now
just checks that regions are terminated and that a regions are not nested (which
the patch has always done).   This means that llvm-mca will handle regions that
span multiple blocks, as it currently does if the user would have manually
instrumented the assembly input with #LLVM-MCA-BEGIN comments that encapsulate
instructions that span multiple blocks.  I hope that this is a reasonable
solution?

In short I think we have addressed #1, and possibly #2.  By lifting the
restriction to #2 (removing the IR/Verifier check), we now put the
responsibility for rejecting regions to llvm-mca (in the case of regions that
span multiple basic blocks).  In other words, when it comes to multiple blocks, 
this patch maintains the same semantics as the LLVM-MCA-BEGIN comments a user
can add to their assembly.  When branch handling is implemented in llvm-mca, we
would want to lift this restriction in the complier anyways.

https://reviews.llvm.org/D54603/new/

-Matt

From: llvm-dev <llvm-dev-bounces at lists.llvm.org<mailto:llvm-dev-bounces
at lists.llvm.org>> On Behalf Of via llvm-dev
Sent: Monday, December 17, 2018 8:38 PM
To: andrea.dibiagio at gmail.com<mailto:andrea.dibiagio at gmail.com>
Cc: llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>; cfe-dev
at lists.llvm.org<mailto:cfe-dev at lists.llvm.org>
Subject: Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

Hi Andrea,

I’ve updated the patch here:
https://reviews.llvm.org/D54603/new/

I have not modified the description in the Phabricator.  Of course, if this
solution is the way-forward, then I will be happy to update the change
description.
I also modified the format of the symbol names (from what I previously describe
in my last response below), to save room: 
“.mca_code_region.<id>.<number>”.  Where “id” is the user’s
specified identifier they want associated to the region (cosmetic), and “number”
is a module unique number for the region.  I have tested this with
non-executable objects, executable files, and multiple objects linking into a
single executable.

Also, I am trimming this thread up (sorry, but it’s starting to get unwieldy and
I think the mailing list bot is grumpy).

-Matt

From: Andrea Di Biagio <andrea.dibiagio at
gmail.com<mailto:andrea.dibiagio at gmail.com>>
Sent: Monday, December 17, 2018 10:12 AM
To: Davis, Matthew <Matthew.Davis at sony.com<mailto:Matthew.Davis at
sony.com>>
Cc: Simon Pilgrim <llvm-dev at redking.me.uk<mailto:llvm-dev at
redking.me.uk>>; llvm-dev <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>; cfe-dev at
lists.llvm.org<mailto:cfe-dev at lists.llvm.org>
Subject: Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

Hi Matt,

could you please update the associated patch too?

Thanks
-Andrea

On Mon, Dec 17, 2018 at 5:48 PM via llvm-dev <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:
Adding the llvm-dev list, because my email client decides to remove certain
lists when I reply-all... including the list that I intend to respond to.
> -----Original Message-----
> From: Davis, Matthew <Matthew.Davis at sony.com<mailto:Matthew.Davis
at sony.com>>
> Sent: Monday, December 17, 2018 9:47 AM
> To: Davis, Matthew <Matthew.Davis at sony.com<mailto:Matthew.Davis at
sony.com>>; llvm-dev at redking.me.uk<mailto:llvm-dev at
redking.me.uk>
> Cc: cfe-dev at lists.llvm.org<mailto:cfe-dev at lists.llvm.org>
> Subject: RE: [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.
>
> Just an update to this RFC.  (I thought this was going to be a short
email... my
> apologies).
>
> One of the primary limitations described in this RFC (earlier in this
thread), is that the
> patch for this RFC only handles linked executables.  This restriction is
due to myself
> wanting to avoid handling relocations in a target specific manner, at least
in the
> initial patch.  Especially so, since I want to keep the initial patch
simple.  However,
> sometimes simple and practical are at odds with each other.  I envision the
main use-
> case of llvm-mca, with binary support, is for analyzing .o files.  Probably
analyzing .o
> more commonly than fully linked executables.  Without handling relocated
objects,
> this patch seems rather useless.
>
> I started exploring an alternative solution this weekend to the
aforementioned
> problem.  This alternative solution avoids having to handle relocations,
but does give
> us support for object files (with relocated symbols) and linked
executables.  The
> change is quite simple, and seems to be effective.  In short, we still
generate
> intrinsics as discussed in the RFC, one to mark the start of a code region,
and
> another to mark the end.  These intrinsics get lowered into local symbols. 
The
> symbols are already  encoded with address information about their position
in the
> object file.  What is different is that we ensure that these symbols have
unique
> names and also encode the user provided ID value.
>
> Previously the labels  were named like:
.Lmca_code_region_start_<number>, and
> similar for .Lmca_code_region_end.  The user id number and region size were
> encoded in the .mca_code_regions object file section.  Previously mca never
looked
> at the symbol table.  But, In reality we can calculate the region size by
using the
> symbols in the symbol table (look for the mca symbols), instead of relying
on the
> information encoded in .mca_code_regions.  The alternative approach gets
rid of
> that section entirely but achieves the same functionality by encoding the
information
> in the symbol name. In short the alternative approach just parses the
symbol table
> for MCA symbols, and the symbol names are encoded with the data we need.
>
> The newly proposed name is formatted
> as: .Lmca_code_region_start.<id>.<function>.<number>. 
Similar for
> mca_code_region_end.  'function' is the function that the marker
appears in, 'ID' is
> the user-specified ID (this is a value that users specify for easily
identifying the code
> region under analysis... just cosmetic), and 'number' is a unique
number to avoid any
> duplicate name conflicts.  The benefit of this alternative solution is that
we can get
> rid of .mca_code_regions, and gather all of the information llvm-mca needs
by
> parsing the symbol table looking for any symbols with the
'mca_code_region_start'
> and 'mca_code_region_end' format discussed above.  Of course, if
the string table is
> stripped, then we will lose this data.  The main drawback from this
alternative
> approach is that it relies on encoding symbol names and string processing
on those
> names.   I'm somewhat biased against doing string parsing, but the code
to perform
> this is simple and small, and more importantly it allows llvm-mca to handle
linked or
> relocated object files.
>
> -Matt
>
>
>
> > -----Original Message-----
> > From: llvm-dev <llvm-dev-bounces at
lists.llvm.org<mailto:llvm-dev-bounces at lists.llvm.org>> On Behalf Of
via llvm-dev
> > Sent: Tuesday, December 11, 2018 11:23 AM
> > To: llvm-dev at redking.me.uk<mailto:llvm-dev at redking.me.uk>;
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
> > Cc: cfe-dev at lists.llvm.org<mailto:cfe-dev at lists.llvm.org>
> > Subject: Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to
llvm-mca.
> >
> > Thanks for the response Simon.  My reply is inline:
> >
> > > From: Simon Pilgrim <llvm-dev at
redking.me.uk<mailto:llvm-dev at redking.me.uk>>
> > > Sent: Monday, December 10, 2018 1:40 PM
> > >
> > > Hi Matt,
> > >
> > > I can see a near future where perf-analysis tooling uses branch
history
> > > profiler captures to determine how often loops/branches are taken
and
> > > feeds that into llvm-mca, especially for hot/branchy loop
analysis
> > > reports etc. Are you confident that your approach will be easily
> > > extendable for this?
> >
> > That is a very interesting use case.  The restriction of a code-region
to a single
> block
> > is a limitation for any tools that want to analyze branches.  However,
I believe
> > that it will be easy to lift this restriction (it's just a check
in IR/Verifier).  This
> > limitation is not
> > expressed in the llvm-mca driver.
> >
> > If the information is coming from a profile report, then we'd most
> > likely need to extend the llvm-mca driver to accept profile reports. 
Currently,
> > code regions, from the perspective of the  llvm-mca driver, are very
simple. They
> are
> > just a collection of MCInst.  The binary support in this RFC+patch
> > disassembles just the address range from marker start address for
> > some specified number of bytes.  It might be useful to add another
driver argument
> > so that a user (or tool) can specify, from the command line, a range
of instructions
> > to
> > analyze.  I recently added a class for handling inputs to
> > llvm::mca::CodeRegionGenerator,
> > which is just responsible for taking some input and creating a list of
MCInst that
> llvm-
> > mca uses.
> > We could subclass this to handle profile reports.
> >
> > > Similarly, being able to generally embed the profile markers in
object
> > > libraries for reuse is going to be important for some people -
I'd like
> > > to see more of a plan of how this will be achieved. I understand
that it
> > > might not be easy for some exe formats.
> >
> > That is definitely a limitation.  This initial patch+RFC only handles
linked
> > executables (i.e., the llvm-mca marker symbol addresses are resolved).
> > I'm working on a better solution so that this will not be a
restriction.
> > In fact, I'll probably delay trying to land any patches until I
solve relocations
> > (or use a different solution for identifying start/end addresses for
llvm-mca code
> > regions).
> >
> > > Sorry if I'm being too critical, but I'm a bit worried
that we end up
> > > with an initial implementation that will take a lot of reworking
to meet
> > > our final aims.
> > >
> > > Thanks, Simon.
> >
> > I understand your criticisms and value your input. Thanks a ton!
> >
> > -Matt,-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190102/a539f803/attachment-0001.html>

llvm dev - Dec 2018 - [RFC][llvm-mca] Adding binary support to llvm-mca.

[llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

[llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

[llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

[llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

[llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.