via llvm-dev
2018-Nov-15  23:06 UTC
[llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.
Introduction
-----------------
Currently llvm-mca only accepts assembly code as input. We would like to
extend llvm-mca to support object files, allowing users to analyze the
performance of binaries. The proposed changes (which involve both
clang and llvm) optionally introduce an object file section, but this can be
stripped out if desired.
For the llvm-mca binary support feature to be useful, a user needs to tell
llvm-mca which portions of their code they would like analyzed. Currently,
this is accomplished via assembly comments. However, assembly comments are not
preserved in object files, which is what motivated this RFC. For the proposed
binary support, we need to introduce changes to clang and llvm to allow the
user's object code to be recognized by llvm-mca:
* We need a way for a user to identify a region/block of code they want
   analyzed by llvm-mca.
* We need the information defining the user's region of code to be maintained
   in the object file so that llvm-mca can analyze the desired region(s) from
   the object file.
We define a "code region" as a subset of a user's program that is to be
analyzed via llvm-mca. The sequence of instructions to be analyzed is
represented as a pair: <start, end> where the 'start' marks the beginning of
the user's source code and 'end' terminates the sequence. The instructions
between 'start' and 'end' form the region that can be analyzed by llvm-mca at
a later time.
Example
-----------
Before we go into the details of this proposed change, let's first look at a
simple example:
// example.c -- Analyze a dot-product expression.
double test(double x, double y) {
   double result = 0.0;
   __mca_code_region_start(42);
   result += x * y;
   __mca_code_region_end();
   return result;
}
In the example above, we have identified a code region, in this case a single
dot-product expression. For the sake of brevity and simplicity, we've chosen
a very simple example, but in reality a more complicated example could use
multiple expressions. We have also denoted this region as number 42. That
identifier is only for the user, and simplifies reading an llvm-mca analysis
report later.
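Since the markers only exist in a compiler carrying the proposed patch, a user
may want source that still builds elsewhere. The following is a minimal sketch
of one way to do that. The MCA_REGION_START/MCA_REGION_END macro names and the
function name are hypothetical, not part of the patch, and the sketch assumes
the proposed builtins would be visible to __has_builtin:

```c
/* Hypothetical wrapper macros (not part of the proposed patch).
 * They expand to the proposed builtins when the compiler reports them,
 * and to no-ops everywhere else. */
#ifdef __has_builtin
#  if __has_builtin(__mca_code_region_start) && \
      __has_builtin(__mca_code_region_end)
#    define MCA_REGION_START(id) __mca_code_region_start(id)
#    define MCA_REGION_END()     __mca_code_region_end()
#  endif
#endif

#ifndef MCA_REGION_START
#  define MCA_REGION_START(id) ((void)0) /* fallback: marker compiles away */
#  define MCA_REGION_END()     ((void)0)
#endif

/* Same computation as example.c, with the region marked via the macros. */
double mca_marked_product(double x, double y) {
    double result = 0.0;
    MCA_REGION_START(42);
    result += x * y;
    MCA_REGION_END();
    return result;
}
```

With a compiler that lacks the builtins the function behaves exactly as if the
markers were absent, which matches the intent that the markers are no-ops.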
When this code is compiled, the region markers (the mca_code_region markers)
are transformed into assembly labels. While the markers are presented as
function calls, in reality they are no-ops.
test:
pushq	%rbp
movq	%rsp, %rbp
movsd	%xmm0, -8(%rbp)
movsd	%xmm1, -16(%rbp)
.Lmca_code_region_start_0: # LLVM-MCA-START ID: 42
xorps	%xmm0, %xmm0
movsd	%xmm0, -24(%rbp)
movsd	-8(%rbp), %xmm0
mulsd	-16(%rbp), %xmm0
addsd	-24(%rbp), %xmm0
movsd	%xmm0, -24(%rbp)
.Lmca_code_region_end_0: # LLVM-MCA-END ID: 42
movsd	-24(%rbp), %xmm0
popq	%rbp
retq
.section	.mca_code_regions,"",@progbits
.quad	42
.quad	.Lmca_code_region_start_0
.quad	.Lmca_code_region_end_0-.Lmca_code_region_start_0
The assembly has been trimmed to show the portions relevant to this RFC.
Notice the labels enclose the user's defined region, and that they preserve
the user's arbitrary region identifier, the ever-so-important region 42.
In the object file section .mca_code_regions, we have noted the user's region
identifier (.quad 42), start address, and region size. A more complicated
example can have multiple regions defined within a single .mca_code_regions
section. This section can be read by llvm-mca, allowing llvm-mca to take
object files as input instead of assembly source.
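To make the section layout concrete, here is a minimal sketch of how a reader
of this section might decode it. It assumes what the example suggests but the
RFC does not state outright: that the section is a packed array of records,
each made of three little-endian 8-byte values (identifier, start address,
size), with no header. The type and function names are hypothetical and are
not taken from the patch:

```c
#include <stddef.h>
#include <stdint.h>

/* One record as laid out in the example: three consecutive .quad values. */
typedef struct {
    uint64_t id;    /* user-chosen region identifier, e.g. 42 */
    uint64_t start; /* address of the start label */
    uint64_t size;  /* end label minus start label, in bytes */
} mca_region;

enum { MCA_RECORD_BYTES = 24 }; /* 3 x 8 bytes, assumed packed */

/* Read a little-endian 64-bit value from p. */
uint64_t mca_read_le64(const unsigned char *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i)
        v = (v << 8) | p[i];
    return v;
}

/* Decode at most max_out records from the raw section bytes.
 * Returns the number of records written to out, or -1 if the section
 * length is not a whole number of records. */
long mca_decode_regions(const unsigned char *data, size_t len,
                        mca_region *out, size_t max_out) {
    if (len % MCA_RECORD_BYTES != 0)
        return -1;
    size_t n = len / MCA_RECORD_BYTES;
    if (n > max_out)
        n = max_out;
    for (size_t i = 0; i < n; ++i) {
        const unsigned char *rec = data + i * MCA_RECORD_BYTES;
        out[i].id    = mca_read_le64(rec);
        out[i].start = mca_read_le64(rec + 8);
        out[i].size  = mca_read_le64(rec + 16);
    }
    return (long)n;
}
```

Under these assumptions each region costs 24 bytes in the section, and a
section holding several regions is simply several such records back to back.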
Details
---------
We need a way for a user to identify a region/block of code they want analyzed
by llvm-mca. We solve this problem by introducing two intrinsics that a user can
specify, for identifying regions of code for analysis.
The two intrinsics are: llvm.mca.code.region.start and
llvm.mca.code.region.end. A user can identify a code region by inserting the
__mca_code_region_start and __mca_code_region_end markers. These are simply
clang builtins and are transformed into the aforementioned intrinsics during
compilation. The code between the intrinsics is what we call a "code region"
and is to be easily identifiable by llvm-mca; any code between a start/end
pair can be analyzed by llvm-mca at a later time. A user can define multiple
non-overlapping code regions within their program.
The llvm.mca.code.region.start intrinsic takes an integer constant as its only
argument. This argument is implemented as a metadata i32, and is only used
when generating llvm-mca reports. This value allows a user to more easily
identify a specific code region. llvm.mca.code.region.end takes no arguments.
Since we disallow nesting of regions, the first 'end' intrinsic lexically
following a 'start' intrinsic represents the end of that code region.
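The pairing rule can be illustrated with a small sketch. It walks a list of
markers in lexical order, pairs each 'start' with the first 'end' that
follows it, and rejects nested or unmatched markers. The types and the
function name are hypothetical; this is not how the patch itself is
implemented, only an illustration of the rule as described:

```c
#include <stddef.h>

typedef enum { MCA_START, MCA_END } mca_marker_kind;

typedef struct {
    mca_marker_kind kind;
    int id;     /* meaningful only for MCA_START */
    size_t pos; /* lexical position of the marker */
} mca_marker;

typedef struct {
    int id;       /* identifier taken from the start marker */
    size_t begin; /* position of the start marker */
    size_t end;   /* position of the matching end marker */
} mca_pair;

/* Pair markers given in lexical order. Returns the number of pairs written
 * to out, or -1 on a nested start, an end without a start, a start that is
 * never closed, or when out is too small. */
long mca_pair_markers(const mca_marker *m, size_t n,
                      mca_pair *out, size_t max_out) {
    long count = 0;
    int open = 0;
    mca_pair cur = {0, 0, 0};
    for (size_t i = 0; i < n; ++i) {
        if (m[i].kind == MCA_START) {
            if (open)
                return -1; /* nesting is disallowed */
            cur.id = m[i].id;
            cur.begin = m[i].pos;
            open = 1;
        } else {
            if (!open)
                return -1; /* end without a preceding start */
            if ((size_t)count == max_out)
                return -1; /* caller's buffer is full */
            cur.end = m[i].pos;
            out[count++] = cur;
            open = 0;
        }
    }
    return open ? -1 : count;
}
```

Because 'end' carries no identifier, the region's identifier always comes from
the 'start' marker that opened it.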
Now that we have a way to identify regions for analysis, we need a way to
preserve that information so it can be read at a later time. To accomplish
this we propose adding a new section (.mca_code_regions) to the object file
generated by llvm. During code generation, the start/end intrinsics described
above will be transformed into start/end labels in assembly. When llvm
generates the object file from the user's code, these start/end labels form a
pair of values identifying the start of the user's code region and its size.
The size represents the number of bytes between the start and end addresses
of the labels. Note that the labels are emitted during assembly printing. We
hope that these labels have no influence on code generation or basic-block
placement. However, the target assembler strategy for handling labels is
outside of our control.
This proposed change affects the size of a binary, but only if the user calls
the start/end builtins mentioned above. The additional size of the
.mca_code_regions section, which we imagine to be very small (on the order of
a few bytes), can trivially be stripped by tools like 'strip' or 'objcopy'.
Implementation Status
------------------------------
We currently have the proposed changes implemented at the URL posted below.
This initial patch only targets ELF object files, and does not handle
relocatable addresses. Since the start of a code region is represented as an
assembly label, and referenced in the .mca_code_regions section, that address
is relocatable. That value can be represented as section-relative relocatable
symbol (.text + addend), but we are not handling that case yet. Instead, the
proposed changes only handle linked/executable object files.
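For the relocatable case that is not yet handled, the arithmetic involved is
small. The sketch below assumes the start value is stored as an addend
relative to the start of .text, and that the reader knows the address and
size of .text. The function name and parameters are hypothetical and do not
come from the patch:

```c
#include <stdint.h>

/* Resolve a section-relative region start (.text + addend).
 * text_base and text_size describe the .text section; addend and
 * region_size come from the region record. Returns 1 and stores the
 * resolved start in *out when the whole region lies inside the section,
 * otherwise returns 0 and leaves *out untouched. */
int mca_resolve_start(uint64_t text_base, uint64_t text_size,
                      uint64_t addend, uint64_t region_size,
                      uint64_t *out) {
    /* Written to avoid overflow in addend + region_size. */
    if (addend > text_size || region_size > text_size - addend)
        return 0;
    *out = text_base + addend;
    return 1;
}
```

The bounds check is a design choice of this sketch: it rejects a record whose
region would extend past the end of the section instead of reading outside it.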
For purposes of review and to communicate the idea, the change is
presented as a monolithic patch here:
https://reviews.llvm.org/D54603
If accepted, the patch will be split into three smaller patches:
1. The introduction of the builtins to clang.
2. The llvm portion (the added intrinsics).
3. The llvm-mca portion.
Thanks!
-Matt
Andrea Di Biagio via llvm-dev
2018-Nov-21  12:43 UTC
[llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.
I would really like to add this feature to llvm-mca. I have spoken off-line
with Matt about this feature multiple times. I am happy with the suggested
approach. Matt prototyped it, and it seems to work okay for us.

However, it would be really nice to get feedback from somebody else (not
necessarily people involved in the llvm-mca project). For example, I am
interested in what people think about the whole design (i.e. the idea of
introducing two new intrinsics, and generating the information in a separate
section of the binary object file).

About the suggested design:
I like the idea of being able to identify code regions using a numeric
identifier. However, what happens if a code region spans multiple basic
blocks?

My understanding is that code regions are not allowed to overlap. So, it
makes sense if `__mca_code_region_end()` doesn't take an ID as input.
However, what if `__mca_code_region_end()` ends in a different basic block?

`__mca_code_region_start()` has to always dominate
`__mca_code_region_end()`. This is trivial to verify when both calls are in
the same basic block; however, we need to make sure that the relationship is
still the same when the `end()` call is in a different basic block. That
would not be enough. I think we should also verify that
`__mca_code_region_end()` always post-dominates the call to
`__mca_code_region_start()`.

My question is: what happens with basic block reordering? We don't know the
layout of basic blocks until we reach code emission. How does it work for
regions that span multiple basic blocks? I think your RFC should clarify this
aspect.

As a side note: at the moment, llvm-mca doesn't know how to deal with
branches. So, for simplicity we could force code regions to only contain
instructions from a single basic block.

However, in the future we may want to teach llvm-mca how to analyze branchy
code too. For example, we could introduce a simple control-flow analysis in
llvm-mca, and use external "branch trace" information (for example, a perf
trace generated by an external tool) to decorate branches with branch
probabilities (similarly to what we currently do in LLVM with PGO). We could
then use that knowledge to model branch prediction and simulate what happens
in the presence of multiple branches.

So, the idea of having regions that potentially span multiple basic blocks is
not bad in general. However, I think you should better clarify what the
constraints are (at least, you should answer my questions from before).

If we decide to use those new intrinsics, then those should be experimental
(at least to start).
Matt Davis via llvm-dev
2018-Nov-21  16:46 UTC
[llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.
Hi Andrea,

Thanks for your input.

On Wed, Nov 21, 2018 at 12:43:52PM +0000, Andrea Di Biagio wrote:
[... snip ...]
> About the suggested design:
> I like the idea of being able to identify code regions using a numeric
> identifier.
> However, what happens if a code region spans through multiple basic blocks?

The current patch does not take into consideration cases where the region
start and end intrinsics are placed in different basic blocks. Such would be
the case if a region is defined to span multiple blocks. This would be
similar to the current case where a user places a #LLVM-MCA-BEGIN assembly
comment in one block and an #LLVM-MCA-END in another. However, as you point
out below, if the user does this in the source code via intrinsics (just what
this patch is proposing), then there is a chance that optimizations might
change the layout of the instructions and confuse the ordering of the MCA
intrinsics.

Since MCA does not follow branches (MCA just treats a branch as it would a
non-branching instruction), a user should be aware that defining MCA code
regions that span multiple blocks might result in an unexpected analysis.
While we do not discourage this, such a case will probably not produce the
result the user expects. We could introduce a warning, or automatically
divide the regions so that a single region can only contain a single block.

> My understanding is that code regions are not allowed to overlap. So, it
> makes sense if ` __mca_code_region_end()` doesn't take an ID as input.
> However, what if ` __mca_code_region_end()` ends in a different basic block?
>
> `__mca_code_region_start()` has to always dominate `
> __mca_code_region_end()`. This is trivial to verify when both calls are in
> a same basic block; however, we need to make sure that the relationship is
> still the same when the `end()` call is in a different basic block.
> That would not be enough. I think we should also verify that `
> __mca_code_region_end()` always post-dominates the call to
> `__mca_code_region_start()`.

In any case this patch should probably check dominance of the intrinsics,
even though MCA does not follow branches and MCA does not explicitly forbid a
region from containing multiple blocks.

> My question is: what happens with basic block reordering? We don't know the
> layout of basic blocks until we reach code emission. How does it work for
> regions that span through multiple basic blocks?. I think your RFC should
> clarify this aspect.
>
> As a side note: at the moment, llvm-mca doesn't know how to deal with
> branches. So, for simplicity we could force code regions to only contain
> instructions from a single basic block.
>
> However, In future we may want to teach llvm-mca how to analyze branchy
> code too. For example, we could introduce a simple control-flow analysis in
> llvm-mca, and use an external "branch trace" information (for example, a
> perf trace generated by an external tool) to decorate branches with
> branch probabilities (similarly to what we currently do in LLVM with PGO).
> We could then use that knowledge to model branch prediction and simulate
> what happens in the presence of multiple branches.
>
> So, the idea of having regions that potentially span multiple basic blocks
> is not bad in general. However, I think you should better clarify what are
> the constraints (at least, you should answer to my questions from before).

I agree! Thanks for pointing that out.

> If we decide to use those new intrinsics, then those should be experimental
> (at least to start).

Agreed.

-Matt