thr3ads.net - llvm dev - [llvm-dev] RFC: Combining Annotation Metadata and Remarks [Nov 2020]


Florian Hahn via llvm-dev

2020-Nov-04 21:57 UTC

[llvm-dev] RFC: Combining Annotation Metadata and Remarks

Hi,


I would like to propose a new !annotation metadata kind that can be attached to
arbitrary instructions to drive generating remarks that provide additional
insight into transformations applied to a program.

To motivate this, consider these specific questions we would like to get
answered:

* How many stores added for automatic variable initialization remain after
optimizations? Where are they?
* How many runtime checks inserted by a frontend could be eliminated? Where are
the ones that did not get eliminated?

With the current infrastructure we can issue remarks for removed stores, removed
runtime checks or when these instructions are not removed. However, that is far
too noisy. We would like to filter down to stores or checks that originated from
user intent (e.g. auto-init). The new annotation would help to distinguish such
cases.

Asking and answering such questions for large code bases provides metrics to
assess the effectiveness of transformations. It also gives users a way to audit
their code and detect certain problematic patterns. These metrics should also
help with making decisions on a more data-driven basis.

At the moment, the existing statistics counters can cover parts of this
assessment. The main advantage of using remarks to generate the data is
that we can collect data at a much finer granularity, e.g. at the function
level. This can be helpful, for example, to create function-level diffs that
show the impact of certain new transformations. It also allows displaying the
collected data to the user in-line with the source code.


The proposal boils down to three parts.

The first part consists of adding a new metadata kind (!annotation) to allow
tagging of ‘interesting’ instructions. For example, Clang could add this
metadata to auto-init stores it generates, or a library for overflowing math
could instruct Clang to add the metadata to all instructions in its overflow
check functions. !annotation metadata nodes consist of a tuple containing string
nodes indicating the type of the annotation.

Using metadata comes with the drawback that it can be dropped by passes that do
not know how to handle it. This means we will have to adjust existing passes to
ensure the metadata is preserved for interesting instructions, if possible. I
think this is the only intersection of the proposal with existing code in LLVM.
Losing the metadata for instructions is also not the end of the world. The
information is provided on a best-effort basis and I am not aware of a more
robust alternative to achieve the same goals.

Examples:

    store i32 0, i32* %ptr, !annotation !0
    %c = icmp ult i32 %x, %y, !annotation !2

    …

    !0 = !{!1}
    !1 = !{!"auto-init"}
    !2 = !{!3}
    !3 = !{!"overflow-check"}



The second part consists of a pass that runs at the end of the optimization
pipeline and generates remarks for instructions with !annotation metadata. The
remarks could provide summaries of the number of auto-init stores surviving
after optimizations, or the number of overflow checks that could not be
eliminated per function. Additional remarks could also point out where Clang
inserted initialization code that could not be eliminated. Combining this with
function-level code-size diffs should allow users to quickly track down the
origin of code-size regressions caused by auto-init code, for example. Similarly,
remarks for overflow checks could be used to spot weaknesses in LLVM’s reasoning
about such checks. An example of a remark generated from annotation metadata is
shown below:

  Pass: annotation-remarks
  Name: AutoInitSummary
  Function: test2
  Args:
    - String: 'Annotated '
    - count: '2'
    - String: ' instructions with '
    - type: auto-init
  …
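
To make the per-function summary concrete, here is a minimal Python sketch that counts annotated instructions per function and annotation type from textual IR. It is only an illustration of the idea, not the pass from the patch: the helper names (`parse_nodes`, `annotation_types`, `summarize`) and the sample function body are assumptions, and the regex-based parsing only handles simple IR shaped like the example above.

```python
import re
from collections import Counter

# Matches numbered metadata definitions such as `!0 = !{!1}`.
NODE_RE = re.compile(r'^\s*!(\d+)\s*=\s*!\{(.*)\}\s*$')
# Matches an `!annotation !N` attachment on an instruction.
ATTACH_RE = re.compile(r'!annotation\s+!(\d+)')
# Matches the start of a function definition and captures its name.
DEFINE_RE = re.compile(r'^\s*define\b.*@([\w.$-]+)\s*\(')

# Hypothetical input, modelled on the IR example in this thread.
SAMPLE_IR = """
define void @test2(i32* %ptr, i32 %x, i32 %y) {
  store i32 0, i32* %ptr, !annotation !0
  store i32 0, i32* %ptr, !annotation !0
  %c = icmp ult i32 %x, %y, !annotation !2
  ret void
}

!0 = !{!1}
!1 = !{!"auto-init"}
!2 = !{!3}
!3 = !{!"overflow-check"}
"""


def parse_nodes(ir_text):
    """Map each metadata node id to its list of raw operand strings."""
    nodes = {}
    for line in ir_text.splitlines():
        m = NODE_RE.match(line)
        if m:
            ops = [op.strip() for op in m.group(2).split(',') if op.strip()]
            nodes[m.group(1)] = ops
    return nodes


def annotation_types(node_id, nodes):
    """Collect the string operands reachable from a metadata node."""
    found = []
    for op in nodes.get(node_id, []):
        if op.startswith('!"'):
            found.append(op[2:-1])  # strip the leading !" and trailing "
        elif op.startswith('!'):
            found.extend(annotation_types(op[1:], nodes))
    return found


def summarize(ir_text):
    """Return {function name: Counter(annotation type -> instruction count)}."""
    nodes = parse_nodes(ir_text)
    summary = {}
    current = None
    for line in ir_text.splitlines():
        d = DEFINE_RE.match(line)
        if d:
            current = d.group(1)
            summary[current] = Counter()
            continue
        if line.strip() == '}':
            current = None
            continue
        a = ATTACH_RE.search(line)
        if a and current is not None:
            for kind in annotation_types(a.group(1), nodes):
                summary[current][kind] += 1
    return summary
```

Calling `summarize(SAMPLE_IR)` yields one counter per function, which maps directly onto the `count` and `type` arguments of the remark shown above.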



The third part consists of a set of tools to analyze and summarize the data
mined from the remarks. One very useful tool, I think, would be a run-over-run
diff of various remarks. Note that we probably want to initially limit this to
some of the less noisy remarks (e.g. number of auto-init stores remaining per
function, code size per function). One potential practical issue for such a tool
is that remarks are currently defined and emitted in a somewhat ad-hoc manner,
so changes to remarks could break the diff tool. But I think most existing
remarks are at a point now where they are quite stable. There has also been a
proposal to define remarks in a more structured way
(http://lists.llvm.org/pipermail/llvm-dev/2020-January/137971.html) which would
also be helpful.
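
As a rough illustration of such a run-over-run diff, the Python sketch below compares per-function counts (for example, remaining auto-init stores) from two builds. The function names and the input shape (a plain dict from function name to count) are assumptions made for this example; a real tool would first have to extract these counts from the remark files.

```python
def diff_counts(before, after):
    """Return {function: (old, new)} for functions whose count changed.

    Functions missing from one run are treated as having a count of 0.
    """
    changes = {}
    for fn in sorted(set(before) | set(after)):
        old, new = before.get(fn, 0), after.get(fn, 0)
        if old != new:
            changes[fn] = (old, new)
    return changes


def format_diff(changes):
    """Render one line per changed function, including the signed delta."""
    return [f"{fn}: {old} -> {new} ({new - old:+d})"
            for fn, (old, new) in changes.items()]
```

Only functions whose count changed are reported, which keeps the output small enough to review by hand on large code bases.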

While I intend to work on some parts directly to start with, I hope this pitch
also sparks more interest and others will be interested in collaborating as
well. I also put up an initial patch adding the infrastructure outlined in the
first two points: https://reviews.llvm.org/D89240

The proposal also ties together, and is enabled by, some of the excellent recent
work on the remark infrastructure, such as Francis’ work on emitting remarks as
part of the binary or Jessica’s work on a remarks-based code-size diffing tool
(https://reviews.llvm.org/D63306).


Cheers,
Florian

Johannes Doerfert via llvm-dev

2020-Nov-06 17:32 UTC

Cool! I really like the idea. I left a comment about metadata preservation
below.
Once this is available we will certainly employ it to understand OpenMP
programs better.
We could also think about a user-facing version of this while we are at it ;)

~ Johannes


On 11/4/20 3:57 PM, Florian Hahn via llvm-dev wrote:
> Using metadata comes with the drawback that it can be dropped by passes
> that do not know how to handle it. This means we will have to adjust existing
> passes to ensure the metadata is preserved for interesting instructions, if
> possible. I think this is the only intersection of the proposal with existing
> code in LLVM. Losing the metadata for instructions is also not the end of the
> world. The information is provided on a best-effort basis and I am not aware
> of a more robust alternative to achieve the same goals.

I think we already have the idea of "trivially preserved" annotations and we
should use that everywhere anyway.

The method that "moves/merges" metadata should look at the annotation and
decide based on the operands if the annotation is valid for the replacement
instruction or an instruction in the vicinity. If this sounds reasonable we
could use a second argument, e.g., below `!0 = !{!1, i64 BIT_ENCODING}` where
the bits in BIT_ENCODING define properties of the annotation. One could be that
it is fixed to the instruction opcode, e.g., if you replace a store with a
memset you drop the annotation. Though, we would add these things on a
case-by-case basis I guess.
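
A minimal Python sketch of how such property bits might be interpreted is shown below. Everything in it is hypothetical: the thread does not define any bit names or values, so `AnnotationProps`, `FIXED_TO_OPCODE` and `keep_on_replace` are invented here purely to illustrate the "fixed to the instruction opcode" property.

```python
from enum import IntFlag


class AnnotationProps(IntFlag):
    """Hypothetical property bits; the thread does not define any names."""
    NONE = 0
    FIXED_TO_OPCODE = 1  # drop the annotation if the opcode changes


def keep_on_replace(props, old_opcode, new_opcode):
    """Decide whether an annotation survives replacing one instruction."""
    if props & AnnotationProps.FIXED_TO_OPCODE:
        return old_opcode == new_opcode
    return True
```

With this encoding, replacing a store with a memset drops an annotation marked `FIXED_TO_OPCODE`, while an annotation without that bit is carried over.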


Florian Hahn via llvm-dev

2020-Nov-09 11:09 UTC

> On Nov 6, 2020, at 17:32, Johannes Doerfert <johannesdoerfert at gmail.com> wrote:
>
> Cool! I really like the idea. I left a comment about metadata preservation
> below.
> Once this is available we will certainly employ it to understand OpenMP
> programs better.
That sounds like a great use case! Having multiple different use cases during
the bring-up would be very helpful to make sure the system is flexible enough.
> We could also think about a user facing version of this while we are at
> it ;)
>
Do you mean providing a way for users to add their own annotations to, say,
C/C++ code?

The initial patch contains an Annotation2Metadata pass, which converts Clang
annotations (`__attribute__((annotate("_name")))`) to `!annotation` metadata.
This allows users to annotate all instructions in a function by piggybacking on
Clang’s annotate attribute, as in the snippet below.

void __attribute__((annotate("__overflow_rt_check")))
custom_overflow_check(int a, int b) { … }

>> Using metadata comes with the drawback that it can be dropped by passes
>> that do not know how to handle it. This means we will have to adjust
>> existing passes to ensure the metadata is preserved for interesting
>> instructions, if possible. I think this is the only intersection of the
>> proposal with existing code in LLVM. Losing the metadata for instructions is
>> also not the end of the world. The information is provided on a best-effort
>> basis and I am not aware of a more robust alternative to achieve the same
>> goals.
>
> I think we already have the idea of "trivially preserved" annotations and we
> should use that everywhere anyway.
>
> The method that "moves/merges" metadata should look at the annotation and
> decide based on the operands if the annotation is valid for the replacement
> instruction or an instruction in the vicinity. If this sounds reasonable we
> could use a second argument, e.g., below `!0 = !{!1, i64 BIT_ENCODING}` where
> the bits in BIT_ENCODING define properties of the annotation. One could be
> that it is fixed to the instruction opcode, e.g., if you replace a store with
> a memset you drop the annotation. Though, we would add these things on a
> case-by-case basis I guess.
>
That’s a good point. It would be great if we could generalize the logic to
preserve the annotation metadata. I think for some transformations the `type` of
the annotation does not matter, but for others it might. In those cases having a
general way to encode the rules would indeed be very helpful!

I think Francis already took a look at some passes that would need updating when
annotating auto-init stores. Most of the problematic transforms (like SROA,
LoopIdiom and InstCombine) are combining stores to stores of wider types or
various mem* intrinsics. As an initial general rule, it probably makes sense to
preserve any annotation metadata, if all combined instructions share the same
metadata?
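
The initial rule can be sketched in a few lines of Python. This is only a model of the idea, not code from the patch: annotations are represented as tuples of strings, `None` stands for an instruction without `!annotation`, and the name `merge_annotations` is made up for this example.

```python
def merge_annotations(annotations):
    """Return the shared annotation tuple, or None if it must be dropped.

    `annotations` holds one entry per instruction being combined: a tuple of
    annotation strings, or None when the instruction has no !annotation.
    """
    if not annotations:
        return None
    first = annotations[0]
    if first is None:
        return None
    # Keep the annotation only when every combined instruction agrees on it.
    if all(a == first for a in annotations):
        return first
    return None
```

Under this rule, two auto-init stores merged into a wider store keep their annotation, while merging an auto-init store with an unannotated one drops it.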

I think it would make sense to start with something like that and then iterate
from there once we have concrete use cases that need more specialized rules.
What do you think?

Cheers,
Florian
