Hi everyone,
We've done work at Sony PlayStation on making remarks more useful for our
end users (not compiler devs), and I think there's some alignment in our
end goals, though we've taken a slightly different approach so far. We
recently built a Visual Studio extension to display remarks next to source
lines, and to improve processing time and readability we've focused on
post-processing the remark YAML. A binary format could dramatically improve
processing times, while more information and better-integrated remarks
would let us present richer data to our users.
That said, a binary format wouldn't fundamentally change the nature of the
post-processing we have to do. Due to inlining, to display all the remarks that
correlate to a source line, we have to examine all of the available remarks.
For context, I've included a description of the main steps here:
1. Scan and extract remark data from all of the YAML. We condense arguments
   to two types: reference args (anything with a DebugLoc) and string args
   (anything else). Sequential runs of string args are combined, so "with
   cost=XX (threshold=YYY)" becomes one arg.
2. Remarks that target the same source object and describe the same action
   (e.g., ten "ClobberedBy" remarks on the same line) are merged; at this
   point we chain the arguments together rather than deduplicating the
   text. After this step the text is discarded and we work with our own
   in-memory format.
3. Once all the remarks have been parsed, we group them by the source files
   they live in. This is a best-effort attempt, however: some remarks don't
   have debug locations, or don't have enough of a filename to uniquely
   identify the source file.
4. Lastly, we deduplicate the text in the remark for each arg slot.
   Repeated string args get condensed, while multiple references (or, e.g.,
   differing cost/threshold text) are preserved. These remarks are packed
   into a by-line, by-source-file database and written to disk.
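To make step 1 concrete, here's a minimal sketch of the argument-condensing
pass; the `Arg`/`condense` names are hypothetical, not from our actual tool:

```cpp
#include <cassert>
#include <string>
#include <vector>

// One post-processed remark argument: either a reference arg (carries a
// DebugLoc) or a plain string arg.
struct Arg {
    bool isRef;        // true if the original argument had a DebugLoc
    std::string text;  // display text
};

// Collapse sequential runs of string args into a single arg; reference
// args act as separators and are kept as-is.
std::vector<Arg> condense(const std::vector<Arg> &in) {
    std::vector<Arg> out;
    for (const Arg &a : in) {
        if (!a.isRef && !out.empty() && !out.back().isRef)
            out.back().text += a.text;  // extend the current string run
        else
            out.push_back(a);
    }
    return out;
}
```

So "with cost=" + "15" + " (threshold=100)" becomes a single string arg,
while any DebugLoc-bearing arg between them would keep the runs separate.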
To save space, all strings are cached and referenced by index. We also
demangle names up front to make them human-readable. Some of our large
projects generate about 3 GB of YAML; with this process, the final remark
database is about 500 MB. This takes about 30 seconds in our (somewhat
unoptimized) remark compiler, or roughly 400,000 remarks/sec.
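The string cache is essentially classic string interning; a minimal sketch
(an assumed design for illustration, not our actual database code) might
look like:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Each unique string is stored once; remarks refer to strings by index.
class StringTable {
    std::unordered_map<std::string, uint32_t> Index;
    std::vector<std::string> Strings;

public:
    // Return the index for S, adding it to the table on first sight.
    uint32_t intern(const std::string &S) {
        auto It = Index.find(S);
        if (It != Index.end())
            return It->second;
        uint32_t Id = static_cast<uint32_t>(Strings.size());
        Index.emplace(S, Id);
        Strings.push_back(S);
        return Id;
    }

    // Map an index back to its string (e.g. when rendering a remark).
    const std::string &lookup(uint32_t Id) const { return Strings[Id]; }
};
```

Repeated pass names, function names, and argument text then cost four bytes
per reference instead of a full copy.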
A lot of that time is spent simply scrubbing over the YAML to perform
lexical analysis. LLVM's remark format is actually a subset of YAML that
can be understood with only a stateful lexer, but a binary format that
saved on stepping over characters would be a huge win. On the simple end, a
format that was just a list of:
struct FlatString
{
    u32 size;       // byte length of data
    char data[];    // C99 flexible array member
};
would speed up parsing remarks immensely, but there's obviously room for
improvement there. Clear hierarchical data, header information, and maybe
hashes for strings (or even just for string literals in the LLVM source,
whose hashes could be computed at LLVM compile time) would also be big wins
for us.
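A hypothetical writer/reader pair for that length-prefixed layout shows why
it's so cheap to consume: the size prefix lets a consumer read or skip a
string in O(1) instead of lexing every character as a YAML parser must.
The helper names here are invented for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Append one length-prefixed record: a u32 byte count, then the raw bytes.
void appendString(std::vector<uint8_t> &Buf, const std::string &S) {
    uint32_t N = static_cast<uint32_t>(S.size());
    const uint8_t *P = reinterpret_cast<const uint8_t *>(&N);
    Buf.insert(Buf.end(), P, P + sizeof N);     // size prefix
    Buf.insert(Buf.end(), S.begin(), S.end());  // raw bytes, no escaping
}

// Read the record at Offset and advance Offset past it. Skipping a record
// is a single addition, with no per-character scanning.
std::string readString(const std::vector<uint8_t> &Buf, size_t &Offset) {
    uint32_t N;
    std::memcpy(&N, Buf.data() + Offset, sizeof N);
    Offset += sizeof N;
    std::string S(reinterpret_cast<const char *>(Buf.data() + Offset), N);
    Offset += N;
    return S;
}
```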
To be clear, I don't believe trying to include this kind of postprocessing
phase in the LLVM toolchain is a good idea. The information required to build
our final remark database isn't available until link time or after, which
raises a lot of issues for an implementation within the toolchain. I also think
there's some separation of concerns here--the toolchain is interested in
outputting complete and correct data, but in the extension we cut corners to
improve readability and understandability (for example, we discard some
almost-duplicated cost/threshold information). The fundamental goals are
different.
With regard to improving the content and coherency of remarks, there are a
lot of possible improvements, but the biggest win for us would be better
remark-to-source matching. Some build systems we work with do confusing
things with intermediate files: depending on where clang is invoked and
where the files are moved to, it can be tricky to find the source file a
remark's DebugLoc refers to. For some files, all we see in the DebugLoc's
file field is the name of the file with no path information, which is
unhelpful. Take "Game.cpp": there are at least five files with that name in
one of our big projects, and several of them have only that in their
DebugLoc. If nothing else, an extra "identifying remark" emitted at the top
of a .opt.yaml file containing the compiler invocation and the absolute
path to the source file would be extremely helpful for divining the
location of files, but using absolute paths in all DebugLocs would let us
match remarks to source perfectly.
> This leads me back to my second category of extensions, creating
> connections between different events in the opt report
In addition to your example, we'd also like to see improvements to
DebugLocs for remarks in inlined code. In our VS extension, we turn
references to source objects into hyperlinks: anything in a remark that has
a DebugLoc can be used to jump to its declaration (as given by the
DebugLoc), which we've found to be really useful. However, one of the
issues we've found is that some categories of remark don't carry very good
information about where they come from. For example, remarks in inlined
functions don't contain information about the function they're actually in,
or the location of the function they were inlined into. To be clear, the
caller's location is never referenced, while the callee's name isn't
included--I find myself asking "what function is this code from?" when I'm
looking at the remark. That said, having spoken to members of our team
closer to LLVM, this would be difficult because that information isn't
propagated far enough through the pipeline.
> I'd like to be able to either produce a report that shows just the
> inlining from the LTO pass or produce a report that shows a composite of
> all the inlining decisions that we made.
We took a different route with this one in response to a request from an
end user: the extension can show different remark databases at the same
time. That way, you can compare LTO vs non-LTO builds line by line at a
glance. It isn't perfect, but generating a report about how inlining
changed from two disparate sets of remarks would be a logical next step.
--Will
(Hopefully this shows up in the right place; I was only subscribed to the
digest. Sorry if it doesn't thread properly!)
Date: Wed, 8 May 2019 19:35:19 +0000
From: "Finkel, Hal J. via llvm-dev" <llvm-dev at lists.llvm.org>
To: Francis Visoiu Mistrih <francisvm at yahoo.com>, "Kaylor, Andrew"
<andrew.kaylor at intel.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] RFC: Extending optimization reporting
> Also, in terms of design philosophy, I completely agree with your goals
> of trying to minimize the compile time and memory footprint of the
> optimization reporting mechanism, but I think that if we want to support
> something that requires more memory or bigger IR it should be OK to take
> that hit on an opt-in basis. Do you agree?
I agree.
Also, I'll add that keeping track of the different loop versions is an
important idea. Regarding unrolling, I believe that we have a mechanism for
encoding information on this into DWARF discriminators (see
http://llvm.org/viewvc/llvm-project?rev=349973&view=rev). It would be good
to have a solution here for loop versioning; we want this for both remarks
and PGO.
-Hal
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
________________________________
From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Kaylor,
Andrew via llvm-dev <llvm-dev at lists.llvm.org>
Sent: Wednesday, May 8, 2019 2:04 PM
To: Francis Visoiu Mistrih
Cc: llvm-dev
Subject: Re: [llvm-dev] RFC: Extending optimization reporting
Thanks, Francis. I actually wasn’t up to date on your latest work. It sounds
like you’ve laid some helpful groundwork.
I think generalizing the remark handling interface should be fairly manageable,
and that’s probably a good place for me to start getting involved.
My understanding of the very high-level design is that a pass creates an
optimization remark object, passes it to an optimization remark emitter, which
passes it to the diagnostic handler, which passes it to a remark streamer. All
of these stages appear to have pretty clean interfaces. I believe there is even
a mechanism to plug in a different remark streamer, though maybe not all of the
wiring to connect that to a command line option.
I expect there will be glitches that will surface along the way, but supporting
something like an IDE consumable text output format seems like it should be as
simple as plugging in a new remark streamer that can consume the existing
optimization remark objects and produce the desired text format. But that’s at
the hand-waving/QED level, right? Since you’ve started looking into supporting
other formats and say there’s some infrastructure work to be done, I guess it’s
not quite that easy. In any event, I’d be happy to work with you on generalizing
the infrastructure.
For some of the more involved use cases I want to cover, I was thinking the
biggest challenge might be in adding extra information to the remark objects to
support the new formats but doing so in a way that doesn’t break the YAML
streamer or require any significant changes to it.
Also, in terms of design philosophy, I completely agree with your goals of
trying to minimize the compile time and memory footprint of the optimization
reporting mechanism, but I think that if we want to support something that
requires more memory or bigger IR it should be OK to take that hit on an opt-in
basis. Do you agree?
Thanks,
Andy
From: Francis Visoiu Mistrih <francisvm at yahoo.com>
Sent: Monday, May 06, 2019 10:25 AM
To: Kaylor, Andrew <andrew.kaylor at intel.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] RFC: Extending optimization reporting
Hi Andrew,
On Apr 30, 2019, at 2:47 PM, Kaylor, Andrew via llvm-dev <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:
I would like to begin a discussion about updating LLVM's opt-report
infrastructure. There are some things I'd like to be able to do with
optimization reports that I don't think can be done, or at least aren't
natural to do, with the current implementation.
I understand that there is a lot of code in place already to produce
optimization remarks, and one of my explicit goals is to minimize the
changes to existing code while still enabling the new features I would like
to support. I have some ideas in mind for how to achieve what I'm
proposing, but I want to start by just describing the desired results; if
we can reach a consensus on these being nice things to have, then we can
move on to talking about the best way to get there.
I think the extensions I have in mind can broadly be organized into two
categories: (1) ways to support different sorts of output, and (2) ways to
create connections between different events represented in the report.
Near as I can tell, the only support we have in the code base is for YAML
output. I think I could implement a new RemarkStreamer to get other formats, but
nothing in the LLVM code base does that. Is that correct?
Yes, for now only a YAML output is supported.
The current design is the following:
The passes create a remark diagnostic and call
(Machine)OptimizationRemarkEmitter::Emit. That goes through LLVMContext where
the RemarkStreamer is used to handle remark diagnostics. Then in the
RemarkStreamer we serialize each diagnostic to YAML through the YAMLTraits and
immediately write that to the file.
One of the main ideas behind optimization remarks from the beginning is to
do as little work as possible on the compiler side. We don’t want to keep
remarks in memory or significantly increase compile time because of this.
Most of the work is expected to be done on the client side, with, if
possible, help from LLVM libraries.
Then on the other side, I recently added a parsing infrastructure for remarks in
lib/Remarks, which parses YAML using the YAMLParser, performs some semantic
checks on the remarks and creates a list of remarks::Remark. This does not
re-use any code from the generation side for the following reasons:
* The generation is based on LLVM diagnostics, which has its own class
hierarchy.
* The diagnostics are deeply coupled with LLVM IR / MIR, and we don’t want
to generate dummy IR just for parsing a bunch of remarks and displaying
them in an HTML view (e.g. opt-viewer.py).
* The YAML generated by LLVM can’t be parsed using the YAMLTraits, because we
have an unknown number of arguments that can have the same key. We use the YAML
parser for this, like tools/llvm-opt-report was doing before I added the remark
parser in-tree.
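For reference, a serialized remark looks roughly like this (adapted from
the LLVM remarks documentation; exact fields vary by pass). Note the
repeated String keys under Args, which is what defeats a YAMLTraits-based
parser:

```yaml
--- !Missed
Pass:            inline
Name:            NoDefinition
DebugLoc:        { File: file.c, Line: 3, Column: 12 }
Function:        bar
Args:
  - Callee:      foo
  - String:      ' will not be inlined into '
  - Caller:      bar
    DebugLoc:    { File: file.c, Line: 2, Column: 0 }
  - String:      ' because its definition is unavailable'
...
```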
One main issue right now is that we don’t have a way to serialize a
remarks::Remark to YAML, and if we can somehow manage to use the same
abstraction we use for parsing when we’re generating remarks, that would solve a
lot of issues. The main reason I haven’t looked more deeply into this is because
it would require making extra copies and extra allocations (especially of the
arguments) that we would like to avoid doing during generation.
I'd like to be able to:
- Embed some subset of optimizations remarks as annotations in the generated
assembly output
This would require keeping remarks in memory until we reach the asm-printer.
Another way to do this is to pipe the output of clang to another tool that adds
these annotations based on debug info and the remark file.
- Embed the remarks in the generated executable in a binary format consumed by
the Intel Advisor tool
I recently added -mllvm -remarks-section. See
https://llvm.org/docs/CodeGenerator.html#emitting-remark-diagnostics-in-the-object-file
for what it contains.
The model we’re planning on using on Darwin is through dsymutil, which will
merge all the remark files while processing the debug info, and create a
separate file in the final .dSYM bundle with all the remarks.
- Produce text output in a format recognized by Microsoft Visual Studio or other
IDEs
This would be very nice! I think right now there is no easy (or clean) way to
add a new format, but we should definitely work on making that easier.
I have a few patches coming with a “binary” format that we want to use, so maybe
we can work on building an infrastructure that can serve YAML, the binary
format, and leave room for any new formats.
I tried to make the C API on the parsing side easy to use with any other format.
See llvm-c/Remarks.h.
Let me know what you think!
Thanks,
--
Francis
The last of these is probably straightforward, since it's basically a
streaming format such as the current infrastructure expects. The other two
seem like they might be more complicated, since they involve keeping the
information around, potentially across LTO, and correlated with the
evolving IR until the final machine code or assembly is produced.
This leads me back to my second category of extensions, creating connections
between different events in the opt report. My goal here is to be able to
produce some kind of coherent report after compilation is complete that lets the
user make some sense of how the IR evolved over the course of compilation and
what effects that may have had on optimizations. This mostly has to do with the
handling of loops, vectorization, and inlining.
Let's say, for example, I've got code like this:
for (...)
A
if (lic)
B
C
And the loop-unswitch pass turns it into this:
if (lic)
for (...)
A; B; C
else
for (...)
A; C
Now let's say the vectorizer for some reason is able to vectorize the loop
in the else-clause but not the if-clause. (I don't know if this kind of
thing is possible with the current phase ordering, but I think this theoretical
example illustrates the idea anyway.)
I want some way to produce a report that tells the user about the existence of
the two loops that were created when we unswitched the loop so that we can then
tell the user in some sensible way that we couldn't vectorize one loop but
that we could vectorize the second.
I'm not sure what the opt-viewer would currently do with a case like this,
but what I want to avoid is getting stuck where the report we can emit
essentially conveys the following not very helpful information.
for (...)
// Loop was unswitched.
// Loop could not be vectorized because...
// Loop was vectorized.
A
if (lic)
B
C
Instead I'd like to have a way to produce something like this:
for (...)
// Loop was unswitched for condition (srcloc)
// Unswitched loop version #1
// Unswitched for IF condition (srcloc)
// Loop was not vectorized:
// Unswitched loop version #2
// Loop was vectorized
The primary thing missing, I think, is a way for the vectorizer to give some
indication of which version of the loop it is talking about in its optimization
remarks and maybe a way for the opt-viewer to be able to make sense of that.
Likewise, there are things I want to be able to track with inlining. Let's
say we go through the inlining pass pre-LTO and we make some decisions and
report them. Then during LTO we go through another round of inlining and
possibly make different decisions. I'd like to be able to either produce a
report that shows just the inlining from the LTO pass or produce a report that
shows a composite of all the inlining decisions that we made.
We tried something like this with an inlining report before
(https://reviews.llvm.org/D19397), but it had the misfortune of being proposed
at about the same time that the current opt-viewer mechanism was being developed
and we didn’t manage to get aligned with that. I’m hoping that we can correct
that now.