Dean Michael Berris via llvm-dev
2016-Nov-30 05:08 UTC
[llvm-dev] RFC: XRay in the LLVM Library
Hi llvm-dev,

Recently, we've committed the beginnings of the llvm-xray [0] tool, which allows for conveniently working with both XRay-instrumented binaries and XRay trace/log files. In the course of the review for the conversion tool [1], which turns a binary/raw XRay log file into YAML for human consumption, a question arose as to how we intend to let users develop tools that deal with XRay traces (and the instrumentation maps in binaries).

As a bit of background, I've been working on the "flight data recorder" mode [2] for the XRay runtime library -- this mode lets the XRay-instrumented binary continuously write trace entries into an in-memory log, kept as a circular buffer of buffers [3]. FDR mode writes more concise records and has a different log format than the current "naive" logging implementation in compiler-rt (which writes to disk as soon as thread-local buffers are full).

# Problem Statement

XRay has two key pieces of information that need to be encoded in a consistent manner: the instrumentation map embedded in binaries and the XRay log files. We run into issues when the encoding of this information changes over time, whether by adding or removing information. This situation is very similar to how LLVM handles backwards compatibility with the bitcode format and its versioning. The problem is how to ensure that, as we change the data emitted by the runtime library, the tools handling this data remain able to read it. Several factors play into this, each of which may be solved in different ways (but none of which is the crux of this RFC):

- The split between the LLVM "core" library/tools and compiler-rt. We implement the writer in compiler-rt but implement the tools that read the traces in LLVM. Any change in LLVM that encodes new information into the instrumentation map therefore has to be coordinated so that compiler-rt can take advantage of it.
- The potential for user-defined additional information embedded in the XRay traces. We have ongoing projects, such as argument logging and custom data logging, that will add information to the log without necessarily changing the "format" of the data.

# Potential Resolutions

Given the current state of XRay's development, we're looking at a few ways of handling backwards/forwards compatibility of the instrumentation map, the XRay log files, and the tools that will read/manipulate them. We're seeking feedback on the following options and on any alternatives we may not have considered.

## Option A: Expose a library that supports all known formats.

We can move the currently tool-specific code behind `llvm-xray extract` [0], which ingests a binary with XRay instrumentation, into (strawman proposal) lib/XRay (i.e. headers in include/llvm/XRay/... and implementation in lib/XRay/...), so that the tools become thin wrappers around the functionality in this library. We can do the same with the `llvm-xray convert` core logic, to allow loading all known/supported formats of the log file.

This option gives us a set of canonical implementations that can handle every supported file format. It might, however, introduce some complexity: parsing many known/supported formats (YAML, compiler-emitted instrumentation maps for x86_64/armv7/aarch64/<insert platforms where XRay is yet to be ported>) in a library that not all tool writers actually need. This option closely follows what the LLVM project does for backwards compatibility when parsing LLVM IR, applied to XRay instrumentation maps and traces.

## Option B: Expose a library that only supports one canonical format.

We can keep tool-specific code alongside the tools, but define one canonical format for the instrumentation map and traces -- as both a specification document and a library implementation.
This canonical format could be what we already have today, which keeps the log-reading and instrumentation-map handling library simple and lets it evolve only when we extend/change the canonical format. For FDR mode traces, this means the conversion tool would know about the FDR trace format/encoding and transform it into the canonical format. The transformation logic is then localised to the conversion tool, while any other tool built on top of the reader library does not need to change. This also gives users who define their own log formats the option of using the XRay library interfaces to install their own handlers, implementing the transformation from their format to the XRay-canonical format in the tool, without being tied to maintaining a released library version. The canonical format can then evolve more slowly and more conservatively than the XRay runtime implementations shipped through compiler-rt.

# Open Questions

Some burning questions we'd like to get some thoughts on:

- Is there a preference between the two options provided above?
- Are there other alternatives we should consider?
- Which parts of which options do you prefer, and is there a synthesis of the two that appeals to you?

Thanks in advance!

[0] - `llvm-xray extract`, defined in https://reviews.llvm.org/D21987
[1] - `llvm-xray convert`, being reviewed in https://reviews.llvm.org/D24376
[2] - FDR mode implementation (work in progress) at https://reviews.llvm.org/D27038
[3] - Buffer Queue implementation (work in progress) at https://reviews.llvm.org/D26232

-- Dean
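Option B's idea of user-installed transformation handlers could be sketched roughly as follows. This is only an illustration of the shape such an interface might take -- `XRayRecord`, `Loader`, and `LoaderRegistry` are hypothetical names, not part of any proposed or committed API:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical canonical record: one entry per function event.
struct XRayRecord {
  uint64_t FuncId;
  uint64_t TSC;  // timestamp counter at entry/exit
  bool IsEntry;  // true = function entry, false = exit
};

// A loader turns raw bytes in some on-disk format into canonical records.
using Loader =
    std::function<std::vector<XRayRecord>(const std::vector<uint8_t> &)>;

// Registry keyed by a format tag; tools register converters for the
// formats they understand, while the reader library itself only ever
// sees the canonical form.
class LoaderRegistry {
  std::map<std::string, Loader> Loaders;

public:
  void registerLoader(const std::string &Format, Loader L) {
    Loaders[Format] = std::move(L);
  }
  std::vector<XRayRecord> load(const std::string &Format,
                               const std::vector<uint8_t> &Raw) const {
    auto It = Loaders.find(Format);
    if (It == Loaders.end())
      return {}; // unknown format: no loader installed
    return It->second(Raw);
  }
};
```

Under this sketch, only the conversion tool registers the FDR-mode loader; every other tool links against the registry and the canonical `XRayRecord` type and never changes when a new on-disk encoding appears.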
On 30 November 2016 at 05:08, Dean Michael Berris via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> - Is there a preference between the two options provided above?
> - Any other alternatives we should consider?
> - Which parts of which options do you prefer, and is there a synthesis of either of those options that appeals to you?

Hi Dean,

I haven't followed the XRay project that closely, but I have been around while file formats were being formed, and either of your two approaches (which are pretty standard) will fail in different ways. But that's ok, because the "fixes" work; they're just not great.

Take LLVM IR: there were lots of changes, but we always aimed to have one canonical representation. Not just in the syntax of each instruction/construct, but in how complex behaviour is represented in the same series of instructions, so that all back-ends can identify and work with it. Of course, the second (semantic) level is less stringent than the first (syntactic), but we try to make it as strict as possible.

This hasn't come for free. The two main costs were destructive semantics -- for example, when we lower C++ classes into arrays and turn all the accesses into jumbled reads and writes, because IR readers don't need to understand the ABI of all targets -- and backwards incompatibility -- for example, when we completely changed how exception handling is lowered (from special basic blocks to special constructs at the heads/tails of common basic blocks). That price was cheaper than the alternative, but it's still not free.

Another approach I followed was SwissProt [1], a manually curated machine-readable text file with protein information for cross-referencing. Cutting to the chase: they introduced "line types" with strict formatting for the most common information, plus one line type called "comment" where free text was allowed for additional information. With time, adding a new line type became impossible, so all new fields ended up being added in the comment lines with pseudo-strict formatting, which was (and probably still is) a nightmare for parsers and humans alike.

Between the two, the LLVM IR policy for changes is orders of magnitude better. I suggest you follow that. I also suggest you don't keep multiple canonical representations; instead, create tools to convert from any other format to the canonical one.

Finally, I'd separate the design into two phases:

1. Experimental, where the canonical form changes constantly in light of new input and there are no backwards/forwards compatibility guarantees at all. This is where all of you get creative and try to sort out the problems in the best way possible.
2. Stable, when most of the problems have been solved and you document a final, stable version of the representation. Every new input will have to be represented as a combination of existing ones, so make them generic enough. If real change is needed, make sure you have a process that identifies versions and compatibility (for example, a version tag on every dump) and that the canonical tool knows about all of the issues.

This last point is important if you want to continue reading old files that don't have a compatibility issue, warn when they have one that's irrelevant, or error out when they have one that would produce garbage. It also lets you write more efficient conversion tools.

From what I understood of XRay, you could in theory keep the data for years on a tape somewhere in the attic and want to read it later to compare with a current run. So being compatible is important, but having a canonical form that can be converted to and from other forms is more important, or the comparison tools will get really messy really quickly.

Hope that helps,

cheers,
--renato

[1] http://web.expasy.org/docs/swiss-prot_guideline.html
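Renato's suggestion of a version tag on every dump, with the canonical tool deciding what it can still read, might look something like this sketch. The header layout, field names, and version values here are invented purely for illustration; the real XRay log header differs:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fixed-size header at the front of every dump.
struct TraceHeader {
  uint16_t Version;
  uint16_t Type; // e.g. 0 = naive mode, 1 = FDR mode
};

enum class Compat { Ok, Warn, Error };

// Decide, from the version tag alone, whether a file can be read
// cleanly, read with a warning, or must be rejected -- the three
// outcomes Renato describes for old files.
Compat classify(const TraceHeader &H, uint16_t CurrentVersion) {
  if (H.Version == CurrentVersion)
    return Compat::Ok;
  if (H.Version < CurrentVersion)
    return Compat::Warn; // older file with known quirks, still parseable
  return Compat::Error;  // newer than this reader understands
}
```

The point of the sketch is only that the dispatch is cheap once the tag exists: a reader can branch on a couple of bytes before committing to any particular decoding.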
Hi Dean,

I haven't looked very closely at XRay so far, but I'm wondering if making CTF (the Common Trace Format, see http://diamon.org/ctf/) the default format for XRay traces would be useful? It seems it'd be nice to be able to reuse some of the tools that already exist for CTF, such as a graphical viewer (http://tracecompass.org/) or a converter library (http://man7.org/linux/man-pages/man1/babeltrace.1.html). LTTng already uses this format, and linux perf can create traces in CTF format too. It would probably be useful for at least some users to be able to combine traces from XRay with traces from LTTng or linux perf.

Maybe the current version of CTF doesn't have all the features that you need, but the next version (CTF 2) seems to address at least some of the concerns you touch on in your RFC: http://diamon.org/ctf/files/CTF2-PROP-1.0.html#design-goals.

Any thoughts on whether CTF could be a good choice as the format to store XRay logs in?

Thanks,

Kristof
Dean Michael Berris via llvm-dev
2016-Dec-01 00:17 UTC
[llvm-dev] RFC: XRay in the LLVM Library
> On 30 Nov. 2016, at 22:26, Renato Golin <renato.golin at linaro.org> wrote:
>
> I also suggest you don't keep multiple canonical representations, and
> create tools to convert from any other to the canonical format.

Thanks Renato! Just so I understand this one sentence (to disambiguate), you meant:

1) Don't have multiple canonical forms; just have one.
2) Create tools that will convert to/from that one canonical format.

I think this follows closely the Option B mental model that I had, with the only difference being that the canonical reader is a library made part of LLVM "when it's ready", as you suggest later. Would that be accurate?

> Finally, I'd separate the design in two phases:
>
> 1. Experimental, where the canonical form changes constantly in light
> of new input and there are no backwards/forwards compatibility
> guarantees at all. This is where all of you get creative and try to
> sort out the problems in the best way possible.
> 2. Stable, when most of the problems were solved, and you now document
> a final stable version of the representation. Every new input will
> have to be represented as a combination of existing ones, so make them
> generic enough. In need of real change, make sure you have a process
> that identifies versions and compatibility (for example, having a
> version tag on every dump), and letting the canonical tool know all of
> the issues.
>
> This last point is important if you want to continue reading old files
> that don't have the compatibility issue, warn when they do but it's
> irrelevant, or error when they do and it'll produce garbage. You can
> also write more efficient converting tools.

I like this suggestion -- thanks! So in essence we can treat the current implementation as experimental, and make that abundantly clear in any point release where XRay functionality is included. Is there a place where this ought to be documented clearly (aside from the documentation at http://llvm.org/docs/XRay.html)? XRay trace file headers already contain a version identifier, intended to identify precisely how a reader should interpret the data in there.

> From what I understood of this XRay, you could in theory keep the data
> for years in a tape somewhere in the attic, and want to read it later
> to compare to a current run, so being compatible is important, but
> having a canonical form that can be converted to and from other forms
> is more important, or the comparison tools will get really messy
> really quickly.

Yep, this is definitely one of the goals, which is why we're being very careful about what we write down in the traces, optimising for efficient writing and smaller traces at the cost of potential complexity in the analysis tooling.

> Hope that helps,

Definitely does, thanks again!

Cheers

-- Dean
Dean Michael Berris via llvm-dev
2016-Dec-01 00:32 UTC
[llvm-dev] RFC: XRay in the LLVM Library
On 1 Dec. 2016, at 00:26, Kristof Beyls <Kristof.Beyls at arm.com> wrote:
>
> Hi Dean,
>
> I haven't looked very closely at XRay so far, but I'm wondering if making CTF (common trace format, e.g. see http://diamon.org/ctf/) the default format for XRay traces would be useful?

Nice! Thanks for mentioning this -- I hadn't looked at it before. There are a couple of issues I can think of, off the top of my head, as to why using CTF as the default format for XRay may be slightly problematic. More on this below.

> It seems it'd be nice to be able to reuse some of the tools that already exist for CTF, such as a graphical viewer (http://tracecompass.org/) or a converter library (http://man7.org/linux/man-pages/man1/babeltrace.1.html).
> LTTng already uses this format and linux perf can create traces in CTF format too. Probably it would be useful for at least some to be able to combine traces from XRay with traces from LTTng or linux perf?

This sounds like a great idea! I'm working on a conversion tool that aims to target multiple output formats. It's being developed at https://reviews.llvm.org/D24376, where the intent is to start with something simple and then grow support for multiple other formats. CTF sounds like a perfectly reasonable target format.

Writing CTF directly, though, might be slightly problematic for XRay, purely because of the complexity it would bring into the runtime library. While conceptually the formats are very similar (XRay uses binary logging and efficient in-memory structures to save on both the space and the time required to write records down), we'd like the XRay library to make further optimisations and evolve in a certain direction without being tied down to one particular format. I'll need to think about this a little more, but I definitely think converting from whatever XRay format we come up with to CTF sounds like a great feature for the conversion tool.

> Maybe the current version of CTF may not have all the features that you need, but the next version of CTF (CTF 2) seems to be at least addressing some of the concerns you touch on below: http://diamon.org/ctf/files/CTF2-PROP-1.0.html#design-goals.
>
> Any thoughts on whether CTF could be a good choice as the format to store XRay logs in?

I may need to think about it more, but I don't see a reason we shouldn't be able to convert XRay traces to CTF. :)

Cheers

-- Dean
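The conversion tool Dean describes -- one canonical input, multiple output targets such as YAML or CTF -- could be structured around a sink interface along these lines. All names are hypothetical, and the output syntax is merely YAML-ish, not the actual `llvm-xray convert` output:

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical canonical record (see the RFC's "canonical format" idea).
struct Record {
  uint64_t FuncId;
  uint64_t TSC;
  bool IsEntry;
};

// One sink per output format; the convert tool would pick one by flag.
// A CTF writer would be another subclass with the same interface.
struct TraceSink {
  virtual ~TraceSink() = default;
  virtual void write(const Record &R) = 0;
  virtual std::string finish() = 0;
};

// An illustrative YAML-ish text sink.
class YAMLSink : public TraceSink {
  std::ostringstream OS;

public:
  void write(const Record &R) override {
    OS << "- { func: " << R.FuncId << ", tsc: " << R.TSC
       << ", kind: " << (R.IsEntry ? "enter" : "exit") << " }\n";
  }
  std::string finish() override { return OS.str(); }
};
```

The design choice this illustrates is the one from the thread: the runtime keeps writing its own compact format, and format diversity lives entirely on the conversion side, one sink per target.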
On Wed, Nov 30, 2016 at 3:26 AM Renato Golin <renato.golin at linaro.org> wrote:
> From what I understood of this XRay, you could in theory keep the data
> for years in a tape somewhere in the attic, and want to read it later
> to compare to a current run, so being compatible is important, but
> having a canonical form that can be converted to and from other forms
> is more important, or the comparison tools will get really messy
> really quickly.

Not sure I quite follow here -- perhaps some misunderstanding. My mental model is that the formats are semantically equivalent, with a common in-memory representation (like the LLVM IR APIs). It doesn't/shouldn't complicate a comparison tool to support both LLVM IR and bitcode input (or other hypothetical, semantically equivalent formats that could be integrated into a common reading API). At least that's my mental model. Is there something different here?

What I'm picturing is that we need an API for reading all these formats, and either we use that API only in the conversion tool -- so users have to run the conversion tool before running the tool they want -- or we sink that API into a common place and have all tools use it to load inputs. The latter makes the user experience simpler (no extra conversion step/tool), and it doesn't seem like it should make the development experience more complicated/messy/difficult.

- Dave

> Hope that helps,
>
> cheers,
> --renato