thr3ads.net - llvm dev - [llvm-dev] [PGO] Thoughts on adding a key-value store to profile data formats [Jan 2016]


Sean Silva via llvm-dev

2016-Jan-15 22:18 UTC

[llvm-dev] [PGO] Thoughts on adding a key-value store to profile data formats

On Fri, Jan 15, 2016 at 11:41 AM, Xinliang David Li <davidxl at google.com>
wrote:
> Tagging profile data with such information is generally useful. My
> thoughts are:
>
> 1) Such information probably does not need to be stored in the raw
> profile format, so no runtime changes are needed; only llvm-profdata
> and the indexed format need to be enhanced to support this.
> 2) A more general way is to just add an option,
> --embed_label=<customized_label>, where the label is a string that can
> hold key/value pairs encoded in the user's favorite format. The format
> of the key/value pairs is not specified and remains opaque to the
> Instr/Sample profilers.
> 3) Labels from multiple source profiles will be merged when the merge
> command is used.
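A minimal sketch of points 2 and 3, assuming each profile carries at most one opaque label string and that merging keeps every distinct non-empty label in first-seen order. The function name `merge_labels` and the de-duplication policy are illustrative assumptions, not llvm-profdata behavior; the point is only that the tool never parses the label contents.

```python
def merge_labels(labels: list[str]) -> list[str]:
    """Collect the opaque labels of several input profiles.

    Empty labels are skipped and exact duplicates are kept once, in the order
    they were first seen. Label contents are never parsed or validated.
    """
    merged: list[str] = []
    seen: set[str] = set()
    for label in labels:
        if label and label not in seen:
            seen.add(label)
            merged.append(label)
    return merged


if __name__ == "__main__":
    # Each string stands for whatever a user passed to the proposed --embed_label.
    inputs = [
        "owner=perf-team;workload=startup",
        "",
        '{"workload": "steady-state"}',
        "owner=perf-team;workload=startup",
    ]
    for label in merge_labels(inputs):
        print(label)
```

Because the labels stay opaque, one user can use `key=value;` pairs and another JSON without the profile reader caring.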
>
> On Fri, Jan 15, 2016 at 11:06 AM, Nathan Slingerland <slingn at gmail.com>
> wrote:
> > Hi all,
> >
> > I'd like to get your thoughts on possibly adding a generic key-value
> > store to the profile data formats for 'metadata'. Some potential use
> > cases:
> >
> > I. Profile Features
> >
> > The most basic use could be as a central repository for internal bits of
> > housekeeping information about the profile data. For example, to
> > differentiate between FE and IR instrumentation:
> >
> >   llvm.instrumentation_source: "IR"
> >
> > A key-value store would make it simple to add new bits of information and
> > help keep everything human-readable for the text-based test formats. This
> > could potentially also help with error checking at the llvm-profdata level
> > if the Reader classes exposed it.
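The human-readable `key: "value"` form shown above is simple to read back. This is a hypothetical sketch, not an existing llvm-profdata reader; it assumes one entry per line and splits on the first colon only, so values such as ISO 8601 timestamps, which contain colons themselves, survive intact.

```python
def parse_metadata_text(text: str) -> dict[str, str]:
    """Parse lines of the form `key: value` into a dict.

    Blank lines are ignored. Each line is split at its first ':' only, and one
    pair of surrounding double quotes is stripped from the value.
    """
    result: dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not sep or not key:
            raise ValueError(f"line {lineno}: expected 'key: value', got {raw!r}")
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        result[key] = value
    return result


if __name__ == "__main__":
    sample = """
llvm.instrumentation_source: "IR"
llvm.profile_start_time: "2016-01-08T23:41:56.755Z"
llvm.profile_duration: 5.102s
"""
    print(parse_metadata_text(sample))
```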
> >
>
> This is OK to have, but I don't think the reader class should rely on
> metadata to make decisions (as metadata can be thrown away without
> affecting correctness). A formal approach, such as the one proposed
> (encoding it in the variant bits of the version field), should be used.
>
We could potentially have a "reserved namespace" like `llvm.*` which tools
are not allowed to drop (or that have special handling inside tools).
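One way the reserved-namespace idea could work, sketched under the assumption that reserved keys are exactly those starting with `llvm.`: a tool asked to discard metadata keeps the reserved entries, and user-supplied keys are rejected if they try to claim the reserved prefix. All names here (`is_reserved`, `drop_user_metadata`, `add_user_entry`) are hypothetical.

```python
RESERVED_PREFIX = "llvm."  # assumed reserved namespace


def is_reserved(key: str) -> bool:
    return key.startswith(RESERVED_PREFIX)


def drop_user_metadata(metadata: dict[str, str]) -> dict[str, str]:
    """Discard user-defined entries but never the reserved llvm.* ones."""
    return {k: v for k, v in metadata.items() if is_reserved(k)}


def add_user_entry(metadata: dict[str, str], key: str, value: str) -> None:
    """Add a user-supplied entry; users may not write into the reserved namespace."""
    if is_reserved(key):
        raise ValueError(f"key {key!r} is in the reserved {RESERVED_PREFIX}* namespace")
    metadata[key] = value


if __name__ == "__main__":
    md = {"llvm.instrumentation_source": "IR"}
    add_user_entry(md, "customkey", "value1")
    print(drop_user_metadata(md))  # only the llvm.* entry remains
```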

Assuming that we have semantics that guarantee that some labels/"metadata"
are kept (and that the compiler can communicate certain predefined labels
to the runtime, which propagate back to the profraw and then to the
profdata), what do you think about using a generic format like this for
things like versions and profile source, rather than attempting to fit
everything in a small version field or having to come up with some
convention for a variable being defined or not (as in
http://reviews.llvm.org/D15540)? My impression is that it would give more
flexibility and potentially simplify compatibility.

-- Sean Silva

>
>
> > II. Profile Context
> >
> > Basic (lightweight) information about the profile could be automatically
> > gathered at profile time. The idea would be to automatically label profiles
> > with contextual information so that the age/origin of a profile could be
> > inspected using the llvm-profdata tool.
> >
> >   $ llvm-profdata show -metadata foo.profdata
> >   llvm.profile_start_time: "2016-01-08T23:41:56.755Z"
> >   llvm.profile_duration: 5.102s
> >   llvm.exe_time: "2016-01-08T23:35:56.745Z"
>
> Other examples include options and workload used in the training run.
>
> >   Total functions: 4
> >   Maximum function count: 866988873
> >   Maximum internal block count: 267914296
> >
> > Other possibilities: executable path, command line arguments, system info
> > (uname)
>
> yes.
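A hypothetical sketch of how the profile-context entries above could be produced, formatting timestamps as ISO 8601 UTC with millisecond precision to match the example output. It assumes `llvm.exe_time` means the executable's last-modification time, which the thread does not define; the sketch only illustrates the formatting, not where in the toolchain the values would be gathered.

```python
import os
import sys
import time
from datetime import datetime, timezone


def iso_utc_ms(epoch_seconds: float) -> str:
    """Format a POSIX timestamp like 2016-01-08T23:41:56.755Z (UTC, milliseconds)."""
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{dt.microsecond // 1000:03d}Z"


def collect_context(start: float, end: float, exe_path: str) -> dict[str, str]:
    """Build the automatically gathered entries for one profiling run."""
    return {
        "llvm.profile_start_time": iso_utc_ms(start),
        "llvm.profile_duration": f"{end - start:.3f}s",
        # Assumption: exe_time is the executable's last-modification time.
        "llvm.exe_time": iso_utc_ms(os.path.getmtime(exe_path)),
    }


if __name__ == "__main__":
    start = time.time()
    sum(range(1_000_000))  # stand-in for the training workload
    end = time.time()
    for key, value in collect_context(start, end, sys.executable).items():
        print(f"{key}: {value}")
```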
>
> >
> > III. Custom Content
> >
> > The key-value store itself could be exposed to developers via the
> > llvm-profdata tool. This would allow users to associate arbitrary custom
> > data with a profile, as well as inspect it:
> >
> >   $ llvm-profdata merge -metadata=customkey,value1 foo.profraw -o foo.profdata
> >   $ llvm-profdata show -metadata foo.profdata
> >   customkey: "value1"
> >   Total functions: 4
> >   Maximum function count: 866988873
> >   Maximum internal block count: 267914296
> >
> > Developers could add as much custom context as they find valuable:
>
> I think all metadata should be custom-defined; the profile reader
> should not need to understand it.
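The proposed `-metadata=KEY,VALUE` option (a suggestion in this thread, not an existing llvm-profdata flag) needs a parsing rule. A minimal sketch, assuming the key ends at the first comma and the rest is kept verbatim as an opaque value, in line with the view that the reader should not interpret it. Letting a repeated key overwrite the earlier value is also an assumption.

```python
def parse_metadata_arg(arg: str) -> tuple[str, str]:
    """Split a KEY,VALUE argument at the first comma; VALUE stays opaque."""
    key, sep, value = arg.partition(",")
    key = key.strip()
    if not sep or not key:
        raise ValueError(f"expected KEY,VALUE but got {arg!r}")
    return key, value


def collect_metadata_args(args: list[str]) -> dict[str, str]:
    """Apply each -metadata argument in order; a repeated key overwrites the earlier one."""
    metadata: dict[str, str] = {}
    for arg in args:
        key, value = parse_metadata_arg(arg)
        metadata[key] = value
    return metadata


if __name__ == "__main__":
    args = ["customkey,value1", "mysoft.version,0.1.0 (1234)"]
    print(collect_metadata_args(args))
```

Splitting only at the first comma means a value such as a version string with embedded commas needs no escaping.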
>
>
> >
> >   $ llvm-profdata merge \
> >       -metadata="mysoft.version,${SOFTWARE_VERSION} (${BUILD_NUMBER})" \
> >       -metadata="mysoft.exe_md5,`md5 -q foo.exe`" \
> >       foo.profraw -o foo.profdata
> >   $ llvm-profdata show -metadata foo.profdata
> >   mysoft.version: "0.1.0"
> >   mysoft.exe_md5: "337b5c5bc29cbdca090a1921a58465d6"
> >   Total functions: 4
> >   Maximum function count: 866988873
> >   Maximum internal block count: 267914296
> >
> > Other information that might be interesting: git/svn revision, workload
> > description, system info (uname -a)
> >
> > This would be a way to embed almost any platform-specific or heavy-weight
> > data without requiring the addition of platform-specific code in
> > compiler-rt and without impacting other developers.
> >
>
> yes.
>
> >
> > When profiles are merged it might be simplest to keep all input metadata
> > (machine-readable things such as feature bits might need to be handled
> > differently):
>
> Feature bits should not be part of it.
>
> >
> >   $ llvm-profdata merge -weighted-input=3,foo.profdata bar.profdata -o foobar.profdata
> >   $ llvm-profdata show -metadata foobar.profdata
> >   foo.profdata
> >     llvm.profile_weight: 3
> >     llvm.profile_start_time: "2016-01-08T23:41:56.755Z"
> >     llvm.profile_duration: 5.102s
> >     llvm.exe_time: "2016-01-08T23:35:56.745Z"
> >     customkey: "value1"
> >   bar.profdata
> >     llvm.profile_weight: 1
> >     llvm.profile_start_time: "2016-01-15T00:08:41.168Z"
> >     llvm.profile_duration: "1.001s"
> >     llvm.exe_time: "2016-01-15T00:08:13.000Z"
> >     customkey: "value2"
> >   Total functions: 4
> >   Maximum function count: 866988873
> >   Maximum internal block count: 267914296
> >
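The merge behavior in the example above can be sketched as follows, assuming each input's metadata is kept as a separate group keyed by input filename, with its weight recorded as `llvm.profile_weight`. `merge_metadata` and `render_metadata` are hypothetical names; the rendering only imitates the proposed `show -metadata` output.

```python
def merge_metadata(inputs: list[tuple[str, int, dict[str, str]]]) -> dict[str, dict[str, str]]:
    """Keep each input's metadata as its own group, tagged with its merge weight."""
    merged: dict[str, dict[str, str]] = {}
    for filename, weight, metadata in inputs:
        if filename in merged:
            raise ValueError(f"duplicate input {filename!r}")
        group = {"llvm.profile_weight": str(weight)}
        # The weight given on the merge command line wins over any stale value.
        group.update({k: v for k, v in metadata.items() if k != "llvm.profile_weight"})
        merged[filename] = group
    return merged


def render_metadata(merged: dict[str, dict[str, str]]) -> str:
    """Print one header line per input followed by its indented entries."""
    lines: list[str] = []
    for filename, group in merged.items():
        lines.append(filename)
        lines.extend(f"  {key}: {value}" for key, value in group.items())
    return "\n".join(lines)


if __name__ == "__main__":
    merged = merge_metadata([
        ("foo.profdata", 3, {"customkey": "value1"}),
        ("bar.profdata", 1, {"customkey": "value2"}),
    ])
    print(render_metadata(merged))
```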
> > In terms of implementation, the metadata could live as a separate contiguous
> > section in the binary profile formats. It might make sense to encode it in
> > something like YAML so that it could also be directly embedded in the
> > various text formats.
> >
>
> A single string after the header should do.
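A minimal sketch of "a single string after the header": the metadata is written as a length-prefixed UTF-8 string directly after a fixed header, so a reader that ignores metadata can skip it using the length alone. The 16-byte header (magic, version) and the magic value are invented for illustration and do not reflect the real indexed profile layout.

```python
import struct

# Hypothetical header: two little-endian uint64 fields (magic, version).
# This is NOT the real indexed-profile header; it only stands in for
# "whatever header precedes the metadata string".
HEADER = struct.Struct("<QQ")
LENGTH = struct.Struct("<Q")
FAKE_MAGIC = 0x0123456789ABCDEF  # made-up value for illustration


def write_profile_prefix(version: int, metadata: str) -> bytes:
    """Serialize the header followed by the length-prefixed metadata string."""
    payload = metadata.encode("utf-8")
    return HEADER.pack(FAKE_MAGIC, version) + LENGTH.pack(len(payload)) + payload


def read_metadata(blob: bytes) -> tuple[int, str, int]:
    """Return (version, metadata string, offset where the profile body starts)."""
    magic, version = HEADER.unpack_from(blob, 0)
    if magic != FAKE_MAGIC:
        raise ValueError("bad magic")
    (length,) = LENGTH.unpack_from(blob, HEADER.size)
    start = HEADER.size + LENGTH.size
    end = start + length
    if end > len(blob):
        raise ValueError("truncated metadata string")
    return version, blob[start:end].decode("utf-8"), end


if __name__ == "__main__":
    blob = write_profile_prefix(3, 'customkey: "value1"') + b"<profile body>"
    print(read_metadata(blob))
```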
>
> thanks,
>
> David
>
> > ----
> >
> > What do you think? How useful would any of the above be to you or other
> > PGO users?
> > Can you think of any other use cases?
> >
> > -Nathan

Xinliang David Li via llvm-dev

2016-Jan-15 23:53 UTC

[llvm-dev] [PGO] Thoughts on adding a key-value store to profile data formats

This scheme is more flexible but does not necessarily simplify
compatibility. We probably need more use cases in mind before we jump
into this flexibility (i.e., passing arbitrary info from instrumentation
compile time to the runtime and passing it back to profile-use in a round
trip). Note that we have 64 bits in the version field, and perhaps only
8 bits are actually needed for the version itself, so we have lots of
bits to use for this purpose. On the other hand, I think this is also
orthogonal to the other approach: if we run out of bits some day, we can
always implement this.
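To make the version-field alternative concrete, here is a sketch that packs a small format version into the low 8 bits of a 64-bit field and uses high bits as variant flags. The bit positions, including `FLAG_IR_INSTRUMENTATION` at bit 56, are assumptions chosen for illustration; LLVM's actual variant-bit assignments may differ.

```python
VERSION_BITS = 8
VERSION_MASK = (1 << VERSION_BITS) - 1   # low 8 bits hold the format version
FIELD_MASK = (1 << 64) - 1               # the field is 64 bits wide

# Hypothetical variant flag in the top byte of the field.
FLAG_IR_INSTRUMENTATION = 1 << 56


def pack_version(version: int, flags: int = 0) -> int:
    """Combine a format version and variant flags into one 64-bit value."""
    if not 0 <= version <= VERSION_MASK:
        raise ValueError("version does not fit in 8 bits")
    if flags & VERSION_MASK or flags & ~FIELD_MASK:
        raise ValueError("flags must lie in bits 8..63")
    return version | flags


def unpack_version(field: int) -> tuple[int, int]:
    """Split a 64-bit field back into (version, flags)."""
    return field & VERSION_MASK, field & FIELD_MASK & ~VERSION_MASK


if __name__ == "__main__":
    field = pack_version(3, FLAG_IR_INSTRUMENTATION)
    version, flags = unpack_version(field)
    print(hex(field), version, bool(flags & FLAG_IR_INSTRUMENTATION))
```

With 56 bits left over for flags, this illustrates David's point that the version field has plenty of room before a key-value store becomes necessary.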

The offline profile tagging proposed by Nathan is useful to have
regardless of the above.

David


Sean Silva via llvm-dev

2016-Jan-16 00:36 UTC

[llvm-dev] [PGO] Thoughts on adding a key-value store to profile data formats

On Fri, Jan 15, 2016 at 3:53 PM, Xinliang David Li <davidxl at google.com>
wrote:
> This scheme is more flexible but does not necessarily simplify
> compatibility. We probably need more use cases in mind before we jump
> into this flexibility (i.e., passing arbitrary info from instrumentation
> compile time to the runtime and passing it back to profile-use in a round
> trip). Note that we have 64 bits in the version field, and perhaps only
> 8 bits are actually needed for the version itself, so we have lots of
> bits to use for this purpose. On the other hand, I think this is also
> orthogonal to the other approach: if we run out of bits some day, we can
> always implement this.
>
It may be worth thinking about even now. I've seen multiple patches
recently that are using ad-hoc techniques to communicate with the runtime.
E.g. r257230 uses a hack due to not having an orthogonal way to set the
version and variant bits; the result is inferior diagnostic quality and
obscured code intent.

-- Sean Silva

