Nathan Slingerland via llvm-dev
2016-Jan-15 19:06 UTC
[llvm-dev] [PGO] Thoughts on adding a key-value store to profile data formats
Hi all,

I'd like to get your thoughts on possibly adding a generic key-value store to the profile data formats for 'metadata'. Some potential use cases:

*I. Profile Features*

The most basic use could be as a central repository for internal bits of housekeeping information about the profile data. For example, to differentiate between FE and IR instrumentation:

    llvm.instrumentation_source: "IR"

A key-value store would make it simple to add new bits of information and help keep everything human-readable for the text-based test formats. This could potentially also help with error checking at the llvm-profdata level if the Reader classes exposed it.

*II. Profile Context*

Basic (lightweight) information about the profile could be automatically gathered at profile time. The idea would be to automatically label profiles with contextual information so that the age/origin of a profile could be inspected using the llvm-profdata tool.

    $ llvm-profdata show -metadata foo.profdata
    llvm.profile_start_time: "2016-01-08T23:41:56.755Z"
    llvm.profile_duration: 5.102s
    llvm.exe_time: "2016-01-08T23:35:56.745Z"
    Total functions: 4
    Maximum function count: 866988873
    Maximum internal block count: 267914296

Other possibilities: executable path, command line arguments, system info (uname)

*III. Custom Content*

The key-value store itself could be exposed to developers via the llvm-profdata tool. This would allow users to associate arbitrary custom data with a profile, as well as inspect it:

    $ llvm-profdata merge -metadata=customkey,value1 foo.profraw -o foo.profdata
    $ llvm-profdata show -metadata foo.profdata
    customkey: "value1"
    Total functions: 4
    Maximum function count: 866988873
    Maximum internal block count: 267914296

Developers could add as much custom context as they find valuable:

    $ llvm-profdata merge -metadata="mysoft.version,${SOFTWARE_VERSION} (${BUILD_NUMBER})" -metadata="mysoft.exe_md5,`md5 -q foo.exe`" foo.profraw -o foo.profdata
    $ llvm-profdata show -metadata foo.profdata
    mysoft.version: "0.1.0"
    mysoft.exe_md5: "337b5c5bc29cbdca090a1921a58465d6"
    Total functions: 4
    Maximum function count: 866988873
    Maximum internal block count: 267914296

Other information that might be interesting: git/svn revision, workload description, system info (uname -a)

This would be a way to embed almost any platform-specific or heavy-weight data without requiring the addition of platform-specific code in compiler-rt and without impacting other developers.

When profiles are merged it might be simplest to keep all input metadata (machine-readable things such as feature bits might need to be handled differently):

    $ llvm-profdata merge -weighted-input=3,foo.profdata bar.profdata -o foobar.profdata
    $ llvm-profdata show -metadata foobar.profdata
    foo.profdata
      llvm.profile_weight: 3
      llvm.profile_start_time: "2016-01-08T23:41:56.755Z"
      llvm.profile_duration: 5.102s
      llvm.exe_time: "2016-01-08T23:35:56.745Z"
      customkey: "value1"
    bar.profdata
      llvm.profile_weight: 1
      llvm.profile_start_time: "2016-01-15T00:08:41.168Z"
      llvm.profile_duration: "1.001s"
      llvm.exe_time: "2016-01-15T00:08:13.000Z"
      customkey: "value2"
    Total functions: 4
    Maximum function count: 866988873
    Maximum internal block count: 267914296

In terms of implementation, the metadata could live as a separate contiguous section in the binary profile formats. It might make sense to encode it in something like YAML so that it could also be directly embedded in the various text formats.

----

What do you think? How useful would any of the above be to you or other PGO users?
Can you think of any other use cases?

-Nathan
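To make the "keep all input metadata" merge behaviour above concrete, here is a minimal Python sketch. It is only an illustration of the idea, not llvm-profdata code: the function names `merge_metadata` and `format_metadata`, the in-memory dict representation, and the two-space indentation of the output are all assumptions made for this example.

```python
def merge_metadata(inputs):
    """Keep every input's metadata, grouped under the input's file name.

    `inputs` is a list of (file_name, weight, metadata_dict) tuples.
    The merge weight is recorded under the proposed llvm.profile_weight key.
    """
    merged = {}
    for file_name, weight, metadata in inputs:
        entry = {"llvm.profile_weight": weight}
        entry.update(metadata)  # nothing from the input is dropped
        merged[file_name] = entry
    return merged


def format_metadata(merged):
    """Render merged metadata as text, one group per input profile."""
    lines = []
    for file_name, entry in merged.items():
        lines.append(file_name)
        for key, value in entry.items():
            # Strings are quoted, numbers are printed bare.
            rendered = f'"{value}"' if isinstance(value, str) else str(value)
            lines.append(f"  {key}: {rendered}")
    return "\n".join(lines)


if __name__ == "__main__":
    merged = merge_metadata([
        ("foo.profdata", 3, {"customkey": "value1"}),
        ("bar.profdata", 1, {"customkey": "value2"}),
    ])
    print(format_metadata(merged))
```

Grouping by input name sidesteps the question of what to do when two inputs define the same key (here `customkey`), at the cost of the metadata growing with every merge.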
Xinliang David Li via llvm-dev
2016-Jan-15 19:41 UTC
[llvm-dev] [PGO] Thoughts on adding a key-value store to profile data formats
Tagging profile data with such information is generally useful. My thoughts are:

1) Such information probably does not need to be stored in the raw format profile data -- so no runtime changes are needed -- only llvm-profdata and the indexed format need to be enhanced to support this.
2) A more general way is to just add an option, --embed_label=<customized_label>, where the label is a string that can hold key/value pairs encoded in the user's favorite format. The format of the key-value pairs is not specified and remains opaque to the Instr/Sample Profiler.
3) Labels from multiple source profiles will be merged when the merge command is used.

On Fri, Jan 15, 2016 at 11:06 AM, Nathan Slingerland <slingn at gmail.com> wrote:
> Hi all,
>
> I'd like to get your thoughts on possibly adding a generic key-value store to the profile data formats for 'metadata'. Some potential use cases:
>
> I. Profile Features
>
> The most basic use could be as a central repository for internal bits of housekeeping information about the profile data. For example, to differentiate between FE and IR instrumentation:
>
> llvm.instrumentation_source: "IR"
>
> A key-value store would make it simple to add new bits of information and help keep everything human-readable for the text-based test formats. This could potentially also help with error checking at the llvm-profdata level if the Reader classes exposed it.

This is OK to have, but I don't think the reader class should rely on metadata to make decisions (as metadata can be thrown away without affecting correctness). A formal approach, such as the one proposed (encoding it in the variant bits of the version field), should be used.

> II. Profile Context
>
> Basic (lightweight) information about the profile could be automatically gathered at profile time. The idea would be to automatically label profiles with contextual information so that the age/origin of a profile could be inspected using the llvm-profdata tool.
>
> $ llvm-profdata show -metadata foo.profdata
> llvm.profile_start_time: "2016-01-08T23:41:56.755Z"
> llvm.profile_duration: 5.102s
> llvm.exe_time: "2016-01-08T23:35:56.745Z"

Other examples include the options and workload used in the training run.

> Total functions: 4
> Maximum function count: 866988873
> Maximum internal block count: 267914296
>
> Other possibilities: executable path, command line arguments, system info (uname)

Yes.

> III. Custom Content
>
> The key-value store itself could be exposed to developers via the llvm-profdata tool. This would allow for users to associate arbitrary custom data with a profile, as well as inspect it:
>
> $ llvm-profdata merge -metadata=customkey,value1 foo.profraw -o foo.profdata
> $ llvm-profdata show -metadata foo.profdata
> customkey: "value1"
> Total functions: 4
> Maximum function count: 866988873
> Maximum internal block count: 267914296
>
> Developers could add as much custom context as they find valuable:

I think all metadata should be custom defined -- the profile reader should not need to understand it.

> $ llvm-profdata merge -metadata="mysoft.version,${SOFTWARE_VERSION} (${BUILD_NUMBER})" -metadata="mysoft.exe_md5,`md5 -q foo.exe`" foo.profraw -o foo.profdata
> $ llvm-profdata show -metadata foo.profdata
> mysoft.version: "0.1.0"
> mysoft.exe_md5: "337b5c5bc29cbdca090a1921a58465d6"
> Total functions: 4
> Maximum function count: 866988873
> Maximum internal block count: 267914296
>
> Other information that might be interesting: git/svn revision, workload description, system info (uname -a)
>
> This would be a way to embed almost any platform-specific or heavy-weight data without requiring the addition of platform-specific code in compiler-rt and without impacting other developers.

Yes.

> When profiles are merged it might be simplest to keep all input metadata (machine-readable things such as feature bits might need to be handled differently):

Feature bits should not be part of it.

> $ llvm-profdata merge -weighted-input=3,foo.profdata bar.profdata -o foobar.profdata
> $ llvm-profdata show -metadata foobar.profdata
> foo.profdata
> llvm.profile_weight: 3
> llvm.profile_start_time: "2016-01-08T23:41:56.755Z"
> llvm.profile_duration: 5.102s
> llvm.exe_time: "2016-01-08T23:35:56.745Z"
> customkey: "value1"
> bar.profdata
> llvm.profile_weight: 1
> llvm.profile_start_time: "2016-01-15T00:08:41.168Z"
> llvm.profile_duration: "1.001s"
> llvm.exe_time: "2016-01-15T00:08:13.000Z"
> customkey: "value2"
> Total functions: 4
> Maximum function count: 866988873
> Maximum internal block count: 267914296
>
> In terms of implementation, the metadata could live as a separate contiguous section in the binary profile formats. It might make sense to encode it in something like YAML so that it could also be directly embedded in the various text formats.

A single string after the header should do.

thanks,

David

> ----
>
> What do you think? How useful would any of the above be to you or other PGO users?
> Can you think of any other use cases?
>
> -Nathan
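The "single opaque string after the header" idea from this reply can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the indexed profile format: the 8-byte little-endian length prefix, the helper names `append_label`, `read_label`, and `merge_labels`, and the newline-joined merge policy are all hypothetical choices made for the example.

```python
import struct


def append_label(header: bytes, label: str) -> bytes:
    """Store the label as a length-prefixed UTF-8 string right after the header."""
    payload = label.encode("utf-8")
    return header + struct.pack("<Q", len(payload)) + payload


def read_label(blob: bytes, header_size: int):
    """Return (label, offset_of_following_data) without interpreting the label."""
    (length,) = struct.unpack_from("<Q", blob, header_size)
    start = header_size + 8
    label = blob[start:start + length].decode("utf-8")
    return label, start + length


def merge_labels(labels):
    """Merge source labels by keeping each non-empty one, in input order."""
    return "\n".join(label for label in labels if label)


if __name__ == "__main__":
    header = b"\x00" * 16  # stand-in for a real header
    blob = append_label(header, "customkey=value1") + b"<profile records>"
    label, offset = read_label(blob, len(header))
    print(label)
    print(blob[offset:])
```

Because the reader only needs the length to skip the label, the label contents can stay opaque to the profile reader, which is the property argued for in the reply.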
Sean Silva via llvm-dev
2016-Jan-15 22:18 UTC
[llvm-dev] [PGO] Thoughts on adding a key-value store to profile data formats
On Fri, Jan 15, 2016 at 11:41 AM, Xinliang David Li <davidxl at google.com> wrote:
> Tagging profile data with such information is generally useful. My thoughts are
>
> 1) such information is probably not needed to be stored in raw format profile data -- so no runtime changes are needed -- only llvm-profdata and indexed format need to be enhanced to support this.
> 2) A more general way is just add an option: --embed_label=<customized_label>, where the label is a string can be key/value pairs encoded in user's favorite format. The format of the key-value pairs are not specified and remain opaque to Instr/Sample Profiler
> 3) labels from multiple source profiles will be merged when merge command is used.
>
> On Fri, Jan 15, 2016 at 11:06 AM, Nathan Slingerland <slingn at gmail.com> wrote:
> > Hi all,
> >
> > I'd liked to get your thoughts on possibly adding a generic key-value store to the profile data formats for 'metadata'. Some potential uses cases:
> >
> > I. Profile Features
> >
> > The most basic use could be as a central repository for internal bits of housekeeping information about the profile data. For example, to differentiate between FE and IR instrumentation:
> >
> > llvm.instrumentation_source: "IR"
> >
> > A key-value store would make it simple to add new bits of information and help keep everything human-readable for the text-based test formats. This could potentially also help with error checking at the llvm-profdata level if the Reader classes exposed it.
>
> This is ok to have, but I don't think the reader class should rely on meta data to make decisions (as meta data can be thrown away without affecting correctness). Formal approach such as the one proposed (to encode it in variant bits of the version field) should be used.

We could potentially have a "reserved namespace" like `llvm.*` whose entries tools are not allowed to drop (or that get special handling inside tools).

Assuming that we have semantics that guarantee that some labels/"metadata" are kept (and that the compiler can communicate certain predefined labels to the runtime, which propagate back to the profraw and then to the profdata), what do you think about using a generic format like this for things like versions and profile source, rather than attempting to fit everything in a small version field or having to come up with some convention for a variable being defined or not (as in http://reviews.llvm.org/D15540)? My impression is that it would give more flexibility and potentially simplify compatibility.

-- Sean Silva
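The reserved-namespace suggestion can be illustrated with a short Python sketch. It assumes metadata is held as a plain dict and that keys starting with `llvm.` form the reserved namespace; the helper names `drop_optional` and `set_user_key` are hypothetical and do not correspond to any existing tool.

```python
RESERVED_PREFIX = "llvm."  # assumed spelling of the reserved namespace


def drop_optional(metadata):
    """Discard user metadata but always keep reserved llvm.* entries."""
    return {
        key: value
        for key, value in metadata.items()
        if key.startswith(RESERVED_PREFIX)
    }


def set_user_key(metadata, key, value):
    """Add a user-supplied entry, refusing to touch the reserved namespace."""
    if key.startswith(RESERVED_PREFIX):
        raise ValueError(f"{key!r} is in the reserved {RESERVED_PREFIX}* namespace")
    updated = dict(metadata)
    updated[key] = value
    return updated


if __name__ == "__main__":
    metadata = {"llvm.instrumentation_source": "IR", "customkey": "value1"}
    print(drop_optional(metadata))
    print(set_user_key(metadata, "mysoft.version", "0.1.0"))
```

With a rule like this, a tool that strips labels could still be relied on to preserve entries such as `llvm.instrumentation_source`, which is what would make them usable for versioning and profile-source decisions.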
Nathan Slingerland via llvm-dev
2016-Jan-18 16:38 UTC
[llvm-dev] [PGO] Thoughts on adding a key-value store to profile data formats
On Fri, Jan 15, 2016 at 11:41 AM, Xinliang David Li <davidxl at google.com> wrote:
> Tagging profile data with such information is generally useful. My thoughts are
>
> 1) such information is probably not needed to be stored in raw format profile data -- so no runtime changes are needed -- only llvm-profdata and indexed format need to be enhanced to support this.
> 2) A more general way is just add an option: --embed_label=<customized_label>, where the label is a string can be key/value pairs encoded in user's favorite format. The format of the key-value pairs are not specified and remain opaque to Instr/Sample Profiler
> ...
>
> I think all meta data should be custom defined -- the profile reader should not need to understand them.

OK. The benefit of enforcing some structure from the start is that it gives us the possibility of machine parsing/round-tripping the content for future applications. Initially this would only affect how we encode the label content: the reader classes could still treat the content as opaque for the time being, provided the format were something intended to be human-readable, like YAML. On the other hand, if the metadata content begins life unstructured, it would be harder to retrofit structure later.

> ...
> > In terms of implementation, the metadata could live as a separate contiguous section in the binary profile formats. It might make sense to encode it in something like YAML so that it could also be directly embedded in the various text formats.
>
> A single string after the header should do.

For the text formats I'd suggest that we delimit the label information with known prefix and suffix lines. That keeps it easy to parse (and skip), especially since the label content can span multiple lines. The delimiters would only be a part of the file format and wouldn't be displayed by llvm-profdata.
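The prefix/suffix delimiter idea for the text formats could look roughly like the Python sketch below. The delimiter strings `# label begin` and `# label end` and the function name `split_label` are made up for illustration; no delimiter syntax was fixed in this thread.

```python
BEGIN = "# label begin"  # hypothetical prefix line
END = "# label end"      # hypothetical suffix line


def split_label(text):
    """Separate the delimited label block from the rest of a text profile.

    Returns (label, body). The label is returned verbatim and never parsed,
    so its content can stay opaque (or be YAML, if structure is wanted).
    """
    label_lines, body_lines = [], []
    inside = False
    for line in text.splitlines():
        if not inside and line == BEGIN:
            inside = True
        elif inside and line == END:
            inside = False
        elif inside:
            label_lines.append(line)
        else:
            body_lines.append(line)
    if inside:
        raise ValueError("label block is missing its suffix line")
    return "\n".join(label_lines), "\n".join(body_lines)


if __name__ == "__main__":
    profile = "\n".join([
        BEGIN,
        'customkey: "value1"',
        'mysoft.version: "0.1.0"',
        END,
        "<profile records>",
    ])
    label, body = split_label(profile)
    print(label)
    print(body)
```

A reader that does not care about labels can discard the first return value, and a display tool can print the label without the delimiter lines, matching the point that the delimiters belong to the file format only.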