Dean Michael Berris via llvm-dev
2018-Feb-15 02:34 UTC
[llvm-dev] RFC: XRay Profiling in LLVM
# Objective Implement an XRay mode 'xray-profiling' that gathers stack trace latencies/durations and builds histograms to provide basic statistics about where time is going in an execution of the application. # Background XRay has two modes currently implemented in compiler-rt: a basic (nee naive) mode and flight data recorder (FDR) mode. Basic mode logging, when enabled, will collect traces and write verbose logs to disk. This mode has some unsophisticated filtering and stack-depth limiting support. This mode is useful for tracing short-lived applications, so that execution can be traced throughout the lifetime of the application. We've used basic mode logging to identify sources of latency in Clang/LLVM. FDR mode logging is designed to record traces and aggressively filter out events that fall below a latency threshold. This mode uses a fixed amount of heap space to record events as they're happening, and only writes out the logs on demand. FDR mode is used mostly for long-running applications and services. We also already have tools in LLVM that can turn these XRay traces into other forms that are easier to digest. One tool turns XRay traces into something that the Chrome Trace Viewer can render [1] and another summarises it to generate latency flame graphs [2]. # Problem Statement Both basic and FDR mode logging can create massive traces. One example, where a fully-instrumented clang binary was traced with XRay's basic mode, we found that it could generate an 89GB trace compiling the canonical "Hello, World!" program. One way to address this problem is to limit the amount of memory being used by the implementation through flags. Unfortunately that also means basic mode XRay will block more on writing out the log files. Filtering at runtime also takes some CPU time, so the overheads of writing out the log as well as filtering the records become non-trivial for heavily instrumented programs. # Proposed Solution We'd like to propose a profiling mode to XRay, called 'xray-profiling'. The intent is to implement handlers that will generate profile data [3] that other tools can digest. We'll export the data in an XRay-specific format for a more compact representation. We intend to implement conversion tools to support the profile.proto format that pprof can handle as input. To do this, we need to record the "stack" of functions that are XRay instrumented, represented as a prefix tree (or forest, as we'll see later). This prefix tree will represent the XRay instrumented functions that have been entered, and then exited. A() -> B() -> C() \ -> C() Each node in the tree will contain a histogram of the latencies/durations of the function by subtracting the timestamp at the time of exit from the timestamp at entry. When the tracing functionality is finalised and flushed, we end up with a data file that represents the prefix tree in a summarised form following the profile.proto format. # Logistics As with the Basic and FDR modes, the profiling mode will be implemented in compiler-rt and will be default-linked into any XRay instrumented binary. In the future we can provide more targeted archives that can be controlled in clang, such that users can determine which runtime implementations they'd like to be linked into the final binary. This work may be done in parallel, but is unrelated to the work involved with the XRay profiling mode implementation. # Call for Reviewers One of the issues we're finding with the XRay work is there's very little community knowledge about what's happening in the project. I would like to change that by encouraging more reviewers, interested in XRay, to please sign up to be a reviewer. Eventually, we'd like to grow the list of people contributing to XRay from the community and this is an opportunity to do so. # Open Questions * The triggering mechanism is still through programmatically turning on/off XRay instrumentation in an XRay-instrumented binary. We're exploring some other options based on ptrace (will send another RFC at some point about this) and a tool that will explicitly control the XRay tracing externally. The alternative to this is having a dedicated XRay thread that will periodically check whether a signal has been tripped, and do the standardised procedure for enabling/disabling/flushing trace data. Any alternatives to this that we may not have considered yet? Thanks in advance! References [0] https://github.com/google/pprof [1] http://lists.llvm.org/pipermail/llvm-dev/2017-December/119529.html [2] https://llvm.org/docs/XRayExample.html#flame-graph-generation [3] https://github.com/google/pprof/blob/master/proto/profile.proto -- -- Dean -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180215/0c91b8c1/attachment.html>
Dean Michael Berris via llvm-dev
2018-Mar-27 01:10 UTC
[llvm-dev] RFC: XRay Profiling in LLVM
FYI: Patch is now available for review in https://reviews.llvm.org/D44620. On Thu, Feb 15, 2018 at 1:34 PM Dean Michael Berris <dean.berris at gmail.com> wrote:> # Objective > > Implement an XRay mode 'xray-profiling' that gathers stack trace > latencies/durations and builds histograms to provide basic statistics about > where time is going in an execution of the application. > > # Background > > XRay has two modes currently implemented in compiler-rt: a basic (nee > naive) mode and flight data recorder (FDR) mode. > > Basic mode logging, when enabled, will collect traces and write verbose > logs to disk. This mode has some unsophisticated filtering and stack-depth > limiting support. This mode is useful for tracing short-lived applications, > so that execution can be traced throughout the lifetime of the application. > We've used basic mode logging to identify sources of latency in Clang/LLVM. > > FDR mode logging is designed to record traces and aggressively filter out > events that fall below a latency threshold. This mode uses a fixed amount > of heap space to record events as they're happening, and only writes out > the logs on demand. FDR mode is used mostly for long-running applications > and services. > > We also already have tools in LLVM that can turn these XRay traces into > other forms that are easier to digest. One tool turns XRay traces into > something that the Chrome Trace Viewer can render [1] and another > summarises it to generate latency flame graphs [2]. > > # Problem Statement > > Both basic and FDR mode logging can create massive traces. One example, > where a fully-instrumented clang binary was traced with XRay's basic mode, > we found that it could generate an 89GB trace compiling the canonical > "Hello, World!" program. > > One way to address this problem is to limit the amount of memory being > used by the implementation through flags. Unfortunately that also means > basic mode XRay will block more on writing out the log files. > > Filtering at runtime also takes some CPU time, so the overheads of writing > out the log as well as filtering the records become non-trivial for heavily > instrumented programs. > > # Proposed Solution > > We'd like to propose a profiling mode to XRay, called 'xray-profiling'. > The intent is to implement handlers that will generate profile data [3] > that other tools can digest. We'll export the data in an XRay-specific > format for a more compact representation. We intend to implement conversion > tools to support the profile.proto format that pprof can handle as input. > > To do this, we need to record the "stack" of functions that are XRay > instrumented, represented as a prefix tree (or forest, as we'll see later). > This prefix tree will represent the XRay instrumented functions that have > been entered, and then exited. > > A() -> B() -> C() > \ > -> C() > > Each node in the tree will contain a histogram of the latencies/durations > of the function by subtracting the timestamp at the time of exit from the > timestamp at entry. > > When the tracing functionality is finalised and flushed, we end up with a > data file that represents the prefix tree in a summarised form following > the profile.proto format. > > # Logistics > > As with the Basic and FDR modes, the profiling mode will be implemented in > compiler-rt and will be default-linked into any XRay instrumented binary. > > In the future we can provide more targeted archives that can be controlled > in clang, such that users can determine which runtime implementations > they'd like to be linked into the final binary. This work may be done in > parallel, but is unrelated to the work involved with the XRay profiling > mode implementation. > > # Call for Reviewers > > One of the issues we're finding with the XRay work is there's very little > community knowledge about what's happening in the project. I would like to > change that by encouraging more reviewers, interested in XRay, to please > sign up to be a reviewer. Eventually, we'd like to grow the list of people > contributing to XRay from the community and this is an opportunity to do so. > > # Open Questions > > * The triggering mechanism is still through programmatically turning > on/off XRay instrumentation in an XRay-instrumented binary. We're exploring > some other options based on ptrace (will send another RFC at some point > about this) and a tool that will explicitly control the XRay tracing > externally. The alternative to this is having a dedicated XRay thread that > will periodically check whether a signal has been tripped, and do the > standardised procedure for enabling/disabling/flushing trace data. Any > alternatives to this that we may not have considered yet? > > Thanks in advance! > > > References > > [0] https://github.com/google/pprof > [1] http://lists.llvm.org/pipermail/llvm-dev/2017-December/119529.html > [2] https://llvm.org/docs/XRayExample.html#flame-graph-generation > [3] https://github.com/google/pprof/blob/master/proto/profile.proto > -- > -- Dean >-- -- Dean -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180327/d80ab673/attachment.html>
Dean Michael Berris via llvm-dev
2018-May-03 02:11 UTC
[llvm-dev] RFC: XRay Profiling in LLVM
As an additional update, this is now a series of patches. If you're interested in learning more about XRay in compiler-rt, please feel free to dive in and review the code. Any insights/questions/feedback would be most appreciated! https://reviews.llvm.org/D45757 - Part 2: Function Call Trie https://reviews.llvm.org/D45758 - Part 3: Profile Collector Service https://reviews.llvm.org/D44620 - Part 4: Profiler Mode Wiring https://reviews.llvm.org/D45998 - Part 5: Profile File Writing Cheers On Tue, Mar 27, 2018 at 12:10 PM Dean Michael Berris <dean.berris at gmail.com> wrote:> FYI: Patch is now available for review in https://reviews.llvm.org/D44620.> On Thu, Feb 15, 2018 at 1:34 PM Dean Michael Berris > <dean.berris at gmail.com>wrote:>> # Objective>> Implement an XRay mode 'xray-profiling' that gathers stack tracelatencies/durations and builds histograms to provide basic statistics about where time is going in an execution of the application.>> # Background>> XRay has two modes currently implemented in compiler-rt: a basic (neenaive) mode and flight data recorder (FDR) mode.>> Basic mode logging, when enabled, will collect traces and write verboselogs to disk. This mode has some unsophisticated filtering and stack-depth limiting support. This mode is useful for tracing short-lived applications, so that execution can be traced throughout the lifetime of the application. We've used basic mode logging to identify sources of latency in Clang/LLVM.>> FDR mode logging is designed to record traces and aggressively filterout events that fall below a latency threshold. This mode uses a fixed amount of heap space to record events as they're happening, and only writes out the logs on demand. FDR mode is used mostly for long-running applications and services.>> We also already have tools in LLVM that can turn these XRay traces intoother forms that are easier to digest. One tool turns XRay traces into something that the Chrome Trace Viewer can render [1] and another summarises it to generate latency flame graphs [2].>> # Problem Statement>> Both basic and FDR mode logging can create massive traces. One example,where a fully-instrumented clang binary was traced with XRay's basic mode, we found that it could generate an 89GB trace compiling the canonical "Hello, World!" program.>> One way to address this problem is to limit the amount of memory beingused by the implementation through flags. Unfortunately that also means basic mode XRay will block more on writing out the log files.>> Filtering at runtime also takes some CPU time, so the overheads ofwriting out the log as well as filtering the records become non-trivial for heavily instrumented programs.>> # Proposed Solution>> We'd like to propose a profiling mode to XRay, called 'xray-profiling'.The intent is to implement handlers that will generate profile data [3] that other tools can digest. We'll export the data in an XRay-specific format for a more compact representation. We intend to implement conversion tools to support the profile.proto format that pprof can handle as input.>> To do this, we need to record the "stack" of functions that are XRayinstrumented, represented as a prefix tree (or forest, as we'll see later). This prefix tree will represent the XRay instrumented functions that have been entered, and then exited.>> A() -> B() -> C() >> \ >> -> C()>> Each node in the tree will contain a histogram of thelatencies/durations of the function by subtracting the timestamp at the time of exit from the timestamp at entry.>> When the tracing functionality is finalised and flushed, we end up witha data file that represents the prefix tree in a summarised form following the profile.proto format.>> # Logistics>> As with the Basic and FDR modes, the profiling mode will be implementedin compiler-rt and will be default-linked into any XRay instrumented binary.>> In the future we can provide more targeted archives that can becontrolled in clang, such that users can determine which runtime implementations they'd like to be linked into the final binary. This work may be done in parallel, but is unrelated to the work involved with the XRay profiling mode implementation.>> # Call for Reviewers>> One of the issues we're finding with the XRay work is there's verylittle community knowledge about what's happening in the project. I would like to change that by encouraging more reviewers, interested in XRay, to please sign up to be a reviewer. Eventually, we'd like to grow the list of people contributing to XRay from the community and this is an opportunity to do so.>> # Open Questions>> * The triggering mechanism is still through programmatically turningon/off XRay instrumentation in an XRay-instrumented binary. We're exploring some other options based on ptrace (will send another RFC at some point about this) and a tool that will explicitly control the XRay tracing externally. The alternative to this is having a dedicated XRay thread that will periodically check whether a signal has been tripped, and do the standardised procedure for enabling/disabling/flushing trace data. Any alternatives to this that we may not have considered yet?>> Thanks in advance!>> References>> [0] https://github.com/google/pprof >> [1] http://lists.llvm.org/pipermail/llvm-dev/2017-December/119529.html >> [2] https://llvm.org/docs/XRayExample.html#flame-graph-generation >> [3] https://github.com/google/pprof/blob/master/proto/profile.proto >> -- >> -- Dean> -- > -- Dean-- Dean