betulb at codeaurora.org
2014-Oct-30 18:28 UTC
[LLVMdev] RFC: Indirect Call Target Profiling
Hi All,

We've been working on adding indirect call target profiling support to the instrumented profiler for PGO purposes. I'd like to propose the following design.

Goal: Our aim is to add instrumentation around indirect call sites, so that the run-time can track the callee addresses and their access frequencies. From the addresses we'd like to infer the callee names and use them in optimizations to improve the performance of applications that make heavy use of indirect calls. SPEC is a candidate benchmark suite: it gives us applications written in both C and C++ that make use of indirect calls, and it can demonstrate the effectiveness of optimizations that use this additional data.

Design: To determine the function names from the profiled target addresses, we've extended the data variable that is built by build_data_var() in CodeGenPGO.cpp (abbr. PFDV: Per Function Data Variable) to save the function addresses. The PFDV is communicated to the run-time during function registration and written to the raw profile data file. This data structure is also extended to contain the number of indirect call sites for each function.

To communicate the target addresses to the run-time, clang inserts a call to a run-time routine before each indirect call site. Something like:

void instrument_indirect_call_site(uint8_t *TargetAddress, void *Data, uint32_t CounterIndex);

This run-time function takes the target address, the pointer to the profile data variable of the caller (i.e. the PFDV), and the index/id of the indirect call site. The routine checks whether the target address has been seen before for that call site index/id. If not, an entry is added to an internal data structure. If so, the counter associated with the target address is incremented by 1. This counter records the number of times the target address is called.

The raw profile data file stores, for each call site index, the target addresses and the number of times each target address was taken.
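The per-call-site bookkeeping described above can be sketched in C roughly as follows. This is a minimal, single-threaded illustration under stated assumptions, not the proposed implementation: only the `instrument_indirect_call_site` signature comes from the proposal, while `ProfDataVar` (a stand-in for the extended PFDV), `IndirectSite`, `TargetEntry`, the fixed `MAX_TARGETS_PER_SITE` capacity and the `Dropped` counter are hypothetical choices for the unspecified "internal data structure".

```c
#include <stdint.h>

/* Hypothetical fixed capacity per call site; the proposal does not say
   how many distinct targets the internal data structure can hold. */
#define MAX_TARGETS_PER_SITE 8

/* One observed callee at a call site and how often it was called. */
typedef struct {
  uint8_t *Target;
  uint64_t Count;
} TargetEntry;

/* Bookkeeping for one indirect call site. */
typedef struct {
  TargetEntry Entries[MAX_TARGETS_PER_SITE];
  uint32_t NumEntries;
  uint64_t Dropped; /* calls whose new target arrived after the table was full */
} IndirectSite;

/* Hypothetical stand-in for the extended PFDV: only the additions discussed
   in the proposal are modeled (function address, number of indirect sites),
   plus a pointer to this sketch's per-site storage. */
typedef struct {
  const void *FunctionAddr;
  uint32_t NumIndirectCallSites;
  IndirectSite *Sites;
} ProfDataVar;

/* Signature as given in the proposal; the body is illustrative and
   not thread-safe. */
void instrument_indirect_call_site(uint8_t *TargetAddress, void *Data,
                                   uint32_t CounterIndex) {
  ProfDataVar *PFDV = (ProfDataVar *)Data;
  if (CounterIndex >= PFDV->NumIndirectCallSites)
    return; /* unknown call site id: ignore */
  IndirectSite *Site = &PFDV->Sites[CounterIndex];

  /* Target seen before at this site: increment its counter. */
  for (uint32_t I = 0; I < Site->NumEntries; ++I) {
    if (Site->Entries[I].Target == TargetAddress) {
      Site->Entries[I].Count++;
      return;
    }
  }

  /* New target: add an entry if there is room, otherwise count the drop. */
  if (Site->NumEntries < MAX_TARGETS_PER_SITE) {
    Site->Entries[Site->NumEntries].Target = TargetAddress;
    Site->Entries[Site->NumEntries].Count = 1;
    Site->NumEntries++;
  } else {
    Site->Dropped++;
  }
}
```

A fixed per-site table keeps the lookup to a short linear scan and avoids heap allocation, at the cost of losing distinct targets once a site's table fills up.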
llvm-profdata reads the function addresses from the raw profile data file, then compares them against the target addresses from the same file. Each match identifies the function name for a recorded address.

Files processed by llvm-profdata contain the target function names. If no function matches a target address, the address is converted to a string and stored in that format in the "indexed" data files. On the PGO path, clang consumes the returned indirect target data and attaches the following metadata at the indirect call sites:

!33 = metadata !{metadata !"indirect_call_targets", i64 <total_exec_count>, metadata !"target_fn1", i64 <target_fn1_count>, metadata !"target_fn2", i64 <target_fn2_count>, …}

Only the N most frequently called function names are recorded at each indirect call site. "indirect_call_targets" is the string literal identifying the fields of this metadata. <total_exec_count> is a 64-bit value for the total number of times the indirect call is executed; it is followed by the function names and execution counts of each target.

We're working on collecting further data points on the overhead this additional instrumentation adds to the original profiler. Looking forward to hearing your comments.

Thanks,
-Betul Buyukkurt

Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
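The address-to-name matching and top-N selection described in this proposal can be sketched as a small host-side C routine. It is not llvm-profdata's code: `FuncRecord`, `TargetCount`, `resolve_target` and `format_site_metadata` are hypothetical names, and the sketch only produces the textual metadata layout shown above (without the `!33 =` prefix) rather than building IR through LLVM's APIs.

```c
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: one function's address and name, as read from the raw file. */
typedef struct {
  uint64_t Addr;
  const char *Name;
} FuncRecord;

/* Hypothetical: one recorded target address at a call site and its count. */
typedef struct {
  uint64_t Target;
  uint64_t Count;
} TargetCount;

/* Map a target address to a function name; if nothing matches, fall back
   to the address rendered as a hex string in Buf. */
static const char *resolve_target(uint64_t Target, const FuncRecord *Funcs,
                                  size_t NumFuncs, char *Buf, size_t BufLen) {
  for (size_t I = 0; I < NumFuncs; ++I)
    if (Funcs[I].Addr == Target)
      return Funcs[I].Name;
  snprintf(Buf, BufLen, "0x%" PRIx64, Target);
  return Buf;
}

/* qsort comparator: larger counts first. */
static int by_count_desc(const void *A, const void *B) {
  const TargetCount *X = A, *Y = B;
  return (X->Count < Y->Count) - (X->Count > Y->Count);
}

/* Write the metadata text for one call site: the total execution count,
   then the TopN hottest targets. Sorts Targets in place.
   Returns 0 on success, -1 if Out is too small. */
static int format_site_metadata(TargetCount *Targets, size_t NumTargets,
                                size_t TopN, const FuncRecord *Funcs,
                                size_t NumFuncs, char *Out, size_t OutLen) {
  uint64_t Total = 0;
  for (size_t I = 0; I < NumTargets; ++I)
    Total += Targets[I].Count;
  qsort(Targets, NumTargets, sizeof *Targets, by_count_desc);

  int N = snprintf(Out, OutLen,
                   "!{metadata !\"indirect_call_targets\", i64 %" PRIu64, Total);
  if (N < 0 || (size_t)N >= OutLen)
    return -1;
  size_t Pos = (size_t)N;
  for (size_t I = 0; I < NumTargets && I < TopN; ++I) {
    char Hex[32];
    const char *Name =
        resolve_target(Targets[I].Target, Funcs, NumFuncs, Hex, sizeof Hex);
    N = snprintf(Out + Pos, OutLen - Pos, ", metadata !\"%s\", i64 %" PRIu64,
                 Name, Targets[I].Count);
    if (N < 0 || (size_t)N >= OutLen - Pos)
      return -1;
    Pos += (size_t)N;
  }
  N = snprintf(Out + Pos, OutLen - Pos, "}");
  return (N < 0 || (size_t)N >= OutLen - Pos) ? -1 : 0;
}
```

For example, with `foo` at 0x1000 called 9 times, `bar` at 0x2000 called 5 times and an unmatched 0xdead called once, keeping the top 2 gives `!{metadata !"indirect_call_targets", i64 15, metadata !"foo", i64 9, metadata !"bar", i64 5}`; the total still includes the call that was cut from the list.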
<slightly off topic>
Indirect call target profiling can be used for other purposes as well, for example to provide feedback to fuzzers. I've recently committed a very simple patch to AddressSanitizer that adds indirect call instrumentation specifically for this purpose (llvm part: http://llvm.org/viewvc/llvm-project?rev=220699&view=rev, compiler-rt part coming soon).

If this or a similar proposal gets implemented in clang, I'd love to see two things:
- extreme performance of the instrumented code (instrument_indirect_call_site needs to be very fast, lock-free and, ideally, contention-free)
- a command-line utility that can dump the indirect call coverage data in a human- and script-readable format.
</slightly off topic>
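One possible approach to the "very fast, lock-free" requirement, plus a script-readable dump, is sketched below with C11 atomics. It is an illustrative assumption, not part of the proposal or of the AddressSanitizer patch: `AtomicSite`, `record_target_lockfree`, `dump_site` and `SLOTS_PER_SITE` are hypothetical. The update is lock-free but not contention-free, because threads hitting the same site still share the same counters and cache lines; per-thread or sharded counters would be needed for that.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical fixed number of distinct targets tracked per call site. */
#define SLOTS_PER_SITE 4

/* Per-site table; a zero-initialized (e.g. static) object is a valid empty
   table. Target value 0 marks an unused slot. */
typedef struct {
  _Atomic uintptr_t Target[SLOTS_PER_SITE];
  _Atomic uint64_t Count[SLOTS_PER_SITE];
  _Atomic uint64_t Overflow; /* calls that found no slot for their target */
} AtomicSite;

/* Lock-free: no thread ever waits for another. An empty slot is claimed
   with a single CAS; counters use relaxed fetch_add, which is enough when
   the data is only read after the instrumented threads have finished. */
static void record_target_lockfree(AtomicSite *Site, uintptr_t Target) {
  for (int I = 0; I < SLOTS_PER_SITE; ++I) {
    uintptr_t Cur = atomic_load_explicit(&Site->Target[I], memory_order_relaxed);
    if (Cur == 0) {
      uintptr_t Expected = 0;
      if (atomic_compare_exchange_strong_explicit(&Site->Target[I], &Expected,
                                                  Target, memory_order_relaxed,
                                                  memory_order_relaxed))
        Cur = Target;   /* this thread claimed the slot */
      else
        Cur = Expected; /* another thread claimed it first */
    }
    if (Cur == Target) {
      atomic_fetch_add_explicit(&Site->Count[I], 1, memory_order_relaxed);
      return;
    }
  }
  atomic_fetch_add_explicit(&Site->Overflow, 1, memory_order_relaxed);
}

/* Script-readable dump: one "site<TAB>target<TAB>count" line per target. */
static void dump_site(FILE *Out, unsigned SiteId, AtomicSite *Site) {
  for (int I = 0; I < SLOTS_PER_SITE; ++I) {
    uintptr_t T = atomic_load_explicit(&Site->Target[I], memory_order_relaxed);
    if (T != 0)
      fprintf(Out, "%u\t0x%jx\t%ju\n", SiteId, (uintmax_t)T,
              (uintmax_t)atomic_load_explicit(&Site->Count[I],
                                              memory_order_relaxed));
  }
}
```

Because slots are only ever filled once and are always scanned in the same order, two threads recording the same new target converge on the same slot: the CAS loser sees the winner's value and increments that slot's counter.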
betulb at codeaurora.org writes:
> Hi All,
>
> We've been working on adding indirect call target profiling support to the
> instrumented profiler for PGO purposes. I'd like to propose the following
> design.

This is an interesting idea! Do you have any data on performance improvements we might be able to expect from this work?

> Design:
> To determine the function names from the profiled target addresses, we've
> extended the data variable that is built by build_data_var() in
> CodeGenPGO.cpp (abbr. PFDV: Per Function Data Variable) to save the
> function addresses. PFDV is communicated to the run-time during function
> registration and outputted in the raw profile data file. This data
> structure is also extended to contain the number of indirect call sites
> for each function.

Where are the function addresses stored? The layout of the data variables has been designed to be very simple to write to a file efficiently and fairly simple to read back in for conversion. Will this change that?

> To help communicate the target addresses to run-time, we insert a call to
> a run-time routine before each indirect call site in clang. Something
> like:
>
> void instrument_indirect_call_site(uint8_t *TargetAddress, void *Data,
> uint32_t CounterIndex);
>
> This run-time function takes in the target address, the index/id of the
> indirect call site and the pointer to the profile data variable of the
> caller (i.e. PFDV).
> The runtime routine checks if the target address has
> been seen before for the indirect call site index/id or not. If not, then
> an entry is added into an internal data structure. If yes, the counter
> associated with the target address is incremented by 1. This counter
> records the number of times the target address is called.

This sounds like it will have fairly high overhead. Also, how will we manage the memory for the internal data structure? It's currently possible to use instrumentation-based profiling in environments where malloc isn't available, and it would be unfortunate to lose this property.
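One way to keep the internal data structure usable without malloc is sketched below: nodes come from a fixed pool in static storage (bump allocation), and each call site keeps a linked list of its targets. `TargetNode`, `Pool`, `POOL_NODES`, `alloc_node` and `record_target_pooled` are hypothetical; the pool size and what to do on exhaustion are open questions that this thread does not settle, and the sketch is single-threaded.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node: one (target, count) pair; each call site owns a list. */
typedef struct TargetNode {
  uintptr_t Target;
  uint64_t Count;
  struct TargetNode *Next;
} TargetNode;

/* Fixed pool in static storage, so no malloc is required. The size is an
   assumption and would have to be chosen when the runtime is built. */
#define POOL_NODES 1024
static TargetNode Pool[POOL_NODES];
static size_t PoolUsed;

/* Bump allocation from the pool; NULL once it is exhausted. */
static TargetNode *alloc_node(void) {
  return PoolUsed < POOL_NODES ? &Pool[PoolUsed++] : NULL;
}

/* Records one call through the call site whose list head is *SiteHead.
   Returns 0 on success, -1 if a new target could not be stored. */
static int record_target_pooled(TargetNode **SiteHead, uintptr_t Target) {
  for (TargetNode *It = *SiteHead; It != NULL; It = It->Next) {
    if (It->Target == Target) {
      It->Count++;
      return 0;
    }
  }
  TargetNode *Node = alloc_node();
  if (Node == NULL)
    return -1;
  Node->Target = Target;
  Node->Count = 1;
  Node->Next = *SiteHead; /* push to the front of the site's list */
  *SiteHead = Node;
  return 0;
}
```

Sharing one pool lets a few highly polymorphic sites use many nodes while monomorphic sites use one each; once the pool is exhausted, already-seen targets keep counting but new targets are lost.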
Philip Reames
2014-Nov-06 00:30 UTC
[LLVMdev] [cfe-dev] RFC: Indirect Call Target Profiling
On 10/30/2014 11:28 AM, betulb at codeaurora.org wrote:
> [...]
> llvm-profdata processed files contain the target function names. In case
> no function matches the target address then the target address is
> converted to string and stored in that format in the "indexed" data files.

Up to here, I have no substantial comments. I haven't given this part of the proposal much thought; it's not directly aligned with my interests at the moment. Nothing jumped out at me as fundamentally wrong, though. As others pointed out, collection speed and storage are complicated topics with lots of details to work through.

> On the PGO path, clang consumes the returned indirect target data and
> attaches the following metadata at the indirect call sites.
>
> !33 = metadata !{metadata !"indirect_call_targets", i64
> <total_exec_count>, metadata !"target_fn1", i64 <target_fn1_count>,
> metadata !"target_fn2", i64 <target_fn2_count>, …}
>
> Only the top most called N function names are recorded at each indirect
> call site. "indirect_call_targets" is the string literal identifying the
> fields of this metadata. <total_exec_count> is a 64 bit value for the
> total number of times the indirect call is executed followed by the
> function names and execution counts of each target.

This part I'm very interested in. I suggest that we separate this part from the broader proposal since it seems generally useful. Once the metadata is in place (from whatever source), we can share the optimization work.

A couple of suggestions on format:
- Name it "call_target_profile" (or something). The fact that it's indirect call targets in your case is an irrelevant detail.
- Allow an arbitrary number of {func, count} pairs.
- Describe the function via a direct Value reference. This allows symbolic functions, constant addresses, etc.
- Drop the total count field. It doesn't contain any useful information.
- Allow an "unknown count" marker for each function. This could easily be "-1" or something.
- We need a marker for "other callee" with an associated (potentially unknown) count.

Once this is in place, we could potentially implement a guarded-inline heuristic in the inliner based on the available profiling info. This would be fairly straightforward (I think!).

What other optimizations have you implemented (or plan to implement) using this data? I could see things like loop unswitching on a loop-invariant call target with a profile (for example).
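The suggested format can be sketched as a plain C record to show how its pieces fit together: an arbitrary-length list of {callee, count} pairs, a -1 "unknown count" marker, a trailing "other callee" entry, and no total field. `CallTargetEntry`, `COUNT_UNKNOWN` and `build_call_target_profile` are hypothetical names; in IR the callee would be a direct Value reference rather than a raw pointer, and nothing here reflects an agreed-upon format.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical "unknown count" marker, as suggested ("-1 or something"). */
#define COUNT_UNKNOWN INT64_C(-1)

/* One {callee, count} pair. Callee stands in for a direct Value reference;
   NULL is used here for the "other callee" entry. */
typedef struct {
  const void *Callee;
  int64_t Count; /* COUNT_UNKNOWN or a non-negative count */
} CallTargetEntry;

/* Larger counts first; COUNT_UNKNOWN (-1) therefore sorts last. */
static int by_count_desc(const void *A, const void *B) {
  int64_t X = ((const CallTargetEntry *)A)->Count;
  int64_t Y = ((const CallTargetEntry *)B)->Count;
  return (X < Y) - (X > Y);
}

/* Keep the Keep hottest entries and fold the rest into one "other" entry.
   No total is stored. The "other" count becomes unknown if any folded count
   is unknown. Sorts In in place; Out must have room for Keep + 1 entries.
   Returns the number of entries written to Out. */
static size_t build_call_target_profile(CallTargetEntry *In, size_t N,
                                        size_t Keep, CallTargetEntry *Out) {
  qsort(In, N, sizeof *In, by_count_desc);
  size_t Kept = N < Keep ? N : Keep;
  for (size_t I = 0; I < Kept; ++I)
    Out[I] = In[I];
  if (Kept == N)
    return Kept; /* nothing left over, so no "other" entry */

  CallTargetEntry Other = {NULL, 0};
  for (size_t I = Kept; I < N; ++I) {
    if (Other.Count == COUNT_UNKNOWN || In[I].Count == COUNT_UNKNOWN)
      Other.Count = COUNT_UNKNOWN;
    else
      Other.Count += In[I].Count;
  }
  Out[Kept] = Other;
  return Kept + 1;
}
```

With counts {40, 10, 5, unknown} and two entries kept, the result is the 40 and 10 callees followed by an "other" entry whose count is unknown, since one folded count was unknown. A consumer such as a guarded-inline heuristic can still rank the explicit callees without a separate total field.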