Ananthakrishna Sowda (asowda) via llvm-dev
2019-Feb-07  21:49 UTC
[llvm-dev] RFC: [DebugInfo] Improving Debug Information in LLVM to Recover Optimized-out Function Parameters
Hi, 
Following is a proposal to improve location coverage for Function parameters in
LLVM. The patches for review will be posted soon.
RFC: [DebugInfo] Improving Debug Information in LLVM to Recover Optimized-out
Function Parameters
Ananthakrishna Sowda(Cisco), asowda at cisco.com
Nikola Prica (RT-RK/Cisco), nprica at rtrk.com
Djordje Todorovic(RT-RK/Cisco), djtodorovic at rtrk.com
Ivan Baev (Cisco), ibaev at cisco.com
Overview of the problem 
Software release products are compiled with optimization level –O2 and higher.
Such products might produce a core-file in case of a failure. Support engineers
usually begin debug analysis by looking at the backtrace from a core-file.
Unfortunately, many parameters in backtraces are reported as optimized out due
to variety of reasons. This makes triaging the issue and assigning ownership
harder due to missing information. It is harder for the product team to
understand the cause of the failure. In summary, we are describing a well-known
serviceability problem for optimized production code.
Proposal for solution
Function parameters have a natural fall-back location which is parent frame.
Debuggers can easily go up a frame in call-chain and evaluate any expression.
Expert developers can find what values parameters had at function entry point by
examining disassembly of caller frame at that particular function call. With
additional call-site information produced by compiler, debugger can fully
automate this technique. DWARF 5 specification has new tags and attributes to
describe call-site parameter information [1][2]. it is already implemented in
GCC and GDB since 2011[3]. We propose implementing this feature in LLVM to
enhance the debugging of optimized code experience of LLVM users.
Prior mention
An initial version of our work was presented as a poster during LLVM Developer
Meeting, in San Jose, 2018. The feature is now fully implemented in internal
Clang/LLVM 4.0 version.
We presented a talk on our work at FOSDEM 2019[4].
Implementation notes in Clang and LLVM
On the callee side the only information that we need is whether a parameter is
never modified in the function. If true then we can use parameter’s entry value
when we lose track of parameter’s location. As a natural way of handling this
problem we used Clang’s Sema and its constness check to embed this information
in variable’s declaration which is later used for DILocalVariable construction.
For call-site information, new DINode metadata DICallSite and DICallSiteParam
are defined and these are emitted by the Clang frontend. The metadata is
associated to the call or invoke IR instruction. Here is an example:
%call5 = call i32 @fed_em_strncmp(i8* %arraydecay, i8* %arraydecay1, i64 5),
!dbg !114, !call_site !101
…
!99 = !DICallSiteParam(argno: 1, variable: !91, expr: !DIExpression())
!100 = !DICallSiteParam(argno: 2, variable: !95, expr: !DIExpression())
!101 = !DICallSite(scope: !87, file: !3, parameters: !102, line: 40,
calledSubprogram: !13)
!102 = !{!99, !100, !103}
!103 = !DICallSiteParam(argno: 3, expr: !DIExpression(DW_OP_lit5)
For tracking call sites and call site parameters in backend two new pseudo
instructions, DBG_CALLSITE and DBG_CALLSITEPARAM, are introduced. See the MIR
code bellow:
DBG_CALLSITE 0, %noreg, <!19>; dbg:strncmp.c:40:47
          * DBG_CALLSITEPARAM %RDX,  <0x727fee0> =
!DIExpression(DW_OP_lit5), 5, %noreg ; dbg:strncmp.c:40:47
          * DBG_CALLSITEPARAM %RSI, "str2" <0x71a6dd0> =
!DIExpression(), %RBX, %noreg ; dbg:strncmp.c:40:47
          * DBG_CALLSITEPARAM %RDI, "str1" <0x71a6dd0> =
!DIExpression(), %, %RSP, 4 ; dbg:strncmp.c:40:47
There is a challenge in ISel phase to produce them. Algorithm that collects
information about call site parameters iterates over call sequence chained nodes
returned from target specific call lowering interface. Goal of the algorithm is
to recognize SDNodes that represent instructions which will load function
arguments into registers that transfer them into another function call frame.
There is a question whether this is effectively implemented as a general
matching algorithm or it should be lowered to target specific level.
DBG_CALLSITE pseudo instruction will need to be revisited since information
whether a call is tail call or not could be extracted differently but for sake
of simplicity we chose this.
Most of passes handle DBG_CALLSITE and DBG_CALLSITEPARAM through target
instruction info interface method isDebugInstr(). This method is used to skip
processing of pseudo debug instructions. Since these new pseudo debug
instructions relay on virtual registers and frame objects we need to follow up
their substitution through the compilation phases. There were several backend
passes that needed special attention: Register Coalesce, Inline Spiller,
Prologue Epilog Inserter, Split Kit and Virtual Register Rewriter. Virtual
Register Rewriter required implementation of target specific salvaging interface
for “call site parameter identities” – situation following identity copy
instructions that leads to overlapping of parameter transferring register
location and location that is loaded into that register.
The last challenge is to extend LiveDebugValues pass to generate additional
DBG_VALUE instructions with new kind of debug expression (with
‘DW_OP_entry_value’) for parameters that meet the requirements described in [1].
Finally, emitting call-site debug information can be controlled by
‘-debugger-tune’ and ‘-dwarf-version’ LLVM code-generation flags, since not all
debuggers used in the community consume this DWARF information.
Location coverage improvement
The important criteria in debugging-optimized-code is whether the compiler has
location information for variables and parameters. We use ‘locstats’ utility
from elfutils [5] package to guide us in improving overall location coverage in
final executable. For each non-artificial variable or formal parameter - or
Debugging Information Entry in DWARF - ‘locstats’ computes what percentage from
the code section bytes where the variable is in scope, the variable has a
non-empty location description. 100% coverage is not expected for non-global
variables and function parameters, since value may not be ‘live’ through the
entire scope. On the other end, 0% coverage for variables which are used in the
code is indicative of compiler losing track of values.
The second column in Table 1 shows ‘locstats’ report for gdb-7.11 compiled for
x86-64 with “-g –O2” by Clang 4.0. For example, there are 29476 parameters whose
coverage is in 91..100% range. The third column shows locstats’ report with “-g
–O2” and the parameter-entry-value feature. There are now 37671 parameters whose
coverage is in 91..100% range – for a 28% improvement.
Because our implementation computes an additional location list entry to
parameters whenever possible, and DW_OP_entry_value is valid through the entire
scope of the parameter, the numbers at 91..100 row are relevant indication of
improvement with parameter-entry-value.
Coverage	Parameters	Parameters with emit-param-entry-values
(% range)	(number/%))	(number/%)
0..10		22682/30%	21342/28%
11..20		3498/4%	2337/3%
21..30		3083/4%	1986/2%
31..40		3050/4%	1862/2%
41..50		2534/3%	1574/2%
51..60		2349/3%	1571/2%
61..70		2184/2%	1649/2%
71..80		2620/3%	2069/2%
81..90		3432/4%	2847/3%
91..100		29476/39%	37671/50%
Table 1 Location coverage statistics for function parasmeters
Improved backtrace for optimized code in debugger
Figure 1 below shows improved backtrace for optimized code when compiled with
parameter entry value tracking feature. Please note the new @entry values
reported for parameters in backtrace. These parameters will otherwise be
reported as <optimized-out>.
gdb) bt
#0 get_next_move_from_list (list=list at entry=0x7fffffffbf88,
color=color at entry=1, moves=moves at entry=0x7fffffffbfb0,
cutoff=cutoff at entry=100) at engine/owl.c:3032
#1 0x000000000042a957 in do_owl_attack (str=<optimized out>,
move=<optimized out>, move at entry=0x7fffffffc334, wormid=<optimized
out>, wormid at entry=0x7fffffffc33c, owl=<optimized out>, owl at
entry=0x0,
komaster=komaster at entry=0, kom_pos=kom_pos at entry=0,escape=<optimized
out>)
at engine/owl.c:1306
#2 0x000000000042a0d0 in owl_attack (target=target at entry=148,
attack_point=attack_point at entry=0x7fffffffc580, certain=<optimized
out>, certain at entry=0xb63048 <dragon+11288>, kworm=kworm at
entry=0x7fffffffc3c4) at engine/owl.c:1144
#3 0x0000000000412c71 in make_dragons (color=<optimized out>, color at
entry=1, stop_before_owl=<optimized out>, stop_before_owl at entry=0,
save_verbose=<optimized out>, save_verbose at entry=0) at
engine/dragon.c:346
#4 0x0000000000417fdc in examine_position (color=color at entry=1,
how_much=how_much at entry=99) at engine/genmove.c:152
#5 0x00000000004183c6 in do_genmove (move=move at entry=0x7fffffffd344, color=1,
color at entry=3, pure_threat_value=<optimized out>,
allowed_moves=<optimized out>, allowed_moves at entry=0x0)
at engine/genmove.c:334
#6 0x000000000041926d in genmove_conservative (i=i at entry=0x7fffffffd36c, j=j
at entry=0x7fffffffd368, color=3) at engine/genmove.c:255
#7 0x00000000004618ae in gtp_gg_genmove (s=<optimized out>) at
interface/play_gtp.c:2163
#8 0x000000000045b0f8 in gtp_main_loop (commands=<optimized out>,
gtp_input=0xb8b100) at interface/gtp.c:126
Figure 1: Backtrace with @entry value parameters
Cost in disk image size increase and compile-time
The parameter-entry-value feature is enabled with -g compilation. Due to new
DebugInfo metadata generation which adds entries to DWARF sections such as
.debug_info and .debug_loc, there is expected size increase of disk image of the
executable built with “-g –O”. For SPEC CPU 2006 benchmark, the average size
increase is 15%. However, there is no change in sections loaded at runtime such
as .text. .data, .bss.  Hence, there is no runtime size increase.
Compile-time cost increase is 1-3% percent for SPEC CPU 2006.
Community up-streaming
Since we have implemented this for LLVM-4.0 we are currently in process of
porting this implementation on LLVM trunk. We are planning to share this set of
patches with LLVM community and seek feedback in improving certain parts of our
implementation.
References
[1]Jakub Jelínek and Roland McGrath. DWARF DW_OP_entry_value extension
proposal. http://dwarfstd.org/ShowIssue.php? issue=100909.1.
[2 Jakub Jelínek, Roland McGrath, Jan Kratochvíl, and Alexandre Oliva. DWARF
DW_TAG_call_site extension proposal. http://dwarfstd.org/
ShowIssue.php?issue=100909.2
[3] J. Jelinek “Improving debug info for optimized away parameters”
https://gcc.gnu.org/wiki/summit2010?action=AttachFile&do=view&target=jelinek.pdf
[4] FOSDEM talk http://bofh.nikhef.nl/events/FOSDEM/2019/K.4.201/llvm_debug.webm
[5] Elfutils https://sourceware.org/elfutils/
Adrian Prantl via llvm-dev
2019-Feb-07  23:17 UTC
[llvm-dev] RFC: [DebugInfo] Improving Debug Information in LLVM to Recover Optimized-out Function Parameters
> On Feb 7, 2019, at 1:49 PM, Ananthakrishna Sowda (asowda) via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > Hi, > Following is a proposal to improve location coverage for Function parameters in LLVM. The patches for review will be posted soon. > > RFC: [DebugInfo] Improving Debug Information in LLVM to Recover Optimized-out Function Parameters > > Ananthakrishna Sowda(Cisco), asowda at cisco.com > Nikola Prica (RT-RK/Cisco), nprica at rtrk.com > Djordje Todorovic(RT-RK/Cisco), djtodorovic at rtrk.com > Ivan Baev (Cisco), ibaev at cisco.com > > > Overview of the problem > Software release products are compiled with optimization level –O2 and higher. Such products might produce a core-file in case of a failure. Support engineers usually begin debug analysis by looking at the backtrace from a core-file. Unfortunately, many parameters in backtraces are reported as optimized out due to variety of reasons. This makes triaging the issue and assigning ownership harder due to missing information. It is harder for the product team to understand the cause of the failure. In summary, we are describing a well-known serviceability problem for optimized production code. > > Proposal for solution > Function parameters have a natural fall-back location which is parent frame. Debuggers can easily go up a frame in call-chain and evaluate any expression. Expert developers can find what values parameters had at function entry point by examining disassembly of caller frame at that particular function call. With additional call-site information produced by compiler, debugger can fully automate this technique. DWARF 5 specification has new tags and attributes to describe call-site parameter information [1][2]. it is already implemented in GCC and GDB since 2011[3]. We propose implementing this feature in LLVM to enhance the debugging of optimized code experience of LLVM users. > > Prior mention > An initial version of our work was presented as a poster during LLVM Developer Meeting, in San Jose, 2018. The feature is now fully implemented in internal Clang/LLVM 4.0 version. > We presented a talk on our work at FOSDEM 2019[4]. >Thank you for posting this. This looks very interesting! Since your proposal has a lot of different components, Sema support, DW_AT_call_site_parameter support, DW_OP_entry_value support, it will probably be best to split them out into separate reviews, but it's also good to discuss the proposal in its entirety first. I have a bunch of questions to make sure I fully understand what you are doing.> Implementation notes in Clang and LLVM > On the callee side the only information that we need is whether a parameter is never modified in the function. If true then we can use parameter’s entry value when we lose track of parameter’s location. As a natural way of handling this problem we used Clang’s Sema and its constness check to embed this information in variable’s declaration which is later used for DILocalVariable construction.By looking at whether an argument is modified in the function, you can identify variables that can be described with an entry value location and that entry value would be valid throughout the function. Are you using this information in the function body to identify whether to emit an entry value location, or are you using this information at the call site to identify call sites for which call site parameters would be beneficial (or both)? Is emitting an entry value location in the function body an either-or thing or do you also emit plain old locations if you have them available in the location list together with the entry values? In the function, I assume you don't know whether all call sites will have call site parameters. How do you decide whether to emit entry value locations?> For call-site information, new DINode metadata DICallSite and DICallSiteParam are defined and these are emitted by the Clang frontend. The metadata is associated to the call or invoke IR instruction. Here is an example: > > %call5 = call i32 @fed_em_strncmp(i8* %arraydecay, i8* %arraydecay1, i64 5), !dbg !114, !call_site !101 > … > !99 = !DICallSiteParam(argno: 1, variable: !91, expr: !DIExpression()) > !100 = !DICallSiteParam(argno: 2, variable: !95, expr: !DIExpression()) > !101 = !DICallSite(scope: !87, file: !3, parameters: !102, line: 40, calledSubprogram: !13) > !102 = !{!99, !100, !103} > !103 = !DICallSiteParam(argno: 3, expr: !DIExpression(DW_OP_lit5) > For tracking call sites and call site parameters in backend two new pseudo instructions, DBG_CALLSITE and DBG_CALLSITEPARAM, are introduced. See the MIR code bellow: > > DBG_CALLSITE 0, %noreg, <!19>; dbg:strncmp.c:40:47 > * DBG_CALLSITEPARAM %RDX, <0x727fee0> = !DIExpression(DW_OP_lit5), 5, %noreg ; dbg:strncmp.c:40:47 > * DBG_CALLSITEPARAM %RSI, "str2" <0x71a6dd0> = !DIExpression(), %RBX, %noreg ; dbg:strncmp.c:40:47 > * DBG_CALLSITEPARAM %RDI, "str1" <0x71a6dd0> = !DIExpression(), %, %RSP, 4 ; dbg:strncmp.c:40:47I'll refrain from bike-shedding the actual implementation; let's save this for phabricator, but conceptually, this makes sense to me. If I understand correctly, you are identifying at the call site parameters that are in locations that can be restored by unwinding the function call, such as constants, stack slots, and callee/r-saved registers. Can you explain why you need to identify them at the IR level? Could you do it just in MIR, too, or is there some information missing in MIR? What happens to your DICallSiteParam when a function call gets inlined?> > There is a challenge in ISel phase to produce them. Algorithm that collects information about call site parameters iterates over call sequence chained nodes returned from target specific call lowering interface. Goal of the algorithm is to recognize SDNodes that represent instructions which will load function arguments into registers that transfer them into another function call frame. There is a question whether this is effectively implemented as a general matching algorithm or it should be lowered to target specific level. DBG_CALLSITE pseudo instruction will need to be revisited since information whether a call is tail call or not could be extracted differently but for sake of simplicity we chose this. > Most of passes handle DBG_CALLSITE and DBG_CALLSITEPARAM through target instruction info interface method isDebugInstr(). This method is used to skip processing of pseudo debug instructions. Since these new pseudo debug instructions relay on virtual registers and frame objects we need to follow up their substitution through the compilation phases. There were several backend passes that needed special attention: Register Coalesce, Inline Spiller, Prologue Epilog Inserter, Split Kit and Virtual Register Rewriter. Virtual Register Rewriter required implementation of target specific salvaging interface for “call site parameter identities” – situation following identity copy instructions that leads to overlapping of parameter transferring register location and location that is loaded into that register. > The last challenge is to extend LiveDebugValues pass to generate additional DBG_VALUE instructions with new kind of debug expression (with ‘DW_OP_entry_value’) for parameters that meet the requirements described in [1]. > Finally, emitting call-site debug information can be controlled by ‘-debugger-tune’ and ‘-dwarf-version’ LLVM code-generation flags, since not all debuggers used in the community consume this DWARF information.Have you considered to instead insert a very late MIR pass that does some backwards analysis on the machine code to yield potential call site parameters instead of threading it all the way through the compiler? If yes, why did you choose this implementation?> > Location coverage improvement > The important criteria in debugging-optimized-code is whether the compiler has location information for variables and parameters. We use ‘locstats’ utility from elfutils [5] package to guide us in improving overall location coverage in final executable. For each non-artificial variable or formal parameter - or Debugging Information Entry in DWARF - ‘locstats’ computes what percentage from the code section bytes where the variable is in scope, the variable has a non-empty location description. 100% coverage is not expected for non-global variables and function parameters, since value may not be ‘live’ through the entire scope. On the other end, 0% coverage for variables which are used in the code is indicative of compiler losing track of values. > The second column in Table 1 shows ‘locstats’ report for gdb-7.11 compiled for x86-64 with “-g –O2” by Clang 4.0. For example, there are 29476 parameters whose coverage is in 91..100% range. The third column shows locstats’ report with “-g –O2” and the parameter-entry-value feature. There are now 37671 parameters whose coverage is in 91..100% range – for a 28% improvement. > Because our implementation computes an additional location list entry to parameters whenever possible, and DW_OP_entry_value is valid through the entire scope of the parameter, the numbers at 91..100 row are relevant indication of improvement with parameter-entry-value. > > Coverage Parameters Parameters with emit-param-entry-values > (% range) (number/%)) (number/%) > 0..10 22682/30% 21342/28% > 11..20 3498/4% 2337/3% > 21..30 3083/4% 1986/2% > 31..40 3050/4% 1862/2% > 41..50 2534/3% 1574/2% > 51..60 2349/3% 1571/2% > 61..70 2184/2% 1649/2% > 71..80 2620/3% 2069/2% > 81..90 3432/4% 2847/3% > 91..100 29476/39% 37671/50% > Table 1 Location coverage statistics for function parasmeters > > Improved backtrace for optimized code in debugger > Figure 1 below shows improved backtrace for optimized code when compiled with parameter entry value tracking feature. Please note the new @entry values reported for parameters in backtrace. These parameters will otherwise be reported as <optimized-out>. > > gdb) bt > #0 get_next_move_from_list (list=list at entry=0x7fffffffbf88, > color=color at entry=1, moves=moves at entry=0x7fffffffbfb0, > cutoff=cutoff at entry=100) at engine/owl.c:3032 > #1 0x000000000042a957 in do_owl_attack (str=<optimized out>, > move=<optimized out>, move at entry=0x7fffffffc334, wormid=<optimized out>, wormid at entry=0x7fffffffc33c, owl=<optimized out>, owl at entry=0x0, > komaster=komaster at entry=0, kom_pos=kom_pos at entry=0,escape=<optimized out>) > at engine/owl.c:1306 > #2 0x000000000042a0d0 in owl_attack (target=target at entry=148, > attack_point=attack_point at entry=0x7fffffffc580, certain=<optimized out>, certain at entry=0xb63048 <dragon+11288>, kworm=kworm at entry=0x7fffffffc3c4) at engine/owl.c:1144 > #3 0x0000000000412c71 in make_dragons (color=<optimized out>, color at entry=1, stop_before_owl=<optimized out>, stop_before_owl at entry=0, save_verbose=<optimized out>, save_verbose at entry=0) at engine/dragon.c:346 > #4 0x0000000000417fdc in examine_position (color=color at entry=1, how_much=how_much at entry=99) at engine/genmove.c:152 > #5 0x00000000004183c6 in do_genmove (move=move at entry=0x7fffffffd344, color=1, color at entry=3, pure_threat_value=<optimized out>, > allowed_moves=<optimized out>, allowed_moves at entry=0x0) > at engine/genmove.c:334 > #6 0x000000000041926d in genmove_conservative (i=i at entry=0x7fffffffd36c, j=j at entry=0x7fffffffd368, color=3) at engine/genmove.c:255 > #7 0x00000000004618ae in gtp_gg_genmove (s=<optimized out>) at interface/play_gtp.c:2163 > #8 0x000000000045b0f8 in gtp_main_loop (commands=<optimized out>, gtp_input=0xb8b100) at interface/gtp.c:126 > Figure 1: Backtrace with @entry value parametersVery nice!> > Cost in disk image size increase and compile-time > The parameter-entry-value feature is enabled with -g compilation. Due to new DebugInfo metadata generation which adds entries to DWARF sections such as .debug_info and .debug_loc, there is expected size increase of disk image of the executable built with “-g –O”. For SPEC CPU 2006 benchmark, the average size increase is 15%. However, there is no change in sections loaded at runtime such as .text. .data, .bss. Hence, there is no runtime size increase. > Compile-time cost increase is 1-3% percent for SPEC CPU 2006.At this early stage I'm not yet worried by the size increase. We'll probably find some opportunities to fine-tune the heuristics that decide whether a call site parameter / entry value is profitable. We can also always provide a tuning option to turn the feature off.> Community up-streaming > Since we have implemented this for LLVM-4.0 we are currently in process of porting this implementation on LLVM trunk. We are planning to share this set of patches with LLVM community and seek feedback in improving certain parts of our implementation.Sounds great! thanks for sharing this, adrian> > > References > [1]Jakub Jelínek and Roland McGrath. DWARF DW_OP_entry_value extension proposal. http://dwarfstd.org/ShowIssue.php? issue=100909.1. > [2 Jakub Jelínek, Roland McGrath, Jan Kratochvíl, and Alexandre Oliva. DWARF DW_TAG_call_site extension proposal. http://dwarfstd.org/ ShowIssue.php?issue=100909.2 > [3] J. Jelinek “Improving debug info for optimized away parameters” https://gcc.gnu.org/wiki/summit2010?action=AttachFile&do=view&target=jelinek.pdf > [4] FOSDEM talk http://bofh.nikhef.nl/events/FOSDEM/2019/K.4.201/llvm_debug.webm > [5] Elfutils https://sourceware.org/elfutils/ > > > > > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Ananthakrishna Sowda (asowda) via llvm-dev
2019-Feb-08  02:31 UTC
[llvm-dev] RFC: [DebugInfo] Improving Debug Information in LLVM to Recover Optimized-out Function Parameters
Thank you for your interest and comments! Please see my responses inline.
On 2/7/19, 3:17 PM, "aprantl at apple.com on behalf of Adrian Prantl"
<aprantl at apple.com> wrote:
    
    
    > On Feb 7, 2019, at 1:49 PM, Ananthakrishna Sowda (asowda) via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
    > 
    > Hi, 
    > Following is a proposal to improve location coverage for Function
parameters in LLVM. The patches for review will be posted soon.
    > 
    > RFC: [DebugInfo] Improving Debug Information in LLVM to Recover
Optimized-out Function Parameters
    > 
    > Ananthakrishna Sowda(Cisco), asowda at cisco.com
    > Nikola Prica (RT-RK/Cisco), nprica at rtrk.com
    > Djordje Todorovic(RT-RK/Cisco), djtodorovic at rtrk.com
    > Ivan Baev (Cisco), ibaev at cisco.com
    > 
    > 
    > Overview of the problem 
    > Software release products are compiled with optimization level –O2 and
higher. Such products might produce a core-file in case of a failure. Support
engineers usually begin debug analysis by looking at the backtrace from a
core-file. Unfortunately, many parameters in backtraces are reported as
optimized out due to variety of reasons. This makes triaging the issue and
assigning ownership harder due to missing information. It is harder for the
product team to understand the cause of the failure. In summary, we are
describing a well-known serviceability problem for optimized production code.
    > 
    > Proposal for solution
    > Function parameters have a natural fall-back location which is parent
frame. Debuggers can easily go up a frame in call-chain and evaluate any
expression. Expert developers can find what values parameters had at function
entry point by examining disassembly of caller frame at that particular function
call. With additional call-site information produced by compiler, debugger can
fully automate this technique. DWARF 5 specification has new tags and attributes
to describe call-site parameter information [1][2]. it is already implemented in
GCC and GDB since 2011[3]. We propose implementing this feature in LLVM to
enhance the debugging of optimized code experience of LLVM users.
    > 
    > Prior mention
    > An initial version of our work was presented as a poster during LLVM
Developer Meeting, in San Jose, 2018. The feature is now fully implemented in
internal Clang/LLVM 4.0 version.
    > We presented a talk on our work at FOSDEM 2019[4].
    > 
    
    Thank you for posting this. This looks very interesting! Since your proposal
has a lot of different components, Sema support, DW_AT_call_site_parameter
support, DW_OP_entry_value support, it will probably be best to split them out
into separate reviews, but it's also good to discuss the proposal in its
entirety first. I have a bunch of questions to make sure I fully understand what
you are doing.
Sure, we will post several patches, each as a logical unit.
    
    > Implementation notes in Clang and LLVM
    > On the callee side the only information that we need is whether a
parameter is never modified in the function. If true then we can use parameter’s
entry value when we lose track of parameter’s location. As a natural way of
handling this problem we used Clang’s Sema and its constness check to embed this
information in variable’s declaration which is later used for DILocalVariable
construction.
    
    By looking at whether an argument is modified in the function, you can
identify variables that can be described with an entry value location and that
entry value would be valid throughout the function. Are you using this
information in the function body to identify whether to emit an entry value
location, or are you using this information at the call site to identify call
sites for which call site parameters would be beneficial (or both)?
    
    Is emitting an entry value location in the function body an either-or thing
or do you also emit plain old locations if you have them available in the
location list together with the entry values?
    
    In the function, I assume you don't know whether all call sites will
have call site parameters. How do you decide whether to emit entry value
locations?
Entry value location is added to same conventional location list. We emit them
when we see holes in the coverage, looking at the whole function scope. It is
used by the debugger when there is no conventional location for a program range.
It is reported by the debugger as   “<optimized-out>, @entry  = <value
at call site>”.
Unmodified argument is a sub-set for which debugger can report  @entry value is
same as actual value. So, it will not be reported as optimized-out.
We generate entry value location whenever  conventional does not cover one
hundred percent. We generate call-site information when the call-site parameter
values can be evaluated by unwinding to the parent frame. Only debugger can tell
if look-up of an entry value finds matching call site and call site parameter.
    > For call-site information, new DINode metadata DICallSite and
DICallSiteParam are defined and these are emitted by the Clang frontend. The
metadata is associated to the call or invoke IR instruction. Here is an example:
    > 
    > %call5 = call i32 @fed_em_strncmp(i8* %arraydecay, i8* %arraydecay1,
i64 5), !dbg !114, !call_site !101
    > …
    > !99 = !DICallSiteParam(argno: 1, variable: !91, expr: !DIExpression())
    > !100 = !DICallSiteParam(argno: 2, variable: !95, expr: !DIExpression())
    > !101 = !DICallSite(scope: !87, file: !3, parameters: !102, line: 40,
calledSubprogram: !13)
    > !102 = !{!99, !100, !103}
    > !103 = !DICallSiteParam(argno: 3, expr: !DIExpression(DW_OP_lit5)
    > For tracking call sites and call site parameters in backend two new
pseudo instructions, DBG_CALLSITE and DBG_CALLSITEPARAM, are introduced. See the
MIR code bellow:
    > 
    > DBG_CALLSITE 0, %noreg, <!19>; dbg:strncmp.c:40:47
    >          * DBG_CALLSITEPARAM %RDX,  <0x727fee0> =
!DIExpression(DW_OP_lit5), 5, %noreg ; dbg:strncmp.c:40:47
    >          * DBG_CALLSITEPARAM %RSI, "str2" <0x71a6dd0> =
!DIExpression(), %RBX, %noreg ; dbg:strncmp.c:40:47
    >          * DBG_CALLSITEPARAM %RDI, "str1" <0x71a6dd0> =
!DIExpression(), %, %RSP, 4 ; dbg:strncmp.c:40:47
    
    I'll refrain from bike-shedding the actual implementation; let's
save this for phabricator, but conceptually, this makes sense to me. If I
understand correctly, you are identifying at the call site parameters that are
in locations that can be restored by unwinding the function call, such as
constants, stack slots, and callee/r-saved registers.
    Can you explain why you need to identify them at the IR level? Could you do
it just in MIR, too, or is there some information missing in MIR?
    
    What happens to your DICallSiteParam when a function call gets inlined?
    
Your understanding about call site parameter is right!   
When a function call gets inlined,   call site information is eliminated.
We kept to the LLVM guidelines of introducing DI metadata in the front end and
carrying it through IR and MIR. I don’t think there is missing information in
MIR, though.
> 
    > There is a challenge in ISel phase to produce them. Algorithm that
collects information about call site parameters iterates over call sequence
chained nodes returned from target specific call lowering interface. Goal of the
algorithm is to recognize SDNodes that represent instructions which will load
function arguments into registers that transfer them into another function call
frame. There is a question whether this is effectively implemented as a general
matching algorithm or it should be lowered to target specific level.
DBG_CALLSITE pseudo instruction will need to be revisited since information
whether a call is tail call or not could be extracted differently but for sake
of simplicity we chose this.
    > Most of passes handle DBG_CALLSITE and DBG_CALLSITEPARAM through target
instruction info interface method isDebugInstr(). This method is used to skip
processing of pseudo debug instructions. Since these new pseudo debug
instructions relay on virtual registers and frame objects we need to follow up
their substitution through the compilation phases. There were several backend
passes that needed special attention: Register Coalesce, Inline Spiller,
Prologue Epilog Inserter, Split Kit and Virtual Register Rewriter. Virtual
Register Rewriter required implementation of target specific salvaging interface
for “call site parameter identities” – situation following identity copy
instructions that leads to overlapping of parameter transferring register
location and location that is loaded into that register.
    > The last challenge is to extend LiveDebugValues pass to generate
additional DBG_VALUE instructions with new kind of debug expression (with
‘DW_OP_entry_value’) for parameters that meet the requirements described in [1].
    > Finally, emitting call-site debug information can be controlled by
‘-debugger-tune’ and ‘-dwarf-version’ LLVM code-generation flags, since not all
debuggers used in the community consume this DWARF information.
    
    Have you considered to instead insert a very late MIR pass that does some
backwards analysis on the machine code to yield potential call site parameters
instead of threading it all the way through the compiler? If yes, why did you
choose this implementation?
If I have not convinced you on this, we are open for suggestion. I will ask my
colleagues on this project to respond too.
    
    > 
    > Location coverage improvement
    > The important criteria in debugging-optimized-code is whether the
compiler has location information for variables and parameters. We use
‘locstats’ utility from elfutils [5] package to guide us in improving overall
location coverage in final executable. For each non-artificial variable or
formal parameter - or Debugging Information Entry in DWARF - ‘locstats’ computes
what percentage from the code section bytes where the variable is in scope, the
variable has a non-empty location description. 100% coverage is not expected for
non-global variables and function parameters, since value may not be ‘live’
through the entire scope. On the other end, 0% coverage for variables which are
used in the code is indicative of compiler losing track of values.
    > The second column in Table 1 shows ‘locstats’ report for gdb-7.11
compiled for x86-64 with “-g –O2” by Clang 4.0. For example, there are 29476
parameters whose coverage is in 91..100% range. The third column shows locstats’
report with “-g –O2” and the parameter-entry-value feature. There are now 37671
parameters whose coverage is in 91..100% range – for a 28% improvement.
    > Because our implementation computes an additional location list entry
to parameters whenever possible, and DW_OP_entry_value is valid through the
entire scope of the parameter, the numbers at 91..100 row are relevant
indication of improvement with parameter-entry-value.
    > 
    > Coverage	Parameters	Parameters with emit-param-entry-values
    > (% range)	(number/%))	(number/%)
    > 0..10		22682/30%	21342/28%
    > 11..20		3498/4%	2337/3%
    > 21..30		3083/4%	1986/2%
    > 31..40		3050/4%	1862/2%
    > 41..50		2534/3%	1574/2%
    > 51..60		2349/3%	1571/2%
    > 61..70		2184/2%	1649/2%
    > 71..80		2620/3%	2069/2%
    > 81..90		3432/4%	2847/3%
    > 91..100		29476/39%	37671/50%
    > Table 1 Location coverage statistics for function parasmeters
    > 
    > Improved backtrace for optimized code in debugger
    > Figure 1 below shows improved backtrace for optimized code when
compiled with parameter entry value tracking feature. Please note the new @entry
values reported for parameters in backtrace. These parameters will otherwise be
reported as <optimized-out>.
    > 
    > gdb) bt
    > #0 get_next_move_from_list (list=list at entry=0x7fffffffbf88,
    > color=color at entry=1, moves=moves at entry=0x7fffffffbfb0,
    > cutoff=cutoff at entry=100) at engine/owl.c:3032
    > #1 0x000000000042a957 in do_owl_attack (str=<optimized out>,
    > move=<optimized out>, move at entry=0x7fffffffc334,
wormid=<optimized out>, wormid at entry=0x7fffffffc33c, owl=<optimized
out>, owl at entry=0x0,
    > komaster=komaster at entry=0, kom_pos=kom_pos at
entry=0,escape=<optimized out>)
    > at engine/owl.c:1306
    > #2 0x000000000042a0d0 in owl_attack (target=target at entry=148,
    > attack_point=attack_point at entry=0x7fffffffc580,
certain=<optimized out>, certain at entry=0xb63048 <dragon+11288>,
kworm=kworm at entry=0x7fffffffc3c4) at engine/owl.c:1144
    > #3 0x0000000000412c71 in make_dragons (color=<optimized out>,
color at entry=1, stop_before_owl=<optimized out>, stop_before_owl at
entry=0, save_verbose=<optimized out>, save_verbose at entry=0) at
engine/dragon.c:346
    > #4 0x0000000000417fdc in examine_position (color=color at entry=1,
how_much=how_much at entry=99) at engine/genmove.c:152
    > #5 0x00000000004183c6 in do_genmove (move=move at entry=0x7fffffffd344,
color=1, color at entry=3, pure_threat_value=<optimized out>,
    > allowed_moves=<optimized out>, allowed_moves at entry=0x0)
    > at engine/genmove.c:334
    > #6 0x000000000041926d in genmove_conservative (i=i at
entry=0x7fffffffd36c, j=j at entry=0x7fffffffd368, color=3) at
engine/genmove.c:255
    > #7 0x00000000004618ae in gtp_gg_genmove (s=<optimized out>) at
interface/play_gtp.c:2163
    > #8 0x000000000045b0f8 in gtp_main_loop (commands=<optimized out>,
gtp_input=0xb8b100) at interface/gtp.c:126
    > Figure 1: Backtrace with @entry value parameters
    
    Very nice!
    
    > 
    > Cost in disk image size increase and compile-time
    > The parameter-entry-value feature is enabled with -g compilation. Due
to new DebugInfo metadata generation which adds entries to DWARF sections such
as .debug_info and .debug_loc, there is expected size increase of disk image of
the executable built with “-g –O”. For SPEC CPU 2006 benchmark, the average size
increase is 15%. However, there is no change in sections loaded at runtime such
as .text. .data, .bss.  Hence, there is no runtime size increase.
    > Compile-time cost increase is 1-3% percent for SPEC CPU 2006.
    
    At this early stage I'm not yet worried by the size increase. We'll
probably find some opportunities to fine-tune the heuristics that decide whether
a call site parameter / entry value is profitable. We can also always provide a
tuning option to turn the feature off.
    
    
    > Community up-streaming
    > Since we have implemented this for LLVM-4.0 we are currently in process
of porting this implementation on LLVM trunk. We are planning to share this set
of patches with LLVM community and seek feedback in improving certain parts of
our implementation.
    
    Sounds great!
    thanks for sharing this,
    adrian
    
    
Thanks,
Ananth
    
    > 
    > 
    > References
    > [1]Jakub Jelínek and Roland McGrath. DWARF DW_OP_entry_value extension
proposal. http://dwarfstd.org/ShowIssue.php? issue=100909.1.
    > [2 Jakub Jelínek, Roland McGrath, Jan Kratochvíl, and Alexandre
Oliva. DWARF DW_TAG_call_site extension proposal. http://dwarfstd.org/
ShowIssue.php?issue=100909.2
    > [3] J. Jelinek “Improving debug info for optimized away parameters”
https://gcc.gnu.org/wiki/summit2010?action=AttachFile&do=view&target=jelinek.pdf
    > [4] FOSDEM talk
http://bofh.nikhef.nl/events/FOSDEM/2019/K.4.201/llvm_debug.webm
    > [5] Elfutils https://sourceware.org/elfutils/
    > 
    > 
    > 
    > 
    > 
    > 
    > 
    > _______________________________________________
    > LLVM Developers mailing list
    > llvm-dev at lists.llvm.org
    > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev