Jeremy Morse via llvm-dev
2021-Jan-27 15:26 UTC
[llvm-dev] [DebugInfo] Different representations of optimised-out variables in DWARF
Hi,

This was "[llvm-dev] [DebugInfo] The current status of debug values
using multiple machine locations" but I don't want to de-rail Stephen's
thread.

Paul wrote:
> I'm not actually sure what causes variables to be dropped from the DWARF
> entirely, as opposed to them existing but having an unknown location for
> their entire scope; however, outside of our desire to use dwarfdump to
> analyze our debug info it's simply more efficient to omit variables with no
> location, since they inflate the debug info size and I don't believe
> there's any practical value in having them.

David wrote:
> When does this ^ happen? In optimized builds we include all local variables
> in a "variables" attachment to the DISubprogram, so we shouldn't be losing
> variables entirely.
> [...]
> I think it's pretty important that we keep them. It helps a user understand
> that they've not mistyped the name of a variable, etc [...]

This is something that's bothered me for a while, as it messes with
our statistics when changing how variable locations are tracked. Take
this completely contrived C file:

    int foo(int bar) {
      int baz = 12 + bar;
      return baz;
    }

    int qux(int quux) {
      int xyzzy = foo(quux);
      return xyzzy;
    }

Using clang ef0dcb50630 and options "-O3 -g -c", llvm-locstats reports
the object file has five variables in it. If you emit LLVM-IR, replace
the first operand of all "llvm.dbg.value" intrinsic invocations with
"undef", and compile the IR with llc, then llvm-locstats still reports
five variables. However, if you instead /delete/ all the invocations of
"llvm.dbg.value", llvm-locstats reports four variables. This indicates
there's an observable difference in the way we represent optimised-out
variables in DWARF.
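The two IR edits described above can be sketched as a small text transform. This is only an illustration, not the script used for the measurements: the function names and the sample IR are made up, and the regular expression assumes the simple textual form "call void @llvm.dbg.value(metadata <type> <value>, metadata !N, ...)".

```python
import re

# Assumption: dbg.value calls appear in this simple textual form. The pattern
# captures the call prefix, the operand type, the operand value, and the
# start of the second (variable) operand.
DBG_VALUE = re.compile(
    r"(call void @llvm\.dbg\.value\(metadata )(.+?) (\S+)(, metadata !\d+)"
)


def undef_dbg_values(ir_text):
    """Replace the first operand of every llvm.dbg.value call with undef."""
    return DBG_VALUE.sub(r"\1\2 undef\4", ir_text)


def delete_dbg_values(ir_text):
    """Drop every line that calls llvm.dbg.value."""
    kept = [line for line in ir_text.splitlines()
            if "call void @llvm.dbg.value(" not in line]
    return "\n".join(kept) + "\n"


# Hand-written sample resembling the IR for foo(); metadata numbers are made up.
sample = """\
define i32 @foo(i32 %bar) !dbg !7 {
  call void @llvm.dbg.value(metadata i32 %bar, metadata !12, metadata !DIExpression()), !dbg !14
  %add = add nsw i32 %bar, 12, !dbg !15
  call void @llvm.dbg.value(metadata i32 %add, metadata !13, metadata !DIExpression()), !dbg !14
  ret i32 %add, !dbg !16
}
"""
```

Each transformed file would then be compiled with llc and the resulting object passed to llvm-locstats to compare the reported variable counts.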

The difference between the object files is the way they represent the
inlined copy of "foo". Here's the output with undef dbg.values,
followed by the output when I delete all the dbg.value intrinsics:

    DW_TAG_inlined_subroutine
      DW_AT_abstract_origin (0x0000004e "foo")
      DW_AT_low_pc (0x0000000000000010)
      DW_AT_high_pc (0x0000000000000013)
      DW_AT_call_file ("/tmp/test.c")
      DW_AT_call_line (7)
      DW_AT_call_column (0x0f)

      DW_TAG_formal_parameter
        DW_AT_abstract_origin (0x0000005a "bar")

      NULL

and:

    DW_TAG_inlined_subroutine
      DW_AT_abstract_origin (0x00000048 "foo")
      DW_AT_low_pc (0x0000000000000010)
      DW_AT_high_pc (0x0000000000000013)
      DW_AT_call_file ("/tmp/test.c")
      DW_AT_call_line (7)
      DW_AT_call_column (0x0f)

      NULL

When there are dbg.value intrinsics present, the inlined subroutine
gets an empty DW_TAG_formal_parameter that links back to the abstract
origin. If there are no dbg.value intrinsics present, it does not. As
far as I understand it, consumers can still determine that "bar" exists
in the inlined subroutine by looking at the inlined subroutine's
abstract origin. This is what the "retained nodes" collection
preserves.

llvm-locstats / llvm-dwarfdump --statistics should probably be taught
to look at the inlined subroutine's abstract origin to find all
variables; however, it seems unwise to have internal compiler state
reflected in the output file in the way it is above. The empty
DW_TAG_formal_parameter is created in DwarfDebug::collectEntityInfo
[0], which distinguishes between a variable that has no location
intrinsics and a variable that has only empty location intrinsics.
Putting a filter in to skip variables with only empty locations avoids
the difference in output, and reduces the size of .debug_info on a
stage2reldeb clang build by about 20MB, or ~1%.

To ensure this email contains a question: would there be any
objections to adding that filter, and obliging consumers to look in
the inlined subroutine's abstract origin to find optimised-out
variables, instead of giving them a list per-inlined-instance?

[0] https://github.com/llvm/llvm-project/blob/70e251497c4e26f8cfd85e745459afff97c909ce/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L1779

--
Thanks,
Jeremy
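
The distinction being proposed for removal can be modelled in a few lines. The sketch below is a toy model, not the code in DwarfDebug::collectEntityInfo: the Variable class, the use of None for an empty (undef) location, and the skip_empty_only switch are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Variable:
    name: str
    # One entry per location record; None stands for an empty (undef) location.
    locations: List[Optional[str]] = field(default_factory=list)


def has_usable_location(var):
    """True if at least one location record is non-empty."""
    return any(loc is not None for loc in var.locations)


def concrete_dies(variables, skip_empty_only):
    """Names that would get a DIE inside the inlined (concrete) instance."""
    emitted = []
    for var in variables:
        if not var.locations:
            continue  # no location records at all: no DIE is emitted
        if skip_empty_only and not has_usable_location(var):
            continue  # proposed filter: only empty records, treated the same
        emitted.append(var.name)
    return emitted


# "bar" as seen after replacing dbg.value operands with undef,
# and after deleting the dbg.value calls entirely.
undef_ir = [Variable("bar", [None])]
deleted_ir = [Variable("bar", [])]

print(concrete_dies(undef_ir, skip_empty_only=False))    # differs from deleted_ir
print(concrete_dies(deleted_ir, skip_empty_only=False))
print(concrete_dies(undef_ir, skip_empty_only=True))     # same as deleted_ir
```

Without the filter the two inputs produce different output; with it, both produce an inlined subroutine with no child entry for "bar".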
via llvm-dev
2021-Jan-27 15:57 UTC
[llvm-dev] [DebugInfo] Different representations of optimised-out variables in DWARF
> -----Original Message-----
> From: Jeremy Morse <jeremy.morse.llvm at gmail.com>
> Sent: Wednesday, January 27, 2021 10:26 AM
>
> This was "[llvm-dev] [DebugInfo] The current status of debug values
> using multiple machine locations" but I don't want to de-rail Stephen's
> thread,
>
> Paul wrote:

Actually that was Stephen. I don't know why Sony insists on dropping
the personal names from outgoing email (or maybe it's just Outlook365's
fault).

> > I'm not actually sure what causes variables to be dropped from the DWARF
> > entirely, as opposed to them existing but having an unknown location for
> > their entire scope; [...]
>
> David wrote:
> > When does this ^ happen? In optimized builds we include all local variables
> > in a "variables" attachment to the DISubprogram, so we shouldn't be losing
> > variables entirely.
> > [...]
> > I think it's pretty important that we keep them. It helps a user understand
> > that they've not mistyped the name of a variable, etc [...]

I'm with David here; we shouldn't be dropping declared variables from a
scope just because they get optimized away.

> [...]
> To ensure this email contains a question: would there be any
> objections to adding that filter, and obliging consumers to look in
> the inlined subroutine's abstract origin to find optimised-out
> variables, instead of giving them a list per-inlined-instance?

This is specifically about concrete (inlined) instances, it seems.
Rereading the description of concrete instance trees (DWARF 5, section
3.3.8.2), it explicitly permits omitting a useless entry (one that has
only abstract_origin as an attribute, and no children) (p.85 item 1). I
think it is legal to omit these, and I would go so far as to say it's
specifically legal to omit formal_parameter DIEs with no attributes
(other than abstract_origin).

--paulr
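
The "useless entry" rule as paraphrased above (only abstract_origin as an attribute, and no children) can be written as a predicate over a DIE tree. This is a sketch of that paraphrase only, not of the spec wording or of any LLVM code; the Die class and the attribute values are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Die:
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List["Die"] = field(default_factory=list)


def is_omittable(die):
    """Only DW_AT_abstract_origin as an attribute, and no children."""
    return not die.children and set(die.attrs) == {"DW_AT_abstract_origin"}


def prune(die):
    """Return a copy of the tree with omittable entries removed, bottom-up."""
    kept = [prune(child) for child in die.children]
    kept = [child for child in kept if not is_omittable(child)]
    return Die(die.tag, dict(die.attrs), kept)


# The inlined copy of "foo" from the first dump, reduced to a few attributes.
inlined_foo = Die(
    "DW_TAG_inlined_subroutine",
    {"DW_AT_abstract_origin": "0x4e", "DW_AT_low_pc": "0x10", "DW_AT_high_pc": "0x13"},
    [Die("DW_TAG_formal_parameter", {"DW_AT_abstract_origin": "0x5a"})],
)

print(prune(inlined_foo).children)  # the empty formal_parameter is gone
```

The inlined subroutine itself is never pruned here because it carries attributes other than abstract_origin.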
Djordje Todorovic via llvm-dev
2021-Jan-27 16:22 UTC
[llvm-dev] [DebugInfo] Different representations of optimised-out variables in DWARF
Jeremy wrote:
> llvm-locstats / llvm-dwarfdump --statistics should probably be taught
> to look at the inlined subroutines abstract origin to find all
> variables, however, it seems unwise to have internal compiler state
> reflected in the output file in the way it is above. The cause of the
> empty DW_TAG_formal_parameter being created in
> DwarfDebug::collectEntityInfo [0] -- it distinguishes between a
> variable that has no location intrinsics, and a variable that has only
> empty location intrinsics. Putting a filter in to skip variables with
> only empty locations avoids the difference in output, and reduce the
> size of .debug_info on a stage2reldeb clang build by about 20Mb, or
> ~1%.
>
> To ensure this email contains a question: would there be any
> objections to adding that filter, and obliging consumers to look in
> the inlined subroutines abstract origin to find optimised-out
> variables, instead of giving them a list per-inlined-instance?

+1 for resolving this at compiler level. I think llvm-locstats /
llvm-dwarfdump --statistics displays good output for this case.

Thanks,
Djordje
David Blaikie via llvm-dev
2021-Jan-27 20:40 UTC
[llvm-dev] [DebugInfo] Different representations of optimised-out variables in DWARF
I'm a bit confused by some of the stuff in this thread, but rather than
trying to puzzle all of that out, it might be simpler/more agreeable for
me to say this:

We should never produce DWARF like this:

    DW_TAG_formal_parameter
      DW_AT_abstract_origin (0x0000005a "bar")

(as Paul quoted from the DWARF spec - this should not be necessary/seems
just like wasted bytes)

And locstats/dwarfdump statistics should not produce different results
if it reads DWARF like that compared to DWARF missing this inlined
instance entirely.
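
One way a statistics tool could satisfy that requirement is to take the variable list from the abstract origin and use the concrete instance only to decide which of those variables have a location. The sketch below is a toy model of that idea, not how llvm-dwarfdump --statistics is implemented; the function name, the data layout, and the offset used for "baz" are assumptions.

```python
def instance_variables(abstract_vars, concrete_entries):
    """Map each variable name to True if the inlined instance locates it.

    abstract_vars: {DIE offset: name} for the abstract origin's parameters
    and local variables.
    concrete_entries: list of {"origin": offset, "location": str or None}
    found under the DW_TAG_inlined_subroutine.
    """
    located = {e["origin"] for e in concrete_entries if e.get("location")}
    return {name: offset in located for offset, name in abstract_vars.items()}


# Abstract "foo": 0x5a is the offset of "bar" in the first dump; the offset
# for "baz" is hypothetical.
abstract_foo = {0x5A: "bar", 0x66: "baz"}

with_empty_entry = [{"origin": 0x5A, "location": None}]  # undef dbg.values
without_entry = []                                       # dbg.values deleted

print(instance_variables(abstract_foo, with_empty_entry))
print(instance_variables(abstract_foo, without_entry))
```

Both inputs yield the same result: two variables in the inlined instance, neither with a location.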