Djordje via llvm-dev
2020-Jun-17 09:10 UTC
[llvm-dev] [DebugInfo] RFC: Introduce LLVM DI Checker utility
Hi, I am sharing the proposal [0], which gives a brief introduction to the implementation of the LLVM DI Checker utility. At a very high level, it is a pair of LLVM (IR) passes that check whether the original debug info is preserved by the optimizations. The passes are controlled by options that can be invoked from ``clang`` as well as at the ``opt`` level. Using the GDB 7.11 project as a testbed, the utility found a number of potential issues related to DILocations (when run on the LLVM project build itself, it found one bug related to DISubprogram metadata). Please take a look at the final report (on the GDB 7.11 testbed), generated by the script that collects the data, at [1]. Judging by this data, a utility like this could be useful for detecting real issues in the debug info produced by the compiler. Any thoughts on this? Thanks in advance! [0] https://github.com/djolertrk/llvm-di-checker [1] https://djolertrk.github.io/di-checker-html-report-example/ Best regards, Djordje
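The collect-before/check-after idea behind "a pair of passes" can be sketched as follows. This is a minimal, hypothetical Python model, not the proposal's code: `Instr`, its stable `uid`, and the helper names are assumptions made for illustration, whereas real passes would walk LLVM instructions and their DILocations.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Instr:
    """Hypothetical stand-in for an IR instruction.

    `uid` is assumed to stay stable across a pass so that the "before" and
    "after" views of the same instruction can be matched up.
    """
    uid: int
    opcode: str
    block: str
    loc: Optional[str]  # e.g. "t.c:2:5"; None means no location attached


Function = List[Instr]


def collect_locations(fn: Function) -> Dict[int, Optional[str]]:
    """First half of the pair: snapshot the original locations."""
    return {i.uid: i.loc for i in fn}


def check_locations(before: Dict[int, Optional[str]], fn: Function,
                    pass_name: str) -> List[str]:
    """Second half: report surviving instructions that lost their location.

    Instructions deleted by the pass are not reported, and instructions the
    pass created are skipped because there is nothing to compare against.
    """
    issues = []
    for i in fn:
        original = before.get(i.uid)
        if original is not None and i.loc is None:
            issues.append(f"{pass_name} dropped {original} from {i.opcode} (uid {i.uid})")
    return issues


def run_checked(fn: Function, pass_fn: Callable[[Function], Function],
                pass_name: str) -> Tuple[Function, List[str]]:
    """Wrap a single pass with the collect/check pair."""
    before = collect_locations(fn)
    after = pass_fn(fn)
    return after, check_locations(before, after, pass_name)
```

Deleted instructions are deliberately ignored here, since removing an instruction is not in itself a loss of debug info; only instructions that survive the pass but come out without a location end up in the report.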
Adrian Prantl via llvm-dev
2020-Jun-17 17:03 UTC
[llvm-dev] [DebugInfo] RFC: Introduce LLVM DI Checker utility
That's a neat idea! How would a tool like this distinguish between situations where debug locations are expected to be dropped or merged, such as the ones outlined in https://reviews.llvm.org/D81198? Is it generating false positives? You mention that "An alternative to this is the debugify utility, but the difference is that the LLVM DI Checker deals with real debug info, rather than with the synthetic ones". How is that an advantage? Are you seeing too many false positives with the debugify-generated debug locations? -- adrian
Vedant Kumar via llvm-dev
2020-Jun-17 19:14 UTC
[llvm-dev] [DebugInfo] RFC: Introduce LLVM DI Checker utility
Hey Djordje, It looks like a lot of the new infrastructure introduced here <https://github.com/djolertrk/llvm-di-checker/commit/9d26ac2557c584f6cf82ac5535fc47f8bd267a27> consists of logic copied from the debugify implementation. Why is introducing a new pair of passes better than extending the ones we have? The core infrastructure needed to track location loss for real (non-synthetic) source variables is in place already. Stepping back a bit, I’m also surprised by the decision to move away from synthetic testing when there’s still so much low-hanging fruit to pick using that technique. The example from https://reviews.llvm.org/D81939 illustrates this perfectly: in this case it’s not necessary to invent a new testing technique to uncover the bug, because simply running `./bin/llvm-lit -Dopt="opt -debugify-each" test/Transforms/DeadArgElim` finds the same issue. In D81939, you discuss finding the new tool useful when responding to bug reports about optimized-out variables or missing locations. We sorely do need something better than -opt-bisect-limit, but why not start with something simple? -check-debugify already knows how to report when & where a location is dropped; it would be simple to teach it to emit a report when a variable is fully optimized-out. Thanks for sharing these results. The data here is older (from the 2018 debug info BoF) and from a different project (sqlite3), but we saw some similar patterns: https://llvm.org/devmtg/2018-10/slides/Prantl-Kumar-debug-info-bof-2018.pdf best vedant
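One way to picture the suggested "variable fully optimized out" report: compare which variables are still described by at least one debug-value record before and after a pass. This is a hypothetical Python model; the `(variable, value)` record tuples, with `None` standing in for an undef location, are assumptions for illustration and not how `-check-debugify` is implemented.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical model of a debug-value record (think llvm.dbg.value):
# (variable name, IR value it describes). None stands in for an undef
# location, i.e. the variable is "optimized out" at that point.
Record = Tuple[str, Optional[str]]


def described_vars(records: List[Record]) -> Set[str]:
    """Variables that still have at least one record with a real value."""
    return {var for var, value in records if value is not None}


def fully_optimized_out(before: List[Record], after: List[Record]) -> List[str]:
    """Variables described before the pass but by no record afterwards."""
    return sorted(described_vars(before) - described_vars(after))
```

The same check works whether the names come from synthetic debugify variables ("1", "2", ...) or from the original source ("a", "b", ...); only the reported names differ.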
Djordje via llvm-dev
2020-Jun-18 08:15 UTC
[llvm-dev] [DebugInfo] RFC: Introduce LLVM DI Checker utility
Hi Adrian, Thanks for the comments! > How would a tool like this distinguish between situations where debug locations are expected to be dropped or merged, such as the ones outlined in https://reviews.llvm.org/D81198? Is it generating false positives? Since it is still a proposal, it does not cover these cases yet, but it shouldn't generate false positives for them. My impression is that we can check whether dropping/merging a location meets the requirements outlined in D81198 (e.g. check whether the instruction is in the same basic block when the drop occurs) and mark it as a "known dropping". > You mention that "An alternative to this is the debugify utility, but the difference is that the LLVM DI Checker deals with real debug info, rather than with the synthetic ones". How is that an advantage? Are you seeing too many false positives with the debugify-generated debug locations? I was wrong to say "alternative"; the two are more likely to be used in combination. I haven't seen any false positives in the debugify reports (and the di-checker uses the same core logic), but I see a few differences. First, since debugify deals with synthetic debug info, it is potentially limited to the metadata kinds that can be generated synthetically (though I might be mistaken about that). Second, debugify is part of the Transforms library, whereas the di-checker performs analysis only (I am not sure what the overhead of running debugify on every single CU of a large project would be; my impression was that this analysis is cheaper). Third, the di-checker reports failures for real entities such as "a", "b", etc., instead of for synthetic variables called "1", "2", etc., and these real entities are what users report, e.g. "My var 'a' is optimized out..." or "I cannot attach a breakpoint to function 'fn1()'". I don't want to give the impression that we are choosing between the two; I really think debugify is a great tool, and the two can and should coexist.
I use the di-checker to detect failures at the clang level, and then I run debugify on the test directory of the relevant pass. As I mentioned, the di-checker option can be invoked at the clang level, since it is linked into the IR library. In addition, the di-checker should be extended to support all kinds of debug info metadata, such as DILexicalScopes, DIGlobalVariables, dbg_labels, and so on. Best, Djordje
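A rough sketch of the "known dropping" check described in this reply, using only the same-basic-block criterion it mentions. This is a hypothetical Python model (`Instr`, `uid` and `classify_drops` are illustrative names); it does not encode the full set of rules from D81198, such as location merging.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Instr:
    uid: int            # assumed stable identity across the pass
    block: str          # name of the enclosing basic block
    loc: Optional[str]  # attached location, or None


def classify_drops(before: List[Instr],
                   after: List[Instr]) -> Tuple[List[int], List[int]]:
    """Split dropped locations into (known, unexpected) lists of uids.

    A drop is treated as "known" when the instruction ended up in a different
    basic block than before, and as "unexpected" when it stayed in place but
    still lost its location.
    """
    original: Dict[int, Instr] = {i.uid: i for i in before}
    known: List[int] = []
    unexpected: List[int] = []
    for inst in after:
        prev = original.get(inst.uid)
        if prev is None or prev.loc is None or inst.loc is not None:
            continue  # new instruction, never had a location, or kept it
        if inst.block != prev.block:
            known.append(inst.uid)
        else:
            unexpected.append(inst.uid)
    return known, unexpected
```

In such a scheme only the "unexpected" list would be reported as a failure, which is how a tool could avoid flagging drops that the update rules allow.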
Djordje via llvm-dev
2020-Jun-18 08:58 UTC
[llvm-dev] [DebugInfo] RFC: Introduce LLVM DI Checker utility
Hi Vedant, Thanks a lot for your comments! >It looks like a lot of the new infrastructure introduced here <https://github.com/djolertrk/llvm-di-checker/commit/9d26ac2557c584f6cf82ac5535fc47f8bd267a27> consists of logic copied from the debugify implementation. Why is introducing a new pair of passes better than extending the ones we have? The core infrastructure needed to track location loss for real (non-synthetic) source variables is in place already. Since it is a proposal, I thought it would be easier to understand the idea if I duplicated things. Ideally, we can make an API that both tools could use. Initially, I made a few local patches that turned the real debug info into debugify-style info, but I realized this breaks the original idea/design of debugify, which is why I decided to write separate passes. The implementation cannot stay as is: it should either be merged into the debugify file(s) or refactored to use the API mentioned above. Another reason for implementing it as separate passes was that debugify is meant to be used from the 'opt' level only, whereas invoking the option from the front end requires merging it into the IR library. >Stepping back a bit, I’m also surprised by the decision to move away from synthetic testing when there’s still so much low-hanging fruit to pick using that technique. The example from https://reviews.llvm.org/D81939 illustrates this perfectly: in this case it’s not necessary to invent a new testing technique to uncover the bug, because simply running `./bin/llvm-lit -Dopt="opt -debugify-each" test/Transforms/DeadArgElim` finds the same issue. As I mentioned in my previous mail, I really do think the debugify technique is great, and I use it. But to detect that variable "x" was optimized out starting from pass Y, I only need to run the di-checker option (which performs analysis only) and find the variable in the final HTML report. I think that is a very user-friendly concept.
Once we have detected where the location was lost, we can run debugify on that pass's test directory (with the caveat that the tests may not cover all possible cases, and the case found at the high level could be new to the pass). In addition, the di-checker detects issues with metadata other than locations (currently the preservation map keeps only DISubprograms, but it should keep other kinds too). >In D81939, you discuss finding the new tool useful when responding to bug reports about optimized-out variables or missing locations. We sorely do need something better than -opt-bisect-limit, but why not start with something simple? -check-debugify already knows how to report when & where a location is dropped, it would be simple to teach it to emit a report when a variable is fully optimized-out. I agree. We can do that, and it could be used by both utilities. Best regards, Djordje
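The DISubprogram preservation map, combined with the "which pass lost it first" question from this reply, might look roughly like this. It is a hypothetical Python model: a module is reduced to a map from function name to attached subprogram (or `None`), and passes are plain callables; none of these names come from the proposal.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical module model: function name -> attached DISubprogram (or None).
Module = Dict[str, Optional[str]]
# A pass is modeled as (pass name, function that returns the new module).
Pass = Tuple[str, Callable[[Module], Module]]


def first_pass_losing_subprogram(module: Module,
                                 pipeline: List[Pass]) -> Dict[str, str]:
    """Map each function that lost its DISubprogram to the first pass responsible."""
    # Preservation map: functions that carry a DISubprogram before any pass runs.
    preserved = {fn for fn, sp in module.items() if sp is not None}
    culprits: Dict[str, str] = {}
    for name, run in pipeline:
        module = run(module)
        for fn in preserved:
            if fn in culprits or fn not in module:
                continue  # already reported, or the function was deleted outright
            if module[fn] is None:
                culprits[fn] = name
    return culprits
```

Functions deleted outright are skipped, since removing a function is not a loss of debug info, and only the first offending pass is recorded for each function, which is the "starting from pass Y" answer a user would look for in the report.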