Phipps, Alan via llvm-dev
2020-Jan-24 18:56 UTC
[llvm-dev] Adding support for LLVM Branch Condition Coverage
+ Vedant

Hi Hal, thanks.

I apologize if my answers aren't as thorough as you would like; what I'm proposing is simply an extension to the existing infrastructure, so it would be enabled automatically as part of code coverage. Mapping of branch regions would be done in CoverageMappingGen and instrumented using the same profiling instrumentation mechanism under CodeGenPGO::mapRegionCounters() and around CodeGenFunction::EmitBranchOnBoolExpr(). In fact, as I mention below, we'd largely be reusing the same profiling counters (except in at least one exception case that I described in my email). The existing functionality of coverage and profiling would still work exactly as it has. Further, I can add a switch to llvm-cov to enable/disable branch coverage visualization and whether it's included in the coverage report.

With respect to fuzzing, to be sure I don't misunderstand you, are you referring to testing the branch coverage capability itself using fuzzing, or are you referring to the leveraging of coverage by a fuzzer itself (i.e. coverage-guided fuzzing)? For the latter, I could look into libFuzzer and see how this might impact it. For the former, I haven't thought much about using fuzzing to test coverage although I am certainly open to suggestions.

-Alan

From: Finkel, Hal J. [mailto:hfinkel at anl.gov]
Sent: Friday, January 24, 2020 11:02 AM
To: Phipps, Alan; llvm-dev at lists.llvm.org
Subject: [EXTERNAL] Re: [llvm-dev] Adding support for LLVM Branch Condition Coverage

Thanks, Alan. This certainly seems useful. Can you please provide a quick overview on how this relates to our other infrastructure for coverage, for profiling, and what's used for fuzz testing?

-Hal

On 1/23/20 6:09 PM, Phipps, Alan via llvm-dev wrote:

Vedant Kumar asked me to post my design thoughts concerning branch coverage at llvm-dev since there is general interest.

My team at Texas Instruments is developing an embedded ARM C/C++ compiler with LLVM. I would like to enhance LLVM's code coverage capability with branch condition coverage (for C/C++), similar to GCC/GCOV support for branch coverage. This is useful for TI, and I think this will be a useful feature enhancement to LLVM that I can upstream.

In a nutshell, the functionality boils down to tracking how many times a generated "branch" instruction (based on a source code condition) is taken or not taken (i.e. evaluated into "True" and "False"). This applies to decision points in control flow (if, for, while, ...) as well as individual conditions on logical operators ("&&", "||") in Boolean expressions.

In sketching out a design, there are three primary areas in the design that I am proposing:

1.) Add a new CounterMappingRegion kind for branch conditions
    a. This new region kind would track two counters, one for the "True" branch taken count of a branch condition, and one for the "False" branch taken count.
        i. Alternatively, I could use two separate CounterMappingRegions to track individual counters since this is how the class was originally written to be used. However, using a single region kind to represent a single branch condition that ties all of the pertinent counter information together seems like a cleaner design.
        ii. Just as for all counters, the two branch condition counters can represent a reference to an instrumentation counter or to a counter expression. The two counters are encoded along with the MappingRegions and distinguished based on the region kind.
        iii. All other CounterMappingRegion kinds simply ignore the second counter; nothing changes in how they're encoded, which preserves format backward compatibility.
    b. I think this change also requires an adjustment to the class SourceMappingRegion to support branch conditions that can be generated into CounterMappingRegion instances.

2.) Counter Instrumentation
    a. We can reuse most of the existing profile instrumentation counters that are emitted as part of profiling/coverage to calculate branch condition counts (True/False).
        i. This assumption leverages the fact that logical operators in C are "short-circuit" operators. For example, the "False-taken" count for the left-hand-side condition in a logical-or expression (e.g. condition "C1" in "C1 || C2") can be derived from the execution count we already track for the right-hand-side (condition "C2" in "C1 || C2").
    b. There does exist a case when evaluating the right-hand-side condition of a logical operator that isn't part of a control-flow statement (e.g. condition "C2" in "x = C1 || C2;") that will require instrumenting a new counter in order to properly derive that condition's "true" count and "false" count.
    c. I'll avoid going too deep into detail here, but my goal is to ensure we reuse existing profile counters as much as possible.

3.) Visualization using llvm-cov
    a. The notion of CoverageSegment needs to be extended to comprehend the branch condition data represented by a CounterMappingRegion above. But then llvm-cov can treat the segment distinctly when displaying True/False counts for each branch condition as well as tracking total missed branches.
    b. We can also add a BranchCoverageInfo class to track branch coverage data, similar to LineCoverageInfo and RegionCoverageInfo.
    c. The text output could look something like GCOV but with more detail that we know (I prototyped this using logical-or):

            9|      |int main(int argc, char *argv[])
           10|     3|{
           11|     3|    if (argc == 1)
          Branch (11:9): [True: 1, False: 2]
           12|     1|    {
           13|     1|      return 0;
           14|     1|    }
          . . .
           23|     2|    if (a == 0 || b == 2 || b == 34 || a == b)
          Branch (23:9): [True: 1, False: 1]
          Branch (23:19): [True: 1, False: 0]
          Branch (23:29): [True: 0, False: 0]
          Branch (23:40): [True: 0, False: 0]
          . . .
           31|     2|    b = a || c;
          Branch (31:9): [True: 1, False: 1]
          Branch (31:14): [True: 1, False: 0]

    d. I thought about extending the "region-count" caret markers in the text display, but it could get messy. For the HTML output, we can get a bit more fancy.
    e. Branch miss percentages/totals will be added to the coverage report.

Additional Notes
- I'm aware that constant condition folding in CodeGenFunction::EmitBranchOnBoolExpr() needs to be taken into account. Is there anything else related to branch optimization that I ought to be aware of?

Please let me know if these design thoughts look reasonable and if this would be useful. The goal is to start full implementation soon and upstream in a few months.

Thanks!
Alan Phipps
Texas Instruments, Inc.

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
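The counter reuse described in item 2 of the proposal can be illustrated with a small sketch. This is not LLVM code: the BranchCounts struct and all function names below are hypothetical, and the arithmetic is only an assumption about how True/False counts might be derived from counters that already exist (the count for the whole expression, the count for the right-hand side, and the count for the "then" body). It covers a two-condition "C1 || C2" only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical record for one branch condition (not an LLVM class).
struct BranchCounts {
  uint64_t trueCount;
  uint64_t falseCount;
};

// Left-hand side of "C1 || C2". Because "||" short-circuits, C2 runs only
// when C1 is false, so the existing RHS counter is C1's False count.
BranchCounts deriveLogicalOrLHS(uint64_t parentCount, uint64_t rhsCount) {
  assert(rhsCount <= parentCount);
  return {parentCount - rhsCount, rhsCount};
}

// Right-hand side of "if (C1 || C2)". The "then" body runs when either
// condition is true, so C2's True count is thenCount minus C1's True count.
BranchCounts deriveLogicalOrRHSInIf(uint64_t parentCount, uint64_t rhsCount,
                                    uint64_t thenCount) {
  BranchCounts lhs = deriveLogicalOrLHS(parentCount, rhsCount);
  assert(thenCount >= lhs.trueCount && thenCount <= parentCount);
  uint64_t rhsTrue = thenCount - lhs.trueCount;
  return {rhsTrue, rhsCount - rhsTrue};
}

// Right-hand side of "x = C1 || C2;". There is no "then" counter to borrow,
// so this sketch assumes one extra counter for "C2 evaluated to true".
BranchCounts deriveLogicalOrRHSWithExtraCounter(uint64_t rhsCount,
                                                uint64_t rhsTrueCount) {
  assert(rhsTrueCount <= rhsCount);
  return {rhsTrueCount, rhsCount - rhsTrueCount};
}

// Formats one entry in the style of the prototype text output.
std::string formatBranch(unsigned line, unsigned col, BranchCounts c) {
  return "Branch (" + std::to_string(line) + ":" + std::to_string(col) +
         "): [True: " + std::to_string(c.trueCount) +
         ", False: " + std::to_string(c.falseCount) + "]";
}
```

With the numbers from line 31 of the sample output (the statement ran 2 times and the right-hand side ran once), formatBranch(31, 9, deriveLogicalOrLHS(2, 1)) yields "Branch (31:9): [True: 1, False: 1]", which matches the first entry shown for that line.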
Vedant Kumar via llvm-dev
2020-Jan-24 19:05 UTC
[llvm-dev] Adding support for LLVM Branch Condition Coverage
+ Kostya

> On Jan 24, 2020, at 10:56 AM, Phipps, Alan <a-phipps at ti.com> wrote:
>
> With respect to fuzzing, to be sure I don’t misunderstand you, are you referring to testing the branch coverage capability itself using fuzzing, or are you referring to the leveraging of coverage by a fuzzer itself (i.e. coverage-guided fuzzing)? For the latter, I could look into libFuzzer and see how this might impact it. For the former, I haven’t thought much about using fuzzing to test coverage although I am certainly open to suggestions.

To add a bit to this, llvm ships sancov (https://clang.llvm.org/docs/SanitizerCoverage.html) to support function/block/edge-level coverage guided fuzzing. Alan's proposal targets the 'source-based' coverage feature (https://clang.llvm.org/docs/SourceBasedCodeCoverage.html). The goal of this feature is to provide precise, human-readable reports (e.g. without artifacts related to optimized debug info).

As for profiling, llvm+clang support both frontend-level instrumentation (profiles collected this way are meant to degrade slowly over time), and IR-level instrumentation (profiles are tied to the specific revision of the instrumented binary). The source-based coverage implementation is built on top of frontend-level PGO instrumentation.

vedant
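For contrast with the source-based feature, the SanitizerCoverage documentation linked in this message describes a trace-pc-guard mode in which the compiler inserts a callback on every edge and the user supplies the callback bodies. The following is a minimal sketch of such callbacks, assuming the program is built with -fsanitize-coverage=trace-pc-guard; the two callback names and signatures come from that documentation, while the counters g_numGuards and g_numCovered are hypothetical names introduced here for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping for this sketch: how many edge guards exist and
// how many distinct edges have been hit at least once.
static uint64_t g_numGuards = 0;
static uint64_t g_numCovered = 0;

// Called once per module with the range of guard variables for its edges.
extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  if (start == stop || *start) return;  // Empty range or already initialized.
  for (uint32_t *guard = start; guard < stop; ++guard)
    *guard = static_cast<uint32_t>(++g_numGuards);  // Guard ids start at 1.
}

// Called on every instrumented edge with that edge's guard variable.
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (!*guard) return;  // Edge already recorded; ignore repeat hits.
  *guard = 0;           // Disable further reports for this edge.
  ++g_numCovered;
}
```

This records only whether an edge was reached, which is what a coverage-guided fuzzer needs as feedback. It carries no mapping back to source conditions, which is the part the branch coverage proposal adds on the source-based side.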
Finkel, Hal J. via llvm-dev
2020-Jan-24 21:47 UTC
[llvm-dev] Adding support for LLVM Branch Condition Coverage
On 1/24/20 1:05 PM, vsk at apple.com wrote:

> To add a bit to this, llvm ships sancov (https://clang.llvm.org/docs/SanitizerCoverage.html) to support function/block/edge-level coverage guided fuzzing. Alan's proposal targets the 'source-based' coverage feature (https://clang.llvm.org/docs/SourceBasedCodeCoverage.html). The goal of this feature is to provide precise, human-readable reports (e.g. without artifacts related to optimized debug info).
>
> As for profiling, llvm+clang support both frontend-level instrumentation (profiles collected this way are meant to degrade slowly over time), and IR-level instrumentation (profiles are tied to the specific revision of the instrumented binary). The source-based coverage implementation is built on top of frontend-level PGO instrumentation.
>
> vedant

Thanks for clarifying all of this!

-Hal