Siddharth Shankar Swain via llvm-dev
2018-Aug-24 12:31 UTC
[llvm-dev] [cfe-dev] Soundness in clang SA
Thanks for the detailed explanation. So is changing the analysis technique from path sensitive (symbolic execution) analysis to some other technique bring in the soundness property ? Thanks, Siddharth On Sat, Aug 18, 2018 at 1:27 AM, Artem Dergachev <noqnoqneo at gmail.com> wrote:> Nope, at least not with the "path-sensitive" (symbolic execution) engine. > Which is the whole point of using the Analyzer, as compared to writing an > arbitrary tool by exploring the AST yourself. Depends on what you want to > check, of course, something really trivial should be possible. > > While we do care about false positives a lot, there are some decisions > within the Analyzer's core that prevent perfect soundness from happening > but are still considered to be good because they greatly boost our ability > to find any bugs at all and at the same time simplify development > dramatically. > > One of the most common unsound assumptions that the Analyzer makes is that > in a program that looks like this: "if (x) { ... } if (y) { ... }", where > 'x' and 'y' are completely unrelated to each other, the Analyzer would > explore all four paths (i.e., (x is true or false) times (y is true or > false)) as if it's sure that they are feasible. This is not only unsound on > its own, but it amplifies other inaccuracies in modeling (eg., if function > body is unavailable for analysis, destroying information when such function > is called would lead not only to false negatives but to false positives as > well). But these false positives are relatively easy to suppress (eg., by > adding some sort of assert(x || !y), which is in most cases anyway a good > thing to document), and the primary benefit of this simplification is that > checkers no longer need to implement a "merge" function for the information > that they track in the program state: they only need to update their state > by modeling effect of every non-control-flow event in the program. With > ~100 checkers already written, limiting them to only simple operations on > the program state helps dramatically. > > Additionally, the Analyzer's understanding of the language is not perfect. > We're constantly working on this, but languages like C++ are *huge*. We > have a separation of responsibilities between core and checkers that allow > us to write most transfer functions once within the core and have checkers > simply consume the results of the modeling (checkers may still want to > model their own metadata), but covering all AST nodes and their > interactions is still, well, harder than writing a CodeGen. Any bugs in > core transfer functions will prevent you from doing a sound analysis, and > making sure all such bugs are fixed is probably much more work than > whatever you plan to accomplish here. > > On the non-technical side, it is always up to a developer of the program > to decide what is a genuine bug. For example, there are a lot of projects > on which a memory leak is not considered to be a bug. Or if an application > is not security-critical, a null pointer dereference we find may be on such > an unlikely path that the developer will only be annoyed to know about it > and never really appreciate the report. And the Analyzer doesn't even > guarantee that it'd find anything that's more severe than a dead code: > after all, the only reason symbolic execution works is "state splits are > justified because otherwise it's dead code". > > Last but not least, there's also the "Clang CFG" thing, which the Analyzer > uses (and some compiler warnings also use), but it's not used for > compilation (LLVM CFG is used instead) and it's not entirely accurate > (especially for C++, though it's getting better). Clang CFG is usable for > creating usable data flow analyses which you can make as sound as the CFG > is accurate. There's a collection of such analyses in lib/Analysis. The > Analyzer uses some of them internally and we're very happy with them most > of the time. At the same time, there's no fancy framework for creating > custom data flow analyses, like there is for the Analyzer's checkers; > you'll have to write all transfer functions yourself. So if all you want is > a ready-made CFG for C, you should have a look at that. > > > On 8/17/18 4:35 AM, Siddharth Shankar Swain via cfe-dev wrote: > > Hi all, > > Is it possible to develop a checker or some feature in clang SA which > will only have perfect soundness property ( if we don't care about > completness property ) i.e if the analyzer says X is a genuine bug then X > is really a genuine bug. Whatever bug it reports are all genuine but it > doesn't report all genuine bugs. Please guide. > > Thanks, > Siddharth > > _______________________________________________ > cfe-dev mailing listcfe-dev at lists.llvm.orghttp://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev > > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180824/aa83f355/attachment.html>
Artem Dergachev via llvm-dev
2018-Aug-24 23:12 UTC
[llvm-dev] [cfe-dev] Soundness in clang SA
On 8/24/18 5:31 AM, Siddharth Shankar Swain wrote:> Thanks for the detailed explanation. So is changing the analysis > technique from path sensitive (symbolic execution) analysis to some > other technique bring in the soundness property ?Yup, depending on the technique. And that wouldn't be Analyzer-specific anymore.> Thanks, > Siddharth > > On Sat, Aug 18, 2018 at 1:27 AM, Artem Dergachev <noqnoqneo at gmail.com > <mailto:noqnoqneo at gmail.com>> wrote: > > Nope, at least not with the "path-sensitive" (symbolic execution) > engine. Which is the whole point of using the Analyzer, as > compared to writing an arbitrary tool by exploring the AST > yourself. Depends on what you want to check, of course, something > really trivial should be possible. > > While we do care about false positives a lot, there are some > decisions within the Analyzer's core that prevent perfect > soundness from happening but are still considered to be good > because they greatly boost our ability to find any bugs at all and > at the same time simplify development dramatically. > > One of the most common unsound assumptions that the Analyzer makes > is that in a program that looks like this: "if (x) { ... } if (y) > { ... }", where 'x' and 'y' are completely unrelated to each > other, the Analyzer would explore all four paths (i.e., (x is true > or false) times (y is true or false)) as if it's sure that they > are feasible. This is not only unsound on its own, but it > amplifies other inaccuracies in modeling (eg., if function body is > unavailable for analysis, destroying information when such > function is called would lead not only to false negatives but to > false positives as well). But these false positives are relatively > easy to suppress (eg., by adding some sort of assert(x || !y), > which is in most cases anyway a good thing to document), and the > primary benefit of this simplification is that checkers no longer > need to implement a "merge" function for the information that they > track in the program state: they only need to update their state > by modeling effect of every non-control-flow event in the program. > With ~100 checkers already written, limiting them to only simple > operations on the program state helps dramatically. > > Additionally, the Analyzer's understanding of the language is not > perfect. We're constantly working on this, but languages like C++ > are *huge*. We have a separation of responsibilities between core > and checkers that allow us to write most transfer functions once > within the core and have checkers simply consume the results of > the modeling (checkers may still want to model their own > metadata), but covering all AST nodes and their interactions is > still, well, harder than writing a CodeGen. Any bugs in core > transfer functions will prevent you from doing a sound analysis, > and making sure all such bugs are fixed is probably much more work > than whatever you plan to accomplish here. > > On the non-technical side, it is always up to a developer of the > program to decide what is a genuine bug. For example, there are a > lot of projects on which a memory leak is not considered to be a > bug. Or if an application is not security-critical, a null pointer > dereference we find may be on such an unlikely path that the > developer will only be annoyed to know about it and never really > appreciate the report. And the Analyzer doesn't even guarantee > that it'd find anything that's more severe than a dead code: after > all, the only reason symbolic execution works is "state splits are > justified because otherwise it's dead code". > > Last but not least, there's also the "Clang CFG" thing, which the > Analyzer uses (and some compiler warnings also use), but it's not > used for compilation (LLVM CFG is used instead) and it's not > entirely accurate (especially for C++, though it's getting > better). Clang CFG is usable for creating usable data flow > analyses which you can make as sound as the CFG is accurate. > There's a collection of such analyses in lib/Analysis. The > Analyzer uses some of them internally and we're very happy with > them most of the time. At the same time, there's no fancy > framework for creating custom data flow analyses, like there is > for the Analyzer's checkers; you'll have to write all transfer > functions yourself. So if all you want is a ready-made CFG for C, > you should have a look at that. > > > On 8/17/18 4:35 AM, Siddharth Shankar Swain via cfe-dev wrote: >> Hi all, >> >> Is it possible to develop a checker or some feature in clang SA >> which will only have perfect soundness property ( if we don't >> care about completness property ) i.e if the analyzer says X is >> a genuine bug then X is really a genuine bug. Whatever bug it >> reports are all genuine but it doesn't report all genuine bugs. >> Please guide. >> >> Thanks, >> Siddharth >> >> _______________________________________________ >> cfe-dev mailing list >> cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev> > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180824/f7388068/attachment.html>
Siddharth Shankar Swain via llvm-dev
2018-Aug-26 06:18 UTC
[llvm-dev] [cfe-dev] Soundness in clang SA
Hi, What all methods are possible for program analysis? There are :- 1) Flow sensitive analysis 2) Path Sensitive analysis ( Symbolic execution ) 3) Context sensitive analysis Any other current research in this field ? Please guide Thanks, Siddharth On Sat, Aug 25, 2018 at 4:42 AM, Artem Dergachev <noqnoqneo at gmail.com> wrote:> > > On 8/24/18 5:31 AM, Siddharth Shankar Swain wrote: > > Thanks for the detailed explanation. So is changing the analysis technique > from path sensitive (symbolic execution) analysis to some other technique > bring in the soundness property ? > > > Yup, depending on the technique. > > And that wouldn't be Analyzer-specific anymore. > > Thanks, > Siddharth > > On Sat, Aug 18, 2018 at 1:27 AM, Artem Dergachev <noqnoqneo at gmail.com> > wrote: > >> Nope, at least not with the "path-sensitive" (symbolic execution) engine. >> Which is the whole point of using the Analyzer, as compared to writing an >> arbitrary tool by exploring the AST yourself. Depends on what you want to >> check, of course, something really trivial should be possible. >> >> While we do care about false positives a lot, there are some decisions >> within the Analyzer's core that prevent perfect soundness from happening >> but are still considered to be good because they greatly boost our ability >> to find any bugs at all and at the same time simplify development >> dramatically. >> >> One of the most common unsound assumptions that the Analyzer makes is >> that in a program that looks like this: "if (x) { ... } if (y) { ... }", >> where 'x' and 'y' are completely unrelated to each other, the Analyzer >> would explore all four paths (i.e., (x is true or false) times (y is true >> or false)) as if it's sure that they are feasible. This is not only unsound >> on its own, but it amplifies other inaccuracies in modeling (eg., if >> function body is unavailable for analysis, destroying information when such >> function is called would lead not only to false negatives but to false >> positives as well). But these false positives are relatively easy to >> suppress (eg., by adding some sort of assert(x || !y), which is in most >> cases anyway a good thing to document), and the primary benefit of this >> simplification is that checkers no longer need to implement a "merge" >> function for the information that they track in the program state: they >> only need to update their state by modeling effect of every >> non-control-flow event in the program. With ~100 checkers already written, >> limiting them to only simple operations on the program state helps >> dramatically. >> >> Additionally, the Analyzer's understanding of the language is not >> perfect. We're constantly working on this, but languages like C++ are >> *huge*. We have a separation of responsibilities between core and checkers >> that allow us to write most transfer functions once within the core and >> have checkers simply consume the results of the modeling (checkers may >> still want to model their own metadata), but covering all AST nodes and >> their interactions is still, well, harder than writing a CodeGen. Any bugs >> in core transfer functions will prevent you from doing a sound analysis, >> and making sure all such bugs are fixed is probably much more work than >> whatever you plan to accomplish here. >> >> On the non-technical side, it is always up to a developer of the program >> to decide what is a genuine bug. For example, there are a lot of projects >> on which a memory leak is not considered to be a bug. Or if an application >> is not security-critical, a null pointer dereference we find may be on such >> an unlikely path that the developer will only be annoyed to know about it >> and never really appreciate the report. And the Analyzer doesn't even >> guarantee that it'd find anything that's more severe than a dead code: >> after all, the only reason symbolic execution works is "state splits are >> justified because otherwise it's dead code". >> >> Last but not least, there's also the "Clang CFG" thing, which the >> Analyzer uses (and some compiler warnings also use), but it's not used for >> compilation (LLVM CFG is used instead) and it's not entirely accurate >> (especially for C++, though it's getting better). Clang CFG is usable for >> creating usable data flow analyses which you can make as sound as the CFG >> is accurate. There's a collection of such analyses in lib/Analysis. The >> Analyzer uses some of them internally and we're very happy with them most >> of the time. At the same time, there's no fancy framework for creating >> custom data flow analyses, like there is for the Analyzer's checkers; >> you'll have to write all transfer functions yourself. So if all you want is >> a ready-made CFG for C, you should have a look at that. >> >> >> On 8/17/18 4:35 AM, Siddharth Shankar Swain via cfe-dev wrote: >> >> Hi all, >> >> Is it possible to develop a checker or some feature in clang SA which >> will only have perfect soundness property ( if we don't care about >> completness property ) i.e if the analyzer says X is a genuine bug then X >> is really a genuine bug. Whatever bug it reports are all genuine but it >> doesn't report all genuine bugs. Please guide. >> >> Thanks, >> Siddharth >> >> _______________________________________________ >> cfe-dev mailing listcfe-dev at lists.llvm.orghttp://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev >> >> >> > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180826/2be6ad1e/attachment.html>