thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] Soundness in clang SA [Aug 2018]

If this information is useful, please help other people find it:
Share via:

Siddharth Shankar Swain via llvm-dev

2018-Aug-17 11:35 UTC

[llvm-dev] Soundness in clang SA

Hi all,

Is it possible to develop a checker  or some feature in clang SA which will
only have perfect soundness property ( if we don't care about completness
property )  i.e if the analyzer says X is a genuine bug then X is really a
genuine bug. Whatever  bug  it reports are all genuine but it doesn't
report all genuine bugs. Please guide.

Thanks,
Siddharth
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180817/90524fad/attachment.html>

Artem Dergachev via llvm-dev

2018-Aug-17 19:57 UTC

head link

[llvm-dev] [cfe-dev] Soundness in clang SA

Nope, at least not with the "path-sensitive" (symbolic execution) 
engine. Which is the whole point of using the Analyzer, as compared to 
writing an arbitrary tool by exploring the AST yourself. Depends on what 
you want to check, of course, something really trivial should be possible.

While we do care about false positives a lot, there are some decisions 
within the Analyzer's core that prevent perfect soundness from happening 
but are still considered to be good because they greatly boost our 
ability to find any bugs at all and at the same time simplify 
development dramatically.

One of the most common unsound assumptions that the Analyzer makes is 
that in a program that looks like this: "if (x) { ... } if (y) { ...
}",
where 'x' and 'y' are completely unrelated to each other, the
Analyzer
would explore all four paths (i.e., (x is true or false) times (y is 
true or false)) as if it's sure that they are feasible. This is not only 
unsound on its own, but it amplifies other inaccuracies in modeling 
(eg., if function body is unavailable for analysis, destroying 
information when such function is called would lead not only to false 
negatives but to false positives as well). But these false positives are 
relatively easy to suppress (eg., by adding some sort of assert(x || 
!y), which is in most cases anyway a good thing to document), and the 
primary benefit of this simplification is that checkers no longer need 
to implement a "merge" function for the information that they track in
the program state: they only need to update their state by modeling 
effect of every non-control-flow event in the program. With ~100 
checkers already written, limiting them to only simple operations on the 
program state helps dramatically.

Additionally, the Analyzer's understanding of the language is not 
perfect. We're constantly working on this, but languages like C++ are 
*huge*. We have a separation of responsibilities between core and 
checkers that allow us to write most transfer functions once within the 
core and have checkers simply consume the results of the modeling 
(checkers may still want to model their own metadata), but covering all 
AST nodes and their interactions is still, well, harder than writing a 
CodeGen. Any bugs in core transfer functions will prevent you from doing 
a sound analysis, and making sure all such bugs are fixed is probably 
much more work than whatever you plan to accomplish here.

On the non-technical side, it is always up to a developer of the program 
to decide what is a genuine bug. For example, there are a lot of 
projects on which a memory leak is not considered to be a bug. Or if an 
application is not security-critical, a null pointer dereference we find 
may be on such an unlikely path that the developer will only be annoyed 
to know about it and never really appreciate the report. And the 
Analyzer doesn't even guarantee that it'd find anything that's more 
severe than a dead code: after all, the only reason symbolic execution 
works is "state splits are justified because otherwise it's dead
code".

Last but not least, there's also the "Clang CFG" thing, which the 
Analyzer uses (and some compiler warnings also use), but it's not used 
for compilation (LLVM CFG is used instead) and it's not entirely 
accurate (especially for C++, though it's getting better). Clang CFG is 
usable for creating usable data flow analyses which you can make as 
sound as the CFG is accurate. There's a collection of such analyses in 
lib/Analysis. The Analyzer uses some of them internally and we're very 
happy with them most of the time. At the same time, there's no fancy 
framework for creating custom data flow analyses, like there is for the 
Analyzer's checkers; you'll have to write all transfer functions 
yourself. So if all you want is a ready-made CFG for C, you should have 
a look at that.

On 8/17/18 4:35 AM, Siddharth Shankar Swain via cfe-dev
wrote:> Hi all,
>
> Is it possible to develop a checker  or some feature in clang SA which 
> will only have perfect soundness property ( if we don't care about 
> completness property )  i.e if the analyzer says X is a genuine bug 
> then X is really a genuine bug. Whatever  bug  it reports are all 
> genuine but it doesn't report all genuine bugs. Please guide.
>
> Thanks,
> Siddharth
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180817/06406565/attachment-0001.html>

Siddharth Shankar Swain via llvm-dev

2018-Aug-24 12:31 UTC

head link

[llvm-dev] [cfe-dev] Soundness in clang SA

Thanks for the detailed explanation. So is changing the analysis technique
from path sensitive (symbolic execution) analysis to some other technique
bring in the soundness property ?
Thanks,
Siddharth

On Sat, Aug 18, 2018 at 1:27 AM, Artem Dergachev <noqnoqneo at gmail.com>
wrote:
> Nope, at least not with the "path-sensitive" (symbolic execution)
engine.
> Which is the whole point of using the Analyzer, as compared to writing an
> arbitrary tool by exploring the AST yourself. Depends on what you want to
> check, of course, something really trivial should be possible.
>
> While we do care about false positives a lot, there are some decisions
> within the Analyzer's core that prevent perfect soundness from
happening
> but are still considered to be good because they greatly boost our ability
> to find any bugs at all and at the same time simplify development
> dramatically.
>
> One of the most common unsound assumptions that the Analyzer makes is that
> in a program that looks like this: "if (x) { ... } if (y) { ...
}", where
> 'x' and 'y' are completely unrelated to each other, the
Analyzer would
> explore all four paths (i.e., (x is true or false) times (y is true or
> false)) as if it's sure that they are feasible. This is not only
unsound on
> its own, but it amplifies other inaccuracies in modeling (eg., if function
> body is unavailable for analysis, destroying information when such function
> is called would lead not only to false negatives but to false positives as
> well). But these false positives are relatively easy to suppress (eg., by
> adding some sort of assert(x || !y), which is in most cases anyway a good
> thing to document), and the primary benefit of this simplification is that
> checkers no longer need to implement a "merge" function for the
information
> that they track in the program state: they only need to update their state
> by modeling effect of every non-control-flow event in the program. With
> ~100 checkers already written, limiting them to only simple operations on
> the program state helps dramatically.
>
> Additionally, the Analyzer's understanding of the language is not
perfect.
> We're constantly working on this, but languages like C++ are *huge*. We
> have a separation of responsibilities between core and checkers that allow
> us to write most transfer functions once within the core and have checkers
> simply consume the results of the modeling (checkers may still want to
> model their own metadata), but covering all AST nodes and their
> interactions is still, well, harder than writing a CodeGen. Any bugs in
> core transfer functions will prevent you from doing a sound analysis, and
> making sure all such bugs are fixed is probably much more work than
> whatever you plan to accomplish here.
>
> On the non-technical side, it is always up to a developer of the program
> to decide what is a genuine bug. For example, there are a lot of projects
> on which a memory leak is not considered to be a bug. Or if an application
> is not security-critical, a null pointer dereference we find may be on such
> an unlikely path that the developer will only be annoyed to know about it
> and never really appreciate the report. And the Analyzer doesn't even
> guarantee that it'd find anything that's more severe than a dead
code:
> after all, the only reason symbolic execution works is "state splits
are
> justified because otherwise it's dead code".
>
> Last but not least, there's also the "Clang CFG" thing, which
the Analyzer
> uses (and some compiler warnings also use), but it's not used for
> compilation (LLVM CFG is used instead) and it's not entirely accurate
> (especially for C++, though it's getting better). Clang CFG is usable
for
> creating usable data flow analyses which you can make as sound as the CFG
> is accurate. There's a collection of such analyses in lib/Analysis. The
> Analyzer uses some of them internally and we're very happy with them
most
> of the time. At the same time, there's no fancy framework for creating
> custom data flow analyses, like there is for the Analyzer's checkers;
> you'll have to write all transfer functions yourself. So if all you
want is
> a ready-made CFG for C, you should have a look at that.
>
>
> On 8/17/18 4:35 AM, Siddharth Shankar Swain via cfe-dev wrote:
>
> Hi all,
>
> Is it possible to develop a checker  or some feature in clang SA which
> will only have perfect soundness property ( if we don't care about
> completness property )  i.e if the analyzer says X is a genuine bug then X
> is really a genuine bug. Whatever  bug  it reports are all genuine but it
> doesn't report all genuine bugs. Please guide.
>
> Thanks,
> Siddharth
>
> _______________________________________________
> cfe-dev mailing listcfe-dev at
lists.llvm.orghttp://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180824/aa83f355/attachment.html>

Maybe Matching Threads

Search for more maybe matching threads

llvm dev - Aug 2018 - [cfe-dev] Soundness in clang SA

[llvm-dev] Soundness in clang SA

[llvm-dev] [cfe-dev] Soundness in clang SA

[llvm-dev] [cfe-dev] Soundness in clang SA

Maybe Matching Threads