Ben Liblit
2011-Aug-10 15:37 UTC
[LLVMdev] incorrect DSCallGraph for simple indirect call with vtable nearby
In an earlier message (http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-August/042298.html), Andrew Lenharth suggested that EQTDDataStructures (from the poolalloc project) may only try to resolve indirect function calls. However, I am now finding that the generated DSCallGraph over-approximates the callees in a very simple indirect call. Some over-approximation is unavoidable, but this case seems simple enough that any reasonable analysis ought to have been able to give the exact, correct answer. My example C++ source file, compiled to bitcode using clang from the current llvm trunk, is as follows: struct Base { virtual void virt(); }; void Base::virt() { } void red(); void blue(); extern volatile int unknown; extern Base *base; void test() { base->virt(); (unknown ? red : blue)(); } Observe that we have a class with a virtual method, and two indirect calls: one through a vtable to that virtual method, and another which is a simple non-deterministic choice between two possible function pointers. I can understand an inexact (over-approximating) set of callees for the vtable call, as that is rather complex at the bitcode level. However, I am very surprised to see an over-approximation at the second indirect call, which is merely picking between two possible values: %11 = phi void ()* [ @red(), %8 ], [ @blue(), %9 ] # demangled call void %11() Yet my DSCallGraph tester code reports that this instruction might call either red(), blue(), or Base::virt(). I am at a loss to explain why. Equally strange: if I comment out the "base->virt();" call, then DSCallGraph *does* give the expected answer that "(unknown ? red : blue)()" could call either red() or blue() but not Base::virt(). Somehow having that vtable-based call present forces the other call to be unexpectedly conservative in its over-approximation. Source code for my DSCallGraph tester is attached below. I'd be most grateful for any clues you good people can provide. Thanks, Ben -------------- next part -------------- A non-text attachment was scrubbed... Name: ShowCallGraph.cpp Type: text/x-c++src Size: 1379 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110810/6aa969f0/attachment.cpp>
Andrew Lenharth
2011-Aug-10 17:17 UTC
[LLVMdev] incorrect DSCallGraph for simple indirect call with vtable nearby
On Wed, Aug 10, 2011 at 10:37 AM, Ben Liblit <liblit at cs.wisc.edu> wrote:> Equally strange: if I comment out the "base->virt();" call, then DSCallGraph > *does* give the expected answer that "(unknown ? red : blue)()" could call > either red() or blue() but not Base::virt(). Somehow having that > vtable-based call present forces the other call to be unexpectedly > conservative in its over-approximation.It's hard to say without looking at the bytecode. One possible explanation involves the extern property of the functions. base is external, so you don't know what virt may point to, except it may at least point to any function whose address is taken and that address escapes or the address is externally visible. (external code may set base->virt = red (either directly, DSA doesn't know about vtables, or by subclassing). DSA (again, I haven't looked at the specific code in a while), tries to see through casts in a call instruction to treat apparent indirect calls in the IR as direct (look at your previous post to see if the ir involved a cast of the function pointer), but it doesn't do the kind of flow-sensitive analysis you are looking for here. So, the call to %11 is indirect. Thus we pull the alias set for it. That set includes red and blue obviously. However, since they are both externally visible, they may, through base, alias virt. DSA pulls the alias set, not the minimal flow-based set of possible callees. Andrew
Ben Liblit
2011-Aug-10 17:58 UTC
[LLVMdev] incorrect DSCallGraph for simple indirect call with vtable nearby
Andrew Lenharth wrote:> It's hard to say without looking at the bytecode.Thanks for the speedy reply, Andrew. I'm happy to provide that bytecode if it would help. Binary form or LLVM assembly source code form? On this list or off-list directly to you?> That set includes red and blue obviously. However, since they are > both externally visible, they may, through base, alias virt. DSA > pulls the alias set, not the minimal flow-based set of possible > callees.Oh, I see. I was assuming something inclusion-based, as in Andersen's analysis. But if DSA is fundamentally unification-based (as in Steensgaard's analysis), that could explain why red() and blue() and Base::virt() all end up lumped together. If that is true, then I'll definitely need a different approach to resolving indirect calls. The code I intend to analyze is vtable-rich, so unification through those vtables is going to infect just about everything, making my analysis too conservative to be useful in practice. Does anyone know of an inclusion-based alternative, either in poolalloc or core LLVM or any other openly-available LLVM component? Surely there must be a decent LLVM implementation of an inclusion-based alias analysis out there somewhere...? -- Ben
John Criswell
2011-Aug-10 17:59 UTC
[LLVMdev] incorrect DSCallGraph for simple indirect call with vtable nearby
Dear Ben, Just a few comments on DSA: 1) I'll try out your example C++ code below and see if I can get the same results that you do. However, I'm at a conference right now (Usenix Security), so I don't know exactly when I'll get to it. 2) DSA can get pessimistic results when dealing with external code (as Andrew described). It is designed for whole program analysis, meaning that the entire program should be available (e.g., no variables defined in other compilation units). Can you: a) Modify your example so that all variables are internally defined. You may need to use volatile keywords or printf() to ensure that dead code isn't eliminated. b) Make sure that you run the -internalize pass before running your analysis to make sure that all variables are marked with internal linkage. 3) As an FYI, we just ported DSA from LLVM 2.7 to LLVM mainline last week and haven't had time to shake it down and see how well it is working. We'll be shaking down mainline DSA as we integrate it into an optional libLTO replacement for use with the SAFECode memory safety compiler. DSA for LLVM 2.7 is still available in the release_27 branch of the poolalloc project. -- John T. On 8/10/11 8:37 AM, Ben Liblit wrote:> In an earlier message > (http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-August/042298.html), > Andrew Lenharth suggested that EQTDDataStructures (from the poolalloc > project) may only try to resolve indirect function calls. However, I > am now finding that the generated DSCallGraph over-approximates the > callees in a very simple indirect call. Some over-approximation is > unavoidable, but this case seems simple enough that any reasonable > analysis ought to have been able to give the exact, correct answer. > > My example C++ source file, compiled to bitcode using clang from the > current llvm trunk, is as follows: > > struct Base > { > virtual void virt(); > }; > > void Base::virt() > { > } > > void red(); > void blue(); > > extern volatile int unknown; > extern Base *base; > > void test() > { > base->virt(); > (unknown ? red : blue)(); > } > > Observe that we have a class with a virtual method, and two indirect > calls: one through a vtable to that virtual method, and another which > is a simple non-deterministic choice between two possible function > pointers. I can understand an inexact (over-approximating) set of > callees for the vtable call, as that is rather complex at the bitcode > level. However, I am very surprised to see an over-approximation at > the second indirect call, which is merely picking between two possible > values: > > %11 = phi void ()* [ @red(), %8 ], [ @blue(), %9 ] # demangled > call void %11() > > Yet my DSCallGraph tester code reports that this instruction might > call either red(), blue(), or Base::virt(). I am at a loss to explain > why. > > Equally strange: if I comment out the "base->virt();" call, then > DSCallGraph *does* give the expected answer that "(unknown ? red : > blue)()" could call either red() or blue() but not Base::virt(). > Somehow having that vtable-based call present forces the other call to > be unexpectedly conservative in its over-approximation. > > Source code for my DSCallGraph tester is attached below. I'd be most > grateful for any clues you good people can provide. > > Thanks, > Ben > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110810/3e360713/attachment.html>
Ben Liblit
2011-Aug-10 18:39 UTC
[LLVMdev] incorrect DSCallGraph for simple indirect call with vtable nearby
John Criswell wrote:> 1) I'll try out your example C++ code below and see if I can get the > same results that you do. However, I'm at a conference right now (Usenix > Security), so I don't know exactly when I'll get to it.Excellent. Thanks, John!> 2) DSA can get pessimistic results when dealing with external code (as > Andrew described). It is designed for whole program analysis, meaning > that the entire program should be available (e.g., no variables defined > in other compilation units). Can you: [...]I have made the recommended changes. My test input is now a complete, self-contained program with a proper main. I use "-internalize" on the "opt" command line to run llvm::InternalizePass before my ShowCallGraph pass. (Sadly, llvm::InternalizePass::ID is not exposed through any headers, making it impossible to compile this pass-ordering requirement directly into my ShowCallGraph sources.) The modified test input is attached below. I'm happy to provide compiled bitcode, LLVM assembly source, or whatever else you need to reproduce the problem. The ShowCallGraph pass is the same as in my earlier message at <http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-August/042312.html>. When run on the bitcode for my updated test input, ShowCallGraph reports: call void %6(%struct.Base* %2) red() blue() Base::virt() const Derived::virt() const call void %12() red() blue() Base::virt() const Derived::virt() const The first of those two calls is a vtable dispatch; the ideal answer would be Base::virt() const and Derived::virt() const, without red() and blue(). Still, vtable lookups are complex, so I could imagine an over-approximation here. The second of those two calls is just a non-deterministic choice between two functions. I'd really hoped that DSA would give the ideal answer here: red() or blue(), but not Base::virt() const or Derived::virt() const. -- Ben -------------- next part -------------- A non-text attachment was scrubbed... Name: test.cpp Type: text/x-c++src Size: 387 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110810/e9ad2370/attachment.cpp>
Possibly Parallel Threads
- [LLVMdev] incorrect DSCallGraph for simple indirect call with vtable nearby
- [LLVMdev] incorrect DSCallGraph for simple indirect call with vtable nearby
- [LLVMdev] incorrect DSCallGraph for simple indirect call with vtable nearby
- [LLVMdev] incorrect DSCallGraph for simple indirect call with vtable nearby
- [LLVMdev] incorrect DSCallGraph for simple indirect call with vtable nearby