thr3ads.net - llvm dev - [LLVMdev] incorrect DSCallGraph for simple indirect call with vtable nearby [Aug 2011]

If this information is useful, please help other people find it:
Share via:

Ben Liblit

2011-Aug-10 15:37 UTC

[LLVMdev] incorrect DSCallGraph for simple indirect call with vtable nearby

In an earlier message 
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-August/042298.html), 
Andrew Lenharth suggested that EQTDDataStructures (from the poolalloc 
project) may only try to resolve indirect function calls.  However, I am 
now finding that the generated DSCallGraph over-approximates the callees 
in a very simple indirect call.  Some over-approximation is unavoidable, 
but this case seems simple enough that any reasonable analysis ought to 
have been able to give the exact, correct answer.

My example C++ source file, compiled to bitcode using clang from the 
current llvm trunk, is as follows:

	struct Base
	{
	  virtual void virt();
	};

	void Base::virt()
	{
	}

	void red();
	void blue();

	extern volatile int unknown;
	extern Base *base;

	void test()
	{
	  base->virt();
	  (unknown ? red : blue)();
	}

Observe that we have a class with a virtual method, and two indirect 
calls: one through a vtable to that virtual method, and another which is 
a simple non-deterministic choice between two possible function 
pointers.  I can understand an inexact (over-approximating) set of 
callees for the vtable call, as that is rather complex at the bitcode 
level.  However, I am very surprised to see an over-approximation at the 
second indirect call, which is merely picking between two possible values:

	%11 = phi void ()* [ @red(), %8 ], [ @blue(), %9 ]   # demangled
	call void %11()

Yet my DSCallGraph tester code reports that this instruction might call 
either red(), blue(), or Base::virt().  I am at a loss to explain why.

Equally strange: if I comment out the "base->virt();" call, then 
DSCallGraph *does* give the expected answer that "(unknown ? red : 
blue)()" could call either red() or blue() but not Base::virt(). 
Somehow having that vtable-based call present forces the other call to 
be unexpectedly conservative in its over-approximation.

Source code for my DSCallGraph tester is attached below.  I'd be most 
grateful for any clues you good people can provide.

Thanks,
Ben
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ShowCallGraph.cpp
Type: text/x-c++src
Size: 1379 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110810/6aa969f0/attachment.cpp>

Andrew Lenharth

2011-Aug-10 17:17 UTC

head link

[LLVMdev] incorrect DSCallGraph for simple indirect call with vtable nearby

On Wed, Aug 10, 2011 at 10:37 AM, Ben Liblit <liblit at cs.wisc.edu>
wrote:> Equally strange: if I comment out the "base->virt();" call,
then DSCallGraph
> *does* give the expected answer that "(unknown ? red : blue)()"
could call
> either red() or blue() but not Base::virt(). Somehow having that
> vtable-based call present forces the other call to be unexpectedly
> conservative in its over-approximation.
It's hard to say without looking at the bytecode.  One possible
explanation involves the extern property of the functions.  base is
external, so you don't know what virt may point to, except it may at
least point to any function whose address is taken and that address
escapes or the address is externally visible.  (external code may set
base->virt = red (either directly, DSA doesn't know about vtables, or
by subclassing).  DSA (again, I haven't looked at the specific code in
a while), tries to see through casts in a call instruction to treat
apparent indirect calls in the IR as direct (look at your previous
post to see if the ir involved a cast of the function pointer), but it
doesn't do the kind of flow-sensitive analysis you are looking for
here.

So, the call to %11 is indirect.  Thus we pull the alias set for it.
That set includes red and blue obviously.  However, since they are
both externally visible, they may, through base, alias virt.  DSA
pulls the alias set, not the minimal flow-based set of possible
callees.

Andrew

Ben Liblit

2011-Aug-10 17:58 UTC

head link

[LLVMdev] incorrect DSCallGraph for simple indirect call with vtable nearby

Andrew Lenharth wrote:> It's hard to say without looking at the bytecode.
Thanks for the speedy reply, Andrew.  I'm happy to provide that bytecode 
if it would help.  Binary form or LLVM assembly source code form?  On 
this list or off-list directly to you?
> That set includes red and blue obviously.  However, since they are
> both externally visible, they may, through base, alias virt.  DSA
> pulls the alias set, not the minimal flow-based set of possible
> callees.
Oh, I see.  I was assuming something inclusion-based, as in Andersen's 
analysis.  But if DSA is fundamentally unification-based (as in 
Steensgaard's analysis), that could explain why red() and blue() and 
Base::virt() all end up lumped together.

If that is true, then I'll definitely need a different approach to 
resolving indirect calls.  The code I intend to analyze is vtable-rich, 
so unification through those vtables is going to infect just about 
everything, making my analysis too conservative to be useful in practice.

Does anyone know of an inclusion-based alternative, either in poolalloc 
or core LLVM or any other openly-available LLVM component?  Surely there 
must be a decent LLVM implementation of an inclusion-based alias 
analysis out there somewhere...?

-- Ben

John Criswell

2011-Aug-10 17:59 UTC

head link

[LLVMdev] incorrect DSCallGraph for simple indirect call with vtable nearby

Dear Ben,

Just a few comments on DSA:

1) I'll try out your example C++ code below and see if I can get the 
same results that you do.  However, I'm at a conference right now 
(Usenix Security), so I don't know exactly when I'll get to it.

2) DSA can get pessimistic results when dealing with external code (as 
Andrew described).  It is designed for whole program analysis, meaning 
that the entire program should be available (e.g., no variables defined 
in other compilation units).  Can you:

a) Modify your example so that all variables are internally defined.  
You may need to use volatile keywords or printf() to ensure that dead 
code isn't eliminated.

b) Make sure that you run the -internalize pass before running your 
analysis to make sure that all variables are marked with internal linkage.

3) As an FYI, we just ported DSA from LLVM 2.7 to LLVM mainline last 
week and haven't had time to shake it down and see how well it is 
working.  We'll be shaking down mainline DSA as we integrate it into an 
optional libLTO replacement for use with the SAFECode memory safety 
compiler.  DSA for LLVM 2.7 is still available in the release_27 branch 
of the poolalloc project.

-- John T.

On 8/10/11 8:37 AM, Ben Liblit wrote:> In an earlier message 
> (http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-August/042298.html), 
> Andrew Lenharth suggested that EQTDDataStructures (from the poolalloc 
> project) may only try to resolve indirect function calls.  However, I 
> am now finding that the generated DSCallGraph over-approximates the 
> callees in a very simple indirect call.  Some over-approximation is 
> unavoidable, but this case seems simple enough that any reasonable 
> analysis ought to have been able to give the exact, correct answer.
>
> My example C++ source file, compiled to bitcode using clang from the 
> current llvm trunk, is as follows:
>
>     struct Base
>     {
>       virtual void virt();
>     };
>
>     void Base::virt()
>     {
>     }
>
>     void red();
>     void blue();
>
>     extern volatile int unknown;
>     extern Base *base;
>
>     void test()
>     {
>       base->virt();
>       (unknown ? red : blue)();
>     }
>
> Observe that we have a class with a virtual method, and two indirect 
> calls: one through a vtable to that virtual method, and another which 
> is a simple non-deterministic choice between two possible function 
> pointers.  I can understand an inexact (over-approximating) set of 
> callees for the vtable call, as that is rather complex at the bitcode 
> level.  However, I am very surprised to see an over-approximation at 
> the second indirect call, which is merely picking between two possible 
> values:
>
>     %11 = phi void ()* [ @red(), %8 ], [ @blue(), %9 ]   # demangled
>     call void %11()
>
> Yet my DSCallGraph tester code reports that this instruction might 
> call either red(), blue(), or Base::virt().  I am at a loss to explain 
> why.
>
> Equally strange: if I comment out the "base->virt();" call,
then
> DSCallGraph *does* give the expected answer that "(unknown ? red : 
> blue)()" could call either red() or blue() but not Base::virt(). 
> Somehow having that vtable-based call present forces the other call to 
> be unexpectedly conservative in its over-approximation.
>
> Source code for my DSCallGraph tester is attached below.  I'd be most 
> grateful for any clues you good people can provide.
>
> Thanks,
> Ben
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110810/3e360713/attachment.html>

Ben Liblit

2011-Aug-10 18:39 UTC

head link

[LLVMdev] incorrect DSCallGraph for simple indirect call with vtable nearby

John Criswell wrote:> 1) I'll try out your example C++ code below and see if I can get the
> same results that you do. However, I'm at a conference right now
(Usenix
> Security), so I don't know exactly when I'll get to it.
Excellent.  Thanks, John!
> 2) DSA can get pessimistic results when dealing with external code (as
> Andrew described). It is designed for whole program analysis, meaning
> that the entire program should be available (e.g., no variables defined
> in other compilation units). Can you: [...]
I have made the recommended changes.  My test input is now a complete, 
self-contained program with a proper main.  I use "-internalize" on
the
"opt" command line to run llvm::InternalizePass before my
ShowCallGraph
pass.  (Sadly, llvm::InternalizePass::ID is not exposed through any 
headers, making it impossible to compile this pass-ordering requirement 
directly into my ShowCallGraph sources.)

The modified test input is attached below.  I'm happy to provide 
compiled bitcode, LLVM assembly source, or whatever else you need to 
reproduce the problem.  The ShowCallGraph pass is the same as in my 
earlier message at 
<http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-August/042312.html>. 
When run on the bitcode for my updated test input, ShowCallGraph reports:

   call void %6(%struct.Base* %2)
	red()
	blue()
	Base::virt() const
	Derived::virt() const
   call void %12()
	red()
	blue()
	Base::virt() const
	Derived::virt() const

The first of those two calls is a vtable dispatch; the ideal answer 
would be Base::virt() const and Derived::virt() const, without red() and 
blue().  Still, vtable lookups are complex, so I could imagine an 
over-approximation here.

The second of those two calls is just a non-deterministic choice between 
two functions.  I'd really hoped that DSA would give the ideal answer 
here: red() or blue(), but not Base::virt() const or Derived::virt() const.

-- Ben
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.cpp
Type: text/x-c++src
Size: 387 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110810/e9ad2370/attachment.cpp>

Reasonably Related Threads

Search for more reasonably related threads

llvm dev - Aug 2011 - [LLVMdev] incorrect DSCallGraph for simple indirect call with vtable nearby

[LLVMdev] incorrect DSCallGraph for simple indirect call with vtable nearby

[LLVMdev] incorrect DSCallGraph for simple indirect call with vtable nearby

[LLVMdev] incorrect DSCallGraph for simple indirect call with vtable nearby

[LLVMdev] incorrect DSCallGraph for simple indirect call with vtable nearby

[LLVMdev] incorrect DSCallGraph for simple indirect call with vtable nearby

Reasonably Related Threads