thr3ads.net - llvm dev - [LLVMdev] Expressing ambiguous points-to info in AliasAnalysis::alias(...) results? [Jun 2015]


Christian Convey

2015-Jun-15 17:33 UTC

[LLVMdev] Expressing ambiguous points-to info in AliasAnalysis::alias(...) results?

On Mon, Jun 15, 2015 at 11:02 AM, Daniel Berlin <dberlin at dberlin.org>
wrote:
> Points-to analysis on LLVM-IR itself is fine (see the current CFL-AA,
> or the old deleted andersen's implementations), and giving may-alias
> and no-alias results also works. Giving must-alias answers, however,
> is difficult.
>
> In particular, i would not simply ignore some types of constructs and
> expect to produce valid answers.
>
Makes sense.  Thanks for the advice.
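
As a rough illustration of the point quoted above, here is a minimal Python sketch of answering an alias query from may-point-to sets alone. It is not LLVM's API: the `AliasResult` enum, the `alias` function, and the `points_to` dictionary are hypothetical names chosen for this example. The sketch only ever answers no-alias or may-alias, because a shared abstract target in a may-point-to set does not prove that two pointers hold the same address at run time.

```python
from enum import Enum


class AliasResult(Enum):
    # Illustrative names only; this is not LLVM's AliasResult type.
    NO_ALIAS = "NoAlias"
    MAY_ALIAS = "MayAlias"


def alias(points_to, p, q):
    """Answer an alias query using only may-point-to sets.

    points_to maps a pointer name to the set of abstract objects it may
    point to (a hypothetical representation for this sketch).
    """
    a = points_to.get(p)
    b = points_to.get(q)
    # A pointer the analysis never saw has no known target set,
    # so the only safe answer is may-alias.
    if a is None or b is None:
        return AliasResult.MAY_ALIAS
    # Disjoint may-point-to sets: the pointers cannot name the same object.
    if a.isdisjoint(b):
        return AliasResult.NO_ALIAS
    # Overlapping sets only show that aliasing is possible, never certain.
    return AliasResult.MAY_ALIAS


points_to = {"p": {"x"}, "q": {"y"}, "r": {"x", "y"}}
```

Note that even `alias(points_to, "p", "p")` yields may-alias here; a must-alias answer would need information that points-to sets by themselves do not carry.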

>
> There are plenty of things that are illegal in C but legal in LLVM IR.
>
> For example, the following is legal LLVM IR (sorry for c style, it's
> early)
>
> bar(int64 a) {
> int64 * foo = inttoptr(a);
> baz = load *foo;
> }
>
> This is not illegal, and will produce a valid result.
>
> Same with stuff like:
> bar(int64 *a) {
> int64 foo = ptrtoint(a);
> baz = foo + 5;
> int64 *b = inttoptr(baz);
> c = load *b;
> }
>
> Again, not illegal, and produces a valid result.
> You can pretty much do what you want.
>
> Things like "c pointer aliasing rules" exist only as metadata.
> So in general, you can't expect "invalid pointers" to buy you very much.
>
I see, thanks for clarifying.  The AA algorithm I've been working with
assumes that the type system is going to lie, since C allows type punning.
I'm pretty sure I can port that distrust to the LLVM IR version of the
algorithm.  It sounds like that would cover the examples you gave above, if
I'm also appropriately pessimistic about the behavior of unknown /
unanalyzed callers and callees.

Maybe what I'll try is to add a flag to each vertex in the may-point-to
graph, indicating whether or not the vertex's memory might hold additional,
poorly understood pointers.  Then I can let an appropriate amount of hell
break loose in the analysis, if a piece of memory with that flag is used in
various ways.

That way, if over time I can make the algorithm better at detecting and
making sense of code which generates new pointer values, I can just
gradually reduce the cases where I need to set that flag.
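
The per-vertex flag idea above might look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the `Vertex` and `PointsToGraph` classes, the `may_hold_unknown` field, and the three operations are hypothetical, and the code applies each statement once instead of iterating to a fixed point as a real analysis would.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    name: str
    # Names of the vertices this piece of memory may point to.
    targets: set = field(default_factory=set)
    # True if the memory might also hold pointers the analysis
    # does not understand (for example, the result of an inttoptr).
    may_hold_unknown: bool = False


class PointsToGraph:
    def __init__(self):
        self.vertices = {}

    def vertex(self, name):
        if name not in self.vertices:
            self.vertices[name] = Vertex(name)
        return self.vertices[name]

    def address_of(self, dst, obj):
        # dst = &obj
        self.vertex(obj)
        self.vertex(dst).targets.add(obj)

    def copy(self, dst, src):
        # dst = src: both the known targets and the flag flow along.
        s, d = self.vertex(src), self.vertex(dst)
        d.targets |= s.targets
        d.may_hold_unknown |= s.may_hold_unknown

    def int_to_ptr(self, dst):
        # dst = inttoptr(...): the analysis cannot tell where dst points.
        self.vertex(dst).may_hold_unknown = True

    def may_alias(self, a, b):
        va, vb = self.vertex(a), self.vertex(b)
        # A flagged vertex may point anywhere, so be pessimistic.
        if va.may_hold_unknown or vb.may_hold_unknown:
            return True
        return bool(va.targets & vb.targets)
```

As the analysis learns to model more pointer-producing code, fewer operations would need to call something like `int_to_ptr`, and fewer queries would fall back to the pessimistic answer.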


> > I did look at the LLVM IR for calling a virtual function in C++, since you
> > mentioned that as an example earlier.  From manual inspection, I thought I
> > could spot the value flow of the virtual function pointer from where the
> > function was defined, into the vtable constant for that class, and then into
> > the class instance's vtable pointer.
> This depends on the frontend generating the llvm IR :)

Touche.

Daniel Berlin

2015-Jun-15 19:29 UTC


On Mon, Jun 15, 2015 at 10:33 AM, Christian Convey
<christian.convey at gmail.com> wrote:
> On Mon, Jun 15, 2015 at 11:02 AM, Daniel Berlin <dberlin at dberlin.org> wrote:
>>
>> Points-to analysis on LLVM-IR itself is fine (see the current CFL-AA,
>> or the old deleted andersen's implementations), and giving may-alias
>> and no-alias results also works. Giving must-alias answers, however,
>> is difficult.
>>
>> In particular, i would not simply ignore some types of constructs and
>> expect to produce valid answers.
>
>
> Makes sense.  Thanks for the advice.
>
>>
>>
>> There are plenty of things that are illegal in C but legal in LLVM IR.
>>
>> For example, the following is legal LLVM IR (sorry for c style, it's
>> early)
>>
>> bar(int64 a) {
>> int64 * foo = inttoptr(a);
>> baz = load *foo;
>> }
>>
>> This is not illegal, and will produce a valid result.
>>
>> Same with stuff like:
>> bar(int64 *a) {
>> int64 foo = ptrtoint(a);
>> baz = foo + 5;
>> int64 *b = inttoptr(baz);
>> c = load *b;
>> }
>>
>> Again, not illegal, and produces a valid result.
>> You can pretty much do what you want.
>>
>> Things like "c pointer aliasing rules" exist only as metadata.
>> So in general, you can't expect "invalid pointers" to buy you very much.
>
>
> I see, thanks for clarifying.  The AA algorithm I've been working with
> assumes that the type system is going to lie, since C allows type punning.

Which paper are you using?

> I'm pretty sure I can port that distrust to the LLVM IR version of the
> algorithm.  It sounds like that would cover the examples you gave above, if
> I'm also appropriately pessimistic about the behavior of unknown /
> unanalyzed callers and callees.

Yup.

>
> Maybe what I'll try is to add a flag to each vertex in the may-point-to
> graph, indicating whether or not the vertex's memory might hold additional,
> poorly understood pointers.  Then I can let an appropriate amount of hell
> break loose in the analysis, if a piece of memory with that flag is used in
> various ways.

This is essentially what we do in gcc.
It is based on http://www.cs.ucsb.edu/~benh/research/papers/hardekopf07ant.pdf
as a solver, and some earlier papers that deal with field-sensitivity,
to build a field sensitive set of constraints for the program.

When we see bad things happen, we propagate various flags to say what
bad thing happened.
We used to explicitly track what the "set of variables that has become
unknown" are, but it grows too large to be sane for large programs.

Also note that to speed propagation, we prioritize propagation of that flag.

IE we propagate the various "unknown/points-to-anything/etc" flags as
fast as possible, to the exclusion of discovering other points-to
sets.

This is because once something points-to anything, it doesn't matter
what else it points to :)

>
> That way, if over time I can make the algorithm better at detecting and
> making sense of code which generates new pointer values, I can just
> gradually reduce the cases where I need to set that flag.
>
>
>>
>> > I did look at the LLVM IR for calling a virtual function in C++, since
>> > you mentioned that as an example earlier.  From manual inspection, I
>> > thought I could spot the value flow of the virtual function pointer from
>> > where the function was defined, into the vtable constant for that class,
>> > and then into the class instance's vtable pointer.
>> This depends on the frontend generating the llvm IR :)
>
>
> Touche.
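
The flag-first propagation described in this message can be sketched as follows. This is not GCC's implementation; it is a small Python illustration under assumed names (`solve`, `copy_edges`, `initial_sets`, `initially_unknown` are all hypothetical). It keeps two worklists over a graph of copy constraints and always drains the "unknown" worklist before doing any ordinary set propagation. Once a node is marked unknown, its explicit points-to set is dropped, since it no longer matters.

```python
from collections import deque


def solve(copy_edges, initial_sets, initially_unknown):
    """Propagate points-to sets along copy edges (src -> dst means dst = src),
    giving priority to the 'points to anything' flag."""
    succs = {}
    nodes = set(initial_sets) | set(initially_unknown)
    for src, dst in copy_edges:
        succs.setdefault(src, []).append(dst)
        nodes.update((src, dst))

    points_to = {n: set(initial_sets.get(n, ())) for n in nodes}
    unknown = set()
    flag_work = deque(initially_unknown)
    set_work = deque(n for n in nodes if points_to[n])

    while flag_work or set_work:
        # Flags go first: spread "unknown" as far as it reaches
        # before spending any time on explicit sets.
        if flag_work:
            n = flag_work.popleft()
            if n in unknown:
                continue
            unknown.add(n)
            points_to[n].clear()  # the explicit set is now irrelevant
            flag_work.extend(succs.get(n, ()))
            continue

        n = set_work.popleft()
        if n in unknown:
            continue
        for d in succs.get(n, ()):
            if d in unknown:
                continue  # nothing to gain: d already points to anything
            if not points_to[n] <= points_to[d]:
                points_to[d] |= points_to[n]
                set_work.append(d)

    return points_to, unknown
```

For example, with edges `a -> b -> c -> d` and `u -> c`, where `a` points to `x` and `u` is unknown, the solver marks `c` and `d` unknown right away and never propagates `x` into them.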

Christian Convey

2015-Jun-15 20:46 UTC


On Mon, Jun 15, 2015 at 3:29 PM, Daniel Berlin <dberlin at dberlin.org>
wrote:
> On Mon, Jun 15, 2015 at 10:33 AM, Christian Convey
> <christian.convey at gmail.com> wrote:
> > On Mon, Jun 15, 2015 at 11:02 AM, Daniel Berlin <dberlin at dberlin.org> wrote:
> Which paper are you using?
>
I'm mostly going from Robert Wilson's 1997 PhD thesis, although I'm pretty
sure I've seen a lot of the same ideas elsewhere as well.

> IE we propagate the various "unknown/points-to-anything/etc" flags as
> fast as possible, to the exclusion of discovering other points-to
> sets.
>
> This is because once something points-to anything, it doesn't matter
> what else it points to :)
>
Nice to know that the idea has been vetted in at least one AA
implementation.  Thanks for the info.
