Christian Convey
2015-Jun-15 21:20 UTC
[LLVMdev] Expressing ambiguous points-to info in AliasAnalysis::alias(...) results?
On Mon, Jun 15, 2015 at 4:54 PM, Daniel Berlin <dberlin at dberlin.org> wrote:

> > I'm mostly going from Robert Wilson's 1997 PhD thesis, although I'm
> > pretty sure I've seen a lot of the same ideas elsewhere as well.
>
> Yes, using summary/transfer functions has been tried a lot.
> Note: the numbers in most of these PhD theses do *not* get borne out in
> practice.
>
> See, e.g., http://www.cse.unsw.edu.au/~ysui/papers/spe14.pdf
>
> vs
>
> http://impact.crhc.illinois.edu/shared/report/phd-thesis-erik-nystrom.pdf
>
> vs
>
> the Wilson thesis.

Cool, I'll give those papers a read, thanks.

> In particular, I expect the Wilson algorithm is going to fall over
> very badly on anything of even medium size.

Interesting. Do you mean "fall over" in terms of running time, or
precision, or something else?

> Nowadays, the thing of the day seems to be using CFL-reachability-based
> formulations and doing context-sensitivity on demand.

Out of curiosity, what's the appeal (aside from academic novelty) of the
CFL approaches?

From a personal perspective, I'm particularly interested in the maximum
analytic precision each AA approach can attain, almost without regard to
how much time or memory the computation takes to run.

So one thing I found appealing about Wilson's thesis was his "location
set" approach for providing field-sensitivity. I also liked Alain
Deutsch's 1994 paper "Interprocedural may-alias analysis for pointers:
Beyond k-limiting" for the same reason. But I didn't take a crack at
implementing *that* because, based on the paper's complexity, I thought
I'd go clinically insane trying to turn it into code :)
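In case it's useful context, my reading of the location-set idea boils down
to roughly the following toy C++ sketch (my own invented names, ignoring
access sizes; this is not code from the thesis or from LLVM). A single
(block, offset, stride) triple can stand for, say, one field of every
element of an array of structs, which is where the field-sensitivity comes
from:

#include <cstdint>
#include <numeric>  // std::gcd (C++17)

// A location set denotes the byte offsets {offset + i*stride | i >= 0}
// within one abstract memory block, so one set can summarize e.g.
// "field f of every element of an array of structs".
struct LocationSet {
  const void *block;  // the abstract memory block (alloca, global, ...)
  int64_t offset;     // offset of the first location in the block
  int64_t stride;     // 0 means exactly one location (a scalar field)
};

// Conservative may-overlap test. Distinct blocks never overlap; within a
// block, offset1 + i*s1 == offset2 + j*s2 has an integer solution iff
// gcd(s1, s2) divides offset2 - offset1 (we over-approximate by allowing
// any integers i, j).
bool mayOverlap(const LocationSet &a, const LocationSet &b) {
  if (a.block != b.block)
    return false;
  if (a.stride == 0 && b.stride == 0)
    return a.offset == b.offset;
  return (b.offset - a.offset) % std::gcd(a.stride, b.stride) == 0;
}

For example, two fields at offsets 0 and 4 of an array of 8-byte structs
give the sets (0 mod 8) and (4 mod 8), which this test correctly keeps
apart.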
Daniel Berlin
2015-Jun-15 22:03 UTC
[LLVMdev] Expressing ambiguous points-to info in AliasAnalysis::alias(...) results?
On Mon, Jun 15, 2015 at 2:20 PM, Christian Convey
<christian.convey at gmail.com> wrote:
> On Mon, Jun 15, 2015 at 4:54 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>>
>> > I'm mostly going from Robert Wilson's 1997 PhD thesis, although I'm
>> > pretty sure I've seen a lot of the same ideas elsewhere as well.
>>
>> Yes, using summary/transfer functions has been tried a lot.
>> Note: the numbers in most of these PhD theses do *not* get borne out in
>> practice.
>>
>> See, e.g., http://www.cse.unsw.edu.au/~ysui/papers/spe14.pdf
>>
>> vs
>>
>> http://impact.crhc.illinois.edu/shared/report/phd-thesis-erik-nystrom.pdf
>>
>> vs
>>
>> the Wilson thesis.
>
> Cool, I'll give those papers a read, thanks.
>
>> In particular, I expect the Wilson algorithm is going to fall over
>> very badly on anything of even medium size.
>
> Interesting. Do you mean "fall over" in terms of running time, or
> precision, or something else?

Running time.

Wilson's approach took 16 seconds for 4500 lines of code. Let's assume his
implementation actually did things right. (Most research implementations
do things like "ignore effects of external function calls", which
significantly reduces the measured runtime in practice.) Let's further
assume that the scaling of computer speed from 1995 to now eats your
entire exponential factor, making the algorithm in practice linear (this
is not even close to true).

His algorithm, on a million lines of code, would then take about an hour
to run (16 s * 1,000,000 / 4,500 ~ 3,600 s), which would be pretty
unacceptable. In practice, I think you'll find it probably never
finishes :)

>> Nowadays, the thing of the day seems to be using CFL-reachability-based
>> formulations and doing context-sensitivity on demand.
>
> Out of curiosity, what's the appeal (aside from academic novelty) of the
> CFL approaches?

They are demand-driven and easy to add context sensitivity to. At the same
time, on an all-pairs basis, they are competitive with constraint-based
Andersen's solvers. You can't really do much better than that: it means
you pay pretty close to the minimal cost for the stuff you query.

> From a personal perspective, I'm particularly interested in the maximum
> analytic precision each AA approach can attain, almost without regard to
> how much time or memory the computation takes to run.

I'm wildly curious why.

> So one thing I found appealing about Wilson's thesis was his "location
> set" approach for providing field-sensitivity.

I never saw this paper (it was too old for where I started), but GCC uses
something similar. We create variables for each field of a given variable.
They are linked, in a linked list with an offset and size (if known), to
the variables that contain them. Constraints contain stride info, and
intersection is then used to compute which sub-variables a constraint
touches during solving.

It is a variant of this:
http://homepages.mcs.vuw.ac.nz/~djp/files/paste04.ps

Note a few things:

- Some programs have so many sub-variables that you will run out of
  memory. This is true of the location-set approach as well.
- GCC only created variables for possibly-used field ranges.
- I can point you at GCC bugs offline if you are interested.

> I also liked Alain Deutsch's 1994 paper "Interprocedural may-alias
> analysis for pointers: Beyond k-limiting" for the same reason. But I
> didn't take a crack at implementing *that* because, based on the paper's
> complexity, I thought I'd go clinically insane trying to turn it into
> code :)
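To make the intersection step concrete, it looks roughly like this
(an illustrative sketch with invented names, not GCC's actual
tree-ssa-structalias code):

#include <cstdint>
#include <vector>

// One sub-variable per possibly-used field range of a variable.
struct SubVariable {
  int64_t offset;  // byte offset within the containing variable
  int64_t size;    // size in bytes, or -1 if unknown
};

// One strided access from a constraint: it may touch len bytes starting
// at start + k*stride, for k in [0, count).
struct Access {
  int64_t start;
  int64_t len;
  int64_t stride;  // 0 means a single, non-strided access
  int64_t count;   // 1 if stride == 0
};

// Return indices of the sub-variables the access may touch, by interval
// intersection. Unknown-size fields conservatively extend to the end.
std::vector<size_t> touchedFields(const std::vector<SubVariable> &fields,
                                  const Access &acc) {
  std::vector<size_t> hit;
  for (size_t i = 0; i != fields.size(); ++i) {
    int64_t fBegin = fields[i].offset;
    int64_t fEnd = fields[i].size < 0 ? INT64_MAX
                                      : fBegin + fields[i].size;
    for (int64_t k = 0; k != acc.count; ++k) {
      int64_t aBegin = acc.start + k * acc.stride;
      if (aBegin < fEnd && fBegin < aBegin + acc.len) {
        hit.push_back(i);  // half-open intervals overlap
        break;
      }
    }
  }
  return hit;
}

A production solver would avoid the inner loop over strided repetitions,
but the interval intersection is the essential step.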
Christian Convey
2015-Jun-15 23:00 UTC
[LLVMdev] Expressing ambiguous points-to info in AliasAnalysis::alias(...) results?
On Mon, Jun 15, 2015 at 6:03 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
> > From a personal perspective, I'm particularly interested in the maximum
> > analytic precision each AA approach can attain, almost without regard
> > to how much time or memory the computation takes to run.
>
> I'm wildly curious why.

One reason is that I'm simply curious about the trade-off space between AA
precision and AA running time. Since I don't make compilers for a living,
I have the luxury of staring at corners of the design space which it would
be idiotic to actually include in LLVM :)

Another reason is that in past work, I've sometimes worked on code where
I'd gladly accept a 10-day compilation time if it bought me an extra 10%
in performance. So I sometimes wonder what's involved in making such
optimization passes available, even though people wouldn't want to use
them on a day-to-day basis.

> Constraints contain stride info, and intersection is then used to
> compute which sub-variables a constraint touches during solving.
> It is a variant of this:
> http://homepages.mcs.vuw.ac.nz/~djp/files/paste04.ps

Thanks, I'll try to give it a read.

> Some programs have so many sub-variables that you will run out of
> memory. This is true of the location-set approach as well.

I'm surprised you ran into that kind of trouble with memory. Was that back
in 32-bit days?

> GCC only created variables for possibly-used field ranges.
> I can point you at GCC bugs offline if you are interested.

Yeah, if you don't mind, I'd be grateful for the links.