thr3ads.net - llvm dev - [llvm-dev] Existing studies on the benefits of pointer analysis [Mar 2016]

If this information is useful, please help other people find it:
Share via:

Jia Chen via llvm-dev

2016-Mar-15 20:37 UTC

[llvm-dev] Existing studies on the benefits of pointer analysis

Dear llvm devs,

tl;dr: What prevents llvm from switching to a fancier pointer analysis?

Currently, there exists a variety of general-purpose alias analyses in 
the LLVM codebase: basic-aa, globalsmodref-aa, tbaa, scev-aa, and 
cfl-aa. However, only the first three are actually turned on when 
invoking clang with -O2 or -O3 (please correct me if I'm wrong about this).

If one looks at existing research literatures, there are even more 
algorithm to consider for doing pointer analysis. Some are 
field-sensitive, some are field-based, some are flow-sensitive, some are 
context-sensitive. Even for flow-insensitive ones, they could also be 
inclusion-style (-andersen-aa) and equality-style (-steens-aa and 
-ds-aa). Those algorithms are often backed up by rich theoretical 
framework as well as preliminary evaluations which demonstrate their 
superior precision and/or performance.

Given such an abundance choices of pointer analyses that seem to be much 
better in the research land, why does real-world compiler 
infrastructures like llvm still rely on those three simple (and ad-hoc) 
ones to perform IR optimization? Based on my understanding (and again 
please correct me if I am wrong):

(1) The minor reason: those "better" algorithms are very hard to 
implement in a robust way and nobody seems to be interested in trying to 
write and maintain them.
(2) The major reason: it's not clear whether those "better"
algorithms
are actually better for llvm. More precise pointer analyses tend to slow 
down compile time a lot while contributing too little to the 
optimization passes that use them. The benefit one gets from a more 
precise analysis may not justify the compile-time or the maintenance cost.

So my question here is: what kind(s) of precision really justify the 
cost and what kinds do not? Has anybody done any study in the past to 
evaluate what kinds of features in pointer analyses will benefit what 
kinds of optimization passes? Could there potentially be more 
improvement on pointer analysis precision without adding too much 
compile-time/maintenance cost? Has the precision/performance tradeoffs 
got fully explored before?

Any pointers will be much appreciated. No pun intended :)

PS1: To be more concrete, what I am looking for is not some black-box 
information like "we switched from basic-aa to cfl-aa and observed 1% 
improvement at runtime". I believe white-box studies such as "the licm
pass failed to hoist x instructions because -tbaa is not flow sensitive" 
are much more interesting for understanding the problem here.

PS2: If no such evaluation exists in the past, I'd happy to do that 
myself and report back my findings if anyone here is interested.

-- 
Best Regards,

--
Jia Chen
Department of Computer Science
University of Texas at Austin
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160315/7f9a6a2d/attachment.html>

Christian Convey via llvm-dev

2016-Mar-21 14:32 UTC

head link

[llvm-dev] Existing studies on the benefits of pointer analysis

Hi Jia,

If one looks at existing research literatures, there are even
more> algorithm to consider for doing pointer analysis.

For at least some published AA algorithms, there may be some uncertainty
about software patents and/or copyright.

At one point I was interested in the status of this AA implementation
<https://dl.acm.org/citation.cfm?id=2466483> by Lian Li et al. IIRC, when
I contacted Lian to ask if there was any chance of getting it into LLVM,
IIRC she said that her employer wouldn't promise to relinquish all possible
claims it had on that algorithm's IP. So unfortunately, at least in the
U.S., an algorithm being published in an academic journal doesn't remove
all legal risk associated with using it.

Also, speaking from my own experience, even when an AA algorithm seems to
be described in great detail in some piece of literature (e.g., a phd
thesis), there can still be a lot of details which are glossed over, or
which seem clear when reading the document but which get a lot more
confusing when one tries to actually implement it.

That can make implementing such an algorithm take far longer than one would
expect based on just reading the document. (It's also an argument in favor
of requiring academic papers which describe the behavior of a software
implementation to actually include a working copy of the source code, IMHO.)

So my question here is: what kind(s) of precision really justify the
cost> and what kinds do not? Has anybody done any study in the past to evaluate
> what kinds of features in pointer analyses will benefit what kinds of
> optimization passes?
>
At one point I discussed this with Daniel Berlin, but I'm having trouble
find a record of the conversation. IIRC, he says that he once threw a huge
amount of computing power at doing a *full* context-sensitive AA on some
software, and the speedup he observed in the resulting program as
underwhelming (10-15%?).

I can't remember if that was with GCC or LLVM. That result is a data
point, although it may not say much about how much additional speedup could
be realized if the algorithms which use the AA results were themselves
adapted to capitalize on fully context-sensitive, flow-sensitive,
hula-dancer-on-the-dashboard AA results.

Cheers,
Christian
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160321/0f683fa7/attachment-0001.html>

Jia Chen via llvm-dev

2016-Mar-21 15:56 UTC

head link

[llvm-dev] Existing studies on the benefits of pointer analysis

Hi Christian,

Thank you so much for the reply! Please see my comments inline.

On 03/21/2016 09:32 AM, Christian Convey wrote:> Hi Jia,
>
>     If one looks at existing research literatures, there are even more
>     algorithm to consider for doing pointer analysis.
>
>
> For at least some published AA algorithms, there may be some 
> uncertainty about software patents and/or copyright.
>
> At one point I was interested in the status of this AA implementation 
> <https://dl.acm.org/citation.cfm?id=2466483> by Lian Li et al.  IIRC,
> when I contacted Lian to ask if there was any chance of getting it 
> into LLVM, IIRC she said that her employer wouldn't promise to 
> relinquish all possible claims it had on that algorithm's IP.  So 
> unfortunately, at least in the U.S., an algorithm being published in 
> an academic journal doesn't remove all legal risk associated with 
> using it.
This is news to me. Thanks for sharing it.>
> Also, speaking from my own experience, even when an AA algorithm seems 
> to be described in great detail in some piece of literature (e.g., a 
> phd thesis), there can still be a lot of details which are glossed 
> over, or which seem clear when reading the document but which get a 
> lot more confusing when one tries to actually implement it.
>
> That can make implementing such an algorithm take far longer than one 
> would expect based on just reading the document.  (It's also an 
> argument in favor of requiring academic papers which describe the 
> behavior of a software implementation to actually include a working 
> copy of the source code, IMHO.)My personal experience also coincides. And even if the paper does come 
with an artifact or source codes, they are usually proof-of-concept 
implementations with lots of important real-world corner cases
ignored.>
>     So my question here is: what kind(s) of precision really justify
>     the cost and what kinds do not? Has anybody done any study in the
>     past to evaluate what kinds of features in pointer analyses will
>     benefit what kinds of optimization passes?
>
>
> At one point I discussed this with Daniel Berlin, but I'm having 
> trouble find a record of the conversation.  IIRC, he says that he once 
> threw a huge amount of computing power at doing a /full/ 
> context-sensitive AA on some software, and the speedup he observed in 
> the resulting program as underwhelming (10-15%?).I kind of expect that. As you mentioned later, most optimization passes 
work in a context-insensitive manner (i.e. they won't clone a function 
and optimize differently on different clones). Context sensitivity on 
the pointer analysis side is probably not going to help a lot if the 
client cannot fully capitalize on it. In the settings of compiler 
optimization, my guess is that flow sensitivity, field sensitivity, heap 
model and external library annotations are the four aspects that are 
likely to matter.

I did some preliminary experiments with licm on c programs over the last 
weekend. I chose licm because intuitively that's the optimization that 
could have the biggest performance impact. The result suggested that 
tbaa, cfl-aa, scev-aa and globals-aa yields very little additional 
benefits over basic-aa. Again, both the methodology and benchmark 
selection are very immature and the results need to be double-checked, 
but my hope is that by looking at how aa algorithms and their clients 
interact I may be able to get some hints on what kind of aa a compiler 
really wants.>
> I can't remember if that was with GCC or LLVM.  That result is a data 
> point, although it may not say much about how much additional speedup 
> could be realized if the algorithms which use the AA results were 
> themselves adapted to capitalize on fully context-sensitive, 
> flow-sensitive, hula-dancer-on-the-dashboard AA results.
>
> Cheers,
> Christian
>
-- 
Best Regards,

--
Jia Chen
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160321/6adf397c/attachment.html>

Daniel Berlin via llvm-dev

2016-Mar-21 16:05 UTC

head link

[llvm-dev] Existing studies on the benefits of pointer analysis

On Tue, Mar 15, 2016 at 1:37 PM, Jia Chen via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Dear llvm devs,
>
> tl;dr: What prevents llvm from switching to a fancier pointer analysis?
>
Nothing.

>
> Currently, there exists a variety of general-purpose alias analyses in the
> LLVM codebase: basic-aa, globalsmodref-aa, tbaa, scev-aa, and cfl-aa.
> However, only the first three are actually turned on when invoking clang
> with -O2 or -O3 (please correct me if I'm wrong about this).
>
This is correct.
Eventually, i hope george will have time to get back to CFL-AA and turn it
on by default.

>
> If one looks at existing research literatures, there are even more
> algorithm to consider for doing pointer analysis. Some are field-sensitive,
> some are field-based, some are flow-sensitive, some are context-sensitive.
> Even for flow-insensitive ones, they could also be inclusion-style
> (-andersen-aa) and equality-style (-steens-aa and -ds-aa). Those algorithms
> are often backed up by rich theoretical framework as well as preliminary
> evaluations which demonstrate their superior precision and/or performance.
>
CFL-AA is a middle ground between steens and anders, can be easily made
field and context sensitive, etc.

>
> Given such an abundance choices of pointer analyses that seem to be much
> better in the research land, why does real-world compiler infrastructures
> like llvm still rely on those three simple (and ad-hoc) ones to perform IR
> optimization?
>
Time and energy.

> Based on my understanding (and again please correct me if I am wrong):
>
> (1) The minor reason: those "better" algorithms are very hard to
implement
> in a robust way and nobody seems to be interested in trying to write and
> maintain them.
>
This is false.  Heck, at the time i implemented it in GCC, field-sensitive
andersen's analysis was unknown in production compilers.  That's why
i'm
thanked in all the papers - i did the engineering work to make it fast and
reliable.


> (2) The major reason: it's not clear whether those "better"
algorithms are
> actually better for llvm. More precise pointer analyses tend to slow down
> compile time a lot while contributing too little to the optimization passes
> that use them. The benefit one gets from a more precise analysis may not
> justify the compile-time or the maintenance cost.
>

CFL-AA is probably the right trade-off here. You can stop at any time and
have correct answers, you can be as lazy as you like.
etc.

The reality is i think you overlook the realistic answer:

3. Nobody has had time or energy to fix up CFL-AA or SCEV-AA. They spend
their time on lower-hanging fruit until that lower hanging fruit is gone.

IE For the moment, CFL-AA and SCEV-AA and ... are not the thing holding
llvm back.


>
> So my question here is: what kind(s) of precision really justify the cost
> and what kinds do not?
>
Depends entirely on your applications.

> Has anybody done any study in the past to evaluate what kinds of features
> in pointer analyses will benefit what kinds of optimization passes?
>Yes.
Chris did many years ago, and i've done one more recently.

> Could there potentially be more improvement on pointer analysis precision
> without adding too much compile-time/maintenance cost?
>Yes.

Has the precision/performance tradeoffs got fully explored
before?>Yes
>
> Any pointers will be much appreciated. No pun intended :)
>
> PS1: To be more concrete, what I am looking for is not some black-box
> information like "we switched from basic-aa to cfl-aa and observed 1%
> improvement at runtime". I believe white-box studies such as "the
licm pass
> failed to hoist x instructions because -tbaa is not flow sensitive"
are
> much more interesting for understanding the problem here.
>
White-box studies are very application specific, and often very pass
specific.

>
> PS2: If no such evaluation exists in the past, I'd happy to do that
myself
> and report back my findings if anyone here is interested.
>I don't think any of the world is set up to make that valuable.

Nothing takes advantage of context sensitivity, flow sensitivity, etc.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160321/1dab2e33/attachment.html>

Daniel Berlin via llvm-dev

2016-Mar-21 16:13 UTC

head link

[llvm-dev] Existing studies on the benefits of pointer analysis

On Mon, Mar 21, 2016 at 7:32 AM, Christian Convey via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi Jia,
>
> If one looks at existing research literatures, there are even more
>> algorithm to consider for doing pointer analysis.
>
>
> For at least some published AA algorithms, there may be some uncertainty
> about software patents and/or copyright.
>
> At one point I was interested in the status of this AA implementation
> <https://dl.acm.org/citation.cfm?id=2466483> by Lian Li et al.  IIRC,
> when I contacted Lian to ask if there was any chance of getting it into
> LLVM, IIRC she said that her employer wouldn't promise to relinquish
all
> possible claims it had on that algorithm's IP.  So unfortunately, at
least
> in the U.S., an algorithm being published in an academic journal
doesn't
> remove all legal risk associated with using it.
>
I wouldn't worry about this part.  I'm a pointer analysis guy and an IP
lawyer.
I'm pretty careful about what algorithms we end up with in LLVM :)

>
> Also, speaking from my own experience, even when an AA algorithm seems to
> be described in great detail in some piece of literature (e.g., a phd
> thesis), there can still be a lot of details which are glossed over, or
> which seem clear when reading the document but which get a lot more
> confusing when one tries to actually implement it.
>

Yes, i had a blog post on this one, which was basically titled "most
pointer analysis research is bullshit".  People mostly do research
implementations, and ignore little things like "the effect of external
function calls" (or worse "stub all of them"), and yes,
implementing those
things significantly changes the time bounds.  Or they do things like tell
me that field-sensitivity slows nothing down because they are working on
java, where you can't take the address of fields :)

Over the years, you get a good eye for what will end up practical.

GCC's implementation of andersen's, which uses hardekopf's research
and
work, is *very* fast in both Intra and inter procedural mode, field
sensitive, and handles all issues.

Adding context sensitivity to it would be expensive, however.

> That can make implementing such an algorithm take far longer than one
> would expect based on just reading the document.  (It's also an
argument in
> favor of requiring academic papers which describe the behavior of a
> software implementation to actually include a working copy of the source
> code, IMHO.)
>
Yes.
This is one of the reasons's i always liked ben's research so much. He
published all code and benchmarks.

Note also that you a lot of them do have source code, you just have to look
really hard ;)

>
> So my question here is: what kind(s) of precision really justify the cost
>> and what kinds do not? Has anybody done any study in the past to
evaluate
>> what kinds of features in pointer analyses will benefit what kinds of
>> optimization passes?
>>
>
> At one point I discussed this with Daniel Berlin, but I'm having
trouble
> find a record of the conversation.  IIRC, he says that he once threw a huge
> amount of computing power at doing a *full* context-sensitive AA on some
> software, and the speedup he observed in the resulting program as
> underwhelming (10-15%?).
>Yes.  But see below.

>
> I can't remember if that was with GCC or LLVM.  That result is a data
> point, although it may not say much about how much additional speedup could
> be realized if the algorithms which use the AA results were themselves
> adapted to capitalize on fully context-sensitive, flow-sensitive,
> hula-dancer-on-the-dashboard AA results.
>
Note however that this is going to be true of any new AA algorithm.   You
have to have the infrastructure necessary to make use of it, you have to
tune optimizations to use the information well, etc.

Realistically, getting the infrastructure ready and tuning it is a year or
two of work, at least (for one person).

As i mentioned, at this point, there is still much lower hanging fruit.
When there isn't, i suspect we'll get back to AA.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160321/aa131944/attachment.html>

Jia Chen via llvm-dev

2016-Mar-21 17:00 UTC

head link

[llvm-dev] Existing studies on the benefits of pointer analysis

Hi Daniel,

On 03/21/2016 11:05 AM, Daniel Berlin wrote:>
>
> On Tue, Mar 15, 2016 at 1:37 PM, Jia Chen via llvm-dev 
> <llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>> wrote:
>
>     Dear llvm devs,
>
>     tl;dr: What prevents llvm from switching to a fancier pointer
>     analysis?
>
>
> Nothing.
>
>
>     Currently, there exists a variety of general-purpose alias
>     analyses in the LLVM codebase: basic-aa, globalsmodref-aa, tbaa,
>     scev-aa, and cfl-aa. However, only the first three are actually
>     turned on when invoking clang with -O2 or -O3 (please correct me
>     if I'm wrong about this).
>
>
> This is correct.
> Eventually, i hope george will have time to get back to CFL-AA and 
> turn it on by default.
>
>
>     If one looks at existing research literatures, there are even more
>     algorithm to consider for doing pointer analysis. Some are
>     field-sensitive, some are field-based, some are flow-sensitive,
>     some are context-sensitive. Even for flow-insensitive ones, they
>     could also be inclusion-style (-andersen-aa) and equality-style
>     (-steens-aa and -ds-aa). Those algorithms are often backed up by
>     rich theoretical framework as well as preliminary evaluations
>     which demonstrate their superior precision and/or performance.
>
>
> CFL-AA is a middle ground between steens and anders, can be easily 
> made field and context sensitive, etc.
>
>
>     Given such an abundance choices of pointer analyses that seem to
>     be much better in the research land, why does real-world compiler
>     infrastructures like llvm still rely on those three simple (and
>     ad-hoc) ones to perform IR optimization?
>
>
> Time and energy.
>
>     Based on my understanding (and again please correct me if I am wrong):
>
>     (1) The minor reason: those "better" algorithms are very hard
to
>     implement in a robust way and nobody seems to be interested in
>     trying to write and maintain them.
>
>
> This is false.  Heck, at the time i implemented it in GCC, 
> field-sensitive andersen's analysis was unknown in production 
> compilers.  That's why i'm thanked in all the papers - i did the 
> engineering work to make it fast and reliable.
>
>     (2) The major reason: it's not clear whether those
"better"
>     algorithms are actually better for llvm. More precise pointer
>     analyses tend to slow down compile time a lot while contributing
>     too little to the optimization passes that use them. The benefit
>     one gets from a more precise analysis may not justify the
>     compile-time or the maintenance cost.
>
>
>
> CFL-AA is probably the right trade-off here. You can stop at any time 
> and have correct answers, you can be as lazy as you like.
> etc.
Regarding CFL-AA: in my understanding, cfl-aa does not introduce a new 
precision tradeoff. It is merely a demand-driven way of implementing 
existing analyses, by extending those algorithms to track additional 
"pointed-to-by" information. Laziness may help with the running time
of
the cfl analysis when only partial points-to info is needed, but if the 
client wants to do a whole-program analysis and require whole-program 
points-to info (which is usually true for optimizing compilers since 
they will eventually examine and touch every piece of the codes given to 
it), should cfl-aa be no different than traditional whatever-sensitive 
pointer analysis?
>
> The reality is i think you overlook the realistic answer:
>
> 3. Nobody has had time or energy to fix up CFL-AA or SCEV-AA. They 
> spend their time on lower-hanging fruit until that lower hanging fruit 
> is gone.
>
> IE For the moment, CFL-AA and SCEV-AA and ... are not the thing 
> holding llvm back.
>I'd love to hear some examples of "lower-hanging fruit" in LLVM, 
especially in the area of middle-end analyses and optimizations. I 
thought LLVM is mature enough that any obvious chances of improvement in 
analyses and optimization have already been taken, no?>
>
>     So my question here is: what kind(s) of precision really justify
>     the cost and what kinds do not?
>
>
> Depends entirely on your applications.
>
>     Has anybody done any study in the past to evaluate what kinds of
>     features in pointer analyses will benefit what kinds of
>     optimization passes?
>
> Yes.
> Chris did many years ago, and i've done one more recently.
Great! Are they published somewhere? Can the data be shared
somehow?>
>     Could there potentially be more improvement on pointer analysis
>     precision without adding too much compile-time/maintenance cost?
>
> Yes.
>
>     Has the precision/performance tradeoffs got fully explored before?
>
> Yes
>
>
>     Any pointers will be much appreciated. No pun intended :)
>
>     PS1: To be more concrete, what I am looking for is not some
>     black-box information like "we switched from basic-aa to cfl-aa
>     and observed 1% improvement at runtime". I believe white-box
>     studies such as "the licm pass failed to hoist x instructions
>     because -tbaa is not flow sensitive" are much more interesting for
>     understanding the problem here.
>
>
> White-box studies are very application specific, and often very pass 
> specific.
And I understand that. My goal is to look for commonalities among passes 
and applications. However, if the existing studies you mentioned above 
are extensive and conclusive enough to show that the aas we have today 
have fully exploited almost all such commonalities, then it's probably a 
better idea for me to find something else to work on. But again, I'd 
like to see the data first.>
>
>     PS2: If no such evaluation exists in the past, I'd happy to do
>     that myself and report back my findings if anyone here is interested.
>
> I don't think any of the world is set up to make that valuable.
>
> Nothing takes advantage of context sensitivity, flow sensitivity, etc.I agree that nothing takes advantage of context sensitivity. But I would 
argue against flow sensitivity, field sensitivity, heap model and 
external function models. Flow sensitivity is helpful when the 
optimization pass itself is flow-sensitive (e.g. adce, gvn), and field 
sensitivity is helpful when a struct contains multiple pointers. Heap 
sensitivity is basically what motivates Chris Lattner's PLDI'07 paper, 
and external function models are helpful since without them the analysis 
has to be extremely conservative and concludes everything that external 
functions touch all may-alias each other.

-- 
Best Regards,

--
Jia Chen
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160321/3b974ef9/attachment.html>

Apparently Analagous Threads

Search for more seemingly similar threads

llvm dev - Mar 2016 - Existing studies on the benefits of pointer analysis

[llvm-dev] Existing studies on the benefits of pointer analysis

[llvm-dev] Existing studies on the benefits of pointer analysis

[llvm-dev] Existing studies on the benefits of pointer analysis

[llvm-dev] Existing studies on the benefits of pointer analysis

[llvm-dev] Existing studies on the benefits of pointer analysis

[llvm-dev] Existing studies on the benefits of pointer analysis

Apparently Analagous Threads