thr3ads.net - llvm dev - [llvm-dev] Existing studies on the benefits of pointer analysis [Mar 2016]

If this information is useful, please help other people find it:
Share via:

Jia Chen via llvm-dev

2016-Mar-21 17:00 UTC

[llvm-dev] Existing studies on the benefits of pointer analysis

Hi Daniel,

On 03/21/2016 11:05 AM, Daniel Berlin wrote:>
>
> On Tue, Mar 15, 2016 at 1:37 PM, Jia Chen via llvm-dev 
> <llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>> wrote:
>
>     Dear llvm devs,
>
>     tl;dr: What prevents llvm from switching to a fancier pointer
>     analysis?
>
>
> Nothing.
>
>
>     Currently, there exists a variety of general-purpose alias
>     analyses in the LLVM codebase: basic-aa, globalsmodref-aa, tbaa,
>     scev-aa, and cfl-aa. However, only the first three are actually
>     turned on when invoking clang with -O2 or -O3 (please correct me
>     if I'm wrong about this).
>
>
> This is correct.
> Eventually, i hope george will have time to get back to CFL-AA and 
> turn it on by default.
>
>
>     If one looks at existing research literatures, there are even more
>     algorithm to consider for doing pointer analysis. Some are
>     field-sensitive, some are field-based, some are flow-sensitive,
>     some are context-sensitive. Even for flow-insensitive ones, they
>     could also be inclusion-style (-andersen-aa) and equality-style
>     (-steens-aa and -ds-aa). Those algorithms are often backed up by
>     rich theoretical framework as well as preliminary evaluations
>     which demonstrate their superior precision and/or performance.
>
>
> CFL-AA is a middle ground between steens and anders, can be easily 
> made field and context sensitive, etc.
>
>
>     Given such an abundance choices of pointer analyses that seem to
>     be much better in the research land, why does real-world compiler
>     infrastructures like llvm still rely on those three simple (and
>     ad-hoc) ones to perform IR optimization?
>
>
> Time and energy.
>
>     Based on my understanding (and again please correct me if I am wrong):
>
>     (1) The minor reason: those "better" algorithms are very hard
to
>     implement in a robust way and nobody seems to be interested in
>     trying to write and maintain them.
>
>
> This is false.  Heck, at the time i implemented it in GCC, 
> field-sensitive andersen's analysis was unknown in production 
> compilers.  That's why i'm thanked in all the papers - i did the 
> engineering work to make it fast and reliable.
>
>     (2) The major reason: it's not clear whether those
"better"
>     algorithms are actually better for llvm. More precise pointer
>     analyses tend to slow down compile time a lot while contributing
>     too little to the optimization passes that use them. The benefit
>     one gets from a more precise analysis may not justify the
>     compile-time or the maintenance cost.
>
>
>
> CFL-AA is probably the right trade-off here. You can stop at any time 
> and have correct answers, you can be as lazy as you like.
> etc.
Regarding CFL-AA: in my understanding, cfl-aa does not introduce a new 
precision tradeoff. It is merely a demand-driven way of implementing 
existing analyses, by extending those algorithms to track additional 
"pointed-to-by" information. Laziness may help with the running time
of
the cfl analysis when only partial points-to info is needed, but if the 
client wants to do a whole-program analysis and require whole-program 
points-to info (which is usually true for optimizing compilers since 
they will eventually examine and touch every piece of the codes given to 
it), should cfl-aa be no different than traditional whatever-sensitive 
pointer analysis?
>
> The reality is i think you overlook the realistic answer:
>
> 3. Nobody has had time or energy to fix up CFL-AA or SCEV-AA. They 
> spend their time on lower-hanging fruit until that lower hanging fruit 
> is gone.
>
> IE For the moment, CFL-AA and SCEV-AA and ... are not the thing 
> holding llvm back.
>I'd love to hear some examples of "lower-hanging fruit" in LLVM, 
especially in the area of middle-end analyses and optimizations. I 
thought LLVM is mature enough that any obvious chances of improvement in 
analyses and optimization have already been taken, no?>
>
>     So my question here is: what kind(s) of precision really justify
>     the cost and what kinds do not?
>
>
> Depends entirely on your applications.
>
>     Has anybody done any study in the past to evaluate what kinds of
>     features in pointer analyses will benefit what kinds of
>     optimization passes?
>
> Yes.
> Chris did many years ago, and i've done one more recently.
Great! Are they published somewhere? Can the data be shared
somehow?>
>     Could there potentially be more improvement on pointer analysis
>     precision without adding too much compile-time/maintenance cost?
>
> Yes.
>
>     Has the precision/performance tradeoffs got fully explored before?
>
> Yes
>
>
>     Any pointers will be much appreciated. No pun intended :)
>
>     PS1: To be more concrete, what I am looking for is not some
>     black-box information like "we switched from basic-aa to cfl-aa
>     and observed 1% improvement at runtime". I believe white-box
>     studies such as "the licm pass failed to hoist x instructions
>     because -tbaa is not flow sensitive" are much more interesting for
>     understanding the problem here.
>
>
> White-box studies are very application specific, and often very pass 
> specific.
And I understand that. My goal is to look for commonalities among passes 
and applications. However, if the existing studies you mentioned above 
are extensive and conclusive enough to show that the aas we have today 
have fully exploited almost all such commonalities, then it's probably a 
better idea for me to find something else to work on. But again, I'd 
like to see the data first.>
>
>     PS2: If no such evaluation exists in the past, I'd happy to do
>     that myself and report back my findings if anyone here is interested.
>
> I don't think any of the world is set up to make that valuable.
>
> Nothing takes advantage of context sensitivity, flow sensitivity, etc.I agree that nothing takes advantage of context sensitivity. But I would 
argue against flow sensitivity, field sensitivity, heap model and 
external function models. Flow sensitivity is helpful when the 
optimization pass itself is flow-sensitive (e.g. adce, gvn), and field 
sensitivity is helpful when a struct contains multiple pointers. Heap 
sensitivity is basically what motivates Chris Lattner's PLDI'07 paper, 
and external function models are helpful since without them the analysis 
has to be extremely conservative and concludes everything that external 
functions touch all may-alias each other.

-- 
Best Regards,

--
Jia Chen
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160321/3b974ef9/attachment.html>

Daniel Berlin via llvm-dev

2016-Mar-21 17:17 UTC

head link

[llvm-dev] Existing studies on the benefits of pointer analysis

On Mon, Mar 21, 2016, 10:00 AM Jia Chen <jchen at cs.utexas.edu> wrote:
> Hi Daniel,
>
>
> On 03/21/2016 11:05 AM, Daniel Berlin wrote:
>
>
>
> On Tue, Mar 15, 2016 at 1:37 PM, Jia Chen via llvm-dev <
> <llvm-dev at lists.llvm.org>llvm-dev at lists.llvm.org> wrote:
>
>> Dear llvm devs,
>>
>> tl;dr: What prevents llvm from switching to a fancier pointer analysis?
>>
>
> Nothing.
>
>
>>
>> Currently, there exists a variety of general-purpose alias analyses in
>> the LLVM codebase: basic-aa, globalsmodref-aa, tbaa, scev-aa, and
cfl-aa.
>> However, only the first three are actually turned on when invoking
clang
>> with -O2 or -O3 (please correct me if I'm wrong about this).
>>
>
> This is correct.
> Eventually, i hope george will have time to get back to CFL-AA and turn it
> on by default.
>
>
>>
>> If one looks at existing research literatures, there are even more
>> algorithm to consider for doing pointer analysis. Some are
field-sensitive,
>> some are field-based, some are flow-sensitive, some are
context-sensitive.
>> Even for flow-insensitive ones, they could also be inclusion-style
>> (-andersen-aa) and equality-style (-steens-aa and -ds-aa). Those
algorithms
>> are often backed up by rich theoretical framework as well as
preliminary
>> evaluations which demonstrate their superior precision and/or
performance.
>>
>
> CFL-AA is a middle ground between steens and anders, can be easily made
> field and context sensitive, etc.
>
>
>>
>> Given such an abundance choices of pointer analyses that seem to be
much
>> better in the research land, why does real-world compiler
infrastructures
>> like llvm still rely on those three simple (and ad-hoc) ones to perform
IR
>> optimization?
>>
>
> Time and energy.
>
>
>> Based on my understanding (and again please correct me if I am wrong):
>>
>> (1) The minor reason: those "better" algorithms are very hard
to
>> implement in a robust way and nobody seems to be interested in trying
to
>> write and maintain them.
>>
>
> This is false.  Heck, at the time i implemented it in GCC, field-sensitive
> andersen's analysis was unknown in production compilers.  That's
why i'm
> thanked in all the papers - i did the engineering work to make it fast and
> reliable.
>
>
>
>> (2) The major reason: it's not clear whether those
"better" algorithms
>> are actually better for llvm. More precise pointer analyses tend to
slow
>> down compile time a lot while contributing too little to the
optimization
>> passes that use them. The benefit one gets from a more precise analysis
may
>> not justify the compile-time or the maintenance cost.
>>
>
>
> CFL-AA is probably the right trade-off here. You can stop at any time and
> have correct answers, you can be as lazy as you like.
> etc.
>
>
> Regarding CFL-AA: in my understanding, cfl-aa does not introduce a new
> precision tradeoff.
>
You can make it do what you want much easier than existing frameworks in my
experience.

It is merely a demand-driven way of implementing existing analyses,
by> extending those algorithms to track additional "pointed-to-by"
information.
> Laziness may help with the running time of the cfl analysis when only
> partial points-to info is needed, but if the client wants to do a
> whole-program analysis and require whole-program points-to info (which is
> usually true for optimizing compilers since they will eventually examine
> and touch every piece of the codes given to it), should cfl-aa be no
> different than traditional whatever-sensitive pointer analysis?
>
CFL, at least when I ran the numbers, was faster at all pairs than existing
analysis.
>
>
>
> The reality is i think you overlook the realistic answer:
>
> 3. Nobody has had time or energy to fix up CFL-AA or SCEV-AA. They spend
> their time on lower-hanging fruit until that lower hanging fruit is gone.
>
> IE For the moment, CFL-AA and SCEV-AA and ... are not the thing holding
> llvm back.
>
> I'd love to hear some examples of "lower-hanging fruit" in
LLVM,
> especially in the area of middle-end analyses and optimizations. I thought
> LLVM is mature enough that any obvious chances of improvement in analyses
> and optimization have already been taken, no?
>
No.
For example, gvn and pre are fairly simple implementations that miss
obvious optimizations.
>
>
>
>>
>> So my question here is: what kind(s) of precision really justify the
cost
>> and what kinds do not?
>>
>
> Depends entirely on your applications.
>
>
>> Has anybody done any study in the past to evaluate what kinds of
features
>> in pointer analyses will benefit what kinds of optimization passes?
>>
> Yes.
> Chris did many years ago, and i've done one more recently.
>
>
> Great! Are they published somewhere? Can the data be shared somehow?
>
No, and sadly, no
>
>
>> Could there potentially be more improvement on pointer analysis
precision
>> without adding too much compile-time/maintenance cost?
>>
> Yes.
>
> Has the precision/performance tradeoffs got fully explored before?
>>
> Yes
>
>>
>> Any pointers will be much appreciated. No pun intended :)
>>
>> PS1: To be more concrete, what I am looking for is not some black-box
>> information like "we switched from basic-aa to cfl-aa and observed
1%
>> improvement at runtime". I believe white-box studies such as
"the licm pass
>> failed to hoist x instructions because -tbaa is not flow
sensitive" are
>> much more interesting for understanding the problem here.
>>
>
> White-box studies are very application specific, and often very pass
> specific.
>
>
> And I understand that. My goal is to look for commonalities among passes
> and applications.
>
This generally just discovers things we already know, which is that certain
passes have deficiencies.

However, if the existing studies you mentioned above are extensive
and> conclusive enough to show that the aas we have today have fully exploited
> almost all such commonalities, then it's probably a better idea for me
to
> find something else to work on. But again, I'd like to see the data
first.
>
>
>
>>
>> PS2: If no such evaluation exists in the past, I'd happy to do that
>> myself and report back my findings if anyone here is interested.
>>
> I don't think any of the world is set up to make that valuable.
>
> Nothing takes advantage of context sensitivity, flow sensitivity, etc.
>
> I agree that nothing takes advantage of context sensitivity. But I would
> argue against flow sensitivity, field sensitivity, heap model and external
> function models
>
I'm talking about infrastructure wise, nothing in llvm can take advantage
because the APIs don't exist.

. Flow sensitivity is helpful when the optimization pass itself
is> flow-sensitive (e.g. adce, gvn),
>
No api exists that they could use right now for this, and you'd have to
change things to understand answers are not valid over the entire function.

and field sensitivity is helpful when a struct contains multiple
pointers.> Heap sensitivity is basically what motivates Chris Lattner's
PLDI'07 paper,
> and external function models are helpful since without them the analysis
> has to be extremely conservative and concludes everything that external
> functions touch all may-alias each other.
>
I don't disagree, this is the one to two years of work I said would be
needed
>
>
>
> --
> Best Regards,
>
> --
> Jia Chen
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160321/62fce41e/attachment-0001.html>

Renato Golin via llvm-dev

2016-Mar-21 17:40 UTC

head link

[llvm-dev] Existing studies on the benefits of pointer analysis

On 21 March 2016 at 17:17, Daniel Berlin via llvm-dev
<llvm-dev at lists.llvm.org> wrote:>>> Has anybody done any study in the past to evaluate what kinds of
features
>>> in pointer analyses will benefit what kinds of optimization passes?
>>
>> Yes.
>> Chris did many years ago, and i've done one more recently.
>>
>> Great! Are they published somewhere? Can the data be shared somehow?
>
> No, and sadly, no
This sounds like a good GSOC project.

Having the evaluation done is great, but if you can't share, than
that's pretty much useless to the community at large.

Even if a student does a less thorough evaluation, having something
out is better than having nothing, and with your expertise, I'm sure
we can get such a student doing some pretty capable analysis with
little resources.

cheers,
--renato

Jia Chen via llvm-dev

2016-Mar-21 18:34 UTC

head link

[llvm-dev] Existing studies on the benefits of pointer analysis

>     It is merely a demand-driven way of implementing existing
>     analyses, by extending those algorithms to track additional
>     "pointed-to-by" information. Laziness may help with the
running
>     time of the cfl analysis when only partial points-to info is
>     needed, but if the client wants to do a whole-program analysis and
>     require whole-program points-to info (which is usually true for
>     optimizing compilers since they will eventually examine and touch
>     every piece of the codes given to it), should cfl-aa be no
>     different than traditional whatever-sensitive pointer analysis?
>
>
> CFL, at least when I ran the numbers, was faster at all pairs than 
> existing analysis.
There could be many reasons for it, e.g. better implementations. Again, 
my point is that cfl-aa is more of an implementation strategy than a 
fundamentally superior approach.>
>
>     Great! Are they published somewhere? Can the data be shared somehow?
>
>
> No, and sadly, no
:(> I'm talking about infrastructure wise, nothing in llvm can take 
> advantage because the APIs don't exist.
>
>     . Flow sensitivity is helpful when the optimization pass itself is
>     flow-sensitive (e.g. adce, gvn),
>
>
> No api exists that they could use right now for this, and you'd have 
> to change things to understand answers are not valid over the entire 
> function.
I see what you are saying now. Sometimes flow/ctx-insensitive alias 
queries can benefit from a flow/ctx-sensitive analysis, yet my intuition 
is that such cases are likely to be rare. I could go ahead and modify 
those passes myself to carry on the study, but that option probably 
won't be too interesting to the community.

Thank you very much for pointing that out to me.

-- 
Best Regards,

--
Jia Chen
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160321/8e70c55f/attachment.html>

Chris Lattner via llvm-dev

2016-Mar-26 01:08 UTC

head link

[llvm-dev] Existing studies on the benefits of pointer analysis

On Mar 21, 2016, at 10:00 AM, Jia Chen via llvm-dev <llvm-dev at
lists.llvm.org> wrote:>> 
>> So my question here is: what kind(s) of precision really justify the
cost and what kinds do not?
>> 
>> Depends entirely on your applications.
>>  
>> Has anybody done any study in the past to evaluate what kinds of
features in pointer analyses will benefit what kinds of optimization passes?
>> Yes.
>> Chris did many years ago, and i've done one more recently.
> 
> Great! Are they published somewhere? Can the data be shared somehow?

My results are published here:
http://llvm.org/pubs/2005-05-04-LattnerPHDThesis.html
<http://llvm.org/pubs/2005-05-04-LattnerPHDThesis.html>

Sadly, they are over a decade out of date, and some things have changed since
then :-)
>> PS2: If no such evaluation exists in the past, I'd happy to do that
myself and report back my findings if anyone here is interested.
>> I don't think any of the world is set up to make that valuable.
>> 
>> Nothing takes advantage of context sensitivity, flow sensitivity, etc.
> I agree that nothing takes advantage of context sensitivity. But I would
argue against flow sensitivity, field sensitivity, heap model and external
function models. Flow sensitivity is helpful when the optimization pass itself
is flow-sensitive (e.g. adce, gvn), and field sensitivity is helpful when a
struct contains multiple pointers. Heap sensitivity is basically what motivates
Chris Lattner's PLDI'07 paper, and external function models are helpful
since without them the analysis has to be extremely conservative and concludes
everything that external functions touch all may-alias each other.
I’m still a big fan of context sensitive, flow insensitive, unification based
models.  Contrary to your claim, context sensitivity *is* useful for mod-ref
analysis, e.g. “can I hoist a load across this call”?  Context sensitivity
improves the precision of the mod/ref set of the call.

-Chris

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160325/02c9fd60/attachment.html>

Daniel Berlin via llvm-dev

2016-Mar-26 01:26 UTC

head link

[llvm-dev] Existing studies on the benefits of pointer analysis

>
>
>
> I’m still a big fan of context sensitive, flow insensitive, unification
> based models.
>
CFL can emulate this in the same time bound.

>  Contrary to your claim, context sensitivity *is* useful for mod-ref
> analysis, e.g. “can I hoist a load across this call”?  Context sensitivity
> improves the precision of the mod/ref set of the call.
>
>
-Chris>
Yeah.
It depends entirely on your goal. In reality, often what you really want is
something to say "hey, i've got this pointer over here, and i really
want
to hoist  it up here. Do something, tell me if that is possible".

(This is actually why i'm a fan of CFL-AA. You can essentially make it as
expensive or not expensive as you want, and it still does really well in
pracftice in time)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160325/a8173694/attachment.html>

Jia Chen via llvm-dev

2016-Mar-26 04:04 UTC

head link

[llvm-dev] Existing studies on the benefits of pointer analysis

On 03/25/2016 08:08 PM, Chris Lattner wrote:> I’m still a big fan of context sensitive, flow insensitive, 
> unification based models.
Interestingly I find the unification approach quite unsatisfactory 
sometime. What happens there is pointers with the same "depth" are too
often clobbered together unless they are really unrelated to each other.
> Contrary to your claim, context sensitivity *is* useful for mod-ref 
> analysis, e.g. “can I hoist a load across this call”?  Context 
> sensitivity improves the precision of the mod/ref set of the call.
>I'm not sure about that. How often does mod-ref information change 
across callsites? Isn't a good context-insensitive function summary good 
enough?

-Jia

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160325/2473e70e/attachment.html>

Possibly Parallel Threads

Search for more reasonably related threads

llvm dev - Mar 2016 - Existing studies on the benefits of pointer analysis

[llvm-dev] Existing studies on the benefits of pointer analysis

[llvm-dev] Existing studies on the benefits of pointer analysis

[llvm-dev] Existing studies on the benefits of pointer analysis

[llvm-dev] Existing studies on the benefits of pointer analysis

[llvm-dev] Existing studies on the benefits of pointer analysis

[llvm-dev] Existing studies on the benefits of pointer analysis

[llvm-dev] Existing studies on the benefits of pointer analysis

Possibly Parallel Threads