thr3ads.net - llvm dev - [LLVMdev] Contribute a new precise pointer analysis to LLVM [Oct 2013]

If this information is useful, please help other people find it:
Share via:

lian li

2013-Oct-18 14:27 UTC

[LLVMdev] Contribute a new precise pointer analysis to LLVM

Hi Daniel,

I want to clarify that our analysis is not based on CFL-reachability.
We apply CFL-reachability to matching context information where the
exist from a function to a call-site must match
the entry from the corresponding call-site. The problem is a simple
balanced parentheses problem in CFL-reachability, and it can be
computed
efficiently.

The paper you mentioned is a very nice paper that proposed an
efficient algorithm to solve Dyck-CFL-reachability problem in
bidirectional graphs. However, precise pointer analysis cannot be
formulated as a Dyck-CFL-reachability problem. The alias analysis
presented in that paper is an over approximation. Hence the time bound
from our paper still applies.

Regards,
Lian





On Fri, Oct 18, 2013 at 4:39 PM, Daniel Berlin <dberlin at dberlin.org>
wrote:> I notice you guys formulate your CFL reachability problem as a
> balanced parentheses problem.
> What algorithm do you use to solve it?
>
> Are you aware of recent work that comes up with linear time and n log
> n time algorithms to solve this class of problems:
> http://www.cse.cuhk.edu.hk/lyu/_media/paper/pldi2013.pdf
>
> In particular, the time bound from the paper:
>
> "However, if we need the precise pointer information for all
> variables, CFL-based
> pointer analysis is generally more expensive as it has O(L^3 N^3)
> complexity [29], where L is the size of the grammar and N is the
> size of the graph."
>
> should now been reduced to  O(n + m log m) time, assuming you are
> using a standard balanced parentheses formulation.
>
>
>
>
> On Thu, Oct 17, 2013 at 9:51 PM, lian li <lianli at gmail.com> wrote:
>> Hi Hal,
>>
>> Thanks for your interest.
>>
>> We tested with the following existing compiler optimizations in LLVM
>> with SPECINT2006 benchmarks:
>> -dse (dead store elimination),
>> -gvn (global value numbering),
>> -licm (loop invariant code motion),
>> -bb-vectorize (basic block vectorization),
>> -memcpyopt (memcpy optimization),
>> -sink (code sinking),
>> -loop-idom (recognize loop idioms),
>> -argpromotion (argument promotion).
>>
>> As far as I know, this list includes all optmizations in LLVM that
>> requires alias information. With our pointer analysis, for the three
>> optimizations "-dse, -gvn, -licm", we have observed
significant
>> improvements in terms of the number of optimizations being conducted.
>> Take licm as an example, with our analysis, the number of instructions
>> being hoisted out of loops more than doubled for some benchmarks. We
>> have not seen big changes in terms of runtime performance though.
>>
>> We measure compile time by first linking all modules of the same
>> benchmark together, then running the analysis against the linked
>> module. It finishes in less than 1 minute over applications such as
>> httpd and sendmail, both with more than 100,000 LOC. For some
>> benchmarks, it is time-consuming: the analysis time for GCC is close
>> to 45 minutes.  But if you run the analysis over individual files (as
>> in your normal compilation process), there is no observable overhead
>> in terms of compile time.
>>
>> Regards,
>> Lian
>>
>>
>>
>> On Fri, Oct 18, 2013 at 12:15 PM, Hal Finkel <hfinkel at anl.gov>
wrote:
>>> Hi Lian,
>>>
>>> I am certainly interested in seeing this; do you have performance
numbers (compile time)? Also, can you share more information about the promising
optimization results you mentioned?
>>>
>>> Thanks,
>>> Hal
>>>
>>> ----- Original Message -----
>>>> Hi All,
>>>>
>>>> This is Lian Li  from Oracle Labs in Brisbane Australia.
>>>>
>>>> We have developed a precise and highly efficient pointer
analysis
>>>> framework on top of LLVM,  The approach is flow, context, and
field
>>>> sensitive, details are described in the two papers below:
>>>>
>>>> "Boosting the performance of flow-sensitive points-to
analysis using
>>>> value flow" (in ESEC-FSE 2011), and
>>>> "Precise and scalable context-sensitive pointer analysis
via value
>>>> flow graph" (in ISMM 2013).
>>>>
>>>> The analysis was initially developed for the purpose of bug
checking,
>>>> and is now extended to support compiler optimizations. We have
tested
>>>> it with existing compiler optimizations in LLVM, and have seen
>>>> promising results.
>>>>
>>>> We are now considering to make this analysis available to the
LLVM
>>>> community, and contribute resources for future maintenance
needs,
>>>> provided that there is enough interest. We think that a new
precise
>>>> pointer analysis in LLVM can enable more new optimization and
>>>> analysis
>>>> techniques to be developed in LLVM.
>>>>
>>>> Any people interested in seeing a new precise pointer analysis
in
>>>> LLVM?
>>>>
>>>> Regards,
>>>> Lian Li
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>
>>>
>>> --
>>> Hal Finkel
>>> Assistant Computational Scientist
>>> Leadership Computing Facility
>>> Argonne National Laboratory
>>
>>
>>
>> --
>> // Copyright @ 2011 Authorized by ** LIAN LI **
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev


-- 
// Copyright @ 2011 Authorized by ** LIAN LI **

Daniel Berlin

2013-Oct-18 15:41 UTC

head link

[LLVMdev] Contribute a new precise pointer analysis to LLVM

On Fri, Oct 18, 2013 at 7:27 AM, lian li <lianli at gmail.com> wrote:
> Hi Daniel,
>
> I want to clarify that our analysis is not based on CFL-reachability.
> We apply CFL-reachability to matching context information where the
> exist from a function to a call-site must match
> the entry from the corresponding call-site.

Yes, sorry, I pulled the wrong quote, it was late.

> The problem is a simple
> balanced parentheses problem in CFL-reachability, and it can be
> computed
> efficiently.
>
Yes, it can, however, it was not clear from the paper that you were :)
You cite a fairly old paper next to that that essentially gives a >O(1)
query time bound for this, but your
problem exactly fits into the bidirectional graph problem, unless i'm
missing something.

>
> The paper you mentioned is a very nice paper that proposed an
> efficient algorithm to solve Dyck-CFL-reachability problem in
> bidirectional graphs.

Right, and your  matching call sites can be formulated as one.

However, precise pointer analysis cannot be> formulated as a Dyck-CFL-reachability problem.

FWIW: This remains to be seen, and certainly also depends on your
definitions of "precise".
 I agree you cannot currently formulate precise andersen style field
sensitive pointer analysis as this.
As for whether it is impossible to get within a very very small percentage,
...

The alias analysis> presented in that paper is an over approximation. Hence the time bound
> from our paper still applies.
>
> Regards,
> Lian
>
>
>
>
>
> On Fri, Oct 18, 2013 at 4:39 PM, Daniel Berlin <dberlin at
dberlin.org>
> wrote:
> > I notice you guys formulate your CFL reachability problem as a
> > balanced parentheses problem.
> > What algorithm do you use to solve it?
> >
> > Are you aware of recent work that comes up with linear time and n log
> > n time algorithms to solve this class of problems:
> > http://www.cse.cuhk.edu.hk/lyu/_media/paper/pldi2013.pdf
> >
> > In particular, the time bound from the paper:
> >
> > "However, if we need the precise pointer information for all
> > variables, CFL-based
> > pointer analysis is generally more expensive as it has O(L^3 N^3)
> > complexity [29], where L is the size of the grammar and N is the
> > size of the graph."
> >
> > should now been reduced to  O(n + m log m) time, assuming you are
> > using a standard balanced parentheses formulation.
> >
> >
> >
> >
> > On Thu, Oct 17, 2013 at 9:51 PM, lian li <lianli at gmail.com>
wrote:
> >> Hi Hal,
> >>
> >> Thanks for your interest.
> >>
> >> We tested with the following existing compiler optimizations in
LLVM
> >> with SPECINT2006 benchmarks:
> >> -dse (dead store elimination),
> >> -gvn (global value numbering),
> >> -licm (loop invariant code motion),
> >> -bb-vectorize (basic block vectorization),
> >> -memcpyopt (memcpy optimization),
> >> -sink (code sinking),
> >> -loop-idom (recognize loop idioms),
> >> -argpromotion (argument promotion).
> >>
> >> As far as I know, this list includes all optmizations in LLVM that
> >> requires alias information. With our pointer analysis, for the
three
> >> optimizations "-dse, -gvn, -licm", we have observed
significant
> >> improvements in terms of the number of optimizations being
conducted.
> >> Take licm as an example, with our analysis, the number of
instructions
> >> being hoisted out of loops more than doubled for some benchmarks.
We
> >> have not seen big changes in terms of runtime performance though.
> >>
> >> We measure compile time by first linking all modules of the same
> >> benchmark together, then running the analysis against the linked
> >> module. It finishes in less than 1 minute over applications such
as
> >> httpd and sendmail, both with more than 100,000 LOC. For some
> >> benchmarks, it is time-consuming: the analysis time for GCC is
close
> >> to 45 minutes.  But if you run the analysis over individual files
(as
> >> in your normal compilation process), there is no observable
overhead
> >> in terms of compile time.
> >>
> >> Regards,
> >> Lian
> >>
> >>
> >>
> >> On Fri, Oct 18, 2013 at 12:15 PM, Hal Finkel <hfinkel at
anl.gov> wrote:
> >>> Hi Lian,
> >>>
> >>> I am certainly interested in seeing this; do you have
performance
> numbers (compile time)? Also, can you share more information about the
> promising optimization results you mentioned?
> >>>
> >>> Thanks,
> >>> Hal
> >>>
> >>> ----- Original Message -----
> >>>> Hi All,
> >>>>
> >>>> This is Lian Li  from Oracle Labs in Brisbane Australia.
> >>>>
> >>>> We have developed a precise and highly efficient pointer
analysis
> >>>> framework on top of LLVM,  The approach is flow, context,
and field
> >>>> sensitive, details are described in the two papers below:
> >>>>
> >>>> "Boosting the performance of flow-sensitive points-to
analysis using
> >>>> value flow" (in ESEC-FSE 2011), and
> >>>> "Precise and scalable context-sensitive pointer
analysis via value
> >>>> flow graph" (in ISMM 2013).
> >>>>
> >>>> The analysis was initially developed for the purpose of
bug checking,
> >>>> and is now extended to support compiler optimizations. We
have tested
> >>>> it with existing compiler optimizations in LLVM, and have
seen
> >>>> promising results.
> >>>>
> >>>> We are now considering to make this analysis available to
the LLVM
> >>>> community, and contribute resources for future maintenance
needs,
> >>>> provided that there is enough interest. We think that a
new precise
> >>>> pointer analysis in LLVM can enable more new optimization
and
> >>>> analysis
> >>>> techniques to be developed in LLVM.
> >>>>
> >>>> Any people interested in seeing a new precise pointer
analysis in
> >>>> LLVM?
> >>>>
> >>>> Regards,
> >>>> Lian Li
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> _______________________________________________
> >>>> LLVM Developers mailing list
> >>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >>>>
> >>>
> >>> --
> >>> Hal Finkel
> >>> Assistant Computational Scientist
> >>> Leadership Computing Facility
> >>> Argonne National Laboratory
> >>
> >>
> >>
> >> --
> >> // Copyright @ 2011 Authorized by ** LIAN LI **
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
> --
> // Copyright @ 2011 Authorized by ** LIAN LI **
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131018/bbafd2cd/attachment.html>

lian li

2013-Oct-19 11:13 UTC

head link

[LLVMdev] Contribute a new precise pointer analysis to LLVM

On Sat, Oct 19, 2013 at 1:41 AM, Daniel Berlin <dberlin at dberlin.org>
wrote:>
>
>
> On Fri, Oct 18, 2013 at 7:27 AM, lian li <lianli at gmail.com> wrote:
>>
>> Hi Daniel,
>>
>> I want to clarify that our analysis is not based on CFL-reachability.
>> We apply CFL-reachability to matching context information where the
>> exist from a function to a call-site must match
>> the entry from the corresponding call-site.
>
>
> Yes, sorry, I pulled the wrong quote, it was late.
>
>>
>> The problem is a simple
>> balanced parentheses problem in CFL-reachability, and it can be
>> computed
>> efficiently.
>
>
> Yes, it can, however, it was not clear from the paper that you were :)
> You cite a fairly old paper next to that that essentially gives a >O(1)
> query time bound for this, but your
> problem exactly fits into the bidirectional graph problem, unless i'm
> missing something.
>We use the traditional dynamic programming approach to computing value
flows with matching call sites, similar to the implementation of an
IFDS framwork.
The worst case complexity sounds high, but it is not expensive in
practice with effective memoisation techniques.
Actually in our approach, context matching is not the most expensive part.
The new CFL-reachability algorithm may be able to further improve
performance, but we haven't tried that yet.
>>
>>
>> The paper you mentioned is a very nice paper that proposed an
>> efficient algorithm to solve Dyck-CFL-reachability problem in
>> bidirectional graphs.
>
>
> Right, and your  matching call sites can be formulated as one.
>
>> However, precise pointer analysis cannot be
>> formulated as a Dyck-CFL-reachability problem.
>
>
> FWIW: This remains to be seen, and certainly also depends on your
> definitions of "precise".
>  I agree you cannot currently formulate precise andersen style field
> sensitive pointer analysis as this.
> As for whether it is impossible to get within a very very small percentage,
> ...
The analysis presented in the paper approximates on pointer
dereference, which is the tricky and complex part in pointer analysis.
It reminds me of the one-level approach proposed by Manuvir Das in
2000 ("Unification-based Pointer Analysis with Directional
Assignments"), which applies similar approximation to pointer
dereferences.
How to formulate Andersen Style analysis as Dyck-CFL-reachability
remains open, although you can argue that it is not important for
pointer analysis to solve pointer dereferences precisely.

Regards,
Lian
>
>> The alias analysis
>> presented in that paper is an over approximation. Hence the time bound
>> from our paper still applies.
>>
>> Regards,
>> Lian
>>
>>
>>
>>
>>
>> On Fri, Oct 18, 2013 at 4:39 PM, Daniel Berlin <dberlin at
dberlin.org>
>> wrote:
>> > I notice you guys formulate your CFL reachability problem as a
>> > balanced parentheses problem.
>> > What algorithm do you use to solve it?
>> >
>> > Are you aware of recent work that comes up with linear time and n
log
>> > n time algorithms to solve this class of problems:
>> > http://www.cse.cuhk.edu.hk/lyu/_media/paper/pldi2013.pdf
>> >
>> > In particular, the time bound from the paper:
>> >
>> > "However, if we need the precise pointer information for all
>> > variables, CFL-based
>> > pointer analysis is generally more expensive as it has O(L^3 N^3)
>> > complexity [29], where L is the size of the grammar and N is the
>> > size of the graph."
>> >
>> > should now been reduced to  O(n + m log m) time, assuming you are
>> > using a standard balanced parentheses formulation.
>> >
>> >
>> >
>> >
>> > On Thu, Oct 17, 2013 at 9:51 PM, lian li <lianli at
gmail.com> wrote:
>> >> Hi Hal,
>> >>
>> >> Thanks for your interest.
>> >>
>> >> We tested with the following existing compiler optimizations
in LLVM
>> >> with SPECINT2006 benchmarks:
>> >> -dse (dead store elimination),
>> >> -gvn (global value numbering),
>> >> -licm (loop invariant code motion),
>> >> -bb-vectorize (basic block vectorization),
>> >> -memcpyopt (memcpy optimization),
>> >> -sink (code sinking),
>> >> -loop-idom (recognize loop idioms),
>> >> -argpromotion (argument promotion).
>> >>
>> >> As far as I know, this list includes all optmizations in LLVM
that
>> >> requires alias information. With our pointer analysis, for the
three
>> >> optimizations "-dse, -gvn, -licm", we have observed
significant
>> >> improvements in terms of the number of optimizations being
conducted.
>> >> Take licm as an example, with our analysis, the number of
instructions
>> >> being hoisted out of loops more than doubled for some
benchmarks. We
>> >> have not seen big changes in terms of runtime performance
though.
>> >>
>> >> We measure compile time by first linking all modules of the
same
>> >> benchmark together, then running the analysis against the
linked
>> >> module. It finishes in less than 1 minute over applications
such as
>> >> httpd and sendmail, both with more than 100,000 LOC. For some
>> >> benchmarks, it is time-consuming: the analysis time for GCC is
close
>> >> to 45 minutes.  But if you run the analysis over individual
files (as
>> >> in your normal compilation process), there is no observable
overhead
>> >> in terms of compile time.
>> >>
>> >> Regards,
>> >> Lian
>> >>
>> >>
>> >>
>> >> On Fri, Oct 18, 2013 at 12:15 PM, Hal Finkel <hfinkel at
anl.gov> wrote:
>> >>> Hi Lian,
>> >>>
>> >>> I am certainly interested in seeing this; do you have
performance
>> >>> numbers (compile time)? Also, can you share more
information about the
>> >>> promising optimization results you mentioned?
>> >>>
>> >>> Thanks,
>> >>> Hal
>> >>>
>> >>> ----- Original Message -----
>> >>>> Hi All,
>> >>>>
>> >>>> This is Lian Li  from Oracle Labs in Brisbane
Australia.
>> >>>>
>> >>>> We have developed a precise and highly efficient
pointer analysis
>> >>>> framework on top of LLVM,  The approach is flow,
context, and field
>> >>>> sensitive, details are described in the two papers
below:
>> >>>>
>> >>>> "Boosting the performance of flow-sensitive
points-to analysis using
>> >>>> value flow" (in ESEC-FSE 2011), and
>> >>>> "Precise and scalable context-sensitive pointer
analysis via value
>> >>>> flow graph" (in ISMM 2013).
>> >>>>
>> >>>> The analysis was initially developed for the purpose
of bug checking,
>> >>>> and is now extended to support compiler optimizations.
We have tested
>> >>>> it with existing compiler optimizations in LLVM, and
have seen
>> >>>> promising results.
>> >>>>
>> >>>> We are now considering to make this analysis available
to the LLVM
>> >>>> community, and contribute resources for future
maintenance needs,
>> >>>> provided that there is enough interest. We think that
a new precise
>> >>>> pointer analysis in LLVM can enable more new
optimization and
>> >>>> analysis
>> >>>> techniques to be developed in LLVM.
>> >>>>
>> >>>> Any people interested in seeing a new precise pointer
analysis in
>> >>>> LLVM?
>> >>>>
>> >>>> Regards,
>> >>>> Lian Li
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> _______________________________________________
>> >>>> LLVM Developers mailing list
>> >>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >>>>
>> >>>
>> >>> --
>> >>> Hal Finkel
>> >>> Assistant Computational Scientist
>> >>> Leadership Computing Facility
>> >>> Argonne National Laboratory
>> >>
>> >>
>> >>
>> >> --
>> >> // Copyright @ 2011 Authorized by ** LIAN LI **
>> >> _______________________________________________
>> >> LLVM Developers mailing list
>> >> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>>
>>
>> --
>> // Copyright @ 2011 Authorized by ** LIAN LI **
>
>


-- 
// Copyright @ 2011 Authorized by ** LIAN LI **

Maybe Matching Threads

Search for more seemingly similar threads

llvm dev - Oct 2013 - [LLVMdev] Contribute a new precise pointer analysis to LLVM

[LLVMdev] Contribute a new precise pointer analysis to LLVM

[LLVMdev] Contribute a new precise pointer analysis to LLVM

[LLVMdev] Contribute a new precise pointer analysis to LLVM

Maybe Matching Threads