thr3ads.net - llvm dev - [LLVMdev] [GSoC] Applying for GSoC 2015 [Mar 2015]

If this information is useful, please help other people find it:
Share via:

Mingxing Zhang

2015-Mar-14 12:55 UTC

[LLVMdev] [GSoC] Applying for GSoC 2015

Hello John,

I've finished the first version of my proposal on enhancing alias analysis.
The proposal can be downloaded at
http://james0zan.github.io/resource/GSoC15-Proposal-AA.pdf.
I hope I've successfully justified the necessity and benefits of this
project.
If possible, please find some time to review it and give me some more
feedbacks.

Thank you very much!

P.S. I'm working on the other proposal, a couple of days is needed.



On 8 March 2015 at 21:42, Mingxing Zhang <james0zan at gmail.com> wrote:
> Got it.
> I'll try to find the applications for field-sensitivity (and
> interprocedural, etc) AA analysis.
> And I'll do some preliminary evaluation on the tracing/slicing part for
> the bloat detection.
>
> Thanks!
>
> On 8 March 2015 at 21:34, John Criswell <jtcriswel at gmail.com>
wrote:
>
>>  On 3/8/15 8:56 AM, Mingxing Zhang wrote:
>>
>>  Hello John,
>>
>> According to the FAQ, I can submit two proposals although at most one
of
>> them can be accepted.
>> Thus I will prepare a proposal for each of the two projects.
>>
>>
>> Correct.  Only one proposal will be accepted.
>>
>>
>>  And, after reading the code of cfl-aa and several related papers,
I've
>> listed four milestones for the AA project:
>>
>> 1) In order to use the fast algorithm described in PLDI'13 [1],
cfl-aa
>> makes a simplification on the CFL defined in POPL'08 [2], which
will lead
>> to a reduction on precision (I've confirmed this observation with
the
>> author).
>> Thus a quantitative measurement on how much is the reduction is needed.
>>
>> 2) In cfl-aa, different fields of a same struct and the whole array are
>> represented by a single node.
>> This is the reason of the problem 2, 4 listed in
>> http://llvm.org/OpenProjects.html.
>> We should split these large nodes.
>>
>>
>> I think the real question is whether the loss of precision matters, and
>> if so, to which uses of alias analysis.  SAFECode, for example, wants
field
>> information to determine type safety (so that it can optimize away
>> type-safe loads and stores), so field sensitivity matters.  Perhaps
field
>> sensitivity doesn't matter for other applications (e.g.,
optimization).
>> There's no point in improving precision if it doesn't help the
analyses
>> that people care about most.
>>
>> As part of your project, I think you should state the uses of alias
>> analysis/points-to analysis that you're aiming to improve and
understand
>> whether your proposed improvements will help that use.  I would also
>> recommend picking a use that matters to a significant portion of the
LLVM
>> community.
>>
>>
>> 3) Handling special global variables, such as errno.
>>
>> 4) It seems that the current version of cfl-aa is an intraprocedural
>> analysis.
>> If the time is enough, I think we may extend it to an interprocedural
>> analysis.
>> The algorithm described in [3] can be applied to scaling it.
>>
>> As for the bloat-detection project, the final result should be a tool
>> that is verified by known bugs and a set of newly detected bugs.
>>
>>
>> For the bloat detection tool, I would like to be convinced that dynamic
>> tracing will be, or can be, sufficiently efficient to be practical.  I
hate
>> to ask, but I think you need to run an experiment with Giri to show
that
>> dynamic slicing is going to be practical for the executions that you
expect
>> to analyze.  Either that, or you need to explain how you can use
something
>> more efficient than dynamic slicing (note that dynamic slicing and
dynamic
>> tracing are not the same, so be sure you're correctly stating which
one you
>> need).
>>
>>
>> Do you have any suggestions on these objectives?
>>
>>
>> In your proposal, be sure to include a set of milestones and how long
you
>> think you will need to achieve those milestones.  I may have said that
>> before, but it's worth repeating.
>>
>> Regards,
>>
>> John Criswell
>>
>>
>>
>> Thanks!
>>
>>
>>  [1] Fast Algorithms for Dyck-CFL-Reachability with Applications to
Alias
>> Analysis. PLDI'13
>> [2] Demand-Driven Alias Analysis for C. POPL'08
>> [3] Demand-Driven Context-Sensitive Alias Analysis for Java.
ISSTA'11
>>
>>
>> On 5 March 2015 at 09:58, Mingxing Zhang <james0zan at gmail.com>
wrote:
>>
>>>  Wow, that is cool!
>>>  I'll check about it.
>>>
>>>  Thank you!
>>>
>>> On 4 March 2015 at 21:57, John Criswell <jtcriswel at
gmail.com> wrote:
>>>
>>>>  On 3/4/15 2:18 AM, Mingxing Zhang wrote:
>>>>
>>>> Hello John,
>>>>
>>>> Thank you for your advices and congratulations~
>>>>
>>>> I'll read the code of cfl-aa and Giri first and make the
decision of
>>>> which project to pursue.
>>>> The choice will be reported to this thread once I made the
>>>> determination (hopefully within this week).
>>>>
>>>>
>>>>  You should check for yourself, but I don't think anything
prevents you
>>>> from submitting two proposals.  If you have time to write two
strong
>>>> proposals, I see no problem with that.
>>>>
>>>> Just make sure that any proposal you write is strong: it
provides a
>>>> concrete explanation of what you want to do, some justification
for why it
>>>> would benefit the community (short or long term), and why
you're the person
>>>> qualified to do it.  Proposals should also include a set of
milestones and
>>>> expected dates for completing those milestones.
>>>>
>>>> Regards,
>>>>
>>>> John Criswell
>>>>
>>>>
>>>>
>>>> Thanks!
>>>>
>>>>
>>>> On 3 March 2015 at 23:12, John Criswell <jtcriswel at
gmail.com> wrote:
>>>>
>>>>>  Dear Mingxing,
>>>>>
>>>>> I think both projects are interesting and useful.
>>>>>
>>>>> Points-to analysis is something that is needed by research
users of
>>>>> LLVM, but to the best of my knowledge, no solid
implementation currently
>>>>> exists (although the cfl-aa work being done at Google may
provide us with
>>>>> something; you should check into it before writing a
proposal).  My
>>>>> interest is in a points-to analysis that is robust and is
useful to both
>>>>> research and industry users of LLVM.  A points-to analysis
proposal must
>>>>> indicate how it will help both of these subsets of the LLVM
community, and
>>>>> it must argue why current efforts do not meet the
requirements of both
>>>>> subsets of the community.
>>>>>
>>>>> The runtime bloat tool also looks interesting, and your
approach (at
>>>>> least to me) is interesting.  One question in my mind,
though, is whether
>>>>> dynamic slicing is going to work well.  Swarup Sahoo and I
built a dynamic
>>>>> slicer for LLVM named Giri, and we found the tracing
required for dynamic
>>>>> slicing to be slow.  For our purposes, the overhead was
okay as we only
>>>>> needed to record execution until a crash (which happened
quickly).  In your
>>>>> bloat tool, the program will probably run for awhile,
creating a long trace
>>>>> record.  You should take a look at the Giri code, use it to
trace some
>>>>> programs, and see if the overheads are going to be
tolerable.  If they are
>>>>> not, then your first task would be to optimize Giri for
your bloat tool.
>>>>>
>>>>> You should also be more specific about which LLVM
instructions will be
>>>>> traced.  For example, I wouldn't record the outputs of
every LLVM
>>>>> instruction; I might only record the outputs of loads and
stores or the end
>>>>> of a def-use chain.
>>>>>
>>>>> I'd be interested in mentoring either project.
>>>>>
>>>>> BTW, it looks like your FSE paper won an award.  Congrats.
>>>>>
>>>>> Regards,
>>>>>
>>>>> John Criswell
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 3/3/15 2:30 AM, Mingxing Zhang wrote:
>>>>>
>>>>>  Hi all,
>>>>>
>>>>> As a Ph.D. student majored in Software Reliability, I have
used LLVM
>>>>> in many of my projects, such as the Anticipating Invariant
(
>>>>> http://james0zan.github.io/AI.html) and some other
undergoing ones.
>>>>> Thus, it would be a great pleasure for me if I could take
this
>>>>> opportunity to contribute to this awesome project.
>>>>>
>>>>> After reading the idea list
(http://llvm.org/OpenProjects.html), I
>>>>> was most interested in the idea of improving the
"Pointer and Alias
>>>>> Analysis" passes.
>>>>> Could you please give me some more tips or advices on how
to get
>>>>> started on working on the application?
>>>>>
>>>>> Simultaneously, I also have another idea about using LLVM
to detect
>>>>> runtime bloat, just like the ThreadSanitizer tool for data
races.
>>>>> If there is anyone here who would like to mentor this
project, could
>>>>> you please find some time to review the more detailed
proposal on gist
>>>>>
<https://gist.github.com/james0zan/d03737c60b10d0d11d34> and give me
>>>>> some feedbacks?
>>>>>
>>>>> P.S.
>>>>>   I do prefer the bloat detection tool, but I'm not
sure about whether
>>>>> it is suitable for GSoC.
>>>>>   Thus I will apply for the Alias Analysis one if it is not
suitable.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> --
>>>>>   Mingxing Zhang
>>>>>
>>>>>  Tel.: +86-10-62797143
>>>>>  Web: http://james0zan.github.io/
>>>>>  Addr: Room 3-122, FIT Building, Tsinghua University,
Beijing 100084,
>>>>> China
>>>>>
>>>>>
>>>>>  _______________________________________________
>>>>> LLVM Developers mailing listLLVMdev at cs.uiuc.edu        
http://llvm.cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>
>>>>>  /
>>>>>
>>>>> --
>>>>> John Criswell
>>>>> Assistant Professor
>>>>> Department of Computer Science, University of
Rochesterhttp://www.cs.rochester.edu/u/criswell
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>   Mingxing Zhang
>>>>
>>>>  Tel.: +86-10-62797143
>>>>  Web: http://james0zan.github.io/
>>>>  Addr: Room 3-122, FIT Building, Tsinghua University, Beijing
100084,
>>>> China
>>>>
>>>>
>>>>
>>>> --
>>>> John Criswell
>>>> Assistant Professor
>>>> Department of Computer Science, University of
Rochesterhttp://www.cs.rochester.edu/u/criswell
>>>>
>>>>
>>>
>>>
>>> --
>>>   Mingxing Zhang
>>>
>>>  Tel.: +86-10-62797143
>>>  Web: http://james0zan.github.io/
>>>  Addr: Room 3-122, FIT Building, Tsinghua University, Beijing
100084,
>>> China
>>>
>>
>>
>>
>> --
>>   Mingxing Zhang
>>
>>  Tel.: +86-10-62797143
>>  Web: http://james0zan.github.io/
>>  Addr: Room 3-122, FIT Building, Tsinghua University, Beijing 100084,
>> China
>>
>>
>>
>> --
>> John Criswell
>> Assistant Professor
>> Department of Computer Science, University of
Rochesterhttp://www.cs.rochester.edu/u/criswell
>>
>>
>
>
> --
> Mingxing Zhang
>
> Tel.: +86-10-62797143
> Web: http://james0zan.github.io/
> Addr: Room 3-122, FIT Building, Tsinghua University, Beijing 100084, China
>


-- 
Mingxing Zhang

Tel.: +86-10-62797143
Web: http://james0zan.github.io/
Addr: Room 3-122, FIT Building, Tsinghua University, Beijing 100084, China
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150314/a281db15/attachment.html>

Daniel Berlin

2015-Mar-15 00:31 UTC

head link

[LLVMdev] [GSoC] Applying for GSoC 2015

A few notes:
1. "But these standard LLVM AA passes either take a large amount of time
(Anderson Analysis at cubic time and large memory requirements)"

Neither of these is correct. Andersen's is not cubic in practice, or large
memory in practice, when implemented properly.  GCC uses it by default as
the points-to implementation, and it's never even close to the top of the
profile.

It takes about 1 second to do a million lines of code.
And one can do better (gcc's impl is no longer state of the art).

2. The approach to field sensitivity you mention is not going to work very
well, given how much casting occurs (everything is casted). I would suggest
using the approach in http://dl.acm.org/citation.cfm?id=996835

3. George, cc'd, is planning on implementing both context sensitive and
context-insensitive interprocedural analysis in cfl-aa the next month or
two.

4. Using a BDD cloning approach for CFL-AA doesn't make much sense, the
whole point of CFL is not having to generate explicit points-to sets if you
don't want to. Plus, followup papers and researchers have *never* been able
to reproduce the scalability of Whaley's work.

Not to mention it's done on Java. Java is a language where doing things
like field-sensitivity always increase precision, which is not true for C.

If you really want to attempt this, I would suggest using one of the demand
driven context-sensitive approaches that will be easy to fit in CFL.

On Sat, Mar 14, 2015 at 5:57 AM Mingxing Zhang <james0zan at gmail.com>
wrote:
> Hello John,
>
> I've finished the first version of my proposal on enhancing alias
analysis.
> The proposal can be downloaded at
> http://james0zan.github.io/resource/GSoC15-Proposal-AA.pdf.
> I hope I've successfully justified the necessity and benefits of this
> project.
> If possible, please find some time to review it and give me some more
> feedbacks.
>
> Thank you very much!
>
> P.S. I'm working on the other proposal, a couple of days is needed.
>
>
>
> On 8 March 2015 at 21:42, Mingxing Zhang <james0zan at gmail.com>
wrote:
>
>> Got it.
>> I'll try to find the applications for field-sensitivity (and
>> interprocedural, etc) AA analysis.
>> And I'll do some preliminary evaluation on the tracing/slicing part
for
>> the bloat detection.
>>
>> Thanks!
>>
>> On 8 March 2015 at 21:34, John Criswell <jtcriswel at gmail.com>
wrote:
>>
>>>  On 3/8/15 8:56 AM, Mingxing Zhang wrote:
>>>
>>>  Hello John,
>>>
>>> According to the FAQ, I can submit two proposals although at most
one of
>>> them can be accepted.
>>> Thus I will prepare a proposal for each of the two projects.
>>>
>>>
>>> Correct.  Only one proposal will be accepted.
>>>
>>>
>>>  And, after reading the code of cfl-aa and several related papers,
I've
>>> listed four milestones for the AA project:
>>>
>>> 1) In order to use the fast algorithm described in PLDI'13 [1],
cfl-aa
>>> makes a simplification on the CFL defined in POPL'08 [2], which
will lead
>>> to a reduction on precision (I've confirmed this observation
with the
>>> author).
>>> Thus a quantitative measurement on how much is the reduction is
needed.
>>>
>>> 2) In cfl-aa, different fields of a same struct and the whole array
are
>>> represented by a single node.
>>> This is the reason of the problem 2, 4 listed in
>>> http://llvm.org/OpenProjects.html.
>>> We should split these large nodes.
>>>
>>>
>>> I think the real question is whether the loss of precision matters,
and
>>> if so, to which uses of alias analysis.  SAFECode, for example,
wants field
>>> information to determine type safety (so that it can optimize away
>>> type-safe loads and stores), so field sensitivity matters.  Perhaps
field
>>> sensitivity doesn't matter for other applications (e.g.,
optimization).
>>> There's no point in improving precision if it doesn't help
the analyses
>>> that people care about most.
>>>
>>> As part of your project, I think you should state the uses of alias
>>> analysis/points-to analysis that you're aiming to improve and
understand
>>> whether your proposed improvements will help that use.  I would
also
>>> recommend picking a use that matters to a significant portion of
the LLVM
>>> community.
>>>
>>>
>>> 3) Handling special global variables, such as errno.
>>>
>>> 4) It seems that the current version of cfl-aa is an
intraprocedural
>>> analysis.
>>> If the time is enough, I think we may extend it to an
interprocedural
>>> analysis.
>>> The algorithm described in [3] can be applied to scaling it.
>>>
>>> As for the bloat-detection project, the final result should be a
tool
>>> that is verified by known bugs and a set of newly detected bugs.
>>>
>>>
>>> For the bloat detection tool, I would like to be convinced that
dynamic
>>> tracing will be, or can be, sufficiently efficient to be practical.
I hate
>>> to ask, but I think you need to run an experiment with Giri to show
that
>>> dynamic slicing is going to be practical for the executions that
you expect
>>> to analyze.  Either that, or you need to explain how you can use
something
>>> more efficient than dynamic slicing (note that dynamic slicing and
dynamic
>>> tracing are not the same, so be sure you're correctly stating
which one you
>>> need).
>>>
>>>
>>> Do you have any suggestions on these objectives?
>>>
>>>
>>> In your proposal, be sure to include a set of milestones and how
long
>>> you think you will need to achieve those milestones.  I may have
said that
>>> before, but it's worth repeating.
>>>
>>> Regards,
>>>
>>> John Criswell
>>>
>>>
>>>
>>> Thanks!
>>>
>>>
>>>  [1] Fast Algorithms for Dyck-CFL-Reachability with Applications to
>>> Alias Analysis. PLDI'13
>>> [2] Demand-Driven Alias Analysis for C. POPL'08
>>> [3] Demand-Driven Context-Sensitive Alias Analysis for Java.
ISSTA'11
>>>
>>>
>>> On 5 March 2015 at 09:58, Mingxing Zhang <james0zan at
gmail.com> wrote:
>>>
>>>>  Wow, that is cool!
>>>>  I'll check about it.
>>>>
>>>>  Thank you!
>>>>
>>>> On 4 March 2015 at 21:57, John Criswell <jtcriswel at
gmail.com> wrote:
>>>>
>>>>>  On 3/4/15 2:18 AM, Mingxing Zhang wrote:
>>>>>
>>>>> Hello John,
>>>>>
>>>>> Thank you for your advices and congratulations~
>>>>>
>>>>> I'll read the code of cfl-aa and Giri first and make
the decision of
>>>>> which project to pursue.
>>>>> The choice will be reported to this thread once I made the
>>>>> determination (hopefully within this week).
>>>>>
>>>>>
>>>>>  You should check for yourself, but I don't think
anything prevents
>>>>> you from submitting two proposals.  If you have time to
write two strong
>>>>> proposals, I see no problem with that.
>>>>>
>>>>> Just make sure that any proposal you write is strong: it
provides a
>>>>> concrete explanation of what you want to do, some
justification for why it
>>>>> would benefit the community (short or long term), and why
you're the person
>>>>> qualified to do it.  Proposals should also include a set of
milestones and
>>>>> expected dates for completing those milestones.
>>>>>
>>>>> Regards,
>>>>>
>>>>> John Criswell
>>>>>
>>>>>
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>> On 3 March 2015 at 23:12, John Criswell <jtcriswel at
gmail.com> wrote:
>>>>>
>>>>>>  Dear Mingxing,
>>>>>>
>>>>>> I think both projects are interesting and useful.
>>>>>>
>>>>>> Points-to analysis is something that is needed by
research users of
>>>>>> LLVM, but to the best of my knowledge, no solid
implementation currently
>>>>>> exists (although the cfl-aa work being done at Google
may provide us with
>>>>>> something; you should check into it before writing a
proposal).  My
>>>>>> interest is in a points-to analysis that is robust and
is useful to both
>>>>>> research and industry users of LLVM.  A points-to
analysis proposal must
>>>>>> indicate how it will help both of these subsets of the
LLVM community, and
>>>>>> it must argue why current efforts do not meet the
requirements of both
>>>>>> subsets of the community.
>>>>>>
>>>>>> The runtime bloat tool also looks interesting, and your
approach (at
>>>>>> least to me) is interesting.  One question in my mind,
though, is whether
>>>>>> dynamic slicing is going to work well.  Swarup Sahoo
and I built a dynamic
>>>>>> slicer for LLVM named Giri, and we found the tracing
required for dynamic
>>>>>> slicing to be slow.  For our purposes, the overhead was
okay as we only
>>>>>> needed to record execution until a crash (which
happened quickly).  In your
>>>>>> bloat tool, the program will probably run for awhile,
creating a long trace
>>>>>> record.  You should take a look at the Giri code, use
it to trace some
>>>>>> programs, and see if the overheads are going to be
tolerable.  If they are
>>>>>> not, then your first task would be to optimize Giri for
your bloat tool.
>>>>>>
>>>>>> You should also be more specific about which LLVM
instructions will
>>>>>> be traced.  For example, I wouldn't record the
outputs of every LLVM
>>>>>> instruction; I might only record the outputs of loads
and stores or the end
>>>>>> of a def-use chain.
>>>>>>
>>>>>> I'd be interested in mentoring either project.
>>>>>>
>>>>>> BTW, it looks like your FSE paper won an award. 
Congrats.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> John Criswell
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 3/3/15 2:30 AM, Mingxing Zhang wrote:
>>>>>>
>>>>>>  Hi all,
>>>>>>
>>>>>> As a Ph.D. student majored in Software Reliability, I
have used LLVM
>>>>>> in many of my projects, such as the Anticipating
Invariant (
>>>>>> http://james0zan.github.io/AI.html) and some other
undergoing ones.
>>>>>> Thus, it would be a great pleasure for me if I could
take this
>>>>>> opportunity to contribute to this awesome project.
>>>>>>
>>>>>> After reading the idea list
(http://llvm.org/OpenProjects.html), I
>>>>>> was most interested in the idea of improving the
"Pointer and Alias
>>>>>> Analysis" passes.
>>>>>> Could you please give me some more tips or advices on
how to get
>>>>>> started on working on the application?
>>>>>>
>>>>>> Simultaneously, I also have another idea about using
LLVM to detect
>>>>>> runtime bloat, just like the ThreadSanitizer tool for
data races.
>>>>>> If there is anyone here who would like to mentor this
project, could
>>>>>> you please find some time to review the more detailed
proposal on
>>>>>> gist
<https://gist.github.com/james0zan/d03737c60b10d0d11d34> and
>>>>>> give me some feedbacks?
>>>>>>
>>>>>> P.S.
>>>>>>   I do prefer the bloat detection tool, but I'm not
sure about
>>>>>> whether it is suitable for GSoC.
>>>>>>   Thus I will apply for the Alias Analysis one if it is
not suitable.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> --
>>>>>>   Mingxing Zhang
>>>>>>
>>>>>>  Tel.: +86-10-62797143
>>>>>>  Web: http://james0zan.github.io/
>>>>>>  Addr: Room 3-122, FIT Building, Tsinghua University,
Beijing
>>>>>> 100084, China
>>>>>>
>>>>>>
>>>>>>  _______________________________________________
>>>>>> LLVM Developers mailing listLLVMdev at cs.uiuc.edu     
http://llvm.cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>>
>>>>>>  /
>>>>>>
>>>>>> --
>>>>>> John Criswell
>>>>>> Assistant Professor
>>>>>> Department of Computer Science, University of
Rochesterhttp://www.cs.rochester.edu/u/criswell
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>   Mingxing Zhang
>>>>>
>>>>>  Tel.: +86-10-62797143
>>>>>  Web: http://james0zan.github.io/
>>>>>  Addr: Room 3-122, FIT Building, Tsinghua University,
Beijing 100084,
>>>>> China
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> John Criswell
>>>>> Assistant Professor
>>>>> Department of Computer Science, University of
Rochesterhttp://www.cs.rochester.edu/u/criswell
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>   Mingxing Zhang
>>>>
>>>>  Tel.: +86-10-62797143
>>>>  Web: http://james0zan.github.io/
>>>>  Addr: Room 3-122, FIT Building, Tsinghua University, Beijing
100084,
>>>> China
>>>>
>>>
>>>
>>>
>>> --
>>>   Mingxing Zhang
>>>
>>>  Tel.: +86-10-62797143
>>>  Web: http://james0zan.github.io/
>>>  Addr: Room 3-122, FIT Building, Tsinghua University, Beijing
100084,
>>> China
>>>
>>>
>>>
>>> --
>>> John Criswell
>>> Assistant Professor
>>> Department of Computer Science, University of
Rochesterhttp://www.cs.rochester.edu/u/criswell
>>>
>>>
>>
>>
>> --
>> Mingxing Zhang
>>
>> Tel.: +86-10-62797143
>> Web: http://james0zan.github.io/
>> Addr: Room 3-122, FIT Building, Tsinghua University, Beijing 100084,
China
>>
>
>
>
> --
> Mingxing Zhang
>
> Tel.: +86-10-62797143
> Web: http://james0zan.github.io/
> Addr: Room 3-122, FIT Building, Tsinghua University, Beijing 100084, China
>  _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150315/9b107d62/attachment.html>

Mingxing Zhang

2015-Mar-15 12:50 UTC

head link

[LLVMdev] [GSoC] Applying for GSoC 2015

Hello Daniel,

Thank you for your comments and sorry for my mistakes, I'll revise them.
And I'll for sure read the paper you mentioned and survey the recent
researches before deciding the implementation technique.

To George:
May I know the exact plan of your attempt for making cfl-aa interprocedural?
I do think that this is the most valuable part of my proposal, but that
makes no sense to do it twice.

Maybe I can work on the porting of the flow-sensitive method proposed by
Prof. Ben Hardekopf at CGO11
<http://www.cs.ucsb.edu/~benh/research/papers/hardekopf11flow.pdf>.
It is declared in his homepage that the published source code "is written
for a pre-release version of LLVM 2.5 and does not work in current versions
of LLVM"

Thanks!



On 15 March 2015 at 08:31, Daniel Berlin <dberlin at dberlin.org> wrote:
> A few notes:
> 1. "But these standard LLVM AA passes either take a large amount of
time
> (Anderson Analysis at cubic time and large memory requirements)"
>
> Neither of these is correct. Andersen's is not cubic in practice, or
large
> memory in practice, when implemented properly.  GCC uses it by default as
> the points-to implementation, and it's never even close to the top of
the
> profile.
>
> It takes about 1 second to do a million lines of code.
> And one can do better (gcc's impl is no longer state of the art).
>
> 2. The approach to field sensitivity you mention is not going to work very
> well, given how much casting occurs (everything is casted). I would suggest
> using the approach in http://dl.acm.org/citation.cfm?id=996835
>
> 3. George, cc'd, is planning on implementing both context sensitive and
> context-insensitive interprocedural analysis in cfl-aa the next month or
> two.
>
> 4. Using a BDD cloning approach for CFL-AA doesn't make much sense, the
> whole point of CFL is not having to generate explicit points-to sets if you
> don't want to. Plus, followup papers and researchers have *never* been
able
> to reproduce the scalability of Whaley's work.
>
> Not to mention it's done on Java. Java is a language where doing things
> like field-sensitivity always increase precision, which is not true for C.
>
> If you really want to attempt this, I would suggest using one of the
> demand driven context-sensitive approaches that will be easy to fit in CFL.
>
> On Sat, Mar 14, 2015 at 5:57 AM Mingxing Zhang <james0zan at
gmail.com>
> wrote:
>
>> Hello John,
>>
>> I've finished the first version of my proposal on enhancing alias
>> analysis.
>> The proposal can be downloaded at
>> http://james0zan.github.io/resource/GSoC15-Proposal-AA.pdf.
>> I hope I've successfully justified the necessity and benefits of
this
>> project.
>> If possible, please find some time to review it and give me some more
>> feedbacks.
>>
>> Thank you very much!
>>
>> P.S. I'm working on the other proposal, a couple of days is needed.
>>
>>
>>
>> On 8 March 2015 at 21:42, Mingxing Zhang <james0zan at gmail.com>
wrote:
>>
>>> Got it.
>>> I'll try to find the applications for field-sensitivity (and
>>> interprocedural, etc) AA analysis.
>>> And I'll do some preliminary evaluation on the tracing/slicing
part for
>>> the bloat detection.
>>>
>>> Thanks!
>>>
>>> On 8 March 2015 at 21:34, John Criswell <jtcriswel at
gmail.com> wrote:
>>>
>>>>  On 3/8/15 8:56 AM, Mingxing Zhang wrote:
>>>>
>>>>  Hello John,
>>>>
>>>> According to the FAQ, I can submit two proposals although at
most one
>>>> of them can be accepted.
>>>> Thus I will prepare a proposal for each of the two projects.
>>>>
>>>>
>>>> Correct.  Only one proposal will be accepted.
>>>>
>>>>
>>>>  And, after reading the code of cfl-aa and several related
papers,
>>>> I've listed four milestones for the AA project:
>>>>
>>>> 1) In order to use the fast algorithm described in PLDI'13
[1], cfl-aa
>>>> makes a simplification on the CFL defined in POPL'08 [2],
which will lead
>>>> to a reduction on precision (I've confirmed this
observation with the
>>>> author).
>>>> Thus a quantitative measurement on how much is the reduction is
needed.
>>>>
>>>> 2) In cfl-aa, different fields of a same struct and the whole
array are
>>>> represented by a single node.
>>>> This is the reason of the problem 2, 4 listed in
>>>> http://llvm.org/OpenProjects.html.
>>>> We should split these large nodes.
>>>>
>>>>
>>>> I think the real question is whether the loss of precision
matters, and
>>>> if so, to which uses of alias analysis.  SAFECode, for example,
wants field
>>>> information to determine type safety (so that it can optimize
away
>>>> type-safe loads and stores), so field sensitivity matters. 
Perhaps field
>>>> sensitivity doesn't matter for other applications (e.g.,
optimization).
>>>> There's no point in improving precision if it doesn't
help the analyses
>>>> that people care about most.
>>>>
>>>> As part of your project, I think you should state the uses of
alias
>>>> analysis/points-to analysis that you're aiming to improve
and understand
>>>> whether your proposed improvements will help that use.  I would
also
>>>> recommend picking a use that matters to a significant portion
of the LLVM
>>>> community.
>>>>
>>>>
>>>> 3) Handling special global variables, such as errno.
>>>>
>>>> 4) It seems that the current version of cfl-aa is an
intraprocedural
>>>> analysis.
>>>> If the time is enough, I think we may extend it to an
interprocedural
>>>> analysis.
>>>> The algorithm described in [3] can be applied to scaling it.
>>>>
>>>> As for the bloat-detection project, the final result should be
a tool
>>>> that is verified by known bugs and a set of newly detected
bugs.
>>>>
>>>>
>>>> For the bloat detection tool, I would like to be convinced that
dynamic
>>>> tracing will be, or can be, sufficiently efficient to be
practical.  I hate
>>>> to ask, but I think you need to run an experiment with Giri to
show that
>>>> dynamic slicing is going to be practical for the executions
that you expect
>>>> to analyze.  Either that, or you need to explain how you can
use something
>>>> more efficient than dynamic slicing (note that dynamic slicing
and dynamic
>>>> tracing are not the same, so be sure you're correctly
stating which one you
>>>> need).
>>>>
>>>>
>>>> Do you have any suggestions on these objectives?
>>>>
>>>>
>>>> In your proposal, be sure to include a set of milestones and
how long
>>>> you think you will need to achieve those milestones.  I may
have said that
>>>> before, but it's worth repeating.
>>>>
>>>> Regards,
>>>>
>>>> John Criswell
>>>>
>>>>
>>>>
>>>> Thanks!
>>>>
>>>>
>>>>  [1] Fast Algorithms for Dyck-CFL-Reachability with
Applications to
>>>> Alias Analysis. PLDI'13
>>>> [2] Demand-Driven Alias Analysis for C. POPL'08
>>>> [3] Demand-Driven Context-Sensitive Alias Analysis for Java.
ISSTA'11
>>>>
>>>>
>>>> On 5 March 2015 at 09:58, Mingxing Zhang <james0zan at
gmail.com> wrote:
>>>>
>>>>>  Wow, that is cool!
>>>>>  I'll check about it.
>>>>>
>>>>>  Thank you!
>>>>>
>>>>> On 4 March 2015 at 21:57, John Criswell <jtcriswel at
gmail.com> wrote:
>>>>>
>>>>>>  On 3/4/15 2:18 AM, Mingxing Zhang wrote:
>>>>>>
>>>>>> Hello John,
>>>>>>
>>>>>> Thank you for your advices and congratulations~
>>>>>>
>>>>>> I'll read the code of cfl-aa and Giri first and
make the decision of
>>>>>> which project to pursue.
>>>>>> The choice will be reported to this thread once I made
the
>>>>>> determination (hopefully within this week).
>>>>>>
>>>>>>
>>>>>>  You should check for yourself, but I don't think
anything prevents
>>>>>> you from submitting two proposals.  If you have time to
write two strong
>>>>>> proposals, I see no problem with that.
>>>>>>
>>>>>> Just make sure that any proposal you write is strong:
it provides a
>>>>>> concrete explanation of what you want to do, some
justification for why it
>>>>>> would benefit the community (short or long term), and
why you're the person
>>>>>> qualified to do it.  Proposals should also include a
set of milestones and
>>>>>> expected dates for completing those milestones.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> John Criswell
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>
>>>>>> On 3 March 2015 at 23:12, John Criswell <jtcriswel
at gmail.com> wrote:
>>>>>>
>>>>>>>  Dear Mingxing,
>>>>>>>
>>>>>>> I think both projects are interesting and useful.
>>>>>>>
>>>>>>> Points-to analysis is something that is needed by
research users of
>>>>>>> LLVM, but to the best of my knowledge, no solid
implementation currently
>>>>>>> exists (although the cfl-aa work being done at
Google may provide us with
>>>>>>> something; you should check into it before writing
a proposal).  My
>>>>>>> interest is in a points-to analysis that is robust
and is useful to both
>>>>>>> research and industry users of LLVM.  A points-to
analysis proposal must
>>>>>>> indicate how it will help both of these subsets of
the LLVM community, and
>>>>>>> it must argue why current efforts do not meet the
requirements of both
>>>>>>> subsets of the community.
>>>>>>>
>>>>>>> The runtime bloat tool also looks interesting, and
your approach (at
>>>>>>> least to me) is interesting.  One question in my
mind, though, is whether
>>>>>>> dynamic slicing is going to work well.  Swarup
Sahoo and I built a dynamic
>>>>>>> slicer for LLVM named Giri, and we found the
tracing required for dynamic
>>>>>>> slicing to be slow.  For our purposes, the overhead
was okay as we only
>>>>>>> needed to record execution until a crash (which
happened quickly).  In your
>>>>>>> bloat tool, the program will probably run for
awhile, creating a long trace
>>>>>>> record.  You should take a look at the Giri code,
use it to trace some
>>>>>>> programs, and see if the overheads are going to be
tolerable.  If they are
>>>>>>> not, then your first task would be to optimize Giri
for your bloat tool.
>>>>>>>
>>>>>>> You should also be more specific about which LLVM
instructions will
>>>>>>> be traced.  For example, I wouldn't record the
outputs of every LLVM
>>>>>>> instruction; I might only record the outputs of
loads and stores or the end
>>>>>>> of a def-use chain.
>>>>>>>
>>>>>>> I'd be interested in mentoring either project.
>>>>>>>
>>>>>>> BTW, it looks like your FSE paper won an award. 
Congrats.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> John Criswell
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 3/3/15 2:30 AM, Mingxing Zhang wrote:
>>>>>>>
>>>>>>>  Hi all,
>>>>>>>
>>>>>>> As a Ph.D. student majored in Software Reliability,
I have used LLVM
>>>>>>> in many of my projects, such as the Anticipating
Invariant (
>>>>>>> http://james0zan.github.io/AI.html) and some other
undergoing ones.
>>>>>>> Thus, it would be a great pleasure for me if I
could take this
>>>>>>> opportunity to contribute to this awesome project.
>>>>>>>
>>>>>>> After reading the idea list
(http://llvm.org/OpenProjects.html), I
>>>>>>> was most interested in the idea of improving the
"Pointer and Alias
>>>>>>> Analysis" passes.
>>>>>>> Could you please give me some more tips or advices
on how to get
>>>>>>> started on working on the application?
>>>>>>>
>>>>>>> Simultaneously, I also have another idea about
using LLVM to detect
>>>>>>> runtime bloat, just like the ThreadSanitizer tool
for data races.
>>>>>>> If there is anyone here who would like to mentor
this project, could
>>>>>>> you please find some time to review the more
detailed proposal on
>>>>>>> gist
<https://gist.github.com/james0zan/d03737c60b10d0d11d34> and
>>>>>>> give me some feedbacks?
>>>>>>>
>>>>>>> P.S.
>>>>>>>   I do prefer the bloat detection tool, but I'm
not sure about
>>>>>>> whether it is suitable for GSoC.
>>>>>>>   Thus I will apply for the Alias Analysis one if
it is not suitable.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> --
>>>>>>>   Mingxing Zhang
>>>>>>>
>>>>>>>  Tel.: +86-10-62797143
>>>>>>>  Web: http://james0zan.github.io/
>>>>>>>  Addr: Room 3-122, FIT Building, Tsinghua
University, Beijing
>>>>>>> 100084, China
>>>>>>>
>>>>>>>
>>>>>>>  _______________________________________________
>>>>>>> LLVM Developers mailing listLLVMdev at cs.uiuc.edu 
http://llvm.cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>>>
>>>>>>>  /
>>>>>>>
>>>>>>> --
>>>>>>> John Criswell
>>>>>>> Assistant Professor
>>>>>>> Department of Computer Science, University of
Rochesterhttp://www.cs.rochester.edu/u/criswell
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>   Mingxing Zhang
>>>>>>
>>>>>>  Tel.: +86-10-62797143
>>>>>>  Web: http://james0zan.github.io/
>>>>>>  Addr: Room 3-122, FIT Building, Tsinghua University,
Beijing
>>>>>> 100084, China
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> John Criswell
>>>>>> Assistant Professor
>>>>>> Department of Computer Science, University of
Rochesterhttp://www.cs.rochester.edu/u/criswell
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>   Mingxing Zhang
>>>>>
>>>>>  Tel.: +86-10-62797143
>>>>>  Web: http://james0zan.github.io/
>>>>>  Addr: Room 3-122, FIT Building, Tsinghua University,
Beijing 100084,
>>>>> China
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>   Mingxing Zhang
>>>>
>>>>  Tel.: +86-10-62797143
>>>>  Web: http://james0zan.github.io/
>>>>  Addr: Room 3-122, FIT Building, Tsinghua University, Beijing
100084,
>>>> China
>>>>
>>>>
>>>>
>>>> --
>>>> John Criswell
>>>> Assistant Professor
>>>> Department of Computer Science, University of
Rochesterhttp://www.cs.rochester.edu/u/criswell
>>>>
>>>>
>>>
>>>
>>> --
>>> Mingxing Zhang
>>>
>>> Tel.: +86-10-62797143
>>> Web: http://james0zan.github.io/
>>> Addr: Room 3-122, FIT Building, Tsinghua University, Beijing
100084,
>>> China
>>>
>>
>>
>>
>> --
>> Mingxing Zhang
>>
>> Tel.: +86-10-62797143
>> Web: http://james0zan.github.io/
>> Addr: Room 3-122, FIT Building, Tsinghua University, Beijing 100084,
China
>>  _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>

-- 
Mingxing Zhang

Tel.: +86-10-62797143
Web: http://james0zan.github.io/
Addr: Room 3-122, FIT Building, Tsinghua University, Beijing 100084, China
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150315/ae053b90/attachment.html>

John Criswell

2015-Mar-15 15:57 UTC

head link

[LLVMdev] [GSoC] Applying for GSoC 2015

Dear Mingxing,

I've attached comments to the PDF document and attached the document to 
this email.  You may need to use an option in your PDF reader to see the 
comments (I made the comment using Apple's Preview application).

The most significant thing you need to do is to provide better 
justification for the work.  What I (and I think others) want to see is 
an application of alias analysis that LLVM (or one of its sub-projects) 
is *already* using and how your project will improve that application of 
alias analysis.

You should also improve and fix the grammar and spelling (not so much 
for getting your proposal accepted but to practice doing well at that; 
it's something every Ph.D. student needs to practice).

Regards,

John Criswell

-- 
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
http://www.cs.rochester.edu/u/criswell

-------------- next part --------------
A non-text attachment was scrubbed...
Name: GSoC15-Proposal-AA.pdf
Type: application/pdf
Size: 192280 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150315/fe1d42bc/attachment.pdf>

llvm dev - Mar 2015 - [LLVMdev] [GSoC] Applying for GSoC 2015

[LLVMdev] [GSoC] Applying for GSoC 2015

[LLVMdev] [GSoC] Applying for GSoC 2015

[LLVMdev] [GSoC] Applying for GSoC 2015

[LLVMdev] [GSoC] Applying for GSoC 2015