thr3ads.net - llvm dev - [LLVMdev] GSoC Proposal: Inter-Procedure Program Slicing in LLVM [May 2013]

If this information is useful, please help other people find it:
Share via:

Mingliang LIU

2013-May-02 04:36 UTC

[LLVMdev] GSoC Proposal: Inter-Procedure Program Slicing in LLVM

Hi all,

I had a second thought of the dynamic slicing, as well as the source code
generating.

Firstly, the dynamic slicing is very useful to software community (I'll
illustrate more in the refined proposal later), but it's already
implemented by Swarup and John Criswell from UIUC. The static slicing code
has been released as Giri project in LLVM, and they would kindly release
the dynamic slicing too if this proposal is accepted.

I'd like to study and enhance the Giri code to make it better. However, I
don't know the exact part to which I can contribute. I'll ask Swarup (I
cc
this email to him) for help. I would be honored to be directed by him if
this proposal is selected.

Moreover, I don't know how the static and dynamic slicing fit together.
Does the static code of Giri need some improvements? For example, taking
into consideration the pointer aliases. Our group need the static slicing
to generate I/O benchmarks (see below please).

Secondly, to generate source code, which is human-readable and compilable
seems a bit ambitious. My previous scripts in python can generate the
source code of the original program by deleting useless line of code. It's
kind of tricky. We used dozens of regular expressions to match the boundary
of blocks, loops and functions to avoid deleting lines unexpectedly. I
thought employing clang was a better idea. I don't know whether you think
the source code genration is a good idea or not. I can release our simple
script and improve it later. So I have more time/energy to contribute to
the dynamic slicer.

I think the slicing pass can be loaded by the opt. There is no need to
change the front-end or other passes. I'm not sure whether the link
time optimization can be exploited. We borrowed the LLVMSlicer code
previously without LTO.

Lastly, I'd like to introduce myself briefly. I'm a three-years PhD
candidate student from Tsinghua University, China. My research area covers
performance analysis, compiler techniques for high performance
computing, and parallel computing (MPI/OpenMP).

One of my on-going work is to generate an I/O benchmark from the original
application. The base observation is that the computation and communication
statements can be deleted if they're irrelevant to the I/O pattern, e.g.
computing the buffer content to be written into a file. We take use of the
program slicing technique to find relevant/irrelevant statements. The
static slicer was borrowed from LLVMSlicer and I wrote code to make it work
for our application. We generated the line number of sliced code. There is
a very simple source code generation script using ugly and tricky regular
expressions to delete original source code according to the sliced line
number. We're going to submit the first version of the paper recently.

I also took part in one project in Open64 compiler several year ago, the
purpose of which is to fast collect the communication trace. We used
program slicing to delete the computation statements and kept the
communication related statements. We generated executable binaries instead
of source code from IR.

Now we plan to do a project that can automatically find manual
configuration errors in software deployment. The dynamic slicing can help a
lot since the input is the key factor to locate errors. We have not a
concrete plan for this project, but the dynamic slicing is heavily needed.

Regards.

On Thu, May 2, 2013 at 8:00 AM, Sahoo, Swarup Kumar <ssahoo2 at
illinois.edu>wrote:
>  Hi Mingliang,
>
>     We already implemented an usable version of Dynamic Backward Slicing
> as part of our ASPLOS paper. So, I think this will be a good idea to extend
> this project. I will also be interested in mentoring you for this project.
> Please let me know, if you need any help.
>
> Thanks,
> Swarup.
>
>  ------------------------------
> *From:* llvmdev-bounces at cs.uiuc.edu [llvmdev-bounces at cs.uiuc.edu] on
> behalf of Mingliang LIU [liuml07 at mails.tsinghua.edu.cn]
> *Sent:* Saturday, April 27, 2013 2:53 PM
> *To:* LLVM
> *Subject:* [LLVMdev] GSoC Proposal: Inter-Procedure Program Slicing in
> LLVM
>
>  Hi all,
>
>  This is a GSoC 2013 proposal for LLVM project. Please see the formatted
> version at here:
>
http://pacman.cs.tsinghua.edu.cn/~liuml07/files/gsoc2013-proposal-program-slicing.pdf
>
>  Program slicing has been used in many applications, the criteria
> of which is a pair of statement and variables. I would like to write an
> inter-procedural program slicing pass in LLVM, which is able to calculate C
> program slices of source code effectively. There is no previous work
> implemented in LLVM, which considers both the dynamic program slicing and
> source code of the sliced program. Program slice contains all statements in
> a program that directly or indirectly act the value of a
> variable occurrence [5]. Program slicing has been used in many
> applications, e.g., program verification, testing, maintenance, automatic
> parallelization of program execution, automatic integration of program
> versions.
>
>  While it's straightforward to implement the slicer in the back-end of
> compiler using SSA form, the source code of the original program instead of
> intermediate
> representation is preferred in most cases. Moreover, we can further narrow
> the notion of slice, which contains statements that influence the value of
a
> variable occurrence for speci c program inputs. This is referred as
> dynamic program slicing [1]. However, there is no previous work implemented
> in LLVM
> which solved the two problems.
>
>  There are two public projects which implement the backwards static
> slicing in LLVM.
>
>    - Giri Written by John Criswell from UIUC, a subproject of LLVM. The
>    Giri code contains the static backwards intra-procedure slicing passes,
and
>    runs with an older version of LLVM. It also only backtracks until it
hits a
>    load. Additional code must be written to backtrack further to
>    find potentially reaching stores.
>    - LLVMSlicer This implementation is a complete static backwards slicer
>    from Masaryk University. It works on the well de ned data and control
flow
>    equations in a white paper by F. Tip [4]. However, this code was written
>    for special purpose, thus it's not general enough to be use by
others. They
>    implemented the Andersen's alias algorithm [2], callgraph, and modi
es
>    analysis to support the slicer, instead of using the LLVM APIs.
>
>
>  They eliminate the useless IR statements and keep the ones a ect the
> values of the criteria. However, neither of them generates the compilable
> source code
> slice, which is heavily needed in reality. There are several ways to do
> this. One is to generate the source code from sliced IR using llc tool. The
> issue is that
> the IR is not concerned with high-level semantic. The generated source
> code is different from the original program and not suitable for human
> reading. An-
> other approach is deleting the source code according to sliced IR, with
> line number information (in meta-data of each instruction). The naive
> script deleting
> sliced source code one by one fails to handle tricky cases. I think a
> better source code slicer is to take use of the AST info.
>
>  There are three main parts of this work.
>
>    1. First, borrow an implementation of static back-wards slicing from
>    Giri or LLVMSlicer, and use the LLVM callgraph, mod/ref and alias
>    interfaces as much as possible.
>    2. Second, implement the dynamic program slicing using the approach 3
>    in the paper [1].
>    3. Third, generate the source code of the sliced program. To make the
>    sliced source code compile directly, we need to employ clang front-end
>
>
>  The final result of this summer of code is to make this pass work e
> effectively and documented well. Further, I'll write test cases and
behave
> as the active
> maintainer for this project. My long-term plan is to add more features,
> e.g. Objective-C/C++ support, thing slicing [3], to this project.
>
>  Any comment is highly appreciated.
>
>
>  [1] H AGRAWAL. Dynamic Program Slicing. In ACM SIGPLAN Conference on
> Programming Language Design and Implementation (PLDI),1990.
>  2] L.O. Andersen. Program analysis and specialization for the C
> programming language. PhD thesis, University of Cophenhagen, Germany, 1994.
> [3] M. Sridharan, S.J. Fink, and R. Bodik. Thin slicing. In PLDI'07,
2007.
> [4] F. Tip. A survey of program slicing techniques.Journal of Programming
> Languages, 1995.
> [5] M. Weiser. Program slicing. In Proceedings of the 5th International
> Conference on Software Engineering, ICSE, pages 439{449. IEEE, 1981.
>
>  --
> Mingliang LIU (刘明亮 in Chinese)
>
> PACMAN Group,  Dept. of Computer Science & Technology
> Tsinghua University, Beijing 100084, China
> Email: liuml07 at mails.tsinghua.edu.cn
> Homepage: http://pacman.cs.tsinghua.edu.cn/~liuml07/
>
>

-- 
Mingliang Liu (刘明亮 in Chinese)

PACMAN Group,  Dept. of Computer Science & Technology
Tsinghua University, Beijing 100084, China
Email: liuml07 at mails.tsinghua.edu.cn
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130502/b78d927c/attachment.html>

John Criswell

2013-May-02 15:25 UTC

head link

[LLVMdev] GSoC Proposal: Inter-Procedure Program Slicing in LLVM

On 5/1/13 11:36 PM, Mingliang LIU wrote:> Hi all,
>
> I had a second thought of the dynamic slicing, as well as the source 
> code generating.
>
> Firstly, the dynamic slicing is very useful to software community 
> (I'll illustrate more in the refined proposal later), but it's
already
> implemented by Swarup and John Criswell from UIUC. The static slicing 
> code has been released as Giri project in LLVM, and they would kindly 
> release the dynamic slicing too if this proposal is accepted.
>
> I'd like to study and enhance the Giri code to make it better. 
> However, I don't know the exact part to which I can contribute.
I'll
> ask Swarup (I cc this email to him) for help. I would be honored to be 
> directed by him if this proposal is selected.
>
> Moreover, I don't know how the static and dynamic slicing fit 
> together. Does the static code of Giri need some improvements? For 
> example, taking into consideration the pointer aliases. Our group need 
> the static slicing to generate I/O benchmarks (see below please).
It is not immediately obvious to me that dynamic slicing can be improved 
by a static slicing analysis.

As far as improvements to the dynamic slicing Giri code, I can think of 
several:

1) Adding support to correctly handle asynchronous events (e.g., signal 
handlers).

2) Updating the code to LLVM mainline and putting it into the giri SVN 
repository

3) Reducing the trace size.  Giri currently records the execution of 
each basic block, load and store, call, and return instruction. There 
are things you can do to make the trace smaller (e.g., creating one 
trace record for control-equivalent basic blocks).

4) Making the giri run-time library thread safe and the slicing 
thread/process aware.  Right now, I think the events of all processes 
and threads get thrown together in one trace file.  Each should have a 
separate trace file, or trace records should indicate which thread is 
performing a particular operation.

5) Improving the giri run-time.  The current run-time library maps a 
portion of the trace file into memory, writes trace records into it, and 
then synchronously munmaps it and then maps in the next portion of the 
trace file.  This design ensures that we don't swamp memory (the 
application can produce trace records faster than the OS can write them 
to disk), but it doesn't overlap computation with I/O very well.  The 
run-time also has a static size for how many trace records to hold in 
memory before flushing to disk; this value should be computed dynamically.

In short, I think you would need to:

a) Make Giri up to date with LLVM and make it robust.
b) Profile it to see where to improve performance
c) Make improvements that improve performance or reduce trace size

>
> Secondly, to generate source code, which is human-readable and 
> compilable seems a bit ambitious. My previous scripts in python can 
> generate the source code of the original program by deleting useless 
> line of code. It's kind of tricky. We used dozens of regular 
> expressions to match the boundary of blocks, loops and functions to 
> avoid deleting lines unexpectedly. I thought employing clang was a 
> better idea. I don't know whether you think the source code genration 
> is a good idea or not. I can release our simple script and improve it 
> later. So I have more time/energy to contribute to the dynamic slicer.
>
> I think the slicing pass can be loaded by the opt. There is no need to 
> change the front-end or other passes. I'm not sure whether the link 
> time optimization can be exploited. We borrowed the LLVMSlicer code 
> previously without LTO.
The problem with loading passes into opt is that you need to have a 
whole-program bitcode file.  In practice, those can be difficult to 
build.  The LLVM instrumentation passes should either be compiled into 
Clang and run on each compilation unit or linked into libLTO and run on 
the whole program bitcode before libLTO generates native code.  Either 
of these approaches will make compiling real-world programs with Giri 
easier to do.

FWIW, we currently link the Giri passes into libLTO to ensure that each 
instrumented instruction gets its own unique ID.
>
> Lastly, I'd like to introduce myself briefly. I'm a three-years PhD
> candidate student from Tsinghua University, China. My research area 
> covers performance analysis, compiler techniques for high performance 
> computing, and parallel computing (MPI/OpenMP).
>
> One of my on-going work is to generate an I/O benchmark from the 
> original application. The base observation is that the computation and 
> communication statements can be deleted if they're irrelevant to the 
> I/O pattern, e.g. computing the buffer content to be written into a 
> file. We take use of the program slicing technique to find 
> relevant/irrelevant statements. The static slicer was borrowed from 
> LLVMSlicer and I wrote code to make it work for our application. We 
> generated the line number of sliced code. There is a very simple 
> source code generation script using ugly and tricky regular 
> expressions to delete original source code according to the sliced 
> line number. We're going to submit the first version of the paper 
> recently.
>
> I also took part in one project in Open64 compiler several year ago, 
> the purpose of which is to fast collect the communication trace. We 
> used program slicing to delete the computation statements and kept the 
> communication related statements. We generated executable 
> binaries instead of source code from IR.
>
> Now we plan to do a project that can automatically find manual 
> configuration errors in software deployment. The dynamic slicing can 
> help a lot since the input is the key factor to locate errors. We have 
> not a concrete plan for this project, but the dynamic slicing is 
> heavily needed.
Your final proposal should cite other applications that use (or could 
benefit from) dynamic slicing.  Our ASPLOS 2013 paper would be an 
example, but you should look for and cite several papers about several 
different applications from several different groups. Industry groups 
would be a plus.  Saying that one or two people use dynamic slicing 
isn't all that convincing; saying that x different groups use it, 
including y industry groups, would be far more compelling.

To get you started, you may want to look at this paper by Johnson et. 
al.: 
http://www.cs.berkeley.edu/~dawnsong/papers/2011%20diffslicing_oakland11.pdf.

-- John T.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130502/c229de22/attachment.html>

Sahoo, Swarup Kumar

2013-May-02 15:31 UTC

head link

[LLVMdev] GSoC Proposal: Inter-Procedure Program Slicing in LLVM

We already have a way to handle asynchronous events though it is not perfect.

One more improvement I can think of is handling external library calls which is
not complete for some calls.

Thanks,
Swarup.

________________________________
From: John Criswell [criswell at illinois.edu]
Sent: Thursday, May 02, 2013 10:25 AM
To: Mingliang LIU
Cc: LLVM; Sahoo, Swarup Kumar
Subject: Re: [LLVMdev] GSoC Proposal: Inter-Procedure Program Slicing in LLVM

On 5/1/13 11:36 PM, Mingliang LIU wrote:
Hi all,

I had a second thought of the dynamic slicing, as well as the source code
generating.

Firstly, the dynamic slicing is very useful to software community (I'll
illustrate more in the refined proposal later), but it's already implemented
by Swarup and John Criswell from UIUC. The static slicing code has been released
as Giri project in LLVM, and they would kindly release the dynamic slicing too
if this proposal is accepted.

I'd like to study and enhance the Giri code to make it better. However, I
don't know the exact part to which I can contribute. I'll ask Swarup (I
cc this email to him) for help. I would be honored to be directed by him if this
proposal is selected.

Moreover, I don't know how the static and dynamic slicing fit together. Does
the static code of Giri need some improvements? For example, taking into
consideration the pointer aliases. Our group need the static slicing to generate
I/O benchmarks (see below please).

It is not immediately obvious to me that dynamic slicing can be improved by a
static slicing analysis.

As far as improvements to the dynamic slicing Giri code, I can think of several:

1) Adding support to correctly handle asynchronous events (e.g., signal
handlers).

2) Updating the code to LLVM mainline and putting it into the giri SVN
repository

3) Reducing the trace size.  Giri currently records the execution of each basic
block, load and store, call, and return instruction.  There are things you can
do to make the trace smaller (e.g., creating one trace record for
control-equivalent basic blocks).

4) Making the giri run-time library thread safe and the slicing thread/process
aware.  Right now, I think the events of all processes and threads get thrown
together in one trace file.  Each should have a separate trace file, or trace
records should indicate which thread is performing a particular operation.

5) Improving the giri run-time.  The current run-time library maps a portion of
the trace file into memory, writes trace records into it, and then synchronously
munmaps it and then maps in the next portion of the trace file.  This design
ensures that we don't swamp memory (the application can produce trace
records faster than the OS can write them to disk), but it doesn't overlap
computation with I/O very well.  The run-time also has a static size for how
many trace records to hold in memory before flushing to disk; this value should
be computed dynamically.

In short, I think you would need to:

a) Make Giri up to date with LLVM and make it robust.
b) Profile it to see where to improve performance
c) Make improvements that improve performance or reduce trace size



Secondly, to generate source code, which is human-readable and compilable seems
a bit ambitious. My previous scripts in python can generate the source code of
the original program by deleting useless line of code. It's kind of tricky.
We used dozens of regular expressions to match the boundary of blocks, loops and
functions to avoid deleting lines unexpectedly. I thought employing clang was a
better idea. I don't know whether you think the source code genration is a
good idea or not. I can release our simple script and improve it later. So I
have more time/energy to contribute to the dynamic slicer.

I think the slicing pass can be loaded by the opt. There is no need to change
the front-end or other passes. I'm not sure whether the link time
optimization can be exploited. We borrowed the LLVMSlicer code previously
without LTO.

The problem with loading passes into opt is that you need to have a
whole-program bitcode file.  In practice, those can be difficult to build.  The
LLVM instrumentation passes should either be compiled into Clang and run on each
compilation unit or linked into libLTO and run on the whole program bitcode
before libLTO generates native code.  Either of these approaches will make
compiling real-world programs with Giri easier to do.

FWIW, we currently link the Giri passes into libLTO to ensure that each
instrumented instruction gets its own unique ID.


Lastly, I'd like to introduce myself briefly. I'm a three-years PhD
candidate student from Tsinghua University, China. My research area covers
performance analysis, compiler techniques for high performance computing, and
parallel computing (MPI/OpenMP).

One of my on-going work is to generate an I/O benchmark from the original
application. The base observation is that the computation and communication
statements can be deleted if they're irrelevant to the I/O pattern, e.g.
computing the buffer content to be written into a file. We take use of the
program slicing technique to find relevant/irrelevant statements. The static
slicer was borrowed from LLVMSlicer and I wrote code to make it work for our
application. We generated the line number of sliced code. There is a very simple
source code generation script using ugly and tricky regular expressions to
delete original source code according to the sliced line number. We're going
to submit the first version of the paper recently.

I also took part in one project in Open64 compiler several year ago, the purpose
of which is to fast collect the communication trace. We used program slicing to
delete the computation statements and kept the communication related statements.
We generated executable binaries instead of source code from IR.

Now we plan to do a project that can automatically find manual configuration
errors in software deployment. The dynamic slicing can help a lot since the
input is the key factor to locate errors. We have not a concrete plan for this
project, but the dynamic slicing is heavily needed.

Your final proposal should cite other applications that use (or could benefit
from) dynamic slicing.  Our ASPLOS 2013 paper would be an example, but you
should look for and cite several papers about several different applications
from several different groups.  Industry groups would be a plus.  Saying that
one or two people use dynamic slicing isn't all that convincing; saying that
x different groups use it, including y industry groups, would be far more
compelling.

To get you started, you may want to look at this paper by Johnson et. al.:
http://www.cs.berkeley.edu/~dawnsong/papers/2011%20diffslicing_oakland11.pdf.

-- John T.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130502/88732d5c/attachment.html>

Possibly Parallel Threads

Search for more possibly parallel threads

llvm dev - May 2013 - [LLVMdev] GSoC Proposal: Inter-Procedure Program Slicing in LLVM

[LLVMdev] GSoC Proposal: Inter-Procedure Program Slicing in LLVM

[LLVMdev] GSoC Proposal: Inter-Procedure Program Slicing in LLVM

[LLVMdev] GSoC Proposal: Inter-Procedure Program Slicing in LLVM

Possibly Parallel Threads