thr3ads.net - llvm dev - [LLVMdev] Efficient instrumentation of loads and stores [Mar 2011]

If this information is useful, please help other people find it:
Share via:

Minjang Kim

2011-Mar-21 18:31 UTC

[LLVMdev] Efficient instrumentation of loads and stores

Hello, I'd like to listen your opinions regarding my research with LLVM.

My work is a dynamic analysis of data dependences [1]. Briefly speaking, I'm
instrumenting memory loads/stores and loop entries/exits/back edges, and
then calculating data dependences in runtime, especially focusing on
loop-carried dependences.

So far, I have been working with a binary-level instrumentation tool (Pin).
However, doing sophisticated static analysis with binaries is really
daunting, so I'm porting my code to LLVM.

The implementation for LLVM is actually simple because my core analysis
module is already orthogonal to instrumentation frameworks. So, all I need
to do is inserting calls of stub functions that send events to the analysis
module (e.g., loop A has been started).

First, instrumenting loop events such as entry/exit/back edges was very
straightforward and simple, comparing to my previous binary-level approach,
though there was quite a learning curve to use LLVM methods correctly and
efficiently. FYI, in a binary-level approach, even extracting loops is
challenging. No single pre-header and hard to capture loop exits as well.


However, instrumenting loads and stores efficiently is somewhat tricky. An
obvious way is instrumenting every loads and stores, which is what I did
with binaries. But, I'd like to filter loads and stores whose dependences
can be decided in static time to minimize runtime overhead. Notable examples
would be (1) induction variables and (2) local temporary variables:

for (int i = 0; i < N; ++i) {
  int temp = 10;
  ...
}

In the above example, it is obvious 'i' and 'temp' do not need
to be
analyzed in runtime. So, I'm writing the code that skip instrumentations for
such local scalar variables.

I do believe this is doable. But, as a non-expert in static and
compiler-level analysis, implementing such instrumentation code
isn't straightforward so far.

One reason that makes this problem tricky would be I can't do any
significant optimizations before instrumentation, because optimizations
heavily change code such as loop interchanges, instruction reordering and
elimination of variables. I'm currently doing instrumentation before any
optimizations.

Such restriction makes some problems: before any optimizations, the IR code
isn't a SSA-form, so sophisticated loop analysis is impossible such as
identifying induction variable easily. Without SSA form, there are bunch of
loads and stores, so extracting such variables require dirty hack.

I think I can implement this loads/stores instrumentation code anyway.
However, I'd like to hear opinions from experts on this domain. What would
be the most elegant and best approach for this problem? I will appreciate
any comments for my work.


Thank you for reading such a long article,
Minjang

[1] http://www.cc.gatech.edu/~minjang/micro10-mjkim.pdf
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110321/c95aed00/attachment.html>

Reid Kleckner

2011-Mar-21 19:20 UTC

head link

[LLVMdev] Efficient instrumentation of loads and stores

Hello Minjang,

Exactly what information do you need to get in your instrumentation?  Is it
essentially all loop entry-continue-exit events, interspersed with the
memory access trace (address, read/write, pc)?

It seems the Thread Sanitizer people have already done something similar:
http://code.google.com/p/data-race-test/source/browse/trunk/llvm/opt/ThreadSanitizer/ThreadSanitizer.cpp

Not sure what state it's in, I was just browsing their repo.

I would say it's a good idea to do your instrumentation after the
optimization of the program.  The optimized version is probably closer to
what would have been produced in the final binary than the unoptimized IR.
 If you don't run the optimizations, you'll have to do a lot of work to
sort
out what memory accesses are obviously not dependent on the previous
iteration.  Optimizations will clean this up a lot.

The only issue then is that there are some dependencies that stay only in
registers and not in memory, like the temporary in a summation loop.  It
will be a phi node between the base case and the last iteration.  There
should be some machinery in LLVM for finding these loop carried
dependencies, and you should be able to record them in your instrumentation
another way.

---

Finally, if you're strongly concerned with the performance of your tool, I
think doing something similar to PiPA with DynamoRIO would give you the best
performance:
http://dynamorio.org/pubs/PiPA-pipelined-profiling-cgo08.pdf

Disclaimer: I work on DynamoRIO.  :)

It would be a very large amount of work, but I think the approach is very
clever.  The gist of it is that you try to precompute a table of all of the
memory access offsets, read-write info, and application pc for every BB, and
you record only the base registers and the id of the bb's table every time
the basic block is executed.  Beyond that, you just pipeline the analysis,
which is straightforward.

Good luck,
Reid

On Mon, Mar 21, 2011 at 11:31 AM, Minjang Kim <minjang at gatech.edu>
wrote:
> Hello, I'd like to listen your opinions regarding my research with
LLVM.
>
> My work is a dynamic analysis of data dependences [1]. Briefly speaking,
> I'm instrumenting memory loads/stores and loop entries/exits/back
edges, and
> then calculating data dependences in runtime, especially focusing on
> loop-carried dependences.
>
> So far, I have been working with a binary-level instrumentation tool (Pin).
> However, doing sophisticated static analysis with binaries is really
> daunting, so I'm porting my code to LLVM.
>
> The implementation for LLVM is actually simple because my core analysis
> module is already orthogonal to instrumentation frameworks. So, all I need
> to do is inserting calls of stub functions that send events to the analysis
> module (e.g., loop A has been started).
>
> First, instrumenting loop events such as entry/exit/back edges was very
> straightforward and simple, comparing to my previous binary-level approach,
> though there was quite a learning curve to use LLVM methods correctly and
> efficiently. FYI, in a binary-level approach, even extracting loops is
> challenging. No single pre-header and hard to capture loop exits as well.
>
>
> However, instrumenting loads and stores efficiently is somewhat tricky. An
> obvious way is instrumenting every loads and stores, which is what I did
> with binaries. But, I'd like to filter loads and stores whose
dependences
> can be decided in static time to minimize runtime overhead. Notable
examples
> would be (1) induction variables and (2) local temporary variables:
>
> for (int i = 0; i < N; ++i) {
>   int temp = 10;
>   ...
> }
>
> In the above example, it is obvious 'i' and 'temp' do not
need to be
> analyzed in runtime. So, I'm writing the code that skip
instrumentations for
> such local scalar variables.
>
> I do believe this is doable. But, as a non-expert in static and
> compiler-level analysis, implementing such instrumentation code
> isn't straightforward so far.
>
> One reason that makes this problem tricky would be I can't do any
> significant optimizations before instrumentation, because optimizations
> heavily change code such as loop interchanges, instruction reordering and
> elimination of variables. I'm currently doing instrumentation before
any
> optimizations.
>
> Such restriction makes some problems: before any optimizations, the IR code
> isn't a SSA-form, so sophisticated loop analysis is impossible such as
> identifying induction variable easily. Without SSA form, there are bunch of
> loads and stores, so extracting such variables require dirty hack.
>
> I think I can implement this loads/stores instrumentation code anyway.
> However, I'd like to hear opinions from experts on this domain. What
would
> be the most elegant and best approach for this problem? I will appreciate
> any comments for my work.
>
>
> Thank you for reading such a long article,
> Minjang
>
> [1] http://www.cc.gatech.edu/~minjang/micro10-mjkim.pdf
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110321/9648c9dd/attachment.html>

Maybe Matching Threads

Search for more apparently analagous threads

llvm dev - Mar 2011 - [LLVMdev] Efficient instrumentation of loads and stores

[LLVMdev] Efficient instrumentation of loads and stores

[LLVMdev] Efficient instrumentation of loads and stores

Maybe Matching Threads