thr3ads.net - llvm dev - [LLVMdev] GSoC proposal: Common memory safety instrumentation and optimization passes for LLVM [Apr 2012]

If this information is useful, please help other people find it:
Share via:

Ott Tinn

2012-Apr-06 01:49 UTC

[LLVMdev] GSoC proposal: Common memory safety instrumentation and optimization passes for LLVM

This is a proposal to create memory safety instrumentation and
optimization passes for LLVM.

Abstract:
The goal of this project is to modify SAFECode and AddressSanitizer
(ASAN) to use a common set of memory safety instrumentation and
optimization passes to increase code reuse. These tools and other
similar ones use varying methods to detect whether memory accesses are
safe, but are fundamentally trying to do the same thing: check whether
each memory access is safe. It is desirable to optimize away redundant
runtime checks to improve such tools' runtime performance. This means
that there is a need for shared memory safety instrumentation and
optimization passes.


Proposal:
The general idea is to make SAFECode and ASAN use the following design:
1. Add checks to memory accesses (loads, stores, and some intrinsics).
2. Run the memory safety check optimization passes.
3. Transform the remaining checks to tool-specific runtime calls.
4. Do whatever the specific tool did before.

This design would make it possible for SAFECode, ASAN, and other
similar tools to share the memory safety instrumentation and
optimization passes. The main benefit of the code reuse is that the
memory-safety-specific optimizations could be used by all such tools.

The project proposes to modify SAFECode and ASAN as a proof of
concept. It might also be useful to modify SoftBound, ThreadSanitizer,
or some other tool but I have not analysed how difficult/useful that
would be. That is why they are excluded from the current proposal.

Implementation plan:
1. Create the common instrumentation pass.
2. Add a pass to convert the common checks to ASAN-specific ones.
3. Add a pass to convert the common checks to SAFECode-specific ones.
4. Convert some of the simpler optimizations from SAFECode to run on
the common checks.
5. Add more optimizations (from SAFECode or otherwise).

The plan is to make sure that it is possible to commit early and often
without breaking anything (unless absolutely needed). The conversion
passes are needed to make the tool work but a side-effect is that the
existing tool-specific optimizations should continue working without
changes.

The "simpler" optimizations are defined to be the ones that are easy
for humans to verify and do not have large extra dependencies like
Poolalloc or SMT solvers.

Optimizations that will definitely be implemented such that they work
on the common memory safety checks (milestone 3 or 4):
* Remove obviously redundant checks in the same basic block.
* Remove unnecessary constant checks to global variables / allocas.
* Combine struct member checks in the same basic block.
* Hoisting constant checks from loops.
* Something more.

An additional plan that outlines the optimizations to be added in the
later part of the program will be produced and agreed upon before the
mid-term evaluations. The general idea is to add slightly more
complicated optimizations that are useful in practice rather than
large and complicated optimizations that are difficult to verify by
humans.

Timeline:
Milestone 1 (June 1): The common instrumentation pass works and there
are tests to verify it.
Milestone 2 (June 22): The tool-specific conversion passes work and
there are tests to verify it.
Milestone 3 (July 6): Some simple optimization passes from SAFECode
work on the common checks; there are unit tests to verify that.
Finished (and agreed upon) a specific plan that outlines which
optimizations will be converted / created for milestone 4.
Mid-term evaluations deadline (July 13)
Milestone 4 (August 13): Added more optimizations (and relevant unit
tests) as specified in the additional plan.
Firm 'pencils down' date (August 20): More testing and documentation.

Basically the idea is to produce something practically useful and
thoroughly tested that will definitely be done in time.


Contact information:
Included in the official submission.


Interesting to me:
I am generally interested in developing bug finding/detecting systems
but this project would also have been useful to me for a project I
completed previously (see the experience section). I have previously
used SAFECode for automatically checking whether a program has a
buffer overflow on a specific run.  I was interested in reusing the
static memory safety optimization parts of SAFECode but it seemed to
be too tightly integrated to be easily reused for my purposes.


Useful for LLVM:
This project would be useful for LLVM in general because it would make
it easier to develop memory safety tools based on LLVM because of the
available relevant transforms. Reducing the amount of code each
subproject has to add should make it more likely that the subprojects
stay compatible with the latest LLVM changes.

It would be useful for ASAN mostly because the optimizations should
reduce the runtime overhead.

It would be useful for SAFECode because the code should become a bit
more modular and there should be more code reuse. The extra testing
and shared code should make it easier to keep up with the changes in
LLVM because there would be more people who are interested in that
being the case.

It would be useful for both ASAN and SAFECode because optimizations
based on the common instrumentation would be useful for both of them.


Relevant experience:
I created a tool based on LLVM and KLEE that aimed to optimize a
specific type of C++ programs such that they would crash on exactly
the same inputs as before the optimizations. This made the system find
inputs on which the programs crashed faster than before. Most of the
project was about creating LLVM passes that might make the bug finding
process faster while retaining that property.

One part of the system was adding and later removing memory safety
checks. That was necessary because a significant part of the code
became otherwise unused after aggressively transforming / essentially
removing output calls (printf, the cout stream, etc.) and the aim was
to still detect invalid but unused memory accesses.

I successfully participated in GSoC 2011 by creating an AI player for
an open source RTS game, Unknown Horizons (written in Python). I have
continued to contribute to that project so far.

Kostya Serebryany

2012-Apr-06 05:50 UTC

head link

[LLVMdev] GSoC proposal: Common memory safety instrumentation and optimization passes for LLVM

I'd like some similar work to be done, although I view it a bit
differently.
This might be a separate analysis pass that knows nothing about ASAN or
SAFECode
and appends metadata nodes to memory access instructions saying things like
   - this access can not go out of buffer bounds
   - this access can not touch free-ed memory
   - this access can not participate in a race
   - this read always reads initialized memory
Then the actual instrumentation passes will simply consult the metadata.

Equally important would be an exhaustive test suite.
Not sure if it should be in LLVM IR or in C (if in C, other compilers will
benefit too).

On Thu, Apr 5, 2012 at 6:49 PM, Ott Tinn <llvm at otinn.com> wrote:
> This is a proposal to create memory safety instrumentation and
> optimization passes for LLVM.
>
> Abstract:
> The goal of this project is to modify SAFECode and AddressSanitizer
> (ASAN) to use a common set of memory safety instrumentation and
> optimization passes to increase code reuse. These tools and other
> similar ones use varying methods to detect whether memory accesses are
> safe, but are fundamentally trying to do the same thing: check whether
> each memory access is safe. It is desirable to optimize away redundant
> runtime checks to improve such tools' runtime performance. This means
> that there is a need for shared memory safety instrumentation and
> optimization passes.
>
>
> Proposal:
> The general idea is to make SAFECode and ASAN use the following design:
> 1. Add checks to memory accesses (loads, stores, and some intrinsics).
> 2. Run the memory safety check optimization passes.
> 3. Transform the remaining checks to tool-specific runtime calls.
> 4. Do whatever the specific tool did before.
>
> This design would make it possible for SAFECode, ASAN, and other
> similar tools to share the memory safety instrumentation and
> optimization passes. The main benefit of the code reuse is that the
> memory-safety-specific optimizations could be used by all such tools.
>
> The project proposes to modify SAFECode and ASAN as a proof of
> concept. It might also be useful to modify SoftBound, ThreadSanitizer,
> or some other tool but I have not analysed how difficult/useful that
> would be. That is why they are excluded from the current proposal.
>
> Implementation plan:
> 1. Create the common instrumentation pass.
> 2. Add a pass to convert the common checks to ASAN-specific ones.
> 3. Add a pass to convert the common checks to SAFECode-specific ones.
> 4. Convert some of the simpler optimizations from SAFECode to run on
> the common checks.
> 5. Add more optimizations (from SAFECode or otherwise).
>
> The plan is to make sure that it is possible to commit early and often
> without breaking anything (unless absolutely needed). The conversion
> passes are needed to make the tool work but a side-effect is that the
> existing tool-specific optimizations should continue working without
> changes.
>
> The "simpler" optimizations are defined to be the ones that are
easy
> for humans to verify and do not have large extra dependencies like
> Poolalloc or SMT solvers.
>
> Optimizations that will definitely be implemented such that they work
> on the common memory safety checks (milestone 3 or 4):
> * Remove obviously redundant checks in the same basic block.
> * Remove unnecessary constant checks to global variables / allocas.
> * Combine struct member checks in the same basic block.
>
Beware, that some of such cases will be covered by GVN (load widening,
etc). Although some will not.
E.g.
struct S {
  int alignment;
  short a, b;
};

S *x;
...
x->a = ...
... = x->b

These two accesses can be combined for ASAN, but not for TSAN.


> * Hoisting constant checks from loops.
>
In most cases, this should be handled by general LLVM loop invariant code
motion.

> * Something more.
>
> An additional plan that outlines the optimizations to be added in the
> later part of the program will be produced and agreed upon before the
> mid-term evaluations. The general idea is to add slightly more
> complicated optimizations that are useful in practice rather than
> large and complicated optimizations that are difficult to verify by
> humans.
>
> Timeline:
> Milestone 1 (June 1): The common instrumentation pass works and there
> are tests to verify it.
> Milestone 2 (June 22): The tool-specific conversion passes work and
> there are tests to verify it.
> Milestone 3 (July 6): Some simple optimization passes from SAFECode
> work on the common checks; there are unit tests to verify that.
> Finished (and agreed upon) a specific plan that outlines which
> optimizations will be converted / created for milestone 4.
> Mid-term evaluations deadline (July 13)
> Milestone 4 (August 13): Added more optimizations (and relevant unit
> tests) as specified in the additional plan.
> Firm 'pencils down' date (August 20): More testing and
documentation.
>
> Basically the idea is to produce something practically useful and
> thoroughly tested that will definitely be done in time.
>
>
> Contact information:
> Included in the official submission.
>
>
> Interesting to me:
> I am generally interested in developing bug finding/detecting systems
> but this project would also have been useful to me for a project I
> completed previously (see the experience section). I have previously
> used SAFECode for automatically checking whether a program has a
> buffer overflow on a specific run.  I was interested in reusing the
> static memory safety optimization parts of SAFECode but it seemed to
> be too tightly integrated to be easily reused for my purposes.
>
>
> Useful for LLVM:
> This project would be useful for LLVM in general because it would make
> it easier to develop memory safety tools based on LLVM because of the
> available relevant transforms. Reducing the amount of code each
> subproject has to add should make it more likely that the subprojects
> stay compatible with the latest LLVM changes.
>
> It would be useful for ASAN mostly because the optimizations should
> reduce the runtime overhead.
>
> It would be useful for SAFECode because the code should become a bit
> more modular and there should be more code reuse. The extra testing
> and shared code should make it easier to keep up with the changes in
> LLVM because there would be more people who are interested in that
> being the case.
>
> It would be useful for both ASAN and SAFECode because optimizations
> based on the common instrumentation would be useful for both of them.
>
>
> Relevant experience:
> I created a tool based on LLVM and KLEE that aimed to optimize a
> specific type of C++ programs such that they would crash on exactly
> the same inputs as before the optimizations. This made the system find
> inputs on which the programs crashed faster than before. Most of the
> project was about creating LLVM passes that might make the bug finding
> process faster while retaining that property.
>
> One part of the system was adding and later removing memory safety
> checks. That was necessary because a significant part of the code
> became otherwise unused after aggressively transforming / essentially
> removing output calls (printf, the cout stream, etc.) and the aim was
> to still detect invalid but unused memory accesses.
>
> I successfully participated in GSoC 2011 by creating an AI player for
> an open source RTS game, Unknown Horizons (written in Python). I have
> continued to contribute to that project so far.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20120405/028e78c1/attachment.html>

John Criswell

2012-Apr-06 14:50 UTC

head link

[LLVMdev] GSoC proposal: Common memory safety instrumentation and optimization passes for LLVM

On 4/6/12 12:50 AM, Kostya Serebryany wrote:> I'd like some similar work to be done, although I view it a bit 
> differently.
> This might be a separate analysis pass that knows nothing about ASAN 
> or SAFECode
> and appends metadata nodes to memory access instructions saying things 
> like
This is a good idea but is the wrong way to implement the idea.  LLVM 
passes are not required to preserve metadata, and even if they were 
required to do so, there would always be a pass with a bug that would 
fail to preserve the metadata properly.  It's an approach that can lead 
to undesired headaches.  Furthermore, you're not guaranteed that an 
instruction that was deemed safe earlier is safe after transformation; 
there are optimizations that LLVM can do on C code exhibiting undefined 
behavior that can change it from memory safe to memory-unsafe code.

The correct way to do this is by writing generic LLVM analysis passes 
that compute this information and can be queried by 
SAFECode/ASAN-specific instrumentation and optimization passes.  In this 
fashion, the LLVM pass manager will re-run the analyses if an earlier 
transform comes along and modifies the IR.  It is also far more robust 
since analysis information cannot be discarded by LLVM passes written by 
other people.
>    - this access can not go out of buffer bounds
>    - this access can not touch free-ed memory
>    - this access can not participate in a race
>    - this read always reads initialized memory
> Then the actual instrumentation passes will simply consult the metadata.
One of the design principles I've been trying to follow in refactoring 
SAFECode is that we have dumb instrumentation passes that just 
instrument everything followed by optimization passes that remove 
run-time checks that are unnecessary.  This follows the 
compiler-building principle called Separation of Concerns, and it's 
useful in tools like SAFECode because it allows us to easily turn 
optimizations on/off by running/not running individual passes.  This 
makes performance analysis easier (we can see the effect of an 
optimization by not running a pass), and it makes it possible for 
bugpoint to figure out which optimization is causing a program to break.

SAFECode used to have a single instrumentation pass that inserted both 
load/store and GEP checks with various options to enable/disable 
optimizations.  It made the code complex and difficult to read.  The new 
passes are reusable and so blindingly simple that a child can understand 
what they do.  I highly recommend that ASAN not make the mistake that 
SAFECode originally made.

Finally, the common infrastructure idea I was talking about on the 
SAFECode open projects page is to have a common set of run-time check 
function names and set of instrumentation passes to add them and 
optimize them.  In this way, SAFECode/SoftBound/ASAN can share not only 
the same analysis passes (e.g., an always-safe load/store analysis) but 
the actual optimization and instrumentation passes, too.  SAFECode/ASAN 
specific transforms can be run after the generic instrumentation passes 
to specialize the checks for the specific tool (e.g., SAFECode would 
have a pass that adds pool handles to the run-time checks).
>
> Equally important would be an exhaustive test suite.
> Not sure if it should be in LLVM IR or in C (if in C, other compilers 
> will benefit too).
Wilander has a new suite of tests out that might be useful.

-- John T.
>
> On Thu, Apr 5, 2012 at 6:49 PM, Ott Tinn <llvm at otinn.com 
> <mailto:llvm at otinn.com>> wrote:
>
>     This is a proposal to create memory safety instrumentation and
>     optimization passes for LLVM.
>
>     Abstract:
>     The goal of this project is to modify SAFECode and AddressSanitizer
>     (ASAN) to use a common set of memory safety instrumentation and
>     optimization passes to increase code reuse. These tools and other
>     similar ones use varying methods to detect whether memory accesses are
>     safe, but are fundamentally trying to do the same thing: check whether
>     each memory access is safe. It is desirable to optimize away redundant
>     runtime checks to improve such tools' runtime performance. This
means
>     that there is a need for shared memory safety instrumentation and
>     optimization passes.
>
>
>     Proposal:
>     The general idea is to make SAFECode and ASAN use the following
>     design:
>     1. Add checks to memory accesses (loads, stores, and some intrinsics).
>     2. Run the memory safety check optimization passes.
>     3. Transform the remaining checks to tool-specific runtime calls.
>     4. Do whatever the specific tool did before.
>
>     This design would make it possible for SAFECode, ASAN, and other
>     similar tools to share the memory safety instrumentation and
>     optimization passes. The main benefit of the code reuse is that the
>     memory-safety-specific optimizations could be used by all such tools.
>
>     The project proposes to modify SAFECode and ASAN as a proof of
>     concept. It might also be useful to modify SoftBound, ThreadSanitizer,
>     or some other tool but I have not analysed how difficult/useful that
>     would be. That is why they are excluded from the current proposal.
>
>     Implementation plan:
>     1. Create the common instrumentation pass.
>     2. Add a pass to convert the common checks to ASAN-specific ones.
>     3. Add a pass to convert the common checks to SAFECode-specific ones.
>     4. Convert some of the simpler optimizations from SAFECode to run on
>     the common checks.
>     5. Add more optimizations (from SAFECode or otherwise).
>
>     The plan is to make sure that it is possible to commit early and often
>     without breaking anything (unless absolutely needed). The conversion
>     passes are needed to make the tool work but a side-effect is that the
>     existing tool-specific optimizations should continue working without
>     changes.
>
>     The "simpler" optimizations are defined to be the ones that
are easy
>     for humans to verify and do not have large extra dependencies like
>     Poolalloc or SMT solvers.
>
>     Optimizations that will definitely be implemented such that they work
>     on the common memory safety checks (milestone 3 or 4):
>     * Remove obviously redundant checks in the same basic block.
>     * Remove unnecessary constant checks to global variables / allocas.
>     * Combine struct member checks in the same basic block.
>
>
> Beware, that some of such cases will be covered by GVN (load widening, 
> etc). Although some will not.
> E.g.
> struct S {
>   int alignment;
>   short a, b;
> };
> S *x;
> ...
> x->a = ...
> ... = x->b
>
> These two accesses can be combined for ASAN, but not for TSAN.
>
>     * Hoisting constant checks from loops.
>
>
> In most cases, this should be handled by general LLVM loop invariant 
> code motion.
>
>     * Something more.
>
>     An additional plan that outlines the optimizations to be added in the
>     later part of the program will be produced and agreed upon before the
>     mid-term evaluations. The general idea is to add slightly more
>     complicated optimizations that are useful in practice rather than
>     large and complicated optimizations that are difficult to verify by
>     humans.
>
>     Timeline:
>     Milestone 1 (June 1): The common instrumentation pass works and there
>     are tests to verify it.
>     Milestone 2 (June 22): The tool-specific conversion passes work and
>     there are tests to verify it.
>     Milestone 3 (July 6): Some simple optimization passes from SAFECode
>     work on the common checks; there are unit tests to verify that.
>     Finished (and agreed upon) a specific plan that outlines which
>     optimizations will be converted / created for milestone 4.
>     Mid-term evaluations deadline (July 13)
>     Milestone 4 (August 13): Added more optimizations (and relevant unit
>     tests) as specified in the additional plan.
>     Firm 'pencils down' date (August 20): More testing and
documentation.
>
>     Basically the idea is to produce something practically useful and
>     thoroughly tested that will definitely be done in time.
>
>
>     Contact information:
>     Included in the official submission.
>
>
>     Interesting to me:
>     I am generally interested in developing bug finding/detecting systems
>     but this project would also have been useful to me for a project I
>     completed previously (see the experience section). I have previously
>     used SAFECode for automatically checking whether a program has a
>     buffer overflow on a specific run.  I was interested in reusing the
>     static memory safety optimization parts of SAFECode but it seemed to
>     be too tightly integrated to be easily reused for my purposes.
>
>
>     Useful for LLVM:
>     This project would be useful for LLVM in general because it would make
>     it easier to develop memory safety tools based on LLVM because of the
>     available relevant transforms. Reducing the amount of code each
>     subproject has to add should make it more likely that the subprojects
>     stay compatible with the latest LLVM changes.
>
>     It would be useful for ASAN mostly because the optimizations should
>     reduce the runtime overhead.
>
>     It would be useful for SAFECode because the code should become a bit
>     more modular and there should be more code reuse. The extra testing
>     and shared code should make it easier to keep up with the changes in
>     LLVM because there would be more people who are interested in that
>     being the case.
>
>     It would be useful for both ASAN and SAFECode because optimizations
>     based on the common instrumentation would be useful for both of them.
>
>
>     Relevant experience:
>     I created a tool based on LLVM and KLEE that aimed to optimize a
>     specific type of C++ programs such that they would crash on exactly
>     the same inputs as before the optimizations. This made the system find
>     inputs on which the programs crashed faster than before. Most of the
>     project was about creating LLVM passes that might make the bug finding
>     process faster while retaining that property.
>
>     One part of the system was adding and later removing memory safety
>     checks. That was necessary because a significant part of the code
>     became otherwise unused after aggressively transforming / essentially
>     removing output calls (printf, the cout stream, etc.) and the aim was
>     to still detect invalid but unused memory accesses.
>
>     I successfully participated in GSoC 2011 by creating an AI player for
>     an open source RTS game, Unknown Horizons (written in Python). I have
>     continued to contribute to that project so far.
>     _______________________________________________
>     LLVM Developers mailing list
>     LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>     http://llvm.cs.uiuc.edu
>     http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20120406/89fb7213/attachment.html>

Reasonably Related Threads

Search for more maybe matching threads

llvm dev - Apr 2012 - [LLVMdev] GSoC proposal: Common memory safety instrumentation and optimization passes for LLVM

[LLVMdev] GSoC proposal: Common memory safety instrumentation and optimization passes for LLVM

[LLVMdev] GSoC proposal: Common memory safety instrumentation and optimization passes for LLVM

[LLVMdev] GSoC proposal: Common memory safety instrumentation and optimization passes for LLVM

Reasonably Related Threads