Kostya Serebryany
2012-Jun-18 10:39 UTC
[LLVMdev] MemorySanitizer, a tool that finds uninitialized reads and more
Hello llvmdev,

I would like to propose and discuss yet another dynamic tool, which we call MemorySanitizer (msan). The main goal of the tool is to detect uses of uninitialized memory (the major feature of Valgrind/Memcheck not covered by AddressSanitizer). It will also find use-after-destruction-but-before-free in C++.

The algorithm of the tool is similar to that of Memcheck (http://static.usenix.org/event/usenix05/tech/general/full_papers/seward/seward.pdf). We associate a few shadow bits with every byte of the application memory, poison the shadow of malloc-ed or alloca-ed memory, load the shadow bits on every memory read, propagate the shadow bits through some of the arithmetic instructions (including MOV), store the shadow bits on every memory write, and report a bug on some other instructions (e.g. JMP) if the associated shadow is poisoned.

But there are differences too.

The first and the major one: compiler instrumentation instead of binary instrumentation. This gives us much better register allocation (function-wide instead of local), possible compiler optimizations (static analysis can prove that some accesses always read initialized memory), and a fast start-up. Our preliminary measurements show a 3x-4x slowdown; compare that to Memcheck's 20x and DrMemory's 10x (see http://groups.csail.mit.edu/commit/papers/2011/bruening-cgo11-drmemory.pdf for those numbers). But this also brings the major issue: msan needs to see all program events, including system calls and reads/writes in system libraries, so we either need to compile *everything* with msan or use a binary translation component to instrument pre-built libraries (with DynamoRIO? PIN?).

Question: is there any usable project in LLVM land which performs binary instrumentation (x86->LLVM->x86), either statically or dynamically?
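To make the shadow-propagation steps listed above concrete, here is a toy model in C. It is a minimal sketch under simplifying assumptions, not msan's implementation: a hypothetical `Heap` keeps one shadow byte per application byte (a set bit means "uninitialized"), values carry their shadow through the hypothetical `load`, `add` and `store` helpers, and `check_branch` plays the role of the check on a conditional jump. All names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of shadow propagation (illustrative only).
 * One shadow byte per application byte; a set shadow bit means the
 * corresponding application bit is uninitialized ("poisoned"). */
typedef struct {
    uint8_t *data;
    uint8_t *shadow;
    size_t size;
} Heap;

/* A value in a "register", carried together with its shadow. */
typedef struct {
    uint8_t v;
    uint8_t s;
} Val;

/* Like malloc: allocate memory and poison its entire shadow.
 * (data is zeroed only so that this toy never reads indeterminate bytes.) */
static Heap toy_malloc(size_t n) {
    Heap h = { calloc(n, 1), malloc(n), n };
    assert(h.data != NULL && h.shadow != NULL);
    memset(h.shadow, 0xff, n);
    return h;
}

static void toy_free(Heap *h) {
    free(h->data);
    free(h->shadow);
    h->data = h->shadow = NULL;
}

/* Memory read: the shadow is loaded along with the value. */
static Val load(const Heap *h, size_t i) {
    assert(i < h->size);
    return (Val){ h->data[i], h->shadow[i] };
}

/* Memory write: the shadow is stored along with the value. */
static void store(Heap *h, size_t i, Val x) {
    assert(i < h->size);
    h->data[i] = x.v;
    h->shadow[i] = x.s;
}

/* Constants are always initialized. */
static Val constant(uint8_t v) { return (Val){ v, 0 }; }

/* ADD: combine operand shadows, then smear the lowest poisoned bit to all
 * higher bits (s | -s), since a poisoned bit can affect them via carries. */
static Val add(Val a, Val b) {
    uint8_t s = (uint8_t)(a.s | b.s);
    s = (uint8_t)(s | (uint8_t)(0u - s));
    return (Val){ (uint8_t)(a.v + b.v), s };
}

/* Conditional jump: report if the condition depends on poisoned bits. */
static int check_branch(Val cond) {
    if (cond.s != 0) {
        fprintf(stderr, "WARNING: branch depends on uninitialized value\n");
        return 1;
    }
    return 0;
}
```

Note that copying a poisoned value (load followed by store) is not reported in this model; the shadow simply travels with the value, and only the branch check fires. The `s | -s` rule for addition is one conservative choice; a plain OR of the operand shadows is cheaper but ignores carries out of poisoned bits.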
Another difference from Memcheck is that we propose to use 8 shadow bits per byte of application memory and a direct shadow mapping (for 64-bit Linux, that is just clearing the 46th bit of the application memory address). This greatly simplifies the instrumentation code and avoids races on shadow updates. (Memcheck is single-threaded, so races are not a concern there. Memcheck uses 2 shadow bits per byte, with a slow-path storage that uses 8 bits per byte.)

Suggestions? Objections? Unless there is general resentment against msan, we will soon start sending the code for review. (We already have a somewhat messy implementation, which at the top level looks very much like asan and tsan, and even shares some code with them. The major difference is that the compiler part is relatively more complicated than in asan/tsan, while the run-time part is very simple.)

FAQ:
Q: Why can't we combine msan and asan?
A: Valgrind/Memcheck and DrMemory do exactly that, and pay large performance and memory costs. An addressability checker (like asan) requires little shadow memory, but needs large redzones around allocated objects. Tools that track uninitialized/tainted data need a bit-for-bit shadow in the worst case, but don't need redzones. So, if we merge the tools together, we multiply the memory overheads. The instrumentation costs in a combined tool are mostly additive (e.g. asan needs to poison redzones and msan needs to propagate shadow through arithmetic insns).

Thanks,
--kcc
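A minimal sketch of why the direct mapping and the 8-bits-per-byte shadow proposed in this message simplify instrumentation, assuming the 64-bit Linux layout described above (application memory has bit 46 set, so clearing it yields the shadow address). For contrast it also shows a hypothetical packed encoding with 2 shadow bits per byte; that packing is a generic illustration, not Memcheck's actual data structure, and all helper names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Direct shadow mapping as proposed for 64-bit Linux: clear bit 46.
 * Assumes application memory has bit 46 set; kShadowBit is a hypothetical name. */
enum { kShadowBit = 46 };

static inline uint64_t mem_to_shadow(uint64_t addr) {
    return addr & ~((uint64_t)1 << kShadowBit);
}

/* For contrast: a hypothetical packed encoding with 2 shadow bits per
 * application byte, i.e. four application bytes per shadow byte. */
static inline unsigned two_bit_get(const uint8_t *shadow, uint64_t offset) {
    return (shadow[offset / 4] >> (2 * (offset % 4))) & 0x3u;
}

static inline void two_bit_set(uint8_t *shadow, uint64_t offset, unsigned state) {
    assert(state <= 3u);
    uint8_t *b = &shadow[offset / 4];
    unsigned shift = (unsigned)(2 * (offset % 4));
    /* Read-modify-write of a byte shared with 3 neighbouring application
     * bytes: not atomic, so concurrent updates can lose each other's bits. */
    *b = (uint8_t)((*b & ~(0x3u << shift)) | (state << shift));
}
```

With one shadow byte per application byte, the shadow of an N-byte access is itself N contiguous bytes starting at `mem_to_shadow(addr)`, so the instrumentation can fetch or update it with a single N-byte memory operation and no shifting or masking. In the packed variant, every update is a read-modify-write of a byte shared by four application bytes, so two threads writing neighbouring bytes can race on the same shadow byte.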
Joerg Sonnenberger
2012-Jun-18 13:07 UTC
[LLVMdev] MemorySanitizer, a tool that finds uninitialized reads and more
On Mon, Jun 18, 2012 at 02:39:34PM +0400, Kostya Serebryany wrote:
> Another difference from Memcheck is that we propose to use 8 shadow bits
> per byte of application memory and use a direct shadow mapping (for 64-bit
> linux that is just clearing 46-th bit of the application memory address).
> This greatly simplifies the instrumentation code and avoids races on shadow
> updates (Memcheck is single-threaded so races are not a concern there.
> Memcheck uses 2 shadow bits per byte with a slow path storage that uses 8
> bits per byte).

Can you make it possible for ASAN to share the same layout? I expect that both will often be used together...

Joerg
Kostya Serebryany
2012-Jun-18 13:19 UTC
[LLVMdev] MemorySanitizer, a tool that finds uninitialized reads and more
On Mon, Jun 18, 2012 at 5:07 PM, Joerg Sonnenberger <joerg at britannica.bec.de> wrote:
> Can you make it possible for ASAN to share the same layout?

Not sure I understand you. What layout?

> I expect that both will often be used together...
>
> Joerg
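Joerg's question concerns the shadow layout. For comparison with the msan mapping proposed in this thread (one shadow byte per application byte, shadow address = application address with bit 46 cleared), here is a sketch of an ASan-style mapping, where one shadow byte describes an 8-byte granule and the shadow address is `(addr >> 3) + offset`. The offset constant and all names below are illustrative placeholders, not necessarily what any particular ASan build uses.

```c
#include <assert.h>
#include <stdint.h>

/* ASan-style shadow mapping sketch: one shadow byte per 8-byte granule.
 * kGranuleLog and kAsanOffsetPlaceholder are illustrative assumptions. */
enum { kGranuleLog = 3 };  /* 2^3 = 8 application bytes per shadow byte */
static const uint64_t kAsanOffsetPlaceholder = (uint64_t)1 << 44;

static inline uint64_t asan_style_shadow(uint64_t addr) {
    return (addr >> kGranuleLog) + kAsanOffsetPlaceholder;
}

/* Shadow bytes needed to cover n application bytes (rounded up to whole granules). */
static inline uint64_t asan_style_shadow_bytes(uint64_t n) {
    assert(n <= UINT64_MAX - 7);
    return (n + 7) >> kGranuleLog;
}

/* Under the 1:1 msan scheme, the same n bytes need n shadow bytes. */
static inline uint64_t msan_style_shadow_bytes(uint64_t n) {
    return n;
}
```

Because the granularity differs (1:8 versus 1:1), the same application address maps to different shadow locations under the two schemes, which suggests why a single shared layout is not straightforward.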
Kostya Serebryany
2012-Jun-19 07:34 UTC
[LLVMdev] MemorySanitizer, a tool that finds uninitialized reads and more
I've just sent a code review request to llvm-commits.

--kcc
Kostya Serebryany
2012-Oct-16 06:48 UTC
[LLVMdev] MemorySanitizer, a tool that finds uninitialized reads and more
Hi again,

MemorySanitizer (msan) is now mature enough to bootstrap LLVM and run it without any additional tools. Msan has already found one bug in LLVM itself: http://llvm.org/bugs/show_bug.cgi?id=13929

Would anyone be willing to do a code review? (It was sent to llvm-commits: http://permalink.gmane.org/gmane.comp.compilers.llvm.cvs/123253)

Thanks,
--kcc