On Tue, Oct 7, 2014 at 12:18 PM, David Blaikie <dblaikie at gmail.com> wrote:> > > On Tue, Oct 7, 2014 at 12:10 PM, David Blaikie <dblaikie at gmail.com> wrote: > >> >> >> On Tue, Oct 7, 2014 at 11:48 AM, Peter Collingbourne <peter at pcc.me.uk> >> wrote: >> >>> On Tue, Oct 07, 2014 at 10:04:30AM -0700, David Blaikie wrote: >>> > Hi Peter, >>> > >>> > After discovering several bugs in ArgumentPromotion and >>> > DeadArgumentElimination where llvm::Functions were replaced with >>> similar >>> > functions (with the same name) to transform their type in some way, I >>> > started looking at all calls to llvm::Function::takeName to see if >>> there >>> > were any other debug info quality bugs in similar callers. >>> > >>> > One such caller is the DataFlowSanitizer, and I don't see any debug >>> info >>> > tests for this so I'm wondering what /should/ happen here. >>> > >>> > Is DFSan+DebugInfo something that matters? I assume so. >>> >>> It may be important in the future, but at the moment the dfsan runtime >>> library >>> does not make use of debug info. The debug info could still be useful for >>> regular debugging tasks though. >>> >> >> Yeah - that'd be the baseline. I assume some people probably care about >> being able to debug a dfsan binary. Not sure if your users have >> specifically highlighted this situation or come across the bugs I'm >> speculating about. >> >> >>> > It looks like DFSan is creating wrappers (in/around >>> > DataFlowSanitizer.cpp:680-700) - when it does this, should it update >>> the >>> > debug info for these functions? Or are these internal instrumentation >>> > functions & nothing to do with the code the user wrote? I can't quite >>> tell >>> > from the code. >>> >>> The functions created by that part of the code replace the original >>> functions, >>> so they should inherit the debug info for those functions. >>> >> >> Yep, that won't happen for free - we have to stitch it back into the >> debug info. >> >> >>> But the code below that can also create wrapper functions which do not >>> need >>> debug info (lines 712-746). Wrappers normally show up for uninstrumented >>> functions (e.g. main and many libc functions). >>> >>> > Could you provide any C/C++ source examples whis part of DFSan fires >>> > reliably, so I could experiment with some examples and see how the >>> debug >>> > info looks? >>> >>> This is an example of a program that exercises the replacement function >>> and >>> wrapper features. >>> >>> >>> -------------------------------------------------------------------------------- >>> #include <stddef.h> >>> #include <string.h> >>> >>> size_t len(size_t (*strlen_ptr)(const char *), const char *str) { >>> return strlen_ptr(str); >>> } >>> >>> int main(void) { >>> return len(strlen, "foo"); >>> } >>> >>> -------------------------------------------------------------------------------- >>> >>> In this example, 'len' is rewritten to 'dfs$len', 'main' keeps its >>> original >>> name (the pass treats it as an uninstrumented function), and wrappers are >>> created for 'main' and 'strlen' (the wrapper for 'main' is unused as the >>> C runtime calls the regular 'main' function directly). >>> >> >> OK, so when you say wrappers are created - where does the name go? "main" >> keeps the name? (it's important which way this transformation is done - if >> the guts of main are moved to a new function and the name moved as well, >> that's not the same as keeping the same function as far as debug info >> metadata is concerned) >> >> >>> I compile this with '-O0 -g'. A 'break main'/'run'/'break strlen'/'cont' >>> gives a relevant stack trace: >>> >>> #0 __strlen_sse2_pminub () at >>> ../sysdeps/x86_64/multiarch/strlen-sse2-pminub.S:33 >>> #1 0x00005555555587ff in __dfsw_strlen (s=0x55555556fe17 "foo", >>> s_label=<optimized out>, ret_label=0x7fffffffddee) >>> at llvm/projects/compiler-rt/lib/dfsan/dfsan_custom.cc:203 >>> #2 0x000055555556bbdc in dfsw$strlen () >>> #3 0x000055555556bb51 in len (strlen_ptr=0x55555556bbc0 <dfsw$strlen>, >>> str=0x55555556fe17 "foo") at strlen.c:5 >>> #4 0x000055555556bb96 in main () >>> >>> In this stack trace, #2 is the compiler-generated wrapper function for >>> strlen. >>> >>> It looks like the debug info for 'len' is preserved correctly, but I >>> don't >>> know why the debug info for 'main' is missing. >>> >> >> Yeah - not quite sure either. I'll poke around with it for a little bit. >> > > DataFlowSanitizer.cpp:719: F.replaceAllUsesWith(WrappedFnCst); > > replaces the debug info metadata's pointer to main with a pointer to > dfsw$main (this issue doesn't come up with DAE or ArgPromo because they > both don >... they both don't use RAUW because they have to visit each call site to do special stuff anyway. (they're replacing one function with another of a different type, RAUW isn't suitable) I /guess/ we never end up codegen'ing dfsw$main? (I haven't looked) and thus the debug info doesn't describe any function at all. It's possible that there's some other reason we don't end up describing any function at all... In any case if we did have "main" in the debug info describe a function, at this point, it would be describing the dfsw$main, not main main. So we'd need to handle remapping the debug info back to the original uninstrumented function anyway, I assume? (that's the expected behavior? That the uninstrumented function is the one described by debug info, not the instrumented wrapper?)> > >> >> >>> >>> Thanks, >>> -- >>> Peter >>> >> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141007/44a7a791/attachment.html>
On Tue, Oct 07, 2014 at 12:20:55PM -0700, David Blaikie wrote:> On Tue, Oct 7, 2014 at 12:18 PM, David Blaikie <dblaikie at gmail.com> wrote: > > > > > > > On Tue, Oct 7, 2014 at 12:10 PM, David Blaikie <dblaikie at gmail.com> wrote: > > > >> > >> > >> On Tue, Oct 7, 2014 at 11:48 AM, Peter Collingbourne <peter at pcc.me.uk> > >> wrote: > >> > >>> On Tue, Oct 07, 2014 at 10:04:30AM -0700, David Blaikie wrote: > >>> > Hi Peter, > >>> > > >>> > After discovering several bugs in ArgumentPromotion and > >>> > DeadArgumentElimination where llvm::Functions were replaced with > >>> similar > >>> > functions (with the same name) to transform their type in some way, I > >>> > started looking at all calls to llvm::Function::takeName to see if > >>> there > >>> > were any other debug info quality bugs in similar callers. > >>> > > >>> > One such caller is the DataFlowSanitizer, and I don't see any debug > >>> info > >>> > tests for this so I'm wondering what /should/ happen here. > >>> > > >>> > Is DFSan+DebugInfo something that matters? I assume so. > >>> > >>> It may be important in the future, but at the moment the dfsan runtime > >>> library > >>> does not make use of debug info. The debug info could still be useful for > >>> regular debugging tasks though. > >>> > >> > >> Yeah - that'd be the baseline. I assume some people probably care about > >> being able to debug a dfsan binary. Not sure if your users have > >> specifically highlighted this situation or come across the bugs I'm > >> speculating about. > >> > >> > >>> > It looks like DFSan is creating wrappers (in/around > >>> > DataFlowSanitizer.cpp:680-700) - when it does this, should it update > >>> the > >>> > debug info for these functions? Or are these internal instrumentation > >>> > functions & nothing to do with the code the user wrote? I can't quite > >>> tell > >>> > from the code. > >>> > >>> The functions created by that part of the code replace the original > >>> functions, > >>> so they should inherit the debug info for those functions. > >>> > >> > >> Yep, that won't happen for free - we have to stitch it back into the > >> debug info. > >> > >> > >>> But the code below that can also create wrapper functions which do not > >>> need > >>> debug info (lines 712-746). Wrappers normally show up for uninstrumented > >>> functions (e.g. main and many libc functions). > >>> > >>> > Could you provide any C/C++ source examples whis part of DFSan fires > >>> > reliably, so I could experiment with some examples and see how the > >>> debug > >>> > info looks? > >>> > >>> This is an example of a program that exercises the replacement function > >>> and > >>> wrapper features. > >>> > >>> > >>> -------------------------------------------------------------------------------- > >>> #include <stddef.h> > >>> #include <string.h> > >>> > >>> size_t len(size_t (*strlen_ptr)(const char *), const char *str) { > >>> return strlen_ptr(str); > >>> } > >>> > >>> int main(void) { > >>> return len(strlen, "foo"); > >>> } > >>> > >>> -------------------------------------------------------------------------------- > >>> > >>> In this example, 'len' is rewritten to 'dfs$len', 'main' keeps its > >>> original > >>> name (the pass treats it as an uninstrumented function), and wrappers are > >>> created for 'main' and 'strlen' (the wrapper for 'main' is unused as the > >>> C runtime calls the regular 'main' function directly). > >>> > >> > >> OK, so when you say wrappers are created - where does the name go? "main" > >> keeps the name? (it's important which way this transformation is done - if > >> the guts of main are moved to a new function and the name moved as well, > >> that's not the same as keeping the same function as far as debug info > >> metadata is concerned)In this case the function keeps its original name and the wrapper is created separately.> >>> I compile this with '-O0 -g'. A 'break main'/'run'/'break strlen'/'cont' > >>> gives a relevant stack trace: > >>> > >>> #0 __strlen_sse2_pminub () at > >>> ../sysdeps/x86_64/multiarch/strlen-sse2-pminub.S:33 > >>> #1 0x00005555555587ff in __dfsw_strlen (s=0x55555556fe17 "foo", > >>> s_label=<optimized out>, ret_label=0x7fffffffddee) > >>> at llvm/projects/compiler-rt/lib/dfsan/dfsan_custom.cc:203 > >>> #2 0x000055555556bbdc in dfsw$strlen () > >>> #3 0x000055555556bb51 in len (strlen_ptr=0x55555556bbc0 <dfsw$strlen>, > >>> str=0x55555556fe17 "foo") at strlen.c:5 > >>> #4 0x000055555556bb96 in main () > >>> > >>> In this stack trace, #2 is the compiler-generated wrapper function for > >>> strlen. > >>> > >>> It looks like the debug info for 'len' is preserved correctly, but I > >>> don't > >>> know why the debug info for 'main' is missing. > >>> > >> > >> Yeah - not quite sure either. I'll poke around with it for a little bit. > >> > > > > DataFlowSanitizer.cpp:719: F.replaceAllUsesWith(WrappedFnCst); > > > > replaces the debug info metadata's pointer to main with a pointer to > > dfsw$main (this issue doesn't come up with DAE or ArgPromo because they > > both don > > > > ... they both don't use RAUW because they have to visit each call site to > do special stuff anyway. (they're replacing one function with another of a > different type, RAUW isn't suitable) > > I /guess/ we never end up codegen'ing dfsw$main? (I haven't looked) and > thus the debug info doesn't describe any function at all. It's possible > that there's some other reason we don't end up describing any function at > all... > > In any case if we did have "main" in the debug info describe a function, at > this point, it would be describing the dfsw$main, not main main. So we'd > need to handle remapping the debug info back to the original uninstrumented > function anyway, I assume? (that's the expected behavior? That the > uninstrumented function is the one described by debug info, not the > instrumented wrapper?)Yes, the debug info describes 'main', not 'dfsw$main'. So I agree that the references in the debug info would need to refer to 'main' instead of 'dfsw$main'. Not sure if there is a good way to do that. Perhaps we could manually RAUW and skip metadata, but I'm not sure if that would work if the debug metadata is not a direct user. Thanks, -- Peter
Here's a basic patch which would solve it in sort of the same way as the other optimizations I was fixing (just special case the debug info & fix it up). I can work up a test case for this as well, or you can, if you like/this seems reasonable. On Tue, Oct 7, 2014 at 2:30 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:> On Tue, Oct 07, 2014 at 12:20:55PM -0700, David Blaikie wrote: > > On Tue, Oct 7, 2014 at 12:18 PM, David Blaikie <dblaikie at gmail.com> > wrote: > > > > > > > > > > > On Tue, Oct 7, 2014 at 12:10 PM, David Blaikie <dblaikie at gmail.com> > wrote: > > > > > >> > > >> > > >> On Tue, Oct 7, 2014 at 11:48 AM, Peter Collingbourne <peter at pcc.me.uk > > > > >> wrote: > > >> > > >>> On Tue, Oct 07, 2014 at 10:04:30AM -0700, David Blaikie wrote: > > >>> > Hi Peter, > > >>> > > > >>> > After discovering several bugs in ArgumentPromotion and > > >>> > DeadArgumentElimination where llvm::Functions were replaced with > > >>> similar > > >>> > functions (with the same name) to transform their type in some > way, I > > >>> > started looking at all calls to llvm::Function::takeName to see if > > >>> there > > >>> > were any other debug info quality bugs in similar callers. > > >>> > > > >>> > One such caller is the DataFlowSanitizer, and I don't see any debug > > >>> info > > >>> > tests for this so I'm wondering what /should/ happen here. > > >>> > > > >>> > Is DFSan+DebugInfo something that matters? I assume so. > > >>> > > >>> It may be important in the future, but at the moment the dfsan > runtime > > >>> library > > >>> does not make use of debug info. The debug info could still be > useful for > > >>> regular debugging tasks though. > > >>> > > >> > > >> Yeah - that'd be the baseline. I assume some people probably care > about > > >> being able to debug a dfsan binary. Not sure if your users have > > >> specifically highlighted this situation or come across the bugs I'm > > >> speculating about. > > >> > > >> > > >>> > It looks like DFSan is creating wrappers (in/around > > >>> > DataFlowSanitizer.cpp:680-700) - when it does this, should it > update > > >>> the > > >>> > debug info for these functions? Or are these internal > instrumentation > > >>> > functions & nothing to do with the code the user wrote? I can't > quite > > >>> tell > > >>> > from the code. > > >>> > > >>> The functions created by that part of the code replace the original > > >>> functions, > > >>> so they should inherit the debug info for those functions. > > >>> > > >> > > >> Yep, that won't happen for free - we have to stitch it back into the > > >> debug info. > > >> > > >> > > >>> But the code below that can also create wrapper functions which do > not > > >>> need > > >>> debug info (lines 712-746). Wrappers normally show up for > uninstrumented > > >>> functions (e.g. main and many libc functions). > > >>> > > >>> > Could you provide any C/C++ source examples whis part of DFSan > fires > > >>> > reliably, so I could experiment with some examples and see how the > > >>> debug > > >>> > info looks? > > >>> > > >>> This is an example of a program that exercises the replacement > function > > >>> and > > >>> wrapper features. > > >>> > > >>> > > >>> > -------------------------------------------------------------------------------- > > >>> #include <stddef.h> > > >>> #include <string.h> > > >>> > > >>> size_t len(size_t (*strlen_ptr)(const char *), const char *str) { > > >>> return strlen_ptr(str); > > >>> } > > >>> > > >>> int main(void) { > > >>> return len(strlen, "foo"); > > >>> } > > >>> > > >>> > -------------------------------------------------------------------------------- > > >>> > > >>> In this example, 'len' is rewritten to 'dfs$len', 'main' keeps its > > >>> original > > >>> name (the pass treats it as an uninstrumented function), and > wrappers are > > >>> created for 'main' and 'strlen' (the wrapper for 'main' is unused as > the > > >>> C runtime calls the regular 'main' function directly). > > >>> > > >> > > >> OK, so when you say wrappers are created - where does the name go? > "main" > > >> keeps the name? (it's important which way this transformation is done > - if > > >> the guts of main are moved to a new function and the name moved as > well, > > >> that's not the same as keeping the same function as far as debug info > > >> metadata is concerned) > > In this case the function keeps its original name and the wrapper is > created > separately. > > > >>> I compile this with '-O0 -g'. A 'break main'/'run'/'break > strlen'/'cont' > > >>> gives a relevant stack trace: > > >>> > > >>> #0 __strlen_sse2_pminub () at > > >>> ../sysdeps/x86_64/multiarch/strlen-sse2-pminub.S:33 > > >>> #1 0x00005555555587ff in __dfsw_strlen (s=0x55555556fe17 "foo", > > >>> s_label=<optimized out>, ret_label=0x7fffffffddee) > > >>> at llvm/projects/compiler-rt/lib/dfsan/dfsan_custom.cc:203 > > >>> #2 0x000055555556bbdc in dfsw$strlen () > > >>> #3 0x000055555556bb51 in len (strlen_ptr=0x55555556bbc0 > <dfsw$strlen>, > > >>> str=0x55555556fe17 "foo") at strlen.c:5 > > >>> #4 0x000055555556bb96 in main () > > >>> > > >>> In this stack trace, #2 is the compiler-generated wrapper function > for > > >>> strlen. > > >>> > > >>> It looks like the debug info for 'len' is preserved correctly, but I > > >>> don't > > >>> know why the debug info for 'main' is missing. > > >>> > > >> > > >> Yeah - not quite sure either. I'll poke around with it for a little > bit. > > >> > > > > > > DataFlowSanitizer.cpp:719: F.replaceAllUsesWith(WrappedFnCst); > > > > > > replaces the debug info metadata's pointer to main with a pointer to > > > dfsw$main (this issue doesn't come up with DAE or ArgPromo because they > > > both don > > > > > > > ... they both don't use RAUW because they have to visit each call site to > > do special stuff anyway. (they're replacing one function with another of > a > > different type, RAUW isn't suitable) > > > > I /guess/ we never end up codegen'ing dfsw$main? (I haven't looked) and > > thus the debug info doesn't describe any function at all. It's possible > > that there's some other reason we don't end up describing any function at > > all... > > > > In any case if we did have "main" in the debug info describe a function, > at > > this point, it would be describing the dfsw$main, not main main. So we'd > > need to handle remapping the debug info back to the original > uninstrumented > > function anyway, I assume? (that's the expected behavior? That the > > uninstrumented function is the one described by debug info, not the > > instrumented wrapper?) > > Yes, the debug info describes 'main', not 'dfsw$main'. So I agree that > the references in the debug info would need to refer to 'main' instead of > 'dfsw$main'. Not sure if there is a good way to do that. Perhaps we could > manually RAUW and skip metadata, but I'm not sure if that would work if the > debug metadata is not a direct user. > > Thanks, > -- > Peter >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141007/0673c94d/attachment.html> -------------- next part -------------- diff --git lib/Transforms/Instrumentation/DataFlowSanitizer.cpp lib/Transforms/Instrumentation/DataFlowSanitizer.cpp index 446bcf7..cb84fd6 100644 --- lib/Transforms/Instrumentation/DataFlowSanitizer.cpp +++ lib/Transforms/Instrumentation/DataFlowSanitizer.cpp @@ -51,6 +51,7 @@ #include "llvm/ADT/StringExtras.h" #include "llvm/Analysis/ValueTracking.h" #include "llvm/IR/Dominators.h" +#include "llvm/IR/DebugInfo.h" #include "llvm/IR/IRBuilder.h" #include "llvm/IR/InlineAsm.h" #include "llvm/IR/InstVisitor.h" @@ -243,6 +244,7 @@ class DataFlowSanitizer : public ModulePass { DFSanABIList ABIList; DenseMap<Value *, Function *> UnwrappedFnMap; AttributeSet ReadOnlyNoneAttrs; + DenseMap<const Function *, DISubprogram> FunctionDIs; Value *getShadowAddress(Value *Addr, Instruction *Pos); bool isInstrumented(const Function *F); @@ -573,6 +575,8 @@ bool DataFlowSanitizer::runOnModule(Module &M) { if (ABIList.isIn(M, "skip")) return false; + FunctionDIs = makeSubprogramMap(M); + if (!GetArgTLSPtr) { Type *ArgTLSTy = ArrayType::get(ShadowTy, 64); ArgTLS = Mod->getOrInsertGlobal("__dfsan_arg_tls", ArgTLSTy); @@ -725,6 +729,12 @@ bool DataFlowSanitizer::runOnModule(Module &M) { Value *WrappedFnCst ConstantExpr::getBitCast(NewF, PointerType::getUnqual(FT)); F.replaceAllUsesWith(WrappedFnCst); + + // Patch the pointer to LLVM function in debug info descriptor. + auto DI = FunctionDIs.find(&F); + if (DI != FunctionDIs.end()) + DI->second.replaceFunction(&F); + UnwrappedFnMap[WrappedFnCst] = &F; *i = NewF;