Robinson, Paul via llvm-dev
2016-Dec-01 19:08 UTC
[llvm-dev] Libfuzzer depending on uninitialized debug info
TL;DR: LibFuzzer appears to depend on debug-info source locations for whatever IR instrumentation it uses; however, that instrumentation does not have proper source locations attached to it, leading to potentially incorrect reporting. The short-term fix is to make sure the debug info it needs is actually set up; the long-term fix is not to rely on debug info, because some optimizations will (correctly) erase it.

The long version:

When Clang generates IR with debug info, one thing it does is attach a source location to most IR instructions. This source location (at least in principle) is carried through optimizations, SelectionDAG, MachineIR, assembler source, and ultimately ends up in the "line table" in the object file. The line table describes a mapping from the virtual addresses of instructions to source locations, which is very useful to debuggers and other tools.

Not all IR instructions have a source location attached to them. When that happens, no specific line-table record is emitted for any machine instruction produced from that IR instruction. In DWARF, that means you assume the instruction belongs to the same source location as the instruction that precedes it in memory.

This is a problem when the first instruction in a machine-basic-block has no explicit source location, because it implicitly inherits the source location of the last instruction of the basic block that precedes it in memory. That means, the source location is entirely at the mercy of block layout and other optimizations.

In effect, the source location for that instruction is UNINITIALIZED.

In r288283, I committed a patch that explicitly initialized the line number for some instructions to line 0. The DWARF spec says that line 0 means there is no specific source location for the instruction. Debuggers and other tools generally respond to this looking *forward* in the instruction stream to find the *next* instruction with an explicit non-0 location, rather than backward to the *previous* instruction with an explicit location.

This caused a libFuzzer test to fail, because it depended on seeing a real source location for something, and got line 0 instead. This tells me libFuzzer is depending on an uninitialized source location. Kostya backed out that patch for me, but we really want to have it for improved debugger single-stepping behavior.

I am unclear on what instrumentation the fuzzer is using, although the instructions for building it suggest it's ASAN instrumentation. Whatever it is, either the instrumentation should use its own source-location information scheme, or it should initialize the debug info that it is depending on.

Note that debug info is not necessarily reliable in the face of optimization. If two blocks with different source locations get merged, most likely the source location will be zeroed (and that's not my patch, that's optimization-specific behavior). Therefore, I would recommend that fuzzer/asan/whoever stop relying on debug info for source locations, if we want all that to work on optimized code.

In the short term it's probably easier to find places where the instrumentation is missing debug info, and add it. But that's not going to be reliable for optimized code.
--paulr
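
[Illustration] The inheritance rule described in this message can be sketched as a toy model. This is only a sketch under stated assumptions, not LLVM or DWARF-consumer code: the `Instr` class, the helper names, and the block contents (lines 10, 20, 30) are all hypothetical. An instruction whose line is `None` emits no line-table row and so takes whatever row precedes it in layout order; an explicit line 0 is stable across layouts, and a consumer may then resolve it by looking forward.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Instr:
    name: str
    # None = no source location attached (no line-table row emitted);
    # 0 = explicit "no specific source location"; >0 = real source line.
    line: Optional[int]


def attribute_lines(layout: List[List[Instr]]) -> Dict[str, Optional[int]]:
    """Walk blocks in memory order; an instruction without its own row
    is attributed to the most recent row that precedes it."""
    current = None
    attributed = {}
    for block in layout:
        for ins in block:
            if ins.line is not None:
                current = ins.line
            attributed[ins.name] = current
    return attributed


def resolve_line_zero(layout: List[List[Instr]]) -> Dict[str, Optional[int]]:
    """Model a consumer that, on line 0, looks forward to the next
    instruction attributed to a non-0 line (0 if there is none)."""
    flat = [ins for block in layout for ins in block]
    attributed = attribute_lines(layout)
    resolved = {}
    for i, ins in enumerate(flat):
        line = attributed[ins.name]
        if line == 0:
            later = (attributed[n.name] for n in flat[i + 1:])
            line = next((l for l in later if l), 0)
        resolved[ins.name] = line
    return resolved


def make_blocks(first_line_of_c: Optional[int]):
    """Hypothetical blocks; c1 is the first instruction of block c."""
    a = [Instr("a1", 10)]
    b = [Instr("b1", 20)]
    c = [Instr("c1", first_line_of_c), Instr("c2", 30)]
    return a, b, c


# c1 has no location: its reported line depends on block layout.
a, b, c = make_blocks(None)
print(attribute_lines([a, b, c])["c1"])  # 20
print(attribute_lines([b, a, c])["c1"])  # 10

# c1 explicitly at line 0: same answer in any layout, resolvable forward.
a, b, c = make_blocks(0)
print(attribute_lines([b, a, c])["c1"])    # 0
print(resolve_line_zero([b, a, c])["c1"])  # 30
```

In the first pair of calls the only thing that changes is block order, which is the sense in which the location is "uninitialized".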
Robinson, Paul via llvm-dev
2016-Dec-01 19:58 UTC
[llvm-dev] Libfuzzer depending on uninitialized debug info
> TL;DR: LibFuzzer appears to depend on debug-info source locations for
> whatever IR instrumentation it uses; however, that instrumentation does
> not have proper source locations attached to it, leading to potentially
> incorrect reporting. The short-term fix is to make sure the debug info
> it needs is actually set up; the long-term fix is not to rely on debug
> info, because some optimizations will (correctly) erase it.

Another way of looking at the problem is: The fuzzer (or the sanitizer instrumentation it is using) is depending on metadata for correctness, which is a no-no.
--paulr
Kostya Serebryany via llvm-dev
2016-Dec-01 22:53 UTC
[llvm-dev] Libfuzzer depending on uninitialized debug info
On Thu, Dec 1, 2016 at 11:08 AM, Robinson, Paul via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> TL;DR: LibFuzzer appears to depend on debug-info source locations for
> whatever IR instrumentation it uses; however, that instrumentation does
> not have proper source locations attached to it, leading to potentially
> incorrect reporting. The short-term fix is to make sure the debug info
> it needs is actually set up; the long-term fix is not to rely on debug
> info, because some optimizations will (correctly) erase it.

Why is this libFuzzer-specific?
We were just [un]lucky to detect the problem early with one of the libFuzzer tests that required debug info.
Any tool that needs debug info will suffer from the same problem. No?
Robinson, Paul via llvm-dev
2016-Dec-01 23:37 UTC
[llvm-dev] Libfuzzer depending on uninitialized debug info
It might be a wider problem than libfuzzer. I did want to raise the problem asap and libfuzzer is something we know has the problem. If it came across as "libfuzzer is evil" that was not my intent, sorry!
--paulr