thr3ads.net - llvm dev - [llvm-dev] Libfuzzer depending on uninitialized debug info [Dec 2016]


Robinson, Paul via llvm-dev

2016-Dec-02 02:28 UTC

[llvm-dev] Libfuzzer depending on uninitialized debug info

Hmmm that is a funny sequence.  I know the .cfi directives are represented as
pseudo-instructions, but they should not be causing us to emit .loc directives. 
They have no effect on the .text section so probably they should just be
excluded from emitting a location, same as DBG_VALUE is excluded.  Also I
believe the label there is unnecessary, but that's a separate issue.

Regarding "how do we find those problems": this is like "how do we
find all the bugs," and what we can do is come up with intelligent
approaches to finding where they are likely to hide.  For example, one
possibility is to audit all the places that call SetCurrentDebugLocation; my
grep through llvm/lib found 43 instances, which is not horrible.  We can make
sure that the SetInsertPoint/SetCurrentDebugLocation sequence is correct in all
those places.  If we can identify components that do depend on the debug line
table (like the fuzzer and sanitizers), then running a bunch of their tests with
-use-unknown-locations turned on by default might also help, after we address
the .cfi thing.

I can look into better handling of .cfi instructions and also do the
SetCurrentDebugLocation audit tomorrow.
--paulr

From: Kostya Serebryany [mailto:kcc at google.com]
Sent: Thursday, December 01, 2016 5:01 PM
To: Robinson, Paul
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] Libfuzzer depending on uninitialized debug info

Ok...

The particular instance of the problem can be solved with this patch in my code:

+      IRB.SetInsertPoint(Ins);
       IRB.SetCurrentDebugLocation(EntryLoc);
-      IRB.SetInsertPoint(Ins);

(apparently, SetInsertPoint invalidates the previous call to
SetCurrentDebugLocation)

But then there is another problem....

% cat dummy.c
void foo() {}

% clang -O -c -gmlt   -fsanitize-coverage=func,trace-pc-guard  -S dummy.c -o -
.LBB0_1:
        .loc    1 1 0                   # dummy.c:1:0
        pushq   %rax
.Lcfi0:
        .cfi_def_cfa_offset 16
        movl    $.L__sancov_gen_, %edi
        callq   __sanitizer_cov_trace_pc_guard

% clang -O -c -gmlt   -fsanitize-coverage=func,trace-pc-guard  -S dummy.c -mllvm -use-unknown-locations -o -

.LBB0_1:
        .loc    1 1 0 is_stmt 0         # dummy.c:1:0
        pushq   %rax
        .loc    1 0 0                   # :0:0
.Lcfi0:
        .cfi_def_cfa_offset 16
        .loc    1 1 0 is_stmt 1         # dummy.c:1:0
        movl    $.L__sancov_gen_, %edi
        callq   __sanitizer_cov_trace_pc_guard


Then, when I run addr2line on the resulting binary, some of the instructions get
this pesky ".loc    1 0 0" for some reason (I did not investigate yet).

I am pretty sure that every particular problem like this can be solved with a
simple patch, but how do we find those problems before the users get upset
enough to file a good bug report?


--kcc




On Thu, Dec 1, 2016 at 4:16 PM, Robinson, Paul <paul.robinson at sony.com> wrote:
There is already -mllvm -use-unknown-locations, which ought to trigger this.
Don't need my patch.
--paulr

From: Kostya Serebryany [mailto:kcc at google.com]
Sent: Thursday, December 01, 2016 4:08 PM
To: Robinson, Paul
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] Libfuzzer depending on uninitialized debug info



On Thu, Dec 1, 2016 at 3:37 PM, Robinson, Paul <paul.robinson at sony.com> wrote:
It might be a wider problem than libfuzzer.  I did want to raise the problem
asap and libfuzzer is something we know has the problem.
If it came across as "libfuzzer is evil", that was not my intent, sorry!
No, no, I did not mean you implied that :)
Just wanted to make sure everyone understands that this is not
libFuzzer-specific.

Looking at lib/Transforms/Instrumentation/SanitizerCoverage.cpp:
  DebugLoc EntryLoc;
  if (IsEntryBB) {
    if (auto SP = F.getSubprogram())
      EntryLoc = DebugLoc::get(SP->getScopeLine(), 0, SP);
...
  } else {
    EntryLoc = IP->getDebugLoc();
  }
  IRBuilder<> IRB(&*IP);
  IRB.SetCurrentDebugLocation(EntryLoc);

So, given this code, I assumed that the newly generated instructions have proper
debug info, and so far it worked.

I wonder if you can re-commit your changes under a flag, off by default, so that
everyone interested can play with it?


--paulr

From: Kostya Serebryany [mailto:kcc at google.com]
Sent: Thursday, December 01, 2016 2:53 PM
To: Robinson, Paul
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] Libfuzzer depending on uninitialized debug info



On Thu, Dec 1, 2016 at 11:08 AM, Robinson, Paul via llvm-dev <llvm-dev at lists.llvm.org> wrote:
TL;DR:  LibFuzzer appears to depend on debug-info source locations for
whatever IR instrumentation it uses; however, that instrumentation does
not have proper source locations attached to it, leading to potentially
incorrect reporting.  The short-term fix is to make sure the debug info
it needs is actually set up; the long-term fix is not to rely on debug
info, because some optimizations will (correctly) erase it.


Why is this libFuzzer-specific?
We were just [un]lucky to detect the problem early with one of the libFuzzer
tests that required debug info.

Any tool that needs debug info will suffer from the same problem. No?



The long version:

When Clang generates IR with debug info, one thing it does is attach a
source location to most IR instructions.  This source location (at least
in principle) is carried through optimizations, SelectionDAG, MachineIR,
assembler source, and ultimately ends up in the "line table" in the
object file.  The line table describes a mapping from the virtual
addresses of instructions to source locations, which is very useful to
debuggers and other tools.

Not all IR instructions have a source location attached to them.  When
that happens, no specific line-table record is emitted for any machine
instruction produced from that IR instruction.  In DWARF, that means you
assume the instruction belongs to the same source location as the
instruction that precedes it in memory.

This is a problem when the first instruction in a machine basic block has
no explicit source location, because it implicitly inherits the source
location of the last instruction of the basic block that precedes it in
memory.  That means the source location is entirely at the mercy of
block layout and other optimizations.

In effect, the source location for that instruction is UNINITIALIZED.
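To make the inheritance rule concrete, here is a minimal C++ sketch of a simplified line table. It is a toy model under stated assumptions: each instruction occupies one address, rows carry only an address and a line, and the names kNoLoc, Row, buildLineTable, lookupLine and layout are hypothetical, not LLVM or DWARF APIs. It shows that an instruction with no location of its own reports whatever line precedes it in memory, so the answer depends on block layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a DWARF line table (assumption: one address per instruction,
// rows hold only an address and a line; real rows also carry file, column,
// is_stmt, and so on).
constexpr int kNoLoc = -1;  // instruction has no source location attached

struct Row {
  std::size_t address;  // first address this row covers
  int line;             // source line from this address onward
};

// Emit a row only for instructions with an explicit location; unlocated
// instructions get no line-table record of their own.
std::vector<Row> buildLineTable(const std::vector<int>& instLines) {
  std::vector<Row> rows;
  for (std::size_t addr = 0; addr < instLines.size(); ++addr)
    if (instLines[addr] != kNoLoc) rows.push_back({addr, instLines[addr]});
  return rows;
}

// The last row at or before `addr` wins, so an unlocated instruction
// inherits the location of whatever precedes it in memory.
int lookupLine(const std::vector<Row>& rows, std::size_t addr) {
  int line = kNoLoc;
  std::size_t prev = 0;
  for (const Row& r : rows) {
    assert(r.address >= prev && "rows must be sorted by address");
    prev = r.address;
    if (r.address > addr) break;
    line = r.line;
  }
  return line;
}

// Concatenate basic blocks in the given order to simulate a block layout.
std::vector<int> layout(const std::vector<std::vector<int>>& blocks) {
  std::vector<int> code;
  for (const auto& b : blocks) code.insert(code.end(), b.begin(), b.end());
  return code;
}
```

For example, with blocks A = {10, 11}, B = {kNoLoc, 20} and C = {30}, the unlocated first instruction of B resolves to line 11 under the layout A, B, C (address 2) but to line 30 under the layout C, B, A (address 1), even though nothing about B itself changed.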

In r288283, I committed a patch that explicitly initialized the line
number for some instructions to line 0.  The DWARF spec says that line 0
means there is no specific source location for the instruction.  Debuggers
and other tools generally respond to this by looking *forward* in the
instruction stream to find the *next* instruction with an explicit non-0
location, rather than backward to the *previous* instruction with an
explicit location.

This caused a libFuzzer test to fail, because it depended on seeing a
real source location for something, and got line 0 instead.  This tells
me libFuzzer is depending on an uninitialized source location.  Kostya
backed out that patch for me, but we really want to have it for improved
debugger single-stepping behavior.
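The forward-looking treatment of line 0 can be sketched the same way. This assumes a flattened array holding the line for each instruction address, with 0 meaning "no specific location"; resolveForward and resolveBackward are hypothetical helpers that illustrate the general heuristic described above, not code taken from any particular debugger or symbolizer.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Assumption: lines[i] is the line-table line for the instruction at
// address i, and 0 means "no specific source location" as in DWARF.

// Forward heuristic: for a line-0 address, use the next instruction that
// has a real (non-0) line.
std::optional<unsigned> resolveForward(const std::vector<unsigned>& lines, std::size_t addr) {
  assert(addr < lines.size());
  for (std::size_t i = addr; i < lines.size(); ++i)
    if (lines[i] != 0) return lines[i];
  return std::nullopt;  // no located instruction follows
}

// Contrasting behavior: use the previous instruction with a real line.
std::optional<unsigned> resolveBackward(const std::vector<unsigned>& lines, std::size_t addr) {
  assert(addr < lines.size());
  for (std::size_t i = addr + 1; i-- > 0;)
    if (lines[i] != 0) return lines[i];
  return std::nullopt;  // no located instruction precedes
}
```

With lines {5, 0, 0, 7}, the instruction at address 1 resolves to line 7 looking forward but to line 5 looking backward, which is why an explicit line 0 changes what a tool reports compared with an implicitly inherited location.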

I am unclear on what instrumentation the fuzzer is using, although the
instructions for building it suggest it's ASAN instrumentation. Whatever
it is, either the instrumentation should use its own source-location
information scheme, or it should initialize the debug info that it is
depending on.

Note that debug info is not necessarily reliable in the face of
optimization.  If two blocks with different source locations get merged,
most likely the source location will be zeroed (and that's not my patch,
that's optimization-specific behavior).  Therefore, I would recommend
that fuzzer/asan/whoever stop relying on debug info for source locations,
if we want all that to work on optimized code.
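A toy illustration of why merged code can lose its location: the sketch below assumes a simplified merge rule in which identical locations survive and differing ones collapse to line 0. That rule and the names Loc, Inst, mergeLocs and mergeBlocks are assumptions for illustration only; LLVM's actual merging behavior is pass-specific and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Loc {
  unsigned line = 0;  // 0 = no specific source location
  unsigned col = 0;
};
bool operator==(Loc a, Loc b) { return a.line == b.line && a.col == b.col; }

struct Inst {
  std::string opcode;
  Loc loc;
};

// Simplified merge rule (an assumption, not LLVM's exact algorithm):
// identical locations survive, differing ones collapse to line 0.
Loc mergeLocs(Loc a, Loc b) { return a == b ? a : Loc{0, 0}; }

// Merge two blocks with identical opcode sequences into one, as a
// tail-merging style optimization might, combining per-instruction locations.
std::vector<Inst> mergeBlocks(const std::vector<Inst>& x, const std::vector<Inst>& y) {
  assert(x.size() == y.size());
  std::vector<Inst> out;
  for (std::size_t i = 0; i < x.size(); ++i) {
    assert(x[i].opcode == y[i].opcode);
    out.push_back({x[i].opcode, mergeLocs(x[i].loc, y[i].loc)});
  }
  return out;
}
```

Merging {call at 12:3, ret at 13:1} with {call at 27:3, ret at 13:1} keeps line 13 for the ret but leaves the call at line 0, so a tool that relies on the call's line table entry gets no useful source location for it.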

In the short term it's probably easier to find places where the
instrumentation is missing debug info, and add it.  But that's not going
to be reliable for optimized code.
--paulr

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161202/cff5a3fd/attachment-0001.html>

Robinson, Paul via llvm-dev

2016-Dec-02 17:39 UTC


[llvm-dev] Libfuzzer depending on uninitialized debug info

I looked through all the places that call SetCurrentDebugLocation().  Aside from
the one place you already found, there are some suspicious-looking sequences in
LoopVectorizer.cpp.  Other than that they look okay to me.

It turns out that `SetInsertPoint(Instruction *I)` automatically does
`SetCurrentDebugLocation(I->getDebugLoc())`, so the problem arises when you
don't want the same debug location as the insertion point.  Also, the
IRBuilder ctor that takes an Instruction* does SetInsertPoint(I), so some places
are calling SetCurrentDebugLocation redundantly, but that's not harmful
functionally.
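The ordering pitfall can be reproduced with a small self-contained model. Builder, Instruction and DebugLoc here are hypothetical stand-ins for llvm::IRBuilder, llvm::Instruction and llvm::DebugLoc, reduced to the single behavior described above (SetInsertPoint copies the insertion point's location, and the constructor calls SetInsertPoint); they are not the real LLVM classes.

```cpp
#include <cassert>

// Hypothetical stand-ins for llvm::DebugLoc, llvm::Instruction and
// llvm::IRBuilder, reduced to the behavior described above.
struct DebugLoc {
  unsigned line = 0;
};

struct Instruction {
  DebugLoc loc;
};

class Builder {
 public:
  // Like the IRBuilder constructor taking an Instruction*: calls SetInsertPoint.
  explicit Builder(Instruction* ip) { SetInsertPoint(ip); }

  // Also adopts the insertion point's location, overwriting any location
  // set earlier with SetCurrentDebugLocation.
  void SetInsertPoint(Instruction* ip) {
    assert(ip != nullptr);
    insertPoint_ = ip;
    currentLoc_ = ip->loc;
  }

  void SetCurrentDebugLocation(DebugLoc loc) { currentLoc_ = loc; }

  Instruction* GetInsertPoint() const { return insertPoint_; }

  // Newly created instructions carry the builder's current location.
  Instruction Create() const { return Instruction{currentLoc_}; }

 private:
  Instruction* insertPoint_ = nullptr;
  DebugLoc currentLoc_;
};
```

In this model, calling SetCurrentDebugLocation(entry) and then SetInsertPoint(&ins) leaves new instructions with the location of ins; swapping the two calls, as in the SanitizerCoverage patch earlier in the thread, keeps entry.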

I'll play with the CFI stuff later today.
--paulr


Robinson, Paul via llvm-dev

2016-Dec-03 01:42 UTC


[llvm-dev] Libfuzzer depending on uninitialized debug info

I've determined that the "pesky" .loc is indeed because of the
.cfi directive that comes immediately after it.  Some of the CFI instructions
have source locations, some don't.  But, emitting a source location for a
CFI instruction is inappropriate.  It's easy enough to ignore them.
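A rough model of the proposed CFI change: the sketch assumes an emitter that prints a .loc only when a location-tracking instruction's line changes, and treats pseudo-instructions (such as .cfi directives or DBG_VALUE) as either participating in that tracking or not. MInst and emit are hypothetical names, the output omits is_stmt and file/column details, and this is not how LLVM's AsmPrinter is actually structured.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified machine instruction for this sketch.
struct MInst {
  std::string text;  // what gets printed
  bool isPseudo;     // e.g. a .cfi directive or DBG_VALUE: no .text bytes
  unsigned line;     // 0 = no source location
};

// Print a ".loc" whenever a location-tracking instruction's line differs
// from the last one printed. With skipPseudoLocs, pseudo-instructions are
// still printed but never emit or change the current .loc.
std::vector<std::string> emit(const std::vector<MInst>& insts, bool skipPseudoLocs) {
  std::vector<std::string> out;
  long lastLine = -1;  // nothing emitted yet
  for (const MInst& mi : insts) {
    const bool tracksLoc = !(skipPseudoLocs && mi.isPseudo);
    if (tracksLoc && static_cast<long>(mi.line) != lastLine) {
      out.push_back(".loc 1 " + std::to_string(mi.line) + " 0");
      lastLine = static_cast<long>(mi.line);
    }
    out.push_back(mi.text);
  }
  return out;
}
```

Feeding it the pushq / .cfi_def_cfa_offset / movl / callq sequence from the earlier example, with the CFI instruction at line 0 and skipPseudoLocs = false, produces a stray ".loc 1 0 0" before the .cfi directive and a second ".loc 1 1 0" after it; with skipPseudoLocs = true, only the initial ".loc 1 1 0" is emitted.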

I propose we do 4 things: (1) commit the patch in SanitizerCoverage.cpp that you
found; (2) cause CFI instructions not to emit any .loc directives; (3) file a
bug to have someone audit LoopVectorizer.cpp to see whether it is using
SetCurrentDebugLocation in the right places; (4) reapply my "line 0"
patch, which will be the 3rd attempt.

I can do all of these if you like, or you can do the first one and I'll do
the others.  I will continue with this on Monday.
Thanks,
--paulr

