Kostya Serebryany via llvm-dev
2016-Dec-07 21:44 UTC
[llvm-dev] Debug Locations for Optimized Code
my 2c.

the sanitizers rely on debug info to produce human-readable error messages,
and I agree with Reid that it's unwise to have a parallel way of encoding
the source locations.

Well, we have something like this in the clang coverage already... Right?
(I never particularly liked this design decision).
But since the debug info is known to be unreliable it kind of made sense.
Grrr.
And since the coverage instrumentation is applied early (in clang) we can
do it.
asan/etc don't have this luxury.

The sanitizers do not actually rely hard on the correctness of debug info,
but lots of tests in compiler-rt expect the debug info to be sane.

If we break debug info in a way that affects the sanitizers two things may
happen:
a) some of the existing *san tests in compiler-rt will start failing.
That's usually easy to fix.
b) all tests will continue working but users will be getting less readable
reports -- and we will learn about it 6 months from the time of breakage.
That's less welcome, but I am not sure if we can do something here.

--kcc

On Wed, Dec 7, 2016 at 1:11 PM, Robinson, Paul via llvm-dev
<llvm-dev at lists.llvm.org> wrote:

> Is there a reason why we must only have one location for every
> instruction? If not, why not merge them and keep them all?
>
> Not a requirement - of course we could keep them all with some kind of
> ordered list and even potentially include a "this is the one we would've
> picked" info (eg: the first one's the one we would pick today, if we
> would've picked one rather than none) so we could be backwards compatible
> if desired.
>
> That would be a lot of engineering work to plumb through LLVM the notion
> of multiple debug locations, I think.
>
> I'm not sure how DWARF (or CodeView) and its consumers currently copes
> with multiple locations - it's probably technically possible to describe
> using the line table format (not sure if it's intentional/documented for
> that purpose), but existing consumers might have to be fixed not to trip
> over it.
>
> Technically the DWARF encoding of the line table does allow it, I've seen
> it happen, but not with the intent of describing two real source locations;
> it was by accident. (And was one of the things that prompted me to submit
> patch D27492.) I seriously doubt any DWARF consumer takes the trouble to
> look for it. It's really not clear how a debugger *should* respond to
> seeing two source locations for one instruction.
>
> --paulr
>
> *From:* David Blaikie [mailto:dblaikie at gmail.com]
> *Sent:* Wednesday, December 07, 2016 10:27 AM
> *To:* Hal Finkel; Robinson, Paul
> *Cc:* llvm-dev at lists.llvm.org
> *Subject:* Re: [llvm-dev] Debug Locations for Optimized Code
>
> On Wed, Dec 7, 2016 at 10:20 AM Hal Finkel <hfinkel at anl.gov> wrote:
>
> ----- Original Message -----
> > From: "Paul Robinson" <paul.robinson at sony.com>
> > To: "Hal Finkel" <hfinkel at anl.gov>, "David Blaikie" <dblaikie at gmail.com>
> > Cc: llvm-dev at lists.llvm.org
> > Sent: Wednesday, December 7, 2016 9:39:16 AM
> > Subject: RE: [llvm-dev] Debug Locations for Optimized Code
> >
> > >> I don't know what the right, if any, solution to this is - but I
> > >> thought I should bring it up in case you or anyone else wanted to
> > >> puzzle it over & see if the competing needs/desires might need to
> > >> be considered.
> > > One thing that I recall being discussed was changing the way that
> > > we set the is_stmt flag in the DWARF line-table information. As I
> > > understand it, we currently set this flag for the first instruction
> > > in any sequence that is on the same line. This is, in part, why the
> > > debugger appears to jump around when stepping through code with
> > > speculated instructions, etc. If we did not do this for
> > > out-of-place instructions, then we might be able to keep for
> > > debugging information for tools while still providing a reasonable
> > > debugging experience.
> >
> > When we are looking at a situation where an instruction is merely
> > *moved* from one place to another, retaining the source location and
> > having a less naïve statement-marking tactic could help the debugging
> > experience without perturbing other consumers (although one still
> > wonders whether profiles will get messed up in cases where e.g. a
> > loop invariant gets hoisted out of a cold loop into a hot
> > predecessor).
> >
> > When we are looking at a situation where two instructions are
> > *merged* or *combined* into one, and the original two instructions
> > had different source locations, that's a separate problem. In that
> > case there is no single correct source location for the new
> > instruction, and typically erasing the source location will give a
> > better debugging experience (also a less misleading profile).
>
> Is there a reason why we must only have one location for every
> instruction? If not, why not merge them and keep them all?
>
> Not a requirement - of course we could keep them all with some kind of
> ordered list and even potentially include a "this is the one we would've
> picked" info (eg: the first one's the one we would pick today, if we
> would've picked one rather than none) so we could be backwards compatible
> if desired.
>
> That would be a lot of engineering work to plumb through LLVM the notion
> of multiple debug locations, I think.
>
> I'm not sure how DWARF (or CodeView) and its consumers currently copes
> with multiple locations - it's probably technically possible to describe
> using the line table format (not sure if it's intentional/documented for
> that purpose), but existing consumers might have to be fixed not to trip
> over it.
>
> It'd certainly be cute/fun/nice to have the extra fidelity (though all
> extra fidelity also comes at a size cost to the IR and the resulting
> object/executable files).
>
> Not sure anyone's in a position to sign up for that work right now - but
> maybe someone is. (looks like Apple's making a bit of a push on optimized
> debug info quality at the moment)
>
> - David
>
> -Hal
>
> >
> > My personal opinion is that having sanitizers *rely* on debug info
> > for accurate source attribution is just asking for trouble. It
> > happens to work at -O0 but cannot be considered reliable in the face
> > of optimization.
> > IMO this is a fundamental design flaw; debug info is best-effort and
> > full of ambiguities, as shown above. Sanitizers need a more reliable
> > source-of-truth, i.e. they should encode source info into their own
> > instrumentation.
> >
> > --paulr
> >
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
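The two options discussed in the quoted thread for an instruction produced by merging two others (erase the location when the inputs disagree, or keep an ordered list of all of them) can be sketched roughly as follows. This is only a toy model: `Loc`, `merge_erase` and `merge_keep_all` are hypothetical names made up for illustration and are not LLVM types or APIs.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Loc:
    """Hypothetical stand-in for a debug location (not an LLVM type)."""
    file: str
    line: int


def merge_erase(a: Optional[Loc], b: Optional[Loc]) -> Optional[Loc]:
    """Keep the location only if both inputs agree; otherwise erase it."""
    return a if a == b else None


def merge_keep_all(a: Optional[Loc], b: Optional[Loc]) -> Tuple[Loc, ...]:
    """Keep every distinct location, in order.

    The first entry plays the role of "the one we would've picked" for a
    consumer that only understands a single location.
    """
    kept = []
    for loc in (a, b):
        if loc is not None and loc not in kept:
            kept.append(loc)
    return tuple(kept)


# Two instructions from different source lines get combined into one.
add_loc = Loc("f.c", 10)
mul_loc = Loc("f.c", 12)

erased = merge_erase(add_loc, mul_loc)        # no single correct answer
both = merge_keep_all(add_loc, mul_loc)       # ordered list of candidates
```

With the first policy the combined instruction ends up with no location at all; with the second it carries both, at the size and plumbing cost mentioned in the thread.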
Evgenii Stepanov via llvm-dev
2016-Dec-08 00:50 UTC
[llvm-dev] Debug Locations for Optimized Code
In fact, we are already using "parallel" debug info. Frame layout
descriptions encode line numbers for local variable declarations. They
don't include file names to keep object size under control, and we
can't really afford to add more duplicate debug info.

On Wed, Dec 7, 2016 at 1:44 PM, Kostya Serebryany via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> my 2c.
>
> the sanitizers rely on debug info to produce human-readable error messages,
> and I agree with Reid that it's unwise to have a parallel way of encoding
> the source locations.
>
> Well, we have something like this in the clang coverage already... Right?
> (I never particularly liked this design decision).
> But since the debug info is known to be unreliable it kind of made sense.
> Grrr.
> And since the coverage instrumentation is applied early (in clang) we can
> do it.
> asan/etc don't have this luxury.
>
> The sanitizers do not actually rely hard on the correctness of debug info,
> but lots of tests in compiler-rt expect the debug info to be sane.
>
> If we break debug info in a way that affects the sanitizers two things may
> happen:
> a) some of the existing *san tests in compiler-rt will start failing.
> That's usually easy to fix.
> b) all tests will continue working but users will be getting less readable
> reports -- and we will learn about it 6 months from the time of breakage.
> That's less welcome, but I am not sure if we can do something here.
>
> --kcc
>
> On Wed, Dec 7, 2016 at 1:11 PM, Robinson, Paul via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>>
>> Is there a reason why we must only have one location for every
>> instruction? If not, why not merge them and keep them all?
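Hal's is_stmt remark, quoted earlier in the thread, can be illustrated with a small model. The first function follows his description of the current behaviour ("As I understand it"): flag the first instruction of each run that shares a line. The second is one possible reading of his suggestion, under the assumption that out-of-place instructions are never flagged and do not end the surrounding run. `Instr` and its `moved` marker are hypothetical; nothing here is LLVM code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record.

    `moved` is an invented marker for instructions the optimizer hoisted
    or speculated away from their original position.
    """
    addr: int
    line: int
    moved: bool = False


def is_stmt_current(instrs: List[Instr]) -> List[bool]:
    """Flag the first instruction of every run that shares a source line."""
    flags = []
    prev_line: Optional[int] = None
    for ins in instrs:
        flags.append(ins.line != prev_line)
        prev_line = ins.line
    return flags


def is_stmt_skip_moved(instrs: List[Instr]) -> List[bool]:
    """Same rule, but out-of-place instructions are never flagged and are
    ignored when deciding where a run of one line starts."""
    flags = []
    prev_line: Optional[int] = None
    for ins in instrs:
        if ins.moved:
            flags.append(False)
            continue
        flags.append(ins.line != prev_line)
        prev_line = ins.line
    return flags


# Line 20's load was speculated into the middle of line 10's code.
code = [
    Instr(0x00, 10),
    Instr(0x04, 20, moved=True),
    Instr(0x08, 10),
    Instr(0x0C, 11),
    Instr(0x10, 20),
]

current = is_stmt_current(code)
proposed = is_stmt_skip_moved(code)
```

In this model the current rule yields a stepping point at every instruction (10, 20, 10, 11, 20), which is the "jumping around" effect, while the second rule yields stepping points only at 10, 11, 20. Every instruction still keeps its line, so tools that read the line table rather than step through it see the same attribution either way.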
Robinson, Paul via llvm-dev
2016-Dec-08 01:49 UTC
[llvm-dev] Debug Locations for Optimized Code
> -----Original Message-----
> From: Evgenii Stepanov [mailto:eugeni.stepanov at gmail.com]
> Sent: Wednesday, December 07, 2016 4:51 PM
> To: Kostya Serebryany
> Cc: Robinson, Paul; llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] Debug Locations for Optimized Code
>
> In fact, we are already using "parallel" debug info. Frame layout
> descriptions encode line numbers for local variable declarations. They
> don't include file names to keep object size under control, and we
> can't really afford to add more duplicate debug info.
>
> On Wed, Dec 7, 2016 at 1:44 PM, Kostya Serebryany via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > my 2c.
> >
> > the sanitizers rely on debug info to produce human-readable error
> > messages,

If the sanitizers already rely on the debug info line table, which already
has a file table, why can't frame layout descriptions use that as well?

--paulr
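The idea behind the question above, referring to a file by index into a shared table instead of repeating its name, can be sketched as follows. This is purely illustrative: `FileTable`, `LocalVar` and `describe` are hypothetical names, the paths are made up, and the sketch says nothing about how the sanitizers' frame layout descriptions or the DWARF file table are actually encoded.

```python
from typing import Dict, List, NamedTuple


class FileTable:
    """Hypothetical file table: each distinct file name is stored once."""

    def __init__(self) -> None:
        self.names: List[str] = []
        self._index: Dict[str, int] = {}

    def intern(self, name: str) -> int:
        """Return the index for `name`, adding it on first use."""
        if name not in self._index:
            self._index[name] = len(self.names)
            self.names.append(name)
        return self._index[name]


class LocalVar(NamedTuple):
    """Hypothetical frame-layout entry: a file index, not a file name."""
    name: str
    file_index: int
    line: int


files = FileTable()
frame = [
    LocalVar("buf", files.intern("src/parse.c"), 42),
    LocalVar("len", files.intern("src/parse.c"), 43),
    LocalVar("tmp", files.intern("include/util.h"), 7),
]


def describe(var: LocalVar) -> str:
    """Resolve the index back to a name only when a report is printed."""
    return f"{var.name} declared at {files.names[var.file_index]}:{var.line}"
```

Each entry then costs a small integer rather than a string, and the name is looked up only at reporting time. Whether the existing line-table file table could serve as that shared table is exactly the open question in the message above.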