thr3ads.net - llvm dev - [llvm-dev] [RFC][DWARF] Handling errors in DWARF .debug

If this information is useful, please help other people find it:
Share via:

Pavel Labath via llvm-dev

2019-Dec-17 08:38 UTC

[llvm-dev] [RFC][DWARF] Handling errors in DWARF .debug_line parsing

On 16/12/2019 23:47, David Blaikie via llvm-dev wrote:> 
> 
> On Mon, Dec 16, 2019 at 2:40 PM Jonas Devlieghere <jonas at
devlieghere.com
> <mailto:jonas at devlieghere.com>> wrote:
> 
>     Hey James,
> 
>     This sounds really interesting. A few things that come to mind:
> 
>      - I'm curious what kind of errors you'd be okay with in dump
mode but
>     not in consumer mode. From LLDB's perspective, we probably want to
>     extract as much information as possible from a malformed line table,
>     as long as we don't end up lying of course.
>      - I need to take another look at the code, but don't we already
have
>     something similar where the consumer can decide how to deal with these
>     kinds of issues? I'm bringing this up because I don't think we
really
>     need different parsing modes. I think we need to let the consumer
>     decide what to do with the potential errors. The verifier in dwarfdump
>     would presumably stop after the first error (~strict mode) while the
>     dumper would just move on. Would the parsing modes then be a set of
>     "presets" with regards to how to handle these issues?
>      - I'm in favor of creating these error messages lazily, especially
>     for LLDB where we care about responsiveness. However, that does
>     conflict with the behavior you want for the DWARF verifier. We
>     probably want a way to control this?
> 
> 
> For my part - I'd imagine the verifier would be an aggressive reader of
> DWARF - it'd use the same lazy/high quality error API, but the verifier
> code itself would try to walk all of the parts of the DWARF to manifest
> any lazy error cases, rather than needing any other codepath in the parser.
That would be my ideal setup as well (disclaimer: I work on lldb) --
have the parser provide sufficient information so that the verification
can be done from the "outside".

That is, if the goal is to have stronger verification of generated line
tables -- it's not fully clear (to me) whether that's your main
motivation for adding these checks.

regards,
pavel

James Henderson via llvm-dev

2019-Dec-17 09:48 UTC

head link

[llvm-dev] [RFC][DWARF] Handling errors in DWARF .debug_line parsing

Thanks for the comments. I should have done a bit more experimenting before
writing up this RFC, it seems :) After sending it off, I played around with
the existing code, tor refresh my memory of what I did a while back in
adding the Recoverable/Unrecoverable Error callbacks. These are written
with the intent that the client could decide how to handle errors, as Jonas
points out, making the additional strictness level a little unnecessary.
Unrecoverable errors stop the parsing, whereas recoverable allow it to
continue as best it can. I think the current parser is stricter than it
needs to be to provide useful information, so I'm going to put a patch up
soon to loosen it up a bit before I start adding more checks.

The Errors reported by the parser only contain strings, which means that
it's difficult for a client to use them in some other way, for example to
identify issues that it might need to know/completely ignore etc. Short of
adding extra data to the Error, such as an enum, it's unclear to me how to
improve that situation. Of course, if we did add the extra data, it would
allow LLDB and other clients to decide what errors to ignore/pay attention
to on a finer grained basis, which partially achieves the lazy creation
goal (only report the problems that apply to the things the client cares
about), but doesn't improve responsiveness since the whole program will
still be parsed.

Lazy creation of errors is tricky, at least for some cases, given the
current code state. The problem is that the parser reads the name table,
calculates the rows etc as it parses a program, and that's the only way of
getting this data through the current interface. The only laziness
available as things stand is to skip line tables that we don't want (e.g.
to find the line table at a given offset). This ignores the body of the
skipped programs. The alternative is to refactor the parser completely into
much smaller pieces, which provide access to the desired bits individually,
and delays reading until that point. I can definitely see benefits in this
approach, and I think it would solve the needs of things at both ends of
the scale (verifiers and dumpers) but it's a lot more work, which I'm
not
convinced I personally can take on, although I'd be happy to review it if
others fancy attempting that option.

James

On Tue, 17 Dec 2019 at 08:36, Pavel Labath via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> On 16/12/2019 23:47, David Blaikie via llvm-dev wrote:
> >
> >
> > On Mon, Dec 16, 2019 at 2:40 PM Jonas Devlieghere <jonas at
devlieghere.com
> > <mailto:jonas at devlieghere.com>> wrote:
> >
> >     Hey James,
> >
> >     This sounds really interesting. A few things that come to mind:
> >
> >      - I'm curious what kind of errors you'd be okay with in
dump mode
> but
> >     not in consumer mode. From LLDB's perspective, we probably
want to
> >     extract as much information as possible from a malformed line
table,
> >     as long as we don't end up lying of course.
> >      - I need to take another look at the code, but don't we
already have
> >     something similar where the consumer can decide how to deal with
> these
> >     kinds of issues? I'm bringing this up because I don't
think we really
> >     need different parsing modes. I think we need to let the consumer
> >     decide what to do with the potential errors. The verifier in
> dwarfdump
> >     would presumably stop after the first error (~strict mode) while
the
> >     dumper would just move on. Would the parsing modes then be a set
of
> >     "presets" with regards to how to handle these issues?
> >      - I'm in favor of creating these error messages lazily,
especially
> >     for LLDB where we care about responsiveness. However, that does
> >     conflict with the behavior you want for the DWARF verifier. We
> >     probably want a way to control this?
> >
> >
> > For my part - I'd imagine the verifier would be an aggressive
reader of
> > DWARF - it'd use the same lazy/high quality error API, but the
verifier
> > code itself would try to walk all of the parts of the DWARF to
manifest
> > any lazy error cases, rather than needing any other codepath in the
> parser.
>
> That would be my ideal setup as well (disclaimer: I work on lldb) --
> have the parser provide sufficient information so that the verification
> can be done from the "outside".
>
> That is, if the goal is to have stronger verification of generated line
> tables -- it's not fully clear (to me) whether that's your main
> motivation for adding these checks.
>
> regards,
> pavel
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20191217/e3f7b151/attachment.html>

Pavel Labath via llvm-dev

2019-Dec-17 10:07 UTC

head link

[llvm-dev] [RFC][DWARF] Handling errors in DWARF .debug_line parsing

Having recently done something similar for .debug_loc(lists) (a much
simpler data structure) I can certainly appreciate the complexity and
commitment necessary to pull something like that off.

Anyway, I just wanted to say that I think that adding new fields to the
returned Errors (or even creating completely new ErrorInfo types) seems
perfectly reasonable to me -- I think that's one of the main reasons for
introducing the Error class in the first place.

cheers,
pl

On 17/12/2019 10:48, James Henderson wrote:> Thanks for the comments. I should have done a bit more experimenting
> before writing up this RFC, it seems :) After sending it off, I played
> around with the existing code, tor refresh my memory of what I did a
> while back in adding the Recoverable/Unrecoverable Error callbacks.
> These are written with the intent that the client could decide how to
> handle errors, as Jonas points out, making the additional strictness
> level a little unnecessary. Unrecoverable errors stop the parsing,
> whereas recoverable allow it to continue as best it can. I think the
> current parser is stricter than it needs to be to provide useful
> information, so I'm going to put a patch up soon to loosen it up a bit
> before I start adding more checks.
> 
> The Errors reported by the parser only contain strings, which means that
> it's difficult for a client to use them in some other way, for example
> to identify issues that it might need to know/completely ignore etc.
> Short of adding extra data to the Error, such as an enum, it's unclear
> to me how to improve that situation. Of course, if we did add the extra
> data, it would allow LLDB and other clients to decide what errors to
> ignore/pay attention to on a finer grained basis, which partially
> achieves the lazy creation goal (only report the problems that apply to
> the things the client cares about), but doesn't improve responsiveness
> since the whole program will still be parsed.
> 
> Lazy creation of errors is tricky, at least for some cases, given the
> current code state. The problem is that the parser reads the name table,
> calculates the rows etc as it parses a program, and that's the only way
> of getting this data through the current interface. The only laziness
> available as things stand is to skip line tables that we don't want
> (e.g. to find the line table at a given offset). This ignores the body
> of the skipped programs. The alternative is to refactor the parser
> completely into much smaller pieces, which provide access to the desired
> bits individually, and delays reading until that point. I can definitely
> see benefits in this approach, and I think it would solve the needs of
> things at both ends of the scale (verifiers and dumpers) but it's a lot
> more work, which I'm not convinced I personally can take on, although
> I'd be happy to review it if others fancy attempting that option.
> 
> James
> 
> On Tue, 17 Dec 2019 at 08:36, Pavel Labath via llvm-dev
> <llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>> wrote:
> 
>     On 16/12/2019 23:47, David Blaikie via llvm-dev wrote:
>     >
>     >
>     > On Mon, Dec 16, 2019 at 2:40 PM Jonas Devlieghere
>     <jonas at devlieghere.com <mailto:jonas at devlieghere.com>
>     > <mailto:jonas at devlieghere.com <mailto:jonas at
devlieghere.com>>> wrote:
>     >
>     >     Hey James,
>     >
>     >     This sounds really interesting. A few things that come to
mind:
>     >
>     >      - I'm curious what kind of errors you'd be okay with
in dump
>     mode but
>     >     not in consumer mode. From LLDB's perspective, we probably
want to
>     >     extract as much information as possible from a malformed line
>     table,
>     >     as long as we don't end up lying of course.
>     >      - I need to take another look at the code, but don't we
>     already have
>     >     something similar where the consumer can decide how to deal
>     with these
>     >     kinds of issues? I'm bringing this up because I don't
think we
>     really
>     >     need different parsing modes. I think we need to let the
consumer
>     >     decide what to do with the potential errors. The verifier in
>     dwarfdump
>     >     would presumably stop after the first error (~strict mode)
>     while the
>     >     dumper would just move on. Would the parsing modes then be a
>     set of
>     >     "presets" with regards to how to handle these
issues?
>     >      - I'm in favor of creating these error messages lazily,
>     especially
>     >     for LLDB where we care about responsiveness. However, that
does
>     >     conflict with the behavior you want for the DWARF verifier. We
>     >     probably want a way to control this?
>     >
>     >
>     > For my part - I'd imagine the verifier would be an aggressive
>     reader of
>     > DWARF - it'd use the same lazy/high quality error API, but the
>     verifier
>     > code itself would try to walk all of the parts of the DWARF to
>     manifest
>     > any lazy error cases, rather than needing any other codepath in
>     the parser.
> 
>     That would be my ideal setup as well (disclaimer: I work on lldb) --
>     have the parser provide sufficient information so that the verification
>     can be done from the "outside".
> 
>     That is, if the goal is to have stronger verification of generated line
>     tables -- it's not fully clear (to me) whether that's your main
>     motivation for adding these checks.
> 
>     regards,
>     pavel
>     _______________________________________________
>     LLVM Developers mailing list
>     llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>     https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

llvm dev - Dec 2019 - [RFC][DWARF] Handling errors in DWARF .debug_line parsing

[llvm-dev] [RFC][DWARF] Handling errors in DWARF .debug_line parsing

[llvm-dev] [RFC][DWARF] Handling errors in DWARF .debug_line parsing

[llvm-dev] [RFC][DWARF] Handling errors in DWARF .debug_line parsing