Robinson, Paul via llvm-dev
2017-Apr-27 20:12 UTC
[llvm-dev] [DWARFv5] The new line-table section header
The next feature on my DWARF 5 list is the line-table header. While this is pretty easy to generate, it is a real bear to parse, so I thought I should let y'all know what I'm up to and why as I head out to the yak farm. Any thoughts and suggestions would be very much appreciated.

The v5 directory and file tables no longer have a fixed format; instead, we have a list of field descriptors followed by the fields for each entry in the directory or file table. Normally the directory table would have one descriptor:

    DW_LNCT_path, DW_FORM_string

This tells us each entry contains a pathname encoded as an inline string. (Which is essentially how the v4 directory table is encoded.) However, because of the FORM code, we now have whole new worlds of complication regarding where the actual string might be. We might have DW_FORM_strp which puts the actual string in the .debug_str section; eventually we could have DW_FORM_line_strp (pointing to .debug_line_str) or even DW_FORM_strx (indirecting through .debug_str_offsets).

Conveniently, we have the DWARFFormValue class which knows how to decode data based on what the form code is.

Inconveniently, DWARFFormValue assumes it is looking at a .debug_info section, and picks up its relocations from a DWARFUnit. But if we're using DWARFFormValue to decode data from .debug_line, then it needs a different relocation map.

It's only the string data that causes a problem; all the other kinds of data in the file table are constants, and retrieving constants with DWARFFormValue is no problem.

I think the right tactic is a "top-down" approach, starting by teaching DWARFDebugLine to parse a v5 line-table header but support only DW_FORM_string for the paths. This should let me use an unmodified DWARFFormValue to parse the directory and file tables.

From there, teaching DWARFFormValue to handle DW_FORM_strp from the .debug_line section should be pretty well motivated and it should be straightforward to see what's really needed in terms of the API.

Once we get that far, I would hope that the line_strp and strx<N> forms would not require much additional effort. Actually Wolfgang is separately working on the strx<N> forms so with any luck that would Just Work for the .debug_line section.

Oh yeah, after all that I'd actually generate the v5 header from LLVM. The idea is that by then, I can use llvm-dwarfdump to validate it and be very confident that it would all work.

Does all that sound like a plan? The alternative would be to try to teach DWARFFormValue to handle DW_FORM_strp from .debug_line up front, but I think we might rather go at this in smaller pieces.

Thanks,
--paulr
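
[Editorial illustration, not part of the original message.] The descriptor-driven layout described above can be sketched in a few lines. This is a minimal Python sketch, not LLVM's DWARFDebugLine code: it assumes the table starts with a one-byte format count, then ULEB128 (content type, form) pairs, then a ULEB128 entry count, and it mirrors the proposed first step by accepting only DW_FORM_string. The helper names (`read_uleb128`, `read_cstring`, `parse_entry_table`) are hypothetical.

```python
# Constants from the DWARF v5 encoding tables.
DW_LNCT_path = 0x1
DW_FORM_string = 0x08


def read_uleb128(data, pos):
    """Decode one unsigned LEB128 value; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_cstring(data, pos):
    """Read a NUL-terminated string; return (text, next_pos)."""
    end = data.index(b"\x00", pos)
    return data[pos:end].decode("utf-8"), end + 1


def parse_entry_table(data, pos=0):
    """Parse one v5 directory or file table.

    Returns (descriptors, entries, next_pos). Only inline strings are
    handled; any other form raises NotImplementedError.
    """
    format_count = data[pos]  # one unsigned byte
    pos += 1
    descriptors = []
    for _ in range(format_count):
        content_type, pos = read_uleb128(data, pos)
        form, pos = read_uleb128(data, pos)
        descriptors.append((content_type, form))

    entry_count, pos = read_uleb128(data, pos)
    entries = []
    for _ in range(entry_count):
        entry = {}
        # Every entry carries one field per descriptor, in descriptor order.
        for content_type, form in descriptors:
            if form != DW_FORM_string:
                raise NotImplementedError(f"form {form:#x} not handled yet")
            entry[content_type], pos = read_cstring(data, pos)
        entries.append(entry)
    return descriptors, entries, pos


# A hand-built directory table: one descriptor, two inline path entries.
SAMPLE_TABLE = bytes([1, DW_LNCT_path, DW_FORM_string, 2]) + b"/usr/src\x00include\x00"
```

Calling `parse_entry_table(SAMPLE_TABLE)` yields the single (DW_LNCT_path, DW_FORM_string) descriptor and one dictionary per directory entry. The early `NotImplementedError` marks exactly the spot where the other string forms would later plug in.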
David Blaikie via llvm-dev
2017-Apr-28 19:10 UTC
[llvm-dev] [DWARFv5] The new line-table section header
On Thu, Apr 27, 2017 at 1:12 PM Robinson, Paul via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> The next feature on my DWARF 5 list is the line-table header. While this
> is pretty easy to generate, it is a real bear to parse, so I thought I should
> let y'all know what I'm up to and why as I head out to the yak farm. Any
> thoughts and suggestions would be very much appreciated.

Thanks a bunch for sending this email! I'd love to see more like this when large pieces are undertaken in LLVM for just these reasons, so we can all get a sense of where things are aiming, the motivation, etc.

> The v5 directory and file tables no longer have a fixed format; instead,
> we have a list of field descriptors followed by the fields for each entry
> in the directory or file table. Normally the directory table would have
> one descriptor:
>     DW_LNCT_path, DW_FORM_string
> This tells us each entry contains a pathname encoded as an inline string.
> (Which is essentially how the v4 directory table is encoded.) However,
> because of the FORM code, we now have whole new worlds of complication
> regarding where the actual string might be. We might have DW_FORM_strp
> which puts the actual string in the .debug_str section; eventually we
> could have DW_FORM_line_strp (pointing to .debug_line_str)

What's DW_FORM_line_strp/.debug_line_str for? (So the line table can be kept while stripping the rest of the debug info, including its strings?)

> or even
> DW_FORM_strx (indirecting through .debug_str_offsets).
>
> Conveniently, we have the DWARFFormValue class which knows how to decode
> data based on what the form code is.
>
> Inconveniently, DWARFFormValue assumes it is looking at a .debug_info
> section, and picks up its relocations from a DWARFUnit. But if we're
> using DWARFFormValue to decode data from .debug_line, then it needs a
> different relocation map.

I'm going to assume there's going to be similar inconvenience on the other side (the emission side).

> It's only the string data that causes a problem; all the other kinds
> of data in the file table are constants, and retrieving constants
> with DWARFFormValue is no problem.
>
> I think the right tactic is a "top-down" approach, starting by teaching
> DWARFDebugLine to parse a v5 line-table header but support only
> DW_FORM_string for the paths. This should let me use an unmodified
> DWARFFormValue to parse the directory and file tables.

Any idea what form you'll be using for LLVM's emission? LLVM currently only emits strp - figure the same for the line table? Or more likely to use _string unconditionally?

In any case, if/when you have the right format support in llvm-dwarfdump, you could go ahead and implement the output code in LLVM's codegen, even before llvm-dwarfdump can handle every arcane format that any DWARF producer might decide to use. (And then you can continue implementing those, but it'd get you the LLVM functionality sooner, rather than gating it on having a fully general parser.) This approach has certainly been taken in the past: implementing enough dumping support as needed for LLVM's generation functionality and expanding as needed.

> From there, teaching DWARFFormValue to handle DW_FORM_strp from the
> .debug_line section should be pretty well motivated and it should be
> straightforward to see what's really needed in terms of the API.
>
> Once we get that far, I would hope that the line_strp and strx<N> forms
> would not require much additional effort. Actually Wolfgang is
> separately working on the strx<N> forms so with any luck that would
> Just Work for the .debug_line section.
>
> Oh yeah, after all that I'd actually generate the v5 header from LLVM.
> The idea is that by then, I can use llvm-dwarfdump to validate it and
> be very confident that it would all work.
>
> Does all that sound like a plan? The alternative would be to try to
> teach DWARFFormValue to handle DW_FORM_strp from .debug_line up front,
> but I think we might rather go at this in smaller pieces.
>
> Thanks,
> --paulr
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
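
[Editorial illustration, not part of the original message.] The string forms discussed in this exchange differ mainly in where the bytes live, which is why a decoder can grow one form at a time. The Python sketch below is a hedged illustration, not DWARFFormValue's API: it assumes 32-bit DWARF with little-endian 4-byte offsets, assumes offsets are already final (no relocations to apply, which is the hard part in the real reader), and uses a hypothetical `sections` dictionary mapping section names to raw bytes. DW_FORM_strx is deliberately left unsupported.

```python
import struct

# Form codes from the DWARF v5 encoding tables.
DW_FORM_string = 0x08
DW_FORM_strp = 0x0E
DW_FORM_strx = 0x1A
DW_FORM_line_strp = 0x1F

# Which string section each offset-based form points into.
OFFSET_FORM_SECTION = {
    DW_FORM_strp: ".debug_str",
    DW_FORM_line_strp: ".debug_line_str",
}


def read_cstring(data, pos):
    """Read a NUL-terminated string; return (text, next_pos)."""
    end = data.index(b"\x00", pos)
    return data[pos:end].decode("utf-8"), end + 1


def read_string_form(form, sections, pos):
    """Decode one string-valued field at `pos` in .debug_line.

    Returns (text, next_pos), where next_pos is the position in
    .debug_line just past the field.
    """
    line = sections[".debug_line"]
    if form == DW_FORM_string:
        # The string bytes sit inline in .debug_line.
        return read_cstring(line, pos)
    if form in OFFSET_FORM_SECTION:
        # The field is a 4-byte offset into another section
        # (assumption: 32-bit DWARF, little-endian, already relocated).
        (offset,) = struct.unpack_from("<I", line, pos)
        text, _ = read_cstring(sections[OFFSET_FORM_SECTION[form]], offset)
        return text, pos + 4
    # Forms not needed yet fail loudly and can be added as needed.
    raise NotImplementedError(f"form {form:#x} is not supported yet")
```

Keeping the unsupported case as an explicit error matches the incremental approach suggested above: the dumper handles what the producer emits today and is extended when a new form shows up.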
Robinson, Paul via llvm-dev
2017-Apr-28 22:49 UTC
[llvm-dev] [DWARFv5] The new line-table section header
> What's DW_FORM_line_strp/.debug_line_str for? (So the line table can be kept
> while stripping the rest of the debug info, including its strings?)

Exactly. In prior versions of DWARF the line-table strings were always embedded directly in .debug_line so it was possible to strip everything else. With v5 we wanted to make sure it was straightforward to keep doing that.

> > Inconveniently, DWARFFormValue assumes it is looking at a .debug_info
> > section, and picks up its relocations from a DWARFUnit. But if we're
> > using DWARFFormValue to decode data from .debug_line, then it needs a
> > different relocation map.
>
> I'm going to assume there's going to be similar inconvenience on the other
> side (the emission side).

I hope not. Emission of the .debug_line section is already prepared to conjure up relocations (to various .text sections) as needed, and I would anticipate that once we can get the line-table strings to come out in another section, emitting the corresponding relocations would be quite natural.

> Any idea what form you'll be using for LLVM's emission? LLVM currently only
> emits strp - figure the same for the line table? Or more likely to use
> _string unconditionally?

I'd start out with _string. LLVM currently only emits strp for actual attributes in the .debug_info section; but these pathname strings are (currently for v4) dumped directly into the .debug_line section, and by specifying FORM_string I can keep doing that. Also this is the form that the first round of dumper changes will be able to handle. ☺

Later on we can change emission to use _line_strp; that entails emitting the actual strings into a different section. Once we do that, it becomes possible for the linker to do string pooling on the path names (the original motivation for this more complicated header format).

I hope to be able to post the first patch next week.

Thanks!
--paulr
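
[Editorial illustration, not part of the original message.] The two emission strategies mentioned in this reply can be compared side by side. This Python sketch is an assumption-laden illustration, not LLVM's emitter: it writes final 4-byte little-endian offsets where a real producer would emit relocations, and it deduplicates within a single table only, whereas the pooling described above would be done by the linker across compilation units. The function names (`uleb128`, `emit_inline`, `emit_pooled`) are hypothetical.

```python
import struct

# Constants from the DWARF v5 encoding tables.
DW_LNCT_path = 0x1
DW_FORM_string = 0x08
DW_FORM_line_strp = 0x1F


def uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def emit_inline(paths):
    """Directory table with the path strings stored inline (DW_FORM_string)."""
    out = bytearray([1])  # one entry-format descriptor
    out += uleb128(DW_LNCT_path) + uleb128(DW_FORM_string)
    out += uleb128(len(paths))
    for path in paths:
        out += path.encode("utf-8") + b"\x00"
    return bytes(out)


def emit_pooled(paths):
    """Directory table using DW_FORM_line_strp.

    Returns (table_bytes, string_section_bytes). Each distinct path is
    stored once in the string section and referenced by offset.
    """
    pool = bytearray()
    offsets = {}
    out = bytearray([1])
    out += uleb128(DW_LNCT_path) + uleb128(DW_FORM_line_strp)
    out += uleb128(len(paths))
    for path in paths:
        if path not in offsets:
            offsets[path] = len(pool)
            pool += path.encode("utf-8") + b"\x00"
        out += struct.pack("<I", offsets[path])
    return bytes(out), bytes(pool)
```

With the inline form every occurrence of a path costs its full length in .debug_line; with the pooled form a repeated path costs one fixed-size offset, which is the saving string pooling is after.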