Oliver Stannard via llvm-dev
2019-Nov-26 16:50 UTC
[llvm-dev] [RFC] Displaying source variable locations in llvm-objdump
Hi llvm-dev,

I've uploaded a prototype patch at https://reviews.llvm.org/D70720 which adds a new feature to llvm-objdump: displaying the location (in registers/memory/etc) of source-level variables alongside the disassembly display. I've put a demo of the output at https://reviews.llvm.org/M2.

I have two use-cases in mind for this:
* Users reading the disassembly of compiled code. It will be quicker/easier to do this if the disassembly shows which value is in each register and stack slot, rather than the user having to reverse-engineer this by hand.
* Compiler developers, who can use it to understand the debug info emitted by the compiler, and spot missing or incorrect debug info. In fact, I've already spotted one LLVM bug while writing this patch: in the function `baz` in M2, the debug info claims that variable `a` is in `r0` between PC addresses 0x14 and 0x8, which isn't true.

My questions for the LLVM community are:
* Is this an acceptable change for llvm-objdump, or is this adding too much complexity to be worth it?
* The patch currently uses unicode box-drawing characters, is this OK? If not, what would people rather see? A plain ASCII version of this, or some completely different format?
* The patch displays DWARF expressions in an ad-hoc syntax, which is a mix of C and ARM assembly (square brackets for memory access). Is there an existing syntax which would be better for this? I think it's important that the common cases like "load 4 bytes from memory at SP+4" are displayed concisely.

Oliver
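To make the idea concrete, here is a minimal sketch in Python (not the code from D70720) of pairing each instruction with the variables whose location ranges cover its address. The `LocationRange` record, the instructions, and the addresses are all made up for illustration; the real patch reads this information from the DWARF debug info. Ranges are treated as half-open, as DWARF location list entries are.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationRange:
    """Hypothetical record: `name` lives in `location` for addresses in [low, high)."""
    name: str
    location: str
    low: int
    high: int


def live_at(ranges, address):
    """Return the ranges that cover `address` (half-open interval)."""
    return [r for r in ranges if r.low <= address < r.high]


def annotate(instructions, ranges):
    """Yield one text line per (address, text) pair, with live variables appended."""
    for address, text in instructions:
        live = ", ".join(f"{r.name} = {r.location}" for r in live_at(ranges, address))
        suffix = f"  ; {live}" if live else ""
        yield f"{address:4x}: {text}{suffix}"


# Made-up example data, for illustration only.
instructions = [
    (0x0, "push {r4, lr}"),
    (0x4, "mov r4, r0"),
    (0x8, "bl foo"),
    (0xC, "pop {r4, pc}"),
]
ranges = [
    LocationRange("a", "r0", 0x0, 0x4),
    LocationRange("a", "r4", 0x4, 0xC),
    LocationRange("b", "[sp+4]", 0x8, 0x10),
]

for line in annotate(instructions, ranges):
    print(line)
```

A display like this also makes suspicious debug info easier to spot, because a variable that is claimed to be in a register the code has just overwritten shows up right next to the offending instruction.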
Eric Christopher via llvm-dev
2019-Nov-26 17:12 UTC
[llvm-dev] [RFC] Displaying source variable locations in llvm-objdump
Hi Oliver,

This is really cool. I absolutely support this for llvm-objdump. As far as output I don't have any strong opinions other than it might be good to separate out the "drawing" code as much as possible from the variable collection and range code to make it a little easier, but that's about it from here.

Thanks for the work, can't wait to use it.

-eric
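One way to picture the separation Eric suggests is a drawing function that only sees plain data (a list of addresses and a list of address ranges) plus a swappable set of glyphs. The sketch below is an assumption about how that could look, not the structure of the actual patch; `Glyphs`, `draw_columns`, and the particular characters are hypothetical placeholders.

```python
from typing import NamedTuple


class Glyphs(NamedTuple):
    """Characters used to draw one column per live range (placeholder choices)."""
    start: str  # first row covered by a range
    body: str   # rows in the middle of a range
    end: str    # last row covered by a range
    blank: str  # row not covered by the range


UNICODE = Glyphs(start="┏", body="┃", end="┗", blank=" ")
ASCII = Glyphs(start="/", body="|", end="\\", blank=" ")


def draw_columns(addresses, ranges, glyphs):
    """Return one string per address, with one column per (low, high) range.

    The function knows nothing about DWARF or disassembly: collection code
    produces `addresses` and half-open `ranges`, and this code only draws.
    """
    covered = [[a for a in addresses if low <= a < high] for low, high in ranges]
    rows = []
    for address in addresses:
        cells = []
        for hits in covered:
            if address not in hits:
                cells.append(glyphs.blank)
            elif address == hits[0]:
                cells.append(glyphs.start)
            elif address == hits[-1]:
                cells.append(glyphs.end)
            else:
                cells.append(glyphs.body)
        rows.append("".join(cells))
    return rows


# Made-up addresses and ranges, for illustration only.
for row in draw_columns([0x0, 0x4, 0x8, 0xC], [(0x0, 0x8), (0x4, 0x10)], UNICODE):
    print(row)
```

Because the glyph set is just a parameter, switching between a Unicode and a plain ASCII rendering does not touch the collection side at all.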
Evgenii Stepanov via llvm-dev
2019-Nov-26 18:38 UTC
[llvm-dev] [RFC] Displaying source variable locations in llvm-objdump
Hi,

I like this a lot! I think the dwarf expression syntax is spot-on.
Jeremy Morse via llvm-dev
2019-Nov-27 17:26 UTC
[llvm-dev] [RFC] Displaying source variable locations in llvm-objdump
Hi Oliver,

On Tue, Nov 26, 2019 at 4:50 PM Oliver Stannard via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> I've uploaded a prototype patch at https://reviews.llvm.org/D70720 which adds a new feature to llvm-objdump: displaying the location (in registers/memory/etc) of source-level variables alongside the disassembly display. I've put a demo of the output at https://reviews.llvm.org/M2.

I haven't read the code yet, but the demo looks incredibly good, and I'd certainly find this feature useful on a daily basis. Many thanks for writing it!

Oliver wrote:
> * The patch currently uses unicode box-drawing characters, is this OK? If not, what would people rather see? A plain ASCII version of this, or some completely different format?

I enjoy a plain ASCII aesthetic myself, but I feel the extra detail is really contributing a lot, for example distinguishing the location range from the variable name connection (the former thick, the latter thin). IMHO, well worth keeping the unicode.

> * The patch displays DWARF expressions in an ad-hoc syntax, which is a mix of C and ARM assembly (square brackets for memory access). Is there an existing syntax which would be better for this? I think it's important that the common cases like "load 4 bytes from memory at SP+4" are displayed concisely.

I'm not aware of an existing syntax. When printing assembly, LLVM will add comments where variable ranges start, such as [0]. AFAIUI that only ever prints the base register, an initial memory deref (like [SP+4] as your demo shows), and the rest of the expression is printed as text/opcodes as here [1]. I reckon that outside of the two common cases you describe, it would be enough to flag that there's extra unshown expression to consider, by appending a star for example. The rest of the expression is easily accessible to a developer, and displaying expressions isn't the primary aim of the patch.

I'll get round to looking at the patch in a bit.

[0] https://github.com/llvm/llvm-project/blob/1433b1b6ec7e1c2b2a91d2070dcd88adf1aa9774/llvm/test/tools/llvm-symbolizer/frame-types.s#L99
[1] https://github.com/llvm/llvm-project/blob/abf25745b339700639a5d319551ed120a52fd753/llvm/test/tools/llvm-dwarfdump/X86/Inputs/statistics-fib.split-dwarf.s#L115

--
Thanks,
Jeremy
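The "append a star" idea can be sketched as follows. This is an illustration in Python under stated assumptions, not what the patch or LLVM's assembly printer does: a DWARF expression is modelled as a list of `(opcode_name, operand...)` tuples, `format_location` is a hypothetical helper, and the register-name table is a small ARM-flavoured example. Only the two common cases are printed in full (`DW_OP_regN` as a register, a leading `DW_OP_bregN offset` as a memory access); anything beyond that is flagged with `*`.

```python
def _numbered(name, prefix):
    """Return N for opcode names like `<prefix>N`, else None."""
    suffix = name[len(prefix):]
    if name.startswith(prefix) and suffix.isdigit():
        return int(suffix)
    return None


def format_location(ops, reg_names):
    """Render a DWARF location expression concisely.

    `ops` is a list of (opcode_name, operand...) tuples (a simplified model).
    A trailing "*" means part of the expression is not shown.
    """
    if not ops:
        return "<empty>"
    name, *operands = ops[0]
    rest = ops[1:]

    reg = _numbered(name, "DW_OP_reg")
    breg = _numbered(name, "DW_OP_breg")
    if reg is not None:
        # The variable lives in the register itself.
        text = reg_names[reg]
    elif breg is not None:
        # The variable lives in memory at register + signed offset.
        offset = operands[0]
        text = f"[{reg_names[breg]}{offset:+d}]" if offset else f"[{reg_names[breg]}]"
    else:
        # Not one of the two common cases: show only the marker.
        return "*"
    return text + ("*" if rest else "")


# Example register names (DWARF register 13 is SP on 32-bit ARM).
ARM_REGS = {0: "r0", 4: "r4", 13: "sp"}

print(format_location([("DW_OP_reg0",)], ARM_REGS))
print(format_location([("DW_OP_breg13", 4)], ARM_REGS))
print(format_location([("DW_OP_breg13", 4), ("DW_OP_deref",)], ARM_REGS))
```

A reader who sees the star knows to consult the full expression (for example with llvm-dwarfdump), while the common register and stack-slot cases stay short.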
Sean Silva via llvm-dev
2019-Nov-27 18:50 UTC
[llvm-dev] [RFC] Displaying source variable locations in llvm-objdump
This looks fantastic. It will be a big time saver for folks staring at assembly.

-- Sean Silva
Jameson Nash via llvm-dev
2019-Dec-10 00:58 UTC
[llvm-dev] [RFC] Displaying source variable locations in llvm-objdump
I agree with the others that this seems great! I think this information can be super helpful for users both in learning and skimming assembly code.

As a maintainer of an LLVM frontend (JuliaLang), I'm additionally interested in whether some bits of this make sense to end up in libLLVM itself. Probably especially the collection code pieces. For context, I've previously written some code to pretty-print the line-table information as code comments (sample https://gist.github.com/vtjnash/2f2b642663655d5fc63ec7321c5bd0bd, implementation https://github.com/JuliaLang/julia/blob/master/src/disasm.cpp#L167), and it's been on my mind ever since to figure out if some portion of that made sense to upstream, if any. And also to figure out how to parse and show the variable info alongside it.

So even if none of this PR ends up in the libLLVM library, I'd still plan to someday figure out which bits of this PR to copy into our AssemblyAnnotationWriter to show the variable info in our front-end also. But if it does get put in libLLVM, this capability seems like it could be useful for the other instruction printers too (e.g. IR and MIR). So I'd be interested to hear if you have any thoughts on what might make sense in a library, and any other opportunities where I could help collaborate. This shouldn't need to delay review and merging of your current PR though.

-jameson
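For readers unfamiliar with the line-table annotation mentioned above, the general technique can be sketched like this. It is a simplified illustration in Python, not Julia's `disasm.cpp` implementation: `annotate_lines` is a hypothetical helper, the line table is modelled as sorted `(address, file, line)` rows where each row applies until the next row's address, and the comment format is invented.

```python
import bisect


def annotate_lines(instructions, line_table):
    """Interleave `; file:line` comments with instructions.

    `instructions` is a sorted list of (address, text) pairs.
    `line_table` is a sorted list of (address, file, line) rows; a row
    applies from its address up to the address of the next row.
    A comment is emitted only when the source location changes.
    """
    starts = [row[0] for row in line_table]
    out = []
    last = None
    for address, text in instructions:
        index = bisect.bisect_right(starts, address) - 1
        location = line_table[index][1:] if index >= 0 else None
        if location is not None and location != last:
            out.append(f"; {location[0]}:{location[1]}")
            last = location
        out.append(f"    {text}")
    return out


# Made-up data, for illustration only.
instructions = [(0x0, "push"), (0x4, "mov"), (0x8, "bl"), (0xC, "pop")]
line_table = [(0x0, "f.c", 3), (0x8, "f.c", 4)]

for line in annotate_lines(instructions, line_table):
    print(line)
```

Variable locations could be looked up in the same pass, which is one reason shared collection code in a library might serve both kinds of annotation.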
Michael Spencer via llvm-dev
2019-Dec-11 21:40 UTC
[llvm-dev] [RFC] Displaying source variable locations in llvm-objdump
This is a great addition to llvm-objdump. My only concern is that llvm-objdump.cpp is already pretty complicated and in need of refactoring as it's had lots of small features added over the years. I'd really like to see the disassembly formatting stuff moved out to another file, but I'm not sure that should be a blocker.

While I really like the unicode, it won't work on Windows by default. It would be nice if we could detect if the terminal supported unicode, but I'm not sure there's actually a good way to do that.

- Michael Spencer
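As Michael says, there may be no fully reliable way to detect terminal support. One partial heuristic, sketched here in Python as an assumption rather than a recommendation for llvm-objdump, is to check whether the output stream's declared encoding can represent the box-drawing characters and fall back to ASCII otherwise. `can_encode` and `pick_style` are hypothetical names. The check says nothing about whether the terminal's font can actually display the glyphs, so an explicit user override would still be needed.

```python
import io
import sys

BOX_CHARS = "┃┏┗"  # sample heavy box-drawing characters


def can_encode(stream, text=BOX_CHARS):
    """True if the stream declares an encoding that can represent `text`."""
    encoding = getattr(stream, "encoding", None)
    if not encoding:
        return False
    try:
        text.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return False
    return True


def pick_style(stream=None):
    """Choose "unicode" or "ascii" for the given stream (default: sys.stdout)."""
    stream = sys.stdout if stream is None else stream
    return "unicode" if can_encode(stream) else "ascii"


# Streams with explicit encodings stand in for different terminals.
utf8_stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
legacy_stream = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")

print(pick_style(utf8_stream))
print(pick_style(legacy_stream))
```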