thr3ads.net - llvm dev - [llvm-dev] [RFC] Displaying source variable locations in llvm-objdump [Nov 2019]

Oliver Stannard via llvm-dev

2019-Nov-26 16:50 UTC

[llvm-dev] [RFC] Displaying source variable locations in llvm-objdump

Hi llvm-dev,

I've uploaded a prototype patch at https://reviews.llvm.org/D70720 which
adds a new feature to llvm-objdump: displaying the location (in
registers/memory/etc) of source-level variables alongside the disassembly
display. I've put a demo of the output at https://reviews.llvm.org/M2.

I have two use-cases in mind for this:
* Users reading the disassembly of compiled code. It will be quicker/easier
to do this if the disassembly shows which value is in each register and
stack slot, rather than the user having to reverse-engineer this by hand.
* Compiler developers, who can use it to understand the debug info emitted
by the compiler, and spot missing or incorrect debug info. In fact, I've
already spotted one LLVM bug while writing this patch: in the function
`baz` in M2, the debug info claims that variable `a` is in `r0` between PC
addresses 0x14 and 0x8, which isn't true.

My questions for the LLVM community are:
* Is this an acceptable change for llvm-objdump, or is this adding too much
complexity to be worth it?
* The patch currently uses Unicode box-drawing characters. Is this OK? If
not, what would people rather see: a plain ASCII version of this, or some
completely different format?
* The patch displays DWARF expressions in an ad-hoc syntax, which is a mix
of C and ARM assembly (square brackets for memory access). Is there an
existing syntax which would be better for this? I think it's important that
the common cases like "load 4 bytes from memory at SP+4" are displayed
concisely.

Oliver

Eric Christopher via llvm-dev

2019-Nov-26 17:12 UTC

Hi Oliver,

This is really cool. I absolutely support this for llvm-objdump. As
far as output goes, I don't have any strong opinions, other than that it
might be good to separate out the "drawing" code as much as possible from
the variable collection and range code to make it a little easier, but
that's about it from here.
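
To make the suggested split concrete, here is a minimal sketch of keeping the range lookup apart from the text output. It is not taken from D70720: `LiveRange`, `liveAt` and `renderAnnotation` are hypothetical names, and the half-open `[lowPC, highPC)` range is an assumption made for illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record produced by the "collection" stage: one variable,
// its already-formatted location, and the half-open range [lowPC, highPC).
struct LiveRange {
  std::string name;
  std::string location;
  uint64_t lowPC;
  uint64_t highPC;
};

// Collection-side query: which ranges cover the instruction at `pc`?
std::vector<const LiveRange *> liveAt(const std::vector<LiveRange> &ranges,
                                      uint64_t pc) {
  std::vector<const LiveRange *> live;
  for (const LiveRange &r : ranges)
    if (r.lowPC <= pc && pc < r.highPC)
      live.push_back(&r);
  return live;
}

// Drawing-side function: turns the query result into text. It knows
// nothing about debug info or how the ranges were collected.
std::string renderAnnotation(const std::vector<const LiveRange *> &live) {
  std::string out;
  for (const LiveRange *r : live) {
    if (!out.empty())
      out += ", ";
    out += r->name + " = " + r->location;
  }
  return out;
}
```

With this shape, a box-drawing renderer and a plain ASCII renderer could both consume the same `liveAt` result, and the collection side could be reused without any drawing code.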

Thanks for the work, can't wait to use it.

-eric

Evgenii Stepanov via llvm-dev

2019-Nov-26 18:38 UTC

Hi,

I like this a lot! I think the DWARF expression syntax is spot-on.

Jeremy Morse via llvm-dev

2019-Nov-27 17:26 UTC

Hi Oliver,

On Tue, Nov 26, 2019 at 4:50 PM Oliver Stannard via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> I've uploaded a prototype patch at https://reviews.llvm.org/D70720
> which adds a new feature to llvm-objdump: displaying the location (in
> registers/memory/etc) of source-level variables alongside the disassembly
> display. I've put a demo of the output at https://reviews.llvm.org/M2.

I haven't read the code yet, but the demo looks incredibly good, and
I'd certainly find this feature useful on a daily basis. Many thanks
for writing it!

Oliver wrote:
> * The patch currently uses unicode box-drawing characters, is this OK? If
> not, what would people rather see? A plain ASCII version of this, or some
> completely different format?

I enjoy a plain ASCII aesthetic myself, but I feel the extra detail is
really contributing a lot, for example distinguishing the location
range from the variable name connection (the former thick, the latter
thin). IMHO, well worth keeping the unicode.

> * The patch displays DWARF expressions in an ad-hoc syntax, which is a mix
> of C and ARM assembly (square brackets for memory access). Is there an existing
> syntax which would be better for this? I think it's important that the
> common cases like "load 4 bytes from memory at SP+4" are displayed
> concisely.

I'm not aware of any existing syntax. When printing assembly, LLVM will
add comments where variable ranges start, such as [0]. AFAIUI that only
ever prints the base register and an initial memory deref (like [SP+4], as
your demo shows), and the rest of the expression is printed as
text/opcodes, as here [1].

I reckon that outside of the two common cases you describe, it would
be enough to flag that there's extra unshown expression to consider,
by appending a star for example. The rest of the expression is easily
accessible to a developer, and displaying expressions isn't the
primary aim of the patch.
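
As a rough illustration of the star-suffix idea, here is a minimal sketch over a deliberately simplified expression model. It does not decode real DWARF bytes and is not what the patch does: `OpKind`, `Op` and `formatLocation` are hypothetical names, and printing a bare `*` when the first operation is not one of the two common cases is an assumption.

```cpp
#include <string>
#include <vector>

// Simplified, hypothetical model of a location expression. Real DWARF
// expressions are byte-encoded and have many more operations.
enum class OpKind { Register, MemoryAtRegOffset, Other };

struct Op {
  OpKind kind;
  std::string reg; // register name (Register, MemoryAtRegOffset)
  long offset;     // byte offset (MemoryAtRegOffset only)
};

// Print the two common cases concisely; flag anything left unshown
// with a trailing star.
std::string formatLocation(const std::vector<Op> &expr) {
  if (expr.empty())
    return "<empty>";
  const Op &first = expr.front();
  std::string text;
  switch (first.kind) {
  case OpKind::Register:
    text = first.reg; // value lives in a register, e.g. "r0"
    break;
  case OpKind::MemoryAtRegOffset:
    text = "[" + first.reg; // value lives in memory, e.g. "[sp+4]"
    if (first.offset > 0)
      text += "+" + std::to_string(first.offset);
    else if (first.offset < 0)
      text += std::to_string(first.offset); // already carries the '-'
    text += "]";
    break;
  case OpKind::Other:
    return "*"; // nothing concise to show (assumed behaviour)
  }
  if (expr.size() > 1)
    text += "*"; // more operations follow that are not displayed
  return text;
}
```

Under these assumptions, a register plus one further operation prints as `r0*`, which tells the reader to look up the full expression elsewhere if it matters.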

I'll get round to looking at the patch in a bit.

[0]
https://github.com/llvm/llvm-project/blob/1433b1b6ec7e1c2b2a91d2070dcd88adf1aa9774/llvm/test/tools/llvm-symbolizer/frame-types.s#L99
[1]
https://github.com/llvm/llvm-project/blob/abf25745b339700639a5d319551ed120a52fd753/llvm/test/tools/llvm-dwarfdump/X86/Inputs/statistics-fib.split-dwarf.s#L115

--
Thanks,
Jeremy

Sean Silva via llvm-dev

2019-Nov-27 18:50 UTC

This looks fantastic. It will be a big time saver for folks staring at
assembly.

— Sean Silva

Jameson Nash via llvm-dev

2019-Dec-10 00:58 UTC

I agree with the others that this seems great! I think this information can
be super helpful for users both in learning and in skimming assembly code.

As a maintainer of an LLVM frontend (JuliaLang), I'm additionally interested
in whether some bits of this make sense to end up in libLLVM itself,
probably especially the collection code. For context, I've
previously written some code to pretty-print the line-table information as
code comments (sample
https://gist.github.com/vtjnash/2f2b642663655d5fc63ec7321c5bd0bd,
implementation
https://github.com/JuliaLang/julia/blob/master/src/disasm.cpp#L167), and
it's been on my mind ever since to figure out whether some portion of that
would make sense to upstream. I'd also like to figure out how to parse and
show the variable info alongside it. So even if none of this PR ends up in
libLLVM, I'd still plan to someday figure out which bits of this PR to copy
into our AssemblyAnnotationWriter to show the variable info in our
front-end as well.

But if it does get put in libLLVM, this capability seems like it could be
useful for the other instruction printers too (e.g. IR and MIR). So I'd be
interested to hear if you have any thoughts on what might make sense in a
library, and any other opportunities where I could help collaborate. This
shouldn't need to delay review and merging of your current PR though.

-jameson

Michael Spencer via llvm-dev

2019-Dec-11 21:40 UTC

This is a great addition to llvm-objdump. My only concern is that
llvm-objdump.cpp is already pretty complicated and in need of refactoring,
as it's had lots of small features added over the years. I'd really like to
see the disassembly formatting stuff moved out to another file, but I'm not
sure that should be a blocker.

While I really like the unicode, it won't work on Windows by default. It
would be nice if we could detect if the terminal supported unicode, but I'm
not sure there's actually a good way to do that.
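
One possible fallback, sketched here under stated assumptions, is to guess from a POSIX-style locale name and otherwise draw with ASCII. This is only a heuristic: it says nothing about Windows consoles, and an explicit user-selectable option would likely be more reliable. `Glyphs`, `localeLooksUtf8` and `chooseGlyphs` are hypothetical names, and the ASCII stand-ins are arbitrary choices.

```cpp
#include <string>

// Glyphs used to draw one live-range column (hypothetical layout).
struct Glyphs {
  const char *vertical;
  const char *top;
  const char *bottom;
};

// Heuristic only: POSIX locale names usually contain "UTF-8" or "utf8"
// when the environment expects UTF-8 output.
bool localeLooksUtf8(const std::string &locale) {
  std::string lower;
  for (char c : locale)
    lower += static_cast<char>(c >= 'A' && c <= 'Z' ? c - 'A' + 'a' : c);
  return lower.find("utf-8") != std::string::npos ||
         lower.find("utf8") != std::string::npos;
}

// Pick box-drawing characters when Unicode output looks safe,
// otherwise fall back to plain ASCII stand-ins.
Glyphs chooseGlyphs(bool useUnicode) {
  if (useUnicode)
    return {"┃", "┏", "┗"};
  return {"|", "/", "\\"};
}
```

A caller might pass in the first non-empty value of the `LC_ALL`, `LC_CTYPE` and `LANG` environment variables (read with `std::getenv`) and use the result to pick a glyph set.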

- Michael Spencer