thr3ads.net - llvm dev - [LLVMdev] First-class debug info IR: MDLocation [Oct 2014]

If this information is useful, please help other people find it:
Share via:

Duncan P. N. Exon Smith

2014-Oct-24 23:16 UTC

[LLVMdev] First-class debug info IR: MDLocation

I've attached a preliminary patch for `MDLocation` as a follow-up to the
RFC [1] last week.  It's not commit-ready -- in particular, it squashes
a bunch of commits together and doesn't pass `make check` -- but I think
it's close enough to indicate the direction and work toward consensus.

[1]: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-October/077715.html

IMO, the files to focus on are:

    include/llvm/IR/DebugInfo.h
    include/llvm/IR/DebugLoc.h
    include/llvm/IR/Metadata.h
    include/llvm/IR/Value.h
    lib/AsmParser/LLLexer.cpp
    lib/AsmParser/LLParser.cpp
    lib/AsmParser/LLParser.h
    lib/AsmParser/LLToken.h
    lib/Bitcode/Reader/BitcodeReader.cpp
    lib/Bitcode/Writer/BitcodeWriter.cpp
    lib/Bitcode/Writer/ValueEnumerator.cpp
    lib/Bitcode/Writer/ValueEnumerator.h
    lib/IR/AsmWriter.cpp
    lib/IR/AsmWriter.h
    lib/IR/DebugInfo.cpp
    lib/IR/DebugLoc.cpp
    lib/IR/LLVMContextImpl.cpp
    lib/IR/LLVMContextImpl.h
    lib/IR/Metadata.cpp

Using `Value` instead of `MDNode`
================================
A number of APIs expect `MDNode` -- previously, the only referenceable
type of metadata -- but this patch (and the ones that will follow) have
referenceable metadata that *do not* inherit from `MDNode`.  Metadata
APIs such as `Instruction::getMetadata()` and
`NamedMDNode::getOperand()` need to return non-`MDNode` metadata.

I plan to commit the API changes incrementally so we can fix any issues
there before pushing the functionality changes.  Unfortunately, this
currently adds a lot of noise to the (squashed) patch.

Introducing `MDLocation`
=======================
Of course, this adds `MDLocation`, the first subclass of `MDUser`.  This
is a first-class IR type that has two other representations:
`DILocation` (which now trivially wraps `MDLocation` instead of
`MDNode`) and `DebugLoc`.

I've genericised the code in `LLParser` (and elsewhere) to sketch out
how adding other `MDUser` subclasses will go.  Perhaps I used the wrong
axis, but we can adjust it as we go.

Usage examples:

    !6 = metadata MDLocation(line: 43, column: 7, scope: !4)
    !7 = metadata MDLocation(scope: !5, line: 67, inlinedAt: !6)

The fields can be listed in any order.  The `scope:` field is required,
but the others are optional (`line:` and `column:` default to `0`,
`inlinedAt:` defaults to `null`).

(Note that in the RFC I referred to this as an `MDLineTable`, but
`MDLocation` is a better name.  If/when this work supersedes the
`DIDescriptor` hierarchy, it'll likely get renamed to `DILocation`, but
for now there's a name clash.)

Where this is heading
====================
Let's look at a concrete example.  Here's some simple C++ code:

    $ cat t.h
    struct T { short a; short b; };
    $ cat foo.cpp
    #include "t.h"
    int foo(T t) { return t.a + t.b; }
    $ cat bar.cpp
    #include "t.h"
    int foo(T t);
    int bar(T t) { return foo(t) * 2; }

Looking forward, after refactoring ownership and uniquing and fixing up
a few schema issues, I'd expect the above to link into something like
the following:

    !0 = metadata DIFile(filename: "foo.cpp", directory:
"/path/to")
    !1 = metadata DIFile(filename: "./t.h", directory:
"/path/to")
    !2 = metadata DIFile(filename: "bar.cpp", directory:
"/path/to")
    !3 = metadata DIBaseType(name: "short", size: 16, align: 16)
    !5 = metadata DIBaseType(name: "int", size: 32, align: 32)
    !6 = metadata DICompositeType(tag: 0x13, name: "T", uniqued:
"_ZTS1T",
                                  file: !1, line: 1, size: 32, align: 16)
    !7 = metadata DIMember(line: 1, file: !1, type: !3,
                           name: "a", size: 16, align: 16, context:
!6)
    !8 = metadata DIMember(line: 1, file: !1, type: !3,
                           name: "b", size: 16, align: 16, context:
!6)
    !9 = metadata DISubroutineType(args: [ !5, !6 ])
    !10 = metadata DICompileUnit(file: !0, language: 4, kind: FullDebug,
                                 producer: "clang version 3.6.0 ",
                                 retainedUniqueTypes: [ !6 ])
    !11 = metadata DISubprogram(name: "foo", linkageName:
"_Z3foo1T",
                                handle: i32(i32)* @_Z3foo1T, file: !0,
                                type: !9, context: !10)
    !12 = metadata DIArgVariable(name: "t", arg: 1, line: 2, type: !6,
                                 context: !11)
    !13 = metadata DILocation(line: 2, column: 11, scope: !11)
    !14 = metadata DILocation(line: 2, column: 16, scope: !11)
    !15 = metadata DICompileUnit(file: !2, language: 4, kind: FullDebug,
                                 producer: "clang version 3.6.0 ",
                                 retainedUniqueTypes: [ !6 ])
    !16 = metadata DISubprogram(name: "bar", linkageName:
"_Z3bar1T",
                                handle: i32 (i32)* @_Z3bar1T, file: !2,
                                type: !9, context: !15)
    !17 = metadata DIArgVariable(name: "t", arg: 2, line: 3, type: !6,
                                 context: !16)
    !18 = metadata DILocation(line: 3, column: 11, scope: !16)
    !19 = metadata DILocation(line: 3, column: 23, scope: !16)

Notice that only the links to parents (i.e., `context:`) are explicit
here -- backlinks are implied.  For example, !7 and !8 point to !6, but
not the reverse.

This has the interesting property of removing all cycles from
serialization (assembly and bitcode).

Making debug info assembly readable and writable
===============================================
Moreover, we're now in a place where it's trivial to express the
"context" pointer structurally.  Here's the same debug info as
above,
using syntactic sugar to fill the "context" pointers:

    !0 = metadata DIFile(filename: "foo.cpp", directory:
"/path/to")
    !1 = metadata DIFile(filename: "./t.h", directory:
"/path/to")
    !2 = metadata DIFile(filename: "bar.cpp", directory:
"/path/to")
    !3 = metadata DIBaseType(name: "short", size: 16, align: 16)
    !5 = metadata DIBaseType(name: "int", size: 32, align: 32)
    !6 = metadata DICompositeType(tag: 0x13, name: "T", uniqued:
"_ZTS1T",
                                  file: !1, line: 1, size: 32, align: 16) {
      !7 = metadata DIMember(line: 1, file: !1, type: !3,
                             name: "a", size: 16, align: 16)
      !8 = metadata DIMember(line: 1, file: !1, type: !3,
                             name: "b", size: 16, align: 16)
    } ; !6
    !9 = metadata DISubroutineType(args: [ !5, !6 ])
    !10 = metadata DICompileUnit(file: !0, language: 4, kind: FullDebug,
                                 producer: "clang version 3.6.0 ",
                                 retainedUniqueTypes: [ !6 ]) {
      !11 = metadata DISubprogram(name: "foo", linkageName:
"_Z3foo1T",
                                  handle: i32(i32)* @_Z3foo1T, file: !0,
                                  type: !9) {
        !12 = metadata DIArgVariable(name: "t", arg: 1, line: 2, type:
!6)
        !13 = metadata DILocation(line: 2, column: 11)
        !14 = metadata DILocation(line: 2, column: 16)
      } ; !11
    } ; !10
    !15 = metadata DICompileUnit(file: !2, language: 4, kind: FullDebug,
                                 producer: "clang version 3.6.0 ",
                                 retainedUniqueTypes: [ !6 ]) {
      !16 = metadata DISubprogram(name: "bar", linkageName:
"_Z3bar1T",
                                  handle: i32 (i32)* @_Z3bar1T, file: !2,
                                  type: !9) {
        !17 = metadata DIArgVariable(name: "t", arg: 2, line: 3, type:
!6)
        !18 = metadata DILocation(line: 3, column: 11)
        !19 = metadata DILocation(line: 3, column: 23)
      } ; !16
    } ; !15

This assembly has the following advantages over the status quo:

  - Fields are named.  Aside from readability, this prevents
    adding/reordering fields in the schema from requiring testcase
    updates.

  - Serialization graph becomes a DAG.  Aside from readability, this
    removes most RAUW from assembly (and all RAUW from bitcode).

  - Structure is clear.

Bike sheds to paint
==================
 1. Should we trim some boilerplate?  E.g., it would be trivial to
    change:

        !6 = metadata MDLocation(line: 43, column: 7, scope: !4)

    to:

        !6 = MDLocation(line: 43, column: 7, scope: !4)

    This would not complicate `LLParser`.  Thoughts?

 2. Which of the two "end goal" syntaxes is better: flat, or
    hierarchical?  Better for what?  Why?
    
    The flat one might be better for FileCheck-ing (not sure), but IMO
    the hierarchical one is much saner for us humans, and that's the
    main point of assembly.  It wouldn't be hard to default to one and
    write the other based on a command-line flag -- is that a good idea?

 3. Assembly syntax is pretty easy to change, so this doesn't have to be
    perfect now.  Nevertheless, is there a magical syntax that would be
    easier to read/write/FileCheck?

-------------- next part --------------
A non-text attachment was scrubbed...
Name: MDLocation-preview.patch
Type: application/octet-stream
Size: 608893 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20141024/2296c85c/attachment.obj>
-------------- next part --------------

David Blaikie

2014-Oct-24 23:37 UTC

head link

[LLVMdev] First-class debug info IR: MDLocation

On Fri, Oct 24, 2014 at 4:16 PM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:
> I've attached a preliminary patch for `MDLocation` as a follow-up to
the
> RFC [1] last week.  It's not commit-ready -- in particular, it squashes
> a bunch of commits together and doesn't pass `make check` -- but I
think
> it's close enough to indicate the direction and work toward consensus.
>
> [1]: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-October/077715.html
>
> IMO, the files to focus on are:
>
>     include/llvm/IR/DebugInfo.h
>     include/llvm/IR/DebugLoc.h
>     include/llvm/IR/Metadata.h
>     include/llvm/IR/Value.h
>     lib/AsmParser/LLLexer.cpp
>     lib/AsmParser/LLParser.cpp
>     lib/AsmParser/LLParser.h
>     lib/AsmParser/LLToken.h
>     lib/Bitcode/Reader/BitcodeReader.cpp
>     lib/Bitcode/Writer/BitcodeWriter.cpp
>     lib/Bitcode/Writer/ValueEnumerator.cpp
>     lib/Bitcode/Writer/ValueEnumerator.h
>     lib/IR/AsmWriter.cpp
>     lib/IR/AsmWriter.h
>     lib/IR/DebugInfo.cpp
>     lib/IR/DebugLoc.cpp
>     lib/IR/LLVMContextImpl.cpp
>     lib/IR/LLVMContextImpl.h
>     lib/IR/Metadata.cpp
>
> Using `Value` instead of `MDNode`
> ================================>
> A number of APIs expect `MDNode` -- previously, the only referenceable
> type of metadata -- but this patch (and the ones that will follow) have
> referenceable metadata that *do not* inherit from `MDNode`.  Metadata
> APIs such as `Instruction::getMetadata()` and
> `NamedMDNode::getOperand()` need to return non-`MDNode` metadata.
>
> I plan to commit the API changes incrementally so we can fix any issues
> there before pushing the functionality changes.  Unfortunately, this
> currently adds a lot of noise to the (squashed) patch.
>
> Introducing `MDLocation`
> =======================>
> Of course, this adds `MDLocation`, the first subclass of `MDUser`.  This
> is a first-class IR type that has two other representations:
> `DILocation` (which now trivially wraps `MDLocation` instead of
> `MDNode`) and `DebugLoc`.
>
> I've genericised the code in `LLParser` (and elsewhere) to sketch out
> how adding other `MDUser` subclasses will go.  Perhaps I used the wrong
> axis, but we can adjust it as we go.
>
> Usage examples:
>
>     !6 = metadata MDLocation(line: 43, column: 7, scope: !4)
>     !7 = metadata MDLocation(scope: !5, line: 67, inlinedAt: !6)
>
> The fields can be listed in any order.  The `scope:` field is required,
> but the others are optional (`line:` and `column:` default to `0`,
> `inlinedAt:` defaults to `null`).
>
> (Note that in the RFC I referred to this as an `MDLineTable`, but
> `MDLocation` is a better name.  If/when this work supersedes the
> `DIDescriptor` hierarchy, it'll likely get renamed to `DILocation`, but
> for now there's a name clash.)
>
> Where this is heading
> ====================>
> Let's look at a concrete example.  Here's some simple C++ code:
>
>     $ cat t.h
>     struct T { short a; short b; };
>     $ cat foo.cpp
>     #include "t.h"
>     int foo(T t) { return t.a + t.b; }
>     $ cat bar.cpp
>     #include "t.h"
>     int foo(T t);
>     int bar(T t) { return foo(t) * 2; }
>
> Looking forward, after refactoring ownership and uniquing and fixing up
> a few schema issues, I'd expect the above to link into something like
> the following:
>
>     !0 = metadata DIFile(filename: "foo.cpp", directory:
"/path/to")
>     !1 = metadata DIFile(filename: "./t.h", directory:
"/path/to")
>     !2 = metadata DIFile(filename: "bar.cpp", directory:
"/path/to")
>     !3 = metadata DIBaseType(name: "short", size: 16, align: 16)
>     !5 = metadata DIBaseType(name: "int", size: 32, align: 32)
>     !6 = metadata DICompositeType(tag: 0x13, name: "T", uniqued:
"_ZTS1T",
>                                   file: !1, line: 1, size: 32, align: 16)
>     !7 = metadata DIMember(line: 1, file: !1, type: !3,
>                            name: "a", size: 16, align: 16,
context: !6)
>     !8 = metadata DIMember(line: 1, file: !1, type: !3,
>                            name: "b", size: 16, align: 16,
context: !6)
>     !9 = metadata DISubroutineType(args: [ !5, !6 ])
>     !10 = metadata DICompileUnit(file: !0, language: 4, kind: FullDebug,
>                                  producer: "clang version 3.6.0
",
>                                  retainedUniqueTypes: [ !6 ])
>     !11 = metadata DISubprogram(name: "foo", linkageName:
"_Z3foo1T",
>                                 handle: i32(i32)* @_Z3foo1T, file: !0,
>                                 type: !9, context: !10)
>     !12 = metadata DIArgVariable(name: "t", arg: 1, line: 2,
type: !6,
>                                  context: !11)
>     !13 = metadata DILocation(line: 2, column: 11, scope: !11)
>     !14 = metadata DILocation(line: 2, column: 16, scope: !11)
>     !15 = metadata DICompileUnit(file: !2, language: 4, kind: FullDebug,
>                                  producer: "clang version 3.6.0
",
>                                  retainedUniqueTypes: [ !6 ])
>     !16 = metadata DISubprogram(name: "bar", linkageName:
"_Z3bar1T",
>                                 handle: i32 (i32)* @_Z3bar1T, file: !2,
>                                 type: !9, context: !15)
>     !17 = metadata DIArgVariable(name: "t", arg: 2, line: 3,
type: !6,
>                                  context: !16)
>     !18 = metadata DILocation(line: 3, column: 11, scope: !16)
>     !19 = metadata DILocation(line: 3, column: 23, scope: !16)
>
> Notice that only the links to parents (i.e., `context:`) are explicit
> here -- backlinks are implied.  For example, !7 and !8 point to !6, but
> not the reverse.
>
This may be a problem - the difference between nodes in a
structure/class_type's member list, and those that are not in the member
list but refer to the class/structure as their parent is meaningful. Type
units use this distinction to avoid emitting instantiation-specific data
into the canonical type unit (nested classes, implicit special members,
member template instantiations, etc).

Not sure what the right answer will be there.

>
> This has the interesting property of removing all cycles from
> serialization (assembly and bitcode).
>
> Making debug info assembly readable and writable
> ===============================================>
> Moreover, we're now in a place where it's trivial to express the
> "context" pointer structurally.  Here's the same debug info
as above,
> using syntactic sugar to fill the "context" pointers:
>
>     !0 = metadata DIFile(filename: "foo.cpp", directory:
"/path/to")
>     !1 = metadata DIFile(filename: "./t.h", directory:
"/path/to")
>     !2 = metadata DIFile(filename: "bar.cpp", directory:
"/path/to")
>     !3 = metadata DIBaseType(name: "short", size: 16, align: 16)
>     !5 = metadata DIBaseType(name: "int", size: 32, align: 32)
>     !6 = metadata DICompositeType(tag: 0x13, name: "T", uniqued:
"_ZTS1T",
>                                   file: !1, line: 1, size: 32, align: 16) {
>       !7 = metadata DIMember(line: 1, file: !1, type: !3,
>                              name: "a", size: 16, align: 16)
>       !8 = metadata DIMember(line: 1, file: !1, type: !3,
>                              name: "b", size: 16, align: 16)
>     } ; !6
>     !9 = metadata DISubroutineType(args: [ !5, !6 ])
>     !10 = metadata DICompileUnit(file: !0, language: 4, kind: FullDebug,
>                                  producer: "clang version 3.6.0
",
>                                  retainedUniqueTypes: [ !6 ]) {
>       !11 = metadata DISubprogram(name: "foo", linkageName:
"_Z3foo1T",
>                                   handle: i32(i32)* @_Z3foo1T, file: !0,
>                                   type: !9) {
>         !12 = metadata DIArgVariable(name: "t", arg: 1, line: 2,
type: !6)
>         !13 = metadata DILocation(line: 2, column: 11)
>         !14 = metadata DILocation(line: 2, column: 16)
>       } ; !11
>     } ; !10
>     !15 = metadata DICompileUnit(file: !2, language: 4, kind: FullDebug,
>                                  producer: "clang version 3.6.0
",
>                                  retainedUniqueTypes: [ !6 ]) {
>       !16 = metadata DISubprogram(name: "bar", linkageName:
"_Z3bar1T",
>                                   handle: i32 (i32)* @_Z3bar1T, file: !2,
>                                   type: !9) {
>         !17 = metadata DIArgVariable(name: "t", arg: 2, line: 3,
type: !6)
>         !18 = metadata DILocation(line: 3, column: 11)
>         !19 = metadata DILocation(line: 3, column: 23)
>       } ; !16
>     } ; !15
>
> This assembly has the following advantages over the status quo:
>
>   - Fields are named.  Aside from readability, this prevents
>     adding/reordering fields in the schema from requiring testcase
>     updates.
>
>   - Serialization graph becomes a DAG.  Aside from readability, this
>     removes most RAUW from assembly (and all RAUW from bitcode).
>
>   - Structure is clear.
>
> Bike sheds to paint
> ==================>
>  1. Should we trim some boilerplate?  E.g., it would be trivial to
>     change:
>
>         !6 = metadata MDLocation(line: 43, column: 7, scope: !4)
>
>     to:
>
>         !6 = MDLocation(line: 43, column: 7, scope: !4)
>
>     This would not complicate `LLParser`.  Thoughts?
>
>  2. Which of the two "end goal" syntaxes is better: flat, or
>     hierarchical?  Better for what?  Why?
>
>     The flat one might be better for FileCheck-ing (not sure), but IMO
>     the hierarchical one is much saner for us humans, and that's the
>     main point of assembly.  It wouldn't be hard to default to one and
>     write the other based on a command-line flag -- is that a good idea?
>
I don't think the flat form will be particularly more compelling for
testing. We test the hierarchical DWARF dumping all the time - doing so
pedantically does require littering CHECK-NOT: DW_TAG|NULL in a fair few
places, but even without those the tests aren't /too/ bad.

If checking hierarchical data is a problem, chances are we just want to
improve FileCheck until it isn't, since we already have hierarchies we have
to check.

>
>  3. Assembly syntax is pretty easy to change, so this doesn't have to
be
>     perfect now.  Nevertheless, is there a magical syntax that would be
>     easier to read/write/FileCheck?
>
Presumably when dumping the fields will come in a specific, defined order?
(probably not preserving the original source order, etc) Variation there
would probably make checking harder than it needs to be.

Would it be possible to omit the names of unreferenced nested metadata? (if
you have a bunch of member functions in a class, but don't need to refer to
them elsewhere (eg: those member functions aren't defined in this
translation unit)) - that'd help readability/writeability, but probably
wouldn't impact FileCheck.

Also the whole "metadata " prefix on anything is a bit verbose, if we
can
omit that in some/many places, that'll help reduce the visual noise/improve
readability/writeability.

But most/all of that I imagine is fairly easily incremental
change/improvement, nothing fundamental that needs to be chosen up-front.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20141024/358bbd69/attachment.html>

Rafael Espíndola

2014-Oct-27 02:30 UTC

head link

[LLVMdev] First-class debug info IR: MDLocation

> Bike sheds to paint
> ==================>
>  1. Should we trim some boilerplate?  E.g., it would be trivial to
>     change:
>
>         !6 = metadata MDLocation(line: 43, column: 7, scope: !4)
>
>     to:
>
>         !6 = MDLocation(line: 43, column: 7, scope: !4)
>
>     This would not complicate `LLParser`.  Thoughts?
I do prefer the simpler version. There is not a lot of value to saying
"metadata" all the time.
>  2. Which of the two "end goal" syntaxes is better: flat, or
>     hierarchical?  Better for what?  Why?
>
>     The flat one might be better for FileCheck-ing (not sure), but IMO
>     the hierarchical one is much saner for us humans, and that's the
>     main point of assembly.  It wouldn't be hard to default to one and
>     write the other based on a command-line flag -- is that a good idea?
We use an hierarchical structure in llvm-readobj, and it works really
well with FileCheck, so I would say hierarchical.
>  3. Assembly syntax is pretty easy to change, so this doesn't have to
be
>     perfect now.  Nevertheless, is there a magical syntax that would be
>     easier to read/write/FileCheck?
Cheers,
Rafael

Chandler Carruth

2014-Oct-27 07:58 UTC

head link

[LLVMdev] First-class debug info IR: MDLocation

I haven't been able to follow all of the thread that got us here but your
patch below has distilled the result enough for me to at least ask
questions.

I'm sorry of some of the justification is buried in the thread and I'm
just
making you repeat it, but I suspect I'm not the only one that would benefit
from the rationale being summarized here.

On Fri, Oct 24, 2014 at 4:16 PM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:
> Using `Value` instead of `MDNode`
> ================================>
> A number of APIs expect `MDNode` -- previously, the only referenceable
> type of metadata -- but this patch (and the ones that will follow) have
> referenceable metadata that *do not* inherit from `MDNode`.  Metadata
> APIs such as `Instruction::getMetadata()` and
> `NamedMDNode::getOperand()` need to return non-`MDNode` metadata.
>
To me, this change is a red flag and points out a bit of a lie in the
subject line: this is not actually first-class debug-info IR. This is just
making debug info become special metadata with special encoding properties.

Note, I'm actually ok with us having special metadata that has special
encoding properties. But if we're going that route, I don't think that
there is anything "debug info" centric about it, and it shouldn't
be
described as such. I also think the relationship of MDUser, MDNode, and
MDString need to be clarified a great deal. Why doesn't getMetadata return
an 'MDUser*' for example?

It feels as though you really want to sink the current functionality of
MDNode down to some subclass of a more generic metadata IR type? Maybe I'm
misunderstanding?

I also have to ask because I can't currently see it: what does debug info
being metadata buy us? How much code is simplified by that, and at what
cost?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20141027/d1679aa7/attachment.html>

Chandler Carruth

2014-Oct-27 08:19 UTC

head link

[LLVMdev] First-class debug info IR: MDLocation

Separate reply as the topics seem to have very little in common...

On Fri, Oct 24, 2014 at 4:16 PM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:
> Making debug info assembly readable and writable
> ===============================================>
> Moreover, we're now in a place where it's trivial to express the
> "context" pointer structurally.  Here's the same debug info
as above,
> using syntactic sugar to fill the "context" pointers:
>
FWIW, this doesn't make a huge difference to me in terms of readability
other than avoiding ordering problems....

Bike sheds to paint> ==================>
>  1. Should we trim some boilerplate?  E.g., it would be trivial to
>     change:
>
>         !6 = metadata MDLocation(line: 43, column: 7, scope: !4)
>
>     to:
>
>         !6 = MDLocation(line: 43, column: 7, scope: !4)
>
>     This would not complicate `LLParser`.  Thoughts?
>
If it is metadata, it should use 'metadata'. The boiler plate isn't
the
problem here IMO.

>  2. Which of the two "end goal" syntaxes is better: flat, or
>     hierarchical?  Better for what?  Why?
>
Some points that might simplify things:

- The largest overhead left for humans (once the fields are named and
semantically de-obfuscated) are IMO: lack of symbolic constants from DWARF
and the lack of locality with the referencing instruction.

- I think there is likely a better inflection point in the tradeoff space
between normalization and duplication. For example, I would be happy to see
line and column repeated for every instruction *on* the instruction. Every
time you save the reader an indirection through some "!019243" (which
is
totally unremarkable and hard to track) you win unless the size of the
input is greatly changed.

- The conflict between humans and FileCheck is not as bad as I think you
imagine. We have well established techniques for handling cases where what
the IR contains isn't useful for FileCheck, and/or what would be useful for
FileCheck is terribly cumbersome to write. We have the printer inject
comments which can then be used in FileCheck. I think the same technique
would tremendously help both human *readers* (but not writers) and
FileCheck with location information. My suggestion: print out a comment
line of the form "# location: ..." for all the indirected information
every
time that indirected information changes, and printed *before* the first
instruction with the new indirected information. If both line or column are
attached directly to the instruction, this gives just a comment at the
start of each function and after each file change within the function body.
Enough to form reasonably bracketed FileCheck but not overly verbose I
suspect.

- Once you've had an indirection from the actual IR structure to the
debuginfo structure embedded down in the metadata, I agree that a
structural form looks better.

Hope these thoughts help formulate even better looking IR. Not specifically
trying to change any part of this patch or any other single patch.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20141027/bd2f8faa/attachment.html>

Duncan P. N. Exon Smith

2014-Oct-27 16:22 UTC

head link

[LLVMdev] First-class debug info IR: MDLocation

> On 2014-Oct-24, at 16:37, David Blaikie <dblaikie at gmail.com>
wrote:
> 
>> Notice that only the links to parents (i.e., `context:`) are explicit
>> here -- backlinks are implied.  For example, !7 and !8 point to !6, but
>> not the reverse.
> 
> This may be a problem - the difference between nodes in a
structure/class_type's member list, and those that are not in the member
list but refer to the class/structure as their parent is meaningful. Type units
use this distinction to avoid emitting instantiation-specific data into the
canonical type unit (nested classes, implicit special members, member template
instantiations, etc).
> 
> Not sure what the right answer will be there.
Interesting.  Could we model that with a Boolean flag on the child?

Also, would there be value in modelling type units more explicitly in
the IR?
> Presumably when dumping the fields will come in a specific, defined order?
(probably not preserving the original source order, etc) Variation there would
probably make checking harder than it needs to be.
Right.
> 
> Would it be possible to omit the names of unreferenced nested metadata? (if
you have a bunch of member functions in a class, but don't need to refer to
them elsewhere (eg: those member functions aren't defined in this
translation unit)) - that'd help readability/writeability, but probably
wouldn't impact FileCheck.
Certainly possible, but is it generally desirable?  I guess we'll
sort that out when I get there.
> 
> Also the whole "metadata " prefix on anything is a bit verbose,
if we can omit that in some/many places, that'll help reduce the visual
noise/improve readability/writeability.
> 
> But most/all of that I imagine is fairly easily incremental
change/improvement, nothing fundamental that needs to be chosen up-front.

Duncan P. N. Exon Smith

2014-Oct-27 17:49 UTC

head link

[LLVMdev] First-class debug info IR: MDLocation

> On 2014-Oct-27, at 00:58, Chandler Carruth <chandlerc at google.com>
wrote:
> 
> I haven't been able to follow all of the thread that got us here but
your patch below has distilled the result enough for me to at least ask
questions.
Always takes a patch to draw people in :).
> I'm sorry of some of the justification is buried in the thread and
I'm just making you repeat it, but I suspect I'm not the only one that
would benefit from the rationale being summarized here.
> 
> On Fri, Oct 24, 2014 at 4:16 PM, Duncan P. N. Exon Smith <dexonsmith at
apple.com> wrote:
> Using `Value` instead of `MDNode`
> ================================> 
> A number of APIs expect `MDNode` -- previously, the only referenceable
> type of metadata -- but this patch (and the ones that will follow) have
> referenceable metadata that *do not* inherit from `MDNode`.  Metadata
> APIs such as `Instruction::getMetadata()` and
> `NamedMDNode::getOperand()` need to return non-`MDNode` metadata.
> 
> To me, this change is a red flag
This bothers me too -- which is why I highlighted it -- but I don't
see any alternatives.  It seems like a natural fallout of the rest of
the proposal.
> and points out a bit of a lie in the subject line: this is not actually
first-class debug-info IR. This is just making debug info become special
metadata with special encoding properties.
How special does it have to be to be labeled "first-class"?

IMO, the label makes sense here:  custom C++ type, bitcode, assembly,
uniquing, and ownership.  Doesn't seem any less "special" than,
say,
`AddInst`, but I don't really care what we call it.

Have I missed your point?  (Are you suggesting metadata is inherently
second-class?  How so?)
> Note, I'm actually ok with us having special metadata that has special
encoding properties. But if we're going that route, I don't think that
there is anything "debug info" centric about it, and it shouldn't
be described as such.
The infrastructure won't be debug info centric, but the scope of the
project certainly is.

I'll be making "first-class" types for each type of debug info we
use,
and changing their schema, ownership, and uniquing along the way.

I'm not going to touch any other metadata, although certainly if we
find that (e.g.) profile metadata has become a source of pain, we
could customize it in the future.
> I also think the relationship of MDUser, MDNode, and MDString need to be
clarified a great deal. Why doesn't getMetadata return an 'MDUser*'
for example?
In the class hierarchy:

  - Value -> MDString
  - Value -> MDNode
  - Value -> NamedMDNode
  - Value -> User -> MDUser -> ...

I named `MDUser` by its relationship with `User`, but maybe a better
name is `CustomMD`?  (Any suggestions?)

Here's a breakdown:

  - `MDString` is an arbitrary string that can be used as an operand.
    It's owned by an `LLVMContext` and treated like a constant.

  - `MDNode` is a generic node with arbitrary operands.  It can
    itself be used as an operand, and it can be attached to
    `Instruction`s.  Its operands don't "know" that it's using
them.
    It's owned by an `LLVMContext` and treated like a constant.

  - `NamedMDNode` is a generic node with arbitrary operands.  It cannot
    be used as an operand, and its operands must all be `MDNode` (or
    `MDUser`/`CustomMD`).  It's owned by a `Module`.

  - `MDUser`/`CustomMD` is a parent class for specific types of
    metadata.  They can be used as operands and attached to
    `Instruction`s.  Their metadata operands "know" that they're
being
    used, but they may have handles to non-metadata (which don't know).
    Their ownership is customized per subclass.

Also relevant (and kind of implied by the above): `Instruction` can
have arbitrary metadata attached to it, but it must be an `MDNode` (or
an `MDUser`/`CustomMD`).
> It feels as though you really want to sink the current functionality of
MDNode down to some subclass of a more generic metadata IR type? Maybe I'm
misunderstanding?
IMO, the overlap in functionality between `User` and `MDNode` precludes
having a (sane) inheritance relationship between `MDNode` and `MDUser`/
`CustomMD`.

Both `User` and `MDNode` implement support for an arbitrary number of
operands, but using completely incompatible mechanisms.

Since `MDNode` can reference an arbitrary number of arbitrary `Value`s,
it uses a subclass of `CallbackVH` called `MDNodeOperand` that costs
32B (x86-64).  Moreover, RAUW is expensive.

`MDUser`/`CustomMD` inherits from `User` so that its subclasses can
leverage the use-list infrastructure (8B per operand, fast RAUW).
> I also have to ask because I can't currently see it: what does debug
info being metadata buy us?
I suppose it buys us:

  - the guarantee that debug info doesn't modify optimizations or code
    generation, and

  - the flexibility for optimizations to ignore/drop it when they're
    not smart enough to update it.
> How much code is simplified by that, and at what cost?
I think that's hard to quantify.

I suppose the obvious alternative is to rewrite debug info from the
ground up, without using metadata at all.  I considered this, but
didn't find a compelling argument for it.  Main arguments against it:
it would be harder to implement incrementally, and it would increase
the amount of non-code IR.

Moreover, once we have specific subclasses and bitcode support for
debug info types, moving away from metadata (or even the `Value`
hierarchy entirely) would be an incremental step.

Do you have any specific alternatives mind?

Duncan P. N. Exon Smith

2014-Oct-27 18:13 UTC

head link

[LLVMdev] First-class debug info IR: MDLocation

> On 2014-Oct-27, at 01:19, Chandler Carruth <chandlerc at google.com>
wrote:
> 
> Separate reply as the topics seem to have very little in common...
> 
> On Fri, Oct 24, 2014 at 4:16 PM, Duncan P. N. Exon Smith <dexonsmith at
apple.com> wrote:
> Making debug info assembly readable and writable
> ===============================================> 
> Moreover, we're now in a place where it's trivial to express the
> "context" pointer structurally.  Here's the same debug info
as above,
> using syntactic sugar to fill the "context" pointers:
> 
> FWIW, this doesn't make a huge difference to me in terms of readability
other than avoiding ordering problems....
> 
> Bike sheds to paint
> ==================> 
>  1. Should we trim some boilerplate?  E.g., it would be trivial to
>     change:
> 
>         !6 = metadata MDLocation(line: 43, column: 7, scope: !4)
> 
>     to:
> 
>         !6 = MDLocation(line: 43, column: 7, scope: !4)
> 
>     This would not complicate `LLParser`.  Thoughts?
> 
> If it is metadata, it should use 'metadata'.
FWIW, `metadata` is implied by the context of `!6 =`.  I don't
see this as different from dropping `metadata` from the reference,
`!dbg !6`.
> The boiler plate isn't the problem here IMO.
Certainly not the biggest problem :).
>  2. Which of the two "end goal" syntaxes is better: flat, or
>     hierarchical?  Better for what?  Why?
> 
> Some points that might simplify things:
> 
> - The largest overhead left for humans (once the fields are named and
semantically de-obfuscated) are IMO: lack of symbolic constants from DWARF
Good point.  It wouldn't be hard to support proper names here
(like `DW_TAG_structure_type` instead of the `0x13` in my
example).
> and the lack of locality with the referencing instruction.
> 
> - I think there is likely a better inflection point in the tradeoff space
between normalization and duplication. For example, I would be happy to see line
and column repeated for every instruction *on* the instruction. Every time you
save the reader an indirection through some "!019243" (which is
totally unremarkable and hard to track) you win unless the size of the input is
greatly changed.
> 
> - The conflict between humans and FileCheck is not as bad as I think you
imagine. We have well established techniques for handling cases where what the
IR contains isn't useful for FileCheck, and/or what would be useful for
FileCheck is terribly cumbersome to write. We have the printer inject comments
which can then be used in FileCheck. I think the same technique would
tremendously help both human *readers* (but not writers) and FileCheck with
location information. My suggestion: print out a comment line of the form
"# location: ..." for all the indirected information every time that
indirected information changes, and printed *before* the first instruction with
the new indirected information. If both line or column are attached directly to
the instruction, this gives just a comment at the start of each function and
after each file change within the function body. Enough to form reasonably
bracketed FileCheck but not overly verbose I suspect.
Interesting idea!  This would be easy to do.
> 
> - Once you've had an indirection from the actual IR structure to the
debuginfo structure embedded down in the metadata, I agree that a structural
form looks better.
> 
> Hope these thoughts help formulate even better looking IR. Not specifically
trying to change any part of this patch or any other single patch.

Eric Christopher

2014-Nov-05 23:49 UTC

head link

[LLVMdev] First-class debug info IR: MDLocation

>
>
> Making debug info assembly readable and writable
> ===============================================>
> Moreover, we're now in a place where it's trivial to express the
> "context" pointer structurally.  Here's the same debug info
as above,
> using syntactic sugar to fill the "context" pointers:
>
>     !0 = metadata DIFile(filename: "foo.cpp", directory:
"/path/to")
>     !1 = metadata DIFile(filename: "./t.h", directory:
"/path/to")
>     !2 = metadata DIFile(filename: "bar.cpp", directory:
"/path/to")
>     !3 = metadata DIBaseType(name: "short", size: 16, align: 16)
>     !5 = metadata DIBaseType(name: "int", size: 32, align: 32)
>     !6 = metadata DICompositeType(tag: 0x13, name: "T", uniqued:
"_ZTS1T",
>                                   file: !1, line: 1, size: 32, align: 16) {
>       !7 = metadata DIMember(line: 1, file: !1, type: !3,
>                              name: "a", size: 16, align: 16)
>       !8 = metadata DIMember(line: 1, file: !1, type: !3,
>                              name: "b", size: 16, align: 16)
>     } ; !6
>     !9 = metadata DISubroutineType(args: [ !5, !6 ])
>     !10 = metadata DICompileUnit(file: !0, language: 4, kind: FullDebug,
>                                  producer: "clang version 3.6.0
",
>                                  retainedUniqueTypes: [ !6 ]) {
>       !11 = metadata DISubprogram(name: "foo", linkageName:
"_Z3foo1T",
>                                   handle: i32(i32)* @_Z3foo1T, file: !0,
>                                   type: !9) {
>         !12 = metadata DIArgVariable(name: "t", arg: 1, line: 2,
type: !6)
>         !13 = metadata DILocation(line: 2, column: 11)
>         !14 = metadata DILocation(line: 2, column: 16)
>       } ; !11
>     } ; !10
>     !15 = metadata DICompileUnit(file: !2, language: 4, kind: FullDebug,
>                                  producer: "clang version 3.6.0
",
>                                  retainedUniqueTypes: [ !6 ]) {
>       !16 = metadata DISubprogram(name: "bar", linkageName:
"_Z3bar1T",
>                                   handle: i32 (i32)* @_Z3bar1T, file: !2,
>                                   type: !9) {
>         !17 = metadata DIArgVariable(name: "t", arg: 2, line: 3,
type: !6)
>         !18 = metadata DILocation(line: 3, column: 11)
>         !19 = metadata DILocation(line: 3, column: 23)
>       } ; !16
>     } ; !15
>
> This assembly has the following advantages over the status quo:
>
>   - Fields are named.  Aside from readability, this prevents
>     adding/reordering fields in the schema from requiring testcase
>     updates.
>
>   - Serialization graph becomes a DAG.  Aside from readability, this
>     removes most RAUW from assembly (and all RAUW from bitcode).
>
>   - Structure is clear.
>
One more comment on the thread: I really like this syntax :)

-eric
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20141105/3952c32f/attachment.html>

Seemingly Similar Threads

Search for more maybe matching threads

llvm dev - Oct 2014 - [LLVMdev] First-class debug info IR: MDLocation

[LLVMdev] First-class debug info IR: MDLocation

[LLVMdev] First-class debug info IR: MDLocation

[LLVMdev] First-class debug info IR: MDLocation

[LLVMdev] First-class debug info IR: MDLocation

[LLVMdev] First-class debug info IR: MDLocation

[LLVMdev] First-class debug info IR: MDLocation

[LLVMdev] First-class debug info IR: MDLocation

[LLVMdev] First-class debug info IR: MDLocation

[LLVMdev] First-class debug info IR: MDLocation

Seemingly Similar Threads