[llvm-dev] [cfe-dev] RFC: CodeView debug info emission in Clang/LLVM [Nov 2015]


Zachary Turner via llvm-dev

2015-Oct-31 22:07 UTC

[llvm-dev] [cfe-dev] RFC: CodeView debug info emission in Clang/LLVM

Definitely having someone who knows both formats well would be an
advantage.  Dave B might be in the best position to do this, so hopefully
he can provide a couple more examples of areas where he has trouble
expressing CV information entirely in the backend.

Regardless of what everyone ends up deciding on with regards to the
front-end / back-end discussion, I want to suggest separating the work into
separate pieces that can go in independently of each other.

For example, the proposed LLVMCodeView library, which simply reads and
writes raw CV records, seems to be orthogonal to this discussion and could
be submitted independently.

On Sat, Oct 31, 2015 at 12:04 PM Robinson, Paul via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> The details of the mangling would be ABI dependent, not debug-info-format
> dependent.  Metadata already allows conveying a mangled name into LLVM, as
> David Blaikie mentioned, so that's not really an issue. The frontend knows
> how to construct the mangled name, and the backend knows where the mangled
> name goes in the final debug info.  It's a pretty reasonable separation of
> concerns.
>
> I didn't see anything in this quickie overview of CodeView that wouldn't
> be expressible in DWARF, so there's nothing (yet) persuasive to suggest
> metadata should be format-aware.  It would be worthwhile for somebody
> knowledgeable in one format to take a good detailed look at the other, just
> to make sure; please provide a link to the detailed CodeView description
> when it becomes available.
>
> Regarding source-language awareness of the debug-info generator, that's
> really not a concern (and I say this as someone who once helped add DWARF
> emission of COBOL-specific entries to a compiler backend that was not
> entirely clear how to spell COBOL).  You need an API that is able to
> specify the constructs used by the language, and the rest of it is just
> processing those record types the way they're supposed to be processed.
> The backend is not doing any language-semantic analysis of the info; it's
> just doing what it's told.
>
> Abstractly, the exercise of generalizing LLVM metadata to be able to
> support more than one debug-info format feels like a good thing. Metadata
> used to be more closely tied to DWARF (e.g., used DWARF tag codes directly
> in the metadata nodes to identify things) but it has been evolving away
> from that to a class hierarchy that is not so explicitly DWARF-ish.
> Handling CodeView would encourage that direction, rather than being a more
> fundamental shift.
>
> --paulr
>
> *From:* cfe-dev [mailto:cfe-dev-bounces at lists.llvm.org] *On Behalf Of*
> David Blaikie via cfe-dev
> *Sent:* Friday, October 30, 2015 8:07 PM
> *To:* Dave Bartolomeo
> *Cc:* llvm-dev; Clang Dev
> *Subject:* Re: [cfe-dev] [llvm-dev] RFC: CodeView debug info emission in
> Clang/LLVM
>
> Brief answer, but can go into detail later:
>
> If this is the right idea, let's do it for dwarf too & generalize the
> support to work for both. It's certainly something we've considered, to
> save all the complexity of representing essentially static data in an
> intermediate form.
>
> That said, given some of the stuff we have for lto, for example
> (deallocating/merging types etc) I'm not sure that's obviously the right
> strategy.
>
> Mangled names for types don't seem like a hugely difficult feature. We
> already support mangled names for function debug info in dwarf. We already
> have the mangled name of a type in the metadata, it could be used for
> codeview emission.
>
> It might be worth talking more & considering what other language features
> codeview uses that we haven't already plumbed through for dwarf (& dwarf
> based debuggers use dwarf for expression evaluation too, fwiw)
>
> On Oct 30, 2015 5:12 PM, "Dave Bartolomeo via cfe-dev" <
> cfe-dev at lists.llvm.org> wrote:
>
> *From:* Saleem Abdulrasool [mailto:compnerd at compnerd.org]
> *Sent:* Thursday, October 29, 2015 10:02 PM
> *To:* Adrian Prantl <aprantl at apple.com>
> *Cc:* Dave Bartolomeo <Dave.Bartolomeo at microsoft.com>;
> llvm-dev at lists.llvm.org; cfe-dev at lists.llvm.org
> *Subject:* Re: [llvm-dev] RFC: CodeView debug info emission in Clang/LLVM
>
> On Thu, Oct 29, 2015 at 2:08 PM, Adrian Prantl via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
> > On Oct 29, 2015, at 10:11 AM, Dave Bartolomeo via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> >
> > Proposed Design
> > How Debug Info is Generated
> > The CodeView type records for a compilation unit will be generated by
> the front-end for the source language (Clang, in the case of C and C++).
> The front-end has access to the full type system and AST of the language,
> which is necessary to generate accurate debug type info. The type records
> will be represented as metadata in the LLVM IR, similar to how DWARF debug
> info is represented. I’ll cover the actual representation in a bit more
> detail below.
> > The LLVM back-end will be responsible for emitting the CodeView type
> records from the IR into the output .obj file. Since the type records will
> already be in the correct format, this is essentially just a copy. No
> inspection of the type records is necessary within LLVM. The back-end will
> also be responsible for generating CodeView symbol records, line numbers,
> and source file info for any functions and data defined in the compilation
> unit. The back-end is the logical place to do this because only the
> back-end knows the code addresses, data addresses, and stack frame layouts.
>
> Thanks for proposing this.
>
> How different are the type records from the type information we currently
> have in LLVM's DIType hierarchy? Would it be feasible to move the logic for
> generating type records from LLVM metadata into the backend? This way a
> frontend could be agnostic about the debug information format.
>
> I think that this really is the path we want to follow.  If the current
> metadata we emit is insufficient, we should augment it with additional
> information sufficient to generate the necessary data in the backend.  The
> same annotations would then be able to generate one OR both debug info
> formats.
>
> *[dB] I considered that approach, but I see a few reasons why I don’t
> think making the debug metadata format agnostic would work out very well.
> To ensure that the backend can generate both debug formats by itself, we
> need to make the metadata contain enough information from the original AST
> for the format-specific code in the backend to generate the debug info. I
> believe that in practice, we’d wind up having to encode a significant
> portion of the AST (for decls of types and members, at least) into
> metadata, because debug type info, at least in CodeView, strives for pretty
> close fidelity with the declarations and types in the original source
> language. The CodeView debug type info is used by the VS debugger to parse
> and evaluate C++ expressions while debugging. We currently have a bunch of
> limitations in our debugger’s expression evaluation due to information
> missing from the debug type info, and we’ll probably attempt to preserve
> even more of that information going forward. There’s not much information
> from the AST that we can ignore if we want to reach that goal. Of course,
> we could just accept that we need the majority of the AST for type and
> function declarations in the debug metadata, and do that work in order to
> avoid having the frontend know about debug info formats, but that just
> means that now the backend code that generates the debug info has to know
> about all of the source language-specific constructs that it’s reading when
> creating the debug info. I think I’d rather have Clang have to understand
> the language-specific parts of multiple debug info formats than have LLVM
> understand language-specific metadata.*
>
> *As an example, the CodeView definition of a user-defined type requires
> both the mangled name of the type and the non-mangled “display name” of the
> type. Both of these require a fair bit of information from the AST to
> generate. For the mangled name in particular, there’s already code in Clang
> that generates this. If we want the backend to do this instead, we have to
> stuff a bunch of AST info into metadata, and then figure out how to share
> the name mangling code between Clang (where it operates on actual ASTs) and
> LLVM (where it would operate on metadata). If, instead, we have Clang
> compute the mangled name and display name and pass those names in the
> metadata, we’re not being particularly format-agnostic in Clang, and if the
> current compilation is only generating DWARF, we didn’t really need to
> compute or store those potentially large strings for every type anyway.*
>
> *Whether Clang is format-agnostic or not, there will have to be some
> component that converts from something format-agnostic (either ASTs or
> metadata) to DWARF, and some component that converts from ASTs or metadata
> to CodeView. You can put those two components in Clang and accept that
> Clang won’t be format-agnostic. Or, you can put those two components in
> LLVM, which leaves Clang as format-agnostic but requires that LLVM be more
> source language-aware. It also requires a third component to translate ASTs
> into metadata to pass to LLVM. Letting Clang worry about two different
> debug type info formats seems preferable to writing additional code and
> making an LLVM component understand more about the source language.*
>
> *Is there another approach I haven’t thought of that would let us wind up
> with a cleaner solution? I’ve only been working with the Clang and LLVM
> debug info code for a few months, so my knowledge of the existing design is
> far from complete.*
>
> *Note that for the rest of debug info (line numbers, source files, stack
> layouts, etc.), I don’t think the frontend should have to worry about the
> debug info format, and the current design for those pieces is just fine.
> It’s only the type info that I think is source language-specific enough to
> justify computing it in the frontend.*
>
> -- adrian
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
> --
>
> Saleem Abdulrasool
> compnerd (at) compnerd (dot) org
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
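A minimal C++ sketch of the split Paul and David describe, assuming a simplified type node: the frontend computes both the display name and the ABI-specific mangled (unique) name, and each backend emitter decides where, or whether, to place them. TypeDesc, emitDwarfLike, and emitCodeViewLike are hypothetical names; the string output only imitates a DWARF entry and a CodeView LF_STRUCTURE record rather than encoding either format, and the MSVC-style unique name is an illustrative value.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, simplified stand-in for a debug-info type node.
// The frontend fills in every field; the backend emitters only read them.
struct TypeDesc {
  std::string displayName;  // source-level name, e.g. "ns::Widget"
  std::string mangledName;  // ABI-specific unique name, computed by the frontend
  std::uint64_t sizeInBytes;
};

// Hypothetical DWARF-style emitter: a readable imitation of a
// DW_TAG_structure_type entry. It never looks at the mangled name.
std::string emitDwarfLike(const TypeDesc &t) {
  return "DW_TAG_structure_type name=\"" + t.displayName +
         "\" byte_size=" + std::to_string(t.sizeInBytes);
}

// Hypothetical CodeView-style emitter: a readable imitation of an
// LF_STRUCTURE record, which carries both the display name and the unique name.
std::string emitCodeViewLike(const TypeDesc &t) {
  return "LF_STRUCTURE name=\"" + t.displayName + "\" unique=\"" +
         t.mangledName + "\" size=" + std::to_string(t.sizeInBytes);
}
```

The DWARF-like emitter never reads mangledName, which reflects the cost Dave raises: if the frontend always computed that string, a DWARF-only compilation would carry it without using it.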

David Blaikie via llvm-dev

2015-Oct-31 22:19 UTC


[llvm-dev] [cfe-dev] RFC: CodeView debug info emission in Clang/LLVM

On Sat, Oct 31, 2015 at 3:07 PM, Zachary Turner via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Definitely having someone who knows both formats well would be an
> advantage.  Dave B might be in the best position to do this, so hopefully
> he can provide a couple more examples of areas where he has trouble
> expressing CV information entirely in the backend.
>
> Regardless of what everyone ends up deciding on with regards to the
> front-end / back-end discussion, I want to suggest separating the work into
> separate pieces that can go in independently of each other.
>
> For example, the proposed LLVMCodeView library, which simply reads and
> writes raw CV records, seems to be orthogonal to this discussion and could
> be submitted independently.
>
I haven't looked at the patch in general, but that sounds quite plausible -
unit tests or what-have-you that demonstrate the expected behavior
regardless of where it ultimately ends up being used from (LLVM, Clang, or
both).
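A minimal C++ sketch of the kind of unit-testable raw record reader/writer being discussed, assuming only the common CodeView record prefix: a little-endian 16-bit length (counting the kind field and payload, but not the length field itself) followed by a 16-bit record kind. RawRecord, writeRecord, and readRecords are hypothetical names, not the API of the proposed LLVMCodeView library, and real type records are additionally padded to four-byte alignment, which this sketch omits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical raw record: a 16-bit kind plus opaque payload bytes.
struct RawRecord {
  std::uint16_t kind;
  std::vector<std::uint8_t> payload;
};

// Little-endian helpers for the 16-bit header fields.
void putU16(std::vector<std::uint8_t> &out, std::uint16_t v) {
  out.push_back(static_cast<std::uint8_t>(v & 0xFF));
  out.push_back(static_cast<std::uint8_t>(v >> 8));
}

std::uint16_t getU16(const std::vector<std::uint8_t> &in, std::size_t pos) {
  return static_cast<std::uint16_t>(in[pos] | (in[pos + 1] << 8));
}

// Appends [length:u16][kind:u16][payload...]. The length counts the kind
// field and the payload, but not the length field itself.
void writeRecord(std::vector<std::uint8_t> &out, const RawRecord &rec) {
  assert(rec.payload.size() <= 0xFFFF - 2);  // must fit the 16-bit length
  putU16(out, static_cast<std::uint16_t>(2 + rec.payload.size()));
  putU16(out, rec.kind);
  out.insert(out.end(), rec.payload.begin(), rec.payload.end());
}

// Splits a buffer back into records without interpreting the payloads.
// Returns std::nullopt if a record header or body is truncated.
std::optional<std::vector<RawRecord>> readRecords(const std::vector<std::uint8_t> &in) {
  std::vector<RawRecord> records;
  std::size_t pos = 0;
  while (pos < in.size()) {
    if (in.size() - pos < 4) return std::nullopt;  // need length + kind
    std::uint16_t len = getU16(in, pos);
    if (len < 2 || in.size() - pos - 2 < len) return std::nullopt;  // body cut short
    RawRecord rec;
    rec.kind = getU16(in, pos + 2);
    auto first = in.begin() + static_cast<std::ptrdiff_t>(pos + 4);
    auto last = in.begin() + static_cast<std::ptrdiff_t>(pos + 2 + len);
    rec.payload.assign(first, last);
    records.push_back(std::move(rec));
    pos += 2 + static_cast<std::size_t>(len);
  }
  return records;
}
```

A round-trip test in the spirit of David's suggestion writes a few records into a buffer, reads them back, and compares kinds and payloads; truncating the buffer by one byte should make readRecords return std::nullopt.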


Aboud, Amjad via llvm-dev

2015-Nov-01 12:10 UTC


[llvm-dev] [cfe-dev] RFC: CodeView debug info emission in Clang/LLVM

I also think that we should keep one representation of debug info in the LLVM
IR.
There would be a need to extend some of the debug info entries to support
CodeView, but I think that most of the information generated today by Clang for
Dwarf can be used for generating CodeView.

I can think of two missing extensions that are needed for CodeView:

1. In the frontend: file checksums. This is probably a calculation that Clang
should do and send to the backend through DIFile.

2. In the backend: extend the “X86 Dwarf<->LLVM register mappings” to support
“X86 CodeView<->LLVM register mappings”.

I can think of more differences (gaps) between Dwarf and CodeView that need
to be closed; however, closing them is doable with one uniform (generic) debug
info metadata representation in the LLVM IR.

Regards,
Amjad
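A minimal C++ sketch of the second extension, assuming a single table keyed by a backend register identifier with one column per debug format. X86Reg, RegMapping, toDwarf, toCodeView, and fromCodeView are hypothetical names rather than LLVM's register-info API. The DWARF numbers follow the System V x86-64 psABI; the CodeView numbers are the CV_AMD64_* values from Microsoft's cvconst.h as best recalled here and should be verified against that header.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical backend register identifiers: a small subset of x86-64.
enum class X86Reg { RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP };

struct RegMapping {
  X86Reg reg;
  std::uint16_t dwarfNum;     // System V x86-64 psABI DWARF register number
  std::uint16_t codeViewNum;  // CV_AMD64_* value (verify against cvconst.h)
};

// One table with a column per debug format, so CodeView support extends the
// existing DWARF mapping instead of duplicating the register list.
constexpr std::array<RegMapping, 8> kRegTable{{
    {X86Reg::RAX, 0, 328},
    {X86Reg::RBX, 3, 329},
    {X86Reg::RCX, 2, 330},
    {X86Reg::RDX, 1, 331},
    {X86Reg::RSI, 4, 332},
    {X86Reg::RDI, 5, 333},
    {X86Reg::RBP, 6, 334},
    {X86Reg::RSP, 7, 335},
}};

std::optional<std::uint16_t> toDwarf(X86Reg r) {
  for (const RegMapping &m : kRegTable)
    if (m.reg == r) return m.dwarfNum;
  return std::nullopt;
}

std::optional<std::uint16_t> toCodeView(X86Reg r) {
  for (const RegMapping &m : kRegTable)
    if (m.reg == r) return m.codeViewNum;
  return std::nullopt;
}

// Reverse direction (CodeView number -> backend register), e.g. for a dumper.
std::optional<X86Reg> fromCodeView(std::uint16_t cv) {
  for (const RegMapping &m : kRegTable)
    if (m.codeViewNum == cv) return m.reg;
  return std::nullopt;
}
```

Keeping both numberings side by side means adding CodeView support is a matter of filling in a new column, and the reverse lookup covers the "CodeView<->LLVM" direction Amjad mentions.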

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of David
Blaikie via llvm-dev
Sent: Sunday, November 01, 2015 00:20
To: Zachary Turner
Cc: llvm-dev; cfe-dev at lists.llvm.org
Subject: Re: [llvm-dev] [cfe-dev] RFC: CodeView debug info emission in
Clang/LLVM

On Sat, Oct 31, 2015 at 3:07 PM, Zachary Turner via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
Definitely having someone who knows both formats well would be an advantage. 
Dave B might be in the best position to do this, so hopefully he can provide a
couple more examples of areas where he has trouble expressing CV information
entirely in the backend.

Regardless of what everyone ends up deciding on with regards to the front-end /
back-end discussion, I want to suggest separating the work into separate pieces that
can go in independently of each other.

For example, the proposed LLVMCodeView library, which simply reads and writes
raw CV records, seems to be orthogonal to this discussion and could be submitted
independently.

I haven't looked at the patch in general, but that sounds quite plausible -
unit tests or what-have-you that demonstrate the expected behavior regardless of
where it ultimately ends up being used from (LLVM, Clang, or both).

On Sat, Oct 31, 2015 at 12:04 PM Robinson, Paul via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
The details of the mangling would be ABI dependent not debug-info-format
dependent.  Metadata already allows conveying a mangled name into LLVM, as David
Blaikie mentioned, so that's not really an issue. The frontend knows how to
construct the mangled name, the backend knows where the mangled name goes in the
final debug info.  It's a pretty reasonable separation of concerns.

I didn't see anything in this quickie overview of CodeView that wouldn't
be expressible in DWARF, so there's nothing (yet) persuasive to suggest
metadata should be format-aware.  It would be worthwhile for somebody
knowledgeable in one format to take a good detailed look at the other, just to
make sure; please provide a link to the detailed CodeView description when it
becomes available.

Regarding source-language awareness of the debug-info generator, that's
really not a concern (and I say this as someone who once helped add DWARF
emission of COBOL-specific entries to a compiler backend that was not entirely
clear how to spell COBOL).  You need an API that is able to specify the
constructs used by the language, and the rest of it is just processing those
record types the way they're supposed to be.  The backend is not doing any
language-semantic analysis of the info, it's just doing what it's told.

Abstractly, the exercise of generalizing LLVM metadata to be able to support
more than one debug-info format feels like a good thing. Metadata used to be
more closely tied to DWARF (e.g., used DWARF tag codes directly in the metadata
nodes to identify things) but it has been evolving away from that to a class
hierarchy that is not so explicitly DWARF-ish.  Handling CodeView would
encourage that direction, rather than being a more fundamental shift.
--paulr

From: cfe-dev [mailto:cfe-dev-bounces at lists.llvm.org] On Behalf Of
David Blaikie via cfe-dev
Sent: Friday, October 30, 2015 8:07 PM
To: Dave Bartolomeo
Cc: llvm-dev; Clang Dev
Subject: Re: [cfe-dev] [llvm-dev] RFC: CodeView debug info emission in
Clang/LLVM

Brief answer, but can go into detail later:

If this is the right idea, let's do it for dwarf too & generalize the support
to work for both. It's certainly something we've considered, to save all
the complexity of representing essentially static data in an intermediate form.

That said, given some of the stuff we have for lto, for example
(deallocating/merging types etc) I'm not sure that's obviously the right
strategy.

Mangled names for types don't seem like a hugely difficult feature. We
already support mangled names for function debug info in dwarf. We already have
the mangled name of a type in the metadata, it could be used for codeview
emission.

It might be worth talking more & considering what other language features
codeview uses that we haven't already plumbed through for dwarf (& dwarf
based debuggers use dwarf for expression evaluation too, fwiw)
On Oct 30, 2015 5:12 PM, "Dave Bartolomeo via cfe-dev" <cfe-dev at
lists.llvm.org> wrote:

From: Saleem Abdulrasool [mailto:compnerd at compnerd.org]
Sent: Thursday, October 29, 2015 10:02 PM
To: Adrian Prantl <aprantl at apple.com>
Cc: Dave Bartolomeo <Dave.Bartolomeo at microsoft.com>; llvm-dev at
lists.llvm.org; cfe-dev at lists.llvm.org
Subject: Re: [llvm-dev] RFC: CodeView debug info emission in Clang/LLVM

On Thu, Oct 29, 2015 at 2:08 PM, Adrian Prantl via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> On Oct 29, 2015, at 10:11 AM, Dave Bartolomeo via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
>
> Proposed Design
> How Debug Info is Generated
> The CodeView type records for a compilation unit will be generated by the
front-end for the source language (Clang, in the case of C and C++). The
front-end has access to the full type system and AST of the language, which is
necessary to generate accurate debug type info. The type records will be
represented as metadata in the LLVM IR, similar to how DWARF debug info is
represented. I’ll cover the actual representation in a bit more detail below.
> The LLVM back-end will be responsible for emitting the CodeView type
records from the IR into the output .obj file. Since the type records will
already be in the correct format, this is essentially just a copy. No inspection
of the type records is necessary within LLVM. The back-end will also be
responsible for generating CodeView symbol records, line numbers, and source
file info for any functions and data defined in the compilation unit. The
back-end is the logical place to do this because only the back-end knows the
code addresses, data addresses, and stack frame layouts.
Thanks for proposing this.

How different are the type records from the type information we currently have
in LLVM's DIType hierarchy? Would it be feasible to move the logic for
generating type records from LLVM metadata into the backend? This way a frontend
could be agnostic about the debug information format.

I think that this really is the path we want to follow.  If the current metadata
we emit is insufficient, we should augment it with additional information
sufficient to generate the necessary data in the backend.  The same annotations
would then be able to generate one OR both debug info formats.

[dB] I considered that approach, but I see a few reasons why I don’t think
making the debug metadata format agnostic would work out very well. To ensure
that the backend can generate both debug formats by itself, we need to make the
metadata contain enough information from the original AST for the
format-specific code in the backend to generate the debug info. I believe that
in practice, we’d wind up having to encode a significant portion of the AST (for
decls of types and members, at least) into metadata, because debug type info, at
least in CodeView, strives for pretty close fidelity with the declarations and
types in the original source language. The CodeView debug type info is used by
the VS debugger to parse and evaluate C++ expressions while debugging. We
currently have a bunch of limitations in our debugger’s expression evaluation
due to information missing from the debug type info, and we’ll probably attempt
to preserve even more of that information going forward. There’s not much
information from the AST that we can ignore if we want to reach that goal. Of
course, we could just accept that we need the majority of the AST for type and
function declarations in the debug metadata, and do that work in order to avoid
having the frontend know about debug info formats, but that just means that now
the backend code that generates the debug info has to know about all of the
source language-specific constructs that it’s reading when creating the debug
info. I think I’d rather have Clang have to understand the language-specific
parts of multiple debug info formats than have LLVM understand language-specific
metadata.

As an example, the CodeView definition of a user-defined type requires both the
mangled name of the type and the non-mangled “display name” of the type. Both of
these require a fair bit of information from the AST to generate. For the
mangled name in particular, there’s already code in Clang that generates this.
If we want the backend to do this instead, we have to stuff a bunch of AST info
into metadata, and then figure out how to share the name mangling code between
Clang (where it operates on actual ASTs) and LLVM (where it would operate on
metadata). If, instead, we have Clang compute the mangled name and display name
and pass those names in the metadata, we’re not being particularly
format-agnostic in Clang, and if the current compilation is only generating
DWARF, we didn’t really need to compute or store those potentially large strings
for every type anyway.
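A minimal C++ sketch of the trade-off described above. It assumes the frontend knows up front which debug formats the compilation will emit. Under that assumption, the CodeView-only display name and mangled name are computed only when CodeView output is requested. `RecordDecl`, `DebugFormats`, `TypeNames`, and `computeTypeNames` are hypothetical, and the `Mangle` callback merely stands in for Clang's existing mangler.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a record declaration in the AST.
struct RecordDecl {
  std::vector<std::string> EnclosingNamespaces; // outermost first
  std::string Name;
};

// Which debug formats the current compilation will emit.
struct DebugFormats {
  bool DWARF = false;
  bool CodeView = false;
};

// Names attached to a type's debug metadata. The CodeView-only fields stay
// empty when no CodeView output is requested, so DWARF-only builds skip them.
struct TypeNames {
  std::string Name;                       // unqualified name, used by both formats
  std::optional<std::string> DisplayName; // qualified, non-mangled (CodeView)
  std::optional<std::string> MangledName; // unique mangled name (CodeView)
};

// Hypothetical: Mangle stands in for the frontend's existing name mangler.
TypeNames computeTypeNames(const RecordDecl &D, DebugFormats Formats,
                           const std::function<std::string(const RecordDecl &)> &Mangle) {
  assert(!D.Name.empty() && "anonymous types are not handled in this sketch");
  TypeNames N;
  N.Name = D.Name;
  if (Formats.CodeView) {
    std::string Qualified;
    for (const std::string &NS : D.EnclosingNamespaces)
      Qualified += NS + "::";
    Qualified += D.Name;
    N.DisplayName = Qualified;
    N.MangledName = Mangle(D);
  }
  return N;
}
```

Whether this gating belongs in Clang or is replaced by always emitting the names into metadata is exactly the question under discussion. The sketch only shows that the cost can be avoided when the frontend does it.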

Whether Clang is format-agnostic or not, there will have to be some component
that converts from something format-agnostic (either ASTs or metadata) to DWARF,
and some component that converts from ASTs or metadata to CodeView. You can put
those two components in Clang and accept that Clang won’t be format-agnostic.
Or, you can put those two components in LLVM, which leaves Clang as
format-agnostic but requires that LLVM be more source language-aware. It also
requires a third component to translate ASTs into metadata to pass to LLVM.
Letting Clang worry about two different debug type info formats seems preferable
to writing additional code and making an LLVM component understand more about
the source language.
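To make the two placements concrete, here is a hypothetical C++ sketch in which one format-agnostic description (standing in for either the AST or the metadata) is consumed by one converter per format. All class and function names are invented for illustration. The output is a textual rendering that borrows DWARF and CodeView record names; it is not a real binary encoding.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical format-agnostic description of a struct.
struct Member {
  std::string Name;
  std::string TypeName;
  unsigned OffsetInBytes;
};
struct StructDesc {
  std::string Name;
  unsigned SizeInBytes;
  std::vector<Member> Members;
};

// One converter per output format. Whether these live in Clang or in LLVM
// is the placement question discussed above.
class TypeInfoEmitter {
public:
  virtual ~TypeInfoEmitter() = default;
  virtual std::string emitStruct(const StructDesc &S) = 0;
};

// DWARF-flavoured text: the structure DIE owns its member DIEs.
class DwarfLikeEmitter : public TypeInfoEmitter {
public:
  std::string emitStruct(const StructDesc &S) override {
    std::string Out = "DW_TAG_structure_type " + S.Name +
                      " size=" + std::to_string(S.SizeInBytes) + "\n";
    for (const Member &M : S.Members)
      Out += "  DW_TAG_member " + M.Name + " : " + M.TypeName + " @" +
             std::to_string(M.OffsetInBytes) + "\n";
    return Out;
  }
};

// CodeView-flavoured text: a field list record precedes the structure record.
class CodeViewLikeEmitter : public TypeInfoEmitter {
public:
  std::string emitStruct(const StructDesc &S) override {
    std::string Out = "LF_FIELDLIST\n";
    for (const Member &M : S.Members)
      Out += "  LF_MEMBER " + M.Name + " : " + M.TypeName + " @" +
             std::to_string(M.OffsetInBytes) + "\n";
    Out += "LF_STRUCTURE " + S.Name + " size=" + std::to_string(S.SizeInBytes) + "\n";
    return Out;
  }
};

// Runs each requested emitter over the same description (one OR both formats).
std::vector<std::string> emitAll(const StructDesc &S,
                                 const std::vector<TypeInfoEmitter *> &Emitters) {
  std::vector<std::string> Outputs;
  for (TypeInfoEmitter *E : Emitters) {
    assert(E && "null emitter");
    Outputs.push_back(E->emitStruct(S));
  }
  return Outputs;
}
```

In this framing, the cost argument above is about how rich `StructDesc` must become before the CodeView emitter has everything it needs.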

Is there another approach I haven’t thought of that would let us wind up with a
cleaner solution? I’ve only been working with the Clang and LLVM debug info code
for a few months, so my knowledge of the existing design is far from complete.

Note that for the rest of debug info (line numbers, source files, stack layouts,
etc.), I don’t think the frontend should have to worry about the debug info
format, and the current design for those pieces is just fine. It’s only the type
info that I think is source language-specific enough to justify computing it in
the frontend.

-- adrian
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

--
Saleem Abdulrasool
compnerd (at) compnerd (dot) org

_______________________________________________
cfe-dev mailing list
cfe-dev at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
---------------------------------------------------------------------
Intel Israel (74) Limited

Dave Bartolomeo via llvm-dev

2015-Nov-04 00:25 UTC

[llvm-dev] [cfe-dev] RFC: CodeView debug info emission in Clang/LLVM

The LLVMCodeView library is definitely independent of the rest of the design
questions.

As far as testing goes, what would be the conventional LLVM way of testing a
library for file format manipulation? A test tool that converts some simple text
form into a .obj containing CodeView sections, and comparing with a baseline
.obj? Or would the test convert back from the .obj to some kind of text as well,
and compare to a text baseline? Is there some other LLVM component that has
similar testing requirements that I can use as an example for how to test
LLVMCodeView? Note that I’d be adding a CodeView->text dump tool anyway,
since that will be pretty much essential for anyone working with CodeView.
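One possible shape for such a test, sketched in C++ under the assumption of a simplified record layout: a 16-bit length that counts everything after itself, a 16-bit kind, then payload bytes, loosely modeled on CodeView's length-prefixed records. The test writes records to bytes, reads them back, and compares. `Record`, `writeRecords`, and `readRecords` are hypothetical and are not the proposed LLVMCodeView library's interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical simplified record: a 16-bit kind plus raw payload bytes.
struct Record {
  uint16_t Kind;
  std::vector<uint8_t> Payload;
  bool operator==(const Record &O) const { return Kind == O.Kind && Payload == O.Payload; }
};

static void putU16(std::vector<uint8_t> &Out, uint16_t V) {
  Out.push_back(static_cast<uint8_t>(V & 0xFF)); // little-endian low byte
  Out.push_back(static_cast<uint8_t>(V >> 8));   // high byte
}

// Layout per record: [len:u16][kind:u16][payload], where len counts the kind
// and payload bytes but not the len field itself.
std::vector<uint8_t> writeRecords(const std::vector<Record> &Records) {
  std::vector<uint8_t> Out;
  for (const Record &R : Records) {
    assert(R.Payload.size() + 2 <= 0xFFFF && "record too large for a 16-bit length");
    putU16(Out, static_cast<uint16_t>(R.Payload.size() + 2));
    putU16(Out, R.Kind);
    Out.insert(Out.end(), R.Payload.begin(), R.Payload.end());
  }
  return Out;
}

// Parses the buffer back into records; returns std::nullopt if it is truncated
// or a length field is inconsistent.
std::optional<std::vector<Record>> readRecords(const std::vector<uint8_t> &Bytes) {
  auto getU16 = [&](std::size_t At) {
    return static_cast<uint16_t>(Bytes[At] | (Bytes[At + 1] << 8));
  };
  std::vector<Record> Records;
  std::size_t Pos = 0;
  while (Pos < Bytes.size()) {
    if (Bytes.size() - Pos < 4)
      return std::nullopt; // not enough bytes for len + kind
    uint16_t Len = getU16(Pos);
    if (Len < 2 || Bytes.size() - Pos - 2 < Len)
      return std::nullopt; // record claims more bytes than remain
    Record R;
    R.Kind = getU16(Pos + 2);
    R.Payload.assign(Bytes.begin() + Pos + 4, Bytes.begin() + Pos + 2 + Len);
    Records.push_back(R);
    Pos += 2 + static_cast<std::size_t>(Len);
  }
  return Records;
}
```

A text dump tool like the one mentioned above would let the same round trip be checked against a text baseline instead of comparing in-memory records.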

-Dave

From: cfe-dev [mailto:cfe-dev-bounces at lists.llvm.org] On Behalf Of David
Blaikie via cfe-dev
Sent: Saturday, October 31, 2015 3:20 PM
To: Zachary Turner <zturner at google.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>; cfe-dev at lists.llvm.org
Subject: Re: [cfe-dev] [llvm-dev] RFC: CodeView debug info emission in
Clang/LLVM

On Sat, Oct 31, 2015 at 3:07 PM, Zachary Turner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
Definitely having someone who knows both formats well would be an advantage. 
Dave B might be in the best position to do this, so hopefully he can provide a
couple more examples of areas where he has trouble expressing CV information
entirely in the backend.

Regardless of what everyone ends up deciding on with regard to the front-end /
back-end discussion, I want to suggest separating the work into pieces that
can go in independently of each other.

For example, the proposed LLVMCodeView library, which simply reads and writes
raw CV records, seems to be orthogonal to this discussion and could be submitted
independently.

I haven't looked at the patch in general, but that sounds quite plausible -
unit tests or what-have-you that demonstrate the expected behavior regardless of
where it ultimately ends up being used from (LLVM, Clang, or both).

On Sat, Oct 31, 2015 at 12:04 PM Robinson, Paul via llvm-dev <llvm-dev at lists.llvm.org> wrote:
The details of the mangling would be ABI dependent not debug-info-format
dependent.  Metadata already allows conveying a mangled name into LLVM, as David
Blaikie mentioned, so that's not really an issue. The frontend knows how to
construct the mangled name, the backend knows where the mangled name goes in the
final debug info.  It's a pretty reasonable separation of concerns.

I didn't see anything in this quickie overview of CodeView that wouldn't
be expressible in DWARF, so there's nothing (yet) persuasive to suggest
metadata should be format-aware.  It would be worthwhile for somebody
knowledgeable in one format to take a good detailed look at the other, just to
make sure; please provide a link to the detailed CodeView description when it
becomes available.

Regarding source-language awareness of the debug-info generator, that's
really not a concern (and I say this as someone who once helped add DWARF
emission of COBOL-specific entries to a compiler backend that was not entirely
clear how to spell COBOL).  You need an API that is able to specify the
constructs used by the language, and the rest of it is just processing those
record types the way they're supposed to be.  The backend is not doing any
language-semantic analysis of the info, it's just doing what it's told.

Abstractly, the exercise of generalizing LLVM metadata to be able to support
more than one debug-info format feels like a good thing. Metadata used to be
more closely tied to DWARF (e.g., used DWARF tag codes directly in the metadata
nodes to identify things) but it has been evolving away from that to a class
hierarchy that is not so explicitly DWARF-ish.  Handling CodeView would
encourage that direction, rather than being a more fundamental shift.
--paulr

From: cfe-dev [mailto:cfe-dev-bounces at lists.llvm.org] On Behalf Of
David Blaikie via cfe-dev
Sent: Friday, October 30, 2015 8:07 PM
To: Dave Bartolomeo
Cc: llvm-dev; Clang Dev
Subject: Re: [cfe-dev] [llvm-dev] RFC: CodeView debug info emission in
Clang/LLVM

Brief answer, but can go into detail later:

If this is the right idea, let's do it for DWARF too & generalize the support
to work for both. It's certainly something we've considered, to save all
the complexity of representing essentially static data in an intermediate form.

That said, given some of the stuff we have for lto, for example
(deallocating/merging types etc) I'm not sure that's obviously the right
strategy.

Mangled names for types don't seem like a hugely difficult feature. We
already support mangled names for function debug info in dwarf. We already have
the mangled name of a type in the metadata, it could be used for codeview
emission.

It might be worth talking more & considering what other language features
codeview uses that we haven't already plumbed through for dwarf (& dwarf
based debuggers use dwarf for expression evaluation too, fwiw)
On Oct 30, 2015 5:12 PM, "Dave Bartolomeo via cfe-dev" <cfe-dev at lists.llvm.org> wrote:

From: Saleem Abdulrasool [mailto:compnerd at compnerd.org]
Sent: Thursday, October 29, 2015 10:02 PM
To: Adrian Prantl <aprantl at apple.com>
Cc: Dave Bartolomeo <Dave.Bartolomeo at microsoft.com>; llvm-dev at lists.llvm.org; cfe-dev at lists.llvm.org
Subject: Re: [llvm-dev] RFC: CodeView debug info emission in Clang/LLVM

On Thu, Oct 29, 2015 at 2:08 PM, Adrian Prantl via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On Oct 29, 2015, at 10:11 AM, Dave Bartolomeo via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Proposed Design
> How Debug Info is Generated
> The CodeView type records for a compilation unit will be generated by the
front-end for the source language (Clang, in the case of C and C++). The
front-end has access to the full type system and AST of the language, which is
necessary to generate accurate debug type info. The type records will be
represented as metadata in the LLVM IR, similar to how DWARF debug info is
represented. I’ll cover the actual representation in a bit more detail below.
> The LLVM back-end will be responsible for emitting the CodeView type
records from the IR into the output .obj file. Since the type records will
already be in the correct format, this is essentially just a copy. No inspection
of the type records is necessary within LLVM. The back-end will also be
responsible for generating CodeView symbol records, line numbers, and source
file info for any functions and data defined in the compilation unit. The
back-end is the logical place to do this because only the back-end knows the
code addresses, data addresses, and stack frame layouts.
Thanks for proposing this.

How different are the type records from the type information we currently have
in LLVM's DIType hierarchy? Would it be feasible to move the logic for
generating type records from LLVM metadata into the backend? This way a frontend
could be agnostic about the debug information format.

I think that this really is the path we want to follow.  If the current metadata
we emit is insufficient, we should augment it with additional information
sufficient to generate the necessary data in the backend.  The same annotations
would then be able to generate one OR both debug info formats.

[dB] I considered that approach, but I see a few reasons why I don’t think
making the debug metadata format-agnostic would work out very well. To ensure
that the backend can generate both debug formats by itself, we need to make the
metadata contain enough information from the original AST for the
format-specific code in the backend to generate the debug info. I believe that
in practice, we’d wind up having to encode a significant portion of the AST (for
decls of types and members, at least) into metadata, because debug type info, at
least in CodeView, strives for pretty close fidelity with the declarations and
types in the original source language. The CodeView debug type info is used by
the VS debugger to parse and evaluate C++ expressions while debugging. We
currently have a bunch of limitations in our debugger’s expression evaluation
due to information missing from the debug type info, and we’ll probably attempt
to preserve even more of that information going forward. There’s not much
information from the AST that we can ignore if we want to reach that goal. Of
course, we could just accept that we need the majority of the AST for type and
function declarations in the debug metadata, and do that work in order to avoid
having the frontend know about debug info formats, but that just means that now
the backend code that generates the debug info has to know about all of the
source language-specific constructs that it’s reading when creating the debug
info. I think I’d rather have Clang have to understand the language-specific
parts of multiple debug info formats than have LLVM understand language-specific
metadata.

As an example, the CodeView definition of a user-defined type requires both the
mangled name of the type and the non-mangled “display name” of the type. Both of
these require a fair bit of information from the AST to generate. For the
mangled name in particular, there’s already code in Clang that generates this.
If we want the backend to do this instead, we have to stuff a bunch of AST info
into metadata, and then figure out how to share the name mangling code between
Clang (where it operates on actual ASTs) and LLVM (where it would operate on
metadata). If, instead, we have Clang compute the mangled name and display name
and pass those names in the metadata, we’re not being particularly
format-agnostic in Clang, and if the current compilation is only generating
DWARF, we didn’t really need to compute or store those potentially large strings
for every type anyway.

Whether Clang is format-agnostic or not, there will have to be some component
that converts from something format-agnostic (either ASTs or metadata) to DWARF,
and some component that converts from ASTs or metadata to CodeView. You can put
those two components in Clang and accept that Clang won’t be format-agnostic.
Or, you can put those two components in LLVM, which leaves Clang as
format-agnostic but requires that LLVM be more source language-aware. It also
requires a third component to translate ASTs into metadata to pass to LLVM.
Letting Clang worry about two different debug type info formats seems preferable
to writing additional code and making an LLVM component understand more about
the source language.

Is there another approach I haven’t thought of that would let us wind up with a
cleaner solution? I’ve only been working with the Clang and LLVM debug info code
for a few months, so my knowledge of the existing design is far from complete.

Note that for the rest of debug info (line numbers, source files, stack layouts,
etc.), I don’t think the frontend should have to worry about the debug info
format, and the current design for those pieces is just fine. It’s only the type
info that I think is source language-specific enough to justify computing it in
the frontend.

-- adrian
--
Saleem Abdulrasool
compnerd (at) compnerd (dot) org
