thr3ads.net - llvm dev - [llvm-dev] RFC: Debug info for Cuda [Nov 2017]


Alexey Bataev via llvm-dev

2017-Nov-06 18:37 UTC

[llvm-dev] RFC: Debug info for Cuda

Hi everybody,

As you know, the Cuda/NVPTX target has very limited debug info support in
Clang/LLVM. Currently, LLVM emits only line-number debug info.

This is caused by limitations of the Cuda/NVPTX codegen. Clang/LLVM translates
the source code to LLVM IR, which is then lowered to a PTX (Parallel Thread
Execution) intermediate file. This PTX file is a special kind of textual
assembly code that contains the code itself plus (possibly) debug info. The PTX
file is then compiled by the ptxas tool into the CUDA binary representation.


Debug info representation in PTX file.
=======================
According to the PTX Writer's Guide to Interoperability, section "Debug Information"
(http://docs.nvidia.com/cuda/ptx-writers-guide-to-interoperability/index.html#debug-information),
debug information must be encoded in DWARF (Debug With Arbitrary Record
Format). The responsibility for generating debug information is split between
the PTX producer and the PTX-to-SASS backend. The PTX producer is responsible
for emitting binary DWARF into the PTX file, using the .section directive and
the .b8, .b16, .b32, and .b64 directives in PTX. This should contain the
.debug_info and .debug_abbrev sections, and optionally the .debug_pubnames and
.debug_aranges sections. These sections are standard DWARF2 sections that refer
to labels and registers in the PTX.
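As a rough illustration of the producer side, here is a minimal C++ sketch that turns raw DWARF bytes into a braced PTX section made of `.b8` directives. The function name `emitPtxSection` is hypothetical, and the exact whitespace layout is an assumption; the sketch only shows the general shape of the output, not what LLVM actually prints.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: render raw DWARF bytes as a PTX debug section.
// Each byte becomes one ".b8" directive, and the section body is wrapped
// in braces, as ptxas requires for debug sections.
std::string emitPtxSection(const std::string& name,
                           const std::vector<std::uint8_t>& bytes) {
  std::ostringstream out;
  out << "\t.section\t" << name << "\n\t{\n";
  for (std::uint8_t b : bytes) {
    // Cast to unsigned so the byte prints as a number, not as a character.
    out << ".b8 " << static_cast<unsigned>(b) << "\n";
  }
  out << "\t}\n";
  return out.str();
}
```

For example, `emitPtxSection(".debug_abbrev", {1, 17, 0})` yields a `.debug_abbrev` section containing `.b8 1`, `.b8 17`, and `.b8 0` between the braces.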

The PTX-to-SASS backend is responsible for generating the .debug_line section
from the .file and .loc directives in the PTX file. This section maps source
lines to SASS addresses. The PTX-to-SASS backend also generates the .debug_frame
section.
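Because ptxas builds .debug_line itself, the producer only has to print `.file` and `.loc` directives. A minimal sketch with hypothetical helper names (`fileDirective`, `locDirective`, `functionEntry`); the `.loc file line column` operand order follows the PTX ISA, while the label name used in the example is illustrative only. Per limitation h) below, the `.loc` for a function's entry is printed before the function's first label.

```cpp
#include <string>

// ".file <index> \"<path>\"" registers a source file under a numeric index.
std::string fileDirective(unsigned index, const std::string& path) {
  return "\t.file\t" + std::to_string(index) + " \"" + path + "\"\n";
}

// ".loc <file> <line> <column>" attaches a source position to the
// instructions that follow it.
std::string locDirective(unsigned file, unsigned line, unsigned column) {
  return "\t.loc\t" + std::to_string(file) + " " + std::to_string(line) +
         " " + std::to_string(column) + "\n";
}

// Ordering ptxas expects at function entry: the .loc first, then the
// function's first label.
std::string functionEntry(const std::string& firstLabel, unsigned file,
                          unsigned line, unsigned column) {
  return locDirective(file, line, column) + firstLabel + ":\n";
}
```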

LLVM can already emit DWARF debug info, but the ptxas compiler has some
limitations that make it hard to adapt LLVM to emit correct debug info in PTX
files.


Limitations/features of the PTX format/ptxas compiler.
=================================
a) Supports DWARF-2 only.
b) Labels are allowed only in the code section (i.e., only inside functions).
c) Label arithmetic is not supported in DWARF sections.
    “.b32 L1 - L2” as the size of a section is not allowed, so section sizes
must be calculated explicitly.
d) Debug info must reference the sections themselves, not labels inside them.
    “.b32 .debug_abbrev”
e) Sections themselves must be enclosed in braces.
    “.section .debug_info {…}”
f) Frame info is not register-based.
    It is based on the function-local “__local_depot” array, which represents
the stack frame.
g) All variables must have a non-standard DW_AT_address_class attribute so that
the debugger knows the address class of the variable (global or local). The
DWARF standard does define this attribute, but it can be applied only to
pointer/reference types, not to variables.
h) The first label in the function must follow the debug location macro. In
LLVM, the label currently comes first and is followed by the debug location
macro.
i) The .debug_frame section is emitted by the ptxas compiler.
    DW_AT_frame_base must be set to the dwarf::DW_OP_call_frame_cfa value, using
dwarf::DW_FORM_data1.
j) Strings cannot be referenced via labels; instead, they must be inlined in
the sections as arrays of chars.
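To make limitations c), d) and j) concrete, here is a minimal sketch of a DWARF-2 compilation-unit header written as PTX text. The unit length is computed up front instead of being written as `L1 - L2`, the abbreviation offset is a reference to the `.debug_abbrev` section itself, and a string is inlined as NUL-terminated bytes (the standard `DW_FORM_string` layout). The helper names `inlineString` and `emitUnitHeader` are hypothetical, and the overall emitter structure is an assumption, not the clang-ykt implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Limitation j): encode a string inline (DW_FORM_string), i.e. its bytes
// followed by a terminating NUL, instead of referencing .debug_str.
std::vector<std::uint8_t> inlineString(const std::string& s) {
  std::vector<std::uint8_t> bytes(s.begin(), s.end());
  bytes.push_back(0);
  return bytes;
}

// Limitations c) and d): a DWARF-2 unit header whose length is a
// precomputed constant, and whose abbreviation offset is the .debug_abbrev
// section symbol rather than a label inside that section.
std::string emitUnitHeader(const std::vector<std::uint8_t>& body,
                           unsigned addressSize) {
  // unit_length excludes itself: version (2) + abbrev offset (4) +
  // address_size (1) + the DIE bytes that follow.
  const std::size_t unitLength = 2 + 4 + 1 + body.size();
  std::ostringstream out;
  out << ".b32 " << unitLength << "\n";  // explicit size, no label arithmetic
  out << ".b8 2\n.b8 0\n";               // DWARF version 2, little-endian
  out << ".b32 .debug_abbrev\n";         // section reference, not a label
  out << ".b8 " << addressSize << "\n";  // address size in bytes
  for (std::uint8_t b : body) {
    out << ".b8 " << static_cast<unsigned>(b) << "\n";
  }
  return out.str();
}
```

A real body would start with an abbreviation code and contain complete DIEs; the string here only demonstrates how the explicit length tracks the byte count.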

Some changes in LLVM are required to support all of these limitations/features
in the output PTX files.

Required changes in LLVM.
=================
•include/llvm/CodeGen/AsmPrinter.h.
    •Add “virtual MCSymbol *getFunctionFrameSymbol(const MachineFunction *MF)
const” for non-register-based frame info.
    •Override it in “NVPTXMCAsmPrinter.cpp” to return the name of the
“__local_depot” frame storage.
•Add “cuda-gdb” specific tuning.
    •Inlined strings must be used in sections instead of string references.
    •Label arithmetic is replaced by explicit (absolute) section size evaluation.
    •Use “AsmPrinter::doInitialization()” instead of the NVPTX-specific manual
initialization.
    •Local variable addresses are emitted as “__local_depot” + <var offset>.
•Add an NVPTX-specific “NVPTXMCAsmStreamer” class.
    •Requires moving the “MCAsmStreamer” class declaration into a header file.
    •Overrides emission of labels (section names are emitted instead).
    •Overrides emission of sections (emits braces).
    •Overrides string emission (as a sequence of bytes, not as strings).
    •Overrides emission of file/location debug info.
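The proposed `getFunctionFrameSymbol` hook can be illustrated with stand-in classes. This is a self-contained sketch, not LLVM's `AsmPrinter` API: `AsmPrinterSketch`, `NVPTXAsmPrinterSketch`, and `localVariableAddress` are hypothetical, plain strings stand in for `MCSymbol`, and naming the frame array `__local_depot<N>` with a per-function number is an assumption about the NVPTX printer.

```cpp
#include <string>

// Stand-in for the proposed AsmPrinter hook. Register-based targets return an
// empty name, meaning "no frame symbol; describe the frame via registers".
class AsmPrinterSketch {
public:
  virtual ~AsmPrinterSketch() = default;
  virtual std::string getFunctionFrameSymbol(unsigned /*functionNumber*/) const {
    return "";
  }
};

// Stand-in for the NVPTX override: the frame is the function-local
// "__local_depot" array, assumed here to be suffixed with the function number.
class NVPTXAsmPrinterSketch : public AsmPrinterSketch {
public:
  std::string getFunctionFrameSymbol(unsigned functionNumber) const override {
    return "__local_depot" + std::to_string(functionNumber);
  }
};

// A local variable's address is "<frame symbol>+<offset>" when the target
// provides a frame symbol; otherwise this scheme does not apply.
std::string localVariableAddress(const AsmPrinterSketch& printer,
                                 unsigned functionNumber, unsigned offset) {
  std::string frame = printer.getFunctionFrameSymbol(functionNumber);
  if (frame.empty()) return "";
  return frame + "+" + std::to_string(offset);
}
```

Keeping the default implementation empty means register-based targets would be unaffected by the new virtual.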

Required changes in Clang.
================
•Add option “-gcuda-gdb” to the driver.
    •Emit cuda-gdb compatible debug info (DWARF-2 by default + CudaGDB tuning).
•Add options “-g --dont-merge-basicblocks --return-at-end” to the “ptxas” call.
    •ptxas can translate debug information only if the -O0 optimization level
is used. This means that we can use an optimization level > O0 in LLVM, but
still have to use O0 when calling the ptxas compiler.
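One way to structure the cuda-gdb tuning is as a bundle of independent flags, plus the fixed set of ptxas options listed above. The sketch below is hypothetical: `DebugTuningFlags`, `unpackCudaGdbTuning`, and `ptxasDebugArgs` are illustrative names, and only the DWARF version and the three ptxas options come from the proposal.

```cpp
#include <string>
#include <vector>

// Hypothetical switches that a "cuda-gdb" debugger tuning could unpack into.
struct DebugTuningFlags {
  unsigned dwarfVersion = 0;   // 0: let the target pick its default
  bool inlineStrings = false;  // DW_FORM_string instead of .debug_str
};

DebugTuningFlags unpackCudaGdbTuning() {
  DebugTuningFlags flags;
  flags.dwarfVersion = 2;      // ptxas accepts DWARF-2 only
  flags.inlineStrings = true;  // strings cannot be referenced via labels
  return flags;
}

// ptxas only translates debug info at -O0, so device debug info implies these
// options regardless of the optimization level used inside LLVM.
std::vector<std::string> ptxasDebugArgs() {
  return {"-g", "--dont-merge-basicblocks", "--return-at-end"};
}
```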


This approach was implemented in https://github.com/clang-ykt to support debug
info emission for the NVPTX target when generating code for OpenMP offloading
constructs. You can try it out.

Robinson, Paul via llvm-dev

2017-Nov-06 19:56 UTC


[llvm-dev] Debug info for Cuda

> Hi everybody,
> As you know, Cuda/NVPTX target has very limited support of the debug
> info in Clang/LLVM. Currently, LLVM supports only emission of the line
> numbers debug info.
> This is caused by limitations of the Cuda/NVPTX codegen. Clang/LLVM
> translates the source code to LLVM IR, which is then lowered to PTX
> (parallel thread execution) intermediate file. This PTX file represents
> special kind of the assembler code in text format, which contains the
> code itself + (possibly) debug info. Then this PTX file is compiled by
> ptxas tool into the CUDA binary representation.
>
> Debug info representation in PTX file.
> =======================
> According to PTX Writer's Guide to Interoperability, Debug information
> (http://docs.nvidia.com/cuda/ptx-writers-guide-to-interoperability/index.html#debug-information)
> , debug information must be encoded in DWARF (Debug With Arbitrary
> Record Format). The responsibility for generating debug information is
> split between the PTX producer and the PTX-to-SASS backend. The PTX
> producer is responsible for emitting binary DWARF into the PTX file,
> using the .section and .b8-.b16-.b32-and-.b64 directives in PTX. This
> should contain the .debug_info and .debug_abbrev sections, and possibly
> optional sections .debug_pubnames and .debug_aranges. These sections
> are standard DWARF2 sections that refer to labels and registers in the
> PTX.
>
> The PTX-to-SASS backend is responsible for generating the .debug_line
> section from the .file and .loc directives in the PTX file. This
> section maps source lines to SASS addresses. The PTX-to-SASS backend
> also generates the .debug_frame section.
All this sounds like the standard division of responsibilities between
an LLVM code generator and the assembler.
> LLVM is able to emit debug info in DWARF. But ptxas compiler has some
> limitations, that make it hard to adapt LLVM for correct emission of
> the debug info in PTX files.
>
> Limitations/features of the PTX format/ptxas compiler.
> =================================
> a) Supports DWARF-2 only.
IIRC, Darwin had a similar restriction until recently.
> b) Labels are allowed only in code section (only in functions).
If you have static/global variables, I guess their locations would
have to be described using a section+offset expression?  Normally
we emit a location attribute that is just a reference to a label
for the variable.
> c) Does not support label arithmetic in DWARF sections.
>     “.b32 L1 - L2” as the size of the section is not allowed, so the
> sections sizes should be calculated explicitly.
MachO has a similar restriction, this should not be a problem if you
can do something like:
    L3 = L1 - L2
    .b32 L3
> d) Debug info must point to the sections, not to labels inside these
> sections.
>     “.b32 .debug_abbrev”
Offhand for DWARF-2 I can't think of a reference that couldn't be done
this way.
> e) Sections itself must be enclosed into braces
>     “.section .debug_info {…}”
> f) Frame info is non-register based
>     Based on function local “__local_depot” array, that represents the
> stack frame.
> g) All variables must have non-standard DW_AT_address_class attribute
> so the debugger had the info about address class of the variable -
> global or local. DWARF standard does support this attribute, but it can
> be applied to pointer/reference types only, not variables.
For variables it would be more usual to use DW_AT_segment for this.
But that's an agreement that the compiler and debugger need to reach.
> h) The first label in the function must follow the debug location macro.
> In LLVM, it is followed by the debug location macro.
I am not 100% sure what you mean by this, but I think it has to do with
the fact that LLVM attaches locations to instructions, not labels.  It
might or might not be easy to work around this; there might be an
unfortunate interaction with how emitting line-0 records works.
> i) .debug_frame section is emitted by ptxas compiler.
>     DW_AT_frame_base must be set to dwarf::DW_FORM_data1
> dwarf::DW_OP_call_frame_cfa value.
I doubt that's a problem.
> j) Strings cannot be referenced by the labels, instead they must be
> inlined in the sections in form of array of chars.
LLVM used to do inline strings, but switched to the .debug_str section
quite a while ago.  On the other hand, I spent a little time maybe a
year ago looking into whether we could emit short strings inline as a
space-saving measure, and decided it was feasible.  (I didn't do it 
because the space savings was really trivial.)  So I think doing this
would not be terribly hard.
> Some changes in LLVM are required to support all these
> limitation/features in the output PTX files.
> Required changes in LLVM.
> =================
> •include/llvm/CodeGen/AsmPrinter.h.
>     •Add “virtual MCSymbol *getFunctionFrameSymbol(const
> MachineFunction *MF) const” for non-register-based frame info.
>     •Override “NVPTXMCAsmPrinter.cpp” to return the name of the
> “__local_depot” frame storage.
> •Add “cuda-gdb” specific tuning.
Note that our design philosophy for "tuning" is that a tuning option
unpacks into other separate flags.  Not a problem, just an observation.
>     •Inlined strings must be used in sections, not string references.
>     •Label arithmetic is replaced by the absolute section size
> evaluation.
This one isn't a debug-info tuning decision, it's how your assembler works
and so is a target decision.
>     •Use “AsmPrinter::doInitialization()” instead of NVPTX-custom manual
> initialization.
>     •Local variables address emitted as “__local_depot” + <var offset>.
> •Add NVPTX specific “NVPTXMCAsmStreamer” class.
>     •Requires moving to includes of “MCAsmStreamer” class declaration.
>     •Overrides emission of the labels (names of the section are emitted
> instead).
>     •Overrides emission of the sections (emit braces)
>     •Overrides string emission (as sequence of bytes, not as strings)
>     •Overrides emission of files/locations debug info
> Required changes in Clang.
> ================
> •Add option “-gcuda-gdb” to driver.
>     •Emit cuda-gdb compatible debug info (DWARF-2 by default + CudaGDB
> tuning).
> •Add options “-g --dont-merge-basicblocks --return-at-end” to “ptxas”
> call.
>     •ptxas is able to translate debug information only if -O0
> optimization level is used. It means, that we can use optimization
> level in LLVM > O0, but still have to use O0 when calling ptxas
> compiler.
>
> This approach was implemented in https://github.com/clang-ykt to support
> debug info emission for NVPTX target when generating code for OpenMP
> offloading constructs. You can try to use it.
I haven't looked at your code but all the things you describe seem
reasonably feasible.  Certainly the details of what you want to do
to the emitted DWARF are fine; I am less sure about the assembler
details, but if you have a worked example that makes it likely that
part is okay as well.
--paulr

Alexey Bataev via llvm-dev

2017-Nov-06 20:03 UTC


[llvm-dev] Debug info for Cuda

On 06.11.2017 14:56, Robinson, Paul wrote:
>> Hi everybody,
>> As you know, Cuda/NVPTX target has very limited support of the debug
>> info in Clang/LLVM. Currently, LLVM supports only emission of the line
>> numbers debug info.
>> This is caused by limitations of the Cuda/NVPTX codegen. Clang/LLVM
>> translates the source code to LLVM IR, which is then lowered to PTX
>> (parallel thread execution) intermediate file. This PTX file represents
>> special kind of the assembler code in text format, which contains the
>> code itself + (possibly) debug info. Then this PTX file is compiled by
>> ptxas tool into the CUDA binary representation.
>>
>> Debug info representation in PTX file.
>> =======================
>> According to PTX Writer's Guide to Interoperability, Debug information
>> (http://docs.nvidia.com/cuda/ptx-writers-guide-to-interoperability/index.html#debug-information)
>> , debug information must be encoded in DWARF (Debug With Arbitrary
>> Record Format). The responsibility for generating debug information is
>> split between the PTX producer and the PTX-to-SASS backend. The PTX
>> producer is responsible for emitting binary DWARF into the PTX file,
>> using the .section and .b8-.b16-.b32-and-.b64 directives in PTX. This
>> should contain the .debug_info and .debug_abbrev sections, and possibly
>> optional sections .debug_pubnames and .debug_aranges. These sections
>> are standard DWARF2 sections that refer to labels and registers in the
>> PTX.
>>
>> The PTX-to-SASS backend is responsible for generating the .debug_line
>> section from the .file and .loc directives in the PTX file. This
>> section maps source lines to SASS addresses. The PTX-to-SASS backend
>> also generates the .debug_frame section.
> All this sounds like the standard division of responsibilities between
> an LLVM code generator and the assembler.
>
>> LLVM is able to emit debug info in DWARF. But ptxas compiler has some
>> limitations, that make it hard to adapt LLVM for correct emission of
>> the debug info in PTX files.
>>
>> Limitations/features of the PTX format/ptxas compiler.
>> =================================
>> a) Supports DWARF-2 only.
> IIRC, Darwin had a similar restriction until recently.
>
>> b) Labels are allowed only in code section (only in functions).
> If you have static/global variables, I guess their locations would
> have to be described using a section+offset expression?  Normally
> we emit a location attribute that is just a reference to a label
> for the variable.
>
>> c) Does not support label arithmetic in DWARF sections.
>>     “.b32 L1 - L2” as the size of the section is not allowed, so the
>> sections sizes should be calculated explicitly.
> MachO has a similar restriction, this should not be a problem if you
> can do something like:
>     L3 = L1 - L2
>     .b32 L3
Nope, it is not supported.
>> d) Debug info must point to the sections, not to labels inside these
>> sections.
>>     “.b32 .debug_abbrev”
> Offhand for DWARF-2 I can't think of a reference that couldn't be done
> this way.
>
>> e) Sections itself must be enclosed into braces
>>     “.section .debug_info {…}”
>> f) Frame info is non-register based
>>     Based on function local “__local_depot” array, that represents the
>> stack frame.
>> g) All variables must have non-standard DW_AT_address_class attribute
>> so the debugger had the info about address class of the variable -
>> global or local. DWARF standard does support this attribute, but it can
>> be applied to pointer/reference types only, not variables.
> For variables it would be more usual to use DW_AT_segment for this.
> But that's an agreement that the compiler and debugger need to reach.
>
>> h) The first label in the function must follow the debug location macro.
>> In LLVM, it is followed by the debug location macro.
> I am not 100% sure what you mean by this, but I think it has to do with
> the fact that LLVM attaches locations to instructions, not labels.  It
> might or might not be easy to work around this; there might be an
> unfortunate interaction with how emitting line-0 records works.
>
>> i) .debug_frame section is emitted by ptxas compiler.
>>     DW_AT_frame_base must be set to dwarf::DW_FORM_data1
>> dwarf::DW_OP_call_frame_cfa value.
> I doubt that's a problem.
>
>> j) Strings cannot be referenced by the labels, instead they must be
>> inlined in the sections in form of array of chars.
> LLVM used to do inline strings, but switched to the .debug_str section
> quite a while ago.  On the other hand, I spent a little time maybe a
> year ago looking into whether we could emit short strings inline as a
> space-saving measure, and decided it was feasible.  (I didn't do it 
> because the space savings was really trivial.)  So I think doing this
> would not be terribly hard.
>
>> Some changes in LLVM are required to support all these
>> limitation/features in the output PTX files.
>> Required changes in LLVM.
>> =================
>> •include/llvm/CodeGen/AsmPrinter.h.
>>     •Add “virtual MCSymbol *getFunctionFrameSymbol(const
>> MachineFunction *MF) const” for non-register-based frame info.
>>     •Override “NVPTXMCAsmPrinter.cpp” to return the name of the
>> “__local_depot” frame storage.
>> •Add “cuda-gdb” specific tuning.
> Note that our design philosophy for "tuning" is that a tuning option
> unpacks into other separate flags.  Not a problem, just an observation.
>
>>     •Inlined strings must be used in sections, not string references.
>>     •Label arithmetic is replaced by the absolute section size
>> evaluation.
> This one isn't a debug-info tuning decision, it's how your assembler works
> and so is a target decision.
>
>>     •Use “AsmPrinter::doInitialization()” instead of NVPTX-custom manual
>> initialization.
>>     •Local variables address emitted as “__local_depot” + <var offset>.
>> •Add NVPTX specific “NVPTXMCAsmStreamer” class.
>>     •Requires moving to includes of “MCAsmStreamer” class declaration.
>>     •Overrides emission of the labels (names of the section are emitted
>> instead).
>>     •Overrides emission of the sections (emit braces)
>>     •Overrides string emission (as sequence of bytes, not as strings)
>>     •Overrides emission of files/locations debug info
>> Required changes in Clang.
>> ================
>> •Add option “-gcuda-gdb” to driver.
>>     •Emit cuda-gdb compatible debug info (DWARF-2 by default + CudaGDB
>> tuning).
>> •Add options “-g --dont-merge-basicblocks --return-at-end” to “ptxas”
>> call.
>>     •ptxas is able to translate debug information only if -O0
>> optimization level is used. It means, that we can use optimization
>> level in LLVM > O0, but still have to use O0 when calling ptxas
>> compiler.
>>
>> This approach was implemented in https://github.com/clang-ykt to support
>> debug info emission for NVPTX target when generating code for OpenMP
>> offloading constructs. You can try to use it.
> I haven't looked at your code but all the things you describe seem
> reasonably feasible.  Certainly the details of what you want to do
> to the emitted DWARF are fine; I am less sure about the assembler
> details, but if you have a worked example that makes it likely that
> part is okay as well.
> --paulr
>

Justin Lebar via llvm-dev

2017-Nov-06 23:27 UTC


[llvm-dev] RFC: Debug info for Cuda

(Reply-all'ing this time.)

This all seems pretty reasonable to me, although I guess as we say, the
proof is in the patches.  :)

On Mon, Nov 6, 2017 at 10:43 AM Alexey Bataev via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi everybody,
>
> As you know, Cuda/NVPTX target has very limited support of the debug info
> in Clang/LLVM. Currently, LLVM supports only emission of the line numbers
> debug info.
>
> This is caused by limitations of the Cuda/NVPTX codegen. Clang/LLVM
> translates the source code to LLVM IR, which is then lowered to PTX
> (parallel thread execution) intermediate file. This PTX file represents
> special kind of the assembler code in text format, which contains the code
> itself + (possibly) debug info. Then this PTX file is compiled by ptxas
> tool into the CUDA binary representation.
>
>
> Debug info representation in PTX file.
> =======================
>
> According to PTX Writer's Guide to Interoperability, Debug information
> (http://docs.nvidia.com/cuda/ptx-writers-guide-to-interoperability/index.html#debug-information)
> , debug information must be encoded in DWARF (Debug With Arbitrary Record
> Format). The responsibility for generating debug information is split
> between the PTX producer and the PTX-to-SASS backend. The PTX producer is
> responsible for emitting binary DWARF into the PTX file, using the .section
> and .b8-.b16-.b32-and-.b64 directives in PTX. This should contain the
> .debug_info and .debug_abbrev sections, and possibly optional sections
> .debug_pubnames and .debug_aranges. These sections are standard DWARF2
> sections that refer to labels and registers in the PTX.
>
> The PTX-to-SASS backend is responsible for generating the .debug_line
> section from the .file and .loc directives in the PTX file. This section
> maps source lines to SASS addresses. The PTX-to-SASS backend also generates
> the .debug_frame section.
>
> LLVM is able to emit debug info in DWARF. But ptxas compiler has some
> limitations, that make it hard to adapt LLVM for correct emission of the
> debug info in PTX files.
>
>
> Limitations/features of the PTX format/ptxas compiler.
> =================================
>
> a) Supports DWARF-2 only.
> b) Labels are allowed only in code section (only in functions).
> c) Does not support label arithmetic in DWARF sections.
>     “.b32 L1 - L2” as the size of the section is not allowed, so the
> sections sizes should be calculated explicitly.
> d) Debug info must point to the sections, not to labels inside these
> sections.
>     “.b32 .debug_abbrev”
> e) Sections itself must be enclosed into braces
>     “.section .debug_info {…}”
> f) Frame info is non-register based
>     Based on function local “__local_depot” array, that represents the
> stack frame.
> g) All variables must have non-standard DW_AT_address_class attribute so
> the debugger had the info about address class of the variable - global or
> local. DWARF standard does support this attribute, but it can be applied to
> pointer/reference types only, not variables.
> h) The first label in the function must follow the debug location macro.
> In LLVM, it is followed by the debug location macro.
> i) .debug_frame section is emitted by ptxas compiler.
>     DW_AT_frame_base must be set to dwarf::DW_FORM_data1
> dwarf::DW_OP_call_frame_cfa value.
> j) Strings cannot be referenced by the labels, instead they must be
> inlined in the sections in form of array of chars.
>
> Some changes in LLVM are required to support all these limitation/features
> in the output PTX files.
>
> Required changes in LLVM.
> =================
>
> •include/llvm/CodeGen/AsmPrinter.h.
>     •Add “virtual MCSymbol *getFunctionFrameSymbol(const MachineFunction
> *MF) const” for non-register-based frame info.
>     •Override “NVPTXMCAsmPrinter.cpp” to return the name of the
> “__local_depot” frame storage.
> •Add “cuda-gdb” specific tuning.
>     •Inlined strings must be used in sections, not string references.
>     •Label arithmetic is replaced by the absolute section size evaluation.
>     •Use “AsmPrinter::doInitialization()” instead of NVPTX-custom manual
> initialization.
>     •Local variables address emitted as “__local_depot” + <var offset>.
> •Add NVPTX specific “NVPTXMCAsmStreamer” class.
>     •Requires moving to includes of “MCAsmStreamer” class declaration.
>     •Overrides emission of the labels (names of the section are emitted
> instead).
>     •Overrides emission of the sections (emit braces)
>     •Overrides string emission (as sequence of bytes, not as strings)
>     •Overrides emission of files/locations debug info
>
> Required changes in Clang.
> ================
>
> •Add option “-gcuda-gdb” to driver.
>     •Emit cuda-gdb compatible debug info (DWARF-2 by default + CudaGDB
> tuning).
> •Add options “-g --dont-merge-basicblocks --return-at-end” to “ptxas” call.
>     •ptxas is able to translate debug information only if -O0 optimization
> level is used. It means, that we can use optimization level in LLVM > O0,
> but still have to use O0 when calling ptxas compiler.
>
>
> This approach was implemented in https://github.com/clang-ykt to support
> debug info emission for NVPTX target when generating code for OpenMP
> offloading constructs. You can try to use it.

Hal Finkel via llvm-dev

2017-Nov-07 16:04 UTC


[llvm-dev] RFC: Debug info for Cuda

Hi, Alexey,

Thanks a bunch for working on this. A couple quick questions below...

On 11/06/2017 12:37 PM, Alexey Bataev wrote:
> Hi everybody,
>
> As you know, Cuda/NVPTX target has very limited support of the debug 
> info in Clang/LLVM. Currently, LLVM supports only emission of the line 
> numbers debug info.
>
> This is caused by limitations of the Cuda/NVPTX codegen. Clang/LLVM 
> translates the source code to LLVM IR, which is then lowered to PTX 
> (parallel thread execution) intermediate file. This PTX file 
> represents special kind of the assembler code in text format, which 
> contains the code itself + (possibly) debug info. Then this PTX file 
> is compiled by ptxas tool into the CUDA binary representation.
>
>
> Debug info representation in PTX file.
> =======================
>
> According to PTX Writer's Guide to Interoperability, Debug information
> (http://docs.nvidia.com/cuda/ptx-writers-guide-to-interoperability/index.html#debug-information)
> , debug information must be encoded in DWARF (Debug With Arbitrary 
> Record Format). The responsibility for generating debug information is 
> split between the PTX producer and the PTX-to-SASS backend. The PTX 
> producer is responsible for emitting binary DWARF into the PTX file, 
> using the .section and .b8-.b16-.b32-and-.b64 directives in PTX. This 
> should contain the .debug_info and .debug_abbrev sections, and 
> possibly optional sections .debug_pubnames and .debug_aranges. These 
> sections are standard DWARF2 sections that refer to labels and 
> registers in the PTX.
>
> The PTX-to-SASS backend is responsible for generating the .debug_line 
> section from the .file and .loc directives in the PTX file. This 
> section maps source lines to SASS addresses. The PTX-to-SASS backend 
> also generates the .debug_frame section.
>
> LLVM is able to emit debug info in DWARF. But ptxas compiler has some 
> limitations, that make it hard to adapt LLVM for correct emission of 
> the debug info in PTX files.
>
>
> Limitations/features of the PTX format/ptxas compiler.
> =================================
>
> a) Supports DWARF-2 only.
> b) Labels are allowed only in code section (only in functions).
> c) Does not support label arithmetic in DWARF sections.
>     “.b32 L1 - L2” as the size of the section is not allowed, so the
> sections sizes should be calculated explicitly.
> d) Debug info must point to the sections, not to labels inside these 
> sections.
>     “.b32 .debug_abbrev”
> e) Sections itself must be enclosed into braces
>     “.section .debug_info {…}”
> f) Frame info is non-register based
>     Based on function local “__local_depot” array, that represents the 
> stack frame.
> g) All variables must have non-standard DW_AT_address_class attribute 
> so the debugger had the info about address class of the variable -
> global or local. DWARF standard does support this attribute, but it
> can be applied to pointer/reference types only, not variables.
> h) The first label in the function must follow the debug location 
> macro. In LLVM, it is followed by the debug location macro.
> i) .debug_frame section is emitted by ptxas compiler.
>     DW_AT_frame_base must be set to dwarf::DW_FORM_data1 
> dwarf::DW_OP_call_frame_cfa value.
> j) Strings cannot be referenced by the labels, instead they must be 
> inlined in the sections in form of array of chars.
>
> Some changes in LLVM are required to support all these 
> limitation/features in the output PTX files.
>
> Required changes in LLVM.
> =================
>
> •include/llvm/CodeGen/AsmPrinter.h.
>     •Add “virtual MCSymbol *getFunctionFrameSymbol(const 
> MachineFunction *MF) const” for non-register-based frame info.
>     •Override “NVPTXMCAsmPrinter.cpp” to return the name of the 
> “__local_depot” frame storage.
> •Add “cuda-gdb” specific tuning.
>     •Inlined strings must be used in sections, not string references.
>     •Label arithmetic is replaced by the absolute section size evaluation.
>     •Use “AsmPrinter::doInitialization()” instead of NVPTX-custom 
> manual initialization.
>     •Local variables address emitted as “__local_depot” + <var offset>.
> •Add NVPTX specific “NVPTXMCAsmStreamer” class.
>     •Requires moving to includes of “MCAsmStreamer” class declaration.
>     •Overrides emission of the labels (names of the section are 
> emitted instead).
>     •Overrides emission of the sections (emit braces)
>     •Overrides string emission (as sequence of bytes, not as strings)
>     •Overrides emission of files/locations debug info
>
> Required changes in Clang.
> ================
>
> •Add option “-gcuda-gdb” to driver.
>     •Emit cuda-gdb compatible debug info (DWARF-2 by default + CudaGDB 
> tuning).
> •Add options “-g --dont-merge-basicblocks --return-at-end” to “ptxas” 
> call.
>
Is this a change? It looks like this is already the behavior of the 
driver's CUDA target code:

   if (Args.hasFlag(options::OPT_cuda_noopt_device_debug,
                    options::OPT_no_cuda_noopt_device_debug, false)) {
     // ptxas does not accept -g option if optimization is enabled, so
     // we ignore the compiler's -O* options if we want debug info.
     CmdArgs.push_back("-g");
     CmdArgs.push_back("--dont-merge-basicblocks");
     CmdArgs.push_back("--return-at-end");
   } else if (Arg *A = Args.getLastArg(options::OPT_O_Group)) {
     // Map the -O we received to -O{0,1,2,3}.
     // ... (rest of the -O mapping elided)
   }
>     •ptxas is able to translate debug information only if -O0 
> optimization level is used. It means, that we can use optimization 
> level in LLVM > O0, but still have to use O0 when calling ptxas compiler.
>
Can you clarify what "unable to translate" means? Does it refuse to
compile the code, drop all debug info, drop all debug info except for
line-table information, or something else?

Thanks again,
Hal
>
> This approach was implemented in https://github.com/clang-ykt to 
> support debug info emission for NVPTX target when generating code for 
> OpenMP offloading constructs. You can try to use it.
>
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


Alexey Bataev via llvm-dev

2017-Nov-07 16:09 UTC


[llvm-dev] RFC: Debug info for Cuda

Hi Hal, see my answers below.

On 07.11.2017 11:04, Hal Finkel wrote:

Hi, Alexey,

Thanks a bunch for working on this. A couple quick questions below...

On 11/06/2017 12:37 PM, Alexey Bataev wrote:

Hi everybody,

As you know, the Cuda/NVPTX target has very limited debug info support in
Clang/LLVM. Currently, LLVM can emit only line-number debug info.

This is caused by limitations of the Cuda/NVPTX codegen. Clang/LLVM translates
the source code to LLVM IR, which is then lowered to a PTX (Parallel Thread
Execution) intermediate file. This PTX file is a special kind of assembly code
in text format that contains the code itself and, possibly, debug info. The
PTX file is then compiled by the ptxas tool into the CUDA binary
representation.


Debug info representation in PTX file.
=======================
According to the PTX Writer's Guide to Interoperability, section "Debug Information"
(http://docs.nvidia.com/cuda/ptx-writers-guide-to-interoperability/index.html#debug-information),
debug information must be encoded in DWARF (Debug With Arbitrary Record
Format). The responsibility for generating debug information is split between
the PTX producer and the PTX-to-SASS backend. The PTX producer is responsible
for emitting binary DWARF into the PTX file, using the .section directive and
the .b8, .b16, .b32, and .b64 directives in PTX. This should contain the
.debug_info and .debug_abbrev sections and, optionally, the .debug_pubnames
and .debug_aranges sections. These are standard DWARF2 sections that refer to
labels and registers in the PTX.
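As a rough illustration of the producer's side, the sketch below renders an already-encoded DWARF byte buffer as a PTX debug section, using `.section` plus one `.b8` directive per byte. The helper name `emitPtxDebugSection` is hypothetical, and the braces around the section body follow the ptxas requirement listed as limitation e) further down.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: render raw DWARF bytes as a PTX debug section.
// Each byte becomes a ".b8 <value>" directive, and the section body is
// wrapped in braces, which ptxas expects.
std::string emitPtxDebugSection(const std::string &name,
                                const std::vector<std::uint8_t> &bytes) {
  // DWARF section names such as .debug_abbrev start with a dot.
  assert(!name.empty() && name[0] == '.');
  std::ostringstream out;
  out << ".section " << name << "\n{\n";
  for (std::uint8_t b : bytes)
    out << ".b8 " << static_cast<unsigned>(b) << "\n";
  out << "}\n";
  return out.str();
}
```

For example, an abbreviation table with one entry (code 1, DW_TAG_compile_unit = 0x11, no children, one DW_AT_name/DW_FORM_string pair = 0x03/0x08, the 0,0 terminator, and the final 0) could be passed as `{1, 0x11, 0, 0x03, 0x08, 0, 0, 0}` with the name `.debug_abbrev`.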

The PTX-to-SASS backend is responsible for generating the .debug_line section
from the .file and .loc directives in the PTX file. This section maps source
lines to SASS addresses. The PTX-to-SASS backend also generates the .debug_frame
section.
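Because ptxas builds .debug_line itself, the producer only has to emit `.file` and `.loc` directives. A minimal sketch, assuming the PTX forms `.file <index> "<path>"` and `.loc <file index> <line> <column>`; the helper names `fileDirective` and `locDirective` are hypothetical.

```cpp
#include <cassert>
#include <string>

// Declares a source file under a numeric index; .loc directives refer to it.
std::string fileDirective(unsigned index, const std::string &path) {
  assert(!path.empty() && "a .file directive needs a file name");
  return ".file " + std::to_string(index) + " \"" + path + "\"";
}

// Marks the source position of the instructions that follow it.
// ptxas, not the PTX producer, turns these into .debug_line entries.
std::string locDirective(unsigned fileIndex, unsigned line, unsigned column) {
  return ".loc " + std::to_string(fileIndex) + " " + std::to_string(line) +
         " " + std::to_string(column);
}
```

In the emitted PTX, each `.loc` line sits just before the instructions it describes.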

LLVM can already emit debug info in DWARF, but the ptxas compiler has some
limitations that make it hard to adapt LLVM to emit correct debug info in PTX
files.


Limitations/features of the PTX format/ptxas compiler.
=================================
a) Supports DWARF-2 only.
b) Labels are allowed only in the code section (i.e., only inside functions).
c) Label arithmetic is not supported in DWARF sections.
    “.b32 L1 - L2” is not allowed as the size of a section, so section sizes
must be computed explicitly.
d) Debug info must refer to sections, not to labels inside these sections.
    “.b32 .debug_abbrev”
e) Section contents must be enclosed in braces.
    “.section .debug_info {…}”
f) Frame info is not register-based.
    It is based on the function-local “__local_depot” array, which represents
the stack frame.
g) All variables must have the non-standard DW_AT_address_class attribute so
that the debugger knows the address class of each variable (global or local).
The DWARF standard does define this attribute, but only for pointer/reference
types, not for variables.
h) The first label in a function must follow the debug location macro. LLVM
currently does the opposite: the label is emitted first and is followed by the
debug location macro.
i) The .debug_frame section is emitted by the ptxas compiler.
    DW_AT_frame_base must be set to the dwarf::DW_OP_call_frame_cfa value,
encoded with form dwarf::DW_FORM_data1.
j) Strings cannot be referenced via labels; instead, they must be inlined into
the sections as arrays of chars.
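Limitations c), d), and j) together determine how a .debug_info section has to be written. The sketch below is a minimal example under stated assumptions, not the proposed LLVM code: the hypothetical `emitDebugInfo` builds a DWARF-2 compile unit whose `unit_length` is computed up front instead of as a label difference, whose abbreviation offset is written as the section name `.debug_abbrev`, and whose name is inlined with DW_FORM_string. It assumes abbreviation code 1 describes a childless DW_TAG_compile_unit with a single DW_AT_name/DW_FORM_string attribute, and an address size of 8 bytes (64-bit NVPTX).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Builds a minimal DWARF-2 .debug_info section in a ptxas-friendly form.
std::string emitDebugInfo(const std::string &cuName) {
  // An inlined DW_FORM_string is NUL-terminated, so it cannot contain NUL.
  assert(cuName.find('\0') == std::string::npos);

  // DIE body: abbreviation code 1 (one ULEB128 byte), then the inlined,
  // NUL-terminated name bytes (limitation j).
  std::vector<std::uint8_t> die = {1};
  for (unsigned char c : cuName)
    die.push_back(c);
  die.push_back(0);

  // unit_length excludes its own 4 bytes:
  // version (2) + abbrev offset (4) + address size (1) + DIE bytes.
  const std::size_t unitLength = 2 + 4 + 1 + die.size();

  std::ostringstream out;
  out << ".section .debug_info\n{\n";
  out << ".b32 " << unitLength << "\n"; // explicit size (limitation c)
  out << ".b16 2\n";                    // DWARF version 2 (limitation a)
  out << ".b32 .debug_abbrev\n";        // section name, not a label (d)
  out << ".b8 8\n";                     // assumed 64-bit address size
  for (std::uint8_t b : die)
    out << ".b8 " << static_cast<unsigned>(b) << "\n";
  out << "}\n";                         // braces around the body (e)
  return out.str();
}
```

For `"k.cu"` the DIE is 6 bytes (code, four characters, NUL), so the emitted length is 2 + 4 + 1 + 6 = 13.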

Some changes in LLVM are required to support all these limitations/features in
the output PTX files.

Required changes in LLVM.
=================
•include/llvm/CodeGen/AsmPrinter.h.
    •Add “virtual MCSymbol *getFunctionFrameSymbol(const MachineFunction *MF) const” for non-register-based frame info.
    •Override it in “NVPTXMCAsmPrinter.cpp” to return the name of the “__local_depot” frame storage.
•Add “cuda-gdb”-specific tuning.
    •Inlined strings must be used in sections instead of string references.
    •Label arithmetic is replaced by explicit evaluation of absolute section sizes.
    •Use “AsmPrinter::doInitialization()” instead of the NVPTX-custom manual initialization.
    •Local variable addresses are emitted as “__local_depot” + <var offset>.
•Add an NVPTX-specific “NVPTXMCAsmStreamer” class.
    •Requires moving the “MCAsmStreamer” class declaration into a header so that it can be included and subclassed.
    •Overrides emission of labels (section names are emitted instead).
    •Overrides emission of sections (emits braces).
    •Overrides string emission (as a sequence of bytes, not as strings).
    •Overrides emission of file/location debug info.
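The streamer changes listed above amount to overriding a few emission hooks. The sketch below uses two hypothetical stand-in classes, `TextAsmStreamer` and `PtxAsmStreamer`, not LLVM's real `MCAsmStreamer` interface, to show the shape of that design: the PTX variant wraps sections in braces, writes the section name where a generic streamer would reference a start-of-section label, and emits strings as `.b8` bytes.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for a generic textual assembly streamer.
class TextAsmStreamer {
public:
  virtual ~TextAsmStreamer() = default;
  virtual void switchSection(const std::string &name) {
    out << ".section " << name << "\n";
  }
  // Reference to the start of a section, e.g. the abbrev offset in .debug_info.
  virtual void emitSectionRef(const std::string &startLabel,
                              const std::string & /*section*/) {
    out << ".b32 " << startLabel << "\n";
  }
  virtual void emitString(const std::string &s) {
    out << ".asciz \"" << s << "\"\n";
  }
  virtual void finish() {}
  std::string str() const { return out.str(); }

protected:
  std::ostringstream out;
};

// Hypothetical PTX flavour mirroring the listed overrides.
class PtxAsmStreamer : public TextAsmStreamer {
public:
  void switchSection(const std::string &name) override {
    closeSection();
    out << ".section " << name << "\n{\n"; // braces around the body
    inSection = true;
  }
  void emitSectionRef(const std::string & /*startLabel*/,
                      const std::string &section) override {
    out << ".b32 " << section << "\n"; // section name instead of a label
  }
  void emitString(const std::string &s) override {
    assert(s.find('\0') == std::string::npos);
    for (unsigned char c : s)
      out << ".b8 " << static_cast<unsigned>(c) << "\n"; // bytes, not a string
    out << ".b8 0\n";                                    // NUL terminator
  }
  void finish() override { closeSection(); }

private:
  void closeSection() {
    if (inSection) {
      out << "}\n";
      inSection = false;
    }
  }
  bool inSection = false;
};
```

Keeping the generic behavior in the base class and overriding only these hooks is what makes moving the streamer declaration into a header necessary in the real proposal.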

Required changes in Clang.
================
•Add option “-gcuda-gdb” to driver.
    •Emit cuda-gdb compatible debug info (DWARF-2 by default + CudaGDB tuning).
•Add options “-g --dont-merge-basicblocks --return-at-end” to “ptxas” call.

Is this a change? It looks like this is already the behavior of the driver's
CUDA target code:

  if (Args.hasFlag(options::OPT_cuda_noopt_device_debug,
                   options::OPT_no_cuda_noopt_device_debug, false)) {
    // ptxas does not accept -g option if optimization is enabled, so
    // we ignore the compiler's -O* options if we want debug info.
    CmdArgs.push_back("-g");
    CmdArgs.push_back("--dont-merge-basicblocks");
    CmdArgs.push_back("--return-at-end");
  } else if (Arg *A = Args.getLastArg(options::OPT_O_Group)) {
    // Map the -O we received to -O{0,1,2,3}.
    // ... (rest of the -O mapping elided)
  }

Yes, but we need to add the same behavior for OpenMP.
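One way to give OpenMP offloading the same ptxas behavior is to share the argument logic between both toolchains. The sketch below uses a hypothetical helper, `ptxasDebugOrOptArgs`, that mirrors the CUDA driver code quoted above: with device debug info it passes "-g --dont-merge-basicblocks --return-at-end" and no -O flag, because ptxas refuses to compile with -g when optimization is enabled. The LLVM-side optimization level is chosen separately and is not affected.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper shared by the CUDA and OpenMP offloading toolchains.
// optLevel is the already-mapped level: "0", "1", "2" or "3".
std::vector<std::string> ptxasDebugOrOptArgs(bool deviceDebug,
                                              const std::string &optLevel) {
  std::vector<std::string> args;
  if (deviceDebug) {
    // Debug info requires unoptimized ptxas, so no -O flag is passed.
    args.push_back("-g");
    args.push_back("--dont-merge-basicblocks");
    args.push_back("--return-at-end");
  } else {
    assert(!optLevel.empty() && "expected one of 0, 1, 2, 3");
    args.push_back("-O" + optLevel);
  }
  return args;
}
```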

    •ptxas can translate debug information only at the -O0 optimization level.
This means that we can use an optimization level above O0 in LLVM, but must
still use -O0 when invoking the ptxas compiler.

Can you clarify what "unable to translate" means? Does it refuse to
compile the code, drop all debug info, drop all debug info except for line-table
information, or something else?

It refuses to compile the code and emits an error message.

Thanks again,
Hal



This approach was implemented in https://github.com/clang-ykt to support debug
info emission for NVPTX target when generating code for OpenMP offloading
constructs. You can try to use it.


--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

