Luke Drummond via llvm-dev
2020-Mar-10 15:17 UTC
[llvm-dev] DWARF .debug_aranges data objects and address spaces
Hello,

I've been looking at a debuginfo issue on an out-of-tree target which uses
DWARF aranges.

The problem is that aranges are generated for both data and code objects, and
the debugger gets confused when program addresses overlap data addresses. The
target is a Harvard Architecture CPU, so the appearance of overlapping address
ranges is not in itself a bug as they reside in different address spaces.

During my investigations, I found that:

- gcc appears to never generate an entry in the `.debug_aranges` table for
  data objects. I did a cursory read over gcc's source and history and it is
  my understanding that aranges are deliberately only emitted for text and
  cold text sections[1].

- However, the DWARF v5 specification[2] for `.debug_aranges` does not suggest
  that aranges should only be for text addresses, and the wording strongly
  suggests that their use is general:

  6.1.2:
  > This header is followed by a variable number of address range descriptors.
  > Each descriptor is a triple consisting of a segment selector, the
  > beginning address within that segment of a range of text or data covered
  > by some entry owned by the corresponding compilation unit, followed by the
  > non-zero length of that range

  As such llvm is doing nothing generally wrong by emitting aranges for data
  objects.

- llvm unconditionally sets the `.debug_aranges.segment_selector_size` to
  zero[3]. GCC does this too[4]. I think this is a bug if the target can have
  overlapping ranges due to multiple code/data address spaces, as in my case
  of a Harvard machine.

As far as I can tell, the only upstream backend that is of a similar
configuration is AVR. I can reproduce the same `.debug_aranges` table as my
target with the following simple example:

    $ clang -target avr -mmcu=attiny104 -S -o - -g -gdwarf-aranges -xc - <<'EOF'
    char char_array[16383] = {0};
    int main() {
      return char_array[0];
    }
    EOF
    # ...
        .section .debug_aranges,"",@progbits
        .long 20                        ; Length of ARange Set
        .short 2                        ; DWARF Arange version number
        .long .Lcu_begin0               ; Offset Into Debug Info Section
        .byte 2                         ; Address Size (in bytes)
        .byte 0                         ; Segment Size (in bytes)
        .short char_array
        .short .Lsec_end0-char_array
        .short .Lfunc_begin0
        .short .Lsec_end1-.Lfunc_begin0
        .short 0                        ; ARange terminator

...but I cannot see documentation anywhere on what a consumer is expected to do
with such information, and how *in general* multiple address spaces are
expected to work for llvm and gcc when generating DWARF aranges when there is
no segment selector in the tuple.

A cursory grep of lldb shows that the segment size is set from the
`.debug_aranges` header, but never checked. If it *is* nonzero, lldb will
silently read incorrect data and possibly crash. I have provided a patch on the
lldb mailing list[5]. My patch brings lldb in line with gdb, which throws an
error in case of a nonzero segment selector size[6].

My question is: Should LLVM have some logic to emit
`segment_selector_size != 0` for targets without a flat address space?
Alternative formulation: do we need to limit the emission of arange info to
code objects 1) only in the non-flat address-space case or 2) for all targets
unconditionally?

My intuition is that we should limit emission of aranges to objects in the
main text section. Neither GDB nor LLDB handles aranges for targets without
flat address spaces, and significant work might be needed in downstream DWARF
consumers. The usefulness of address ranges for data objects is not obvious to
me, as the uses of this section in DWARF consumers seem to mostly be PC-lookup.

Any insight would be appreciated. I can likely provide patches if we conclude
that changes are needed in LLVM.
All the Best

Luke

[1] GCC only emits aranges for text:
    https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=blob;f=gcc/dwarf2out.c;h=bb45279ea56d36621f14b0a68f4f0f0be3bf4e97;hb=HEAD#l11637
[2] DWARF Debugging Information Format Version 5; 6.1.
    http://dwarfstd.org/Dwarf5Std.php
[3] LLVM segment selector size is always zero:
    https://github.com/llvm/llvm-project/blob/e71fb46a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L2749
[4] GCC segment selector size is always zero:
    https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=blob;f=gcc/dwarf2out.c;h=bb45279ea56d36621f14b0a68f4f0f0be3bf4e97;hb=HEAD#l11624
[5] lldb patch to gracefully error on nonzero segment selector size:
    https://reviews.llvm.org/D75925
[6] GDB implementation of [5]:
    https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gdb/dwarf2/read.c;h=1d4397dfabc72004eaa64013e47033e0ebdfe213;hb=HEAD#l2779

--
Codeplay Software Ltd.
Company registered in England and Wales, number: 04567874
Registered office: Regent House, 316 Beulah Hill, London, SE19 3HF
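
The `.debug_aranges` header layout discussed in the message above (length, version, CU offset, address size, segment selector size, then tuples) can be sketched as a small reader. This is only an illustration, assuming 32-bit DWARF and little-endian data; the function name `parse_aranges_set` and the addresses in `example` are hypothetical and are not taken from lldb, gdb, or the AVR build. It rejects a nonzero segment selector size, which is the behaviour the message attributes to gdb.

```python
import struct

HEADER_SIZE = 12  # unit_length(4) + version(2) + debug_info_offset(4) + 2 size bytes


def parse_aranges_set(data, offset=0):
    """Parse one .debug_aranges set (32-bit DWARF, little-endian only)."""
    unit_length, version, cu_offset, addr_size, seg_size = struct.unpack_from(
        "<IHIBB", data, offset
    )
    if unit_length == 0xFFFFFFFF:
        raise ValueError("64-bit DWARF is not handled by this sketch")
    if seg_size != 0:
        # Without this check the tuples below would be decoded with the
        # wrong stride and yield garbage ranges.
        raise ValueError("nonzero segment selector size is not supported")
    if addr_size not in (1, 2, 4, 8):
        raise ValueError("unexpected address size")

    end = offset + 4 + unit_length
    tuple_size = 2 * addr_size
    # The first tuple starts at a multiple of the tuple size; this sketch
    # measures that from the start of the set.
    pos = offset + HEADER_SIZE + (-HEADER_SIZE) % tuple_size

    ranges = []
    while pos + tuple_size <= end:
        start = int.from_bytes(data[pos:pos + addr_size], "little")
        length = int.from_bytes(data[pos + addr_size:pos + tuple_size], "little")
        pos += tuple_size
        if start == 0 and length == 0:  # terminator tuple
            break
        ranges.append((start, length))
    return {"version": version, "cu_offset": cu_offset,
            "address_size": addr_size, "ranges": ranges}


# A set shaped like the AVR listing: 2-byte addresses, no segment selector.
# The concrete addresses are made up for illustration.
example = (
    struct.pack("<IHIBB", 20, 2, 0, 2, 0)
    + struct.pack("<HHHH", 0x0060, 16383, 0x0100, 0x0020)  # data, then code
    + struct.pack("<HH", 0, 0)                              # terminator
)
print(parse_aranges_set(example)["ranges"])
```

Note that the parsed tuples carry no indication of which address space each range belongs to, which is the gap the message asks about.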
David Blaikie via llvm-dev
2020-Mar-10 19:45 UTC
[llvm-dev] DWARF .debug_aranges data objects and address spaces
If you only want code addresses, why not use the CU's low_pc/high_pc/ranges - those are guaranteed to be only code addresses, I think?
Luke Drummond via llvm-dev
2020-Mar-11 15:09 UTC
[llvm-dev] DWARF .debug_aranges data objects and address spaces
On Tue Mar 10, 2020 at 7:45 PM, David Blaikie wrote:
> If you only want code addresses, why not use the CU's low_pc/high_pc/ranges
> - those are guaranteed to be only code addresses, I think?

In the common case, for most targets LLVM supports, I think you're right, but
for my case, regrettably, not. Because my target is a Harvard Architecture, any
code address can have the same ordinal value as any data address: the code and
data reside on different buses, so the whole 4GiB space is available to both
code and data.

`DW_AT_low_pc` and `DW_AT_high_pc` can be used to find the range of the code
segment but, given an arbitrary address, cannot be used to conclusively
determine whether that address belongs to code or data when both segments
contain addresses in that numeric range.

All the Best

Luke
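
The ambiguity described above can be shown with a toy lookup table. This is a sketch under stated assumptions, not how any debugger stores aranges: the `Range` type, the `space` tag (standing in for a segment selector), the two lookup functions, and the addresses are all hypothetical.

```python
from typing import NamedTuple


class Range(NamedTuple):
    space: str   # hypothetical address-space tag, e.g. "code" or "data"
    start: int
    length: int
    owner: str


# Made-up ranges for a Harvard-style target: the data object and the
# function occupy numerically overlapping addresses on different buses.
RANGES = [
    Range("data", 0x0060, 16383, "char_array"),
    Range("code", 0x0100, 0x0020, "main"),
]


def lookup_flat(address):
    """Look up by numeric address only, as a tuple without a selector allows."""
    return [r.owner for r in RANGES if r.start <= address < r.start + r.length]


def lookup_qualified(space, address):
    """Look up by (address space, address), which removes the ambiguity."""
    return [
        r.owner
        for r in RANGES
        if r.space == space and r.start <= address < r.start + r.length
    ]


print(lookup_flat(0x0110))               # both entries match
print(lookup_qualified("code", 0x0110))  # only the function matches
```

With only the numeric address, the query matches both the data object and the function; qualifying the query with an address space is one way to make the answer unique.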