There should be a line-table entry for the end of the function, which appears to be missing from the dump you provided. llvm-dwarfdump should report this address with 'end_sequence' in the Flags. Are you using a different dumper?

I am not sure, but my guess would be that inline data is not represented in the line table. The line table's primary purpose is to inform the debugger about good breakpoint locations, and clearly you do not want to set breakpoints in data. Inline data is probably contained within the code ranges described in the DW_TAG_subprogram, however.

--paulr

From: Muhui Jiang [mailto:jiangmuhui at gmail.com]
Sent: Tuesday, June 26, 2018 1:44 AM
To: Robinson, Paul
Cc: Yajin; llvm-dev
Subject: Re: [llvm-dev] Instruction boundaries

Hi Paulr

According to my observation, not all the instructions are listed in the line table.

For example, addresses 0xa3a0 and 0xa3a4 hold instructions:

.text:0000A394                 CMP     R1, #0x42
.text:0000A398                 BHI     loc_AB70
.text:0000A39C                 ADR     R1, off_A3A8
.text:0000A3A0                 LDR     R0, [R1,R0,LSL#2]
.text:0000A3A4                 MOV     PC, R0
.text:0000A3A4 ; ---------------------------------------------------------------------------
.text:0000A3A8 off_A3A8        DCD loc_AB3C            ; DATA XREF: main+AC↑o
.text:0000A3AC                 DCD loc_AB34
.text:0000A3B0                 DCD loc_AB70
.text:0000A3B4                 DCD loc_AB70

However, in the line table the description ends at 0xa39c. Do you have any ideas?

196 0x000000000000a38c    956      7      1   0             0
197 0x000000000000a39c      0      7      1   0             0
198 0x000000000000a7d8    959     27      1   0             0  is_stmt
199 0x000000000000a7f8    959     25      1   0             0
200 0x000000000000a7fc    961     11      1   0             0  is_stmt
201 0x000000000000a800    964     15      1   0             0  is_stmt
202 0x000000000000a808    964     15      1   0             0

Regards
Muhui

2018-06-25 23:31 GMT-04:00 Muhui Jiang <jiangmuhui at gmail.com>:

Hi paulr

Thanks for your reply. Though the DWARF info gives me the code address ranges, there might be inline data. If so, how should this case be handled?

As for the DWARF line table, sometimes the source line is zero. Do you know why? If all instructions should be described in the line table, I think analyzing the DWARF line table is enough to get all the instruction addresses. Do you agree?

I would also cc my supervisor for the discussion.

Regards
Muhui

<paul.robinson at sony.com> wrote on Tuesday, June 26, 2018 at 2:38 AM:

The main DWARF info should provide the code address ranges for each function, as well as the starting source location. You could then use the line table to map code ranges to individual source lines. That could give you a reasonable grasp of the source range for each function.

All addresses in the DWARF line table will be instruction addresses. And in fact, all instructions should be described in the line table (assuming all source was compiled with debug info).

--paulr

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Muhui Jiang via llvm-dev
Sent: Monday, June 25, 2018 11:36 AM
To: llvm-dev
Subject: [llvm-dev] Instruction boundaries

Hi

I was wondering whether there are any methods to know which parts of the target binary are code.

I have some ideas and hope to get your comments.

I would like to use LLVM's source-level debugging information to extract the source lines belonging to every function, and then use the DWARF line table to translate that source-level information into binary addresses. Are there any better methods?

Besides, must the addresses listed in the DWARF line table be code rather than data?

Regards
Muhui

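The approach suggested in the quoted 2:38 AM reply (take each function's code address range from the main DWARF info, then use the line table to map that range to source lines) could be sketched roughly as follows. This is only an illustration: the `FunctionRange` and `Row` types and the `lines_by_function` helper are hypothetical names, the bounds given for `main` are made up, and only the row addresses and line numbers are copied from the dump above. Rows with line 0 are skipped because they name no source line.

```python
from typing import Dict, NamedTuple, Sequence, Set


class FunctionRange(NamedTuple):
    name: str
    low_pc: int   # first address of the function
    high_pc: int  # one past the last address of the function


class Row(NamedTuple):
    address: int
    line: int


def lines_by_function(
    functions: Sequence[FunctionRange], rows: Sequence[Row]
) -> Dict[str, Set[int]]:
    """Collect the source lines of every line-table row inside each function."""
    result: Dict[str, Set[int]] = {f.name: set() for f in functions}
    for row in rows:
        if row.line == 0:
            continue  # line 0 carries no source line, so there is nothing to record
        for f in functions:
            if f.low_pc <= row.address < f.high_pc:
                result[f.name].add(row.line)
    return result


# Hypothetical bounds for main; real values would come from DW_TAG_subprogram.
FUNCTIONS = [FunctionRange("main", 0xA2F0, 0xAC00)]

# Addresses and line numbers copied from the dump in this thread.
ROWS = [
    Row(0xA38C, 956),
    Row(0xA39C, 0),
    Row(0xA7D8, 959),
    Row(0xA7F8, 959),
    Row(0xA7FC, 961),
    Row(0xA800, 964),
    Row(0xA808, 964),
]

print(lines_by_function(FUNCTIONS, ROWS))
```

Note that this only recovers source lines per function; it says nothing about whether a given address inside the range holds an instruction or inline data.
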
Hi Paulr

I am using llvm-dwarfdump. The dump I showed is not the end of the function, so no 'end_sequence' is reported. I only wanted to give you a counterexample, because you said that all instructions should be listed in the DWARF line table. However, I found that some instructions are missing from it, and some instructions have a source line of zero.

Besides, I also noticed that inline data is listed inside the line table.

Regards
Muhui

2018-06-26 9:08 GMT-04:00 <paul.robinson at sony.com>:

> There should be a line-table entry for the end of the function, which
> appears to be missing from the dump you provided. llvm-dwarfdump should
> report this address with 'end_sequence' in the Flags. Are you using a
> different dumper?
>
> I am not sure, but my guess would be that inline data is not represented in
> the line table. The line table's primary purpose is to inform the debugger
> about good breakpoint locations, and clearly you do not want to set
> breakpoints in data. Inline data is probably contained within the code
> ranges described in the DW_TAG_subprogram, however.
>
> --paulr

I'm not familiar with the target instruction set, but if "MOV PC, R0" is not a return instruction, I'm guessing that the sequence starting at A39C is a dispatch through a jump table. The jump table would be considered part of the instruction stream and included in the scope of the line table. This is not a case where you would see end_sequence; my mistake.

The line table does not list every individual instruction. For compactness, it lists instructions only when the associated source location changes. All instructions between successive entries of the line table would have the same source location.

Line 0 indicates one of two things.

First, some optimizations might combine instructions from different source locations into a single new instruction. Because the source location is now ambiguous, we report line 0 rather than arbitrarily choosing one of the original source locations. This is a known deficiency in how DWARF represents source locations for instructions. It will require a change to the DWARF specification to address this problem.

Second, there might not be a specific source location for some instructions. This is most common for things like code to call global initializers prior to the start of the 'main()' function, but can occur in other situations.

I hope this addresses at least some of your questions. I am happy to continue this discussion if some questions remain or if you have new questions.

--paulr

From: Muhui Jiang [mailto:jiangmuhui at gmail.com]
Sent: Tuesday, June 26, 2018 9:38 AM
To: Robinson, Paul
Cc: Yajin; llvm-dev
Subject: Re: [llvm-dev] Instruction boundaries

Hi Paulr

I am using llvm-dwarfdump. The dump I showed is not the end of the function, so no 'end_sequence' is reported. I only wanted to give you a counterexample, because you said that all instructions should be listed in the DWARF line table. However, I found that some instructions are missing from it, and some instructions have a source line of zero.

Besides, I also noticed that inline data is listed inside the line table.

Regards
Muhui

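As a rough illustration of the point that each line-table row covers every address up to the next row, here is a minimal Python sketch. It assumes the rows have already been parsed into (address, line) pairs; the `Row` type and the `row_for` helper are hypothetical names, and the values are copied from the dump quoted in this thread. Because that excerpt has no end_sequence row, the upper bound of the last row is unknown and the sketch does not check it.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Sequence


class Row(NamedTuple):
    address: int
    line: int


# Addresses and line numbers copied from the dump quoted in this thread.
ROWS = [
    Row(0xA38C, 956),
    Row(0xA39C, 0),
    Row(0xA7D8, 959),
    Row(0xA7F8, 959),
    Row(0xA7FC, 961),
    Row(0xA800, 964),
    Row(0xA808, 964),
]


def row_for(address: int, rows: Sequence[Row]) -> Optional[Row]:
    """Return the row governing `address`, or None if it precedes the first row.

    A row applies from its own address up to the address of the next row,
    so the lookup is "last row whose address is <= the queried address".
    `rows` must be sorted by address.
    """
    index = bisect_right([r.address for r in rows], address) - 1
    return rows[index] if index >= 0 else None


# 0xA3A0 and 0xA3A4 are the two instructions that looked "missing";
# 0xA3A8 is the first jump-table entry in the disassembly.
for addr in (0xA3A0, 0xA3A4, 0xA3A8):
    row = row_for(addr, ROWS)
    print(f"{addr:#x} -> row at {row.address:#x}, line {row.line}")
```

Under this reading, 0xa3a0 and 0xa3a4 are covered by the line-0 row at 0xa39c rather than being absent, and so is the jump-table entry at 0xa3a8. That suggests line-table coverage on its own may not be enough to tell instructions apart from inline data.
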