On Oct 7, 2012, at 3:14 AM, Renato Golin <rengolin at systemcall.org> wrote:

> On 5 October 2012 17:48, Jim Grosbach <grosbach at apple.com> wrote:
>> The recent MachO data-in-code support should have fixed a lot of the problems. There's probably still some quirks in the specifics ($a vs. $t and making sure the symbols get into the ELF properly), but the core functionality to know how to mark data regions is there and works very well.
>
> Hi Jim,
>
> I'm trying to help Greg crack it down. From your recent commits, I
> take it you're re-using a data-in-code detection previously used only
> for ASM output, to object output, via the
> EmitDataRegion/EmitDataRegionEnd.

It's a bit more than that. Those Emit* methods are new for this support. There was spotty support for the raw $a/$t/$d stuff before, and this abstracted and extended it to support both asm and binary emission, as well as added uses for the methods to the various bits in the ARM backend where data-in-code regions get created (jump tables, constant pools, et al.).

> I haven't looked too deep in the MC, but I'm supposing that will work
> automatically when the output streamer is printing object code and
> meets a non-code region, so in theory, changing MCELFStreamer
> accordingly (overriding those functions in there) would take care of
> data vs. code issue in ELF.

Yep. They'll likely be implemented as, effectively, an EmitLabel().

> Assuming LLVM doesn't generate ARM/Thumb veneers inside the same
> function (i.e. a Thumb function has only Thumb code), Greg could use
> the EmitDataRegion and EmitDataRegionEnd, with the former saving the
> state of the current code (Thumb/Arm) and the latter restoring it, by
> emitting the $d and $a/t respectively.
>
> Does it seem like a good initial approach?
>
> Continuing... It seems MCELFStreamer already has an EmitThumbFunc,
> which looks to me as the wrong place to be.

That's just the handler for the .thumb_func directive. It has nothing to do with emitting the contents of the actual function.

> I'd imagine MCELFStreamer
> would have EmitFunc and MCARMELCStreamer (or whatever) would identify
> its type and call the appropriate EmitThumbFunc/EmitARMFunc. Being
> pedantic, even that is still too high level because of the ARM/Thumb
> veneers, but we don't want to worry about that if LLVM doesn't even
> try to mix ARM and Thumb (and assuming external libraries would have
> the symbols, if they do).

This is complicated a bit by needing to work for plain .s files, not just compiler generated files. Those can intermix arm and thumb code in crazy ways.

The assembler already has a thumb vs. arm mode state (which gets adjusted via the .arm/.thumb directives and the .code synonyms). ELF will want to check that state and use it to determine whether a data-region-end directive should result in a $a or a $t in the output ELF.

> Generating or not, LLVM's disassembler should know about those symbols
> and should be able to mark them accordingly. Where would be the best
> part to put those symbols (in an enum or table), so that the
> MCStreamer and the disassembler could reference a single place?

It's not the disassembler itself that should know about them, but the driver for the disassembler. In this case, llvm-objdump. The disassembler doesn't have that kind of gestalt knowledge.

-Jim

> --
> cheers,
> --renato
>
> http://systemcall.org/
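A minimal sketch of the idea in the message above, under stated assumptions: the data-region hooks behave like label emission, and the data-region-end hook consults the assembler's current ARM/Thumb mode to choose between $a and $t. Every name here (ToyMappingSymbolStreamer, IsaMode, demoThumbConstantPool) is hypothetical and is not LLVM's MCELFStreamer API; a real ELF streamer would also need a mapping symbol at the start of each code section, which this toy leaves out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the assembler's ARM vs. Thumb mode state
// (the state that the .arm/.thumb/.code directives adjust).
enum class IsaMode { ARM, Thumb };

// Hypothetical toy streamer. It is NOT MCELFStreamer and mirrors no LLVM API.
class ToyMappingSymbolStreamer {
public:
  // Called when a .arm/.thumb style directive changes the mode.
  void setMode(IsaMode M) { Mode = M; }

  // Pretend to emit Size bytes of instructions or data.
  void emitBytes(uint64_t Size) { Offset += Size; }

  // Start of a data-in-code region (jump table, constant pool): "$d".
  void emitDataRegion() { emitMappingSymbol("$d"); }

  // End of the region: look at the current mode instead of a saved one, so
  // hand-written .s files that switch modes still get the right symbol.
  void emitDataRegionEnd() {
    emitMappingSymbol(Mode == IsaMode::Thumb ? "$t" : "$a");
  }

  const std::vector<std::pair<uint64_t, std::string>> &symbols() const {
    return Symbols;
  }

private:
  // Acts like emitting a label: records a name at the current offset.
  void emitMappingSymbol(const std::string &Name) {
    Symbols.emplace_back(Offset, Name);
  }

  IsaMode Mode = IsaMode::ARM;
  uint64_t Offset = 0;
  std::vector<std::pair<uint64_t, std::string>> Symbols;
};

// Hypothetical demo: Thumb code, an 8-byte constant pool, then code again.
inline std::vector<std::pair<uint64_t, std::string>> demoThumbConstantPool() {
  ToyMappingSymbolStreamer S;
  S.setMode(IsaMode::Thumb);
  S.emitBytes(4);        // some Thumb instructions
  S.emitDataRegion();    // records "$d" at offset 4
  S.emitBytes(8);        // the constant pool itself
  S.emitDataRegionEnd(); // records "$t" at offset 12
  return S.symbols();
}
```

Reading the mode at region end, instead of saving it at region start, is the design choice that copes with a mode switch placed between the two hooks in a hand-written .s file.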
Thanks Jim!

I have updated the bug with your comments, I think it's a good start.

Greg, let me know if that's not enough, I think I can help you from now on.

cheers,
--renato

On 9 October 2012 23:58, Jim Grosbach <grosbach at apple.com> wrote:
> [...]

--
cheers,
--renato

http://systemcall.org/
Cool; glad to help. When I added the data region bits, I tried to keep the ARM-style annotations in mind a bit, so hopefully things will fit together without too much trouble.

-Jim

On Oct 10, 2012, at 12:05 PM, Renato Golin <rengolin at systemcall.org> wrote:

> Thanks Jim!
>
> I have updated the bug with your comments, I think it's a good start.
>
> Greg, let me know if that's not enough, I think I can help you from now on.
>
> cheers,
> --renato
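For the point raised earlier in the thread that the disassembler's driver, not the disassembler itself, should interpret the mapping symbols, a rough sketch of the lookup such a driver might perform is given here. It assumes the driver has already collected the $a/$t/$d symbols of one section, sorted by offset. The types and functions (MappingSymbol, RegionKind, classify) are hypothetical and are not taken from llvm-objdump.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <string>
#include <vector>

// How the bytes at a given offset should be decoded.
enum class RegionKind { ARM, Thumb, Data, Unknown };

// Hypothetical record for one mapping symbol in a section.
struct MappingSymbol {
  uint64_t Offset;
  std::string Name; // "$a", "$t" or "$d", optionally followed by ".suffix"
};

// True if Name is exactly Prefix or Prefix followed by ".something".
inline bool isMappingName(const std::string &Name, const char *Prefix) {
  return Name.compare(0, 2, Prefix) == 0 &&
         (Name.size() == 2 || Name[2] == '.');
}

inline RegionKind kindOf(const std::string &Name) {
  if (isMappingName(Name, "$a")) return RegionKind::ARM;
  if (isMappingName(Name, "$t")) return RegionKind::Thumb;
  if (isMappingName(Name, "$d")) return RegionKind::Data;
  return RegionKind::Unknown;
}

// Syms must be sorted by Offset. The region containing Addr is governed by
// the last mapping symbol at or before Addr.
inline RegionKind classify(const std::vector<MappingSymbol> &Syms,
                           uint64_t Addr) {
  auto It = std::upper_bound(
      Syms.begin(), Syms.end(), Addr,
      [](uint64_t A, const MappingSymbol &S) { return A < S.Offset; });
  if (It == Syms.begin())
    return RegionKind::Unknown; // no mapping symbol covers this offset
  return kindOf(std::prev(It)->Name);
}
```

A driver built this way would call classify() before each decode step and either pick the ARM or Thumb decoder or print the bytes as data, leaving the disassembler unaware of the symbols.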