Thanks, Shankar.

I needed to override all the places where st_value had been used, and it worked.

But then another problem appeared: after correcting all atoms, I cannot distinguish between ARM and Thumb symbols in the later stages when fixing up relocations.

I used to check targetVAddress (in terms of the relocation handler), since its least significant bit was 1 when addressing Thumb symbols. Now the least significant bit of targetVAddress is always 0, because atoms are properly aligned and have proper contents.

I tried a workaround that uses dyn_cast to retrieve information from the overridden ARMELFDefinedAtoms, but the children of DefinedAtom do not support dyn_cast.

In general, I can describe the issue as an inability to pass extra information between linking stages (passes).
Is there a way to do that?

The solution I see is to add a sort of custom context with an abstract interface that is passed along the different stages and cast directly to a specific implementation where needed. That's a lot of changes though, so I'd like to hear more thoughts.

Regards,
Denis.

On 01/02/2015 09:32 PM, Shankar Easwaran wrote:
> You could just override symbolContentSize in the ARM Reader (strip the
> last bit, which indicates Thumb).
>
> Shankar Easwaran
>
> On 12/24/2014 3:09 AM, Denis Protivensky wrote:
>> Hi guys,
>>
>> I'm working on ARM architecture support for lld.
>> I ran into the problem with ARM/Thumb symbols described below.
>>
>> The ARM ELF Reference specifies that symbols addressing Thumb instructions
>> have bit 0 of the st_value field set (see 4.5.3).
>> The general ELF Reference says that st_value holds a virtual address offset
>> from the beginning of the section
>> for executable files and shared objects (see Chapter 4 - Symbol Values).
>>
>> When atoms are created in ELFFile::createAtoms, their content size,
>> content data, and addresses are formed using st_value.
>> Since st_value has bit 0 set for symbols addressing Thumb
>> instructions, the corresponding atoms' addresses are always
>> one byte ahead of the real values.
>> Content size and, therefore, content data may also be wrong for both ARM
>> and Thumb symbols depending on their order (see ELFFile::symbolContentSize):
>> the content size is calculated as the difference between the offsets
>> of two adjacent symbols, so if one of them is Thumb and the other is not,
>> the resulting value will be one byte smaller or one byte larger than
>> expected.
>> Therefore, the atom's content data is also malformed, since it relies on the
>> miscalculated content size.
>>
>> This wrong behavior results in:
>> - situations where the very first instruction of an atom has its first
>>   byte set to zero
>>   (if there's a gap between the previous atom and the current one, the
>>   initial instruction's first byte is skipped)
>> - situations where the very first instruction is split between two atoms
>>   (the right atom, which should hold the instruction, and the
>>   previous one, which "stole" the very first byte of the initial instruction)
>>
>> Is there a way to override this behavior so that both ARM and Thumb atoms
>> are formed correctly, and so that I can distinguish between them in the later
>> stages for proper relocation calculations?
>>
>> Regards!
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation
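
The size miscalculation described in the original message can be illustrated with a minimal sketch. Sym, isThumb, realOffset and contentSize are hypothetical names made up for this illustration and are not lld's API; the only fact taken from the thread is that bit 0 of st_value marks a symbol that addresses Thumb instructions. Sizes are computed from offsets with the marker bit cleared, which is roughly what overriding symbolContentSize in the ARM reader would have to achieve.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for an ELF symbol table entry.
struct Sym {
  uint64_t st_value; // section offset; bit 0 is set for Thumb code symbols
};

// Bit 0 of st_value marks a symbol that addresses Thumb instructions.
// Simplification: real code would apply this only to function symbols.
inline bool isThumb(const Sym &s) { return (s.st_value & 1) != 0; }

// Offset of the symbol's first byte, with the Thumb marker bit cleared.
inline uint64_t realOffset(const Sym &s) { return s.st_value & ~uint64_t(1); }

// Content size of syms[i]: distance to the next symbol, or to the end of
// the section for the last one. Assumes syms is sorted by offset.
inline uint64_t contentSize(const std::vector<Sym> &syms, std::size_t i,
                            uint64_t sectionSize) {
  uint64_t end = (i + 1 < syms.size()) ? realOffset(syms[i + 1]) : sectionSize;
  return end - realOffset(syms[i]);
}
```

With an ARM symbol at 0x0, a Thumb symbol recorded as 0x9 and another ARM symbol at 0x10, subtracting raw st_value fields gives sizes of 9 and 7 bytes, while the masked offsets give 8 and 8.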
You could use the codemodel to say that the code is Thumb for Thumb code.

On 1/12/2015 8:22 AM, Denis Protivensky wrote:
> In general, I can describe the issue as inability to pass extra information between linking stages (passes).
> Is there a way to do that?
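
The code-model suggestion can be sketched roughly as follows. This is only an illustration under stated assumptions: CodeModel, Atom, makeAtom and targetAddress are hypothetical names and do not reproduce lld's actual classes. The idea is that the reader records the instruction set on the atom once, when it strips bit 0 from st_value, so that a later stage such as the relocation handler can query the atom instead of inspecting the low bit of the address.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical code-model tag; the enumerator names are illustrative only.
enum class CodeModel { ARM, Thumb };

// Hypothetical, simplified atom that remembers its instruction set.
struct Atom {
  uint64_t offset; // real offset, Thumb marker bit already cleared
  CodeModel model; // recorded once by the reader
};

// Reader side: clear the marker bit and keep the information as a tag.
inline Atom makeAtom(uint64_t st_value) {
  return Atom{st_value & ~uint64_t(1),
              (st_value & 1) ? CodeModel::Thumb : CodeModel::ARM};
}

// Relocation side: the handler asks the atom for its code model and, for
// Thumb targets, sets bit 0 of the address again.
inline uint64_t targetAddress(const Atom &a, uint64_t sectionBase) {
  uint64_t addr = sectionBase + a.offset;
  return a.model == CodeModel::Thumb ? (addr | 1) : addr;
}
```

Keeping the tag on the atom avoids the separate context object proposed earlier in the thread, at the cost of one extra field per atom.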
Shankar, thank you again.
Now I see how this trick is done in MIPS, so I'll use it as a reference.

- Denis.

On 01/12/2015 07:08 PM, Shankar Easwaran wrote:
> You could use the codemodel to say that the code is Thumb for thumb code.