Hello everybody,

I would like some insight into the semantics of the X86_64_RELOC_UNSIGNED relocation type. When r_extern=1, the semantics seem pretty clear. Let x be a pointer to r_offset, with the appropriate size given by r_size; then

    *x += addr_of_symbol(r_symbolnum)

However, when r_extern=0 the correct behavior is not clear. By analogy with the above, I would have expected

    *x += addr_of_section(r_symbolnum)

but what LLVM implements is different. In RTDyld it implements

    *x = (*x-addr_of_section(r_symbolnum)) + addr_of_section(r_symbolnum)

or equivalently

    *x = *x

i.e. a no-op. This works because LLVM codegen also emits the absolute value of the address. I am unsure what is intended and would appreciate some clarification. A couple of points to consider:

1. I checked ld64 and, as far as I can tell, it doesn't consider non-external X86_64_RELOC_UNSIGNED but does *x += addr_of_symbol(r_symbolnum) regardless. That seems like a bug in ld64 to me, because other relocations in the same switch statement do check r_extern.

2. I implemented *x += addr_of_section(r_symbolnum) in LLVM and all tests pass just fine.

3. If the current implementation is correct, r_symbolnum (and potentially the entire relocation) is basically meaningless. That could of course be correct, but it is what originally caused me to look at this. If so, I'd appreciate an explanation as to why we need to have the relocation in the first place.

That's all I could find on the subject. I hope somebody else knows more than I do.

Thanks,
Keno
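The r_extern=1 rule stated in the message above ("*x += addr_of_symbol(r_symbolnum)") can be sketched as follows. This is only an illustration, not code from LLVM or ld64. The function name and the byte-buffer model of a section are hypothetical. It also assumes that the post's r_offset and r_size correspond to the r_address and r_length fields of the Mach-O relocation_info struct, with r_length holding the log2 of the byte width.

```python
import struct

# Assumption: r_length is the log2 of the field width (2 -> 4 bytes, 3 -> 8 bytes).
_FORMATS = {0: "<B", 1: "<H", 2: "<I", 3: "<Q"}


def apply_unsigned_extern(section, r_address, r_length, symbol_addr):
    """Hypothetical helper: add symbol_addr to the value stored at r_address.

    `section` is a writable bytearray holding the section contents; the value
    already stored at the fixup location is treated as the addend.
    """
    fmt = _FORMATS[r_length]
    width = struct.calcsize(fmt)
    (addend,) = struct.unpack_from(fmt, section, r_address)
    # Wrap to the field width so the result always fits when packed.
    value = (addend + symbol_addr) & ((1 << (8 * width)) - 1)
    struct.pack_into(fmt, section, r_address, value)
    return value


# Usage: an 8-byte slot at offset 8 holding the addend 10 (i.e. &sym + 10).
buf = bytearray(16)
struct.pack_into("<Q", buf, 8, 10)
patched = apply_unsigned_extern(buf, 8, 3, 0x100002000)
print(hex(patched))
```

The sketch reads the addend from the location itself, because Mach-O relocation entries have no separate addend field for this relocation type.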
Let me correct that by noting that

    *x = (*x-addr_of_section(r_symbolnum)) + addr_of_section(r_symbolnum)

actually makes sense to do when you're moving the section (so it's *x = *x, sorry for the confusion). It still strikes me as inconsistent with the first definition though, but maybe that's OK?

On Sun, Jun 8, 2014 at 11:59 PM, Keno Fischer <kfischer at college.harvard.edu> wrote:
> but what LLVM implements is different. In RTDyld it implements
>
> *x = (*x-addr_of_section(r_symbolnum)) + addr_of_section(r_symbolnum)
>
> or equivalently
>
> *x = *x
>
> i.e. a noop.
Correction: it's not *x = *x.

On Mon, Jun 9, 2014 at 6:32 PM, Keno Fischer <kfischer at college.harvard.edu> wrote:
> Let me correct that to noting that
>
> *x = (*x-addr_of_section(r_symbolnum)) + addr_of_section(r_symbolnum)
>
> actually makes sense to do when you're moving the section (so it's *x = *x, sorry for the confusion).
On Jun 8, 2014, at 8:59 PM, Keno Fischer <kfischer at college.harvard.edu> wrote:
> When r_extern=1, the semantics seem pretty clear:
>
> Let x be a pointer to r_offset of appropriate size given by r_size, then
> *x += addr_of_symbol(r_symbolnum)
>
> However, when r_extern=0 the correct behavior is not clear. By analogy with the above, I would have expected
>
> *x += addr_of_section(r_symbolnum)
>
> but what LLVM implements is different. In RTDyld it implements
>
> *x = (*x-addr_of_section(r_symbolnum)) + addr_of_section(r_symbolnum)
>
> or equivalently
>
> *x = *x

In ld64, relocations are parsed into "Fixups". A Fixup is a location to fix up and a value/expression of what to set it to. All sections are parsed into "atoms". A location is an atom and an offset (within the atom). The expression for a fixup is a target atom and an optional addend (e.g. &foo + 10).

For X86_64_RELOC_UNSIGNED when r_extern=1, the location is the atom containing the r_address (offset in the section), and the expression is the atom corresponding to r_symbolnum plus the addend that is the current content of the location. In the JIT case, where you are trying to prepare an object file for execution, that boils down to adding the final address of the r_symbolnum atom to the current content (addend) in the fixup location.

For X86_64_RELOC_UNSIGNED when r_extern=0, the fixup location is the atom containing the r_address (offset in the section), and the expression is whatever atom+offset the current contents of the location point to in that object file. In the JIT case, that boils down to adjusting the location by the amount the target atom slid from its address in the object file to its final address for execution.

For instance, if the location contains 0x00000218, which points into section __DATA,__data (0x200 through 0x280), and the __data section winds up at address 0x100001000 at runtime, then the location needs to have 0x100000E00 added to it (0x100001000 - 0x200).

-Nick

> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
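Nick's worked example for r_extern=0 can be checked with a short sketch. It assumes the target section is found by looking at which section's object-file address range contains the value stored at the fixup location, as his description suggests. The Section class and both function names are hypothetical and are not taken from ld64 or RTDyld.

```python
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    obj_addr: int    # address of the section inside the object file
    size: int
    final_addr: int  # address the section ends up at for execution


def slide_for_target(sections, target):
    """Return final_addr - obj_addr for the section that contains `target`."""
    for sec in sections:
        if sec.obj_addr <= target < sec.obj_addr + sec.size:
            return sec.final_addr - sec.obj_addr
    raise ValueError(f"no section contains address {target:#x}")


def relocate_unsigned_local(sections, content):
    """Adjust the stored object-file address by the slide of its target section."""
    return content + slide_for_target(sections, content)


# Numbers from the example: __DATA,__data spans 0x200 through 0x280 in the
# object file and ends up at 0x100001000 at runtime.
data = Section("__DATA,__data", 0x200, 0x80, 0x100001000)
print(hex(slide_for_target([data], 0x218)))         # 0x100000e00
print(hex(relocate_unsigned_local([data], 0x218)))  # 0x100001018
```

Written this way, the result equals (content - obj_addr) + final_addr, which has the same shape as the RTDyld expression quoted earlier in the thread once the two section addresses are read as the old and the new address respectively.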
Thank you for the explanation. Does that mean r_symbolnum is basically redundant in that case?

Also, let me ask you how to handle the following use case, which is somewhat related. Currently in MCJIT for MachO we are relocating all the debug sections. Eventually (as ELF does), it would be good to avoid this. However, this means that the debugger would have to handle relocations (as lldb currently does for ELF). With this scheme it seems impossible to me to adjust the vaddr of one section without adjusting the relocations that point at it. Is my interpretation of that correct? I guess the best we can do then is to do the relocations inline in the original copy of the object file.

Also, I'm not sure who at Apple does documentation, but would it be possible to include the gist of your response in the reference documentation? It's basically impossible to discern the semantics just from what's written there.

On Mon, Jun 9, 2014 at 7:19 PM, Nick Kledzik <kledzik at apple.com> wrote:
> For X86_64_RELOC_UNSIGNED when r_extern=0, the fixup location is the atom
> containing the r_address (offset in the section), and the expression is
> whatever atom+offset the current contents of location points to in that
> object file. In the JIT case, the boils down to adjusting the location by
> the amount the target atom slid from its address in the object file to its
> final address for execution.