Keno Fischer

2014-Jun-10 00:50 UTC

[LLVMdev] MachO non-external X86_64_RELOC_UNSIGNED

Thank you for the explanation. Does that mean r_symbolnum is basically
redundant in that case? Also, let me ask you how to handle the following
use case which is somewhat related. Currently in MCJIT for MachO we are
relocating all the debug sections. Eventually (as ELF does), it would be
good to avoid this. However, this means that the debugger would have to
handle relocations (as lldb currently does for ELF). With this scheme it
seems impossible to me to adjust the vaddr of one section without adjusting
the relocations that point at it. Is my interpretation of that correct? I
guess the best we can do then is to do the relocations inline in the
original copy of the object file.

Also, I'm not sure who at Apple does documentation, but would it be
possible to include the gist of your response in the reference
documentation? It's basically impossible to discern the semantics just from
what's written there.


On Mon, Jun 9, 2014 at 7:19 PM, Nick Kledzik <kledzik at apple.com> wrote:
>
> On Jun 8, 2014, at 8:59 PM, Keno Fischer <kfischer at college.harvard.edu>
> wrote:
>
> > Hello everybody,
> >
> > I would like some insights on the semantics of the X86_64_RELOC_UNSIGNED
> > relocation type. When r_extern=1, the semantics seem pretty clear:
> >
> > Let x be a pointer to r_offset of appropriate size given by r_size, then
> > *x += addr_of_symbol(r_symbolnum)
> >
> > However, when r_extern=0 the correct behavior is not clear. By analogy
> > with the above, I would have expected
> >
> > *x += addr_of_section(r_symbolnum)
> >
> > but what LLVM implements is different. In RTDyld it implements
> >
> > *x = (*x-addr_of_section(r_symbolnum)) + addr_of_section(r_symbolnum)
> >
> > or equivalently
> >
> > *x = *x
>
> In ld64 relocations are parsed into “Fixups”.  A Fixup is a location to
> fix up and a value/expression of what to set it to.  All sections are
> parsed up into “atoms”.  A location is an atom and an offset (within the
> atom).  The expression for a fixup is a target atom and optional addend
> (e.g. &foo + 10).
>
> For X86_64_RELOC_UNSIGNED when r_extern=1, the location is the atom
> containing the r_address (offset in the section), and the expression is the
> atom corresponding to r_symbolnum plus the addend that is the current
> content of the location.  In the JIT case where you are trying to prepare an
> object file for execution, that boils down to adding the final address of
> the r_symbolnum atom to the current content (addend) in the fixup location.
>
> For X86_64_RELOC_UNSIGNED when r_extern=0, the fixup location is the atom
> containing the r_address (offset in the section), and the expression is
> whatever atom+offset the current contents of location points to in that
> object file.  In the JIT case, that boils down to adjusting the location by
> the amount the target atom slid from its address in the object file to its
> final address for execution.  For instance, if the location contains
> 0x00000218 which points into section __DATA,__data (0x200 thru 0x280) and
> the __data section winds up at address 0x100001000 at runtime, then the
> location needs to have 0x100000E00 added to it (0x100001000 - 0x200).
>
> -Nick
>
>
> >
> > i.e. a noop. This works because llvm codegen also emits the absolute
> > value of the address. I am unsure what is intended and would appreciate
> > some clarification. A couple of points to consider:
> >
> > 1. I checked ld64 and as far as I can tell it doesn't consider
> > non-external X86_64_RELOC_UNSIGNED but does *x += addr_of_symbol(r_symbolnum)
> > regardless. That seems like a bug in ld64 to me because other relocations
> > in the same switch statement do check r_extern.
> >
> > 2. I implemented *x += addr_of_section(r_symbolnum) in LLVM and all
> > tests pass just fine
> >
> > 3. If the current implementation is correct, r_symbolnum (and potentially
> > the entire relocation) is basically meaningless, which could of course be
> > correct, but which is what originally caused me to look at this. If so I'd
> > appreciate an explanation as to why we need to have the relocation in the
> > first place.
> >
> > That's all I could find on the subject. I hope somebody else knows more
> > than I.
> >
> > Thanks,
> > Keno
> >
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
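
[Editorial sketch] The two X86_64_RELOC_UNSIGNED cases Nick describes above can be written out as a small calculation. This is only an illustration under stated assumptions, not RTDyld's or ld64's code: the function name and parameters are hypothetical, r_size handling and overflow checks are left out, and the addresses are the ones from Nick's example.

```python
def apply_unsigned(content, r_extern, *, symbol_final_addr=None,
                   section_obj_addr=None, section_final_addr=None):
    """Return the new value for an X86_64_RELOC_UNSIGNED fixup location.

    Hypothetical helper for illustration; not an LLVM or ld64 API.
    """
    if r_extern:
        # r_extern=1: the current content is an addend; add the final
        # address of the symbol named by r_symbolnum.
        return content + symbol_final_addr
    # r_extern=0: the current content is an address in the object file;
    # add the amount the target section slid to its final address.
    slide = section_final_addr - section_obj_addr
    return content + slide


# Nick's example: the location holds 0x218, __data starts at 0x200 in the
# object file and ends up at 0x100001000, so the slide is 0x100000E00.
new_value = apply_unsigned(0x218, 0,
                           section_obj_addr=0x200,
                           section_final_addr=0x100001000)
print(hex(new_value))  # 0x100001018
```

In the r_extern=0 branch the slide is the same for every location that points into the section, which is why the section's object-file address has to stay known while relocations are processed.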

Keno Fischer

2014-Jun-10 01:01 UTC

[LLVMdev] MachO non-external X86_64_RELOC_UNSIGNED

Also, may I ask what the semantics for X86_64_RELOC_SIGNED are with an
r_extern=0 relocation?



Nick Kledzik

2014-Jun-10 01:30 UTC

[LLVMdev] MachO non-external X86_64_RELOC_UNSIGNED

On Jun 9, 2014, at 6:01 PM, Keno Fischer <kfischer at college.harvard.edu>
wrote:
> Also, may I ask what the semantics for X86_64_RELOC_SIGNED are with an
> r_extern=0 relocation?

That is only used for 32-bit fixups such as in RIP-relative instructions.  The
r_extern=0 case might occur if the instruction references something in a section
that has no symbols.  The JIT would need to do an analogous update: add to
the fixup location the (32-bit signed) difference of the final runtime
address minus the object file address of the start of the section containing the
thing being referenced by the RIP-relative instruction.
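
[Editorial sketch] Read literally, the update described above adds the target section's slide to the 32-bit displacement and requires the result to still fit in a signed 32-bit field. The helper below is hypothetical and follows only that description. It assumes the section holding the instruction does not itself move; if it did, that movement would also change the displacement, a case this reply does not cover.

```python
INT32_MIN, INT32_MAX = -(2 ** 31), 2 ** 31 - 1


def apply_signed_nonextern(disp32, section_obj_addr, section_final_addr):
    """Adjust a 32-bit displacement for a non-external X86_64_RELOC_SIGNED.

    Hypothetical helper; assumes only the target section moves.
    """
    # Slide of the section containing the referenced thing.
    delta = section_final_addr - section_obj_addr
    new_disp = disp32 + delta
    # The fixup is a signed 32-bit field, so the result must fit in it.
    if not INT32_MIN <= new_disp <= INT32_MAX:
        raise OverflowError("adjusted displacement does not fit in 32 bits")
    return new_disp


# Made-up numbers: the target section moves from 0x200 to 0x1200.
print(hex(apply_signed_nonextern(0x18, 0x200, 0x1200)))  # 0x1018
```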

> On Mon, Jun 9, 2014 at 8:50 PM, Keno Fischer <kfischer at
> college.harvard.edu> wrote:
> Thank you for the explanation. Does that mean r_symbolnum is basically
> redundant in that case?

It usually is not needed.  The r_symbolnum (which is the section index when
r_extern=0) is needed when the target of the relocation is the start or end of a
section.  For instance if section __foo ends at address 0x300 and section __bar
starts at address 0x300 and the fixup location content points to 0x300, you
don’t know which section it is pointing to without that r_symbolnum.  The
sections may be split apart in the final execution layout, so which section it
is referencing is important in that edge case.
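
[Editorial sketch] The boundary case above can be made concrete. In the snippet below, the start of __foo, the end of __bar, and both final addresses are made-up values, and the helper names are hypothetical. It also assumes r_symbolnum is a 1-based section number when r_extern=0.

```python
# (name, object-file start, object-file end, final runtime start)
SECTIONS = [
    ("__foo", 0x280, 0x300, 0x100001000),
    ("__bar", 0x300, 0x380, 0x100004000),
]


def candidate_sections(addr):
    """Sections whose range, end address included, contains addr."""
    return [name for name, start, end, _ in SECTIONS if start <= addr <= end]


def relocate(content, r_symbolnum):
    """Slide content using the section selected by r_symbolnum (1-based)."""
    _, start, _, final = SECTIONS[r_symbolnum - 1]
    return content + (final - start)


# 0x300 is both the end of __foo and the start of __bar.
print(candidate_sections(0x300))  # ['__foo', '__bar']
# The two readings give different runtime addresses once the sections split.
print(hex(relocate(0x300, 1)))    # 0x100001080
print(hex(relocate(0x300, 2)))    # 0x100004000
```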

> Also, let me ask you how to handle the following use case which is somewhat
> related. Currently in MCJIT for MachO we are relocating all the debug sections.
> Eventually (as ELF does), it would be good to avoid this. However, this means
> that the debugger would have to handle relocations (as lldb currently does for
> ELF). With this scheme it seems impossible to me to adjust the vaddr of one
> section without adjusting the relocations that point at it. Is my interpretation
> of that correct? I guess the best we can do then is to do the relocations inline
> in the original copy of the object file.

In darwin tools, we leave the debug info in the .o file.  lldb can find it there
if it needs it.  To aid that, the linker generates “debug notes” in the final
linked image which contain the paths of the original .o files.  These are STABS
N_OSO symbol table entries.   Can you just ignore (not copy to execution space)
the DWARF debug sections in MCJIT for darwin?

-Nick


Keno Fischer

2014-Jun-12 22:10 UTC

[LLVMdev] MachO non-external X86_64_RELOC_UNSIGNED

Realized my original reply accidentally didn't go to the mailing list (see
below). I'm also still pondering the question of
whether for jitting purposes setting ->addr or ->offset would be better.


On Mon, Jun 9, 2014 at 9:46 PM, Keno Fischer <kfischer at college.harvard.edu>
wrote:
>
> On Mon, Jun 9, 2014 at 9:30 PM, Nick Kledzik <kledzik at apple.com> wrote:
>
>> On Jun 9, 2014, at 6:01 PM, Keno Fischer <kfischer at college.harvard.edu>
>> wrote:
>>
>>> Also, may I ask what the semantics for X86_64_RELOC_SIGNED are with an
>>> r_extern=0 relocation?
>>
>> That is only used for 32-bit fixups such as in RIP-relative instructions.
>> The r_extern=0 case might occur if the instruction references something in
>> a section that has no symbols.  The JIT would need to do an analogous
>> update: add to the fixup location the (32-bit signed) difference of the
>> final runtime address minus the object file address of the start of the
>> section containing the thing being referenced by the RIP-relative
>> instruction.
>>
> Ok, a non-external X86_64_RELOC_SIGNED doesn't make sense then since the
> address would always be positive so the unsigned variant could just be
> used.
>
>> On Mon, Jun 9, 2014 at 8:50 PM, Keno Fischer <
>> kfischer at college.harvard.edu> wrote:
>>
>>> Thank you for the explanation. Does that mean r_symbolnum is basically
>>> redundant in that case?
>>
>> It usually is not needed.  The r_symbolnum (which is the section index
>> when r_extern=0) is needed when the target of the relocation is the start
>> or end of a section.  For instance if section __foo ends at address 0x300
>> and section __bar starts at address 0x300 and the fixup location content
>> points to 0x300, you don’t know which section it is pointing to without
>> that r_symbolnum.  The sections may be split apart in the final execution
>> layout, so which section it is referencing is important in that edge case.
>>
> Ah, hadn't considered that edge case, thanks!
>
>>> Also, let me ask you how to handle the following use case which is
>>> somewhat related. Currently in MCJIT for MachO we are relocating all the
>>> debug sections. Eventually (as ELF does), it would be good to avoid this.
>>> However, this means that the debugger would have to handle relocations (as
>>> lldb currently does for ELF). With this scheme it seems impossible to me to
>>> adjust the vaddr of one section without adjusting the relocations that
>>> point at it. Is my interpretation of that correct? I guess the best we can
>>> do then is to do the relocations inline in the original copy of the object
>>> file.
>>
>> In darwin tools, we leave the debug info in the .o file.  lldb can find
>> it there if it needs it.  To aid that, the linker generates “debug notes”
>> in the final linked image which contain the paths of the original .o files.
>> These are STABS N_OSO symbol table entries.   Can you just ignore (not
>> copy to execution space) the DWARF debug sections in MCJIT for darwin?
>>
> The way this works in ELF is that the vaddr in the object header is
> adjusted to the vaddr of the relocated section. I mirrored this approach in
> my pending patch to add MachO support (i.e. adjusting
> (section_(64))->addr). This means that if we don't relocate the debug
> section (i.e. don't copy it) then we'll have lost the information where the
> section used to be. I am now wondering if there is a better approach. Maybe
> by modifying (section_(64))->offset instead?
>
>> -Nick
>>

Keno Fischer

2014-Jun-13 04:07 UTC

[LLVMdev] MachO non-external X86_64_RELOC_UNSIGNED

Adjusting ->offset doesn't work because it's an unsigned 32-bit integer, but
the data may easily be moved to an address before the header of the object
file. I'm honestly out of ideas at this point (other than manually
iterating through every single relocation and adjusting it). I do have to
say, these semantics of X86_64_RELOC_UNSIGNED make it pretty JIT-unfriendly.
Unless somebody comes up with a clever idea (or we agree on a way to pass
the required info to the debugger out of band), I guess we'll have to live
with always relocating debug sections :(.
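
[Editorial sketch] The limitation described above can be illustrated with a few lines. This assumes that repurposing ->offset would mean storing the distance from the in-memory object file header to wherever the section data now lives. That reading, the helper name, and the addresses are all assumptions made for illustration.

```python
import struct


def encode_section_offset(data_addr, header_addr):
    """Try to store (data_addr - header_addr) in an unsigned 32-bit field.

    Returns the 4 little-endian bytes, or None if the distance cannot be
    represented. Hypothetical helper; not part of any Mach-O API.
    """
    delta = data_addr - header_addr
    try:
        return struct.pack("<I", delta)
    except struct.error:
        # Negative distances (data placed before the header) and distances
        # of 4 GiB or more do not fit in a uint32.
        return None


# Data placed after the header: representable.
print(encode_section_offset(0x2000, 0x1000))  # b'\x00\x10\x00\x00'
# Data placed before the header: not representable.
print(encode_section_offset(0x1000, 0x2000))  # None
```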

