thr3ads.net - llvm dev - [LLVMdev] [MCJIT] Multiple GOT handling in RuntimeDyldELF [Jan 2015]

Keno Fischer

2015-Jan-18 13:38 UTC

[LLVMdev] [MCJIT] Multiple GOT handling in RuntimeDyldELF

Hello everyone,

As part of my quest to add TLS relocation support to MCJIT, I've been
taking a closer look at the GOT implementation in RuntimeDyldELF and I
believe that it is not valid as currently implemented. In particular, I am
wondering about the multiple GOT handling support introduced in r192020. If
I understand correctly, this can make code reuse a GOT entry in a different
object file. This doesn't seem correct to me, as there is no guarantee that
the loaded object files are allocated within 2GB of each other in memory.
What was the intended use case of this feature?

Additionally, it seems that currently every access through the GOT gets its
own entry, when identical relocations could be combined into one entry. The
GOTEntries array is also never cleared, causing memory and performance
problems when loading multiple object files (this is a bug and easily
fixed, but makes me think this feature isn't particularly well tested).
I'm planning to redesign the GOT mechanism, but I would like to understand
the use case intended in r192020 first, to make sure I don't design myself
into a corner.
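
To make the deduplication point concrete, here is a minimal C++ sketch of a
per-object GOT that hands out one 8-byte slot per distinct target symbol and
is cleared between objects. `ObjectGOT` and its methods are hypothetical, not
the RuntimeDyldELF API. Keying on the symbol alone assumes the slot holds just
the symbol's address, as with x86-64 GOTPCREL, where the addend enters the
PC-relative displacement rather than the slot contents.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-object GOT builder (not the RuntimeDyldELF API).
// Each distinct symbol gets exactly one 8-byte slot, so repeated GOTPCREL
// relocations against the same symbol share it.
class ObjectGOT {
public:
  // Byte offset of Symbol's slot within this object's GOT; a new slot is
  // allocated only the first time the symbol is seen.
  uint64_t getOrCreateSlot(const std::string &Symbol) {
    assert(!Symbol.empty() && "GOT slots need a named target");
    auto It = SlotForSymbol.find(Symbol);
    if (It != SlotForSymbol.end())
      return It->second;
    uint64_t Offset = Symbols.size() * SlotSize;
    SlotForSymbol.emplace(Symbol, Offset);
    Symbols.push_back(Symbol);
    return Offset;
  }

  // Bytes this object's GOT needs.
  uint64_t sizeInBytes() const { return Symbols.size() * SlotSize; }

  // Reset between objects so entries never leak into the next object's GOT.
  void clear() {
    SlotForSymbol.clear();
    Symbols.clear();
  }

private:
  static constexpr uint64_t SlotSize = 8; // one pointer on a 64-bit target
  std::map<std::string, uint64_t> SlotForSymbol;
  std::vector<std::string> Symbols; // slot i holds the address of Symbols[i]
};
```

Two GOTPCREL relocations against `foo` then resolve to the same offset, and
the GOT is sized by the number of distinct symbols rather than the number of
accesses.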

Thanks,
Keno

Keno Fischer

2015-Jan-18 17:25 UTC

FWIW, I verified that we indeed crash with an assertion failure if two
copies of

    declare void @global_foo()

    define internal void @foo() {
      call void @global_foo()
      ret void
    }

are loaded too far apart in memory (with a definition of global_foo
anywhere in the address space).
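
The assertion is presumably a 32-bit PC-relative range check failing. A
minimal sketch of such a check, assuming the usual ELF x86-64 computation
S + A - P; `fitsInPCRel32` is a hypothetical helper, not an LLVM function. For
a GOTPCREL fixup the "target" is the GOT slot, so a slot in another object's
GOT can be out of range even when the code itself is not.

```cpp
#include <cassert>
#include <cstdint>

// Can a 32-bit PC-relative field (e.g. R_X86_64_PC32 or R_X86_64_GOTPCREL)
// at address FixupAddr reach Target?  The CPU sign-extends the 32-bit field,
// so the value S + A - P must fit in int32_t.  Addend is typically -4 for a
// field that ends the instruction.
bool fitsInPCRel32(uint64_t FixupAddr, uint64_t Target, int64_t Addend) {
  int64_t Delta = static_cast<int64_t>(Target + Addend - FixupAddr);
  return Delta >= INT32_MIN && Delta <= INT32_MAX;
}
```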



Kaylor, Andrew

2015-Jan-19 19:51 UTC

Hi Keno,

I _think_ that the GOT support we currently have can be made to work if the
memory manager provides the necessary help (more on that below), but I will
readily admit that it is implemented in a fairly non-standard way that is likely
to seem completely wrong on first inspection (and probably still seems at least
slightly wrong on second inspection).  It may also have inherent limitations
that can’t be overcome without a redesign, but if so I don’t know what those
limitations might be.

It may be helpful to refer to the comments in my original GOT implementation
patch
(http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130812/184265.html)
when trying to decipher the intent of the existing code as unfortunately I seem
to have said quite a bit more there than I did in the actual code comments.

I’m pretty sure that the “multiple GOT” patch was intended to support the case
where additional modules are loaded after finalizeLoad() has been called.  It
looks like we were at some point trying to use a single GOT for all modules, but
once it had been “finalized” another GOT had to be created for subsequent loads.
It’s been a while since I looked at this code, but I believe that we defer
calculating the offsets for the GOT until a “finalize” is performed.  This is
because the memory for loaded sections may be remapped before that time to
handle remote (or out-of-process) execution.  It appears that we are also
deferring allocation of the GOT section memory until this time.

With regard to the 2 GB+ offset problem, we’re dependent on the memory manager
in that regard.  Even with a single object being loaded there is no guarantee
that the memory allocated for the GOT section will be within 2 GB of the memory
allocated for other sections unless the memory manager does something to make it
so.  An interface was added sometime in the past year (I think) that optionally
pre-calculates the amount of memory that will be needed for an object load so
that the memory manager can allocate all of this memory as a single block.  I’m
not sure this interface properly accounts for the possibility of GOT sections
and I don’t know how it works with multiple modules.
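
A rough sketch of what accounting for GOT sections could mean when
pre-computing a single reservation: sum the section sizes with alignment
padding, then add 8 bytes per GOT slot. `SectionReq`, `alignUp` and
`totalReservation` are hypothetical; this is not the signature of the real
memory-manager interface, and it assumes the reserved block's base is aligned
to the largest section alignment.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical section descriptor; not an LLVM type.
struct SectionReq {
  uint64_t Size;  // bytes
  uint64_t Align; // power of two
};

// Round Value up to a multiple of Align.
uint64_t alignUp(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "power of two");
  return (Value + Align - 1) & ~(Align - 1);
}

// Bytes needed to lay out all sections plus this object's GOT in one
// contiguous block.  If the GOT were left out, it would land in a separate
// allocation that may end up more than 2GB from the code referencing it.
uint64_t totalReservation(const std::vector<SectionReq> &Sections,
                          uint64_t NumGOTSlots) {
  uint64_t Offset = 0;
  for (const SectionReq &S : Sections)
    Offset = alignUp(Offset, S.Align) + S.Size;
  // GOT slots are 8-byte pointers, kept 8-byte aligned.
  return alignUp(Offset, 8) + NumGOTSlots * 8;
}
```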

The default memory manager attempts to use system address hints to allocate
sections in the same region of the address space, but not all OSs support the
flags we’d like to use and the address requests are never guaranteed to be
respected.  FWIW, Address Sanitizer is very good at exposing issues of this
sort.
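
For illustration, a POSIX sketch of an address-hinted allocation, assuming
Linux or macOS `mmap` with `MAP_ANONYMOUS`. `allocateNear` is a hypothetical
helper, not what the default memory manager does verbatim. Without
`MAP_FIXED` the hint is only advisory, so the caller has to check where the
memory actually landed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>

// Ask for Size bytes near Hint and report whether the result is close enough
// for 32-bit PC-relative references between Hint and the new block.
void *allocateNear(void *Hint, size_t Size, bool &WithinReach) {
  assert(Size > 0 && "mmap rejects zero-length mappings");
  // No MAP_FIXED: the kernel may ignore Hint (MAP_FIXED would instead
  // clobber whatever is already mapped there).
  void *P = mmap(Hint, Size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (P == MAP_FAILED) {
    WithinReach = false;
    return nullptr;
  }
  uint64_t A = reinterpret_cast<uintptr_t>(P);
  uint64_t H = reinterpret_cast<uintptr_t>(Hint);
  uint64_t Dist = A > H ? A - H : H - A;
  // Conservative: also count the full size of the new block.
  WithinReach = Dist + Size <= static_cast<uint64_t>(INT32_MAX);
  return P;
}
```

A caller that gets `WithinReach == false` needs a fallback, such as routing
the reference through a stub or GOT slot placed near the code.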

I should also mention that there is some variation in how GOT-related issues are
handled from architecture to architecture within RuntimeDyldELF.  When I
implemented the GOT support, I intended for it to be capable of supporting any
architecture, but there was some support for GOT-related relocations for non-x86
platforms that pre-dated my GOT implementation and I suspect those will continue
to be used as long as they are working correctly.  For instance, several
architectures extended the allocated size of code sections and use the extra
space at the end of the section to create stubs for PC-relative function calls.
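
As an illustration of such a stub, a sketch of an x86-64 sequence that jumps
indirectly through an absolute address stored right after the instruction. A
rel32 call in the same section reaches the stub, and the stub reaches any
address. This layout is an assumption for illustration, not necessarily
byte-for-byte what RuntimeDyldELF emits on any architecture;
`writeAbsoluteJumpStub` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Writes a 14-byte far-jump stub at Stub and returns its size:
//   ff 25 00 00 00 00      jmp *0(%rip)   ; target read from next 8 bytes
//   <8-byte absolute target, little-endian>
size_t writeAbsoluteJumpStub(uint8_t *Stub, uint64_t Target) {
  assert(Stub != nullptr);
  // Opcode FF /4 with ModRM 0x25 selects RIP-relative disp32; disp32 = 0.
  static const uint8_t JmpRipIndirect[6] = {0xff, 0x25, 0x00,
                                            0x00, 0x00, 0x00};
  std::memcpy(Stub, JmpRipIndirect, sizeof(JmpRipIndirect));
  // Store the absolute target little-endian, as x86-64 expects.
  for (int I = 0; I < 8; ++I)
    Stub[6 + I] = static_cast<uint8_t>(Target >> (8 * I));
  return sizeof(JmpRipIndirect) + 8;
}
```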

Let me know if there’s anything more I can do to help you get things working.

-Andy


From: Keno Fischer [mailto:kfischer at college.harvard.edu]
Sent: Sunday, January 18, 2015 5:38 AM
To: LLVM Developers Mailing List; Lang Hames; Kaylor, Andrew; Thirumurthi, Ashok
Subject: [MCJIT] Multiple GOT handling in RuntimeDyldELF


Keno Fischer

2015-Jan-19 23:31 UTC

On Mon, Jan 19, 2015 at 8:51 PM, Kaylor, Andrew <andrew.kaylor at intel.com>
wrote:
> Hi Keno,
>
> I _think_ that the GOT support we currently have can be made to work if
> the memory manager provides the necessary help (more on that below), but I
> will readily admit that it is implemented in a fairly non-standard way that
> is likely to seem completely wrong on first inspection (and probably still
> seems at least slightly wrong on second inspection).  It may also have
> inherent limitations that can’t be overcome without a redesign, but if so I
> don’t know what those limitations might be.
>
> It may be helpful to refer to the comments in my original GOT
> implementation patch
> (http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130812/184265.html)
> when trying to decipher the intent of the existing code as unfortunately I
> seem to have said quite a bit more there than I did in the actual code
> comments.
>
> I’m pretty sure that the “multiple GOT” patch was intended to support the
> case where additional modules are loaded after finalizeLoad() has been
> called.  It looks like we were at some point trying to use a single GOT for
> all modules, but once it had been “finalized” another GOT had to be created
> for subsequent loads.  It’s been a while since I looked at this code, but I
> believe that we defer calculating the offsets for the GOT until a
> “finalize” is performed.  This is because the memory for loaded sections
> may be remapped before that time to handle remote (or out-of-process)
> execution.  It appears that we are also deferring allocation of the GOT
> section memory until this time.
>
I believe we call finalizeLoad for every object, so we essentially end up
with one GOT per object file anyway. We're deferring filling (though not
allocating) the GOT until we call resolveRelocations.
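
A minimal sketch of that split: the GOT memory exists at load time, the slots
are only recorded, and their contents are written once final symbol addresses
are known (after any remapping for out-of-process execution).
`PendingGOTSlot` and `fillGOT` are hypothetical names, and the sketch assumes
host and target share pointer size and byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Hypothetical record of a GOT slot whose contents are unknown at load time.
struct PendingGOTSlot {
  uint64_t Offset;    // byte offset of the slot within this object's GOT
  std::string Symbol; // symbol whose final address belongs in the slot
};

// Fill an already-allocated GOT.  Returns false if some symbol is still
// unresolved; those slots are left untouched for a later pass.
bool fillGOT(uint8_t *GOTBase, const std::vector<PendingGOTSlot> &Pending,
             const std::map<std::string, uint64_t> &FinalAddr) {
  assert(GOTBase != nullptr);
  bool AllResolved = true;
  for (const PendingGOTSlot &P : Pending) {
    auto It = FinalAddr.find(P.Symbol);
    if (It == FinalAddr.end()) {
      AllResolved = false;
      continue;
    }
    uint64_t Addr = It->second;
    std::memcpy(GOTBase + P.Offset, &Addr, sizeof(Addr)); // host byte order
  }
  return AllResolved;
}
```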

> With regard to the 2 GB+ offset problem, we’re dependent on the memory
> manager in that regard.  Even with a single object being loaded there is no
> guarantee that the memory allocated for the GOT section will be within 2 GB
> of the memory allocated for other sections unless the memory manager does
> something to make it so.  An interface was added sometime in the past year
> (I think) that optionally pre-calculates the amount of memory that will be
> needed for an object load so that the memory manager can allocate all of
> this memory as a single block.  I’m not sure this interface properly
> accounts for the possibility of GOT sections and I don’t know how it works
> with multiple modules.
>
While this is true, it's actually not the case I'm worried about. The case
I'm worried about is where we load enough object files that they span more
than 2GB of address space (this doesn't even have to be 2GB worth of code;
for example, I hit this with msan). The current interface basically forces
all code to fit within 2GB, which is precisely what the GOT is supposed to
avoid.

Just to be very explicit, the case I'm concerned about is:

- Allocate Object file 1 with GOTPCREL to `foo`
- [ Allocate 2GB worth of other data ]
- Allocate Object file 2 with GOTPCREL to `foo`

Object file 2 will reuse Object file 1's GOT (though we'll still allocate
space in Object file 2's GOT, so it's not like we're doing this to save
memory).


> The default memory manager attempts to use system address hints to
> allocate sections in the same region of the address space, but not all OSs
> support the flags we’d like to use and the address requests are never
> guaranteed to be respected.  FWIW, Address Sanitizer is very good at
> exposing issues of this sort.
>
Yes, I agree this is a concern, though it seems solvable to always allocate
a single object file within 2GB, whereas it doesn't seem right to impose the
restriction that all code ever loaded has to fit within 2GB.
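
A sketch of the per-object invariant, assuming each object's sections,
including its own GOT, are placed in one window: if that window is at most
2GB, every intra-object 32-bit PC-relative reference is in range (ignoring
small addends such as -4), and nothing is required of the distance between
different objects. `PlacedSection` and `objectFitsIn2GB` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of where one section of an object was placed.
struct PlacedSection {
  uint64_t Addr; // load address of the section
  uint64_t Size; // size in bytes
};

bool objectFitsIn2GB(const std::vector<PlacedSection> &Sections) {
  if (Sections.empty())
    return true;
  uint64_t Lo = UINT64_MAX, Hi = 0;
  for (const PlacedSection &S : Sections) {
    assert(S.Addr + S.Size >= S.Addr && "section wraps around");
    Lo = std::min(Lo, S.Addr);
    Hi = std::max(Hi, S.Addr + S.Size);
  }
  // Any two bytes in [Lo, Hi) are at most Hi - Lo - 1 apart, which fits in
  // int32_t when Hi - Lo <= 2^31.
  return Hi - Lo <= (uint64_t(1) << 31);
}
```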

>  I should also mention that there is some variation in how GOT-related
> issues are handled from architecture to architecture within
> RuntimeDyldELF.  When I implemented the GOT support, I intended for it to
> be capable of supporting any architecture, but there was some support for
> GOT-related relocations for non-x86 platforms that pre-dated my GOT
> implementation and I suspect those will continue to be used as long as they
> are working correctly.  For instance, several architectures extended the
> allocated size of code sections and use the extra space at the end of the
> section to create stubs for PC-relative function calls.
>
Yes, I've seen this code.

>  Let me know if there’s anything more I can do to help you get things
> working.
>
Thanks for replying. I have a half-way functioning prototype that makes
GOTs local to each object file again and also deduplicates GOTEntries where
possible. I'll finish it up and post it here as soon as I can.

