thr3ads.net - llvm dev - [LLVMdev] JIT allocates global data in function body memory [Jun 2009]

Jeffrey Yasskin

2009-Jun-30 18:18 UTC

[LLVMdev] JIT allocates global data in function body memory

On Mon, Jun 29, 2009 at 5:50 PM, Dale Johannesen <dalej at apple.com> wrote:
>
> On Jun 29, 2009, at 5:41 PM PDT, Reid Kleckner wrote:
>
>> So I (think I) found a bug in the JIT:
>> http://llvm.org/bugs/show_bug.cgi?id=4483
>>
>> Basically, globals used by a function are allocated in the same buffer
>> as the first code that uses it.  However, when you free the machine
>> code, you also free the memory holding the global's data.  The address
>> is still in the GlobalValue map, so any other code using that global
>> will access freed memory, which will cause problems as soon as you
>> reallocate that memory for something else.
>>
>> I tracked down the commit that introduced the bug:
>> http://llvm.org/viewvc/llvm-project?view=rev&revision=54442
>>
>> It very nicely explains what it does, but not why it does it, which
>> I'd like to know before I change it.  I couldn't find the author
>> (johannes) on IRC so ssen told me to ask LLVMdev about this behavior.
>
> That's me (and I'm not on IRC because I like messages to be
> archived).  The reason everything needs to go in the same buffer is
> that we're JITting code on one machine, then sending it to another to
> be executed, and references from one buffer to another won't work in
> that environment.  So that model needs to continue to work.  If you
> want to generalize it so other models work as well, go ahead.

So, you're moving code across machines without running any relocations
on it? How can that work? Are you just assuming that everything winds
up at the same addresses? Or is everything PC-relative on your
platform, so all that matters is that globals and the code are in the
same relative positions?

How are you getting the size of the code you need to copy?
MachineCodeInfo didn't exist when you wrote this patch, so I assume
you've written your own JITMemoryManager. Even then, if you JIT more
than one function, and they share any globals, you have to deal with
multiple calls into the MemoryManager and functions that use globals
allocated inside other buffers. You should be able to deal with having
separate calls to allocate global space and allocate code space. You'd
just remember the answers you gave and preserve them when copying to a
new system.
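
The "separate calls, remembered answers" idea can be sketched roughly as follows. This is a toy model, not the LLVM JITMemoryManager interface: `RecordingArena`, `allocateCode`, `allocateGlobal` and the `Allocation` record are hypothetical names invented for illustration. It assumes code and globals still come out of one contiguous arena, so their relative positions survive a byte-for-byte copy, while the log keeps track of which ranges are code and which are globals.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model; none of these names come from LLVM.
enum class Kind { Code, Global };

struct Allocation {
    Kind kind;
    std::size_t offset;  // offset from the start of the arena
    std::size_t size;
};

class RecordingArena {
public:
    explicit RecordingArena(std::size_t capacity) : storage_(capacity) {}

    // Separate entry points for code and for globals. Both bump-allocate
    // from the same arena and record what they handed out.
    std::uint8_t* allocateCode(std::size_t size) {
        return allocate(Kind::Code, size);
    }
    std::uint8_t* allocateGlobal(std::size_t size) {
        return allocate(Kind::Global, size);
    }

    // The remembered answers: enough to tell code from globals later.
    const std::vector<Allocation>& log() const { return log_; }

    // Start and length of the region that would have to be copied to
    // reproduce the same relative layout somewhere else.
    const std::uint8_t* base() const { return storage_.data(); }
    std::size_t bytesUsed() const { return next_; }

private:
    std::uint8_t* allocate(Kind kind, std::size_t size) {
        if (size > storage_.size() - next_) return nullptr;  // arena full
        log_.push_back({kind, next_, size});
        std::uint8_t* p = storage_.data() + next_;
        next_ += size;
        return p;
    }

    std::vector<std::uint8_t> storage_;
    std::vector<Allocation> log_;
    std::size_t next_ = 0;
};
```

Because every recorded offset is relative to `base()`, copying `bytesUsed()` bytes keeps code and globals at the same distance from each other, which is all a purely PC-relative scheme would need.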

I'd like freeMachineCodeForFunction to avoid corrupting emitted
globals, and with the current arrangement of information within the
JIT, that means globals and code have to live in different
allocations. I think Reid's suggesting a flag of some sort, with one
setting for "freeMachineCodeForFunction works" and another for
"globals and code are allocated by a single call into the
MemoryManager." I'd like to avoid new knobs if it's possible, so do
you really need that second option? Or do you just need globals to be
allocated by some call into the MemoryManager?
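
As a rough illustration of why the two have to live in different allocations, here is a toy model of the failure. It is not LLVM code: `ToyJIT`, the integer buffer ids, and the liveness set are all hypothetical stand-ins, and only the name `freeMachineCodeForFunction` is borrowed from the discussion. With shared buffers, freeing the first function that used a global leaves a stale entry in the address map; with separate allocations the global survives.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical toy model: buffers are integer ids, and "freed" means the
// id is no longer in the live set. Nothing here is real LLVM API.
class ToyJIT {
public:
    explicit ToyJIT(bool separateGlobals)
        : separateGlobals_(separateGlobals) {}

    // Emit a function that references one global; returns its buffer id.
    int emitFunction(const std::string& fn, const std::string& global) {
        int buffer = newBuffer();
        functionBuffer_[fn] = buffer;
        if (globalAddress_.count(global) == 0) {
            // First use decides where the global lives: either inside
            // this function's buffer or in an allocation of its own.
            globalAddress_[global] = separateGlobals_ ? newBuffer() : buffer;
        }
        return buffer;
    }

    void freeMachineCodeForFunction(const std::string& fn) {
        auto it = functionBuffer_.find(fn);
        if (it == functionBuffer_.end()) return;
        live_.erase(it->second);
        functionBuffer_.erase(it);
        // globalAddress_ is deliberately left alone, mirroring the stale
        // entry in the global address map described in the bug report.
    }

    // True if the recorded address of the global is still in live memory.
    bool globalIsLive(const std::string& global) const {
        auto it = globalAddress_.find(global);
        return it != globalAddress_.end() && live_.count(it->second) > 0;
    }

private:
    int newBuffer() {
        live_.insert(nextId_);
        return nextId_++;
    }

    bool separateGlobals_;
    int nextId_ = 0;
    std::set<int> live_;
    std::map<std::string, int> functionBuffer_;
    std::map<std::string, int> globalAddress_;
};
```

Emitting `f` and `h` that both use `g`, then freeing `f`, leaves `g` dangling in the shared-buffer mode and intact in the separate-allocation mode.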

Thanks!
Jeffrey

Dale Johannesen

2009-Jun-30 18:42 UTC

On Jun 30, 2009, at 11:18 AM PDT, Jeffrey Yasskin wrote:
> On Mon, Jun 29, 2009 at 5:50 PM, Dale Johannesen <dalej at apple.com>
> wrote:
>>
>> On Jun 29, 2009, at 5:41 PM PDT, Reid Kleckner wrote:
>>
>>> So I (think I) found a bug in the JIT:
>>> http://llvm.org/bugs/show_bug.cgi?id=4483
>>>
>>> Basically, globals used by a function are allocated in the same buffer
>>> as the first code that uses it.  However, when you free the machine
>>> code, you also free the memory holding the global's data.  The address
>>> is still in the GlobalValue map, so any other code using that global
>>> will access freed memory, which will cause problems as soon as you
>>> reallocate that memory for something else.
>>>
>>> I tracked down the commit that introduced the bug:
>>> http://llvm.org/viewvc/llvm-project?view=rev&revision=54442
>>>
>>> It very nicely explains what it does, but not why it does it, which
>>> I'd like to know before I change it.  I couldn't find the author
>>> (johannes) on IRC so ssen told me to ask LLVMdev about this behavior.
>>
>> That's me (and I'm not on IRC because I like messages to be
>> archived).  The reason everything needs to go in the same buffer is
>> that we're JITting code on one machine, then sending it to another to
>> be executed, and references from one buffer to another won't work in
>> that environment.  So that model needs to continue to work.  If you
>> want to generalize it so other models work as well, go ahead.
>
> So, you're moving code across machines without running any relocations
> on it? How can that work? Are you just assuming that everything winds
> up at the same addresses? Or is everything PC-relative on your
> platform, so all that matters is that globals and the code are in the
> same relative positions?

I'm not one of the people actually doing this; I'm the guy who changed the
LLVM JIT handling so that this model would work.  I believe everything is
PC-relative, but I don't know the details (and probably couldn't talk
about them on a public list if I did).  I don't think those guys do
any freeing, so they don't have your problem.

The current model where code and data share a buffer needs to continue
to work, and I have a fairly strong preference (and so will our
client) that whatever you do should not require any changes to the
existing client code.  Beyond that, I am not the kind of person who
thinks there's only one way to do things; I won't object to what you
do as long as it doesn't break what we're using now.

> How are you getting the size of the code you need to copy?
> MachineCodeInfo didn't exist when you wrote this patch, so I assume
> you've written your own JITMemoryManager. Even then, if you JIT more
> than one function, and they share any globals, you have to deal with
> multiple calls into the MemoryManager and functions that use globals
> allocated inside other buffers. You should be able to deal with having
> separate calls to allocate global space and allocate code space. You'd
> just remember the answers you gave and preserve them when copying to a
> new system.
>
> I'd like freeMachineCodeForFunction to avoid corrupting emitted
> globals, and with the current arrangement of information within the
> JIT, that means globals and code have to live in different
> allocations. I think Reid's suggesting a flag of some sort, with one
> setting for "freeMachineCodeForFunction works" and another for
> "globals and code are allocated by a single call into the
> MemoryManager." I'd like to avoid new knobs if it's possible, so do
> you really need that second option? Or do you just need globals to be
> allocated by some call into the MemoryManager?
>
> Thanks!
> Jeffrey

Andrew Haley

2009-Jun-30 19:18 UTC

Dale Johannesen wrote:
> On Jun 30, 2009, at 11:18 AM PDT, Jeffrey Yasskin wrote:
>
>> On Mon, Jun 29, 2009 at 5:50 PM, Dale Johannesen <dalej at apple.com>
>> wrote:
>>> On Jun 29, 2009, at 5:41 PM PDT, Reid Kleckner wrote:
>>>
>>>> So I (think I) found a bug in the JIT:
>>>> http://llvm.org/bugs/show_bug.cgi?id=4483
>>>>
>>>> Basically, globals used by a function are allocated in the same buffer
>>>> as the first code that uses it.  However, when you free the machine
>>>> code, you also free the memory holding the global's data.  The address
>>>> is still in the GlobalValue map, so any other code using that global
>>>> will access freed memory, which will cause problems as soon as you
>>>> reallocate that memory for something else.
>>>>
>>>> I tracked down the commit that introduced the bug:
>>>> http://llvm.org/viewvc/llvm-project?view=rev&revision=54442
>>>>
>>>> It very nicely explains what it does, but not why it does it, which
>>>> I'd like to know before I change it.  I couldn't find the author
>>>> (johannes) on IRC so ssen told me to ask LLVMdev about this behavior.
>>> That's me (and I'm not on IRC because I like messages to be
>>> archived).  The reason everything needs to go in the same buffer is
>>> that we're JITting code on one machine, then sending it to another to
>>> be executed, and references from one buffer to another won't work in
>>> that environment.  So that model needs to continue to work.  If you
>>> want to generalize it so other models work as well, go ahead.
>> So, you're moving code across machines without running any relocations
>> on it? How can that work? Are you just assuming that everything winds
>> up at the same addresses? Or is everything PC-relative on your
>> platform, so all that matters is that globals and the code are in the
>> same relative positions?

I presume (hope, really) that we don't end up with code and data in the
same page.  From the Intel® 64 and IA-32 Architectures Optimization
Reference Manual:

Assembly/Compiler Coding Rule 57. (H impact, L generality) Always put
code and data on separate pages.

Sorry, I guess you know this already.

Andrew.
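
One way to reconcile that rule with a single shared buffer is to start the data on the next page boundary after the code. The sketch below is only an illustration under stated assumptions: a 4096-byte page size is assumed rather than queried from the OS, and `alignUp`, `Layout` and `layoutCodeThenData` are hypothetical helpers, not part of the JIT.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: 4 KiB pages. A real allocator would query the OS instead.
constexpr std::size_t kAssumedPageSize = 4096;

// Round offset up to the next multiple of alignment (alignment > 0).
constexpr std::size_t alignUp(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) / alignment * alignment;
}

// Hypothetical layout of one buffer holding code followed by data.
struct Layout {
    std::size_t codeOffset;
    std::size_t dataOffset;
    std::size_t totalSize;
};

// Place code at offset 0 and data at the first page boundary at or after
// the end of the code, so the two never share a page within the buffer
// (assuming the buffer itself starts on a page boundary).
constexpr Layout layoutCodeThenData(std::size_t codeSize,
                                    std::size_t dataSize) {
    return Layout{0, alignUp(codeSize, kAssumedPageSize),
                  alignUp(codeSize, kAssumedPageSize) + dataSize};
}
```

For example, 100 bytes of code and 8 bytes of data give a data offset of 4096 and a total size of 4104, so the relative positions stay fixed while code and data sit on different pages.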
