Jeffrey Yasskin
2009-Jun-30 18:18 UTC
[LLVMdev] JIT allocates global data in function body memory
On Mon, Jun 29, 2009 at 5:50 PM, Dale Johannesen <dalej at apple.com> wrote:
>
> On Jun 29, 2009, at 5:41 PM PDT, Reid Kleckner wrote:
>
>> So I (think I) found a bug in the JIT:
>> http://llvm.org/bugs/show_bug.cgi?id=4483
>>
>> Basically, globals used by a function are allocated in the same buffer
>> as the first code that uses it. However, when you free the machine
>> code, you also free the memory holding the global's data. The address
>> is still in the GlobalValue map, so any other code using that global
>> will access freed memory, which will cause problems as soon as you
>> reallocate that memory for something else.
>>
>> I tracked down the commit that introduced the bug:
>> http://llvm.org/viewvc/llvm-project?view=rev&revision=54442
>>
>> It very nicely explains what it does, but not why it does it, which
>> I'd like to know before I change it. I couldn't find the author
>> (johannes) on IRC so ssen told me to ask LLVMdev about this behavior.
>
> That's me (and I'm not on IRC because I like messages to be
> archived). The reason everything needs to go in the same buffer is
> that we're JITting code on one machine, then sending it to another to
> be executed, and references from one buffer to another won't work in
> that environment. So that model needs to continue to work. If you
> want to generalize it so other models work as well, go ahead.

So, you're moving code across machines without running any relocations on it? How can that work? Are you just assuming that everything winds up at the same addresses? Or is everything PC-relative on your platform, so all that matters is that globals and the code are in the same relative positions?

How are you getting the size of the code you need to copy? MachineCodeInfo didn't exist when you wrote this patch, so I assume you've written your own JITMemoryManager. Even then, if you JIT more than one function, and they share any globals, you have to deal with multiple calls into the MemoryManager and functions that use globals allocated inside other buffers. You should be able to deal with having separate calls to allocate global space and allocate code space. You'd just remember the answers you gave and preserve them when copying to a new system.

I'd like freeMachineCodeForFunction to avoid corrupting emitted globals, and with the current arrangement of information within the JIT, that means globals and code have to live in different allocations. I think Reid's suggesting a flag of some sort, with one setting for "freeMachineCodeForFunction works" and another for "globals and code are allocated by a single call into the MemoryManager." I'd like to avoid new knobs if it's possible, so do you really need that second option? Or do you just need globals to be allocated by some call into the MemoryManager?

Thanks!
Jeffrey
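To make the proposal above concrete, here is a minimal sketch of a memory manager that hands out globals and code through separate calls, so that freeing one function's code cannot invalidate a global's address. It is a toy model only: `ToyMemoryManager`, `allocateGlobal`, `allocateCode`, `freeCode`, `hasCode` and `globalAddress` are hypothetical names chosen for illustration, not LLVM's JITMemoryManager interface, and the string keys stand in for whatever the JIT would really use to identify functions and globals.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a JIT memory manager. None of these names are LLVM's API.
class ToyMemoryManager {
public:
  using Block = std::vector<unsigned char>;

  // Globals get their own allocation, which lives as long as the manager.
  // Asking again for the same name returns the address already handed out.
  unsigned char *allocateGlobal(const std::string &name, std::size_t size) {
    auto &slot = globals_[name];
    if (!slot) slot = std::make_unique<Block>(size);
    return slot->data();
  }

  // Code for each function gets a separate allocation.
  unsigned char *allocateCode(const std::string &fn, std::size_t size) {
    auto &slot = code_[fn];
    slot = std::make_unique<Block>(size);
    return slot->data();
  }

  // Freeing a function's code releases only that function's block.
  void freeCode(const std::string &fn) { code_.erase(fn); }

  bool hasCode(const std::string &fn) const { return code_.count(fn) != 0; }

  // Address recorded for a global, or nullptr if none was allocated.
  unsigned char *globalAddress(const std::string &name) const {
    auto it = globals_.find(name);
    return it == globals_.end() ? nullptr : it->second->data();
  }

private:
  std::map<std::string, std::unique_ptr<Block>> globals_;
  std::map<std::string, std::unique_ptr<Block>> code_;
};
```

Because the manager remembers every address it handed out, a client that ships code to another machine could still walk both maps and copy the blocks; whether that satisfies the single-buffer requirement discussed in this thread is exactly the open question.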
Dale Johannesen
2009-Jun-30 18:42 UTC
[LLVMdev] JIT allocates global data in function body memory
On Jun 30, 2009, at 11:18 AM PDT, Jeffrey Yasskin wrote:

> On Mon, Jun 29, 2009 at 5:50 PM, Dale Johannesen <dalej at apple.com> wrote:
>>
>> On Jun 29, 2009, at 5:41 PM PDT, Reid Kleckner wrote:
>>
>>> So I (think I) found a bug in the JIT:
>>> http://llvm.org/bugs/show_bug.cgi?id=4483
>>>
>>> Basically, globals used by a function are allocated in the same buffer
>>> as the first code that uses it. However, when you free the machine
>>> code, you also free the memory holding the global's data. The address
>>> is still in the GlobalValue map, so any other code using that global
>>> will access freed memory, which will cause problems as soon as you
>>> reallocate that memory for something else.
>>>
>>> I tracked down the commit that introduced the bug:
>>> http://llvm.org/viewvc/llvm-project?view=rev&revision=54442
>>>
>>> It very nicely explains what it does, but not why it does it, which
>>> I'd like to know before I change it. I couldn't find the author
>>> (johannes) on IRC so ssen told me to ask LLVMdev about this behavior.
>>
>> That's me (and I'm not on IRC because I like messages to be
>> archived). The reason everything needs to go in the same buffer is
>> that we're JITting code on one machine, then sending it to another to
>> be executed, and references from one buffer to another won't work in
>> that environment. So that model needs to continue to work. If you
>> want to generalize it so other models work as well, go ahead.
>
> So, you're moving code across machines without running any relocations
> on it? How can that work? Are you just assuming that everything winds
> up at the same addresses? Or is everything PC-relative on your
> platform, so all that matters is that globals and the code are in the
> same relative positions?

I am not the people actually doing this, I am the guy who changed llvm JIT handling so that this model would work. I believe everything is PC-relative, but I don't know details (and probably couldn't talk about them on a public list if I did). I don't think those guys do any freeing, so they don't have your problem.

The current model where code and data share a buffer needs to continue to work, and I have a fairly strong preference (and so will our client) that whatever you do should not require any changes to the existing client code. Beyond that, I am not the kind of person who thinks there's only one way to do things; I won't object to what you do as long as it doesn't break what we're using now.

> How are you getting the size of the code you need to copy?
> MachineCodeInfo didn't exist when you wrote this patch, so I assume
> you've written your own JITMemoryManager. Even then, if you JIT more
> than one function, and they share any globals, you have to deal with
> multiple calls into the MemoryManager and functions that use globals
> allocated inside other buffers. You should be able to deal with having
> separate calls to allocate global space and allocate code space. You'd
> just remember the answers you gave and preserve them when copying to a
> new system.
>
> I'd like freeMachineCodeForFunction to avoid corrupting emitted
> globals, and with the current arrangement of information within the
> JIT, that means globals and code have to live in different
> allocations. I think Reid's suggesting a flag of some sort, with one
> setting for "freeMachineCodeForFunction works" and another for
> "globals and code are allocated by a single call into the
> MemoryManager." I'd like to avoid new knobs if it's possible, so do
> you really need that second option? Or do you just need globals to be
> allocated by some call into the MemoryManager?
>
> Thanks!
> Jeffrey
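If the references really are PC-relative, as Dale believes, then keeping code and globals in one buffer means the whole buffer can be copied byte for byte and still work at a new address. The sketch below illustrates that property with a made-up layout: the offsets `kRefOffset` and `kGlobalOffset`, the helper names, and the 4-byte "displacement" are all assumptions for illustration and do not correspond to any real instruction encoding or to the client's actual scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy buffer layout (hypothetical, not a real instruction encoding):
//   bytes [0, 4)  : a 32-bit displacement, standing in for a PC-relative operand
//   bytes [16, 20): a 32-bit global
constexpr std::size_t kRefOffset = 0;
constexpr std::size_t kGlobalOffset = 16;

// Build one buffer holding both the "code" reference and the global it uses.
std::vector<unsigned char> emitBuffer(std::int32_t globalValue) {
  std::vector<unsigned char> buf(32, 0);
  // The displacement is measured from the reference itself, not from address 0.
  std::int32_t disp = static_cast<std::int32_t>(kGlobalOffset - kRefOffset);
  std::memcpy(buf.data() + kRefOffset, &disp, sizeof disp);
  std::memcpy(buf.data() + kGlobalOffset, &globalValue, sizeof globalValue);
  return buf;
}

// Resolve the reference relative to wherever the buffer currently lives.
std::int32_t loadThroughReference(const unsigned char *base) {
  std::int32_t disp;
  std::memcpy(&disp, base + kRefOffset, sizeof disp);
  std::int32_t value;
  std::memcpy(&value, base + kRefOffset + disp, sizeof value);
  return value;
}
```

Copying the vector returned by `emitBuffer` and calling `loadThroughReference` on the copy yields the same value, because only the relative distance between reference and global matters. Splitting the global into a different allocation would break this unless the two allocations kept the same relative positions after the copy.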
Andrew Haley
2009-Jun-30 19:18 UTC
[LLVMdev] JIT allocates global data in function body memory
Dale Johannesen wrote:

> On Jun 30, 2009, at 11:18 AM PDT, Jeffrey Yasskin wrote:
>
>> On Mon, Jun 29, 2009 at 5:50 PM, Dale Johannesen <dalej at apple.com> wrote:
>>> On Jun 29, 2009, at 5:41 PM PDT, Reid Kleckner wrote:
>>>
>>>> So I (think I) found a bug in the JIT:
>>>> http://llvm.org/bugs/show_bug.cgi?id=4483
>>>>
>>>> Basically, globals used by a function are allocated in the same buffer
>>>> as the first code that uses it. However, when you free the machine
>>>> code, you also free the memory holding the global's data. The address
>>>> is still in the GlobalValue map, so any other code using that global
>>>> will access freed memory, which will cause problems as soon as you
>>>> reallocate that memory for something else.
>>>>
>>>> I tracked down the commit that introduced the bug:
>>>> http://llvm.org/viewvc/llvm-project?view=rev&revision=54442
>>>>
>>>> It very nicely explains what it does, but not why it does it, which
>>>> I'd like to know before I change it. I couldn't find the author
>>>> (johannes) on IRC so ssen told me to ask LLVMdev about this behavior.
>>> That's me (and I'm not on IRC because I like messages to be
>>> archived). The reason everything needs to go in the same buffer is
>>> that we're JITting code on one machine, then sending it to another to
>>> be executed, and references from one buffer to another won't work in
>>> that environment. So that model needs to continue to work. If you
>>> want to generalize it so other models work as well, go ahead.
>> So, you're moving code across machines without running any relocations
>> on it? How can that work? Are you just assuming that everything winds
>> up at the same addresses? Or is everything PC-relative on your
>> platform, so all that matters is that globals and the code are in the
>> same relative positions?

I presume (hope, really) that we don't end up with code and data in the same page.

From Intel® 64 and IA-32 Architectures Optimization Reference Manual:

  Assembly/Compiler Coding Rule 57. (H impact, L generality) Always put
  code and data on separate pages.

Sorry, I guess you know this already.

Andrew.
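A small sketch of how one might check Andrew's concern: given the address ranges of a code region and a data region, test whether they touch a common page, and compute where data would have to start to land on a fresh page. The 4096-byte page size and the names `kPageSize`, `roundUpToPage` and `sharePage` are assumptions for illustration; real code would query the operating system for the page size.

```cpp
#include <cassert>
#include <cstdint>

// Assumed page size for illustration; real code should query the OS.
constexpr std::uintptr_t kPageSize = 4096;

// Index of the page containing the given address.
std::uintptr_t pageOf(std::uintptr_t addr) { return addr / kPageSize; }

// Smallest page-aligned address that is >= addr.
std::uintptr_t roundUpToPage(std::uintptr_t addr) {
  return (addr + kPageSize - 1) / kPageSize * kPageSize;
}

// True if the non-empty half-open ranges [codeBegin, codeEnd) and
// [dataBegin, dataEnd) touch at least one common page.
bool sharePage(std::uintptr_t codeBegin, std::uintptr_t codeEnd,
               std::uintptr_t dataBegin, std::uintptr_t dataEnd) {
  std::uintptr_t codeFirst = pageOf(codeBegin), codeLast = pageOf(codeEnd - 1);
  std::uintptr_t dataFirst = pageOf(dataBegin), dataLast = pageOf(dataEnd - 1);
  return codeFirst <= dataLast && dataFirst <= codeLast;
}
```

With these helpers, placing a global at `roundUpToPage(codeEnd)` rather than directly after the code keeps the two regions on separate pages, at the cost of some padding.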