Hi David,

I believe that assertion indicates that something didn't get loaded into the lower 2GB of address space. That is, the memory manager isn't allocating memory in that range.

I'm sure there must be a way to allocate memory in that range on FreeBSD. The system loader has to do it, right? I just don't know what makes it happen.

-Andy

-----Original Message-----
From: Dr D. Chisnall [mailto:dc552 at hermes.cam.ac.uk] On Behalf Of David Chisnall
Sent: Friday, May 10, 2013 11:06 AM
To: Kaylor, Andrew
Cc: LLVM Developers Mailing List
Subject: Re: TLS with MCJIT (an experimental patch)

Without the MAP_32BIT part, I consistently hit this assertion:

Assertion failed: ((Type == ELF::R_X86_64_32 && (Value <= UINT32_MAX)) ||
(Type == ELF::R_X86_64_32S && ((int64_t)Value <= INT32_MAX &&
(int64_t)Value >= INT32_MIN))), function resolveX86_64Relocation, file
../lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp, line 222.

David

On 9 May 2013, at 13:58, "Kaylor, Andrew" <andrew.kaylor at intel.com> wrote:

> Can you try it without the MAP_32BIT part? It won't be as reliable, but if the memory addresses it is asking for are available it could work.
>
> I agree that there are good reasons not to lock in on a single memory address, but I'm curious as to what other obstacles might be lurking behind the ones we know about. If the patch works when memory is loaded below 2GB then it would be possible to write a sophisticated memory manager that surveys the available memory in that space and selects an appropriate block in some non-deterministic manner.
>
> -Andy
>
> -----Original Message-----
> From: Dr D. Chisnall [mailto:dc552 at hermes.cam.ac.uk] On Behalf Of David Chisnall
> Sent: Wednesday, May 08, 2013 6:53 PM
> To: Kaylor, Andrew
> Cc: LLVM Developers Mailing List
> Subject: Re: TLS with MCJIT (an experimental patch)
>
> Hi,
>
> Unfortunately, I can't compile this patch. MAP_32BIT is a Linuxism that doesn't work on FreeBSD (or OS X, or, as far as I can tell, anywhere except Linux). We can consider adding something similar to FreeBSD (although I'm hesitant to encourage anything that increases the determinism of the memory layout of JITed code, for security reasons), but it doesn't seem ideal.
>
> David
>
> On 8 May 2013, at 16:54, "Kaylor, Andrew" <andrew.kaylor at intel.com> wrote:
>
>> Hi David,
>>
>> Following up on the problems we discussed yesterday on IRC regarding TLS with MCJIT, I've put together the attached experimental patch.
>>
>> This patch makes three changes:
>>
>> 1. SectionMemoryManager is changed to request memory below the 2GB boundary by default.
>> 2. sys::Memory::allocateMappedMemory is changed to set the MAP_32BIT flag if the requested "near" block is below the 2GB boundary.
>> 3. RuntimeDyldELF is changed to recognize the possibility of external data symbols.
>>
>> Of these changes, items 2 and 3 are probably reasonable things to commit into trunk, and depending on how this turns out I will do so. Item 1 is a bit heavy-handed as presented here, but it suggests the type of thing that subclasses of SectionMemoryManager could do to make this work. If we had a way to communicate the code model to the memory manager from RuntimeDyld/MCJIT (and we obviously should!) then SectionMemoryManager could do something like this when small or medium memory models are selected on applicable platforms.
>>
>> When I tried this patch with the test case you provided yesterday it got through the compilation phase with lli using the small code model and the static relocation model, but it ultimately failed (but failed gracefully) because it couldn't resolve the '_ThreadRuneLocale' symbol. Resolution of external symbols is meant to be handled by the memory manager, so I thought perhaps you could get something working with this patch.
>>
>> Please give this a try and let me know how it works.
>>
>> Thanks,
>> Andy
>> <tls-experimental.patch>
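
To make the MAP_32BIT discussion above concrete, here is a minimal sketch of a low-memory allocation attempt. It is not the code from tls-experimental.patch: the function name `allocateLow` and the hint address are invented for illustration. It assumes a POSIX `mmap`, uses MAP_32BIT only where the platform headers define it, and otherwise falls back to a plain address hint, which the kernel is free to ignore.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the patch): try to map Size bytes of
// read/write memory that lies entirely below the 2GB boundary.
// Returns nullptr if the mapping fails or lands above the boundary.
void *allocateLow(size_t Size) {
  assert(Size > 0 && "mmap rejects zero-length mappings");

  int Flags = MAP_PRIVATE | MAP_ANON;
  void *Hint = nullptr;
#ifdef MAP_32BIT
  // Only some platforms define this flag (e.g. Linux on x86-64).
  Flags |= MAP_32BIT;
#else
  // Arbitrary low address chosen for this sketch. It is only a hint.
  Hint = reinterpret_cast<void *>(0x10000000);
#endif

  void *P = mmap(Hint, Size, PROT_READ | PROT_WRITE, Flags, -1, 0);
  if (P == MAP_FAILED)
    return nullptr;

  // Reject the block if any part of it is at or above 2GB.
  if (reinterpret_cast<uintptr_t>(P) + Size > (1ull << 31)) {
    munmap(P, Size);
    return nullptr;
  }
  return P;
}
```

A caller has to treat a null result as "no low memory available" and fall back to something else. Note that a fixed hint address is exactly the kind of layout determinism David objects to, so a real memory manager would presumably vary it.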
Can you elaborate on why MCJIT TLS support needs code in the low 2 GB? What piece of data do you need to be reachable? It sounds like this was discussed on IRC, but I'm curious.

Does the MCJIT even have the reachability problems of the old JIT? If you build an object file in memory, presumably you can measure it and then allocate +x memory for it all at once, instead of the old model of not knowing how big it was going to be. If we build a module at a time, presumably separate modules don't need to be reachable w.r.t. each other, since they can use PLT-style stubs.

On Tue, May 14, 2013 at 8:17 PM, Kaylor, Andrew <andrew.kaylor at intel.com> wrote:

> Hi David,
>
> I believe that assertion indicates that something didn't get loaded into the lower 2GB of address space. That is, the memory manager isn't allocating memory in that range.
>
> I'm sure there must be a way to allocate memory in that range on FreeBSD. The system loader has to do it, right? I just don't know what makes it happen.
>
> -Andy
>
> -----Original Message-----
> From: Dr D. Chisnall [mailto:dc552 at hermes.cam.ac.uk] On Behalf Of David Chisnall
> Sent: Friday, May 10, 2013 11:06 AM
> To: Kaylor, Andrew
> Cc: LLVM Developers Mailing List
> Subject: Re: TLS with MCJIT (an experimental patch)
>
> Without the MAP_32BIT part, I consistently hit this assertion:
>
> Assertion failed: ((Type == ELF::R_X86_64_32 && (Value <= UINT32_MAX)) ||
> (Type == ELF::R_X86_64_32S && ((int64_t)Value <= INT32_MAX &&
> (int64_t)Value >= INT32_MIN))), function resolveX86_64Relocation, file
> ../lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp, line 222.
>
> David
>
> On 9 May 2013, at 13:58, "Kaylor, Andrew" <andrew.kaylor at intel.com> wrote:
>
> > Can you try it without the MAP_32BIT part? It won't be as reliable, but if the memory addresses it is asking for are available it could work.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
I don't think this is actually a TLS-specific problem. The TLS case just exposed a couple of other shortcomings in the current code base.

The problem is two-fold. First, MCJIT doesn't support the PIC relocation model for most platforms. Second, the MC code generation doesn't work with the large code model and the static relocation model. Because of these two issues, to try to get TLS working, we wanted to generate code with the static relocation model and the small code model. It's the small code model that requires code to be loaded in the lower 2GB.

In particular, when you use the small code model with the static relocation model, MC generates relocations that assume 32-bit addresses (R_X86_64_32). Once this relocation is generated, the RuntimeDyld doesn't have enough information to be able to fake it if the address it needs to write into the relocation is wider than 32 bits.

For PC-relative relocations, we can just rely on everything being loaded in proximity, and in fact that happens even with the large code model. For "absolute" 32-bit relocations that doesn't work.

-Andy

From: Reid Kleckner [mailto:rnk at google.com]
Sent: Wednesday, May 15, 2013 5:47 AM
To: Kaylor, Andrew
Cc: David Chisnall; LLVM Developers Mailing List
Subject: Re: [LLVMdev] TLS with MCJIT (an experimental patch)

Can you elaborate on why MCJIT TLS support needs code in the low 2 GB? What piece of data do you need to be reachable? It sounds like this was discussed on IRC, but I'm curious.

Does the MCJIT even have the reachability problems of the old JIT? If you build an object file in memory, presumably you can measure it and then allocate +x memory for it all at once, instead of the old model of not knowing how big it was going to be. If we build a module at a time, presumably separate modules don't need to be reachable w.r.t. each other, since they can use PLT-style stubs.
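
The range check behind the assertion quoted in this thread can be sketched in isolation. The conditions below are copied from the assertion text; everything else (the constant and function names) is hypothetical and is not RuntimeDyldELF's actual code. The relocation type numbers 10 and 11 are the values the x86-64 ELF psABI assigns to R_X86_64_32 and R_X86_64_32S.

```cpp
#include <cassert>
#include <cstdint>

// Relocation type numbers from the x86-64 ELF psABI.
constexpr uint32_t kRelocX86_64_32 = 10;   // 32-bit absolute, zero-extended
constexpr uint32_t kRelocX86_64_32S = 11;  // 32-bit absolute, sign-extended

// Hypothetical helper: true if Value survives truncation to the 32-bit
// relocation field, using the same conditions as the quoted assertion.
bool fitsAbsolute32(uint32_t Type, uint64_t Value) {
  if (Type == kRelocX86_64_32)
    return Value <= UINT32_MAX;
  if (Type == kRelocX86_64_32S)
    return static_cast<int64_t>(Value) <= INT32_MAX &&
           static_cast<int64_t>(Value) >= INT32_MIN;
  return false;  // not one of the two 32-bit absolute relocations
}

// Hypothetical helper: the bits a loader would write into the field.
uint32_t encodeAbsolute32(uint32_t Type, uint64_t Value) {
  assert(fitsAbsolute32(Type, Value) && "address does not fit in 32 bits");
  return static_cast<uint32_t>(Value);
}
```

With a section address such as 0x7f0000001000, which is typical of where an unconstrained allocation ends up on a 64-bit system, both checks fail. An address such as 0x400000 passes both.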
On 15 May 2013, at 01:17, "Kaylor, Andrew" <andrew.kaylor at intel.com> wrote:

> I believe that assertion indicates that something didn't get loaded into the lower 2GB of address space. That is, the memory manager isn't allocating memory in that range.
>
> I'm sure there must be a way to allocate memory in that range on FreeBSD. The system loader has to do it, right? I just don't know what makes it happen.

I've asked around, and we don't seem to have anything that can do it. Checking the code for rtld, it explicitly asks for memory at a specific address and keeps track of the regions it has used.

David
On 22 May 2013 06:22, David Chisnall <David.Chisnall at cl.cam.ac.uk> wrote:

> On 15 May 2013, at 01:17, "Kaylor, Andrew" <andrew.kaylor at intel.com> wrote:
>
>> I believe that assertion indicates that something didn't get loaded into the lower 2GB of address space. That is, the memory manager isn't allocating memory in that range.
>>
>> I'm sure there must be a way to allocate memory in that range on FreeBSD. The system loader has to do it, right? I just don't know what makes it happen.
>
> I've asked around, and we don't seem to have anything that can do it. Checking the code for rtld, it explicitly asks for memory at a specific address and keeps track of the regions it has used.

I was under the impression that, in the small memory model, each .so had to be small, but because of the use of GOTs and PLTs they could be anywhere in memory. If we allocate the TLS memory in the same allocator call that allocates space for the text section this would work, no?

> David

Cheers,
Rafael
On 22 May 2013, at 13:23, Rafael Espíndola <rafael.espindola at gmail.com> wrote:

> Why the private message? If unintentional, please forward this to the list.

Oops, forgot to hit reply-all. Didn't the LLVM lists use to default to reply-to-list behaviour?

> So, the JIT is analogous to dlopen, so it should be using general
> dynamic and local dynamic models. It is only the initial exec and
> local exec that require the dynamic linker to allocate memory at
> startup.

The dynamic linker will have allocated the memory because the TLS variable in question is provided by libc. It is already allocated before the JIT'd code runs. The JIT'd code just needs to refer to it.

> If MCJIT is producing TLS code using initial exec or local exec that is a bug.
>
> On 22 May 2013 08:14, David Chisnall <David.Chisnall at cl.cam.ac.uk> wrote:
>> On 22 May 2013, at 13:01, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
>>
>>> On 22 May 2013 06:22, David Chisnall <David.Chisnall at cl.cam.ac.uk> wrote:
>>>> On 15 May 2013, at 01:17, "Kaylor, Andrew" <andrew.kaylor at intel.com> wrote:
>>>>
>>>>> I believe that assertion indicates that something didn't get loaded into the lower 2GB of address space. That is, the memory manager isn't allocating memory in that range.
>>>>>
>>>>> I'm sure there must be a way to allocate memory in that range on FreeBSD. The system loader has to do it, right? I just don't know what makes it happen.
>>>>
>>>> I've asked around, and we don't seem to have anything that can do it. Checking the code for rtld, it explicitly asks for memory at a specific address and keeps track of the regions it has used.
>>>
>>> I was under the impression that, in the small memory model, each .so
>>> had to be small, but because of the use of GOTs and PLTs they could be
>>> anywhere in memory. If we allocate the tls memory in the same
>>> allocator call that allocates space for the text section this would
>>> work, no?
>>
>> I'm not sure what you mean by 'we' in this context. The memory for TLS (in this instance) is allocated when libc is loaded, as one of the first things after rtld starts. The memory will be accessed via a segment register, so the offset is actually the thing that must be resolved. Because this is static at run time (irrespective of the TLS model, the offset doesn't change during a single program run), the MC loader could hard-code the offset, irrespective of where it ends up. The memory for the text section in JIT'd code is allocated by the memory manager. We do not have a way of asking rtld to allocate memory for us.
>>
>> If we're going to go via GOTs and PLTs for calls out of JIT'd code then it seems that we're losing a lot of the benefit of the JIT, as it knows when doing code generation exactly where every function and every variable is.
>>
>> David
>> So, the JIT is analogous to dlopen, so it should be using general
>> dynamic and local dynamic models. It is only the initial exec and
>> local exec that require the dynamic linker to allocate memory at
>> startup.
>
> The dynamic linker will have allocated the memory because the TLS variable in question is provided by libc. It is already allocated before the JIT'd code runs. The JIT'd code just needs to refer to it.

OK. Are we generating general dynamic code to do so? It will look like

    .byte 0x66
    # R_X86_64_TLSGD to symbol x (MCJIT has to create a GOT entry)
    leaq x@tlsgd(%rip),%rdi
    .word 0x6666
    rex64
    # R_X86_64_PLT32 to __tls_get_addr (MCJIT has to create a GOT and a PLT entry)
    call __tls_get_addr@plt

This should work from any place in memory. I wouldn't be surprised if these relocations are not implemented yet, but that should be all that is needed to get TLS working.

Cheers,
Rafael
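
Why the sequence above is position-independent can be illustrated with the arithmetic of a 32-bit PC-relative relocation. The sketch below is an assumption-laden illustration, not MCJIT code: the function names are invented, and it assumes the loader places the GOT and PLT entries within 2GB of the code that references them. For R_X86_64_PLT32 the psABI computes L + A - P, where L is the address of the PLT entry, A is the addend and P is the address of the field being patched.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: displacement Target + Addend - Place for a 32-bit
// PC-relative relocation such as R_X86_64_PLT32.
int64_t pcRelDelta(uint64_t Target, int64_t Addend, uint64_t Place) {
  return static_cast<int64_t>(Target + static_cast<uint64_t>(Addend) - Place);
}

// True if the displacement can be stored in a signed 32-bit field.
bool fitsPCRel32(uint64_t Target, int64_t Addend, uint64_t Place) {
  const int64_t Delta = pcRelDelta(Target, Addend, Place);
  return Delta >= INT32_MIN && Delta <= INT32_MAX;
}

// The value a loader would write into the 32-bit field.
int32_t encodePCRel32(uint64_t Target, int64_t Addend, uint64_t Place) {
  assert(fitsPCRel32(Target, Addend, Place) && "target out of rel32 range");
  return static_cast<int32_t>(pcRelDelta(Target, Addend, Place));
}
```

Only the distance between the code and the entry matters here, so a stub allocated next to the code is reachable even when both sit far above the 2GB boundary. A direct reference from high memory to a symbol at a low absolute address, by contrast, falls outside the range.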
22.05.2013, 17:55, "David Chisnall" <David.Chisnall at cl.cam.ac.uk>:

> On 22 May 2013, at 13:23, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
>
>> Why the private message? If unintentional, please forward this to the list.
>
> Ooops, forgot to hit reply-all. Didn't the LLVM lists used to default to reply-to-list behaviour?

http://www.unicom.com/pw/reply-to-harmful.html

--
Regards,
Konstantin