Fellow developers,

I'm parallelizing loops to be called by pthread. The thread body that I pass to pthread_create looks like

    define i8* @loop1({ i32*, i32* }* nest %parent_frame, i8* %arg)

where parent_frame is a pointer to the shared variables in the original function. The generated code is:

    0x00007f0de11c41f0: mov    (%r10),%rax
    0x00007f0de11c41f3: cmpl   $0x63,(%rax)
    0x00007f0de11c41f6: jg     0x7f0de11c420c
    0x00007f0de11c41fc: mov    0x8(%r10),%rax
    0x00007f0de11c4200: incl   (%rax)
    0x00007f0de11c4202: mov    (%r10),%rax
    0x00007f0de11c4205: incl   (%rax)
    0x00007f0de11c4207: jmpq   0x7f0de11c41f0
    0x00007f0de11c420c: xor    %rax,%rax
    0x00007f0de11c420f: retq

I use init_trampoline to generate code that sets up the static link:

    0x00007fffee982316: mov    $0x7f48e1a08fb0,%r11
    0x00007fffee982320: mov    $0x7fffee982330,%r10    # the static link
    0x00007fffee98232a: rex.WB jmpq *%r11

The program crashes in loop1 on the 2nd instruction: r10, which should contain the static link, no longer holds the value set by the trampoline.

Upon closer inspection, it looks like the trampoline first jumps to a stub that compiles loop1:

    0x00007f48e1a08fb0: mov    $0x5c61c0,%r10
    0x00007f48e1a08fba: callq  *%r10
    0x00007f48e1a08fbd: int    $0x0

But that clobbers r10, which loop1 needs. According to the x86-64 ABI, r10 isn't preserved across function calls, but here it needs to be. Is there any way to force LLVM to do that?

I tried telling lli to compile the entire program (-no-lazy) so that the stub won't be generated, but that gives the error:

    LLVM JIT requested to do lazy compilation of function '_Z41__static_initialization_and_destruction_0ii' when lazy compiles are disabled!

Any ideas?

Note: I had to compile lli with -z execstack in order for trampolines on the stack to work.

On 2008-10-31 23:57, Yale Zhang wrote:
> [...]
> I tried telling lli to compile the entire program (-no-lazy) so that
> the stub won't be generated, but gives the error:
>
> LLVM JIT requested to do lazy compilation of function
> '_Z41__static_initialization_and_destruction_0ii' when lazy compiles
> are disabled!
>
> Any ideas?

Hmm, lli.cpp does this:

    if (NoLazyCompilation)
      EE->DisableLazyCompilation();
    [....]
    // Run static constructors.
    EE->runStaticConstructorsDestructors(false);

    if (NoLazyCompilation) {
      for (Module::iterator I = Mod->begin(), E = Mod->end(); I != E; ++I) {
        Function *Fn = &*I;
        if (Fn != MainFn && !Fn->isDeclaration())
          EE->getPointerToFunction(Fn);
      }
    }

If you actually have static constructors and destructors, then -no-lazy may not work. You could try moving the runStatic... call below the NoLazy block, but compiling the functions themselves might need those constructors to have run already.

The easiest way out seems to be to move the DisableLazyCompilation call to just after you've run the static constructors.

Best regards,
--Edwin

Hi,

> [...]
> But that clobbers r10 which loop1 needs. According to the x86-64 ABI, r10
> isn't preserved across functions, but here it needs to be. Is there anyway
> to force LLVM to do that?

You must be the first person to try using nest functions with the JIT :)
If you look in X86JITInfo.cpp, in the function X86JITInfo::emitFunctionStub, you will see the code generating the stub and using r10. I think the right solution is to change r10 to a different call-clobbered register. It would also be possible to have the trampoline use a different register, but since the x86-64 ABI explicitly states that r10 should be used for the static chain, I'd rather not.

I'm also wondering about the x86-32 case. There are no comments in the JIT stub code in this case, so I'm not sure which register it is using. The problem with x86-32 is that there are so few registers, and for some calling conventions there is only one spare call-clobbered register available. This is used by trampolines, so if it's also used by the JIT, which is almost surely the case, that will cause trouble. Even worse, it looks like the JIT is wrong even without trampolines, because for the C and X86_StdCall conventions it is ECX that is spare, while for X86_FastCall and Fast it is EAX. Yet the JIT always uses the same hardwired code, and does not adjust according to the calling convention. So presumably it is broken for one of these sets of calling conventions.

Hopefully Anton can comment on this.

> I tried telling lli to compile the entire program
> (-no-lazy) so that the stub won't be generated, but gives the error:
>
> LLVM JIT requested to do lazy compilation of function
> '_Z41__static_initialization_and_destruction_0ii' when lazy compiles are
> disabled!
>
> Any ideas?
>
> Note, I had to compile lli with -z execstack in order for trampolines on the
> stack to work.

Maybe lli can be taught to mark itself as having an executable stack when it sees a trampoline. I'm not sure how this can best be done. On Linux I guess it can be done using mmap.

Ciao,

Duncan.

I admit I got carried away with trying to use an extra static link when the arg parameter would've sufficed. Using a static link is probably still a better idea, because if there is more than one loop to parallelize in a function, the loops would share the same parent frame struct but might have separate structs describing their parameters.

On Sat, Nov 1, 2008 at 3:54 AM, Duncan Sands <duncan.sands at math.u-psud.fr> wrote:
> you must be the first person to try using nest functions with the JIT :)

Well, this is a project in a dynamic optimization course. The JIT lacks a lot of things for this purpose, like recompiling, patching old callers to refer to the new code, and deleting old machine code. Currently, it just overwrites the old code with a branch to the new code and makes no attempt to patch the callers. We'll probably come up with something more sophisticated and submit it.

> If you look in X86JITInfo.cpp, in the function X86JITInfo::emitFunctionStub,
> you will see the code generating the stub and using r10

I didn't expect it to be that easy. I thought I needed to add special rules to the register allocator. I'll take a look at it.