thr3ads.net - llvm dev - [LLVMdev] Why does the x86-64 JIT emit stubs for external calls? [Jun 2009]

Jeffrey Yasskin

2009-Jun-10 19:17 UTC

[LLVMdev] Why does the x86-64 JIT emit stubs for external calls?

In X86CodeEmitter.cpp, the following code appears in the handler used for
CALL64pcrel32 instructions:

        // Assume undefined functions may be outside the Small codespace.
        bool NeedStub =
          (Is64BitMode &&
              (TM.getCodeModel() == CodeModel::Large ||
               TM.getSubtarget<X86Subtarget>().isTargetDarwin())) ||
          Opcode == X86::TAILJMPd;
        emitGlobalAddress(MO.getGlobal(), X86::reloc_pcrel_word,
                          MO.getOffset(), 0, NeedStub);

This causes every external call to be emitted as a call to a stub
which then jumps to the real function.
I understand, thanks to the helpful folks on #llvm, that calls across
more than 31 bits of address space need to be emitted as a "mov
$ADDRESS, r10; call *r10" pair instead of the simple "call
rip+ADDRESS" used for calls within 31 bits. But why isn't the mov+call
pair emitted inline? And why are Darwin and TAILJMPs special?
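
For reference, the two call forms discussed above can be sketched as raw byte encodings. This is an illustrative sketch only, not LLVM's emitter: `appendLE`, `fitsInRel32`, and `encodeCall` are hypothetical helper names, and the code assumes the 5-byte `call rel32` form (opcode E8, displacement measured from the end of the instruction) and the 13-byte `movabs $ADDRESS, %r10; call *%r10` sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append the low `n` bytes of `value` in little-endian order.
static void appendLE(std::vector<std::uint8_t> &out, std::uint64_t value,
                     std::size_t n) {
  assert(n <= 8);
  for (std::size_t i = 0; i < n; ++i)
    out.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
}

// True if `target` is reachable from a 5-byte `call rel32` placed at `site`.
// The displacement is a signed 32-bit value relative to the next instruction.
bool fitsInRel32(std::uint64_t site, std::uint64_t target) {
  std::int64_t disp = static_cast<std::int64_t>(target - (site + 5));
  return disp >= INT32_MIN && disp <= INT32_MAX;
}

// Encode a call from `site` to `target`, choosing the short form when the
// target is in range and the inline mov+call pair otherwise.
std::vector<std::uint8_t> encodeCall(std::uint64_t site, std::uint64_t target) {
  std::vector<std::uint8_t> code;
  if (fitsInRel32(site, target)) {
    code.push_back(0xE8);                    // call rel32
    appendLE(code, target - (site + 5), 4);
  } else {
    code.push_back(0x49);                    // REX.W + REX.B
    code.push_back(0xBA);                    // movabs $imm64, %r10
    appendLE(code, target, 8);
    code.push_back(0x41);                    // REX.B
    code.push_back(0xFF);                    // opcode group 5
    code.push_back(0xD2);                    // /2 with r10: call *%r10
  }
  return code;
}
```

Under these assumptions the inline far form costs 13 bytes per call site against 5 for the near form, which is one reason an emitter might prefer a shared out-of-line stub.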

Having this out of line seems to lose up to 2% performance on the
Unladen Swallow benchmarks, so, while it's not urgent, it'd be nice to
figure out how to avoid the stubs.

What kind of patch would be welcome to fix this?

Thanks,
Jeffrey

Evan Cheng

2009-Jun-11 19:54 UTC

[LLVMdev] Why does the x86-64 JIT emit stubs for external calls?

On Jun 10, 2009, at 12:17 PM, Jeffrey Yasskin wrote:
> In X86CodeEmitter.cpp, the following code appears in the handler used for
> CALL64pcrel32 instructions:
>
>        // Assume undefined functions may be outside the Small codespace.
>        bool NeedStub =
>          (Is64BitMode &&
>              (TM.getCodeModel() == CodeModel::Large ||
>               TM.getSubtarget<X86Subtarget>().isTargetDarwin())) ||
>          Opcode == X86::TAILJMPd;
>        emitGlobalAddress(MO.getGlobal(), X86::reloc_pcrel_word,
>                          MO.getOffset(), 0, NeedStub);
>
> This causes every external call to be emitted as a call to a stub
> which then jumps to the real function.
> I understand, thanks to the helpful folks on #llvm, that calls across
> more than 31 bits of address space need to be emitted as a "mov
> $ADDRESS, r10; call *r10" pair instead of the simple "call
> rip+ADDRESS" used for calls within 31 bits. But why isn't the mov+call
> pair emitted inline? And why are Darwin and TAILJMPs special?

This is needed because of lazy compilation: before the callee is
resolved, it is just a JIT stub. The stub is heap allocated, so it may
not be in the lower 4G even if the code size model is small. I know this
is the case on Darwin x86_64; I am not sure about other targets. I forgot
why this is needed for tail calls, sorry.

In theory we can make the code generator emit the mov+call inline; in
reality, it doesn't know whether it's jitting or not. Also, we really want
to keep the code generation the same (as much as possible) whether
it's jitting or compiling. One possible solution is to add a code size
model specifically for the JIT so the code generator can generate
more efficient code in that configuration.

Evan
>
>
> Having this out of line seems to lose up to 2% performance on the
> Unladen Swallow benchmarks, so, while it's not urgent, it'd be nice to
> figure out how to avoid the stubs.
>
> What kind of patch would be welcome to fix this?
>
> Thanks,
> Jeffrey
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Jeffrey Yasskin

2009-Jun-11 23:24 UTC

[LLVMdev] [unladen-swallow] Re: Why does the x86-64 JIT emit stubs for external calls?

On Thu, Jun 11, 2009 at 12:54 PM, Evan Cheng <evan.cheng at apple.com> wrote:
>
> On Jun 10, 2009, at 12:17 PM, Jeffrey Yasskin wrote:
>
>> In X86CodeEmitter.cpp, the following code appears in the handler used for
>> CALL64pcrel32 instructions:
>>
>>       // Assume undefined functions may be outside the Small codespace.
>>       bool NeedStub =
>>         (Is64BitMode &&
>>             (TM.getCodeModel() == CodeModel::Large ||
>>              TM.getSubtarget<X86Subtarget>().isTargetDarwin())) ||
>>         Opcode == X86::TAILJMPd;
>>       emitGlobalAddress(MO.getGlobal(), X86::reloc_pcrel_word,
>>                         MO.getOffset(), 0, NeedStub);
>>
>> This causes every external call to be emitted as a call to a stub
>> which then jumps to the real function.
>> I understand, thanks to the helpful folks on #llvm, that calls across
>> more than 31 bits of address space need to be emitted as a "mov
>> $ADDRESS, r10; call *r10" pair instead of the simple "call
>> rip+ADDRESS" used for calls within 31 bits. But why isn't the mov+call
>> pair emitted inline? And why are Darwin and TAILJMPs special?
>
> This is needed because of lazy compilation, before the callee is resolved,
> it is just a JIT stub.
Even with lazy compilation, the contents of the stub get emitted (by
JITEmitter::getPointerToGlobal) as a direct call to the function, not
the compilation callback, because the function is an external
declaration. You can watch this happen with the following program:

declare i32 @rand()

define i32 @main() nounwind {
entry:
	%call = tail call i32 @rand()		; <i32> [#uses=1]
	%add = add i32 %call, 2		; <i32> [#uses=1]
	ret i32 %add
}

and the command line `lli -debug-only=jit -march=x86-64 test.bc`.

With lazy compilation and a call to an internal function, the
JITEmitter can emit a stub even if MachineRelocation::doesntNeedStub()
(the field NeedStub gets passed into) returns true. Only returning
false constrains the emitter.
> It's heap allocated so it may not be in the lower 4G
> even if the code size model is small. I know this is the case on Darwin
> x86_64, I am not sure about other targets.
Oh, other targets can certainly allocate code above 4G too.
sys::AllocateRWX just uses mmap with no constraints on the returned
address, and I've got a Linux desktop where that always produces an
address over 4G.
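
A small way to check this on a POSIX system, sketched under stated assumptions: `allocateUnconstrained` and `isAbove4G` are hypothetical names, not LLVM functions, and the mapping is requested read/write rather than RWX because some hardened systems refuse RWX pages. The point is only that a null address hint leaves placement entirely to the kernel.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cstddef>
#include <cstdint>

// Map an anonymous region with no address hint, the way an unconstrained
// allocator would. Returns nullptr on failure.
void *allocateUnconstrained(std::size_t size) {
  assert(size > 0);  // mmap rejects zero-length mappings
  void *p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? nullptr : p;
}

// True if the address lies at or above the 4 GiB boundary, that is, outside
// the range a 32-bit absolute address can name.
bool isAbove4G(const void *p) {
  std::uint64_t addr =
      static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(p));
  return addr >= (std::uint64_t{1} << 32);
}
```

Calling `isAbove4G(allocateUnconstrained(4096))` on a given machine shows where the kernel happens to place such mappings; the result depends on the OS and its address-space layout, so no particular answer is guaranteed.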
> I forgot why this is needed for
> tail calls, sorry.
>
> In theory we can make the code generator inline mov+call, the reality is it
> doesn't know whether it's jitting or not. Also, we really want to keep the
> code generation the same (as much as possible) whether it's jitting or
> compiling. One possible solution for this is to add code size model
> specifically for JIT so code generator can generate more efficient code in
> that configuration.
For non-JIT, the code generator doesn't ever need a stub, right? The
linker does it using the relocation information? It must be ignoring
the NeedStub parameter. ... But wait, is this code generator used for
anything besides the JIT? Compiling uses the AsmPrinter until direct
object code generation lands, and presumably they're redesigning this
whole subsystem.


It sounds like I'd have to fully understand the whole structure of the
code generator to fix this, and for <=2% performance, that's not
really worth it. I'll probably wait for the direct object code people
to get around to it. Thanks though.
>>
>>
>> Having this out of line seems to lose up to 2% performance on the
>> Unladen Swallow benchmarks, so, while it's not urgent, it'd be nice to
>> figure out how to avoid the stubs.
>>
>> What kind of patch would be welcome to fix this?
>>
>> Thanks,
>> Jeffrey

Aaron Gray

2009-Jun-12 00:56 UTC

[LLVMdev] Why does the x86-64 JIT emit stubs for external calls?

> On Jun 10, 2009, at 12:17 PM, Jeffrey Yasskin wrote:
>
>> In X86CodeEmitter.cpp, the following code appears in the handler used for
>> CALL64pcrel32 instructions:
>>
>>        // Assume undefined functions may be outside the Small codespace.
>>        bool NeedStub =
>>          (Is64BitMode &&
>>              (TM.getCodeModel() == CodeModel::Large ||
>>               TM.getSubtarget<X86Subtarget>().isTargetDarwin())) ||
>>          Opcode == X86::TAILJMPd;
>>        emitGlobalAddress(MO.getGlobal(), X86::reloc_pcrel_word,
>>                          MO.getOffset(), 0, NeedStub);
>>
>> This causes every external call to be emitted as a call to a stub
>> which then jumps to the real function.
>> I understand, thanks to the helpful folks on #llvm, that calls across
>> more than 31 bits of address space need to be emitted as a "mov
>> $ADDRESS, r10; call *r10" pair instead of the simple "call
>> rip+ADDRESS" used for calls within 31 bits. But why isn't the mov+call
>> pair emitted inline? And why are Darwin and TAILJMPs special?
>
> This is needed because of lazy compilation, before the callee is
> resolved, it is just a JIT stub. It's heap allocated so it may not be
> in the lower 4G even if the code size model is small. I know this is
> the case on Darwin x86_64, I am not sure about other targets. I forgot
> why this is needed for tail calls, sorry.
>
> In theory we can make the code generator inline mov+call, the reality
> is it doesn't know whether it's jitting or not. Also, we really want
> to keep the code generation the same (as much as possible) whether
> it's jitting or compiling. One possible solution for this is to add
> code size model specifically for JIT so code generator can generate
> more efficient code in that configuration.

Since the CodeEmitters are now generically parameterized, they can be
specialized for the JIT quite easily.

Aaron
>> Having this out of line seems to lose up to 2% performance on the
>> Unladen Swallow benchmarks, so, while it's not urgent, it'd be nice to
>> figure out how to avoid the stubs.
>>
>> What kind of patch would be welcome to fix this?
>>
>> Thanks,
>> Jeffrey
