On 2009-10-29 23:55, Jeffrey Yasskin wrote:
> On Thu, Oct 29, 2009 at 2:30 PM, Nicolas Geoffray
> <nicolas.geoffray at lip6.fr> wrote:
>
>> Hi Jeffrey,
>>
>> Jeffrey Yasskin wrote:
>>
>>> Cool, I'll start implementing it.
>>>
>> Great! Thanks.
>>
>> Just to clarify things: on my end, it doesn't really matter what the
>> default behavior is, as long as vmkit can keep its existing behavior
>> of lazy compilation. With Chris' solution, I was wondering how you
>> would implement the getPointerToFunction{Eager, Lazy} functions when
>> getPointerToFunction is called by the JIT rather than by the user. For
>> example, when Function F calls Function G and the JIT needs an address
>> for G (either a callback or the function address), how will it know
>> whether it must call getPointerToFunctionEager or
>> getPointerToFunctionLazy? Do you plan on keeping a flag that
>> enables/disables lazy compilation and polling this flag on each
>> function call? How is that different from the existing system?
>
> Semantically, I'll thread the flag through all the calls that may
> eventually need to recursively call getPointerToFunction. To implement
> that without having to modify lots of calls, I'll probably replace the
> current public default eager/lazy setting with a private flag with
> values {Unknown, Lazy, Eager}, set it on entry and exit of
> getPointerToFunction, and check it on each internal recursive call.
> The difference from the current system is that the user is forced to
> set the flag to their desired value whenever they call into the JIT,
> rather than relying on a default. That choice then propagates through
> the whole recursive tree of codegens, without affecting the next tree.
>
> Note that I'm using getPointerToFunction as an abbreviation for the
> 3ish public functions that'll need to take this option.

The documentation should also be updated
(http://llvm.org/docs/ProgrammersManual.html#threading) to reflect what
one needs to do to ensure thread-safe JITing.

Also, does every JIT target support non-lazy JITing now? See PR4816:
last time I checked (r83242) it only worked on X86 and failed on PPC,
so I had to keep lazy JITing enabled even though it's not what I want,
for many reasons.

Also, perhaps the lazy compilation stub should spin waiting on a lock
(implemented using atomics), and the compilation callback should
execute while holding the lock just before patching the callsite, so it
would look like this in pseudocode:

callsite_patch_state = 0; // one byte of memory per callsite

callsite:
  if (atomic_load(&callsite_patch_state) == 2) {
    // fast path: already compiled and patched
    patchsite:
    jmp <nop nop nop nop nop nop nop nop> // will be patched
  }
  // not yet patched; it may already be compiling
  if (atomic_test_and_set(&callsite_patch_state, 0, 1) == 0) {
    // not yet compiling: set state to compiling and start compiling
    call CompilationCallBack
    // set state to patched
    atomic_set(&callsite_patch_state, 2)
  }
  // wait for patched state
  while (atomic_load(&callsite_patch_state) != 2) {
    waitJIT();
  }
  // serialize
  CPUID
  patchsite2:
  // execute new code
  jmp <nop nop nop nop nop nop nop nop> // will be patched

waitJIT:
  jitLock()
  jitUnlock()

This should be consistent with the Intel Manual's requirements on
cross-modifying code (XMC), which uses a similar algorithm, except for
the fast path.

CompilationCallBack:
  jitLock();
  if (isJITed(F)) { jitUnlock(); return; }
  JIT function
  patch_callsite(&patchsite, compiledFunctionAddress);
  patch_callsite(&patchsite2, compiledFunctionAddress);
  setJITed(F, true);
  jitUnlock();

This way, once the function is compiled, the callsite will only execute:

  atomic_load(&callsite_patch_state) == 2
  jmp compiledFunctionAddress

Best regards,
--Edwin
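[To make the state machine above easier to follow, here is a minimal
C++ sketch of the same protocol. It is an illustration only, not LLVM
code: a std::mutex stands in for jitLock()/jitUnlock(), and an atomic
function pointer stands in for the machine-code patching of the
callsite, so patched_target, compileAndPatch, and the other helpers are
made-up placeholders.]

#include <atomic>
#include <cstdio>
#include <mutex>

// 0 = not compiled, 1 = compilation in progress, 2 = compiled and patched.
static std::atomic<int> callsite_patch_state{0};
static std::mutex jit_lock;                              // stands in for jitLock()/jitUnlock()
static std::atomic<void (*)()> patched_target{nullptr};  // stands in for the patched jmp target

static void compiledFunction() { std::puts("running compiled code"); }

// Placeholder for "JIT function" plus patch_callsite(): here we just
// publish a function pointer instead of rewriting machine code.
static void compileAndPatch() {
  patched_target.store(&compiledFunction, std::memory_order_release);
}

static void compilationCallBack() {
  std::lock_guard<std::mutex> guard(jit_lock);           // jitLock() ... jitUnlock()
  if (callsite_patch_state.load(std::memory_order_acquire) == 2)
    return;                                              // already patched (mirrors isJITed(F))
  compileAndPatch();
}

// The per-callsite lazy-compilation stub.
static void callsiteStub() {
  // Fast path: already compiled and patched.
  if (callsite_patch_state.load(std::memory_order_acquire) == 2) {
    patched_target.load(std::memory_order_acquire)();
    return;
  }
  // Not yet patched; it may already be compiling. CAS 0 -> 1 picks one winner.
  int expected = 0;
  if (callsite_patch_state.compare_exchange_strong(expected, 1)) {
    compilationCallBack();
    callsite_patch_state.store(2, std::memory_order_release);
  }
  // Everyone else spins on the lock (waitJIT) until the state becomes 2.
  while (callsite_patch_state.load(std::memory_order_acquire) != 2) {
    std::lock_guard<std::mutex> wait(jit_lock);          // waitJIT(): jitLock(); jitUnlock();
  }
  patched_target.load(std::memory_order_acquire)();
}

int main() {
  callsiteStub();  // first call: compiles, patches, then runs the code
  callsiteStub();  // later calls take the fast path
}

[A real stub would of course be emitted machine code with the jmp
rewritten in place, plus the CPUID serialization Edwin mentions; the
function pointer here only models the patch so the control flow can be
exercised on the host.]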
Jeffrey Yasskin
2009-Nov-01 06:40 UTC
[LLVMdev] Should LLVM JIT default to lazy or non-lazy?
2009/10/30 Török Edwin <edwintorok at gmail.com>:
> [...]
>
> The documentation should also be updated
> (http://llvm.org/docs/ProgrammersManual.html#threading) to reflect
> what one needs to do to ensure thread-safe JITing.

Thanks for that reminder. I've updated it in the patch I'm about to
mail, but I should apply the update regardless of whether the rest of
the patch goes in.

> Also, does every JIT target support non-lazy JITing now? See PR4816:
> last time I checked (r83242) it only worked on X86 and failed on PPC,
> so I had to keep lazy JITing enabled even though it's not what I want,
> for many reasons.

It's still the case that only X86 supports eager jitting. It doesn't
look that hard to add it to the rest of the targets, though.

> Also, perhaps the lazy compilation stub should spin waiting on a lock
> (implemented using atomics), and the compilation callback should
> execute while holding the lock just before patching the callsite, so
> it would look like this in pseudocode:
> [...]

Good idea. This increases the code size a bit, but it's clearly better
than the "load the target address" option I mentioned in the bug. Would
you add it to the bug so we don't lose it?

I think we can put the entire "not yet patched" branch inside the
compilation callback to minimize the code size impact:

callsite_patch_state = 0; // one byte of memory per callsite

callsite:
  if (atomic_load(&callsite_patch_state) != 2) {
    call CompilationCallback // Doesn't return until the patchsite is patched.
  }
  // fast and slow path: already compiled and patched
  patchsite:
  call <nop nop nop nop nop nop nop nop> // will be patched
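[For comparison, a similarly hedged C++ model of this slimmer stub,
using the same stand-ins as the earlier sketch: an atomic function
pointer for the patch and a std::mutex for the JIT lock. The names are
placeholders, not LLVM APIs. The intermediate "compiling" state
disappears because racing callers simply serialize inside the
callback.]

#include <atomic>
#include <cstdio>
#include <mutex>

// Anything other than 2 means "not yet patched"; 2 means compiled and patched.
static std::atomic<int> callsite_patch_state{0};
static std::mutex jit_lock;                              // stands in for jitLock()/jitUnlock()
static std::atomic<void (*)()> patched_target{nullptr};  // stands in for the patched call target

static void compiledFunction() { std::puts("running compiled code"); }

// All of the "not yet patched" handling lives here, so the inline stub
// stays small. Does not return until the patch site has been patched.
static void compilationCallback() {
  std::lock_guard<std::mutex> guard(jit_lock);           // serializes racing callers
  if (callsite_patch_state.load(std::memory_order_acquire) == 2)
    return;                                              // another thread already patched it
  patched_target.store(&compiledFunction, std::memory_order_release);  // "JIT + patch"
  callsite_patch_state.store(2, std::memory_order_release);
}

// The per-callsite stub: one load, one (usually untaken) branch, then the patched call.
static void callsiteStub() {
  if (callsite_patch_state.load(std::memory_order_acquire) != 2)
    compilationCallback();
  patched_target.load(std::memory_order_acquire)();      // fast and slow paths converge here
}

int main() {
  callsiteStub();  // compiles on first use
  callsiteStub();  // fast path afterwards
}

[The per-callsite fast path is just one load and an untaken branch
before the patched call, which is the code-size saving this variant is
after.]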
On 2009-11-01 08:40, Jeffrey Yasskin wrote:
> 2009/10/30 Török Edwin <edwintorok at gmail.com>:
> [...]
>
>> Also, does every JIT target support non-lazy JITing now? See PR4816:
>> last time I checked (r83242) it only worked on X86 and failed on PPC,
>> so I had to keep lazy JITing enabled even though it's not what I
>> want, for many reasons.
>
> It's still the case that only X86 supports eager jitting. It doesn't
> look that hard to add it to the rest of the targets, though.

Ok.

>> Also, perhaps the lazy compilation stub should spin waiting on a lock
>> (implemented using atomics), and the compilation callback should
>> execute while holding the lock just before patching the callsite, so
>> it would look like this in pseudocode:
>> [...]
>
> Good idea. This increases the code size a bit, but it's clearly better
> than the "load the target address" option I mentioned in the bug.
> Would you add it to the bug so we don't lose it?
>
> I think we can put the entire "not yet patched" branch inside the
> compilation callback to minimize the code size impact:
>
> callsite_patch_state = 0; // one byte of memory per callsite
>
> callsite:
>   if (atomic_load(&callsite_patch_state) != 2) {
>     call CompilationCallback // Doesn't return until the patchsite is patched.
>   }
>   // fast and slow path: already compiled and patched
>   patchsite:
>   call <nop nop nop nop nop nop nop nop> // will be patched

Yes, that sounds good (except that I think we'd want a jmp instead of a
call); I'll post a patch to that bug report tomorrow if time permits.

Would this mean that lazy JITing would be thread-safe and the default
could stay lazy?

Best regards,
--Edwin