On 2009-10-29 23:55, Jeffrey Yasskin wrote:
> On Thu, Oct 29, 2009 at 2:30 PM, Nicolas Geoffray
> <nicolas.geoffray at lip6.fr> wrote:
>
>> Hi Jeffrey,
>>
>> Jeffrey Yasskin wrote:
>>
>>> Cool, I'll start implementing it.
>>>
>> Great! Thanks.
>>
>> Just to clarify things: on my end, it doesn't really matter what the
>> default behavior is, as long as vmkit can keep its existing behavior
>> of lazy compilation. With Chris' solution, I was wondering how you
>> would implement the getPointerToFunction{Eager, Lazy} functions when
>> getPointerToFunction is called by the JIT rather than by the user. For
>> example, when Function F calls Function G and the JIT needs an address
>> for G (either a callback or the function address), how will it know
>> whether it must call getPointerToFunctionEager or
>> getPointerToFunctionLazy? Do you plan on keeping a flag that
>> enables/disables lazy compilation and polling this flag on each
>> function call? How is that different from the existing system?
>
> Semantically, I'll thread the flag through all the calls that may
> eventually need to recursively call getPointerToFunction. To implement
> that without having to modify lots of calls, I'll probably replace the
> current public default eager/lazy setting with a private flag with
> values {Unknown, Lazy, Eager}, set it on entry and exit of
> getPointerToFunction, and check it on each internal recursive call.
> The difference from the current system is that the user is forced to
> set the flag to their desired value whenever they call into the JIT,
> rather than relying on a default. That choice then propagates through
> the whole recursive tree of codegens, without affecting the next tree.
>
> Note that I'm using getPointerToFunction as an abbreviation for the
> 3ish public functions that'll need to take this option.

The documentation should also be updated
(http://llvm.org/docs/ProgrammersManual.html#threading) to reflect what
one needs to do to ensure thread-safe JITing.

Also, does every JIT target support non-lazy JITing now? See PR4816:
last time I checked (r83242) it only worked on X86 and failed on PPC,
so I had to keep lazy JITing enabled even though it's not what I want,
for many reasons.

Also, perhaps the lazy compilation stub should spin waiting on a lock
(implemented using atomics), and the compilation callback should
execute while holding the lock just before patching the callsite, so it
would look like this in pseudocode:

callsite_patch_state = 0; // one byte of memory per callsite

callsite:
  if (atomic_load(&callsite_patch_state) == 2) {
    // fast path: already compiled and patched
    patchsite:
    jmp <nop nop nop nop nop nop nop nop> // will be patched
  }
  // not yet patched; it may already be compiling
  if (atomic_test_and_set(&callsite_patch_state, 0, 1) == 0) {
    // not yet compiling: set state to compiling and start compiling
    call CompilationCallBack
    // set state to patched
    atomic_set(&callsite_patch_state, 2)
  }
  // wait for patched state
  while (atomic_load(&callsite_patch_state) != 2) {
    waitJIT();
  }
  // serialize
  CPUID
  patchsite2:
  // execute new code
  jmp <nop nop nop nop nop nop nop nop> // will be patched

waitJIT:
  jitLock()
  jitUnlock()

This should be consistent with the Intel Manual's requirements on
cross-modifying code (XMC), which uses a similar algorithm, except for
the fast path.

CompilationCallBack:
  jitLock();
  if (isJITed(F)) { jitUnlock(); return; }
  JIT function
  patch_callsite(&patchsite, compiledFunctionAddress);
  patch_callsite(&patchsite2, compiledFunctionAddress);
  setJITed(F, true);
  jitUnlock();

This way, once the function is compiled, the callsite will only execute:

  atomic_load(&callsite_patch_state) == 2
  jmp compiledFunctionAddress

Best regards,
--Edwin
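[To make the state machine above easier to follow, here is a minimal
C++ sketch of the same protocol. It is an illustration only, not LLVM
code: a std::mutex stands in for jitLock()/jitUnlock(), and an atomic
function pointer stands in for the machine-code patching of the
callsite, so patched_target, compileAndPatch, and the other helpers are
made-up placeholders.]

#include <atomic>
#include <cstdio>
#include <mutex>

// 0 = not compiled, 1 = compilation in progress, 2 = compiled and patched.
static std::atomic<int> callsite_patch_state{0};
static std::mutex jit_lock;                              // stands in for jitLock()/jitUnlock()
static std::atomic<void (*)()> patched_target{nullptr};  // stands in for the patched jmp target

static void compiledFunction() { std::puts("running compiled code"); }

// Placeholder for "JIT function" plus patch_callsite(): here we just
// publish a function pointer instead of rewriting machine code.
static void compileAndPatch() {
  patched_target.store(&compiledFunction, std::memory_order_release);
}

static void compilationCallBack() {
  std::lock_guard<std::mutex> guard(jit_lock);           // jitLock() ... jitUnlock()
  if (callsite_patch_state.load(std::memory_order_acquire) == 2)
    return;                                              // already patched (mirrors isJITed(F))
  compileAndPatch();
}

// The per-callsite lazy-compilation stub.
static void callsiteStub() {
  // Fast path: already compiled and patched.
  if (callsite_patch_state.load(std::memory_order_acquire) == 2) {
    patched_target.load(std::memory_order_acquire)();
    return;
  }
  // Not yet patched; it may already be compiling. CAS 0 -> 1 picks one winner.
  int expected = 0;
  if (callsite_patch_state.compare_exchange_strong(expected, 1)) {
    compilationCallBack();
    callsite_patch_state.store(2, std::memory_order_release);
  }
  // Everyone else spins on the lock (waitJIT) until the state becomes 2.
  while (callsite_patch_state.load(std::memory_order_acquire) != 2) {
    std::lock_guard<std::mutex> wait(jit_lock);          // waitJIT(): jitLock(); jitUnlock();
  }
  patched_target.load(std::memory_order_acquire)();
}

int main() {
  callsiteStub();  // first call: compiles, patches, then runs the code
  callsiteStub();  // later calls take the fast path
}

[A real stub would of course be emitted machine code with the jmp
rewritten in place, plus the CPUID serialization Edwin mentions; the
function pointer here only models the patch so the control flow can be
exercised on the host.]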
Jeffrey Yasskin
2009-Nov-01 06:40 UTC
[LLVMdev] Should LLVM JIT default to lazy or non-lazy?
2009/10/30 Török Edwin <edwintorok at gmail.com>:
> [...]
>
> The documentation should also be updated
> (http://llvm.org/docs/ProgrammersManual.html#threading) to reflect
> what one needs to do to ensure thread-safe JITing.

Thanks for that reminder. I've updated it in the patch I'm about to
mail, but I should apply the update regardless of whether the rest of
the patch goes in.

> Also, does every JIT target support non-lazy JITing now? See PR4816:
> last time I checked (r83242) it only worked on X86 and failed on PPC,
> so I had to keep lazy JITing enabled even though it's not what I want,
> for many reasons.

It's still the case that only X86 supports eager jitting. It doesn't
look that hard to add it to the rest of the targets, though.

> Also, perhaps the lazy compilation stub should spin waiting on a lock
> (implemented using atomics), and the compilation callback should
> execute while holding the lock just before patching the callsite, so
> it would look like this in pseudocode:
> [...]

Good idea. This increases the code size a bit, but it's clearly better
than the "load the target address" option I mentioned in the bug. Would
you add it to the bug so we don't lose it?

I think we can put the entire "not yet patched" branch inside the
compilation callback to minimize the code size impact:

callsite_patch_state = 0; // one byte of memory per callsite

callsite:
  if (atomic_load(&callsite_patch_state) != 2) {
    call CompilationCallback // Doesn't return until the patchsite is patched.
  }
  // fast and slow path: already compiled and patched
  patchsite:
  call <nop nop nop nop nop nop nop nop> // will be patched
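[For comparison, a similarly hedged C++ model of this slimmer stub,
using the same stand-ins as the earlier sketch: an atomic function
pointer for the patch and a std::mutex for the JIT lock. The names are
placeholders, not LLVM APIs. The intermediate "compiling" state
disappears because racing callers simply serialize inside the
callback.]

#include <atomic>
#include <cstdio>
#include <mutex>

// Anything other than 2 means "not yet patched"; 2 means compiled and patched.
static std::atomic<int> callsite_patch_state{0};
static std::mutex jit_lock;                              // stands in for jitLock()/jitUnlock()
static std::atomic<void (*)()> patched_target{nullptr};  // stands in for the patched call target

static void compiledFunction() { std::puts("running compiled code"); }

// All of the "not yet patched" handling lives here, so the inline stub
// stays small. Does not return until the patch site has been patched.
static void compilationCallback() {
  std::lock_guard<std::mutex> guard(jit_lock);           // serializes racing callers
  if (callsite_patch_state.load(std::memory_order_acquire) == 2)
    return;                                              // another thread already patched it
  patched_target.store(&compiledFunction, std::memory_order_release);  // "JIT + patch"
  callsite_patch_state.store(2, std::memory_order_release);
}

// The per-callsite stub: one load, one (usually untaken) branch, then the patched call.
static void callsiteStub() {
  if (callsite_patch_state.load(std::memory_order_acquire) != 2)
    compilationCallback();
  patched_target.load(std::memory_order_acquire)();      // fast and slow paths converge here
}

int main() {
  callsiteStub();  // compiles on first use
  callsiteStub();  // fast path afterwards
}

[The per-callsite fast path is just one load and an untaken branch
before the patched call, which is the code-size saving this variant is
after.]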
On 2009-11-01 08:40, Jeffrey Yasskin wrote:
> 2009/10/30 Török Edwin <edwintorok at gmail.com>:
> [...]
>
>> Also, does every JIT target support non-lazy JITing now? See PR4816:
>> last time I checked (r83242) it only worked on X86 and failed on PPC,
>> so I had to keep lazy JITing enabled even though it's not what I
>> want, for many reasons.
>
> It's still the case that only X86 supports eager jitting. It doesn't
> look that hard to add it to the rest of the targets, though.

Ok.

>> Also, perhaps the lazy compilation stub should spin waiting on a lock
>> (implemented using atomics), and the compilation callback should
>> execute while holding the lock just before patching the callsite, so
>> it would look like this in pseudocode:
>> [...]
>
> Good idea. This increases the code size a bit, but it's clearly better
> than the "load the target address" option I mentioned in the bug.
> Would you add it to the bug so we don't lose it?
>
> I think we can put the entire "not yet patched" branch inside the
> compilation callback to minimize the code size impact:
>
> callsite_patch_state = 0; // one byte of memory per callsite
>
> callsite:
>   if (atomic_load(&callsite_patch_state) != 2) {
>     call CompilationCallback // Doesn't return until the patchsite is patched.
>   }
>   // fast and slow path: already compiled and patched
>   patchsite:
>   call <nop nop nop nop nop nop nop nop> // will be patched

Yes, that sounds good (except that I think we'd want a jmp instead of a
call); I'll post a patch to that bug report tomorrow if time permits.

Would this mean that lazy JITing would be thread-safe and the default
could stay lazy?

Best regards,
--Edwin