Lang Hames
2015-Mar-19 06:47 UTC
[LLVMdev] How will OrcJIT guarantee thread-safety when a function is asked to be re generated?
Hi Sanjoy,

>> (1) Replacing function bodies at the same address is impossible if the
>> function is already on the stack: You'd be replacing a definition that
>> you're later going to return through.
>
> If the function you wish to replace is active on the stack, you can
> replace the return PC that was going to return into that active frame
> with a PC pointing into a stub that knows how to replace the active
> stack frame with something that would let the new code continue
> executing. The stub will then have to branch into a suitable position
> in the new generated code. Once you have done this for all "pending
> returns" into the old bit of generated code, you can throw the old
> code away, since nothing will ever return into it.
>
> This can be tricky to get right but if you have built OSR support
> already for some other reason then this is a viable option. This
> scheme is very similar to throwing an exception, and the semantics of
> "catching" an exception is to branch to a newly generated block of
> code.

That all makes sense. What are your thoughts on the trade-offs of this vs
the patchpoint approach though? If you can modify previously executable
memory it seems like the patchpoint approach would have lower overhead,
unless you have a truly huge number of callsites to update?

>> So, if you want to replace functions
>> at the same address you'll have to have some sort of safe-point concept
>> where you know the function you want to replace isn't on the stack.
>
> That will work, but can be very hard to make happen. For instance,
> the method you want to replace may have called a function that has an
> infinite loop in it.

Agreed. This *might* find a home in simple REPLs, where calling into an
infinite loop would be undesirable/unexpected behavior anyway, but that's
also an environment where you are unlikely to want reoptimization.

>> (2) Replacing function bodies at the same address isn't the only way to
>> avoid the overhead of a trampoline. I haven't implemented this yet, but I
>> really want to add llvm.patchpoint support to Orc. In that case you can lay
>> down your replacement definition at a different address, update all your
>> callsites, then delete your old definition after you're done executing it.
>> Relative to using trampolines this lowers your execution cost (calls are
>> direct rather than indirect), but increases your update cost (you have to
>> update many callsites, rather than a single trampoline).

FWIW, Pete Cooper and I have tossed around ideas about adding utilities to
Orc for injecting frame-residence counting and automatic cleanup into
functions to facilitate this second approach. The rough idea was that each
function would increment a counter on entry and decrement it on exit. Every
time the counter hits zero it would check whether it has been "deleted"
(presumably due to being replaced), and if so it would free its memory.
This scheme should be easy to implement, but hasn't gone past speculation
on our part.

- Lang.

Sent from my iPad

> On Mar 19, 2015, at 3:00 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
>
>> On Wed, Mar 18, 2015 at 6:39 PM, Lang Hames <lhames at gmail.com> wrote:
>> Hi Hayden,
>>
>> Dave's answer covers this pretty well. Neither Orc nor MCJIT currently
>> reason about replacing function bodies. They may let you add duplicate
>> definitions, but how they'll behave if you do that isn't specified in their
>> contracts. They definitely won't replace old definitions unless you provide
>> a custom memory manager that's rigged to lay new definitions down on top of
>> old ones.
>>
>> I suspect that existing clients of MCJIT have tackled this by adding thread
>> safety into their wrappers around MCJIT, or into the JIT'd code itself, but
>> I'm just guessing. (CC'ing Keno and Philip, in case they have insights).
>>
>> I think this would be cool to build in to Orc though. Two quick thoughts:
>>
>> (1) Replacing function bodies at the same address is impossible if the
>> function is already on the stack: You'd be replacing a definition that
>> you're later going to return through.
>
> If the function you wish to replace is active on the stack, you can
> replace the return PC that was going to return into that active frame
> with a PC pointing into a stub that knows how to replace the active
> stack frame with something that would let the new code continue
> executing. The stub will then have to branch into a suitable position
> in the new generated code. Once you have done this for all "pending
> returns" into the old bit of generated code, you can throw the old
> code away, since nothing will ever return into it.
>
> This can be tricky to get right but if you have built OSR support
> already for some other reason then this is a viable option. This
> scheme is very similar to throwing an exception, and the semantics of
> "catching" an exception is to branch to a newly generated block of
> code.
>
>> So, if you want to replace functions
>> at the same address you'll have to have some sort of safe-point concept
>> where you know the function you want to replace isn't on the stack.
>
> That will work, but can be very hard to make happen. For instance,
> the method you want to replace may have called a function that has an
> infinite loop in it.
>
>> (2) Replacing function bodies at the same address isn't the only way to
>> avoid the overhead of a trampoline. I haven't implemented this yet, but I
>> really want to add llvm.patchpoint support to Orc. In that case you can lay
>> down your replacement definition at a different address, update all your
>> callsites, then delete your old definition after you're done executing it.
>> Relative to using trampolines this lowers your execution cost (calls are
>> direct rather than indirect), but increases your update cost (you have to
>> update many callsites, rather than a single trampoline).
>>
>> Out of interest, why the desire to avoid trampolines? They do make life a
>> lot easier here. :)
>>
>> Cheers,
>> Lang.
>>
>>> On Wed, Mar 18, 2015 at 3:13 AM, David Blaikie <dblaikie at gmail.com> wrote:
>>>
>>> [+Lang, keeper of JITs, designer of ORCs]
>>>
>>> On Tue, Mar 17, 2015 at 1:27 AM, Hayden Livingston
>>> <halivingston at gmail.com> wrote:
>>>>
>>>> I've been playing with OrcJIT a bit, and from the looks of it I can (like
>>>> in the previous JIT I suppose?) ask for a function to be re generated.
>>>>
>>>> If I've given the address of the function that LLVM gave me to an
>>>> external party, do "I" need to ensure thread-safety?
>>>>
>>>> Or is it safe to ask OrcJIT to re generate code at that address and
>>>> everything will work magically?
>>>
>>> As I understand it, Orc won't regenerate the function at the same location
>>> unless your memory manager returns the same memory twice - so if you know
>>> you've successfully migrated all callers off a certain chunk of allocated
>>> memory, you might be able to recycle it back into Orc (but I think on MacOS,
>>> the way page permissions work, this would be impossible - once a memory page
>>> is marked executable, it's no longer writable and can't be set back - you
>>> need a new page).
>>>
>>>> I'm thinking it won't because it's quite possible some thread might be
>>>> executing code, and we'll be asking LLVM to write bytes there.
>>>>
>>>> How does one generally go do such updates? I'm looking for some guidance
>>>> without adding a trampoline in front of it. Do runtimes that support
>>>> re-generation of code have an if check or something before entering the
>>>> method?
>>>
>>> Without a trampoline you're probably going to have to be constrained in
>>> some other ways - possibly (& I'm really out of my depth at this point) the
>>> kind of safe/pause points used for GC - but perhaps more constrained than
>>> that, such that you have safe places where your JIT'd code (or at least the
>>> replaceable functions) isn't running.
>>>
>>> But again, still depends on platform - writing to executable memory isn't
>>> possible on MacOS so far as I know (as mentioned above) so there would be no
>>> way to replace a function there without a trampoline or at least a global
>>> variable to load/jump to.
>>>
>>> - David
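As a concrete picture of the frame-residence counting scheme Lang describes
above, here is a minimal standalone C++ sketch. None of these names
(FunctionRecord, FrameGuard, markReplaced) are Orc APIs; they are invented
purely for illustration, and a real JIT would emit the counter updates in the
generated prologue/epilogue rather than relying on a C++ RAII object.

#include <atomic>
#include <cstdio>
#include <cstdlib>

// Per-function bookkeeping for one JIT'd version of a function (hypothetical).
struct FunctionRecord {
  std::atomic<int> LiveFrames{0};    // Activations of this version on any stack.
  std::atomic<bool> Stale{false};    // Set once a replacement has been installed.
  std::atomic<bool> Released{false}; // Ensures the body is reclaimed only once.
  void *Body = nullptr;              // Stand-in for the JIT'd code.

  void releaseBodyOnce() {
    if (Released.exchange(true))
      return;
    // A real JIT would hand the pages back to its memory manager here.
    std::printf("freeing old body at %p\n", Body);
    std::free(Body);
    Body = nullptr;
  }
};

// Conceptually what the injected prologue/epilogue would do: bump the counter
// on entry, and on exit reclaim the body if this was the last frame of a
// version that has already been replaced.
class FrameGuard {
  FunctionRecord &FR;
public:
  explicit FrameGuard(FunctionRecord &FR) : FR(FR) { ++FR.LiveFrames; }
  ~FrameGuard() {
    if (--FR.LiveFrames == 0 && FR.Stale.load())
      FR.releaseBodyOnce();
  }
};

// Called by whatever installs the replacement. New calls are assumed to be
// routed to the new version already, so no new frames of the old one start.
void markReplaced(FunctionRecord &Old) {
  Old.Stale.store(true);
  if (Old.LiveFrames.load() == 0)
    Old.releaseBodyOnce();
}

int main() {
  FunctionRecord OldVersion;
  OldVersion.Body = std::malloc(64);  // Pretend this is generated code.
  {
    FrameGuard G(OldVersion);         // The old version is executing...
    markReplaced(OldVersion);         // ...when the replacement is installed.
  }                                   // Last frame exits: old body freed here.
}

The Released flag is there because the "last frame exits" path and the
"replace while no frames are live" path can otherwise race to free the same
body; in the real injected-code version that decision would likewise need to
be made exactly once.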
Sanjoy Das
2015-Mar-19 17:55 UTC
[LLVMdev] How will OrcJIT guarantee thread-safety when a function is asked to be re generated?
> That all makes sense. What are your thoughts on the trade-offs of this vs
> the patchpoint approach though? If you can modify previously executable
> memory it seems like the patchpoint approach would have lower overhead,
> unless you have a truly huge number of callsites to update?

You need the hijack-return-pc approach *in addition* to a call-site
patching approach. Modifying the return PC lets you guarantee that
nothing will *return* into the old generated code. To guarantee that
nothing will *call* into it either you could use a double indirection
(all calls go through a trampoline) or patchpoints.

-- Sanjoy
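To make the double-indirection option Sanjoy mentions concrete, here is a
small C++ sketch with invented names (in a real JIT the load-and-call would
be emitted as generated code or a stub, not written in C++). Every call goes
through a single atomic slot, so swapping the slot redirects all *future*
calls at once; frames already executing the old body are unaffected, which is
exactly why the return-PC hijacking is still needed if the old code must be
deleted immediately.

#include <atomic>
#include <cstdio>

using FnPtr = int (*)(int);

// One slot per replaceable function; JIT'd callers would load and call
// through this instead of calling a body directly (the trampoline).
static std::atomic<FnPtr> FooSlot;

int fooV1(int X) { return X + 1; }   // Original body (stand-in for JIT'd code).
int fooV2(int X) { return X * 2; }   // Recompiled replacement.

int callFoo(int X) {                 // What every callsite compiles to.
  return FooSlot.load(std::memory_order_acquire)(X);
}

int main() {
  FooSlot.store(fooV1, std::memory_order_release);
  std::printf("before replacement: %d\n", callFoo(41)); // 42
  FooSlot.store(fooV2, std::memory_order_release);      // One atomic update...
  std::printf("after replacement:  %d\n", callFoo(41)); // ...all callers see 82
}

The trade-off discussed in the thread is visible here: every call pays an
indirect load, which is what patching the callsites directly would avoid.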
Hayden Livingston
2015-Mar-19 21:51 UTC
[LLVMdev] How will OrcJIT guarantee thread-safety when a function is asked to be re generated?
Thanks for all the comments on this.

The reason for wanting to avoid a trampoline is that it's very likely my
language will interpret for a while, then generate code, then generate better
code, etc., much like the WebKit FTL JavaScript project. It seems like they
will be using patch points? I should probably use that for my perf use-case
as well.

However, I have an additional case that requires correctness to be maintained
across threads, i.e. the code may be semantically different. In this case I
need either all callsites updated at once, or none. It can't be that half of
the callsites are calling into a different function, because there may be
real differences.

I suppose an approach I can take is to use patch points for eventual
consistency and have an if check that maintains correctness and will
trampoline or execute code depending on whether all call sites are done being
patched. Or maybe I'm overcomplicating this for my language :-)

On Thu, Mar 19, 2015 at 10:55 AM, Sanjoy Das
<sanjoy at playingwithpointers.com> wrote:

> > That all makes sense. What are your thoughts on the trade-offs of this vs
> > the patchpoint approach though? If you can modify previously executable
> > memory it seems like the patchpoint approach would have lower overhead,
> > unless you have a truly huge number of callsites to update?
>
> You need the hijack-return-pc approach *in addition* to a call-site
> patching approach. Modifying the return PC lets you guarantee that
> nothing will *return* into the old generated code. To guarantee that
> nothing will *call* into it either you could use a double indirection
> (all calls go through a trampoline) or patchpoints.
>
> -- Sanjoy
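One way to picture the "if check" Hayden describes, as a hedged sketch only
(the names and structure are hypothetical, not anything Orc or WebKit
provides): each compiled version guards its entry with a version test and
forwards to the latest body when it is stale, so the semantic switch happens
at a single atomic version bump even while callsites are still being patched.

#include <atomic>
#include <cstdio>

using FnPtr = int (*)(int);

static std::atomic<int> BarVersion{1};   // Version currently considered live.
static std::atomic<FnPtr> BarLatest;     // Body of the live version.

int barV1(int X) {
  // Prologue guard: if this body is stale, forward to the latest one, so a
  // caller that hasn't been patched yet still gets the new semantics.
  if (BarVersion.load(std::memory_order_acquire) != 1)
    return BarLatest.load(std::memory_order_acquire)(X);
  return X + 1;                          // Old semantics.
}

int barV2(int X) {
  if (BarVersion.load(std::memory_order_acquire) != 2)
    return BarLatest.load(std::memory_order_acquire)(X);
  return X * 2;                          // New, semantically different code.
}

int main() {
  BarLatest.store(barV2, std::memory_order_release);
  // An unpatched callsite still pointing at barV1:
  std::printf("%d\n", barV1(10));        // 11 while version 1 is live.
  // Publish version 2: even stale callsites now observe the new behavior.
  BarVersion.store(2, std::memory_order_release);
  std::printf("%d\n", barV1(10));        // 20 -- forwarded to barV2.
}

Once all callsites have been repointed (e.g. via patchpoints), the old body
and its guard can be retired using whichever reclamation scheme is in place.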
Lang Hames
2015-Mar-19 22:28 UTC
[LLVMdev] How will OrcJIT guarantee thread-safety when a function is asked to be re generated?
Hi Sanjoy,

> You need the hijack-return-pc approach *in addition* to a call-site
> patching approach. Modifying the return PC lets you guarantee that
> nothing will *return* into the old generated code. To guarantee that
> nothing will *call* into it either you could use a double indirection
> (all calls go through a trampoline) or patchpoints.

You need to hijack the return addresses if you want to delete the original
function body immediately, but what if you just leave the original in place
until you return through it? That is the scheme that Pete and I had been
talking about. On the one hand that means that the old code may live for an
arbitrarily long time; on the other hand it saves you from implementing some
complicated infrastructure. I suspect that in most JIT use-cases the cost of
keeping the duplicate function around will be minimal, but I don't actually
have any data to back that up. :)

It's worth noting that, as per its design goals, Orc is agnostic about all
this. Either scheme, or both, should be able to be implemented in the new
framework. It's just that nobody has implemented it yet.

- Lang.

On Fri, Mar 20, 2015 at 4:55 AM, Sanjoy Das
<sanjoy at playingwithpointers.com> wrote:

> > That all makes sense. What are your thoughts on the trade-offs of this vs
> > the patchpoint approach though? If you can modify previously executable
> > memory it seems like the patchpoint approach would have lower overhead,
> > unless you have a truly huge number of callsites to update?
>
> You need the hijack-return-pc approach *in addition* to a call-site
> patching approach. Modifying the return PC lets you guarantee that
> nothing will *return* into the old generated code. To guarantee that
> nothing will *call* into it either you could use a double indirection
> (all calls go through a trampoline) or patchpoints.
>
> -- Sanjoy