> If you did want to fit your use case in to ORC's model I think the
solution would be to clone the module before adding it to the compile layer and
(if desired) save a copy of the compiled object and pass a non-owning memory
buffer to the linking layer.
Yes, this is understood. The other reason we went with a custom solution is
that we want to support different LLVM versions without much maintenance
burden.
> That said, if you are not using lazy compilation, remote compilation, or
concurrent compilation then using ORC would not buy you much.
We still use concurrent compilation: we create one SimpleCompiler and one
TargetMachine per thread, distribute the compilation of N modules across all
available threads, then gather all the object files and feed them into a JIT
engine for execution.
I also wanted to use lazy compilation to inject mutants, but we decided to go
another way. The current lazy JIT implementation makes it hard (at least for
me) to reuse or adapt to other needs.
This is how it works now: say we have three modules A, B, C. We create
several mutants out of C: C1, C2, C3, etc., where each variant has a different
mutation applied. Then, for each mutant run, the JIT engine is fed {A, B,
C1}, {A, B, C2}, {A, B, C3}, and so on. It works very well, but adds too much
overhead because the JIT needs to resolve the same symbols several times.
This is what we settled on in the end: instead of cloning the original module,
we create a copy of a function and apply the mutation to the copy. Then the
original function is replaced with an indirect call, and the indirection is
controlled outside of the JIT engine via pointer manipulation. Here is an
example:
Before:

define i32 @foo(i32 %a, i32 %b) {
  ; original instructions
}

After:

@foo_ptr = global i32 (i32, i32)* null

define i32 @foo(i32 %a, i32 %b) {
  %fn = load i32 (i32, i32)*, i32 (i32, i32)** @foo_ptr
  %result = call i32 %fn(i32 %a, i32 %b)
  ret i32 %result
}

define i32 @foo_original(i32 %a, i32 %b) {
  ; original instructions
}

define i32 @foo_mutant_1(i32 %a, i32 %b) {
  ; original instructions with one mutation applied
}

define i32 @foo_mutant_2(i32 %a, i32 %b) {
  ; original instructions with another mutation applied
}
Once the object files are loaded and the symbols resolved, we patch foo_ptr to
point to the original function (foo_original) for the baseline run, and then
iterate over all mutants, changing foo_ptr accordingly.
This approach also works quite well and saves lots of time: ~15 minutes instead
of ~2 hours for mutation analysis of LLVM's own test suite.
I still think we could achieve the same with ORC, but its constant evolution
makes that challenging: adapting our solution to each new API is
time-consuming.
> If that sounds useful, there will be more documentation coming out in the
next few weeks, and I will be giving a talk on the new design at the
developer's meeting.
I think it does sound useful, but the documentation would be essential here. I
tried to construct a simple JIT stack using the new APIs, but could not
because of their complexity.
I can see that there is a great idea behind all the abstractions, but I could
not grasp it in an hour.
Also, it's worth mentioning that our simple JIT stack does not work with LLVM
modules at all, which completely eliminates the ownership issues.
The user of the JIT takes care of compilation and of the lifetime of both
modules and object files.
Though, this probably won't work for the use cases you want to cover.
At the moment I would rather focus on the underlying implementation
(RuntimeDyld and friends), because there are a few bugs and missing parts I'd
like to address.
> On 17. Sep 2018, at 18:01, Lang Hames <lhames at gmail.com> wrote:
>
> Hi Alex,
>
> I'm replying to this on a new thread so as not to take the LLVMContext:
Threads and Ownership discussion too far off topic.
>
> If you did want to fit your use case in to ORC's model I think the
solution would be to clone the module before adding it to the compile layer and
(if desired) save a copy of the compiled object and pass a non-owning memory
buffer to the linking layer.
>
> That said, if you are not using lazy compilation, remote compilation, or
concurrent compilation then using ORC would not buy you much.
>
> We also parallelize lots of things, but this could also work using Orc
given that we only use the ObjectLinkingLayer.
>
> In case it is of interest for your tool, here's the short overview of
ORC's new concurrency support: You can now set up a compiler dispatch
function for the JIT that will be called whenever something needs to be
compiled, allowing the compilation task to be run on a new thread. Compilation
is still triggered on symbol lookup as before, and the compile task is still
opaque to the JIT so that clients can supply their own. To deal with the
challenges that arise from this (e.g. what if two threads look up the same
symbol at the same time? Or two threads look up mutually recursive symbols at
the same time?) the new symbol table design guarantees the following invariants
for basic lookups: (1) Each symbol's compiler will be executed at most once,
regardless of the number of concurrent lookups made on it, and (2) either the
lookup will return an error, or if it succeeds then all pointers returned will
be valid (for reading/writing/calling, depending on the nature of the symbol)
regardless of how the compilation work was dispatched. This means that you can
have lookup calls coming in on multiple threads for interdependent symbols, with
compilers dispatched to multiple threads to maximize performance, and everything
will Just Work.
>
> If that sounds useful, there will be more documentation coming out in the
next few weeks, and I will be giving a talk on the new design at the
developer's meeting.
>
> Cheers,
> Lang.
>
> On Mon, Sep 17, 2018 at 12:33 AM Alex Denisov <1101.debian at
gmail.com> wrote:
> Hi Lang,
>
> Thank you for clarification.
>
> > Do you have a use-case for manually managing LLVM module lifetimes?
>
> Our[1] initial idea was to feed all the modules to a JIT engine, run some
code,
> then do further analysis and modification of the modules, feed the new
versions
> to a JIT engine and run some code again.
>
> It didn't work well with MCJIT because we did not want to give up the
ownership
> of the modules.
> However, this initial approach did not work properly because, as you
mentioned,
> compilation does modify modules, which does not suit our need.
>
> Eventually, we decide to compile things on our own and communicate with the
> Orc JIT using only object files. Before compilation we make a clone of a
module
> (simply by loading it from disk again), using the original module only for
analysis.
> This way, we actually have two copies of a module in memory, with the
second copy
> disposed immediately after compilation.
> It also worth mentioning that our use case is AOT compilation, not really
lazy JIT.
> As of the object files we also cannot give up on the ownership because we
have
> to reuse them many times. At least this is the current state, though
I'm working on
> another improvement: to load all object files once and then re-execute
program
> many times.
>
> Also, we wanted to support several versions of LLVM. Because of drastic
> changes in the Orc APIs we decided to hand-craft another JIT engine based
on your
> work. It is less powerful, but fits our needs very well:
>
>
https://github.com/mull-project/mull/blob/master/include/Toolchain/JITEngine.h
>
https://github.com/mull-project/mull/blob/master/lib/Toolchain/JITEngine.cpp
>
> We also parallelize lots of things, but this could also work using Orc
given that we
> only use the ObjectLinkingLayer.
>
> P.S. Sorry for giving such a chaotic explanation, but I hope it shows our
use case :)
>
> [1] https://github.com/mull-project/mull
>
> > On 17. Sep 2018, at 02:05, Lang Hames <lhames at gmail.com>
wrote:
> >
> > Hi Alex,
> >
> > > We would like to deallocate contexts
> >
> > I did not look at the Orc APIs as of LLVM-6+, but I'm curious
where this requirement comes from?
> >
> > The older generation of ORC APIs were single-threaded so in common use
cases the client would create one LLVMContext for all IR created within a JIT
session. The new generation supports concurrent compilation, so each module
needs a different LLVMContext (including modules created inside the JIT itself,
for example when extracting functions in the CompileOnDemandLayer). This means
creating and managing context lifetimes alongside module lifetimes.
> >
> > Does Orc takes ownership of the modules that are being JIT'ed?
I.e., it is the same 'ownership model' as MCJIT has, am I right?
> >
> > The latest version of ORC takes ownership of modules until they are
compiled, at which point it passes the Module's unique-ptr to a
'NotifyCompiled' callback. By default this just throws away the pointer,
deallocating the module. It can be used by clients to retrieve ownership of
modules if they prefer to manage the Module's lifetime themselves.
> >
> > I think the JIT users should take care of memory
allocation/deallocation. Also, I believe that you have strong reasons to
implement things this way :)
> > Could you please tell us more about the topic?
> >
> > Do you have a use-case for manually managing LLVM module lifetimes? I
would like to hear more about that so I can factor it in to the design.
> >
> > There were two motivations for having the JIT take ownership by
default (with the NotifyCompiled escape hatch for those who wanted to hold on to
the module after it is compiled): First, it makes efficient memory management
the default: Unless you have a reason to hold on to the module it is thrown away
as soon as possible, freeing up memory. Second, since compilation is a mutating
operation, it seemed more natural to have the "right-to-mutate" flow
alongside ownership of the underlying memory: whoever owns the Module is allowed
to mutate it, rather than clients having to be aware of internal JIT state to
know when they could or could not operate on a module.
> >
> > Cheers,
> > Lang.
> >
> > On Sun, Sep 16, 2018 at 11:06 AM Alex Denisov <1101.debian at
gmail.com> wrote:
> > Hi Lang,
> >
> > > We would like to deallocate contexts
> >
> >
> > I did not look at the Orc APIs as of LLVM-6+, but I'm curious
where this requirement comes from?
> > Does Orc takes ownership of the modules that are being JIT'ed?
I.e., it is the same 'ownership model' as MCJIT has, am I right?
> >
> > I think the JIT users should take care of memory
allocation/deallocation. Also, I believe that you have strong reasons to
implement things this way :)
> > Could you please tell us more about the topic?
> >
> > > On 16. Sep 2018, at 01:14, Lang Hames via llvm-dev <llvm-dev
at lists.llvm.org> wrote:
> > >
> > > Hi All,
> > >
> > > ORC's new concurrent compilation model generates some
interesting lifetime and thread safety questions around LLVMContext: We need
multiple LLVMContexts (one per module in the simplest case, but at least one per
thread), and the lifetime of each context depends on the execution path of the
JIT'd code. We would like to deallocate contexts once all modules associated
with them have been compiled, but there is no safe or easy way to check that
condition at the moment as LLVMContext does not expose how many modules are
associated with it.
> > >
> > > One way to fix this would be to add a mutex to LLVMContext, and
expose this and the module count. Then in the IR-compiling layer of the JIT we
could have something like:
> > >
> > > // Compile finished, time to deallocate the module.
> > > // Explicit deletes used for clarity, we would use unique_ptrs in
practice.
> > > auto &Ctx = Mod->getContext();
> > > delete Mod;
> > > std::lock_guard<std::mutex> Lock(Ctx.getMutex());
> > > if (Ctx.getNumModules() == 0)
> > >   delete &Ctx;
> > >
> > > Another option would be to invert the ownership model and say
that each Module shares ownership of its LLVMContext. That way LLVMContexts
would be automatically deallocated when the last module using them is destructed
(providing no other shared_ptrs to the context are held elsewhere).
> > >
> > > There are other possible approaches (e.g. side tables for the
mutex and module count) but before I spent too much time on it I wanted to see
whether anyone else has encountered these issues or has opinions on solutions.
> > >
> > > Cheers,
> > > Lang.
> > > _______________________________________________
> > > LLVM Developers mailing list
> > > llvm-dev at lists.llvm.org
> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >
>