Hi Andres,

> I also want to highlight the necessity of some form of C API, that others
> already have.
>
> <snip>
>
> It's fine if the set of "somewhat stable" C APIs doesn't provide all the
> possible features, though.

Ok. This got me thinking about what a simple LLJIT API should look like. I
have posted a sketch of a possible API on http://llvm.org/PR31103 . I don't
have time to implement it just yet, but I would be very happy to provide
support and review patches if anyone else wants to give it a shot.

> What's the capability level of ORCv2 on RuntimeDyld compared to ORCv1?
> Are there features supported in v1 that are only available on JITLink
> supported platforms?

At a high level, ORCv2's design allows for basically the same features as
ORCv1, plus concurrent compilation. There are still a number of APIs that
haven't been hooked up or implemented though. Most prominently: event
listeners and removable code. If you're using either of those features
please let me know: I do want to make sure we continue to support them (or
provide an equivalent). There are no features supported by ORCv1 that
require JITLink under ORCv2.

> > - Improve JIT support for static initializers:
> >
> >   - Add support for running initializers from object files, which will
> >     enable loading and caching of objects containing initializers.
>
> Hm, that's kind of supported for v1, right?

It's "kind of" supported. MCJIT and ORCv1 provided support for scanning the
llvm.global_ctors variable to find the names of static initializers to run.
This works fine as long as (1) you're adding LLVM IR, and (2) you only care
about initializers described by llvm.global_ctors. On the other hand, if you
add object files (or load them from an ObjectCache), or if you have
initializers not described by llvm.global_ctors (e.g. ObjC and Swift, which
have additional initializers described by metadata sections), then MCJIT and
ORCv1 provide no help out of the box. This problem is further exacerbated by
concurrent compilation in ORCv2: you may need to order your initializers
(e.g. according to the llvm.global_ctors priority field), but objects may
arrive at the JIT linker out of order due to concurrent compilation.

The new ORCv2 initializer support aims to make all of this natural: we will
provide 'dlopen' and 'dlclose' equivalent calls on JITDylibs. These will
trigger compilation and execution of any initializers that have not been run
already. If you use JITLink, this will include using JITLink plugins to
discover the initializers to run, including initializers in metadata
sections.
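To make the intended usage concrete, here is a very rough sketch of the sort
of thing I have in mind (nothing below is implemented yet; 'initialize' and
'deinitialize' are placeholder names for the dlopen/dlclose equivalents, not
final API):

  // Hypothetical sketch only: the dlopen/dlclose-equivalent calls described
  // above do not exist yet, and 'initialize'/'deinitialize' are placeholder
  // names rather than real LLJIT methods.
  #include "llvm/ExecutionEngine/Orc/LLJIT.h"

  using namespace llvm;
  using namespace llvm::orc;

  Error addAndInitialize(LLJIT &J, ThreadSafeModule TSM) {
    JITDylib &JD = J.getMainJITDylib();

    // Add IR (or a cached object file) as usual.
    if (auto Err = J.addIRModule(JD, std::move(TSM)))
      return Err;

    // 'dlopen' equivalent: compile and run any initializers in JD that have
    // not been run yet, in llvm.global_ctors priority order, including
    // initializers discovered by JITLink plugins in metadata sections.
    if (auto Err = J.initialize(JD))    // placeholder name
      return Err;

    // ... look up and call JIT'd symbols ...

    // 'dlclose' equivalent: run deinitializers/destructors for JD.
    return J.deinitialize(JD);          // placeholder name
  }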
-- Lang.

On Mon, Jan 27, 2020 at 10:14 AM Andres Freund <andres at anarazel.de> wrote:

> Hi,
>
> On 2020-01-16 18:00:53 -0800, Lang Hames via llvm-dev wrote:
> > In the interests of improving visibility into ORC JIT development I'm
> > going to try writing weekly status updates for the community. I hope
> > they will provide insight into the design and state of development of
> > LLVM's JIT APIs, as well as serving as a convenient space for
> > discussions among LLVM's large and growing community of JIT API users.
>
> That's a great idea.
>
> > Since this is the first update, I have also added some highlights from
> > last year, and the plan for 2020.
> >
> > Highlights from 2019:
> >
> > (1) ORCv1 was officially deprecated in LLVM 9. I have left it in for
> > the LLVM 10 branch, but plan to remove it from master in the coming
> > weeks. All development effort is now focused on ORCv2. If you are an
> > ORCv1 client, now's the time to switch over. If you need help please
> > ask on the llvm-dev mailing lists (make sure you CC me) or #llvm on
> > discord. There are also some tips available in
> > https://llvm.org/docs/ORCv2.html
>
> I also want to highlight the necessity of some form of C API, that
> others already have.
>
> Besides just needing something that can be called from languages besides
> C++, some amount of higher API stability is also important. For users of
> LLVM with longer support cycles than LLVM (e.g. Postgres has 5 years of
> back branch maintenance), and which live in a world where vendoring is
> not allowed (most things going into Linux distros), the API churn can be
> a serious problem. It's fine if the set of "somewhat stable" C APIs
> doesn't provide all the possible features, though.
>
> It's easy enough to add a bunch of wrappers or ifdefs hiding some simple
> signature changes, e.g. LLVMOrcGetSymbolAddress adding a parameter as
> happened in LLVM 6, but backpatching support for larger API redesigns
> into stable versions is scary. We do however quickly get complaints if
> a supported version cannot be compiled due to dependencies, as people
> tend to upgrade their OS separately from e.g. their database major
> version.
>
> > (2) LLVM has a new JIT linker, JITLink, which is intended as an
> > eventual replacement for RuntimeDyld. The new design supports linker
> > plugins (allowing operation on the low-level bits generated by the JIT
> > linker) and native code models (RuntimeDyld required a custom code
> > model on some platforms). Currently JITLink only supports Darwin
> > x86-64 and arm64, but I hope to see support for new platforms added in
> > the future.
>
> What's the capability level of ORCv2 on RuntimeDyld compared to ORCv1?
> Are there features supported in v1 that are only available on JITLink
> supported platforms?
>
> > - Improve JIT support for static initializers:
> >
> >   - Add support for running initializers from object files, which will
> >     enable loading and caching of objects containing initializers.
>
> Hm, that's kind of supported for v1, right?
>
> Greetings,
>
> Andres Freund
Hi Lang,

On 2020-01-28 13:35:07 -0800, Lang Hames wrote:
> > I also want to highlight the necessity of some form of C API, that others
> > already have.
> >
> > <snip>
> >
> > It's fine if the set of "somewhat stable" C APIs doesn't provide all the
> > possible features, though.
>
> Ok. This got me thinking about what a simple LLJIT API should look like. I
> have posted a sketch of a possible API on http://llvm.org/PR31103 .

I'll take a look.

> I don't have time to implement it just yet, but I would be very happy
> to provide support and review patches if anyone else wants to give it
> a shot.

Hm. I don't immediately have time myself, but it's possible that I can
get some help. Otherwise I'll try to look into it once my current set of
tasks is done, if you haven't gotten to it by then.

> > What's the capability level of ORCv2 on RuntimeDyld compared to ORCv1?
> > Are there features supported in v1 that are only available on JITLink
> > supported platforms?
>
> At a high level, ORCv2's design allows for basically the same features as
> ORCv1, plus concurrent compilation.

Cool.

> There are still a number of APIs that
> haven't been hooked up or implemented though. Most prominently: event
> listeners and removable code. If you're using either of those features
> please let me know: I do want to make sure we continue to support them
> (or provide an equivalent).

Heh, I/pg uses both :(

WRT event listeners: I don't quite know how one can really develop JITed
code without wiring up the profiler and debugger. I'm not wedded to the
event listener interface itself, but debugger & profiler support are really
critical. Or is there a different plan for those features?

WRT removable code:

Postgres emits the code for all the functions it knows it will need for a
query at once (often that's all that are needed for one query, but not
always), and removes it once there are no references to that set of
functions anymore. As one session can use a *lot* of code over its lifetime,
it's not at all feasible to never unload. Right now we use
LLVMOrcRemoveModule(), which seems to work well enough. FWIW, for that use
case there are never any references into the code that needs to be removed
(it only exports functions that need to be called by C code).
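For reference, the per-query cycle on our side boils down to roughly the
following (heavily simplified, not the actual Postgres code, error handling
elided; signatures are from memory for the LLVM 8-ish C bindings, and the
entry point name is just illustrative - check llvm-c/OrcBindings.h for the
version you actually build against):

  /* Simplified sketch of our add/lookup/remove cycle; signatures from
     memory, see llvm-c/OrcBindings.h for your LLVM version. */
  #include <llvm-c/OrcBindings.h>
  #include <stdint.h>
  #include <stddef.h>

  /* Resolve symbols that the JIT'd code references but doesn't define
     (for us: functions in the postgres binary itself). */
  static uint64_t resolve_symbol(const char *name, void *ctx)
  {
      return 0; /* look up 'name' in the host process here */
  }

  static void run_query_module(LLVMOrcJITStackRef jit, LLVMModuleRef mod)
  {
      LLVMOrcModuleHandle handle;
      LLVMOrcTargetAddress addr;

      /* Add and eagerly compile all functions emitted for this query. */
      LLVMOrcAddEagerlyCompiledIR(jit, &handle, mod, resolve_symbol, NULL);

      /* Look up the entry point(s) the executor will call from C
         ("evalexpr_0_0" is an illustrative name). */
      LLVMOrcGetSymbolAddress(jit, &addr, "evalexpr_0_0");

      /* ... run the query, calling the code at 'addr' ... */

      /* Once nothing references this set of functions anymore, drop it. */
      LLVMOrcRemoveModule(jit, handle);
  }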
It doesn't look all that cheap to just create one LLJIT instance for each
set of code that needs to be removable. I don't really foresee using LLVM
side lazy/incremental JITing - so far my experiments show that the overhead
of a code generation step makes it unattractive to incur it multiple times,
and we have an interpreter that we can use until JIT compilation succeeds.
So perhaps it's not *that* bad?

What is the biggest difficulty in making code removable?

In case you happen to be somewhere around the LLVM devroom at FOSDEM I'd be
happy to briefly chat in person...

Greetings,

Andres Freund

Hi Andres,

> > There are still a number of APIs that
> > haven't been hooked up or implemented though. Most prominently: event
> > listeners and removable code. If you're using either of those features
> > please let me know: I do want to make sure we continue to support them
> > (or provide an equivalent).
>
> Heh, I/pg uses both :(
>
> WRT event listeners: I don't quite know how one can really develop JITed
> code without wiring up the profiler and debugger. I'm not wedded to the
> event listener interface itself, but debugger & profiler support are
> really critical. Or is there a different plan for those features?

We definitely need debugger and profiling support. The right interface for
this is an open question.

I think we can add support for the existing EventListener interface to
RTDyldObjectLinkingLayer. That will make porting easy for existing clients.

EventListener isn't a good fit for JITLink/ObjectLinkingLayer at the moment.
EventListener (via the RuntimeDyld::LoadedObjectInfo parameter to
notifyObjectLoaded) implicitly assumes that the linker operates on whole
sections, but JITLink operates on a per-symbol basis, at least on MachO.
Individual symbols within a section may be re-ordered or dead-stripped, so
there's no easy correspondence between the original bytes of a section and
the final allocated bytes. That said, I don't think there's any fundamental
problem here: the static linkers perform dead stripping and reordering too.
As long as we figure out the right way to present the layout of the
allocated memory to the debuggers and profilers, I think they should be able
to handle it just fine.

Better yet, we don't have to come up with a new "EventListener 2.0" API for
ObjectLinkingLayer: it already has ObjectLinkingLayer::Plugin, which (with
some minor tweaks) should be much more flexible. If you're interested in
trying out the ObjectLinkingLayer::Plugin API at all, there's an example in
llvm/examples/LLJITExamples/LLJITWithObjectLinkingLayerPlugin (code on
GitHub here:
<https://github.com/llvm/llvm-project/blob/master/llvm/examples/LLJITExamples/LLJITWithObjectLinkingLayerPlugin/LLJITWithObjectLinkingLayerPlugin.cpp>
).
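To give a feel for the shape of the interface: a plugin is a class with a
handful of virtual callbacks that hook into the JITLink pipeline. Very
roughly, adapted from that example (method names and signatures are
approximate, so please check the example source for the exact interface in
your tree):

  // Rough sketch of an ObjectLinkingLayer plugin, adapted from the example
  // linked above. Names/signatures are approximate; see the example source
  // for the exact interface in your LLVM version.
  #include "llvm/ExecutionEngine/Orc/ObjectLinkingLayer.h"
  #include "llvm/Support/raw_ostream.h"

  using namespace llvm;
  using namespace llvm::orc;

  class SymbolDumperPlugin : public ObjectLinkingLayer::Plugin {
  public:
    // Add a JITLink pass that runs after layout and fixups, i.e. at the
    // point where every symbol has its final address. This is where
    // debugger or profiler registration hooks could live.
    void modifyPassConfig(MaterializationResponsibility &MR, const Triple &TT,
                          jitlink::PassConfiguration &Config) override {
      Config.PostFixupPasses.push_back([](jitlink::LinkGraph &G) -> Error {
        for (auto *Sym : G.defined_symbols())
          errs() << "defined symbol: " << Sym->getName() << "\n";
        return Error::success();
      });
    }
  };

  // Installed when setting up the JIT, e.g.:
  //   ObjLinkingLayer.addPlugin(std::make_unique<SymbolDumperPlugin>());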
> WRT removable code:
> ...
> Right now we use LLVMOrcRemoveModule(), which seems to work well enough...

Good to hear.

> It doesn't look all that cheap to just create one LLJIT instance for
> each set of code that needs to be removable.

I haven't tested the cost yet, so couldn't say either way. I definitely
haven't optimized construction of instances though.

> I don't really foresee using LLVM side lazy/incremental JITing - so far
> my experiments show that the overhead of a code generation step makes it
> unattractive to incur it multiple times, and we have an interpreter that
> we can use until JIT compilation succeeds. So perhaps it's not *that* bad?

Just to make sure I understand: are you saying that the overhead of
constructing the codegen pipeline shows up as substantial overhead? I can
totally believe it, I've just never measured it myself. If that's the case
it would be interesting to dig in to where the time is being spent. This has
never been optimized in the JIT, so there may be some easy improvements we
can make.

> What is the biggest difficulty in making code removable?

Concurrency, mostly. If you've added a symbol definition, what happens if
you issue a call to remove it just as someone else tries to look it up?

My answer (not yet implemented) is in several parts:

(1) On the JIT state side:

(1.a) If the symbol hasn't been compiled yet then a call to remove *may* end
up being high latency: e.g. in the case above, if the lookup arrives first
it will trigger a compile, and from the JIT's perspective that's one long,
uninterruptible operation. That's bad luck, but once it's done the call to
remove can prevent the compiled code from being registered with the symbol
table, and can inform anyone who was waiting on that definition that it
failed to compile. On the other hand, if the call to remove arrives first
then the operation will be quick and it will be as if the symbol were never
defined.

(1.b) If the symbol has been compiled already: we free the allocated
resources and remove it from the symbol table. This operation should be
quick, but see part (2).

(2) On the JIT'd code side: clients are responsible for resource
dependencies. For example, if you've JIT'd two functions, foo and bar, and
foo contains a call to bar, and then you remove bar: it is up to you to make
sure you never hit that call site for bar in foo.

(3) Overhead: some clients want fine-grained resource tracking, others
don't. My plan is to replace the VModuleKey placeholder type with a
ResourceTracker class. If you specify a ResourceTracker when adding a module
to the JIT then you will be able to call ResourceTracker::remove to remove
just that module. If you do not specify a ResourceTracker when adding a
module then the module will be assigned to the default ResourceTracker for
the containing JITDylib. Resources for the module will be deleted when you
close the containing JITDylib.
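In terms of usage, I'm picturing something along these lines (purely
illustrative; none of this exists yet and the names may well change):

  // Purely illustrative sketch of the planned ResourceTracker API described
  // above. None of this is implemented yet; names and signatures may change.
  #include "llvm/ExecutionEngine/Orc/LLJIT.h"

  using namespace llvm;
  using namespace llvm::orc;

  Error addRemovableModule(LLJIT &J, ThreadSafeModule TSM) {
    JITDylib &JD = J.getMainJITDylib();

    // Fine-grained tracking: ask the JITDylib for a tracker and add the
    // module against it (proposed API, not yet available).
    auto RT = JD.createResourceTracker();
    if (auto Err = J.addIRModule(RT, std::move(TSM)))
      return Err;

    // ... look up and run the JIT'd code ...

    // Remove just the resources associated with this tracker (proposed API).
    return RT->remove();
  }

  // Modules added without an explicit tracker would fall back to the
  // JITDylib's default tracker and be freed when the JITDylib is closed.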
As for why this hasn't been implemented yet: just time constraints on my
part.

-- Lang.

On Sat, Feb 1, 2020 at 6:16 AM Andres Freund <andres at anarazel.de> wrote:

> Hi Lang,
>
> On 2020-01-28 13:35:07 -0800, Lang Hames wrote:
> > > I also want to highlight the necessity of some form of C API, that
> > > others already have.
> > >
> > > <snip>
> > >
> > > It's fine if the set of "somewhat stable" C APIs doesn't provide all
> > > the possible features, though.
> >
> > Ok. This got me thinking about what a simple LLJIT API should look like.
> > I have posted a sketch of a possible API on http://llvm.org/PR31103 .
>
> I'll take a look.
>
> > I don't have time to implement it just yet, but I would be very happy
> > to provide support and review patches if anyone else wants to give it
> > a shot.
>
> Hm. I don't immediately have time myself, but it's possible that I can
> get some help. Otherwise I'll try to look into it once my current set of
> tasks is done, if you haven't gotten to it by then.
>
> > > What's the capability level of ORCv2 on RuntimeDyld compared to ORCv1?
> > > Are there features supported in v1 that are only available on JITLink
> > > supported platforms?
> >
> > At a high level, ORCv2's design allows for basically the same features
> > as ORCv1, plus concurrent compilation.
>
> Cool.
>
> > There are still a number of APIs that
> > haven't been hooked up or implemented though. Most prominently: event
> > listeners and removable code. If you're using either of those features
> > please let me know: I do want to make sure we continue to support them
> > (or provide an equivalent).
>
> Heh, I/pg uses both :(
>
> WRT event listeners: I don't quite know how one can really develop JITed
> code without wiring up the profiler and debugger. I'm not wedded to the
> event listener interface itself, but debugger & profiler support are
> really critical. Or is there a different plan for those features?
>
> WRT removable code:
>
> Postgres emits the code for all the functions it knows it will need for a
> query at once (often that's all that are needed for one query, but not
> always), and removes it once there are no references to that set of
> functions anymore. As one session can use a *lot* of code over its
> lifetime, it's not at all feasible to never unload. Right now we use
> LLVMOrcRemoveModule(), which seems to work well enough. FWIW, for that
> use case there are never any references into the code that needs to be
> removed (it only exports functions that need to be called by C code).
>
> It doesn't look all that cheap to just create one LLJIT instance for
> each set of code that needs to be removable. I don't really foresee using
> LLVM side lazy/incremental JITing - so far my experiments show that the
> overhead of a code generation step makes it unattractive to incur it
> multiple times, and we have an interpreter that we can use until JIT
> compilation succeeds. So perhaps it's not *that* bad?
>
> What is the biggest difficulty in making code removable?
>
> In case you happen to be somewhere around the LLVM devroom at FOSDEM I'd
> be happy to briefly chat in person...
>
> Greetings,
>
> Andres Freund