Hi Andres,

> I also want to highlight the necessity of some form of C API, that others
> already have.
>
> <snip>
>
> It's fine if the set of "somewhat stable" C APIs doesn't provide all the
> possible features, though.

Ok. This got me thinking about what a simple LLJIT API should look like. I
have posted a sketch of a possible API on http://llvm.org/PR31103 . I don't
have time to implement it just yet, but I would be very happy to provide
support and review patches if anyone else wants to give it a shot.

> What's the capability level of ORCv2 on RuntimeDyld compared to ORCv1?
> Are there features supported in v1 that are only available on JITLink
> supported platforms?

At a high level, ORCv2's design allows for basically the same features as
ORCv1, plus concurrent compilation. There are still a number of APIs that
haven't been hooked up or implemented though. Most prominently: event
listeners and removable code. If you're using either of those features
please let me know: I do want to make sure we continue to support them (or
provide an equivalent). There are no features supported by ORCv1 that
require JITLink under ORCv2.

> > - Improve JIT support for static initializers:
> >
> >   - Add support for running initializers from object files, which will
> >     enable loading and caching of objects containing initializers.
>
> Hm, that's kind of supported for v1, right?

It's "kind of" supported. MCJIT and ORCv1 provided support for scanning the
llvm.global_ctors variable to find the names of static initializers to run.
This works fine as long as (1) you're adding LLVM IR, and (2) you only care
about initializers described by llvm.global_ctors. On the other hand, if you
add object files (or load them from an ObjectCache), or if you have
initializers not described by llvm.global_ctors (e.g. ObjC and Swift, which
have additional initializers described by metadata sections), then MCJIT and
ORCv1 provide no help out of the box. This problem is further exacerbated by
concurrent compilation in ORCv2: you may need to order your initializers
(e.g. according to the llvm.global_ctors priority field), but objects may
arrive at the JIT linker out of order due to concurrent compilation.

The new ORCv2 initializer support aims to make all of this natural: we will
provide 'dlopen' and 'dlclose' equivalent calls on JITDylibs. These will
trigger compilation and execution of any initializers that have not been run
already. If you use JITLink, this will include using JITLink plugins to
discover the initializers to run, including initializers in metadata
sections.
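To make the intended usage concrete, here is a very rough sketch of the sort
of thing I have in mind (nothing below is implemented yet; 'initialize' and
'deinitialize' are placeholder names for the dlopen/dlclose equivalents, not
final API):

  // Hypothetical sketch only: the dlopen/dlclose-equivalent calls described
  // above do not exist yet, and 'initialize'/'deinitialize' are placeholder
  // names rather than real LLJIT methods.
  #include "llvm/ExecutionEngine/Orc/LLJIT.h"

  using namespace llvm;
  using namespace llvm::orc;

  Error addAndInitialize(LLJIT &J, ThreadSafeModule TSM) {
    JITDylib &JD = J.getMainJITDylib();

    // Add IR (or a cached object file) as usual.
    if (auto Err = J.addIRModule(JD, std::move(TSM)))
      return Err;

    // 'dlopen' equivalent: compile and run any initializers in JD that have
    // not been run yet, in llvm.global_ctors priority order, including
    // initializers discovered by JITLink plugins in metadata sections.
    if (auto Err = J.initialize(JD))    // placeholder name
      return Err;

    // ... look up and call JIT'd symbols ...

    // 'dlclose' equivalent: run deinitializers/destructors for JD.
    return J.deinitialize(JD);          // placeholder name
  }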
-- Lang.

On Mon, Jan 27, 2020 at 10:14 AM Andres Freund <andres at anarazel.de> wrote:

> Hi,
>
> On 2020-01-16 18:00:53 -0800, Lang Hames via llvm-dev wrote:
> > In the interests of improving visibility into ORC JIT development I'm
> > going to try writing weekly status updates for the community. I hope
> > they will provide insight into the design and state of development of
> > LLVM's JIT APIs, as well as serving as a convenient space for
> > discussions among LLVM's large and growing community of JIT API users.
>
> That's a great idea.
>
> > Since this is the first update, I have also added some highlights from
> > last year, and the plan for 2020.
> >
> > Highlights from 2019:
> >
> > (1) ORCv1 was officially deprecated in LLVM 9. I have left it in for
> > the LLVM 10 branch, but plan to remove it from master in the coming
> > weeks. All development effort is now focused on ORCv2. If you are an
> > ORCv1 client, now's the time to switch over. If you need help please
> > ask on the llvm-dev mailing lists (make sure you CC me) or #llvm on
> > discord. There are also some tips available in
> > https://llvm.org/docs/ORCv2.html
>
> I also want to highlight the necessity of some form of C API, that
> others already have.
>
> Besides just needing something that can be called from languages besides
> C++, some amount of higher API stability is also important. For users of
> LLVM with longer support cycles than LLVM (e.g. Postgres has 5 years of
> back branch maintenance), and which live in a world where vendoring is
> not allowed (most things going into Linux distros), the API churn can be
> a serious problem. It's fine if the set of "somewhat stable" C APIs
> doesn't provide all the possible features, though.
>
> It's easy enough to add a bunch of wrappers or ifdefs hiding some simple
> signature changes, e.g. LLVMOrcGetSymbolAddress adding a parameter as
> happened in LLVM 6, but backpatching support for larger API redesigns
> into stable versions is scary. We do however quickly get complaints if
> a supported version cannot be compiled due to dependencies, as people
> tend to upgrade their OS separately from e.g. their database major
> version.
>
> > (2) LLVM has a new JIT linker, JITLink, which is intended as an
> > eventual replacement for RuntimeDyld. The new design supports linker
> > plugins (allowing operation on the low-level bits generated by the JIT
> > linker) and native code models (RuntimeDyld required a custom code
> > model on some platforms). Currently JITLink only supports Darwin
> > x86-64 and arm64, but I hope to see support for new platforms added in
> > the future.
>
> What's the capability level of ORCv2 on RuntimeDyld compared to ORCv1?
> Are there features supported in v1 that are only available on JITLink
> supported platforms?
>
> > - Improve JIT support for static initializers:
> >
> >   - Add support for running initializers from object files, which will
> >     enable loading and caching of objects containing initializers.
>
> Hm, that's kind of supported for v1, right?
>
> Greetings,
>
> Andres Freund
Hi Lang,

On 2020-01-28 13:35:07 -0800, Lang Hames wrote:
> > I also want to highlight the necessity of some form of C API, that others
> > already have.
> >
> > <snip>
> >
> > It's fine if the set of "somewhat stable" C APIs doesn't provide all the
> > possible features, though.
>
> Ok. This got me thinking about what a simple LLJIT API should look like. I
> have posted a sketch of a possible API on http://llvm.org/PR31103 .

I'll take a look.

> I don't have time to implement it just yet, but I would be very happy
> to provide support and review patches if anyone else wants to give it
> a shot.

Hm. I don't immediately have time myself, but it's possible that I can
get some help. Otherwise I'll try to look into it once my current set of
tasks is done, if you haven't gotten to it by then.

> > What's the capability level of ORCv2 on RuntimeDyld compared to ORCv1?
> > Are there features supported in v1 that are only available on JITLink
> > supported platforms?
>
> At a high level, ORCv2's design allows for basically the same features as
> ORCv1, plus concurrent compilation.

Cool.

> There are still a number of APIs that
> haven't been hooked up or implemented though. Most prominently: event
> listeners and removable code. If you're using either of those features
> please let me know: I do want to make sure we continue to support them
> (or provide an equivalent).

Heh, I/pg uses both :(

WRT event listeners: I don't quite know how one can really develop JITed
code without wiring up the profiler and debugger. I'm not wedded to the
event listener interface itself, but debugger & profiler support are really
critical. Or is there a different plan for those features?

WRT removable code:

Postgres emits the code for all the functions it knows it will need for a
query at once (often that's all that are needed for one query, but not
always), and removes it once there are no references to that set of
functions anymore. As one session can use a *lot* of code over its lifetime,
it's not at all feasible to never unload. Right now we use
LLVMOrcRemoveModule(), which seems to work well enough. FWIW, for that use
case there are never any references into the code that needs to be removed
(it only exports functions that need to be called by C code).
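For reference, the per-query cycle on our side boils down to roughly the
following (heavily simplified, not the actual Postgres code, error handling
elided; signatures are from memory for the LLVM 8-ish C bindings, and the
entry point name is just illustrative - check llvm-c/OrcBindings.h for the
version you actually build against):

  /* Simplified sketch of our add/lookup/remove cycle; signatures from
     memory, see llvm-c/OrcBindings.h for your LLVM version. */
  #include <llvm-c/OrcBindings.h>
  #include <stdint.h>
  #include <stddef.h>

  /* Resolve symbols that the JIT'd code references but doesn't define
     (for us: functions in the postgres binary itself). */
  static uint64_t resolve_symbol(const char *name, void *ctx)
  {
      return 0; /* look up 'name' in the host process here */
  }

  static void run_query_module(LLVMOrcJITStackRef jit, LLVMModuleRef mod)
  {
      LLVMOrcModuleHandle handle;
      LLVMOrcTargetAddress addr;

      /* Add and eagerly compile all functions emitted for this query. */
      LLVMOrcAddEagerlyCompiledIR(jit, &handle, mod, resolve_symbol, NULL);

      /* Look up the entry point(s) the executor will call from C
         ("evalexpr_0_0" is an illustrative name). */
      LLVMOrcGetSymbolAddress(jit, &addr, "evalexpr_0_0");

      /* ... run the query, calling the code at 'addr' ... */

      /* Once nothing references this set of functions anymore, drop it. */
      LLVMOrcRemoveModule(jit, handle);
  }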
It doesn't look all that cheap to just create one LLJIT instance for each
set of code that needs to be removable. I don't really foresee using LLVM
side lazy/incremental JITing - so far my experiments show that the overhead
of a code generation step makes it unattractive to incur it multiple times,
and we have an interpreter that we can use until JIT compilation succeeds.
So perhaps it's not *that* bad?

What is the biggest difficulty in making code removable?

In case you happen to be somewhere around the LLVM devroom at FOSDEM I'd be
happy to briefly chat in person...

Greetings,

Andres Freund

Hi Andres,

> > There are still a number of APIs that
> > haven't been hooked up or implemented though. Most prominently: event
> > listeners and removable code. If you're using either of those features
> > please let me know: I do want to make sure we continue to support them
> > (or provide an equivalent).
>
> Heh, I/pg uses both :(
>
> WRT event listeners: I don't quite know how one can really develop JITed
> code without wiring up the profiler and debugger. I'm not wedded to the
> event listener interface itself, but debugger & profiler support are
> really critical. Or is there a different plan for those features?

We definitely need debugger and profiling support. The right interface for
this is an open question.

I think we can add support for the existing EventListener interface to
RTDyldObjectLinkingLayer. That will make porting easy for existing clients.

EventListener isn't a good fit for JITLink/ObjectLinkingLayer at the moment.
EventListener (via the RuntimeDyld::LoadedObjectInfo parameter to
notifyObjectLoaded) implicitly assumes that the linker operates on whole
sections, but JITLink operates on a per-symbol basis, at least on MachO.
Individual symbols within a section may be re-ordered or dead-stripped, so
there's no easy correspondence between the original bytes of a section and
the final allocated bytes. That said, I don't think there's any fundamental
problem here: the static linkers perform dead stripping and reordering too.
As long as we figure out the right way to present the layout of the
allocated memory to the debuggers and profilers, I think they should be able
to handle it just fine.

Better yet, we don't have to come up with a new "EventListener 2.0" API for
ObjectLinkingLayer: it already has ObjectLinkingLayer::Plugin, which (with
some minor tweaks) should be much more flexible. If you're interested in
trying out the ObjectLinkingLayer::Plugin API at all, there's an example in
llvm/examples/LLJITExamples/LLJITWithObjectLinkingLayerPlugin (code on
GitHub here:
<https://github.com/llvm/llvm-project/blob/master/llvm/examples/LLJITExamples/LLJITWithObjectLinkingLayerPlugin/LLJITWithObjectLinkingLayerPlugin.cpp>
).
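To give a feel for the shape of the interface: a plugin is a class with a
handful of virtual callbacks that hook into the JITLink pipeline. Very
roughly, adapted from that example (method names and signatures are
approximate, so please check the example source for the exact interface in
your tree):

  // Rough sketch of an ObjectLinkingLayer plugin, adapted from the example
  // linked above. Names/signatures are approximate; see the example source
  // for the exact interface in your LLVM version.
  #include "llvm/ExecutionEngine/Orc/ObjectLinkingLayer.h"
  #include "llvm/Support/raw_ostream.h"

  using namespace llvm;
  using namespace llvm::orc;

  class SymbolDumperPlugin : public ObjectLinkingLayer::Plugin {
  public:
    // Add a JITLink pass that runs after layout and fixups, i.e. at the
    // point where every symbol has its final address. This is where
    // debugger or profiler registration hooks could live.
    void modifyPassConfig(MaterializationResponsibility &MR, const Triple &TT,
                          jitlink::PassConfiguration &Config) override {
      Config.PostFixupPasses.push_back([](jitlink::LinkGraph &G) -> Error {
        for (auto *Sym : G.defined_symbols())
          errs() << "defined symbol: " << Sym->getName() << "\n";
        return Error::success();
      });
    }
  };

  // Installed when setting up the JIT, e.g.:
  //   ObjLinkingLayer.addPlugin(std::make_unique<SymbolDumperPlugin>());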
> WRT removable code:
> ...
> Right now we use LLVMOrcRemoveModule(), which seems to work well enough...

Good to hear.

> It doesn't look all that cheap to just create one LLJIT instance for
> each set of code that needs to be removable.

I haven't tested the cost yet, so couldn't say either way. I definitely
haven't optimized construction of instances though.

> I don't really foresee using LLVM side lazy/incremental JITing - so far
> my experiments show that the overhead of a code generation step makes it
> unattractive to incur it multiple times, and we have an interpreter that
> we can use until JIT compilation succeeds. So perhaps it's not *that* bad?

Just to make sure I understand: are you saying that the overhead of
constructing the codegen pipeline shows up as substantial overhead? I can
totally believe it, I've just never measured it myself. If that's the case
it would be interesting to dig in to where the time is being spent. This has
never been optimized in the JIT, so there may be some easy improvements we
can make.

> What is the biggest difficulty in making code removable?

Concurrency, mostly. If you've added a symbol definition, what happens if
you issue a call to remove it just as someone else tries to look it up?

My answer (not yet implemented) is in several parts:

(1) On the JIT state side:

(1.a) If the symbol hasn't been compiled yet then a call to remove *may* end
up being high latency: e.g. in the case above, if the lookup arrives first
it will trigger a compile, and from the JIT's perspective that's one long,
uninterruptible operation. That's bad luck, but once it's done the call to
remove can prevent the compiled code from being registered with the symbol
table, and can inform anyone who was waiting on that definition that it
failed to compile. On the other hand, if the call to remove arrives first
then the operation will be quick and it will be as if the symbol were never
defined.

(1.b) If the symbol has been compiled already: we free the allocated
resources and remove it from the symbol table. This operation should be
quick, but see part (2).

(2) On the JIT'd code side: clients are responsible for resource
dependencies. For example, if you've JIT'd two functions, foo and bar, and
foo contains a call to bar, and then you remove bar: it is up to you to make
sure you never hit that call site for bar in foo.

(3) Overhead: some clients want fine-grained resource tracking, others
don't. My plan is to replace the VModuleKey placeholder type with a
ResourceTracker class. If you specify a ResourceTracker when adding a module
to the JIT then you will be able to call ResourceTracker::remove to remove
just that module. If you do not specify a ResourceTracker when adding a
module then the module will be assigned to the default ResourceTracker for
the containing JITDylib. Resources for the module will be deleted when you
close the containing JITDylib.
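In terms of usage, I'm picturing something along these lines (purely
illustrative; none of this exists yet and the names may well change):

  // Purely illustrative sketch of the planned ResourceTracker API described
  // above. None of this is implemented yet; names and signatures may change.
  #include "llvm/ExecutionEngine/Orc/LLJIT.h"

  using namespace llvm;
  using namespace llvm::orc;

  Error addRemovableModule(LLJIT &J, ThreadSafeModule TSM) {
    JITDylib &JD = J.getMainJITDylib();

    // Fine-grained tracking: ask the JITDylib for a tracker and add the
    // module against it (proposed API, not yet available).
    auto RT = JD.createResourceTracker();
    if (auto Err = J.addIRModule(RT, std::move(TSM)))
      return Err;

    // ... look up and run the JIT'd code ...

    // Remove just the resources associated with this tracker (proposed API).
    return RT->remove();
  }

  // Modules added without an explicit tracker would fall back to the
  // JITDylib's default tracker and be freed when the JITDylib is closed.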
As for why this hasn't been implemented yet: just time constraints on my
part.

-- Lang.

On Sat, Feb 1, 2020 at 6:16 AM Andres Freund <andres at anarazel.de> wrote:

> Hi Lang,
>
> On 2020-01-28 13:35:07 -0800, Lang Hames wrote:
> > > I also want to highlight the necessity of some form of C API, that
> > > others already have.
> > >
> > > <snip>
> > >
> > > It's fine if the set of "somewhat stable" C APIs doesn't provide all
> > > the possible features, though.
> >
> > Ok. This got me thinking about what a simple LLJIT API should look like.
> > I have posted a sketch of a possible API on http://llvm.org/PR31103 .
>
> I'll take a look.
>
> > I don't have time to implement it just yet, but I would be very happy
> > to provide support and review patches if anyone else wants to give it
> > a shot.
>
> Hm. I don't immediately have time myself, but it's possible that I can
> get some help. Otherwise I'll try to look into it once my current set of
> tasks is done, if you haven't gotten to it by then.
>
> > > What's the capability level of ORCv2 on RuntimeDyld compared to ORCv1?
> > > Are there features supported in v1 that are only available on JITLink
> > > supported platforms?
> >
> > At a high level, ORCv2's design allows for basically the same features
> > as ORCv1, plus concurrent compilation.
>
> Cool.
>
> > There are still a number of APIs that
> > haven't been hooked up or implemented though. Most prominently: event
> > listeners and removable code. If you're using either of those features
> > please let me know: I do want to make sure we continue to support them
> > (or provide an equivalent).
>
> Heh, I/pg uses both :(
>
> WRT event listeners: I don't quite know how one can really develop JITed
> code without wiring up the profiler and debugger. I'm not wedded to the
> event listener interface itself, but debugger & profiler support are
> really critical. Or is there a different plan for those features?
>
> WRT removable code:
>
> Postgres emits the code for all the functions it knows it will need for a
> query at once (often that's all that are needed for one query, but not
> always), and removes it once there are no references to that set of
> functions anymore. As one session can use a *lot* of code over its
> lifetime, it's not at all feasible to never unload. Right now we use
> LLVMOrcRemoveModule(), which seems to work well enough. FWIW, for that
> use case there are never any references into the code that needs to be
> removed (it only exports functions that need to be called by C code).
>
> It doesn't look all that cheap to just create one LLJIT instance for
> each set of code that needs to be removable. I don't really foresee using
> LLVM side lazy/incremental JITing - so far my experiments show that the
> overhead of a code generation step makes it unattractive to incur it
> multiple times, and we have an interpreter that we can use until JIT
> compilation succeeds. So perhaps it's not *that* bad?
>
> What is the biggest difficulty in making code removable?
>
> In case you happen to be somewhere around the LLVM devroom at FOSDEM I'd
> be happy to briefly chat in person...
>
> Greetings,
>
> Andres Freund