thr3ads.net - llvm dev - [llvm-dev] [LLVMdev] Help with using LLVM to re-compile hot functions at run-time [Nov 2015]


Revital1 Eres

2015-Jul-27 06:17 UTC

[LLVMdev] Help with using LLVM to re-compile hot functions at run-time

Hi Again,

I'm a little confused about exactly which Orc functions I should use to
save the functions' code in a code cache so that it can later be replaced
with different versions, and I would appreciate your help.

Just a reminder: I want to dynamically recompile the program based on a
profile collected at run-time. I would like to start executing the program
from the code cache and at some point be able to replace a function body
with its newly compiled version. This can be done by replacing the entry of
the function's code with a trampoline to its new version, so that future
calls to it will call the new version's code.

Does the CompileOnDemandLayer execute the program from a code cache and
hold pointers to the code of the functions it executes? I am compiling for
a Power machine. Are there target-specific pieces that I should implement
to make Orc work on Power?

Thanks again,
Revital




From:   Lang Hames <lhames at gmail.com>
To:     Revital1 Eres/Haifa/IBM at IBMIL
Cc:     LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
Date:   20/07/2015 08:41 PM
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital,

The CompileOnDemand layer is used by the lazy bitcode JIT in the lli tool. 
You can find the code in llvm/tools/lli/OrcLazyJIT.* .

Cheers,
Lang.


On Mon, Jul 20, 2015 at 2:32 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
Hello Lang, 

Thanks for your answer. 

I am now looking for an example of the usage of CompileOnDemandLayer. Is 
there an example available for that (could not find one in llvm/examples)? 


Thanks, 
Revital 



From:        Lang Hames <lhames at gmail.com> 
To:        Revital1 Eres/Haifa/IBM at IBMIL 
Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu> 
Date:        10/07/2015 12:10 AM 
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time 



Hi Revital, 

LLVM does have an IR interpreter, but I don't think it's maintained well
(or possibly at all). The interpreter is also not designed to interact 
with the LLVM JITs. 

We generally encourage people to just JIT LLVM IR, rather than 
interpreting it. For the use-case you have described, you could JIT IR 
with no optimizations to begin with, then re-JIT hot functions at a higher 
level. 

The Orc JIT APIs (LLVM's newer JIT APIs) were written with this kind of 
use-case in mind, and are probably a better fit for this than MCJIT. There 
is no built-in hot-function detection or recompilation yet, but I think 
this would be *fairly* easy to write in terms of Orc's callback API. 

Cheers, 
Lang. 


On Thu, Jul 9, 2015 at 4:19 AM, Revital1 Eres <ERES at il.ibm.com> wrote: 
Hello, 

I am new to LLVM and I would appreciate your help with the following:

I want to run LLVM IR through a virtual machine (the LLVM interpreter?) and
JIT-compile the hot functions (using MCJIT).

This task will require, among other things, identifying the hot functions
and having a code cache that is patched with the native code of the
functions after they are jitted.

I've read so far about MCJIT and lli; however, I have not seen that the LLVM
interpreter can be used as a VM in the way I was looking for, meaning:
execute the code one instruction at a time, have a profiling mode to
identify hot functions, and call the JIT to compile the hot functions.

I would appreciate any advice/starting points for this project.

Thanks, 
Revital 

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev






Hal Finkel

2015-Jul-27 06:35 UTC

[LLVMdev] Help with using LLVM to re-compile hot functions at run-time

----- Original Message -----
> From: "Revital1 Eres" <ERES at il.ibm.com>
> To: "Lang Hames" <lhames at gmail.com>
> Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Monday, July 27, 2015 1:17:52 AM
> Subject: Re: [LLVMdev] Help with using LLVM to re-compile hot functions at run-time
> 
> 
> Does the CompileOnDemandLayer execute the program from a code cache
> and hold pointers to the code of the functions it executes? I am
> compiling for a Power machine.
> Are there target-specific pieces that I should implement to make
> Orc work on Power?

There is code in lib/ExecutionEngine/Orc/OrcTargetSupport.cpp, currently
implemented only for x86_64, that is necessary to make lazy compilation
work (triggering compilation only when a function is first called, etc.).
I'll let someone else comment on the rest of the details...

 -Hal
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Lang Hames

2015-Jul-28 02:58 UTC

[LLVMdev] Help with using LLVM to re-compile hot functions at run-time

Hi Revital,

What do you mean by "code cache"? Orc (and MCJIT) does have the concept of
an ObjectCache, which is a long-lived, potentially persistent, compiled
version of some IR. It's not a key component of the JIT though: most
clients run without a cache attached and just JIT their code from scratch
in each session.
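
[Editor's illustration] The caching idea can be modelled in a few lines of plain C++. This is only a sketch of the concept, not LLVM's API: ToyObjectCache, ObjectBytes and getOrCompile are hypothetical names, the method names merely echo LLVM's ObjectCache interface with simplified signatures, and the "compiled object" is a placeholder string.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a compiled object file: just its bytes.
using ObjectBytes = std::string;

// Toy cache keyed by a module identifier (an assumption; a real cache
// might key on a hash of the IR or a file path instead).
class ToyObjectCache {
public:
  // Called after a compilation: remember the object for this module.
  void notifyObjectCompiled(const std::string &ModuleId, ObjectBytes Obj) {
    Objects[ModuleId] = std::move(Obj);
  }

  // Called before a compilation: returns the cached object, or nullptr.
  const ObjectBytes *getObject(const std::string &ModuleId) const {
    auto It = Objects.find(ModuleId);
    return It == Objects.end() ? nullptr : &It->second;
  }

private:
  std::map<std::string, ObjectBytes> Objects;
};

// Compile only on a cache miss. Compiles counts real compilations.
ObjectBytes getOrCompile(ToyObjectCache &Cache, const std::string &ModuleId,
                         unsigned &Compiles) {
  assert(!ModuleId.empty() && "module needs an identifier to be cached");
  if (const ObjectBytes *Hit = Cache.getObject(ModuleId))
    return *Hit;
  ++Compiles;
  ObjectBytes Obj = "obj:" + ModuleId; // placeholder for real code generation
  Cache.notifyObjectCompiled(ModuleId, Obj);
  return Obj;
}
```

Note that such a cache only avoids repeating a compilation; it does not by itself replace running code with a new version.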

Recompilation is orthogonal to caching. There is no in-tree support for
recompilation yet. There are several ways that it could be supported,
depending on what security / performance trade-offs you're willing to make,
and how deep into the LLVM code you want to get. As things stand at the
moment, all function calls in the lazy JIT are indirected via function
pointers. We want to add support for patchable call-sites, but this hasn't
been implemented yet. The indirect calls make recompilation reasonably
easy: you could add a transform layer on top of the CompileCallbackLayer
which would modify each function like this:

void foo$impl() {          void foo$impl() {
  // foo body        ->      if (trigger_condition) {
}                              auto fooOpt = jit_recompile_hot(&foo);
                               return fooOpt(); // don't also run the old body
                             }
                             // foo body
                           }

You would implement the jit_recompile_hot function yourself in your JIT and
make it available to JIT'd code via the SymbolResolver. When the trigger
condition is met you'll get a call to recompile foo, at which point you:
(1) Add the IR for foo to a 2nd IRCompileLayer that has been configured
with a higher optimization level, (2) look up the address of the optimized
version of foo, and (3) update the function pointer for foo to point at the
optimized version. The process for patchable callsites should be fairly
similar once they're available, except that you'll trigger a call-site
update rather than rewriting a function pointer.
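
[Editor's illustration] The three steps above can be modelled without LLVM. The following is a minimal sketch under stated assumptions: the "optimized version" is a hand-written function rather than the output of a second IRCompileLayer, the trigger condition is a simple call counter, and jit_recompile_hot takes a name instead of a function address. Table, FunctionEntry, HotThreshold, installFoo and callFoo are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

using Fn = int (*)(int);

// Per-function state in the toy JIT.
struct FunctionEntry {
  Fn Current;     // the function pointer that every call is indirected through
  Fn Optimized;   // stand-in for code produced by a higher-optimization layer
  unsigned Calls; // profile data used by the trigger condition
};

std::unordered_map<std::string, FunctionEntry> Table;
const unsigned HotThreshold = 3; // hypothetical trigger condition

// Steps (1)-(3): "compile" the optimized version, look up its address, and
// repoint the indirection slot so that later calls bypass the old body.
Fn jit_recompile_hot(const std::string &Name) {
  FunctionEntry &E = Table.at(Name);
  assert(E.Optimized && "no optimized version registered");
  E.Current = E.Optimized;
  return E.Current;
}

int fooOptimized(int X) { return X * 2; }

// The transformed foo$impl: check the trigger first, then run the old body.
int fooUnoptimized(int X) {
  FunctionEntry &E = Table.at("foo");
  if (++E.Calls >= HotThreshold)
    return jit_recompile_hot("foo")(X);
  return X + X; // original (unoptimized) body
}

void installFoo() {
  Table["foo"] = FunctionEntry{fooUnoptimized, fooOptimized, 0};
}

// Every call site goes through the function pointer.
int callFoo(int X) { return Table.at("foo").Current(X); }
```

After installFoo(), the first two calls to callFoo run the unoptimized body; the third call trips the threshold and swaps the pointer, so later calls reach fooOptimized directly. Threading and freeing the old code are ignored here, as in the description above.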

This neglects all sorts of fun details (threading, garbage collection of
old function implementations), but hopefully it gives you a place to start.


Regarding laziness, as Hal mentioned you'll have to provide some target
support for PowerPC to support lazy compilation. For a rough guide you can
check out the X86_64 support code in
llvm/include/llvm/ExecutionEngine/Orc/OrcTargetSupport.h and
llvm/lib/ExecutionEngine/Orc/OrcTargetSupport.cpp.

There are two methods that you'll need to implement:
insertCompileCallbackTrampoline and insertResolverBlock. These work
together to enable lazy compilation. Both of these methods inject blobs of
target-specific code into the JIT process. To do this (at least for now) I
make use of a handy feature of LLVM IR: you can write raw assembly code
directly into a bitcode module ("module-level asm"). If you look at the X86
implementation of each of these methods you'll see they're written in terms
of string-streams building up a string of assembly which will be handed off
to the JIT to compile like any other code.

The first blob that you need to be able to output is the resolver block.
The purpose of the resolver block is to save program state and call back
into the JIT to trigger lazy compilation of a function. When the JIT is done
compiling the function it returns the address of the compiled function to
the resolver block, and the resolver block returns to the compiled function
(rather than to its original return address).

Because all functions share the same resolver block, the JIT needs some way
to distinguish them, which is where the trampolines come in. The JIT emits
one trampoline per function, and each trampoline just calls the resolver
block. The return address of the call in each trampoline provides the
unique address that the JIT associates with the to-be-compiled function.
The CompileCallbackManager manages this association between trampolines and
functions for you; you just need to provide the resolver/trampoline
primitives.
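
[Editor's illustration] The bookkeeping described above can be sketched in plain C++. This is a conceptual model only: a small integer stands in for the trampoline's return address, a std::function stands in for "compile this function now", and ToyCallbackManager and LazyStub are hypothetical names rather than Orc classes.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

using Fn = int (*)();
using TrampolineId = unsigned; // stands in for a trampoline's return address

// Associates each trampoline with the action that compiles its function.
class ToyCallbackManager {
public:
  // Reserve a trampoline for a not-yet-compiled function.
  TrampolineId addTrampoline(std::function<Fn()> Compile) {
    TrampolineId Id = NextId++;
    Pending[Id] = std::move(Compile);
    return Id;
  }

  // What the shared resolver block calls: identify the function from the
  // trampoline, compile it once, and hand back the compiled address.
  Fn resolve(TrampolineId Id) {
    auto It = Pending.find(Id);
    assert(It != Pending.end() && "unknown trampoline");
    Fn Compiled = It->second();
    Pending.erase(It);
    return Compiled;
  }

private:
  TrampolineId NextId = 0;
  std::map<TrampolineId, std::function<Fn()>> Pending;
};

// A call site that goes through a function pointer. The pointer starts out
// "pointing at the trampoline" (modelled as null) and is updated after the
// first call.
class LazyStub {
public:
  LazyStub(ToyCallbackManager &Mgr, TrampolineId Id)
      : Mgr(Mgr), Id(Id), Target(nullptr) {}

  int call() {
    if (!Target)
      Target = Mgr.resolve(Id); // first call: trampoline -> resolver -> JIT
    return Target();            // later calls: straight to compiled code
  }

private:
  ToyCallbackManager &Mgr;
  TrampolineId Id;
  Fn Target;
};
```

In the real mechanism the resolver is target-specific assembly that saves and restores registers around the call into the JIT; the model leaves that part out and keeps only the trampoline-to-function association.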

In case it helps, here's what the output of all this looks like on X86.
Trampolines are trivial - they're emitted in blocks and preceded by a
pointer to the resolver block:

module asm "Lorc_resolve_block_addr:"
module asm "  .quad 140439143575560"
module asm "orc_jcc_0:"
module asm "  callq *Lorc_resolve_block_addr(%rip)"
module asm "orc_jcc_1:"
module asm "  callq *Lorc_resolve_block_addr(%rip)"
module asm "orc_jcc_2:"
module asm "  callq *Lorc_resolve_block_addr(%rip)"
...
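
[Editor's illustration] A block of assembly text like the one above can be built with a string stream. This is a sketch, not the in-tree implementation: buildTrampolineAsm is a hypothetical helper, and it produces only the raw assembly text (without the module asm "..." wrapping shown in the IR dump).

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Build x86-64 assembly text for a block of trampolines, preceded by a
// pointer to the resolver block. Label names follow the dump shown above.
std::string buildTrampolineAsm(std::uint64_t ResolverAddr,
                               unsigned NumTrampolines) {
  assert(NumTrampolines > 0 && "need at least one trampoline");
  std::ostringstream Asm;
  Asm << "Lorc_resolve_block_addr:\n"
      << "  .quad " << ResolverAddr << "\n";
  for (unsigned I = 0; I != NumTrampolines; ++I)
    Asm << "orc_jcc_" << I << ":\n"
        << "  callq *Lorc_resolve_block_addr(%rip)\n";
  return Asm.str();
}
```

With LLVM available, a string like this would then be attached to a module as module-level inline asm (for example via Module::appendModuleInlineAsm) and compiled with the rest of the module. A PowerPC port would emit the equivalent PowerPC instructions instead.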


The resolver block is more complicated and I won't provide the full code
for it here. You can find it by running:

lli -jit-kind=orc-lazy -orc-lazy-debug=mods-to-stderr <hello_world.ll>

and looking at the initial output. In pseudo-asm though, it looks like this:

module asm "jit_callback_manager_addr:"
module asm "  .quad 0x46fc190" // <- address of callback manager object
module asm "orc_resolver_block:"
module asm "  // save register state"
module asm "  // load jit_callback_manager_addr into %rdi"
module asm "  // load the return address (from the trampoline call) into %rsi"
module asm "  // %rax = call jit(%rdi, %rsi)"
module asm "  // save %rax over the return address"
module asm "  // restore register state"
module asm "  // retq"

So, that's a whirlwind intro to implementing lazy JITing support for a new
architecture in Orc. I'll try to answer any questions you have on the
topic, though I'm not familiar with PowerPC at all. If you're comfortable
with PowerPC assembly I think it should be possible to implement once you
grok the concepts.

Hope this helps!

Cheers,
Lang.



Revital1 Eres

2015-Jul-28 08:33 UTC

[LLVMdev] Help with using LLVM to re-compile hot functions at run-time

Hi Lang,

Thank you very much for the detailed reply! I will take a closer
look at it and hopefully can start implementing my task
based on the Orc API.

Btw, by code cache I meant the ability to run the executed code
from a place where I could later patch it: redirect calls to new
versions of functions and store those new versions in it as well.

Thanks again,
Revital




Revital1 Eres via llvm-dev

2015-Sep-08 07:36 UTC

[llvm-dev] [LLVMdev] Help with using LLVM to re-compile hot functions at run-time

Hi Lang,

Apologies if you receive multiple copies of this email.

After spending some time debugging the Kaleidoscope Orc fully_lazy toy
example on x86, I want to start implementing a run-time optimizer as you
suggested, and again I would highly appreciate your help. For now I'll defer
the target-specific implementation to the end, after I have the
target-independent parts in place, as I can run on x86 to start with.

Given a simple example of a main function calling functions foo and bar:
IIUC I should start from the IR level of this module, which means that
ParseIRFile will first be called on the IR of the program. Is that right?

I would like to make sure I understand your suggestion, which is to insert a
new layer on top of the CompileCallbackLayer in order to be able to check
trigger_condition at the beginning of a function. IIUC, until a function
(foo or bar) is optimized, calls to it will go through the resolver (foo
and bar will not be compiled from scratch every time we go through the
resolver, but will rather execute the cached non-optimized version once it
has been compiled). The resolver will check trigger_condition to see whether
the cached non-optimized version should be executed or a new optimized
version should be compiled and executed. Once trigger_condition is true, foo
and bar will be compiled to generate their optimized versions, and these
versions will be executed directly from then on (not going through the
resolver any more). Is that right?

Should this layer on top of the CompileCallbackLayer be similar to class
KaleidoscopeJIT? I saw that in the Kaleidoscope Orc example the lambda
functions that are added in createLambdaResolver are executed by the
resolver before compiling a call, so I assume that trigger_condition should
also be added via createLambdaResolver, so that before compiling foo or bar
the lambda functions that are added by calling createLambdaResolver and
contain trigger_condition will be executed. Is that right?

IIUC, in the Kaleidoscope Orc example the interpreter calls addModule upon
parsing a call expression in HandleTopLevelExpression. In my case I assume
addModule should be called for the module returned from ParseIRFile, right?
In this case, will calling getAddress on the whole module (the IR of all
functions) trigger calls to the lambda functions defined in
createLambdaResolver for foo and bar? Also, in the Kaleidoscope Orc example
the execution of the function is done explicitly in HandleTopLevelExpression
after calling getAddress, and it's not clear to me where I should insert
this in my case.

Thanks again,
Revital


Lang Hames <lhames at gmail.com> wrote on 28/07/2015 05:58:41 AM:
> From: Lang Hames <lhames at gmail.com>
> To: Revital1 Eres/Haifa/IBM at IBMIL
> Cc: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
> Date: 28/07/2015 05:58 AM
> Subject: Re: [LLVMdev] Help with using LLVM to re-compile hot 
> functions at run-time
> 
> Hi Revital,
> 
> What do you mean by "code cache"? Orc (and MCJIT) does have the 
> concept of an ObjectCache, which is a long-lived, potentially 
> persistent, compiled version of some IR. It's not a key component of
> the JIT though: Most clients run without a cache attached and just 
> JIT their code from scratch in each session.
> 
> Recompilation is orthogonal to caching. There is no in-tree support 
> for recompilation yet. There are several ways that it could be 
> supported, depending on what security / performance trade-offs 
> you're willing to make, and how deep into the LLVM code you want to
> get. As things stand at the moment all function calls in the lazy 
> JIT are indirected via function pointers. We want to add support for
> patchable call-sites, but this hasn't been implemented yet. The 
> indirect calls make recompilation reasonably easy: You could add a 
> transform layer on top of the CompileCallbackLayer which would 
> modify each function like this:
> 
> void foo$impl() {          void foo$impl() {
>   // foo body        ->      if (trigger_condition) {
> }                              auto fooOpt = jit_recompile_hot(&foo);
>                                fooOpt();
>                              }
>                              // foo body
>                            }
> 
> You would implement the jit_recompile_hot function yourself in your 
> JIT and make it available to JIT'd code via the SymbolResolver. When
> the trigger condition is met you'll get a call to recompile foo, at 
> which point you: (1) Add the IR for foo to a 2nd IRCompileLayer that
> has been configured with a higher optimization level, (2) look up 
> the address of the optimized version of foo, and (3) update the 
> function pointer for foo to point at the optimized version. The 
> process for patchable callsites should be fairly similar once 
> they're available, except that you'll trigger a call-site update 
> rather than rewriting a function pointer.
> 
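A minimal sketch of the "make it available via the SymbolResolver" step,
in plain C++ with no LLVM dependency. The names makeResolver, SymbolLookup
and TargetAddress are hypothetical and are not Orc APIs, and the body of
jit_recompile_hot is only a placeholder. The sketch assumes that a resolver
is a name-to-address lookup that checks JIT-provided hooks first and then
falls back to the ordinary symbol lookup:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

using TargetAddress = std::uint64_t;
using SymbolLookup = std::function<TargetAddress(const std::string &)>;

// Host-side hook that JIT'd code would call when the trigger fires.
// Placeholder body: a real version would recompile and return new code.
extern "C" void *jit_recompile_hot(void *fn) { return fn; }

// Hypothetical resolver factory: hooks first, then the fallback lookup.
// An address of 0 means "symbol not found".
SymbolLookup makeResolver(SymbolLookup fallback) {
  const auto hookAddr = static_cast<TargetAddress>(
      reinterpret_cast<std::uintptr_t>(&jit_recompile_hot));
  std::unordered_map<std::string, TargetAddress> hooks{
      {"jit_recompile_hot", hookAddr}};

  return [hooks, fallback](const std::string &name) -> TargetAddress {
    assert(!name.empty() && "symbol name must not be empty");
    auto it = hooks.find(name);
    if (it != hooks.end())
      return it->second;                     // JIT-provided hook
    return fallback ? fallback(name) : 0;    // everything else
  };
}
```

Because the same resolver object can be handed to every layer, both the
unoptimized and the optimized code would see the same hook address.
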
> This neglects all sorts of fun details (threading, garbage 
> collection of old function implementations), but hopefully it gives 
> you a place to start. 
> 
> Regarding laziness, as Hal mentioned you'll have to provide some 
> target support for PowerPC to support lazy compilation. For a rough 
> guide you can check out the X86_64 support code in
> llvm/include/llvm/ExecutionEngine/Orc/OrcTargetSupport.h and
> llvm/lib/ExecutionEngine/Orc/OrcTargetSupport.cpp.
> 
> There are two methods that you'll need to implement: 
> insertCompileCallbackTrampoline and insertResolverBlock. These work 
> together to enable lazy compilation. Both of these methods inject 
> blobs of target-specific code into the JIT process. To do this (at 
> least for now) I make use of a handy feature of LLVM IR: You can 
> write raw assembly code directly into a bitcode module
> ("module-level asm"). If you look at the X86 implementation of each
> of these methods you'll see they're written in terms of
> string-streams building up a string of assembly which will be handed
> off to the JIT to compile like any other code.
> 
> The first blob that you need to be able to output is the resolver 
> block. The purpose of the resolver block is to save program state 
> and call back in to the JIT to trigger lazy compilation of a 
> function. When the JIT is done compiling the function it returns the
> address of the compiled function to the resolver block, and the 
> resolver block returns to the compiled function (rather than its 
> original return address).
> 
> Because all functions share the same resolver block, the JIT needs 
> some way to distinguish them, which is where the trampolines come 
> in. The JIT emits one trampoline per function and each trampoline 
> just calls the resolver block. The return address of the call in 
> each trampoline provides the unique address that the JIT associates 
> with the to-be-compiled functions. The CompileCallbackManager 
> manages this association between trampolines and functions for you, 
> you just need to provide the resolver/trampoline primitives.
> 
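The bookkeeping described above can be modelled in a few lines of plain
C++. This is a toy under stated assumptions, not the Orc
CompileCallbackManager: the class name, the fake trampoline addresses and
the resolve() entry point are all hypothetical. It only illustrates that
each trampoline address is a key identifying one to-be-compiled function,
and that the shared resolver uses that key to pick the compile action:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

using TargetAddress = std::uint64_t;

class ToyCallbackManager {
public:
  using CompileAction = std::function<TargetAddress()>;

  // Reserve a (fake) trampoline address and remember which compile
  // action belongs to it.
  TargetAddress addTrampoline(CompileAction compile) {
    TargetAddress addr = nextTrampoline_;
    nextTrampoline_ += kTrampolineSize;
    actions_[addr] = std::move(compile);
    return addr;
  }

  // What the shared resolver block would call: the trampoline address
  // says which function is wanted. The action runs once and its result
  // (the compiled function's address) is handed back to the resolver.
  TargetAddress resolve(TargetAddress trampolineAddr) {
    auto it = actions_.find(trampolineAddr);
    assert(it != actions_.end() && "unknown trampoline");
    CompileAction compile = std::move(it->second);
    actions_.erase(it);  // erase first: the action may add trampolines
    return compile();
  }

private:
  static constexpr TargetAddress kTrampolineSize = 8;
  TargetAddress nextTrampoline_ = 0x1000;
  std::unordered_map<TargetAddress, CompileAction> actions_;
};
```

In the real scheme the key is derived from the return address pushed by
the trampoline's call instruction; here it is simply the value returned by
addTrampoline.
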
> In case it helps, here's what the output of all this looks like on 
> X86. Trampolines are trivial - they're emitted in blocks and 
> preceded by a pointer to the resolver block:
> 
> module asm "Lorc_resolve_block_addr:"
> module asm "  .quad 140439143575560"
> module asm "orc_jcc_0:"
> module asm "  callq *Lorc_resolve_block_addr(%rip)"
> module asm "orc_jcc_1:"
> module asm "  callq *Lorc_resolve_block_addr(%rip)"
> module asm "orc_jcc_2:"
> module asm "  callq *Lorc_resolve_block_addr(%rip)"
> ...
> 
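Text like the trampoline listing above could be produced by string-stream
code along the following lines. This is a sketch, not the code in
OrcTargetSupport.cpp: the function name buildTrampolineAsm and its
parameters are hypothetical, and it returns only the assembly text (without
the `module asm "..."` wrapper that the IR printer adds):

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: builds x86-64 assembly for a block of trampolines,
// preceded by a pointer to the resolver block.
std::string buildTrampolineAsm(std::uint64_t resolverAddr,
                               unsigned numTrampolines) {
  assert(numTrampolines > 0 && "need at least one trampoline");
  std::ostringstream os;

  // Data word holding the resolver block's address.
  os << "Lorc_resolve_block_addr:\n"
     << "  .quad " << resolverAddr << "\n";

  // One labelled indirect call per trampoline; the label index is what
  // distinguishes them.
  for (unsigned i = 0; i < numTrampolines; ++i) {
    os << "orc_jcc_" << i << ":\n"
       << "  callq *Lorc_resolve_block_addr(%rip)\n";
  }
  return os.str();
}
```

In LLVM a string like this can be attached to a module with
Module::appendModuleInlineAsm, after which it is compiled together with the
rest of the module.
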
> The resolver block is more complicated and I won't provide the full 
> code for it here. You can find it by running:
> 
> lli -jit-kind=orc-lazy -orc-lazy-debug=mods-to-stderr <hello_world.ll>
> 
> and looking at the initial output. In pseudo-asm though, it looks like 
> this:
> 
> module asm "jit_callback_manager_addr:"
> module asm "  .quad 0x46fc190" // <- address of callback manager object
> module asm "orc_resolver_block:"
> module asm "  // save register state."
> module asm "  // load jit_callback_manager_addr into %rdi"
> module asm "  // load the return address (from the trampoline call) into %rsi"
> module asm "  // %rax = call jit(%rdi, %rsi)"
> module asm "  // save %rax over the return address"
> module asm "  //  restore register state"
> module asm "  //  retq"
> 
> So, that's a whirlwind intro to implementing lazy JITing support for
> a new architecture in Orc. I'll try to answer any questions you have
> on the topic, though I'm not familiar with PowerPC at all. If you're
> comfortable with PowerPC assembly I think it should be possible to 
> implement once you grok the concepts.
> 
> Hope this helps!
> 
> Cheers,
> Lang.
> 
> On Jul 26, 2015, at 11:17 PM, Revital1 Eres <ERES at il.ibm.com> wrote:

Lang Hames via llvm-dev

2015-Sep-18 06:47 UTC

[llvm-dev] [LLVMdev] Help with using LLVM to re-compile hot functions at run-time

Hi Revital,

Attached is a new version of the fully_lazy Orc Kaleidoscope demo that has
been extended to enable re-compilation at higher optimisation levels,
roughly following the scheme I outlined before.

In the compile action for the callback, the initial IR for each function is
transformed like this:


                           unsigned foo_counter = 0;
void foo$impl() {          void foo$impl() {
  // foo body        ->      if (++foo_counter > 1000) {
}                              auto fooOpt = $recompile(&foo);
                               fooOpt();
                             }
                             // foo body
                           }

The key changes to make this work (which you can see by diff'ing toy.cpp
against the original fully_lazy version):

1) New layers HotCompileLayer and HotIROptsLayer added. These perform IR
optimisation and code generation at higher optimisation levels than the
default layers.
2) The symbol resolver function (not to be confused with the resolver
block) has been pulled out into its own function, createResolver, so that
it can be shared between optimised & non-optimised code. It also resolves
the "$recompile" function to a static method on the KaleidoscopeJIT class
itself.
3) The lazy compile action now calls 'instrumentFunctions' before adding
the IR for cold functions to the JIT.
4) The instrumentFunctions method injects the counter code and call to
recompile.
5) The recompileHot method re-IRGens functions, then adds them to the
HotIROpts layer to generate more optimized versions. It then updates the
function-body pointer so that subsequent calls go to the optimised version.
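
The counter-and-pointer-swap scheme in points 3 to 5 can be illustrated
without LLVM. The sketch below is not taken from the attached toy.cpp: all
names (fooBodyPtr, recompileFoo, callFoo, ...) are hypothetical, the
"optimised" function is an ordinary C++ function standing in for re-JIT'd
code, and it assumes the instrumented function returns the optimised
result rather than falling through to the old body:

```cpp
#include <cassert>

using FooFn = int (*)(int);

int fooCold(int x);       // stands in for the instrumented, unoptimised code
int fooOptimized(int x);  // stands in for the re-JIT'd, optimised code

FooFn fooBodyPtr = &fooCold;  // function-body pointer all calls go through
unsigned fooCounter = 0;      // injected call counter
unsigned recompileCount = 0;  // how often the recompile hook ran

// Stand-in for "$recompile": produce the hot version, update the
// function-body pointer, and return the new entry point.
FooFn recompileFoo() {
  ++recompileCount;
  fooBodyPtr = &fooOptimized;
  return fooBodyPtr;
}

int fooOptimized(int x) { return x * 2; }

int fooCold(int x) {
  // Injected prologue: count calls, switch over once the function is hot.
  if (++fooCounter > 1000) {
    FooFn fooOpt = recompileFoo();
    return fooOpt(x);  // assumption: return here, do not run the old body
  }
  return x + x;  // original foo body
}

// Callers never call a body directly; they always go through the pointer.
int callFoo(int x) { return fooBodyPtr(x); }
```

With this layout the first 1000 calls run fooCold, the 1001st call triggers
the hook once, and every later call goes straight to fooOptimized without
touching the counter.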

This is a bit quick-and-dirty, but does work. In the future I'll try to
tidy this up and turn it into a new tutorial chapter.

Hope this helps!

Cheers,
Lang.




On Wed, Sep 16, 2015 at 10:09 PM, Revital1 Eres <ERES at il.ibm.com> wrote:
> Hi Lang,
>
> Many thanks!!! I just wanted to make sure you did not miss it...
>
> Thanks again!
> Revital
>
>
>
> From:        Lang Hames <lhames at gmail.com>
> To:        Revital1 Eres/Haifa/IBM at IBMIL
> Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
> Date:        17/09/2015 01:56 AM
> Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot
> functions at run-time
> ------------------------------
>
>
>
> Hi Revital,
>
> Apologies for the delayed reply.
>
> I'm working on some example code for how to do this. I'll try to post it
> tomorrow.
>
> Cheers,
> Lang.
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fully_lazy_with_recompile.tgz
Type: application/x-gzip
Size: 27632 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150917/63f9f733/attachment-0001.bin>

Revital1 Eres via llvm-dev

2015-Sep-21 17:05 UTC

[llvm-dev] [LLVMdev] Help with using LLVM to re-compile hot functions at run-time

Hello Lang,

Thanks very much for the implementation!!!
I will take a closer look at the code as soon as I can.

Thanks again,
Revital



On Wed, Sep 16, 2015 at 10:09 PM, Revital1 Eres <ERES at il.ibm.com>
wrote:
Hi Lang, 

Many thanks!!! I just wanted to make sure you did not miss it... 

Thanks again! 
Revital 



From:        Lang Hames <lhames at gmail.com> 
To:        Revital1 Eres/Haifa/IBM at IBMIL 
Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu> 
Date:        17/09/2015 01:56 AM 
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time 



Hi Revital, 

Apologies for the delayed reply. 

I'm working on some example code for how to do this. I'll try to post it
tomorrow. 

Cheers, 
Lang. 

On Tue, Sep 8, 2015 at 12:23 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
Hi Lang, 

After spending some time debugging Kaleidoscope orc fully_lazy toy example 
on 
x86 I want to start implementing run-time optimizer as you suggested and 
again 
I highly appreciate your help. 
For now I'll defer the target specific implementation to the end after 
I'll have 
the non target parts in place as I can run on x86 as a start. 
Given a simple example of main function calling foo and bar functions; 
IIUC I should start from the IR level of this module which means that 
ParseIRFile will be be first called on the IR of the program, is that 
right? 

I would like to make sure I understand your suggestion which is to insert 
a new 
layer that should be implemented on top of the CompileCallbackLayer in 
order to 
be able to call trigger_condition at the beginning of a function. 
IIUC until the function (bar or foo) is optimized the call to foo and bar 
will 
go through the resolver (foo and bar will not be compiled from scratch 
every 
time we go through the resolver but rather execute the cached non 
optimized 
version after first compiled). The resolver will check trigger_condition 
to see if the cached non optimized version should be executed or a new 
optimizied version should be compiled and executed. 
After the trigger_condition is true foo and bar will be compiled to 
generate 
their optimized version and this version will be executed directly from 
now on 
(not going through the resolver any more). Is that right? 
Does this layer on top of the CompileCallbackLayer should be similar to 
class KaleidoscopeJIT? 
I saw that in Kaleidoscope Orc's example the Lambda functions that are 
added in 
createLambdaResolver are been executed by the resolver before compiling a 
call 
so I assume that the trigger_condition should be added also by 
createLambdaResolver so before compiling foo or bar the Lambda functions 
that are added by calling createLambdaResolver and contain 
trigger_condition 
will be executed, is that right? 

IIUC in Kaleidoscope Orc's example the interpreter calls the addModule 
upon 
parsing call expression in HandleTopLevelExpression. 
In my case I assume addModule be called for the module returned from 
ParseIRFile, right? 
In this case should calling getAddress on the whole module (the IR of all 
functions) will trigger calling the Lambda functions defined in 
createLambdaResolver on foo and bar functions? Also - in Kaleidoscope orc 
example the execution of the function is done explicitly in 
HandleTopLevelExpression after calling getAddress and its not clear to me 
where 
I should insert this in my case. 

Thanks again, 
Revital 




From:        Lang Hames <lhames at gmail.com> 
To:        Revital1 Eres/Haifa/IBM at IBMIL 
Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu> 
Date:        28/07/2015 05:58 AM 
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time 



Hi Revital, 

What do you mean by "code cache"? Orc (and MCJIT) does have the
concept of
an ObjectCache, which is a long-lived, potentially persistent, compiled 
version of some IR. It's not a key component of the JIT though: Most 
clients run without a cache attached and just JIT their code from scratch 
in each session. 

Recompilation is orthogonal to caching. There is no in-tree support for 
recompilation yet. There are several ways that it could be supported, 
depending on what security / performance trade-offs you're willing to 
make, and how deep in to the LLVM code you want to get. As things stand at 
the moment all function calls in the lazy JIT are indirected via function 
pointers. We want to add support for patchable call-sites, but this hasn't 
been implemented yet. The Indirect calls make recompilation reasonably 
easy: You could add a transform layer on top of the CompileCallbackLayer 
which would modify each function like this: 

void foo$impl() {          void foo$impl() { 
  // foo body        ->      if (trigger_condition) { 
}                              auto fooOpt = jit_recompile_hot(&foo); 
                               fooOpt(); 
                             } 
                             // foo body 
                           } 

You would implement the jit_recompile_hot function yourself in your JIT 
and make it available to JIT'd code via the SymbolResolver. When the 
trigger condition is met you'll get a call to recompile foo, at which 
point you: (1) Add the IR for foo to a 2nd IRCompileLayer that has been 
configured with a higher optimization level, (2) look up the address of 
the optimized version of foo, and (3) update the function pointer for foo 
to point at the optimized version. The process for patchable callsites 
should be fairly similar once they're available, except that you'll 
trigger a call-site update rather than rewriting a function pointer. 

This neglects all sorts of fun details (threading, garbage collection of 
old function implementations), but hopefully it gives you a place to 
start.  


Regarding laziness, as Hal mentioned you'll have to provide some target 
support for PowerPC to support lazy compilation. For a rough guide you can 
check out the X86_64 support code in 
llvm/include/llvm/ExecutionEngine/Orc/OrcTargetSupport.h and 
llvm/lib/ExecutionEngine/Orc/OrcTargetSupport.cpp. 

There are two methods that you'll need to implement: 
insertCompileCallbackTrampoline and insertResolverBlock. These work 
together to enable lazy compilation. Both of these methods inject blobs of 
target specific code in to the JIT process. To do this (at least for now) 
I make use of a handy feature of LLVM IR: You can write raw assembly code 
directly into a bitcode module ("module-level asm"). If you look at
the
X86 implementation of each of these methods you'll see they're written
in
terms of string-streams building up a string of assembly which will be 
handed off to the JIT to compile like any other code. 

The first blob that you need to be able to output is the resolver block. 
The purpose of the resolver block is to save program state and call back 
in to the JIT to trigger lazy compilation of a function. When the JIT is 
done compiling the function it returns the address of the compiled 
function to the resolver block, and the resolver block returns to the 
compiled function (rather than its original return address). 

Because all functions share the same resolver block, the JIT needs some 
way to distinguish them, which is where the trampolines come in. The JIT 
emits one trampoline per function and each trampoline just calls the 
resolver block. The return address of the call in each trampoline provides 
the unique address that the JIT associates with the to-be-compiled 
functions. The CompileCallbackManager manages this association between 
trampolines and functions for you; you just need to provide the 
resolver/trampoline primitives. 
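
The bookkeeping described above can be sketched in plain C++ without any LLVM dependency. This is only an illustration of the idea, not the CompileCallbackManager implementation: the class name CallbackTable, its methods, and the use of integer addresses are all hypothetical. Each trampoline's return address is the key, and the value is a compile action that yields the address of the compiled function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the association between trampolines and
// to-be-compiled functions. Addresses are modelled as plain integers.
class CallbackTable {
public:
  using CompileAction = std::function<std::uintptr_t()>;

  // Associate the return address of a trampoline's call with the action
  // that compiles the corresponding function.
  void registerTrampoline(std::uintptr_t returnAddr, CompileAction compile) {
    assert(compile && "compile action must be callable");
    pending_[returnAddr] = std::move(compile);
  }

  // Called (conceptually) from the resolver block: look up the function by
  // the trampoline return address, compile it once, and return its address.
  // Returns 0 for an unknown or already-resolved trampoline.
  std::uintptr_t resolve(std::uintptr_t returnAddr) {
    auto it = pending_.find(returnAddr);
    if (it == pending_.end())
      return 0;
    CompileAction compile = std::move(it->second);
    pending_.erase(it);
    return compile();
  }

  std::size_t pendingCount() const { return pending_.size(); }

private:
  std::unordered_map<std::uintptr_t, CompileAction> pending_;
};
```

In a real JIT the resolver block would pass the saved return address to something playing the role of resolve() and then jump to the address it returns.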

In case it helps, here's what the output of all this looks like on X86. 
Trampolines are trivial - they're emitted in blocks and preceded by a 
pointer to the resolver block: 

module asm "Lorc_resolve_block_addr:" 
module asm "  .quad 140439143575560" 
module asm "orc_jcc_0:" 
module asm "  callq *Lorc_resolve_block_addr(%rip)" 
module asm "orc_jcc_1:" 
module asm "  callq *Lorc_resolve_block_addr(%rip)" 
module asm "orc_jcc_2:" 
module asm "  callq *Lorc_resolve_block_addr(%rip)" 
... 
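
As a rough sketch of the string-stream approach, the assembly text inside the listing above could be produced by a helper like the one below. The function name buildTrampolineAsm and its parameters are made up for illustration; only the labels and instructions are taken from the listing. In a real implementation the resulting string would then be attached to a module as module-level asm (LLVM's Module class has appendModuleInlineAsm for that, as far as I know).

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: builds the text of one trampoline block.
// resolverAddr is the address of the resolver block, numTrampolines is the
// number of orc_jcc_N entries to emit.
std::string buildTrampolineAsm(std::uint64_t resolverAddr,
                               unsigned numTrampolines) {
  assert(numTrampolines > 0 && "a block needs at least one trampoline");
  std::ostringstream os;
  // The block starts with a pointer to the resolver block...
  os << "Lorc_resolve_block_addr:\n"
     << "  .quad " << resolverAddr << "\n";
  // ...followed by one indirect call per trampoline.
  for (unsigned i = 0; i < numTrampolines; ++i) {
    os << "orc_jcc_" << i << ":\n"
       << "  callq *Lorc_resolve_block_addr(%rip)\n";
  }
  return os.str();
}
```

A PowerPC port would keep the same structure and swap in the target's own data directive and indirect-call sequence.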


The resolver block is more complicated and I won't provide the full code 
for it here. You can find it by running: 
lli -jit-kind=orc-lazy -orc-lazy-debug=mods-to-stderr <hello_world.ll>

and looking at the initial output. In pseudo-asm though, it looks like 
this: 

module asm "jit_callback_manager_addr:" 
module asm "  .quad 0x46fc190"   // <- address of callback manager object
module asm "orc_resolver_block:" 
module asm "  // save register state" 
module asm "  // load jit_callback_manager_addr into %rdi" 
module asm "  // load the return address (from the trampoline call) into %rsi" 
module asm "  // %rax = call jit(%rdi, %rsi)" 
module asm "  // save %rax over the return address" 
module asm "  // restore register state" 
module asm "  // retq" 

So, that's a whirlwind intro to implementing lazy JITing support for a new 
architecture in Orc. I'll try to answer any questions you have on the 
topic, though I'm not familiar with PowerPC at all. If you're comfortable 
with PowerPC assembly I think it should be possible to implement once you 
grok the concepts. 

Hope this helps! 

Cheers, 
Lang. 



Revital1 Eres via llvm-dev

2015-Oct-20 13:33 UTC

[llvm-dev] [LLVMdev] Help with using LLVM to re-compile hot functions at run-time

Hello Lang,

Thanks again for the new version!
I have a question: when I try to compile it by creating a new directory 
in llvm/examples/Kaleidoscope/Orc/, similar to the fully_lazy directory, 
I get the following link error. I would appreciate your help in resolving 
it. (I tried linking with LLVMipo, without success.)

Thanks again,
Revital

Linking CXX executable ../../../../bin/Kaleidoscope-Orc-fully_lazy_with_recompile
CMakeFiles/Kaleidoscope-Orc-fully_lazy_with_recompile.dir/toy.cpp.o: In function `KaleidoscopeJIT::KaleidoscopeJIT(SessionContext&)::{lambda(std::unique_ptr<llvm::Module, std::default_delete<llvm::Module> >)#1}::operator()(std::unique_ptr<llvm::Module, std::default_delete<llvm::Module> >) const':
llvm/examples/Kaleidoscope/Orc/fully_lazy_with_recompile/toy.cpp:1184: undefined reference to `llvm::PassManagerBuilder::PassManagerBuilder()'
llvm/examples/Kaleidoscope/Orc/fully_lazy_with_recompile/toy.cpp:1187: undefined reference to `llvm::PassManagerBuilder::populateFunctionPassManager(llvm::legacy::FunctionPassManager&)'
llvm/examples/Kaleidoscope/Orc/fully_lazy_with_recompile/toy.cpp:1191: undefined reference to `llvm::PassManagerBuilder::~PassManagerBuilder()'
collect2: error: ld returned 1 exit status
gmake[3]: *** [bin/Kaleidoscope-Orc-fully_lazy_with_recompile] Error 1
gmake[2]: *** [examples/Kaleidoscope/Orc/fully_lazy_with_recompile/CMakeFiles/Kaleidoscope-Orc-fully_lazy_with_recompile.dir/all] Error 2
gmake[1]: *** [examples/Kaleidoscope/CMakeFiles/Kaleidoscope.dir/rule] Error 2





From:   Lang Hames <lhames at gmail.com>
To:     Revital1 Eres/Haifa/IBM at IBMIL, LLVM Developers Mailing List 
<llvm-dev at lists.llvm.org>
Date:   18/09/2015 09:47 AM
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital,

Attached is a new version of the fully_lazy Orc Kaleidoscope demo that has 
been extended to enable re-compilation at higher optimisation levels, 
roughly following the scheme I outlined before.

In the compile action for the callback, the initial IR for each function 
is transformed like this:


                           unsigned foo_counter = 0;
void foo$impl() {          void foo$impl() { 
  // foo body        ->      if (++foo_counter > 1000) { 
}                              auto fooOpt = $recompile(&foo); 
                               fooOpt(); 
                             } 
                             // foo body 
                           }

The key changes to make this work (which you can see by diff'ing toy.cpp 
against the original fully_lazy version):

1) New layers HotCompileLayer and HotIROptsLayer added. These perform IR 
optimisation and code generation at higher optimisation levels than the 
default layers.
2) The symbol resolver function (not to be confused with the resolver 
block) has been pulled out into its own function, createResolver, so that 
it can be shared between optimised and non-optimised code. It also resolves 
the "$recompile" function to a static method on the KaleidoscopeJIT class 
itself.
3) The lazy compile action now calls 'instrumentFunctions' before adding 
the IR for cold functions to the JIT.
4) The instrumentFunctions method injects the counter code and call to 
recompile.
5) The recompileHot method re-IRGens functions, then adds them to the 
HotIROpts layer to generate more optimized versions. It then updates the 
function-body pointer so that subsequent calls go to the optimised 
version.
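
For readers without the attachment, here is a minimal stand-alone C++ sketch of the counter-and-pointer-update scheme, with no LLVM involved. It is not the code from toy.cpp: foo_cold, foo_hot, recompile_foo, call_foo and the small threshold are hypothetical stand-ins for the instrumented body, the re-JIT'd body, the "$recompile" entry point, the indirect call, and the 1000-call trigger. It also assumes the instrumented body returns the optimised version's result rather than falling through to the old body.

```cpp
#include <cassert>

using FooFn = int (*)(int);

// Hypothetical threshold; the demo described above uses 1000.
constexpr unsigned kHotThreshold = 3;

int foo_cold(int x);  // instrumented, unoptimised body
int foo_hot(int x);   // stands in for the re-JIT'd, optimised body

// All calls to foo go through this pointer, mirroring the function-pointer
// indirection used by the lazy JIT.
FooFn foo_ptr = foo_cold;
unsigned foo_counter = 0;
unsigned hot_calls = 0;  // only here so the switch can be observed

// Stand-in for "$recompile": a real JIT would compile foo at a higher
// optimisation level here; this sketch just swaps in foo_hot.
FooFn recompile_foo() {
  foo_ptr = foo_hot;  // subsequent calls go to the optimised version
  return foo_ptr;
}

int foo_hot(int x) {
  ++hot_calls;
  return x * 2;
}

int foo_cold(int x) {
  if (++foo_counter > kHotThreshold) {
    FooFn opt = recompile_foo();
    return opt(x);  // assumption: hand this call over to the new body
  }
  return x + x;  // original foo body
}

// Callers never name foo_cold or foo_hot directly.
int call_foo(int x) {
  assert(foo_ptr != nullptr);
  return foo_ptr(x);
}
```

As noted elsewhere in the thread, this ignores threading and the reclamation of old function bodies.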
 
This is a bit quick-and-dirty, but does work. In the future I'll try to 
tidy this up and turn it into a new tutorial chapter.

Hope this helps!

Cheers,
Lang.




On Wed, Sep 16, 2015 at 10:09 PM, Revital1 Eres <ERES at il.ibm.com> wrote:
Hi Lang, 

Many thanks!!! I just wanted to make sure you did not miss it... 

Thanks again! 
Revital 



From:        Lang Hames <lhames at gmail.com> 
To:        Revital1 Eres/Haifa/IBM at IBMIL 
Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu> 
Date:        17/09/2015 01:56 AM 
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time 



Hi Revital, 

Apologies for the delayed reply. 

I'm working on some example code for how to do this. I'll try to post it
tomorrow. 

Cheers, 
Lang. 

On Tue, Sep 8, 2015 at 12:23 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
Hi Lang, 

After spending some time debugging the Kaleidoscope Orc fully_lazy toy 
example on x86, I want to start implementing the run-time optimizer as you 
suggested, and again I highly appreciate your help. 
For now I'll defer the target-specific implementation until the 
target-independent parts are in place, since I can run on x86 as a start. 
Given a simple example of a main function calling functions foo and bar, 
IIUC I should start from the IR of this module, which means that 
ParseIRFile will first be called on the IR of the program. Is that right? 

I would like to make sure I understand your suggestion, which is to insert 
a new layer on top of the CompileCallbackLayer in order to be able to 
check trigger_condition at the beginning of a function. 
IIUC, until a function (foo or bar) is optimized, calls to it will go 
through the resolver (foo and bar will not be compiled from scratch every 
time we go through the resolver; rather, the cached non-optimized version 
is executed once it has been compiled). The resolver will check 
trigger_condition to see whether the cached non-optimized version should 
be executed or a new optimized version should be compiled and executed. 
Once trigger_condition is true, foo and bar will be compiled to generate 
their optimized versions, and these will be executed directly from then on 
(not going through the resolver any more). Is that right? 
Should this layer on top of the CompileCallbackLayer be similar to 
class KaleidoscopeJIT? 
I saw that in the Kaleidoscope Orc example the lambda functions added in 
createLambdaResolver are executed by the resolver before compiling a call, 
so I assume that trigger_condition should also be added via 
createLambdaResolver, so that before compiling foo or bar the lambda 
functions that contain trigger_condition will be executed. Is that right? 

IIUC, in the Kaleidoscope Orc example the interpreter calls addModule 
upon parsing a call expression in HandleTopLevelExpression. 
In my case I assume addModule should be called for the module returned 
from ParseIRFile, right? 
In that case, will calling getAddress on the whole module (the IR of all 
functions) trigger the lambda functions defined in createLambdaResolver 
for foo and bar? Also, in the Kaleidoscope Orc example the function is 
executed explicitly in HandleTopLevelExpression after calling getAddress, 
and it's not clear to me where I should insert this in my case. 

Thanks again, 
Revital 





Revital1 Eres via llvm-dev

2015-Nov-04 07:37 UTC

[llvm-dev] [LLVMdev] Help with using LLVM to re-compile hot functions at run-time

Hello Lang,

I want to use the lazy recompilation program you posted to compile an 
input program's IR (rather than processing the input with the interpreter, 
as is done in the example).
To do that I called the addModule function on the module returned from 
parseInputIR, as was done with the other functions in the Kaleidoscope 
examples. 
Now, to start the codegen I am using getAddress, and at this point I was 
expecting to see a call to the lambda resolver defined in createResolver, 
but I did not see it happen. I would appreciate your help in understanding 
why.

Here is a snippet of my additions to the new version of the fully_lazy 
Orc Kaleidoscope example.

Thanks again,
Revital

  SessionContext S(getGlobalContext());
  KaleidoscopeJIT J(S);

  cl::ParseCommandLineOptions(argc, argv,
                              "Kaleidoscope example program\n");

  std::unique_ptr<Module> M;
  if (!InputIR.empty()) {
    M = parseInputIR(InputIR);
    auto H = J.addModule(std::move(M));
    char ModID[256];
    // snprintf bounds the write in case the input path is long.
    snprintf(ModID, sizeof(ModID), "IR:%s", InputIR.c_str());
    auto ExprSymbol = J.findUnmangledSymbolIn(H, ModID);
    double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
    std::cerr << "Evaluated to " << FP() << "\n";
    J.removeModule(H);
  }
 



From:        Lang Hames <lhames at gmail.com> 
To:        Revital1 Eres/Haifa/IBM at IBMIL 
Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu> 
Date:        28/07/2015 05:58 AM 
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time 



Hi Revital, 

What do you mean by "code cache"? Orc (and MCJIT) does have the concept of 
an ObjectCache, which is a long-lived, potentially persistent, compiled 
version of some IR. It's not a key component of the JIT though: Most 
clients run without a cache attached and just JIT their code from scratch 
in each session. 

Recompilation is orthogonal to caching. There is no in-tree support for 
recompilation yet. There are several ways that it could be supported, 
depending on what security / performance trade-offs you're willing to 
make, and how deep in to the LLVM code you want to get. As things stand at 
the moment all function calls in the lazy JIT are indirected via function 
pointers. We want to add support for patchable call-sites, but this hasn't 
been implemented yet. The Indirect calls make recompilation reasonably 
easy: You could add a transform layer on top of the CompileCallbackLayer 
which would modify each function like this: 

void foo$impl() {          void foo$impl() { 
  // foo body        ->      if (trigger_condition) { 
}                              auto fooOpt = jit_recompile_hot(&foo); 
                               fooOpt(); 
                             } 
                             // foo body 
                           } 

You would implement the jit_recompile_hot function yourself in your JIT 
and make it available to JIT'd code via the SymbolResolver. When the 
trigger condition is met you'll get a call to recompile foo, at which 
point you: (1) Add the IR for foo to a 2nd IRCompileLayer that has been 
configured with a higher optimization level, (2) look up the address of 
the optimized version of foo, and (3) update the function pointer for foo 
to point at the optimized version. The process for patchable callsites 
should be fairly similar once they're available, except that you'll 
trigger a call-site update rather than rewriting a function pointer. 

This neglects all sorts of fun details (threading, garbage collection of 
old function implementations), but hopefully it gives you a place to 
start.  


Regarding laziness, as Hal mentioned you'll have to provide some target 
support for PowerPC to support lazy compilation. For a rough guide you can 
check out the X86_64 support code in 
llvm/include/llvm/ExecutionEngine/Orc/OrcTargetSupport.h and 
llvm/lib/ExecutionEngine/Orc/OrcTargetSupport.cpp. 

There are two methods that you'll need to implement:
insertCompileCallbackTrampoline and insertResolverBlock. These work
together to enable lazy compilation. Both of these methods inject blobs of
target-specific code into the JIT process. To do this (at least for now)
I make use of a handy feature of LLVM IR: You can write raw assembly code
directly into a bitcode module ("module-level asm"). If you look at the
X86 implementation of each of these methods you'll see they're written in
terms of string-streams building up a string of assembly which will be
handed off to the JIT to compile like any other code.

The first blob that you need to be able to output is the resolver block.
The purpose of the resolver block is to save program state and call back
into the JIT to trigger lazy compilation of a function. When the JIT is
done compiling the function it returns the address of the compiled
function to the resolver block, and the resolver block returns to the
compiled function (rather than its original return address).

Because all functions share the same resolver block, the JIT needs some 
way to distinguish them, which is where the trampolines come in. The JIT 
emits one trampoline per function and each trampoline just calls the 
resolver block. The return address of the call in each trampoline provides 
the unique address that the JIT associates with the to-be-compiled 
functions. The CompileCallbackManager manages this association between
trampolines and functions for you; you just need to provide the
resolver/trampoline primitives.
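
To make the trampoline/function association concrete, here is a small model of the bookkeeping described above. It is only a sketch and not the Orc API: CallbackManagerSketch, reserveTrampoline and executeCompileCallback are hypothetical names, trampoline addresses are plain integers, and the "compile action" is just a callable that returns the address of the compiled function.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

using TargetAddress = std::uint64_t;

// Hypothetical model of the association a callback manager maintains.
class CallbackManagerSketch {
public:
  // Hand out the next trampoline and remember which compile action it stands for.
  TargetAddress reserveTrampoline(std::function<TargetAddress()> compile) {
    TargetAddress addr = nextTrampoline_;
    nextTrampoline_ += kTrampolineSize;
    actions_[addr] = std::move(compile);
    return addr;
  }

  // What the shared resolver block would call: the trampoline address
  // identifies the function, and the result is the address to continue at.
  TargetAddress executeCompileCallback(TargetAddress trampolineAddr) {
    auto it = actions_.find(trampolineAddr);
    assert(it != actions_.end() && "unknown trampoline address");
    TargetAddress compiled = it->second();
    actions_.erase(it);  // each callback is expected to fire once
    return compiled;
  }

private:
  static constexpr TargetAddress kTrampolineSize = 8;  // arbitrary for the model
  TargetAddress nextTrampoline_ = 0x1000;              // arbitrary base address
  std::unordered_map<TargetAddress, std::function<TargetAddress()>> actions_;
};
```

In the real scheme the key would be derived from the return address pushed by the call instruction in the trampoline, and the value returned to the resolver block is what it writes over that return address.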

In case it helps, here's what the output of all this looks like on X86.
Trampolines are trivial - they're emitted in blocks and preceded by a
pointer to the resolver block:

module asm "Lorc_resolve_block_addr:" 
module asm "  .quad 140439143575560" 
module asm "orc_jcc_0:" 
module asm "  callq *Lorc_resolve_block_addr(%rip)" 
module asm "orc_jcc_1:" 
module asm "  callq *Lorc_resolve_block_addr(%rip)" 
module asm "orc_jcc_2:" 
module asm "  callq *Lorc_resolve_block_addr(%rip)" 
... 
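
For illustration, a block like the one above can be produced with an ordinary string stream. The helper below is a hypothetical sketch (its name and parameters are not taken from OrcTargetSupport); it only builds the assembly text, using the label names from the X86 output shown above, and leaves attaching that text to a module as module-level asm to the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: emit one pointer to the resolver block followed by
// numTrampolines trampolines that call through it.
std::string buildTrampolineBlockAsm(std::uint64_t resolverBlockAddr,
                                    unsigned numTrampolines,
                                    unsigned firstIndex = 0) {
  assert(numTrampolines > 0 && "need at least one trampoline");
  std::ostringstream os;
  os << "Lorc_resolve_block_addr:\n"
     << "  .quad " << resolverBlockAddr << "\n";
  for (unsigned i = 0; i < numTrampolines; ++i) {
    os << "orc_jcc_" << (firstIndex + i) << ":\n"
       << "  callq *Lorc_resolve_block_addr(%rip)\n";
  }
  return os.str();
}
```

A PowerPC port would keep the same structure and swap in the target's own data directive and call sequence.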


The resolver block is more complicated and I won't provide the full code
for it here. You can find it by running:

lli -jit-kind=orc-lazy -orc-lazy-debug=mods-to-stderr <hello_world.ll>

and looking at the initial output. In pseudo-asm though, it looks like
this:

module asm "jit_callback_manager_addr:"
module asm "  .quad 0x46fc190"   // <- address of callback manager object
module asm "orc_resolver_block:"
module asm "  // save register state"
module asm "  // load jit_callback_manager_addr into %rdi"
module asm "  // load the return address (from the trampoline call) into %rsi"
module asm "  // %rax = call jit(%rdi, %rsi)"
module asm "  // save %rax over the return address"
module asm "  // restore register state"
module asm "  // retq"

So, that's a whirlwind intro to implementing lazy JITing support for a new
architecture in Orc. I'll try to answer any questions you have on the
topic, though I'm not familiar with PowerPC at all. If you're comfortable
with PowerPC assembly I think it should be possible to implement once you
grok the concepts.

Hope this helps! 

Cheers, 
Lang. 



Lang Hames via llvm-dev

2015-Nov-10 16:30 UTC

[llvm-dev] [LLVMdev] Help with using LLVM to re-compile hot functions at run-time

Hi Revital,

Apologies for the delayed reply - I'm traveling at the moment and not able
to check my email often.

You will only see a callback on the resolver for symbols that are external
to the module. What did the IR that you added look like?
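
A toy model of that rule, for illustration only: symbols a module defines itself are bound without consulting the resolver, and only names the module does not define reach the resolver callback. ModuleSketch and LinkerSketch are hypothetical types, not Orc classes, and addresses are plain integers.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

using Address = std::uint64_t;

// Hypothetical stand-in for a module: just the symbols it defines.
struct ModuleSketch {
  std::unordered_map<std::string, Address> defined;
};

// Hypothetical stand-in for the linking step of a JIT.
class LinkerSketch {
public:
  explicit LinkerSketch(std::function<Address(const std::string &)> resolver)
      : resolver_(std::move(resolver)) {
    assert(resolver_ && "an external-symbol resolver is required");
  }

  Address lookup(const ModuleSketch &m, const std::string &name) {
    auto it = m.defined.find(name);
    if (it != m.defined.end())
      return it->second;     // defined in the module: no resolver callback
    ++resolverCalls_;
    return resolver_(name);  // external symbol: ask the resolver
  }

  unsigned resolverCalls() const { return resolverCalls_; }

private:
  std::function<Address(const std::string &)> resolver_;
  unsigned resolverCalls_ = 0;
};
```

Under this model, a module in which main only calls functions defined in the same module never triggers the callback, so a breakpoint in the resolver lambda is not hit.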

Cheers,
Lang.


Revital1 Eres via llvm-dev

2015-Nov-11 10:44 UTC

[llvm-dev] [LLVMdev] Help with using LLVM to re-compile hot functions at run-time

Hi Lang,

Thanks for your reply!

The program I'm compiling is the toy program below, which is compiled
with -fno-inline to avoid inlining foo into main.

In the fully_lazy_with_recompile code I've added the following statements.
When running the code with gdb I do not see it break in the lambda
resolver, as described in my previous mail.

 auto ExprSymbol = J.findUnmangledSymbolIn(H,"main");
 double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
 std::cerr << "Evaluated to " << FP() << "\n";

Btw, another issue I need to resolve: some of the parameters were
originally read from the command line using argv, but due to the following
error I avoided that for now (I also got a similar error regarding
ZNSt8ios_base4InitC1Ev when using prints):

LLVM ERROR: Program used external function 'atoi' which could not be resolved!

Thanks again,
Revital

#define ITERS 1000000
int arr[ITERS];

int
foo (int x, int y)
{
  int res = 950;
  if (x > 3 && y < 77)
    res = 97;
  else
    res = res * x;
  return res;
}

int
main ()
{
  int x = 880;
  int num = 990;
  int i, j;
  int b = 0;

  for (i = 0; i < ITERS; i++)
    arr[i] = i;

  for (j = 0; j < num; j++)
    for (i = 0; i < ITERS; i++)
      {
        b += foo (x, arr[i]) /2;
      }
  return 0;
}



On Wed, Nov 4, 2015 at 8:37 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
Hello Lang,

I want to use the lazy recompilation program you posted to compile an
input program's IR (rather than processing the input with the interpreter,
as is done in the example).
To do that I called the addModule function on the module returned from
parseInputIR, as was done with the other functions in the Kaleidoscope
examples.
Now, to start the codegen I am using getAddress, and at this point I was
expecting to see a call to the lambda resolver defined in createResolver,
but I did not see it happen. I would appreciate your help in understanding
why.

Here is a snippet from my additions to the new version of the fully_lazy 
Orc Kaleidoscope.

Thanks again,
Revital

  SessionContext S(getGlobalContext());
  KaleidoscopeJIT J(S);

  cl::ParseCommandLineOptions(argc, argv,
                              "Kaleidoscope example program\n");

  std::unique_ptr<Module> M;
  if (!InputIR.empty()) {
    M = parseInputIR(InputIR);
    auto H = J.addModule(std::move(M));
    char ModID[256];
    sprintf(ModID, "IR:%s", InputIR.c_str());
    auto ExprSymbol = J.findUnmangledSymbolIn(H, ModID);
    double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
    std::cerr << "Evaluated to " << FP() << "\n";
    J.removeModule(H);
  }
               



From:        Lang Hames <lhames at gmail.com>
To:        Revital1 Eres/Haifa/IBM at IBMIL, LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
Date:        18/09/2015 09:47 AM
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital,

Attached is a new version of the fully_lazy Orc Kaleidoscope demo that has 
been extended to enable re-compilation at higher optimisation levels, 
roughly following the scheme I outlined before.

In the compile action for the callback, the initial IR for each function
is transformed like this:


                           unsigned foo_counter = 0;
void foo$impl() {          void foo$impl() { 
  // foo body        ->      if (++foo_counter > 1000) { 
}                              auto fooOpt = $recompile(&foo); 
                               fooOpt(); 
                             } 
                             // foo body 
                           }

The key changes to make this work (which you can see by diff'ing toy.cpp 
against the original fully_lazy version):

1) New layers HotCompileLayer and HotIROptsLayer added. These perform IR 
optimisation and code generation at higher optimisation levels than the 
default layers.
2) The symbol resolver function (not to be confused with the resolver
block) has been pulled out into its own function, createResolver, so that
it can be shared between optimised & non-optimized code. It also resolves
the "$recompile" function to a static method on the KaleidoscopeJIT class
itself.
3) The lazy compile action now calls 'instrumentFunctions' before adding
the IR for cold functions to the JIT.
4) The instrumentFunctions method injects the counter code and call to 
recompile.
5) The recompileHot method re-IRGens functions, then adds them to the 
HotIROpts layer to generate more optimized versions. It then updates the 
function-body pointer so that subsequent calls go to the optimised 
version.
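
The bookkeeping behind changes 3) to 5) can be modelled without LLVM roughly as follows. This is a sketch under stated assumptions, not the code from the attached demo: HotSwapJITSketch, addFunction and call are hypothetical names, the "hot" body is supplied up front instead of being generated by HotIROptsLayer and HotCompileLayer, and the counter lives in a table rather than in injected IR.

```cpp
#include <cassert>
#include <map>
#include <string>

using Body = double (*)();

// Hypothetical model of per-function counters plus a swappable body pointer.
class HotSwapJITSketch {
public:
  static constexpr unsigned kHotThreshold = 1000;  // as in the pseudocode above

  // Register a function with its cold body and the body the hot layers
  // would eventually produce.
  void addFunction(const std::string &name, Body cold, Body hot) {
    assert(cold && hot);
    table_[name] = Entry{cold, hot, 0};
  }

  // Models the instrumented entry point: bump the counter, recompile once the
  // threshold is passed, then dispatch through the function-body pointer.
  double call(const std::string &name) {
    Entry &e = table_.at(name);
    if (e.body != e.hot && ++e.counter > kHotThreshold)
      recompileHot(e);
    return e.body();
  }

  unsigned recompileCount() const { return recompiles_; }

private:
  struct Entry {
    Body body;         // what callers currently reach
    Body hot;          // stand-in for the output of the hot layers
    unsigned counter;  // stand-in for the injected foo_counter
  };

  // Stand-in for recompileHot: update the pointer so later calls go to the
  // optimised version.
  void recompileHot(Entry &e) {
    e.body = e.hot;
    ++recompiles_;
  }

  std::map<std::string, Entry> table_;
  unsigned recompiles_ = 0;
};
```

Once the pointer has been updated the counter check is skipped, so each function is recompiled at most once in this model.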
 
This is a bit quick-and-dirty, but does work. In the future I'll try to 
tidy this up and turn it into a new tutorial chapter.

Hope this helps!

Cheers,
Lang.




On Wed, Sep 16, 2015 at 10:09 PM, Revital1 Eres <ERES at il.ibm.com> wrote:
Hi Lang, 

Many thanks!!! I just wanted to make sure you did not miss it...

Thanks again! 
Revital 



From:        Lang Hames <lhames at gmail.com>
To:        Revital1 Eres/Haifa/IBM at IBMIL 
Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
Date:        17/09/2015 01:56 AM 
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital, 

Apologies for the delayed reply. 

I'm working on some example code for how to do this. I'll try to post it
tomorrow. 

Cheers, 
Lang. 

On Tue, Sep 8, 2015 at 12:23 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
Hi Lang, 

After spending some time debugging the Kaleidoscope Orc fully_lazy toy
example on x86, I want to start implementing the run-time optimizer as you
suggested, and again I highly appreciate your help.
For now I'll defer the target-specific implementation until the non-target
parts are in place, since I can run on x86 as a start.
Given a simple example of a main function calling foo and bar functions:
IIUC I should start from the IR level of this module, which means that
ParseIRFile will first be called on the IR of the program, is that right?

I would like to make sure I understand your suggestion, which is to insert
a new layer on top of the CompileCallbackLayer in order to be able to call
trigger_condition at the beginning of a function.
IIUC, until the function (bar or foo) is optimized, calls to foo and bar
will go through the resolver (foo and bar will not be compiled from scratch
every time we go through the resolver, but will rather execute the cached
non-optimized version once it has been compiled). The resolver will check
trigger_condition to see whether the cached non-optimized version should be
executed or a new optimized version should be compiled and executed.
Once trigger_condition is true, foo and bar will be compiled to generate
their optimized versions, and these versions will be executed directly from
then on (not going through the resolver any more). Is that right?
Should this layer on top of the CompileCallbackLayer be similar to
class KaleidoscopeJIT?
I saw that in the Kaleidoscope Orc example the lambda functions that are
added in createLambdaResolver are executed by the resolver before compiling
a call, so I assume that trigger_condition should also be added via
createLambdaResolver, so that before compiling foo or bar the lambda
functions that are added by calling createLambdaResolver and contain
trigger_condition will be executed, is that right?

IIUC in the Kaleidoscope Orc example the interpreter calls addModule upon
parsing a call expression in HandleTopLevelExpression.
In my case I assume addModule should be called for the module returned from
ParseIRFile, right?
In this case, will calling getAddress on the whole module (the IR of all
functions) trigger calling the lambda functions defined in
createLambdaResolver on the foo and bar functions? Also, in the Kaleidoscope
Orc example the execution of the function is done explicitly in
HandleTopLevelExpression after calling getAddress, and it's not clear to me
where I should insert this in my case.

Thanks again, 
Revital 




From:        Lang Hames <lhames at gmail.com>
To:        Revital1 Eres/Haifa/IBM at IBMIL 
Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
Date:        28/07/2015 05:58 AM 
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital, 

What do you mean by "code cache"? Orc (and MCJIT) does have the concept of
an ObjectCache, which is a long-lived, potentially persistent, compiled
version of some IR. It's not a key component of the JIT though: Most
clients run without a cache attached and just JIT their code from scratch
in each session.

Recompilation is orthogonal to caching. There is no in-tree support for
recompilation yet. There are several ways that it could be supported,
depending on what security / performance trade-offs you're willing to
make, and how deep into the LLVM code you want to get. As things stand at
the moment all function calls in the lazy JIT are indirected via function
pointers. We want to add support for patchable call-sites, but this hasn't
been implemented yet. The indirect calls make recompilation reasonably
easy: You could add a transform layer on top of the CompileCallbackLayer
which would modify each function like this:

void foo$impl() {          void foo$impl() {
  // foo body        ->      if (trigger_condition) { 
}                              auto fooOpt = jit_recompile_hot(&foo);
                               fooOpt(); 
                             } 
                             // foo body 
                           } 

You would implement the jit_recompile_hot function yourself in your JIT 
and make it available to JIT'd code via the SymbolResolver. When the 
trigger condition is met you'll get a call to recompile foo, at which 
point you: (1) Add the IR for foo to a 2nd IRCompileLayer that has been 
configured with a higher optimization level, (2) look up the address of 
the optimized version of foo, and (3) update the function pointer for foo 
to point at the optimized version. The process for patchable callsites 
should be fairly similar once they're available, except that you'll 
trigger a call-site update rather than rewriting a function pointer. 

This neglects all sorts of fun details (threading, garbage collection of 
old function implementations), but hopefully it gives you a place to 
start.  
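
On the threading detail: one possible way to make step (3) safe when
several threads call foo, sketched here in plain C++ under the assumption
that every call goes through a single pointer slot. The names foo_slot,
publish, body_v1 and body_v2 are hypothetical, and reclaiming the old
function's memory is deliberately left out.

```cpp
#include <atomic>
#include <cassert>

using BodyFn = int (*)(int);

int body_v1(int x) { return x + 1; }  // unoptimised version
int body_v2(int x) { return 1 + x; }  // stands in for the optimised version

// One slot per function; callers load it on every call.
std::atomic<BodyFn> foo_slot{body_v1};

int call_foo(int x) {
  // The acquire load pairs with the release half of the exchange in
  // publish(), so a thread that observes the new pointer also observes
  // the writes made before it was published.
  return foo_slot.load(std::memory_order_acquire)(x);
}

// Installs NewBody only if the slot still holds Expected. Returns true for
// the single thread that wins the race, so only one recompilation result
// is ever published for a given old body.
bool publish(BodyFn Expected, BodyFn NewBody) {
  assert(NewBody != nullptr);
  return foo_slot.compare_exchange_strong(Expected, NewBody,
                                          std::memory_order_acq_rel,
                                          std::memory_order_acquire);
}
```

A thread that loses the race simply keeps calling through the slot and
picks up the winner's version. Freeing body_v1 is the harder problem,
since another thread may still be executing inside it.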


Regarding laziness, as Hal mentioned you'll have to provide some target 
support for PowerPC to support lazy compilation. For a rough guide you can 
check out the X86_64 support code in 
llvm/include/llvm/ExecutionEngine/Orc/OrcTargetSupport.h and 
llvm/lib/ExecutionEngine/Orc/OrcTargetSupport.cpp. 

There are two methods that you'll need to implement:
insertCompileCallbackTrampoline and insertResolverBlock. These work
together to enable lazy compilation. Both of these methods inject blobs of
target-specific code into the JIT process. To do this (at least for now)
I make use of a handy feature of LLVM IR: You can write raw assembly code
directly into a bitcode module ("module-level asm"). If you look at the
X86 implementation of each of these methods you'll see they're written in
terms of string-streams building up a string of assembly which will be
handed off to the JIT to compile like any other code.

The first blob that you need to be able to output is the resolver block. 
The purpose of the resolver block is to save program state and call back 
in to the JIT to trigger lazy compilation of a function. When the JIT is 
done compiling the function it returns the address of the compiled 
function to the resolver block, and the resolver block returns to the 
compiled function (rather than its original return address). 

Because all functions share the same resolver block, the JIT needs some 
way to distinguish them, which is where the trampolines come in. The JIT 
emits one trampoline per function and each trampoline just calls the 
resolver block. The return address of the call in each trampoline provides 
the unique address that the JIT associates with the to-be-compiled 
functions. The CompileCallbackManager manages this association between 
trampolines and functions for you, you just need to provide the 
resolver/trampoline primitives. 
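
The bookkeeping described above can be modelled in a few lines of plain
C++. This is a toy sketch, not the Orc CompileCallbackManager: the class
name, its methods, and the use of plain integers as addresses are all
assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Toy model: addresses are plain integers rather than real code pointers.
using Addr = std::uint64_t;

class ToyCallbackManager {
public:
  // Associates a trampoline's address with the action that compiles the
  // function behind it.
  void setCompileAction(Addr Trampoline, std::function<Addr()> Compile) {
    Actions[Trampoline] = std::move(Compile);
  }

  // What the shared resolver block would call, passing the address that
  // identifies the trampoline it was entered from.
  Addr executeCompileCallback(Addr Trampoline) {
    auto It = Actions.find(Trampoline);
    assert(It != Actions.end() && "unknown trampoline");
    Addr Body = It->second();  // compile the function now...
    Actions.erase(It);         // ...and only once
    return Body;               // the resolver block then jumps here
  }

private:
  std::unordered_map<Addr, std::function<Addr()>> Actions;
};
```

All trampolines funnel into the same entry point, and the key passed in
is the only thing that tells the manager which function to compile.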

In case it helps, here's what the output of all this looks like on X86.
Trampolines are trivial - they're emitted in blocks and preceded by a
pointer to the resolver block:

module asm "Lorc_resolve_block_addr:"
module asm "  .quad 140439143575560"
module asm "orc_jcc_0:" 
module asm "  callq *Lorc_resolve_block_addr(%rip)"
module asm "orc_jcc_1:" 
module asm "  callq *Lorc_resolve_block_addr(%rip)"
module asm "orc_jcc_2:" 
module asm "  callq *Lorc_resolve_block_addr(%rip)"
... 
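
For a feel of the string-stream approach, here is a small sketch that
builds the assembly text for a trampoline block like the one listed
above. The function name buildTrampolineBlock is hypothetical; only the
label names and the callq form are taken from the listing.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Builds the assembly text for a block of N trampolines (x86-64, AT&T
// syntax). The resolver block's address is emitted as data in front of the
// trampolines so that each one can call through it.
std::string buildTrampolineBlock(std::uint64_t ResolverAddr, unsigned N) {
  assert(N > 0 && "need at least one trampoline");
  std::ostringstream Asm;
  Asm << "Lorc_resolve_block_addr:\n"
      << "  .quad " << ResolverAddr << "\n";
  for (unsigned I = 0; I != N; ++I)
    Asm << "orc_jcc_" << I << ":\n"
        << "  callq *Lorc_resolve_block_addr(%rip)\n";
  return Asm.str();
}
```

In LLVM the resulting string would then be attached to a module as
module-level asm (Module::appendModuleInlineAsm is one way to do that).
A PowerPC port would need its own instruction sequence in the loop body.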


The resolver block is more complicated and I won't provide the full code
for it here. You can find it by running:
lli -jit-kind=orc-lazy -orc-lazy-debug=mods-to-stderr <hello_world.ll>

and looking at the initial output. In pseudo-asm though, it looks like
this:

module asm "jit_callback_manager_addr:"
module asm "  .quad 0x46fc190" // <- address of callback manager object
module asm "orc_resolver_block:"
module asm "  // save register state."
module asm "  // load jit_callback_manager_addr into %rdi"
module asm "  // load the return address (from the trampoline call) into %rsi"
module asm "  // %rax = call jit(%rdi, %rsi)"
module asm "  // save %rax over the return address"
module asm "  // restore register state"
module asm "  // retq"

So, that's a whirlwind intro to implementing lazy JITing support for a new 
architecture in Orc. I'll try to answer any questions you have on the 
topic, though I'm not familiar with PowerPC at all. If you're comfortable
with PowerPC assembly I think it should be possible to implement once you
grok the concepts.

Hope this helps! 

Cheers, 
Lang. 


On Jul 26, 2015, at 11:17 PM, Revital1 Eres <ERES at il.ibm.com> wrote:

Hi Again, 

I'm a little confused about which Orc functions exactly I should use in
order to save the functions' code in a code cache so that it could later be
replaced with different versions of it, and I appreciate your help.

Just a reminder: I want to dynamically recompile the program based on a
profile collected at run-time. I would like to start executing the program
from the code cache and at some point be able to replace a function body
with its newly compiled version; this can be done by replacing the entry in
the function code with a trampoline to its new version, so that future
calls to it will call the new version's code.

Does the CompileOnDemandLayer execute the program from a code cache
and hold pointers to the code of the functions it executes? I am
compiling for a Power machine.
Are there target-specific pieces that I should implement to make Orc
work on Power?

Thanks again, 
Revital 




From:        Lang Hames <lhames at gmail.com>
To:        Revital1 Eres/Haifa/IBM at IBMIL 
Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
Date:        20/07/2015 08:41 PM 
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital, 

The CompileOnDemand layer is used by the lazy bitcode JIT in the lli tool. 
You can find the code in llvm/tools/lli/OrcLazyJIT.* . 

Cheers, 
Lang. 


On Mon, Jul 20, 2015 at 2:32 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
Hello Lang,

Thanks for your answer. 

I am now looking for an example of the usage of CompileOnDemandLayer. Is 
there an example available for that (could not find one in llvm/examples)?

Thanks, 
Revital 



From:        Lang Hames <lhames at gmail.com>
To:        Revital1 Eres/Haifa/IBM at IBMIL 
Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
Date:        10/07/2015 12:10 AM 
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital, 

LLVM does have an IR interpreter, but I don't think it's maintained well
(or possibly at all). The interpreter is also not designed to interact 
with the LLVM JITs. 

We generally encourage people to just JIT LLVM IR, rather than 
interpreting it. For the use-case you have described, you could JIT IR 
with no optimizations to begin with, then re-JIT hot functions at a higher 
level. 

The Orc JIT APIs (LLVM's newer JIT APIs) were written with this kind of 
use-case in mind, and are probably a better fit for this than MCJIT. There 
is no built-in hot-function detection or recompilation yet, but I think 
this would be *fairly* easy to write in terms of Orc's callback API. 

Cheers, 
Lang. 


On Thu, Jul 9, 2015 at 4:19 AM, Revital1 Eres <ERES at il.ibm.com> wrote: 
Hello, 

I am new to LLVM and I appreciate your help with the following:

I want to run the LLVM IR through a virtual machine (the LLVM interpreter?)
and JIT-compile the hot functions (using MCJIT).

This task will require, among other things, identifying the hot functions
and having a code cache that should be patched with the native code of the
functions after they are jitted.

I've read so far about MCJIT and lli; however, I have not seen that the
LLVM interpreter can be used as a VM in the way I was looking for, meaning:
execute the code one instruction at a time, have a profiling mode to
identify hot functions, and call the JIT to compile the hot functions.

I appreciate any advice/starting points for this project.

Thanks, 
Revital 

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev








[attachment "fully_lazy_with_recompile.tgz" deleted by Revital1 
Eres/Haifa/IBM] 







Lang Hames via llvm-dev

2015-Nov-15 11:33 UTC

[llvm-dev] [LLVMdev] Help with using LLVM to re-compile hot functions at run-time

Hi Revital,

This program does not contain any external references, and so I would not
expect it to call the resolver at all.
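
To make the distinction concrete, here is a toy model in plain C++ of when
a symbol resolver gets consulted. It is not the Orc API: ToyLinker, lookup
and the integer addresses are assumptions for illustration. Names the
module defines itself (main and foo in the toy program) are bound
directly; only names it does not define (such as atoi) reach the resolver.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

using Addr = std::uint64_t;
using Resolver = std::function<Addr(const std::string &)>;

// Toy linker: symbols defined in the module are bound directly, and only
// names the module does not define are passed on to the resolver.
class ToyLinker {
public:
  ToyLinker(std::map<std::string, Addr> Defs, Resolver R)
      : Defined(std::move(Defs)), Resolve(std::move(R)) {
    assert(Resolve && "a resolver is required");
  }

  Addr lookup(const std::string &Name) {
    auto It = Defined.find(Name);
    if (It != Defined.end())
      return It->second;   // internal reference: the resolver never runs
    ++ResolverCalls;
    return Resolve(Name);  // external reference: ask the resolver
  }

  unsigned ResolverCalls = 0;

private:
  std::map<std::string, Addr> Defined;
  Resolver Resolve;
};
```

In this model a resolver that returns 0 for an unknown name corresponds
to the "could not be resolved" error reported for atoi.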

What symbol were you expecting to see a resolver call for?

Cheers,
Lang.

On Wed, Nov 11, 2015 at 11:44 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
> Hi Lang,
>
> Thanks for your reply!
>
> The program I'm compiling is the following toy program, which is compiled
> with -fno-inline to avoid inlining foo into main.
>
> In the fully_lazy_with_recompile code I've added the following statements.
> When running the code with gdb I do not see it break in the lambda resolver
> as described in my previous mail.
>
>  auto ExprSymbol = J.findUnmangledSymbolIn(H,"main");
>  double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
>  std::cerr << "Evaluated to " << FP() << "\n";
>
> Btw, another issue I need to resolve: some of the parameters were
> originally read from the command line using argv, but due to the following
> error I avoided that for now (I also got a similar error regarding
> ZNSt8ios_base4InitC1Ev when using prints):
> LLVM ERROR: Program used external function 'atoi' which could not be resolved!
> Thanks again,
> Revital
>
> #define ITERS 1000000
> int arr[ITERS];
>
> int
> foo (int x, int y)
> {
>   int res = 950;
>   if (x > 3 && y < 77)
>     res = 97;
>   else
>     res = res * x;
>   return res;
> }
>
> int
> main ()
> {
>   int x = 880;
>   int num = 990;
>   int i, j;
>   int b = 0;
>
>   for (i = 0; i < ITERS; i++)
>     arr[i] = i;
>
>   for (j = 0; j < num; j++)
>     for (i = 0; i < ITERS; i++)
>       {
>         b += foo (x, arr[i]) /2;
>       }
>   return 0;
> }
>
>
>
> From:        Lang Hames <lhames at gmail.com>
> To:        Revital1 Eres/Haifa/IBM at IBMIL
> Cc:        LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
> Date:        10/11/2015 06:31 PM
>
> Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot
> functions at run-time
> ------------------------------
>
>
>
> Hi Revital,
>
> Apologies for the delayed reply - I'm traveling at the moment and not able
> to check my email often.
>
> You will only see a callback on the resolver for symbols that are external
> to the module. What did the IR that you added look like?
>
> Cheers,
> Lang.
>
> On Wed, Nov 4, 2015 at 8:37 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
> Hello Lang,
>
> I want to use the lazy recompilation program you posted to compile an
> input program's IR (rather than processing the input with the interpreter,
> as is done in the example).
> To do that I called the addModule function on the module returned from
> parseInputIR, as was done with the other functions in the Kaleidoscope
> examples.
> Now, to start the codegen I am using getAddress, and at this point I was
> expecting to see a call to the lambda resolver defined in createResolver,
> but I did not see it happen and I would appreciate your help in
> understanding why.
>
> Here is a snippet from my additions to the new version of the fully_lazy
> Orc Kaleidoscope.
>
> Thanks again,
> Revital
>
>   SessionContext S(getGlobalContext());
>   KaleidoscopeJIT J(S);
>
>   cl::ParseCommandLineOptions(argc, argv,
>                               "Kaleidoscope example program\n");
>
>   std::unique_ptr<Module> M;
>   if (!InputIR.empty()) {
>     M = parseInputIR(InputIR);
>     auto H = J.addModule(std::move(M));
>     char ModID[256];
>     sprintf(ModID, "IR:%s", InputIR.c_str());
>     auto ExprSymbol = J.findUnmangledSymbolIn(H, ModID);
>     double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
>     std::cerr << "Evaluated to " << FP() << "\n";
>     J.removeModule(H);
>   }
>
>
>
>
> From:        Lang Hames <lhames at gmail.com>
> To:        Revital1 Eres/Haifa/IBM at IBMIL, LLVM Developers Mailing List
> <llvm-dev at lists.llvm.org>
> Date:        18/09/2015 09:47 AM
>
> Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot
> functions at run-time
> ------------------------------
>
>
>
> Hi Revital,
>
> Attached is a new version of the fully_lazy Orc Kaleidoscope demo that has
> been extended to enable re-compilation at higher optimisation levels,
> roughly following the scheme I outlined before.
>
> In the compile action for the callback, the initial IR for each function is
> transformed like this:
>
>
>                            unsigned foo_counter = 0;
> void foo$impl() {          void foo$impl() {
>   // foo body        ->      if (++foo_counter > 1000) {
> }                              auto fooOpt = $recompile(&foo);
>                                fooOpt();
>                              }
>                              // foo body
>                            }
>
> The key changes to make this work (which you can see by diff'ing toy.cpp
> against the original fully_lazy version):
>
> 1) New layers HotCompileLayer and HotIROptsLayer added. These perform IR
> optimisation and code generation at higher optimisation levels than the
> default layers.
> 2) The symbol resolver function (not to be confused with the resolver
> block) has been pulled out into its own function, createResolver, so that
> it can be shared between optimised & non-optimized code. It also resolves
> the "$recompile" function to a static method on the KaleidoscopeJIT class
> itself.
> 3) The lazy compile action now calls 'instrumentFunctions' before adding
> the IR for cold functions to the JIT.
> 4) The instrumentFunctions method injects the counter code and call to
> recompile.
> 5) The recompileHot method re-IRGens functions, then adds them to the
> HotIROpts layer to generate more optimized versions. It then updates the
> function-body pointer so that subsequent calls go to the optimised version.
>
> This is a bit quick-and-dirty, but does work. In the future I'll try to
> tidy this up and turn it into a new tutorial chapter.
>
> Hope this helps!
>
> Cheers,
> Lang.
>
>
>
>
> On Wed, Sep 16, 2015 at 10:09 PM, Revital1 Eres <*ERES at il.ibm.com*
> <ERES at il.ibm.com>> wrote:
> Hi Lang,
>
> Many thanks!!! I just wanted to make sure you did not miss it...
>
> Thanks again!
> Revital
>
>
>
> From:        Lang Hames <*lhames at gmail.com* <lhames at
gmail.com>>
> To:        Revital1 Eres/Haifa/IBM at IBMIL
> Cc:        LLVM Developers Mailing List <*llvmdev at cs.uiuc.edu*
> <llvmdev at cs.uiuc.edu>>
> Date:        17/09/2015 01:56 AM
> Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot
> functions at run-time
> ------------------------------
>
>
>
> Hi Revital,
>
> Apologies for the delayed reply.
>
> I'm working on some example code for how to do this. I'll try to
post it
> tomorrow.
>
> Cheers,
> Lang.
>
> On Tue, Sep 8, 2015 at 12:23 AM, Revital1 Eres <*ERES at il.ibm.com*
> <ERES at il.ibm.com>> wrote:
> Hi Lang,
>
> After spending some time debugging Kaleidoscope orc fully_lazy toy example
> on
> x86 I want to start implementing run-time optimizer as you suggested and
> again
> I highly appreciate your help.
> For now I'll defer the target specific implementation to the end after
> I'll have
> the non target parts in place as I can run on x86 as a start.
> Given a simple example of main function calling foo and bar functions;
> IIUC I should start from the IR level of this module which means that
> ParseIRFile will be be first called on the IR of the program, is that
> right?
>
> I would like to make sure I understand your suggestion which is to insert
> a new
> layer that should be implemented on top of the CompileCallbackLayer in
> order to
> be able to call trigger_condition at the beginning of a function.
> IIUC until the function (bar or foo) is optimized the call to foo and bar
> will
> go through the resolver (foo and bar will not be compiled from scratch
> every
> time we go through the resolver but rather execute the cached non
> optimized
> version after first compiled). The resolver will check trigger_condition
> to see if the cached non optimized version should be executed or a new
> optimizied version should be compiled and executed.
> After the trigger_condition is true foo and bar will be compiled to
> generate
> their optimized version and this version will be executed directly from
> now on
> (not going through the resolver any more). Is that right?
> Does this layer on top of the CompileCallbackLayer should be similar to
> class KaleidoscopeJIT?
> I saw that in Kaleidoscope Orc's example the Lambda functions that are
> added in
> createLambdaResolver are been executed by the resolver before compiling a
> call
> so I assume that the trigger_condition should be added also by
> createLambdaResolver so before compiling foo or bar the Lambda functions
> that are added by calling createLambdaResolver and contain
> trigger_condition
> will be executed, is that right?
>
> IIUC in Kaleidoscope Orc's example the interpreter calls the addModule
> upon
> parsing call expression in HandleTopLevelExpression.
> In my case I assume addModule be called for the module returned from
> ParseIRFile, right?
> In this case should calling getAddress on the whole module (the IR of all
> functions) will trigger calling the Lambda functions defined in
> createLambdaResolver on foo and bar functions? Also - in Kaleidoscope orc
> example the execution of the function is done explicitly in
> HandleTopLevelExpression after calling getAddress and its not clear to me
> where
> I should insert this in my case.
>
> Thanks again,
> Revital
>
>
>
>
> From:        Lang Hames <*lhames at gmail.com* <lhames at
gmail.com>>
> To:        Revital1 Eres/Haifa/IBM at IBMIL
> Cc:        LLVM Developers Mailing List <*llvmdev at cs.uiuc.edu*
> <llvmdev at cs.uiuc.edu>>
> Date:        28/07/2015 05:58 AM
> Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot
> functions at run-time
> ------------------------------
>
>
>
> Hi Revital,
>
> What do you mean by "code cache"? Orc (and MCJIT) does have the
concept of
> an ObjectCache, which is a long-lived, potentially persistent, compiled
> version of some IR. It's not a key component of the JIT though: Most
> clients run without a cache attached and just JIT their code from scratch
> in each session.
>
> Recompilation is orthogonal to caching. There is no in-tree support for
> recompilation yet. There are several ways that it could be supported,
> depending on what security / performance trade-offs you're willing to
make,
> and how deep in to the LLVM code you want to get. As things stand at the
> moment all function calls in the lazy JIT are indirected via function
> pointers. We want to add support for patchable call-sites, but this
hasn't
> been implemented yet. The Indirect calls make recompilation reasonably
> easy: You could add a transform layer on top of the CompileCallbackLayer
> which would modify each function like this:
>
> void foo$impl() {          void foo$impl() {
>   // foo body        ->      if (trigger_condition) {
> }                              auto fooOpt = jit_recompile_hot(&foo);
>                                fooOpt();
>                              }
>                              // foo body
>                            }
>
> You would implement the jit_recompile_hot function yourself in your JIT
> and make it available to JIT'd code via the SymbolResolver. When the
> trigger condition is met you'll get a call to recompile foo, at which
point
> you: (1) Add the IR for foo to a 2nd IRCompileLayer that has been
> configured with a higher optimization level, (2) look up the address of the
> optimized version of foo, and (3) update the function pointer for foo to
> point at the optimized version. The process for patchable callsites should
> be fairly similar once they're available, except that you'll
trigger a
> call-site update rather than rewriting a function pointer.
>
> This neglects all sorts of fun details (threading, garbage collection of
> old function implementations), but hopefully it gives you a place to
> start.
>
>
> Regarding laziness, as Hal mentioned you'll have to provide some target
> support for PowerPC to support lazy compilation. For a rough guide you can
> check out the X86_64 support code in
> llvm/include/llvm/ExecutionEngine/Orc/OrcTargetSupport.h and
> llvm/lib/ExecutionEngine/Orc/OrcTargetSupport.cpp.
>
> There are two methods that you'll need to implement:
> insertCompileCallbackTrampoline and insertResolverBlock. These work
> together to enable lazy compilation. Both of these methods inject blobs of
> target specific code in to the JIT process. To do this (at least for now) I
> make use of a handy feature of LLVM IR: You can write raw assembly code
> directly into a bitcode module ("module-level asm"). If you look
at the X86
> implementation of each of these methods you'll see they're written
in terms
> of string-streams building up a string of assembly which will be handed off
> to the JIT to compile like any other code.
>
> The first blob that you need to be able to output is the resolver block.
> The purpose of the resolver block is to save program state and call back in
> to the JIT to trigger lazy compilation of a function. When the JIT is done
> compiling the function it returns the address of the compiled function to
> the resolver block, and the resolver block returns to the compiled function
> (rather than its original return address).
>
> Because all functions share the same resolver block, the JIT needs some
> way to distinguish them, which is where the trampolines come in. The JIT
> emits one trampoline per function and each trampoline just calls the
> resolver block. The return address of the call in each trampoline provides
> the unique address that the JIT associates with the to-be-compiled
> functions. The CompileCallbackManager manages this association between
> trampolines and functions for you, you just need to provide the
> resolver/trampoline primitives.
>
> In case it helps, here's what the output of all this looks like on X86.
> Trampolines are trivial - they're emitted in blocks and proceeded by a
> pointer to the resolver block:
>
> module asm "Lorc_resolve_block_addr:"
> module asm "  .quad 140439143575560"
> module asm "orc_jcc_0:"
> module asm "  callq *Lorc_resolve_block_addr(%rip)"
> module asm "orc_jcc_1:"
> module asm "  callq *Lorc_resolve_block_addr(%rip)"
> module asm "orc_jcc_2:"
> module asm "  callq *Lorc_resolve_block_addr(%rip)"
> ...
>
>
> The resolver block is more complicated and I won't provide the full
code
> for it here. You can find it by running:
> lli -jit-kind=orc-lazy -orc-lazy-debug=mods-to-stderr
<hello_world.ll>
>
>
>
>
>
> and looking at the initial output. In pseudo-asm though, it looks like
> this:
>
> module asm "jit_callback_manager_addr:"
> module asm "  .quad 0x46fc190" // <- address of callback
manager object
> module asm "orc_resolver_block:"
> module asm "  // save register state."
> module asm "  // load jit_callback_manager_addr into %rdi
> module asm "  // load the return address (from the trampoline call)
into
> %rsi
> module asm "  // %rax = call jit(%rdi, %rsi)
> module asm "  // save %rax over the return address
> module asm "  //  restore register state
> module asm "  //  retq"
>
> So, that's a whirlwind intro to implementing lazy JITing support for a
new
> architecture in Orc. I'll try to answer any questions you have on the
> topic, though I'm not familiar with PowerPC at all. If you're
comfortable
> with PowerPC assembly I think it should be possible to implement once you
> grok the concepts.
>
> Hope this helps!
>
> Cheers,
> Lang.
>
>
> On Jul 26, 2015, at 11:17 PM, Revital1 Eres <*ERES at il.ibm.com*
> <ERES at il.ibm.com>> wrote:
>
> Hi Again,
>
> I'm a little confused regarding what is the exact Orc's functions I
should
> use
> in order to save the functions code in a code cache so it could be later
> replaced with different versions of it and I appreciate your help.
>
> Just a reminder I want to dynamically recompile the program based on
> profile
>  collected at the run-time. I would like to start executing the program
> from
> the code-cache and at some point be able to replace a function body with
> it's
> new compiled version; this can be done by replacing the entry in the
> function
>  code with a trampoline to It's new version so that future calls to it
will
> call the new version code.
>
> Does the CompileOnDemandLayer executes the program from a code cache
> and holds pointers to the code of the functions it executes? I am
> compiling for Power machine.
> Is there a target specific pieces that I should implement for making Orc
> work on Power?
>
> Thanks again,
> Revital
>
>
>
>
> From:        Lang Hames <*lhames at gmail.com* <lhames at
gmail.com>>
> To:        Revital1 Eres/Haifa/IBM at IBMIL
> Cc:        LLVM Developers Mailing List <*llvmdev at cs.uiuc.edu*
> <llvmdev at cs.uiuc.edu>>
> Date:        20/07/2015 08:41 PM
> Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot
> functions at run-time
> ------------------------------
>
>
>
> Hi Revital,
>
> The CompileOnDemand layer is used by the lazy bitcode JIT in the lli tool.
> You can find the code in llvm/tools/lli/OrcLazyJIT.* .
>
> Cheers,
> Lang.
>
>
> On Mon, Jul 20, 2015 at 2:32 AM, Revital1 Eres <*ERES at il.ibm.com*
> <ERES at il.ibm.com>> wrote:
> Hello Lang,
>
> Thanks for your answer.
>
> I am now looking for an example of the usage of CompileOnDemandLayer. Is
> there an example available for that (could not find one in llvm/examples)?
>
> Thanks,
> Revital
>
>
>
> From:        Lang Hames <*lhames at gmail.com* <lhames at
gmail.com>>
> To:        Revital1 Eres/Haifa/IBM at IBMIL
> Cc:        LLVM Developers Mailing List <*llvmdev at cs.uiuc.edu*
> <llvmdev at cs.uiuc.edu>>
> Date:        10/07/2015 12:10 AM
> Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot
> functions at run-time
> ------------------------------
>
>
>
> Hi Revital,
>
> LLVM does have an IR interpreter, but I don't think it's maintained
well
> (or possibly at all). The interpreter is also not designed to interact with
> the LLVM JITs.
>
> We generally encourage people to just JIT LLVM IR, rather than
> interpreting it. For the use-case you have described, you could JIT IR with
> no optimizations to begin with, then re-JIT hot functions at a higher
> level.
>
> The Orc JIT APIs (LLVM's newer JIT APIs) were written with this kind of
> use-case in mind, and are probably a better fit for this than MCJIT. There
> is no built-in hot-function detection or recompilation yet, but I think
> this would be *fairly* easy to write in terms of Orc's callback API.
>
> Cheers,
> Lang.
>
>
> On Thu, Jul 9, 2015 at 4:19 AM, Revital1 Eres <*ERES at il.ibm.com*
> <ERES at il.ibm.com>> wrote:
> Hello,
>
> I am new to LLVM and a I appreciate your help with the following:
>
> I want to run the LLVM IR through virtual machine (LLVM interpreter?) and
> jit
> compile the hot functions (using MCJIT).
>
> This task will require amongst other identifying the hot functions and
> having a
> code cache that should be patched with the native code of the functions
> after
> they are jitted.
>
> I've read so far about MCJIT and lli however I have not seen that the
LLVM
> interpreter can be used as a VM the way I was looking for; meaning
> execute the code one instruction at a time; have a profiling mode to
> identify hot functions and call jit to compile the hot functions.
>
> I appreciate any advice/starting points for this project.
>
> Thanks,
> Revital
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
>
>
>
>
>
> [attachment "fully_lazy_with_recompile.tgz" deleted by Revital1
> Eres/Haifa/IBM]
>
>
>
>
>

Revital1 Eres via llvm-dev

2015-Nov-15 11:54 UTC

[llvm-dev] [LLVMdev] Help with using LLVM to re-compile hot functions at run-time

Hi Lang,

I was trying to recompile foo.
It is not declared as a static function, so I thought it should be
visible outside of the program, but I'm guessing I'm missing something
here.

Thanks again,
Revital



From:   Lang Hames <lhames at gmail.com>
To:     Revital1 Eres/Haifa/IBM at IBMIL
Cc:     LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
Date:   15/11/2015 01:33 PM
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital,

This program does not contain any external references, and so I would not 
expect it to call the resolver at all.

What symbol were you expecting to see a resolver call for?

Cheers,
Lang.

On Wed, Nov 11, 2015 at 11:44 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
Hi Lang,

Thanks for your reply!

The program I'm compiling is the following toy program, which is compiled
with -fno-inline to avoid inlining foo into main.

In the fully_lazy_with_recompile code I've added the following statements.
When running the code with gdb I do not see it break in the lambda resolver
as described in my previous mail.

 auto ExprSymbol = J.findUnmangledSymbolIn(H, "main");
 double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
 std::cerr << "Evaluated to " << FP() << "\n";

Btw, another issue I need to resolve: some of the parameters were
originally read from the command line using argv, but due to the following
error I avoided that for now (I also got a similar error regarding
ZNSt8ios_base4InitC1Ev when using prints):
LLVM ERROR: Program used external function 'atoi' which could not be resolved!

Thanks again,
Revital

#define ITERS 1000000
int arr[ITERS];

int
foo (int x, int y)
{
  int res = 950;
  if (x > 3 && y < 77)
    res = 97;
  else
    res = res * x;
  return res;
}

int
main ()
{
  int x = 880;
  int num = 990;
  int i, j;
  int b = 0;

  for (i = 0; i < ITERS; i++)
    arr[i] = i;

  for (j = 0; j < num; j++)
    for (i = 0; i < ITERS; i++)
      {
        b += foo (x, arr[i]) /2;
      }
  return 0;
}



From:        Lang Hames <lhames at gmail.com>
To:        Revital1 Eres/Haifa/IBM at IBMIL
Cc:        LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
Date:        10/11/2015 06:31 PM

Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital,

Apologies for the delayed reply - I'm traveling at the moment and not able 
to check my email often.

You will only see a callback on the resolver for symbols that are external 
to the module. What did the IR that you added look like?

Cheers,
Lang.

On Wed, Nov 4, 2015 at 8:37 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
Hello Lang,

I want to use the lazy recompilation program you posted to compile the IR
of an input program (rather than having the input processed by the
interpreter, as is done in the example).
To do that I called the addModule function on the module returned from
parseInputIR, as was done with the other functions in the Kaleidoscope
examples.
Now, to start the codegen I am using getAddress, and at this point I was
expecting to see a call to the lambda resolver defined in createResolver,
but I did not see it happen, and I would appreciate your help in
understanding why.

Here is a snippet from my additions to the new version of the fully_lazy 
Orc Kaleidoscope.

Thanks again,
Revital

  SessionContext S(getGlobalContext());
  KaleidoscopeJIT J(S);

  cl::ParseCommandLineOptions(argc, argv,
                              "Kaleidoscope example program\n");

  std::unique_ptr<Module> M;
  if (!InputIR.empty()) {
    M = parseInputIR(InputIR);
    auto H = J.addModule(std::move(M));
    char ModID[256];
    sprintf(ModID, "IR:%s", InputIR.c_str());
    auto ExprSymbol = J.findUnmangledSymbolIn(H, ModID);
    double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
    std::cerr << "Evaluated to " << FP() << "\n";
    J.removeModule(H);
  }
               



From:        Lang Hames <lhames at gmail.com>
To:        Revital1 Eres/Haifa/IBM at IBMIL, LLVM Developers Mailing List
<llvm-dev at lists.llvm.org>
Date:        18/09/2015 09:47 AM

Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital,

Attached is a new version of the fully_lazy Orc Kaleidoscope demo that has 
been extended to enable re-compilation at higher optimisation levels, 
roughly following the scheme I outlined before.

In the compile action for the callback, the initial IR for each function is
transformed like this:


                           unsigned foo_counter = 0;
void foo$impl() {          void foo$impl() { 
  // foo body        ->      if (++foo_counter > 1000) { 
}                              auto fooOpt = $recompile(&foo); 
                               fooOpt(); 
                             } 
                             // foo body 
                           }

The key changes to make this work (which you can see by diff'ing toy.cpp 
against the original fully_lazy version):

1) New layers HotCompileLayer and HotIROptsLayer added. These perform IR 
optimisation and code generation at higher optimisation levels than the 
default layers.
2) The symbol resolver function (not to be confused with the resolver
block) has been pulled out into its own function, createResolver, so that
it can be shared between optimised & non-optimized code. It also resolves
the "$recompile" function to a static method on the KaleidoscopeJIT class
itself.
3) The lazy compile action now calls 'instrumentFunctions' before adding
the IR for cold functions to the JIT.
4) The instrumentFunctions method injects the counter code and call to 
recompile.
5) The recompileHot method re-IRGens functions, then adds them to the 
HotIROpts layer to generate more optimized versions. It then updates the 
function-body pointer so that subsequent calls go to the optimised 
version.
 
This is a bit quick-and-dirty, but does work. In the future I'll try to 
tidy this up and turn it into a new tutorial chapter.

Hope this helps!

Cheers,
Lang.




On Wed, Sep 16, 2015 at 10:09 PM, Revital1 Eres <ERES at il.ibm.com> wrote:
Hi Lang, 

Many thanks!!! I just wanted to make sure you did not miss it...

Thanks again! 
Revital 



From:        Lang Hames <lhames at gmail.com>
To:        Revital1 Eres/Haifa/IBM at IBMIL 
Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
Date:        17/09/2015 01:56 AM 
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital, 

Apologies for the delayed reply. 

I'm working on some example code for how to do this. I'll try to post it
tomorrow. 

Cheers, 
Lang. 

On Tue, Sep 8, 2015 at 12:23 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
Hi Lang, 

After spending some time debugging the Kaleidoscope Orc fully_lazy toy
example on x86, I want to start implementing the run-time optimizer as you
suggested, and again I highly appreciate your help.
For now I'll defer the target-specific implementation until the non-target
parts are in place, since I can run on x86 as a start.
Given a simple example of a main function calling functions foo and bar:
IIUC I should start from the IR level of this module, which means that
ParseIRFile will first be called on the IR of the program. Is that right?

I would like to make sure I understand your suggestion, which is to insert
a new layer on top of the CompileCallbackLayer in order to be able to call
trigger_condition at the beginning of a function.
IIUC, until a function (bar or foo) is optimized, calls to it will go
through the resolver (foo and bar will not be compiled from scratch every
time we go through the resolver; rather, the cached non-optimized version
is executed once it has been compiled). The resolver will check
trigger_condition to see whether the cached non-optimized version should be
executed or a new optimized version should be compiled and executed.
After trigger_condition becomes true, foo and bar will be compiled to
generate their optimized versions, and these versions will be executed
directly from then on (not going through the resolver any more). Is that
right?
Should this layer on top of the CompileCallbackLayer be similar to class
KaleidoscopeJIT?
I saw that in the Kaleidoscope Orc example the lambda functions that are
added in createLambdaResolver are executed by the resolver before compiling
a call, so I assume that trigger_condition should also be added via
createLambdaResolver, so that before compiling foo or bar the lambda
functions that are added by calling createLambdaResolver and contain
trigger_condition will be executed. Is that right?

IIUC in the Kaleidoscope Orc example the interpreter calls addModule upon
parsing a call expression in HandleTopLevelExpression.
In my case I assume addModule should be called for the module returned from
ParseIRFile, right?
In this case, will calling getAddress on the whole module (the IR of all
functions) trigger the lambda functions defined in createLambdaResolver for
foo and bar? Also, in the Kaleidoscope Orc example the execution of the
function is done explicitly in HandleTopLevelExpression after calling
getAddress, and it's not clear to me where I should insert this in my case.

Thanks again, 
Revital 




From:        Lang Hames <lhames at gmail.com>
To:        Revital1 Eres/Haifa/IBM at IBMIL 
Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
Date:        28/07/2015 05:58 AM 
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital, 

What do you mean by "code cache"? Orc (and MCJIT) does have the concept of
an ObjectCache, which is a long-lived, potentially persistent, compiled
version of some IR. It's not a key component of the JIT though: Most
clients run without a cache attached and just JIT their code from scratch
in each session.

Recompilation is orthogonal to caching. There is no in-tree support for 
recompilation yet. There are several ways that it could be supported, 
depending on what security / performance trade-offs you're willing to 
make, and how deep in to the LLVM code you want to get. As things stand at 
the moment all function calls in the lazy JIT are indirected via function 
pointers. We want to add support for patchable call-sites, but this hasn't 
been implemented yet. The indirect calls make recompilation reasonably
easy: You could add a transform layer on top of the CompileCallbackLayer 
which would modify each function like this: 

void foo$impl() {          void foo$impl() {
  // foo body        ->      if (trigger_condition) { 
}                              auto fooOpt = jit_recompile_hot(&foo);
                               fooOpt(); 
                             } 
                             // foo body 
                           } 

You would implement the jit_recompile_hot function yourself in your JIT 
and make it available to JIT'd code via the SymbolResolver. When the 
trigger condition is met you'll get a call to recompile foo, at which 
point you: (1) Add the IR for foo to a 2nd IRCompileLayer that has been 
configured with a higher optimization level, (2) look up the address of 
the optimized version of foo, and (3) update the function pointer for foo 
to point at the optimized version. The process for patchable callsites 
should be fairly similar once they're available, except that you'll 
trigger a call-site update rather than rewriting a function pointer. 

This neglects all sorts of fun details (threading, garbage collection of 
old function implementations), but hopefully it gives you a place to 
start.  
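
The three-step scheme above can be modelled without LLVM. The following is a minimal C++ sketch under stated assumptions, not the code from the attached demo: fooCold and fooHot are hypothetical stand-ins for the unoptimized and optimized versions of foo, fooPtr is the function pointer that calls are indirected through, jitRecompileHot imitates the jit_recompile_hot hook, and the counter threshold of 1000 is just one possible trigger condition. A real implementation would add foo's IR to a second, higher-optimization compile layer instead of returning a precompiled function.

```cpp
#include <cassert>

// Hypothetical stand-ins; none of these names are LLVM APIs.
using FooFn = int (*)(int);

int fooCold(int x);                   // "unoptimized" version, defined below
int fooHot(int x) { return 2 * x; }   // "optimized" version, same result

FooFn fooPtr = fooCold;               // all calls to foo go through this pointer
unsigned fooCounter = 0;              // per-function call counter
unsigned recompileCount = 0;          // how often the recompile hook ran

// Imitates jit_recompile_hot. Steps (1) and (2), compiling at a higher
// optimization level and looking up the new address, are faked by using
// fooHot directly. Step (3) rewrites the function pointer.
FooFn jitRecompileHot(FooFn *slot) {
  assert(slot != nullptr);
  ++recompileCount;
  *slot = fooHot;
  return fooHot;
}

int fooCold(int x) {
  if (++fooCounter > 1000) {          // example trigger condition
    FooFn fooOpt = jitRecompileHot(&fooPtr);
    return fooOpt(x);                 // run the optimized version right away
  }
  return x + x;                       // original foo body
}

// What a call site looks like under function-pointer indirection.
int callFoo(int x) { return fooPtr(x); }
```

Because the pointer is rewritten on the first call that crosses the threshold, the cold version (and its counter) is never entered again, so the recompile hook runs only once in this single-threaded model. Threading and reclaiming the old code are left out, as noted above.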


Regarding laziness, as Hal mentioned you'll have to provide some target 
support for PowerPC to support lazy compilation. For a rough guide you can 
check out the X86_64 support code in 
llvm/include/llvm/ExecutionEngine/Orc/OrcTargetSupport.h and 
llvm/lib/ExecutionEngine/Orc/OrcTargetSupport.cpp. 

There are two methods that you'll need to implement: 
insertCompileCallbackTrampoline and insertResolverBlock. These work 
together to enable lazy compilation. Both of these methods inject blobs of 
target specific code in to the JIT process. To do this (at least for now)
I make use of a handy feature of LLVM IR: You can write raw assembly code
directly into a bitcode module ("module-level asm"). If you look at the
X86 implementation of each of these methods you'll see they're written in
terms of string-streams building up a string of assembly which will be
handed off to the JIT to compile like any other code.

The first blob that you need to be able to output is the resolver block. 
The purpose of the resolver block is to save program state and call back 
in to the JIT to trigger lazy compilation of a function. When the JIT is 
done compiling the function it returns the address of the compiled 
function to the resolver block, and the resolver block returns to the 
compiled function (rather than its original return address). 

Because all functions share the same resolver block, the JIT needs some 
way to distinguish them, which is where the trampolines come in. The JIT 
emits one trampoline per function and each trampoline just calls the 
resolver block. The return address of the call in each trampoline provides 
the unique address that the JIT associates with the to-be-compiled 
functions. The CompileCallbackManager manages this association between 
trampolines and functions for you, you just need to provide the 
resolver/trampoline primitives. 
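
To make the trampoline bookkeeping concrete, here is a small C++ model of the association described above. It is only a sketch: ToyCallbackManager and its methods are hypothetical names, "addresses" are plain integers rather than real code addresses, and the shared resolver block is represented by a direct call to executeCompileCallback with the trampoline's address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical model: an "address" is just an integer here.
using Addr = std::uint64_t;

class ToyCallbackManager {
public:
  // Reserve one trampoline and associate it with a compile action.
  Addr getCompileCallback(std::function<Addr()> compile) {
    Addr tramp = nextTrampoline_;
    nextTrampoline_ += 8;             // trampolines are laid out back to back
    actions_[tramp] = std::move(compile);
    return tramp;
  }

  // What the shared resolver block would call back into: the trampoline's
  // address is the key that identifies which function to compile.
  Addr executeCompileCallback(Addr tramp) {
    auto it = actions_.find(tramp);
    assert(it != actions_.end() && "unknown trampoline");
    Addr body = it->second();         // run the compile action
    actions_.erase(it);               // each callback fires at most once
    return body;                      // resolver block would jump here
  }

  std::size_t pending() const { return actions_.size(); }

private:
  Addr nextTrampoline_ = 0x1000;
  std::map<Addr, std::function<Addr()>> actions_;
};
```

The design point this illustrates is that the resolver block itself stays generic: all per-function knowledge lives in the table keyed by trampoline address.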

In case it helps, here's what the output of all this looks like on X86. 
Trampolines are trivial - they're emitted in blocks and preceded by a
pointer to the resolver block:

module asm "Lorc_resolve_block_addr:"
module asm "  .quad 140439143575560"
module asm "orc_jcc_0:" 
module asm "  callq *Lorc_resolve_block_addr(%rip)"
module asm "orc_jcc_1:" 
module asm "  callq *Lorc_resolve_block_addr(%rip)"
module asm "orc_jcc_2:" 
module asm "  callq *Lorc_resolve_block_addr(%rip)"
... 


The resolver block is more complicated and I won't provide the full code 
for it here. You can find it by running: 
lli -jit-kind=orc-lazy -orc-lazy-debug=mods-to-stderr <hello_world.ll>

and looking at the initial output. In pseudo-asm though, it looks like
this:

module asm "jit_callback_manager_addr:"
module asm "  .quad 0x46fc190" // <- address of callback manager object
module asm "orc_resolver_block:"
module asm "  // save register state."
module asm "  // load jit_callback_manager_addr into %rdi"
module asm "  // load the return address (from the trampoline call) into %rsi"
module asm "  // %rax = call jit(%rdi, %rsi)"
module asm "  // save %rax over the return address"
module asm "  // restore register state"
module asm "  // retq"

So, that's a whirlwind intro to implementing lazy JITing support for a new
architecture in Orc. I'll try to answer any questions you have on the
topic, though I'm not familiar with PowerPC at all. If you're comfortable
with PowerPC assembly I think it should be possible to implement once you
grok the concepts.

Hope this helps! 

Cheers, 
Lang. 


On Jul 26, 2015, at 11:17 PM, Revital1 Eres <ERES at il.ibm.com> wrote:

Hi Again, 

I'm a little confused about exactly which Orc functions I should use in
order to save the functions' code in a code cache so that it can later be
replaced with different versions, and I would appreciate your help.

Just a reminder: I want to dynamically recompile the program based on a
profile collected at run-time. I would like to start executing the program
from the code cache and at some point be able to replace a function body
with its newly compiled version; this can be done by replacing the entry of
the function's code with a trampoline to its new version, so that future
calls will reach the new version's code.

Does the CompileOnDemandLayer execute the program from a code cache
and hold pointers to the code of the functions it executes? I am
compiling for a Power machine.
Are there target-specific pieces that I should implement to make Orc
work on Power?

Thanks again, 
Revital 




From:        Lang Hames <lhames at gmail.com>
To:        Revital1 Eres/Haifa/IBM at IBMIL 
Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
Date:        20/07/2015 08:41 PM 
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital, 

The CompileOnDemand layer is used by the lazy bitcode JIT in the lli tool. 
You can find the code in llvm/tools/lli/OrcLazyJIT.* . 

Cheers, 
Lang. 


On Mon, Jul 20, 2015 at 2:32 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
Hello Lang,

Thanks for your answer. 

I am now looking for an example of the usage of CompileOnDemandLayer. Is 
there an example available for that (could not find one in llvm/examples)?

Thanks, 
Revital 



From:        Lang Hames <lhames at gmail.com>
To:        Revital1 Eres/Haifa/IBM at IBMIL 
Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
Date:        10/07/2015 12:10 AM 
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital, 

LLVM does have an IR interpreter, but I don't think it's maintained well
(or possibly at all). The interpreter is also not designed to interact 
with the LLVM JITs. 

We generally encourage people to just JIT LLVM IR, rather than 
interpreting it. For the use-case you have described, you could JIT IR 
with no optimizations to begin with, then re-JIT hot functions at a higher 
level. 

The Orc JIT APIs (LLVM's newer JIT APIs) were written with this kind of 
use-case in mind, and are probably a better fit for this than MCJIT. There 
is no built-in hot-function detection or recompilation yet, but I think 
this would be *fairly* easy to write in terms of Orc's callback API. 

Cheers, 
Lang. 


On Thu, Jul 9, 2015 at 4:19 AM, Revital1 Eres <ERES at il.ibm.com> wrote: 
Hello, 

I am new to LLVM and I would appreciate your help with the following:

I want to run the LLVM IR through a virtual machine (LLVM interpreter?) and
JIT-compile the hot functions (using MCJIT).

This task will require, among other things, identifying the hot functions
and having a code cache that is patched with the native code of the
functions after they are jitted.

I've read so far about MCJIT and lli; however, I have not seen that the LLVM
interpreter can be used as a VM the way I was looking for, meaning:
execute the code one instruction at a time, have a profiling mode to
identify hot functions, and call the JIT to compile the hot functions.

I appreciate any advice/starting points for this project.

Thanks, 
Revital 

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev








[attachment "fully_lazy_with_recompile.tgz" deleted by Revital1 
Eres/Haifa/IBM] 










Lang Hames via llvm-dev

2015-Nov-15 21:13 UTC

[llvm-dev] [LLVMdev] Help with using LLVM to re-compile hot functions at run-time

Hi Revital,

In this context, an external function is one that is not defined inside the
module itself. If, for example, your code contained a call to printf (and
you hadn't defined printf yourself), that would be an external symbol.
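
As a rough illustration of that rule, here is a self-contained C++ model. It is only a sketch and does not use any LLVM API: ToyModule, Resolver and linkModule are hypothetical names, and a returned address of 0 stands for "could not be resolved".

```cpp
#include <cstdint>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a module: which symbols it defines and which it uses.
struct ToyModule {
  std::set<std::string> defined;
  std::set<std::string> referenced;
};

// A resolver maps a symbol name to an address; 0 means "not found".
using Resolver = std::function<std::uint64_t(const std::string &)>;

// Returns the symbols that stayed unresolved. The resolver is consulted only
// for names the module references but does not define itself.
std::vector<std::string> linkModule(const ToyModule &m,
                                    const Resolver &resolve) {
  std::vector<std::string> unresolved;
  for (const std::string &name : m.referenced) {
    if (m.defined.count(name) != 0)
      continue;                       // defined in the module: no resolver call
    if (resolve(name) == 0)
      unresolved.push_back(name);     // external and not found
  }
  return unresolved;
}
```

In this model a module that defines main, foo and arr and references only foo and arr never reaches the resolver, whereas adding a reference to atoi produces exactly one resolver query, for atoi.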

Cheers,
Lang.

On Sun, Nov 15, 2015 at 12:54 PM, Revital1 Eres <ERES at il.ibm.com> wrote:
> Hi Lang,
>
> I was trying to recompile foo.
> It is not declared as a static function, so I thought it should be
> visible outside of the program, but I'm guessing I'm missing something
> here.
>
> Thanks again,
> Revital
>
>
>
> From:        Lang Hames <lhames at gmail.com>
> To:        Revital1 Eres/Haifa/IBM at IBMIL
> Cc:        LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
> Date:        15/11/2015 01:33 PM
>
> Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot
> functions at run-time
> ------------------------------
>
>
>
> Hi Revital,
>
> This program does not contain any external references, and so I would not
> expect it to call the resolver at all.
>
> What symbol were you expecting to see a resolver call for?
>
> Cheers,
> Lang.
>
> On Wed, Nov 11, 2015 at 11:44 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
> Hi Lang,
>
> Thanks for your reply!
>
> The program I'm compiling is the following toy program, which is compiled
> with -fno-inline to avoid inlining foo into main.
>
> In the fully_lazy_with_recompile code I've added the following statements.
> When running the code with gdb I do not see it break in the lambda resolver
> as described in my previous mail.
>
>  auto ExprSymbol = J.findUnmangledSymbolIn(H, "main");
>  double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
>  std::cerr << "Evaluated to " << FP() << "\n";
>
> Btw, another issue I need to resolve: some of the parameters were
> originally read from the command line using argv, but due to the following
> error I avoided that for now (I also got a similar error regarding
> ZNSt8ios_base4InitC1Ev when using prints):
> LLVM ERROR: Program used external function 'atoi' which could not be resolved!
>
> Thanks again,
> Revital
>
> #define ITERS 1000000
> int arr[ITERS];
>
> int
> foo (int x, int y)
> {
>   int res = 950;
>   if (x > 3 && y < 77)
>     res = 97;
>   else
>     res = res * x;
>   return res;
> }
>
> int
> main ()
> {
>   int x = 880;
>   int num = 990;
>   int i, j;
>   int b = 0;
>
>   for (i = 0; i < ITERS; i++)
>     arr[i] = i;
>
>   for (j = 0; j < num; j++)
>     for (i = 0; i < ITERS; i++)
>       {
>         b += foo (x, arr[i]) /2;
>       }
>   return 0;
> }
>
>
>
> From:        Lang Hames <lhames at gmail.com>
> To:        Revital1 Eres/Haifa/IBM at IBMIL
> Cc:        LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
> Date:        10/11/2015 06:31 PM
>
> Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot
> functions at run-time
> ------------------------------
>
>
>
> Hi Revital,
>
> Apologies for the delayed reply - I'm traveling at the moment and not able
> to check my email often.
>
> You will only see a callback on the resolver for symbols that are external
> to the module. What did the IR that you added look like?
>
> Cheers,
> Lang.
>
> On Wed, Nov 4, 2015 at 8:37 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
> Hello Lang,
>
> I want to use the lazy recompilation program you posted to compile the IR
> of an input program (rather than having the input processed by the
> interpreter, as is done in the example).
> To do that I called the addModule function on the module returned from
> parseInputIR, as was done with the other functions in the Kaleidoscope
> examples.
> Now, to start the codegen I am using getAddress, and at this point I was
> expecting to see a call to the lambda resolver defined in createResolver,
> but I did not see it happen, and I would appreciate your help in
> understanding why.
>
> Here is a snippet from my additions to the new version of the fully_lazy
> Orc Kaleidoscope.
>
> Thanks again,
> Revital
>
>   SessionContext S(getGlobalContext());
>   KaleidoscopeJIT J(S);
>
>   cl::ParseCommandLineOptions(argc, argv,
>                               "Kaleidoscope example program\n");
>
>   std::unique_ptr<Module> M;
>   if (!InputIR.empty()) {
>     M = parseInputIR(InputIR);
>     auto H = J.addModule(std::move(M));
>     char ModID[256];
>     sprintf(ModID, "IR:%s", InputIR.c_str());
>     auto ExprSymbol = J.findUnmangledSymbolIn(H, ModID);
>     double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
>     std::cerr << "Evaluated to " << FP() << "\n";
>     J.removeModule(H);
>   }
>
>
>
>
> From:        Lang Hames <lhames at gmail.com>
> To:        Revital1 Eres/Haifa/IBM at IBMIL, LLVM Developers Mailing List
> <llvm-dev at lists.llvm.org>
> Date:        18/09/2015 09:47 AM
>
> Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot
> functions at run-time
> ------------------------------
>
>
>
> Hi Revital,
>
> Attached is a new version of the fully_lazy Orc Kaleidoscope demo that has
> been extended to enable re-compilation at higher optimisation levels,
> roughly following the scheme I outlined before.
>
> In the compile action for the callback, the initial IR for each function is
> transformed like this:
>
>
>                            unsigned foo_counter = 0;
> void foo$impl() {          void foo$impl() {
>   // foo body        ->      if (++foo_counter > 1000) {
> }                              auto fooOpt = $recompile(&foo);
>                                fooOpt();
>                              }
>                              // foo body
>                            }
>
> The key changes to make this work (which you can see by diff'ing toy.cpp
> against the original fully_lazy version):
>
> 1) New layers HotCompileLayer and HotIROptsLayer added. These perform IR
> optimisation and code generation at higher optimisation levels than the
> default layers.
> 2) The symbol resolver function (not to be confused with the resolver
> block) has been pulled out into its own function, createResolver, so that
> it can be shared between optimised & non-optimized code. It also resolves
> the "$recompile" function to a static method on the KaleidoscopeJIT class
> itself.
> 3) The lazy compile action now calls 'instrumentFunctions' before adding
> the IR for cold functions to the JIT.
> 4) The instrumentFunctions method injects the counter code and call to
> recompile.
> 5) The recompileHot method re-IRGens functions, then adds them to the
> HotIROpts layer to generate more optimized versions. It then updates the
> function-body pointer so that subsequent calls go to the optimised version.
>
> This is a bit quick-and-dirty, but does work. In the future I'll try to
> tidy this up and turn it into a new tutorial chapter.
>
> Hope this helps!
>
> Cheers,
> Lang.
>
>
>
>
> On Wed, Sep 16, 2015 at 10:09 PM, Revital1 Eres <ERES at il.ibm.com> wrote:
> Hi Lang,
>
> Many thanks!!! I just wanted to make sure you did not miss it...
>
> Thanks again!
> Revital
>
>
>
> From:        Lang Hames <lhames at gmail.com>
> To:        Revital1 Eres/Haifa/IBM at IBMIL
> Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
> Date:        17/09/2015 01:56 AM
> Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot
> functions at run-time
> ------------------------------
>
>
>
> Hi Revital,
>
> Apologies for the delayed reply.
>
> I'm working on some example code for how to do this. I'll try to post it
> tomorrow.
>
> Cheers,
> Lang.
>
> On Tue, Sep 8, 2015 at 12:23 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
> Hi Lang,
>
> After spending some time debugging the Kaleidoscope Orc fully_lazy toy
> example on x86, I want to start implementing the run-time optimizer as you
> suggested, and again I highly appreciate your help.
> For now I'll defer the target-specific implementation until the non-target
> parts are in place, since I can run on x86 as a start.
> Given a simple example of a main function calling functions foo and bar:
> IIUC I should start from the IR level of this module, which means that
> ParseIRFile will first be called on the IR of the program. Is that right?
>
> I would like to make sure I understand your suggestion, which is to insert
> a new layer on top of the CompileCallbackLayer in order to be able to call
> trigger_condition at the beginning of a function.
> IIUC, until a function (bar or foo) is optimized, calls to it will go
> through the resolver (foo and bar will not be compiled from scratch every
> time we go through the resolver; rather, the cached non-optimized version
> is executed once it has been compiled). The resolver will check
> trigger_condition to see whether the cached non-optimized version should be
> executed or a new optimized version should be compiled and executed.
> After trigger_condition becomes true, foo and bar will be compiled to
> generate their optimized versions, and these versions will be executed
> directly from then on (not going through the resolver any more). Is that
> right?
> Should this layer on top of the CompileCallbackLayer be similar to class
> KaleidoscopeJIT?
> I saw that in the Kaleidoscope Orc example the lambda functions that are
> added in createLambdaResolver are executed by the resolver before compiling
> a call, so I assume that trigger_condition should also be added via
> createLambdaResolver, so that before compiling foo or bar the lambda
> functions that are added by calling createLambdaResolver and contain
> trigger_condition will be executed. Is that right?
>
> IIUC in the Kaleidoscope Orc example the interpreter calls addModule upon
> parsing a call expression in HandleTopLevelExpression.
> In my case I assume addModule should be called for the module returned from
> ParseIRFile, right?
> In this case, will calling getAddress on the whole module (the IR of all
> functions) trigger the lambda functions defined in createLambdaResolver for
> foo and bar? Also, in the Kaleidoscope Orc example the execution of the
> function is done explicitly in HandleTopLevelExpression after calling
> getAddress, and it's not clear to me where I should insert this in my case.
>
> Thanks again,
> Revital
>
>
>
>
> From:        Lang Hames <lhames at gmail.com>
> To:        Revital1 Eres/Haifa/IBM at IBMIL
> Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
> Date:        28/07/2015 05:58 AM
> Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot
> functions at run-time
> ------------------------------
>
>
>
> Hi Revital,
>
> What do you mean by "code cache"? Orc (and MCJIT) does have the concept of
> an ObjectCache, which is a long-lived, potentially persistent, compiled
> version of some IR. It's not a key component of the JIT though: Most
> clients run without a cache attached and just JIT their code from scratch
> in each session.
>
> Recompilation is orthogonal to caching. There is no in-tree support for
> recompilation yet. There are several ways that it could be supported,
> depending on what security / performance trade-offs you're willing to make,
> and how deep in to the LLVM code you want to get. As things stand at the
> moment all function calls in the lazy JIT are indirected via function
> pointers. We want to add support for patchable call-sites, but this hasn't
> been implemented yet. The indirect calls make recompilation reasonably
> easy: You could add a transform layer on top of the CompileCallbackLayer
> which would modify each function like this:
>
> void foo$impl() {          void foo$impl() {
>   // foo body        ->      if (trigger_condition) {
> }                              auto fooOpt = jit_recompile_hot(&foo);
>                                fooOpt();
>                              }
>                              // foo body
>                            }
>
> You would implement the jit_recompile_hot function yourself in your JIT
> and make it available to JIT'd code via the SymbolResolver. When the
> trigger condition is met you'll get a call to recompile foo, at which point
> you: (1) Add the IR for foo to a 2nd IRCompileLayer that has been
> configured with a higher optimization level, (2) look up the address of the
> optimized version of foo, and (3) update the function pointer for foo to
> point at the optimized version. The process for patchable callsites should
> be fairly similar once they're available, except that you'll trigger a
> call-site update rather than rewriting a function pointer.
>
> This neglects all sorts of fun details (threading, garbage collection of
> old function implementations), but hopefully it gives you a place to
> start.
>
>
> Regarding laziness, as Hal mentioned you'll have to provide some target
> support for PowerPC to support lazy compilation. For a rough guide you can
> check out the X86_64 support code in
> llvm/include/llvm/ExecutionEngine/Orc/OrcTargetSupport.h and
> llvm/lib/ExecutionEngine/Orc/OrcTargetSupport.cpp.
>
> There are two methods that you'll need to implement:
> insertCompileCallbackTrampoline and insertResolverBlock. These work
> together to enable lazy compilation. Both of these methods inject blobs of
> target specific code in to the JIT process. To do this (at least for now) I
> make use of a handy feature of LLVM IR: You can write raw assembly code
> directly into a bitcode module ("module-level asm"). If you look at the X86
> implementation of each of these methods you'll see they're written in terms
> of string-streams building up a string of assembly which will be handed off
> to the JIT to compile like any other code.
>
> The first blob that you need to be able to output is the resolver block.
> The purpose of the resolver block is to save program state and call back in
> to the JIT to trigger lazy compilation of a function. When the JIT is done
> compiling the function it returns the address of the compiled function to
> the resolver block, and the resolver block returns to the compiled function
> (rather than its original return address).
>
> Because all functions share the same resolver block, the JIT needs some
> way to distinguish them, which is where the trampolines come in. The JIT
> emits one trampoline per function and each trampoline just calls the
> resolver block. The return address of the call in each trampoline provides
> the unique address that the JIT associates with the to-be-compiled
> functions. The CompileCallbackManager manages this association between
> trampolines and functions for you, you just need to provide the
> resolver/trampoline primitives.
>
> In case it helps, here's what the output of all this looks like on X86.
> Trampolines are trivial - they're emitted in blocks and preceded by a
> pointer to the resolver block:
>
> module asm "Lorc_resolve_block_addr:"
> module asm "  .quad 140439143575560"
> module asm "orc_jcc_0:"
> module asm "  callq *Lorc_resolve_block_addr(%rip)"
> module asm "orc_jcc_1:"
> module asm "  callq *Lorc_resolve_block_addr(%rip)"
> module asm "orc_jcc_2:"
> module asm "  callq *Lorc_resolve_block_addr(%rip)"
> ...
>
>
> The resolver block is more complicated and I won't provide the full code
> for it here. You can find it by running:
> lli -jit-kind=orc-lazy -orc-lazy-debug=mods-to-stderr <hello_world.ll>
>
> and looking at the initial output. In pseudo-asm though, it looks like
> this:
>
> module asm "jit_callback_manager_addr:"
> module asm "  .quad 0x46fc190" // <- address of callback manager object
> module asm "orc_resolver_block:"
> module asm "  // save register state"
> module asm "  // load jit_callback_manager_addr into %rdi"
> module asm "  // load the return address (from the trampoline call) into %rsi"
> module asm "  // %rax = call jit(%rdi, %rsi)"
> module asm "  // save %rax over the return address"
> module asm "  // restore register state"
> module asm "  // retq"
>
> So, that's a whirlwind intro to implementing lazy JITing support for a new
> architecture in Orc. I'll try to answer any questions you have on the
> topic, though I'm not familiar with PowerPC at all. If you're comfortable
> with PowerPC assembly I think it should be possible to implement once you
> grok the concepts.
>
> Hope this helps!
>
> Cheers,
> Lang.
>
>
> On Jul 26, 2015, at 11:17 PM, Revital1 Eres <ERES at il.ibm.com> wrote:
>
> Hi Again,
>
> I'm a little confused about exactly which Orc functions I should use in
> order to save a function's code in a code cache so that it can later be
> replaced with a different version, and I would appreciate your help.
>
> Just a reminder: I want to dynamically recompile the program based on a
> profile collected at run-time. I would like to start executing the program
> from the code cache and at some point be able to replace a function body
> with its newly compiled version; this can be done by replacing the entry of
> the function's code with a trampoline to its new version, so that future
> calls to it will call the new version's code.
>
> Does the CompileOnDemandLayer execute the program from a code cache
> and hold pointers to the code of the functions it executes? I am
> compiling for a Power machine.
> Are there target-specific pieces that I should implement to make Orc
> work on Power?
>
> Thanks again,
> Revital
>
>
>
>
> From:        Lang Hames <lhames at gmail.com>
> To:        Revital1 Eres/Haifa/IBM at IBMIL
> Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
> Date:        20/07/2015 08:41 PM
> Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot
> functions at run-time
> ------------------------------
>
>
>
> Hi Revital,
>
> The CompileOnDemand layer is used by the lazy bitcode JIT in the lli tool.
> You can find the code in llvm/tools/lli/OrcLazyJIT.* .
>
> Cheers,
> Lang.
>
>
> On Mon, Jul 20, 2015 at 2:32 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
> Hello Lang,
>
> Thanks for your answer.
>
> I am now looking for an example of the usage of CompileOnDemandLayer. Is
> there an example available for that (could not find one in llvm/examples)?
>
> Thanks,
> Revital
>
>
>
> From:        Lang Hames <lhames at gmail.com>
> To:        Revital1 Eres/Haifa/IBM at IBMIL
> Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
> Date:        10/07/2015 12:10 AM
> Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot
> functions at run-time
> ------------------------------
>
>
>
> Hi Revital,
>
> LLVM does have an IR interpreter, but I don't think it's maintained well
> (or possibly at all). The interpreter is also not designed to interact with
> the LLVM JITs.
>
> We generally encourage people to just JIT LLVM IR, rather than
> interpreting it. For the use-case you have described, you could JIT IR with
> no optimizations to begin with, then re-JIT hot functions at a higher
> level.
>
> The Orc JIT APIs (LLVM's newer JIT APIs) were written with this kind of
> use-case in mind, and are probably a better fit for this than MCJIT. There
> is no built-in hot-function detection or recompilation yet, but I think
> this would be *fairly* easy to write in terms of Orc's callback API.
>
> Cheers,
> Lang.
>
>
> On Thu, Jul 9, 2015 at 4:19 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
> Hello,
>
> I am new to LLVM and I would appreciate your help with the following:
>
> I want to run the LLVM IR through a virtual machine (LLVM interpreter?) and
> JIT-compile the hot functions (using MCJIT).
>
> This task will require, among other things, identifying the hot functions
> and having a code cache that is patched with the native code of the
> functions after they are jitted.
>
> So far I've read about MCJIT and lli; however, I have not seen that the LLVM
> interpreter can be used as a VM in the way I was looking for, meaning:
> execute the code one instruction at a time, have a profiling mode to
> identify hot functions, and call the JIT to compile the hot functions.
>
> I appreciate any advice/starting points for this project.
>
> Thanks,
> Revital
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
>
>
>
>
>
> [attachment "fully_lazy_with_recompile.tgz" deleted by Revital1
> Eres/Haifa/IBM]
>
>
>
>
>
>
>
>

Revital1 Eres via llvm-dev

2015-Nov-16 07:22 UTC

[llvm-dev] [LLVMdev] Help with using LLVM to re-compile hot functions at run-time

Hi Lang,

OK, thanks for the explanation.

To recompile foo I've created two bc files, main.bc and foo.bc,
and called addModule on each. Now, after the following getAddress call
for main, the createResolver function is called for foo.

auto ExprSymbol = J.findUnmangledSymbol("main");
int (*FP)(int) = (int (*)(int))(intptr_t)ExprSymbol.getAddress();
std::cerr << "Evaluated to " << FP(8) << "\n";

However instead of calling searchFunctionASTs in createResolver 
to insert a stub it executes the following.

 if (auto Symbol = findSymbol(Name))
   return RuntimeDyld::SymbolInfo(Symbol.getAddress(),
              Symbol.getFlags());

So my next mission is to insert the stub as is done in searchFunctionASTs.
As I'm reading the functions from input IR, I do not call HandleDefinition
as is done in the examples, and thus addFunctionAST is not called on the
function definition. I wonder how I can get the function definition AST
from the module returned from addModule.

Once I have foo's AST, I plan to call irGenStub as you have done in
order to recompile.

Thanks again,
Revital



From:   Lang Hames <lhames at gmail.com>
To:     Revital1 Eres/Haifa/IBM at IBMIL
Cc:     LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
Date:   15/11/2015 11:13 PM
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital,

In this context, an external function is one that is not defined inside 
the module itself. If, for example, your code contained a call to printf 
(and you hadn't defined printf yourself), that would be an external 
symbol.
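
As a rough illustration of that rule (a toy model, not Orc code), the sketch below treats a module as a set of defined names plus a list of referenced names. `ToyModule` and `linkModule` are hypothetical names invented for this example; the point is only that the resolver is consulted for names the module does not define itself.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a module: the symbols it defines and the symbols it uses.
struct ToyModule {
  std::set<std::string> Defined;
  std::vector<std::string> Referenced;
};

// Walks the references and calls Resolver only for names that are not
// defined inside the module. Returns the names the resolver was asked about.
std::vector<std::string>
linkModule(const ToyModule &M,
           const std::function<void(const std::string &)> &Resolver) {
  assert(Resolver && "a resolver callback is required");
  std::vector<std::string> Asked;
  for (const auto &Name : M.Referenced) {
    if (M.Defined.count(Name))
      continue; // Internal reference: settled without the resolver.
    Resolver(Name); // External reference: the resolver must supply it.
    Asked.push_back(Name);
  }
  return Asked;
}
```

With a module that defines `main`, `foo` and `arr` and references `foo`, `arr` and `atoi`, only `atoi` reaches the resolver. That matches the behaviour reported in this thread: a single module containing both `main` and `foo` gives the resolver nothing to do for `foo`.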

Cheers,
Lang.

On Sun, Nov 15, 2015 at 12:54 PM, Revital1 Eres <ERES at il.ibm.com> wrote:
Hi Lang,

I was trying to recompile foo.
It is not declared as static function so I thought it should be
visible outside of the program but I'm guessing I'm missing something 
here.

Thanks again,
Revital



From:        Lang Hames <lhames at gmail.com>
To:        Revital1 Eres/Haifa/IBM at IBMIL
Cc:        LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
Date:        15/11/2015 01:33 PM

Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital,

This program does not contain any external references, and so I would not 
expect it to call the resolver at all.

What symbol were you expecting to see a resolver call for?

Cheers,
Lang.

On Wed, Nov 11, 2015 at 11:44 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
Hi Lang,

Thanks for your reply!

The program I'm compiling is the following toy program, which is compiled
with -fno-inline to avoid inlining foo into main.

In the fully_lazy_with_recompile code I've added the following statements.
When running the code with gdb I do not see it break in the lambda resolver
as described in my previous mail.

 auto ExprSymbol = J.findUnmangledSymbolIn(H,"main");
 double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
 std::cerr << "Evaluated to " << FP() << "\n";

Btw, another issue I need to resolve: some of the parameters were
originally read from the command line using argv, but due to the following
error I avoided that for now (I also got a similar error regarding
ZNSt8ios_base4InitC1Ev when using prints):
LLVM ERROR: Program used external function 'atoi' which could not be resolved!

Thanks again,
Revital

#define ITERS 1000000
int arr[ITERS];

int
foo (int x, int y)
{
  int res = 950;
  if (x > 3 && y < 77)
    res = 97;
  else
    res = res * x;
  return res;
}

int
main ()
{
  int x = 880;
  int num = 990;
  int i, j;
  int b = 0;

  for (i = 0; i < ITERS; i++)
    arr[i] = i;

  for (j = 0; j < num; j++)
    for (i = 0; i < ITERS; i++)
      {
        b += foo (x, arr[i]) /2;
      }
  return 0;
}



From:        Lang Hames <lhames at gmail.com>
To:        Revital1 Eres/Haifa/IBM at IBMIL
Cc:        LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
Date:        10/11/2015 06:31 PM

Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital,

Apologies for the delayed reply - I'm traveling at the moment and not able 
to check my email often.

You will only see a callback on the resolver for symbols that are external 
to the module. What did the IR that you added look like?

Cheers,
Lang.

On Wed, Nov 4, 2015 at 8:37 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
Hello Lang,

I want to use the lazy recompilation program you posted to compile an
input program's IR (rather than processing the input with the interpreter
as is done in the example).
To do that I called the addModule function on the module returned from
parseInputIR, as was done with the other functions in the Kaleidoscope
examples.
Now, to start the codegen I am using getAddress, and at this point I was
expecting to see a call to the lambda resolver defined in createResolver,
but I did not see it happen and I would appreciate your help
to understand why.

Here is a snippet from my additions to the new version of the fully_lazy
Orc Kaleidoscope.

Thanks again,
Revital

  SessionContext S(getGlobalContext());
  KaleidoscopeJIT J(S);

  cl::ParseCommandLineOptions(argc, argv,
                              "Kaleidoscope example program\n");

  std::unique_ptr<Module> M;
  if (!InputIR.empty()) {
    M = parseInputIR(InputIR);
    auto H = J.addModule(std::move(M));
    char ModID[256];
    sprintf(ModID, "IR:%s", InputIR.c_str());
    auto ExprSymbol = J.findUnmangledSymbolIn(H, ModID);
    double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
    std::cerr << "Evaluated to " << FP() << "\n";
    J.removeModule(H);
  }
               



From:        Lang Hames <lhames at gmail.com>
To:        Revital1 Eres/Haifa/IBM at IBMIL, LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
Date:        18/09/2015 09:47 AM

Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital,

Attached is a new version of the fully_lazy Orc Kaleidoscope demo that has 
been extended to enable re-compilation at higher optimisation levels, 
roughly following the scheme I outlined before.

In the compile action for the callback, the initial IR for each function is 
transformed like this:


                           unsigned foo_counter = 0;
void foo$impl() {          void foo$impl() { 
  // foo body        ->      if (++foo_counter > 1000) { 
}                              auto fooOpt = $recompile(&foo); 
                               fooOpt(); 
                             } 
                             // foo body 
                           }

The key changes to make this work (which you can see by diff'ing toy.cpp 
against the original fully_lazy version):

1) New layers HotCompileLayer and HotIROptsLayer added. These perform IR 
optimisation and code generation at higher optimisation levels than the 
default layers.
2) The symbol resolver function (not to be confused with the resolver 
block) has been pulled out into its own function, createResolver, so that 
it can be shared between optimised & non-optimised code. It also resolves 
the "$recompile" function to a static method on the KaleidoscopeJIT class
itself.
3) The lazy compile action now calls 'instrumentFunctions' before adding
the IR for cold functions to the JIT.
4) The instrumentFunctions method injects the counter code and call to 
recompile.
5) The recompileHot method re-IRGens functions, then adds them to the 
HotIROpts layer to generate more optimized versions. It then updates the 
function-body pointer so that subsequent calls go to the optimised 
version.
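
The scheme in points 3) to 5) can be mimicked without a JIT. The following is a minimal sketch under stated assumptions, not the code from the attachment: `FooPtr`, `fooCold`, `fooHot`, `recompile` and `callFoo` are hypothetical names, and `recompile` simply swaps in a pre-written function where the real `$recompile` would re-IRGen and compile the function at a higher optimisation level. Unlike the void pseudo-code above, the cold body here returns the optimised result directly once the swap has happened.

```cpp
#include <cassert>

using FooFn = int (*)(int);

int fooCold(int X); // instrumented, unoptimised body
int fooHot(int X);  // stands in for the re-compiled, optimised body

// Every call to foo goes through this pointer, mirroring the indirect
// calls used by the lazy JIT.
FooFn FooPtr = fooCold;
unsigned FooCounter = 0;
const unsigned HotThreshold = 1000;

// Hypothetical stand-in for $recompile: installs the optimised body in the
// function-pointer slot and returns it.
FooFn recompile(FooFn *Slot) {
  *Slot = fooHot;
  return fooHot;
}

int fooHot(int X) { return X * 2; }

int fooCold(int X) {
  // Injected instrumentation: count calls and trigger recompilation.
  if (++FooCounter > HotThreshold) {
    FooFn Opt = recompile(&FooPtr);
    return Opt(X);
  }
  return X + X; // original foo body
}

int callFoo(int X) { return FooPtr(X); }
```

The first 1000 calls run the cold body; call 1001 triggers the swap, and from then on `callFoo` reaches the hot body without touching the counter.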
 
This is a bit quick-and-dirty, but does work. In the future I'll try to 
tidy this up and turn it into a new tutorial chapter.

Hope this helps!

Cheers,
Lang.




On Wed, Sep 16, 2015 at 10:09 PM, Revital1 Eres <ERES at il.ibm.com> wrote:
Hi Lang, 

Many thanks!!! I just wanted to make sure you did not miss it...

Thanks again! 
Revital 



From:        Lang Hames <lhames at gmail.com>
To:        Revital1 Eres/Haifa/IBM at IBMIL 
Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
Date:        17/09/2015 01:56 AM 
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital, 

Apologies for the delayed reply. 

I'm working on some example code for how to do this. I'll try to post it
tomorrow. 

Cheers, 
Lang. 

On Tue, Sep 8, 2015 at 12:23 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
Hi Lang, 

After spending some time debugging the Kaleidoscope Orc fully_lazy toy
example on x86, I want to start implementing a run-time optimizer as you
suggested, and again I highly appreciate your help.
For now I'll defer the target-specific implementation to the end, after I
have the non-target parts in place, as I can run on x86 as a start.
Given a simple example of a main function calling foo and bar functions:
IIUC I should start from the IR level of this module, which means that
ParseIRFile will first be called on the IR of the program, is that right?

I would like to make sure I understand your suggestion, which is to insert
a new layer on top of the CompileCallbackLayer in order to be able to call
trigger_condition at the beginning of a function.
IIUC, until the function (bar or foo) is optimized, the calls to foo and bar
will go through the resolver (foo and bar will not be compiled from scratch
every time we go through the resolver, but rather the cached non-optimized
version will be executed after it is first compiled). The resolver will
check trigger_condition to see if the cached non-optimized version should
be executed or a new optimized version should be compiled and executed.
After trigger_condition becomes true, foo and bar will be compiled to
generate their optimized versions, and these versions will be executed
directly from then on (not going through the resolver any more). Is that
right?
Should this layer on top of the CompileCallbackLayer be similar to
class KaleidoscopeJIT?
I saw that in the Kaleidoscope Orc example the lambda functions that are
added in createLambdaResolver are executed by the resolver before compiling
a call, so I assume that trigger_condition should also be added via
createLambdaResolver, so that before compiling foo or bar the lambda
functions that are added by calling createLambdaResolver and contain
trigger_condition will be executed, is that right?

IIUC, in the Kaleidoscope Orc example the interpreter calls addModule upon
parsing a call expression in HandleTopLevelExpression.
In my case I assume addModule should be called for the module returned from
ParseIRFile, right?
In this case, will calling getAddress on the whole module (the IR of all
functions) trigger calling the lambda functions defined in
createLambdaResolver on the foo and bar functions? Also, in the Kaleidoscope
Orc example the execution of the function is done explicitly in
HandleTopLevelExpression after calling getAddress, and it's not clear to me
where I should insert this in my case.

Thanks again, 
Revital 




From:        Lang Hames <lhames at gmail.com>
To:        Revital1 Eres/Haifa/IBM at IBMIL 
Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
Date:        28/07/2015 05:58 AM 
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital, 

What do you mean by "code cache"? Orc (and MCJIT) does have the concept of
an ObjectCache, which is a long-lived, potentially persistent, compiled 
version of some IR. It's not a key component of the JIT though: Most 
clients run without a cache attached and just JIT their code from scratch 
in each session. 

Recompilation is orthogonal to caching. There is no in-tree support for 
recompilation yet. There are several ways that it could be supported, 
depending on what security / performance trade-offs you're willing to 
make, and how deep in to the LLVM code you want to get. As things stand at 
the moment all function calls in the lazy JIT are indirected via function 
pointers. We want to add support for patchable call-sites, but this hasn't 
been implemented yet. The Indirect calls make recompilation reasonably 
easy: You could add a transform layer on top of the CompileCallbackLayer 
which would modify each function like this: 

void foo$impl() {          void foo$impl() {
  // foo body        ->      if (trigger_condition) { 
}                              auto fooOpt = jit_recompile_hot(&foo);
                               fooOpt(); 
                             } 
                             // foo body 
                           } 

You would implement the jit_recompile_hot function yourself in your JIT 
and make it available to JIT'd code via the SymbolResolver. When the 
trigger condition is met you'll get a call to recompile foo, at which 
point you: (1) Add the IR for foo to a 2nd IRCompileLayer that has been 
configured with a higher optimization level, (2) look up the address of 
the optimized version of foo, and (3) update the function pointer for foo 
to point at the optimized version. The process for patchable callsites 
should be fairly similar once they're available, except that you'll 
trigger a call-site update rather than rewriting a function pointer. 

This neglects all sorts of fun details (threading, garbage collection of 
old function implementations), but hopefully it gives you a place to 
start.  
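
On the threading detail mentioned above, one possible starting point (an assumption for illustration, not something the thread or Orc prescribes) is to keep the function-body pointer in a `std::atomic` and publish the optimised body with a compare-and-swap, so that two threads hitting the trigger at the same time install it only once. `BodyPtr`, `callThroughStub` and `publishOptimised` are hypothetical names. Reclaiming the old body's memory is not addressed here.

```cpp
#include <atomic>
#include <cassert>

using BodyFn = int (*)(int);

int ColdCalls = 0;

// Unoptimised body; counts its own invocations so the two bodies differ.
int coldBody(int X) {
  ++ColdCalls;
  return X + 1;
}

// Stands in for the optimised, re-compiled body.
int hotBody(int X) { return X + 1; }

// The slot every caller loads before making the indirect call.
std::atomic<BodyFn> BodyPtr{coldBody};

int callThroughStub(int X) {
  BodyFn F = BodyPtr.load(std::memory_order_acquire);
  return F(X);
}

// Installs Optimised only if the slot still holds Expected. Returns true for
// the one caller that performed the update, false for everyone else.
bool publishOptimised(BodyFn Expected, BodyFn Optimised) {
  return BodyPtr.compare_exchange_strong(Expected, Optimised,
                                         std::memory_order_acq_rel,
                                         std::memory_order_acquire);
}
```

A thread that loses the race can simply continue with whatever pointer it reads next from `BodyPtr`.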


Regarding laziness, as Hal mentioned you'll have to provide some target 
support for PowerPC to support lazy compilation. For a rough guide you can 
check out the X86_64 support code in 
llvm/include/llvm/ExecutionEngine/Orc/OrcTargetSupport.h and 
llvm/lib/ExecutionEngine/Orc/OrcTargetSupport.cpp. 

There are two methods that you'll need to implement: 
insertCompileCallbackTrampoline and insertResolverBlock. These work 
together to enable lazy compilation. Both of these methods inject blobs of 
target specific code in to the JIT process. To do this (at least for now) 
I make use of a handy feature of LLVM IR: You can write raw assembly code 
directly into a bitcode module ("module-level asm"). If you look at the
X86 implementation of each of these methods you'll see they're written in
terms of string-streams building up a string of assembly which will be 
handed off to the JIT to compile like any other code. 

The first blob that you need to be able to output is the resolver block. 
The purpose of the resolver block is to save program state and call back 
in to the JIT to trigger lazy compilation of a function. When the JIT is 
done compiling the function it returns the address of the compiled 
function to the resolver block, and the resolver block returns to the 
compiled function (rather than its original return address). 

Because all functions share the same resolver block, the JIT needs some 
way to distinguish them, which is where the trampolines come in. The JIT 
emits one trampoline per function and each trampoline just calls the 
resolver block. The return address of the call in each trampoline provides 
the unique address that the JIT associates with the to-be-compiled 
functions. The CompileCallbackManager manages this association between 
trampolines and functions for you, you just need to provide the 
resolver/trampoline primitives. 
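
The bookkeeping described in the last two paragraphs can be modelled in a few lines. This is a toy sketch, not the real CompileCallbackManager: `ToyCallbackManager`, `addTrampoline` and `resolve` are hypothetical names, addresses are plain integers, and the key passed to `resolve` stands in for the return address that the resolver block reads from the trampoline's call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

using TargetAddress = std::uint64_t;
using CompileAction = std::function<TargetAddress()>;

class ToyCallbackManager {
public:
  // Reserves the next trampoline and ties a compile action to it. The
  // returned key plays the role of the trampoline's unique return address.
  TargetAddress addTrampoline(CompileAction Compile) {
    TargetAddress Key = NextTrampoline;
    NextTrampoline += TrampolineSize;
    Pending[Key] = std::move(Compile);
    return Key;
  }

  // What the shared resolver block would call back into: look up the
  // function by trampoline key, compile it, and return the body's address
  // so the resolver can jump there.
  TargetAddress resolve(TargetAddress TrampolineKey) {
    auto It = Pending.find(TrampolineKey);
    assert(It != Pending.end() && "unknown trampoline");
    TargetAddress Body = It->second();
    Pending.erase(It); // Assumption: each trampoline fires at most once.
    return Body;
  }

  std::size_t pendingCount() const { return Pending.size(); }

private:
  static constexpr TargetAddress TrampolineSize = 8;
  TargetAddress NextTrampoline = 0x1000;
  std::map<TargetAddress, CompileAction> Pending;
};
```

One resolver entry point serves every function because the trampoline key alone identifies which compile action to run.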

In case it helps, here's what the output of all this looks like on X86. 
Trampolines are trivial - they're emitted in blocks and preceded by a 
pointer to the resolver block: 

module asm "Lorc_resolve_block_addr:"
module asm "  .quad 140439143575560"
module asm "orc_jcc_0:" 
module asm "  callq *Lorc_resolve_block_addr(%rip)"
module asm "orc_jcc_1:" 
module asm "  callq *Lorc_resolve_block_addr(%rip)"
module asm "orc_jcc_2:" 
module asm "  callq *Lorc_resolve_block_addr(%rip)"
... 


The resolver block is more complicated and I won't provide the full code 
for it here. You can find it by running: 
lli -jit-kind=orc-lazy -orc-lazy-debug=mods-to-stderr <hello_world.ll>

and looking at the initial output. In pseudo-asm though, it looks like 
this: 

module asm "jit_callback_manager_addr:"
module asm "  .quad 0x46fc190" // <- address of callback manager object
module asm "orc_resolver_block:"
module asm "  // save register state"
module asm "  // load jit_callback_manager_addr into %rdi"
module asm "  // load the return address (from the trampoline call) into %rsi"
module asm "  // %rax = call jit(%rdi, %rsi)"
module asm "  // save %rax over the return address"
module asm "  // restore register state"
module asm "  // retq"

So, that's a whirlwind intro to implementing lazy JITing support for a new 
architecture in Orc. I'll try to answer any questions you have on the 
topic, though I'm not familiar with PowerPC at all. If you're comfortable
with PowerPC assembly I think it should be possible to implement once you 
grok the concepts. 

Hope this helps! 

Cheers, 
Lang. 


On Jul 26, 2015, at 11:17 PM, Revital1 Eres <ERES at il.ibm.com> wrote:

Hi Again, 

I'm a little confused about exactly which Orc functions I should use in
order to save a function's code in a code cache so that it can later be
replaced with a different version, and I would appreciate your help.

Just a reminder: I want to dynamically recompile the program based on a
profile collected at run-time. I would like to start executing the program
from the code cache and at some point be able to replace a function body
with its newly compiled version; this can be done by replacing the entry of
the function's code with a trampoline to its new version, so that future
calls to it will call the new version's code.

Does the CompileOnDemandLayer execute the program from a code cache
and hold pointers to the code of the functions it executes? I am
compiling for a Power machine.
Are there target-specific pieces that I should implement to make Orc
work on Power?

Thanks again, 
Revital 




From:        Lang Hames <lhames at gmail.com>
To:        Revital1 Eres/Haifa/IBM at IBMIL 
Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
Date:        20/07/2015 08:41 PM 
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital, 

The CompileOnDemand layer is used by the lazy bitcode JIT in the lli tool. 
You can find the code in llvm/tools/lli/OrcLazyJIT.* . 

Cheers, 
Lang. 


On Mon, Jul 20, 2015 at 2:32 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
Hello Lang,

Thanks for your answer. 

I am now looking for an example of the usage of CompileOnDemandLayer. Is 
there an example available for that (could not find one in llvm/examples)?

Thanks, 
Revital 



From:        Lang Hames <lhames at gmail.com>
To:        Revital1 Eres/Haifa/IBM at IBMIL 
Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
Date:        10/07/2015 12:10 AM 
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital, 

LLVM does have an IR interpreter, but I don't think it's maintained well
(or possibly at all). The interpreter is also not designed to interact 
with the LLVM JITs. 

We generally encourage people to just JIT LLVM IR, rather than 
interpreting it. For the use-case you have described, you could JIT IR 
with no optimizations to begin with, then re-JIT hot functions at a higher 
level. 

The Orc JIT APIs (LLVM's newer JIT APIs) were written with this kind of 
use-case in mind, and are probably a better fit for this than MCJIT. There 
is no built-in hot-function detection or recompilation yet, but I think 
this would be *fairly* easy to write in terms of Orc's callback API. 

Cheers, 
Lang. 


On Thu, Jul 9, 2015 at 4:19 AM, Revital1 Eres <ERES at il.ibm.com> wrote: 
Hello, 

I am new to LLVM and I would appreciate your help with the following:

I want to run the LLVM IR through a virtual machine (LLVM interpreter?) and
JIT-compile the hot functions (using MCJIT).

This task will require, among other things, identifying the hot functions
and having a code cache that is patched with the native code of the
functions after they are jitted.

So far I've read about MCJIT and lli; however, I have not seen that the LLVM
interpreter can be used as a VM in the way I was looking for, meaning:
execute the code one instruction at a time, have a profiling mode to
identify hot functions, and call the JIT to compile the hot functions.

I appreciate any advice/starting points for this project.

Thanks, 
Revital 

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev








[attachment "fully_lazy_with_recompile.tgz" deleted by Revital1 
Eres/Haifa/IBM] 













Revital1 Eres via llvm-dev

2015-Nov-16 12:22 UTC

[llvm-dev] [LLVMdev] Help with using LLVM to re-compile hot functions at run-time

Hi Lang,

As I mentioned in my previous email, I want to insert the stub as is done
in searchFunctionASTs in order to recompile foo.
However, it occurred to me that this might not be feasible, since I start
from the IR of the source code and cannot get the AST from the IR,
is that right?
If so, could I generate the stub and the instrumentation code in
instrumentFunctions at the IR level instead of the AST level, as it is
written now?

Thanks again,
Revital



From:   Lang Hames <lhames at gmail.com>
To:     Revital1 Eres/Haifa/IBM at IBMIL
Cc:     LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
Date:   15/11/2015 11:13 PM
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital,

In this context, an external function is one that is not defined inside 
the module itself. If, for example, your code contained a call to printf 
(and you hadn't defined printf yourself), that would be an external 
symbol.

Cheers,
Lang.

On Sun, Nov 15, 2015 at 12:54 PM, Revital1 Eres <ERES at il.ibm.com> wrote:
Hi Lang,

I was trying to recompile foo.
It is not declared as static function so I thought it should be
visible outside of the program but I'm guessing I'm missing something 
here.

Thanks again,
Revital



From:        Lang Hames <lhames at gmail.com>
To:        Revital1 Eres/Haifa/IBM at IBMIL
Cc:        LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
Date:        15/11/2015 01:33 PM

Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital,

This program does not contain any external references, and so I would not 
expect it to call the resolver at all.

What symbol were you expecting to see a resolver call for?

Cheers,
Lang.

On Wed, Nov 11, 2015 at 11:44 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
Hi Lang,

Thanks for your reply!

The program I'm compiling is the toy program below, which is compiled
with -fno-inline to avoid inlining foo into main.

In the fully_lazy_with_recompile code I've added the following statements.
When running the code under gdb I do not see it break in the lambda
resolver, as described in my previous mail.

 auto ExprSymbol = J.findUnmangledSymbolIn(H, "main");
 double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
 std::cerr << "Evaluated to " << FP() << "\n";

Btw, another issue I need to resolve: some of the parameters were
originally read from the command line using argv, but I avoided that for
now because of the following error (I also got a similar error regarding
ZNSt8ios_base4InitC1Ev when using prints):
LLVM ERROR: Program used external function 'atoi' which could not be resolved!

Thanks again,
Revital

#define ITERS 1000000
int arr[ITERS];

int
foo (int x, int y)
{
  int res = 950;
  if (x > 3 && y < 77)
    res = 97;
  else
    res = res * x;
  return res;
}

int
main ()
{
  int x = 880;
  int num = 990;
  int i, j;
  int b = 0;

  for (i = 0; i < ITERS; i++)
    arr[i] = i;

  for (j = 0; j < num; j++)
    for (i = 0; i < ITERS; i++)
      {
        b += foo (x, arr[i]) /2;
      }
  return 0;
}



From:        Lang Hames <lhames at gmail.com>
To:        Revital1 Eres/Haifa/IBM at IBMIL
Cc:        LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
Date:        10/11/2015 06:31 PM

Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital,

Apologies for the delayed reply - I'm traveling at the moment and not able 
to check my email often.

You will only see a callback on the resolver for symbols that are external 
to the module. What did the IR that you added look like?

Cheers,
Lang.

On Wed, Nov 4, 2015 at 8:37 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
Hello Lang,

I want to use the lazy recompilation program you posted to compile an
input program's IR (rather than processing the input with the interpreter,
as is done in the example).
To do that I called the addModule function on the module returned from
parseInputIR, as was done with the other functions in the Kaleidoscope
examples.
Now, to start the codegen I am using getAddress, and at this point I was
expecting to see a call to the lambda resolver defined in createResolver,
but I did not see it happen. I would appreciate your help in understanding
why.

Here is a snippet from my additions to the new version of the fully_lazy
Orc Kaleidoscope.

Thanks again,
Revital

  SessionContext S(getGlobalContext());
  KaleidoscopeJIT J(S);

  cl::ParseCommandLineOptions(argc, argv,
                              "Kaleidoscope example program\n");

  std::unique_ptr<Module> M;
  if (!InputIR.empty()) {
    M = parseInputIR(InputIR);
    auto H = J.addModule(std::move(M));
    // Bounded formatting so a long input path cannot overflow ModID.
    char ModID[256];
    snprintf(ModID, sizeof(ModID), "IR:%s", InputIR.c_str());
    auto ExprSymbol = J.findUnmangledSymbolIn(H, ModID);
    double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
    std::cerr << "Evaluated to " << FP() << "\n";
    J.removeModule(H);
  }



From:        Lang Hames <lhames at gmail.com>
To:        Revital1 Eres/Haifa/IBM at IBMIL, LLVM Developers Mailing List <
llvm-dev at lists.llvm.org>
Date:        18/09/2015 09:47 AM

Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital,

Attached is a new version of the fully_lazy Orc Kaleidoscope demo that has 
been extended to enable re-compilation at higher optimisation levels, 
roughly following the scheme I outlined before.

In the compile action for the callback, the initial IR for each function
is transformed like this:


                           unsigned foo_counter = 0;
void foo$impl() {          void foo$impl() { 
  // foo body        ->      if (++foo_counter > 1000) { 
}                              auto fooOpt = $recompile(&foo); 
                               fooOpt(); 
                             } 
                             // foo body 
                           }

The key changes to make this work (which you can see by diff'ing toy.cpp 
against the original fully_lazy version):

1) New layers HotCompileLayer and HotIROptsLayer added. These perform IR
optimisation and code generation at higher optimisation levels than the
default layers.
2) The symbol resolver function (not to be confused with the resolver
block) has been pulled out into its own function, createResolver, so that
it can be shared between optimised and non-optimised code. It also resolves
the "$recompile" function to a static method on the KaleidoscopeJIT class
itself.
3) The lazy compile action now calls 'instrumentFunctions' before adding
the IR for cold functions to the JIT.
4) The instrumentFunctions method injects the counter code and the call to
recompile.
5) The recompileHot method re-IRGens functions, then adds them to the
HotIROpts layer to generate more optimised versions. It then updates the
function-body pointer so that subsequent calls go to the optimised
version.
 
This is a bit quick-and-dirty, but does work. In the future I'll try to 
tidy this up and turn it into a new tutorial chapter.

Hope this helps!

Cheers,
Lang.




On Wed, Sep 16, 2015 at 10:09 PM, Revital1 Eres <ERES at il.ibm.com>
wrote:
Hi Lang, 

Many thanks!!! I just wanted to make sure you did not miss it...

Thanks again! 
Revital 



From:        Lang Hames <lhames at gmail.com>
To:        Revital1 Eres/Haifa/IBM at IBMIL 
Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
Date:        17/09/2015 01:56 AM 
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital, 

Apologies for the delayed reply. 

I'm working on some example code for how to do this. I'll try to post it
tomorrow. 

Cheers, 
Lang. 

On Tue, Sep 8, 2015 at 12:23 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
Hi Lang, 

After spending some time debugging the Kaleidoscope Orc fully_lazy toy
example on x86, I want to start implementing a run-time optimizer as you
suggested, and again I highly appreciate your help.
For now I'll defer the target-specific implementation to the end, after
I have the non-target parts in place, since I can run on x86 as a start.
Given a simple example of a main function calling functions foo and bar:
IIUC I should start from the IR of this module, which means that
ParseIRFile will first be called on the IR of the program. Is that right?

I would like to make sure I understand your suggestion, which is to insert
a new layer on top of the CompileCallbackLayer in order to be able to call
trigger_condition at the beginning of a function.
IIUC, until the function (bar or foo) is optimized, the calls to foo and
bar will go through the resolver (foo and bar will not be compiled from
scratch every time we go through the resolver; rather, the cached
non-optimized version is executed after it is first compiled). The resolver
will check trigger_condition to see whether the cached non-optimized
version should be executed or a new optimized version should be compiled
and executed.
Once trigger_condition is true, foo and bar will be compiled to generate
their optimized versions, and these versions will be executed directly from
then on (not going through the resolver any more). Is that right?
Should this layer on top of the CompileCallbackLayer be similar to class
KaleidoscopeJIT?
I saw that in the Kaleidoscope Orc example the lambda functions that are
added in createLambdaResolver are executed by the resolver before compiling
a call, so I assume that trigger_condition should also be added via
createLambdaResolver, so that before compiling foo or bar the lambda
functions that are added by calling createLambdaResolver and contain
trigger_condition will be executed. Is that right?

IIUC, in the Kaleidoscope Orc example the interpreter calls addModule upon
parsing a call expression in HandleTopLevelExpression.
In my case I assume addModule should be called for the module returned from
ParseIRFile, right?
In this case, will calling getAddress on the whole module (the IR of all
functions) trigger calls to the lambda functions defined in
createLambdaResolver for the foo and bar functions? Also, in the
Kaleidoscope Orc example the execution of the function is done explicitly
in HandleTopLevelExpression after calling getAddress, and it's not clear to
me where I should insert this in my case.

Thanks again, 
Revital 




From:        Lang Hames <lhames at gmail.com>
To:        Revital1 Eres/Haifa/IBM at IBMIL 
Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
Date:        28/07/2015 05:58 AM 
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital, 

What do you mean by "code cache"? Orc (and MCJIT) does have the concept of
an ObjectCache, which is a long-lived, potentially persistent, compiled
version of some IR. It's not a key component of the JIT though: most
clients run without a cache attached and just JIT their code from scratch
in each session.

Recompilation is orthogonal to caching. There is no in-tree support for
recompilation yet. There are several ways that it could be supported,
depending on what security / performance trade-offs you're willing to
make, and how deep into the LLVM code you want to get. As things stand at
the moment, all function calls in the lazy JIT are indirected via function
pointers. We want to add support for patchable call-sites, but this hasn't
been implemented yet. The indirect calls make recompilation reasonably
easy: you could add a transform layer on top of the CompileCallbackLayer
which would modify each function like this:

void foo$impl() {          void foo$impl() {
  // foo body        ->      if (trigger_condition) { 
}                              auto fooOpt = jit_recompile_hot(&foo);
                               fooOpt(); 
                             } 
                             // foo body 
                           } 

You would implement the jit_recompile_hot function yourself in your JIT 
and make it available to JIT'd code via the SymbolResolver. When the 
trigger condition is met you'll get a call to recompile foo, at which 
point you: (1) Add the IR for foo to a 2nd IRCompileLayer that has been 
configured with a higher optimization level, (2) look up the address of 
the optimized version of foo, and (3) update the function pointer for foo 
to point at the optimized version. The process for patchable callsites 
should be fairly similar once they're available, except that you'll 
trigger a call-site update rather than rewriting a function pointer. 

This neglects all sorts of fun details (threading, garbage collection of 
old function implementations), but hopefully it gives you a place to 
start.  


Regarding laziness, as Hal mentioned you'll have to provide some target 
support for PowerPC to support lazy compilation. For a rough guide you can 
check out the X86_64 support code in 
llvm/include/llvm/ExecutionEngine/Orc/OrcTargetSupport.h and 
llvm/lib/ExecutionEngine/Orc/OrcTargetSupport.cpp. 

There are two methods that you'll need to implement:
insertCompileCallbackTrampoline and insertResolverBlock. These work
together to enable lazy compilation. Both of these methods inject blobs of
target-specific code into the JIT process. To do this (at least for now)
I make use of a handy feature of LLVM IR: you can write raw assembly code
directly into a bitcode module ("module-level asm"). If you look at the
X86 implementation of each of these methods you'll see they're written in
terms of string-streams building up a string of assembly which will be
handed off to the JIT to compile like any other code.

The first blob that you need to be able to output is the resolver block. 
The purpose of the resolver block is to save program state and call back 
in to the JIT to trigger lazy compilation of a function. When the JIT is 
done compiling the function it returns the address of the compiled 
function to the resolver block, and the resolver block returns to the 
compiled function (rather than its original return address). 

Because all functions share the same resolver block, the JIT needs some 
way to distinguish them, which is where the trampolines come in. The JIT 
emits one trampoline per function and each trampoline just calls the 
resolver block. The return address of the call in each trampoline provides 
the unique address that the JIT associates with the to-be-compiled 
functions. The CompileCallbackManager manages this association between
trampolines and functions for you; you just need to provide the
resolver/trampoline primitives.

In case it helps, here's what the output of all this looks like on X86.
Trampolines are trivial: they're emitted in blocks and preceded by a
pointer to the resolver block:

module asm "Lorc_resolve_block_addr:"
module asm "  .quad 140439143575560"
module asm "orc_jcc_0:" 
module asm "  callq *Lorc_resolve_block_addr(%rip)"
module asm "orc_jcc_1:" 
module asm "  callq *Lorc_resolve_block_addr(%rip)"
module asm "orc_jcc_2:" 
module asm "  callq *Lorc_resolve_block_addr(%rip)"
... 


The resolver block is more complicated and I won't provide the full code
for it here. You can find it by running:

lli -jit-kind=orc-lazy -orc-lazy-debug=mods-to-stderr <hello_world.ll>

and looking at the initial output. In pseudo-asm though, it looks like
this:

module asm "jit_callback_manager_addr:"
module asm "  .quad 0x46fc190" // <- address of callback manager object
module asm "orc_resolver_block:"
module asm "  // save register state."
module asm "  // load jit_callback_manager_addr into %rdi"
module asm "  // load the return address (from the trampoline call) into %rsi"
module asm "  // %rax = call jit(%rdi, %rsi)"
module asm "  // save %rax over the return address"
module asm "  // restore register state"
module asm "  // retq"

So, that's a whirlwind intro to implementing lazy JITing support for a new
architecture in Orc. I'll try to answer any questions you have on the
topic, though I'm not familiar with PowerPC at all. If you're comfortable
with PowerPC assembly I think it should be possible to implement once you
grok the concepts.

Hope this helps! 

Cheers, 
Lang. 


On Jul 26, 2015, at 11:17 PM, Revital1 Eres <ERES at il.ibm.com> wrote:

Hi Again, 

I'm a little confused about exactly which Orc functions I should use in
order to save the functions' code in a code cache so that it can later be
replaced with different versions, and I would appreciate your help.

Just a reminder: I want to dynamically recompile the program based on a
profile collected at run-time. I would like to start executing the program
from the code cache and at some point be able to replace a function body
with its newly compiled version; this can be done by replacing the entry of
the function's code with a trampoline to its new version, so that future
calls to it will call the new version's code.

Does the CompileOnDemandLayer execute the program from a code cache and
hold pointers to the code of the functions it executes? I am compiling for
a Power machine.
Are there target-specific pieces that I should implement to make Orc work
on Power?

Thanks again, 
Revital 




From:        Lang Hames <lhames at gmail.com>
To:        Revital1 Eres/Haifa/IBM at IBMIL 
Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
Date:        20/07/2015 08:41 PM 
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital, 

The CompileOnDemand layer is used by the lazy bitcode JIT in the lli tool. 
You can find the code in llvm/tools/lli/OrcLazyJIT.* . 

Cheers, 
Lang. 


On Mon, Jul 20, 2015 at 2:32 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
Hello Lang,

Thanks for your answer. 

I am now looking for an example of the usage of CompileOnDemandLayer. Is 
there an example available for that (could not find one in llvm/examples)?

Thanks, 
Revital 



From:        Lang Hames <lhames at gmail.com>
To:        Revital1 Eres/Haifa/IBM at IBMIL 
Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
Date:        10/07/2015 12:10 AM 
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time



Hi Revital, 

LLVM does have an IR interpreter, but I don't think it's maintained well
(or possibly at all). The interpreter is also not designed to interact 
with the LLVM JITs. 

We generally encourage people to just JIT LLVM IR, rather than 
interpreting it. For the use-case you have described, you could JIT IR 
with no optimizations to begin with, then re-JIT hot functions at a higher 
level. 

The Orc JIT APIs (LLVM's newer JIT APIs) were written with this kind of 
use-case in mind, and are probably a better fit for this than MCJIT. There 
is no built-in hot-function detection or recompilation yet, but I think 
this would be *fairly* easy to write in terms of Orc's callback API. 

Cheers, 
Lang. 


On Thu, Jul 9, 2015 at 4:19 AM, Revital1 Eres <ERES at il.ibm.com> wrote: 
Hello, 

I am new to LLVM and I appreciate your help with the following:

I want to run LLVM IR through a virtual machine (the LLVM interpreter?) and
JIT-compile the hot functions (using MCJIT).

This task will require, among other things, identifying the hot functions
and having a code cache that is patched with the native code of the
functions after they are jitted.

So far I've read about MCJIT and lli, but I have not seen that the LLVM
interpreter can be used as a VM in the way I was looking for, meaning:
execute the code one instruction at a time, have a profiling mode to
identify hot functions, and call the JIT to compile the hot functions.

I appreciate any advice/starting points for this project.

Thanks, 
Revital 

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev








[attachment "fully_lazy_with_recompile.tgz" deleted by Revital1 
Eres/Haifa/IBM] 











