thr3ads.net - llvm dev - [llvm-dev] ORC JIT - different behaviour of ExecutionSession.lookup? [Oct 2020]

Gaier, Bjoern via llvm-dev

2020-Oct-01 07:53 UTC

[llvm-dev] ORC JIT - different behaviour of ExecutionSession.lookup?

Hey Lang,

Woah! That mail contains a lot of information and things I never tried yet…
Actually… the entire MaterializationUnit and MaterializationResponsibility part
is… quite… overwhelming >O<

With “pop up” I mean… the process which is waiting for Module “Planschi” to “pop
up” can not do a thing about it. It just waits until there is a table entry for
it, indicating that the object file was loaded by another process – or not.

To understand the MU and MR things better… I'll now try to explain the flow of
the program when somebody adds an IR file to it – not caring about how that
file gets to the central process.

  1.  I go over all symbols and grab the names I care about for execution.
  2.  I create a DyLib with a unique name and stuff the IR module in it
  3.  I call lookup from LLJIT and pass the DyLib mentioned in 2. with my
symbols, get the addresses and can execute them

This would be the case where everything went well.
Okay…. So I use the “LLJITBuilder” to get my LLJIT instance, this would be
before step 1.)

In step 2.) when I created the new DyLib, I would call addGenerator and pass a
class to it that inherits from “DefinitionGenerator” right? Let’s call it
“MyDefinitionGenerator" then.
“MyDefinitionGenerator" needs to provide an implementation for
“tryToGenerate” which contains stuff which I ignore at this point.

Now I’m in step 3.) right? And oh nooo! Planschi_test is undefined.
Now the tryToGenerate function is executed and I realize there, that
“Planschi_test” is something coming from a different module.

Normally I would call the “define” function of the DyLib I got (in the
tryToGenerate function) and provide an absolute value. But I can’t because
“Planschi” hasn’t popped up yet. So instead I create a
“MyTentativeDefinitionMaterializationUnit” and stuff it into the define
function, right?
“MyTentativeDefinitionMaterializationUnit” is something that inherits from
“TentativeDefinitionMaterializationUnit”, right? Now I have those overloaded
functions that I need to provide implementations for. This is kinda shady to
me, but this will make waaaay more sense once I've looked at it I guess.

However… Now I’m lost…

What will happen now? So I gave “MyTentativeDefinitionMaterializationUnit” to
“Planschi_test” while doing the lookup and… for some reason “Planschi” has not
popped up yet. Where will the waiting for it happen? In one of the
“MyTentativeDefinitionMaterializationUnit” functions? Will the lookup I did
in step 3 return, or will it be blocked until I use the
MaterializationResponsibility to finally resolve the symbol?
I’m also not sure yet where exactly the code snippet you showed would go to….
But I guess it is in one of the functions of
“MyTentativeDefinitionMaterializationUnit" right?

Uh… Thank you again and sorry for any stupid question in this…

Kind greetings
Björn

From: Lang Hames <lhames at gmail.com>
Sent: 30 September 2020 19:41
To: Gaier, Bjoern <Bjoern.Gaier at horiba.com>
Cc: LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] ORC JIT - different behaviour of
ExecutionSession.lookup?

Hi Bjoern,

In the current state we don’t have a JIT, only a handcrafted object loader. All
our object files are pre-compiled but will be loaded by different processes into
the same shared memory for the main process to execute them.

Ok -- this helps a lot. As a general observation (not ORC specific): If you have
multiple source processes targeting a single process then either everything
needs to be lazy or your source processes will need to communicate via IPC to
make things safe (if modules in different processes can directly depend on one
another). ORC is not a good fit for this model: It aims to solve the
cross-module dependency problem with a dependency graph in one process, rather
than IPC between multiple processes.

I think you could make ORC work for you here by changing your IPC just a little
bit. Instead of:

Source Process 1 (with JIT) -- Sends-linked-memory-to -\
Source Process 2 (with JIT) -- Sends-linked-memory-to --> Target Process
Source Process 3 (with JIT) -- Sends-linked-memory-to -/

You would use:

Source Process 1 -- Sends-relocatable-object-to -\
Source Process 2 -- Sends-relocatable-object-to --> Target Process (With JIT)
Source Process 3 -- Sends-relocatable-object-to -/

In the second scenario you're sending relocatable objects via shared memory
and then linking them with the JIT in the target process where we can fully
track dependencies. You can link on multiple threads concurrently in the target
process so you needn't worry about this fully serializing your link steps.
In fact it is likely to cut down on some of the IPC needed between source
processes.

If you don't want the JIT linked into your target process then you can add
one extra step:

Source Process 1 -- Sends-relocatable-object-to -\
Source Process 2 -- Sends-relocatable-object-to --> JIT Process --> Target Process
Source Process 3 -- Sends-relocatable-object-to -/

We would like to still have the files pre-compiled but we want to remove the
handcrafted object loader. However, since we already create those files with
Clang, it is not such a big change to compile them to IR.

If you want to keep pre-compiling them to objects that should be fine too. The
JIT can just load objects directly. The namespacing issue still needs to be
solved, but that's doable at the object level.
> You're trying to do all this on Hard Mode. ;)

Yeah it was a silly thought… sorry xO

No worries -- Just want to make sure you don't do more work than you need
to. :)

You wouldn’t find a reference to “test”, you would find “Planschi_test”. We
would first try resolving that with the standard C library functions. If this
fails, then we will ask the object file “Planschi” if it has “test” and resolve
it. If “Planschi” is not loaded for some reason, then we go on with the next
symbol and mark “Planschi_test” as undefined. So… we fail linking. However, our
code allows to just try the linking again – so we do and do until it succeeds.
Only when the undefined references hit 0 we mark the object file as executable.
However, if the module you depend on has still undefined references you can be
screwed. Cause our code will not check the dependencies. Another reason why we
want to move away.

Great. I think I understand this now.

I think… but this is probably not thought through well:
I need to do those steps on the IR level now – because the DefinitionGenerator
needs to resolve all symbols, but I might have to wait for a different module to
pop up first.

What causes a module to "pop up"?

Either way I think you can solve this with a custom MaterializationUnit.
Here's one scheme for solving this:
If your definition generator receives a lookup for an external symbol (e.g.
Planschi_test)  then we'll call it "not-resolved-elsewhere" and
create a TentativeDefinitionMaterializationUnit for it (this is something you
would need to write up). The TentativeDefinitionMaterializationUnit says "I
provide Planschi_test" (by including it in its symbol flags map) but has no
idea how it will actually do that yet. The materialize method on your
TentativeDefinitionMaterializationUnit will be called immediately to satisfy the
query, and be given a MaterializationResponsibility object covering
Planschi_test. You hold on to this MaterializationResponsibility object while
you do whatever it is that might cause the Planschi module to "pop
up". If/when Planschi does pop up you can add its object file to its
own "Planschi" JITDylib, then do the following:

(1) Look up the address of test in the Planschi JITDylib. This time you do want
to use the Resolved state (rather than ready) because we don't want to
create any cycles due to queries*.

* E.g. "A" depends on "B", "B" depends on
"A". If you try to produce each by looking up the other and waiting
for it to be "Ready" then you'll deadlock. But you can look up the
other and wait for it to be Resolved and that will always succeed (addresses are
assigned without waiting for dependencies to be resolved). Then you add
"B" as a dependence of "A" and vice-versa and let Orc's
dependence tracking take care of the rest. Anybody issuing a query looking for
"A" or "B" to be ready will find that their query
doesn't return until both symbols are fully-materialized and safe to access.

(2) Update the JIT state using your MaterializationResponsibility object:

if (auto TestSym = ES.lookup({&PlanschiJD}, ES.intern("test"),
                             SymbolState::Resolved)) {
  // Tell the JIT that the address of "Planschi_test" is the same
  // as the address of "test" in the Planschi JITDylib.
  // (Depending on your LLVM version, notifyResolved and notifyEmitted
  // may return an llvm::Error that has to be handled.)
  MR.notifyResolved({
      { ES.intern("Planschi_test"),
        JITEvaluatedSymbol(TestSym->getAddress(),
                           JITSymbolFlags::Exported) }
  });

  // Tell the JIT that "Planschi_test" depends on the "test" symbol in
  // the Planschi JITDylib:
  MR.addDependencies(ES.intern("Planschi_test"),
                     {{&PlanschiJD, { ES.intern("test") }}});

  // Tell the JIT that "Planschi_test" has been emitted. We did not
  // actually have to do any work for this because Planschi_test is
  // just an alias.
  MR.notifyEmitted();

  // And you're done.
} else {
  // If we get here then the "test" symbol couldn't be resolved for some
  // reason, so we need to report that error and notify everyone that
  // "Planschi_test" failed too.
  ES.reportError(TestSym.takeError());
  MR.failMaterialization();
}

The code above takes care of the case where the Planschi module can be found.
The other possibility is that you get to the end of your process that causes
modules to "pop up" (whatever this is) and there's still no
Planschi module. In this case you just report an error and call
MR.failMaterialization() to tell everyone who needed Planschi_test that it
really couldn't be found.
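
The Resolved-versus-Ready distinction from the footnote to step (1) can be illustrated with a small standalone model. This is only a sketch using the C++ standard library: ToySession, ToySymbol and ToyState are hypothetical names, not ORC classes, and real ORC tracks far more state. It shows that a symbol in a cycle gets its address right away, but only counts as ready once everything reachable through its dependencies has been emitted.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model only: this is NOT ORC's implementation or API.
enum class ToyState { Pending, Resolved, Emitted };

struct ToySymbol {
  ToyState state = ToyState::Pending;
  unsigned long address = 0;
  std::set<std::string> deps; // names this symbol depends on
};

class ToySession {
public:
  void declare(const std::string &name) { symbols_[name]; }

  // Assign an address. This never waits for dependencies.
  void resolve(const std::string &name, unsigned long addr) {
    ToySymbol &s = symbols_.at(name);
    s.address = addr;
    if (s.state == ToyState::Pending)
      s.state = ToyState::Resolved;
  }

  void addDependency(const std::string &from, const std::string &to) {
    symbols_.at(from).deps.insert(to);
  }

  void emit(const std::string &name) {
    symbols_.at(name).state = ToyState::Emitted;
  }

  bool isResolved(const std::string &name) const {
    return symbols_.at(name).state != ToyState::Pending;
  }

  // "Ready" here means: the symbol and everything reachable through its
  // dependencies has been emitted. Cycles are harmless because visited
  // names are skipped.
  bool isReady(const std::string &name) const {
    std::set<std::string> seen;
    std::vector<std::string> work{name};
    while (!work.empty()) {
      std::string cur = work.back();
      work.pop_back();
      if (!seen.insert(cur).second)
        continue;
      const ToySymbol &s = symbols_.at(cur);
      if (s.state != ToyState::Emitted)
        return false;
      for (const auto &d : s.deps)
        work.push_back(d);
    }
    return true;
  }

private:
  std::map<std::string, ToySymbol> symbols_;
};
```

With "A" and "B" depending on each other, waiting for isResolved on either one succeeds as soon as its address is assigned, while isReady stays false for both until both have been emitted.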

1.) When all symbols have such a unique name now, couldn’t I just add them all
to the same DyLib and use the ResourceTracker to unload them later? This would
spare me searching the right DyLibs.

The scheme I just described uses JITDylibs to namespace things, but you could
also (1) do everything at the IR level as you've described and rename them
there, or (2) use a JITLink plugin to rename everything at the object file level
(this is doable, as long as you have a JITLink implementation for your target
platform. So far we only have MachO and ELF on x86-64. I think you need COFF,
right?).

2.) Can I ask a DyLib if it has something for the “Parent_Child__Plansch_?
test@@3HA" loaded?

Sort of. You can look up the symbol flags for “Parent_Child__Plansch_?
test@@3HA". This will not trigger materialization, but will call your
definition generator to try to generate a definition. In the scheme I described
above you shouldn't need to do this at all.

My biggest worry is, I do all those fancy renaming and such, but I still have no
guarantee that the missing symbols are already present, but when I look up my
symbols of interest it is too late… cause if something is missing because the
module was not loaded yet, then I get in an error state and I’m stuck there :c

You shouldn't need to worry about this. By using a custom materialization
unit as described above you can keep a query waiting as long as you like before
either failing it or providing a definition. You don't need to error out
until you've determined that a symbol really can't be provided.
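
The idea of keeping a query waiting until a definition is provided or the attempt is abandoned can be modelled without any LLVM code. The sketch below is an assumption-laden toy: ToyQuery, ToyResponsibility and startTentativeLookup are hypothetical names that only mimic the role a MaterializationResponsibility plays, and the model is single-threaded.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy stand-in for a pending lookup; NOT an ORC type.
struct ToyQuery {
  enum class Status { Waiting, Done, Failed };
  Status status = Status::Waiting;
  unsigned long address = 0;
  std::string error;
};

// Toy stand-in for the object a tentative definition holds on to while it
// waits for the other module to show up.
class ToyResponsibility {
public:
  explicit ToyResponsibility(std::shared_ptr<ToyQuery> q)
      : query_(std::move(q)) {}

  // Complete the query with an address (first completion wins).
  void notifyResolved(unsigned long addr) {
    if (query_->status != ToyQuery::Status::Waiting)
      return;
    query_->address = addr;
    query_->status = ToyQuery::Status::Done;
  }

  // Give up: the symbol really cannot be provided.
  void failMaterialization(std::string why) {
    if (query_->status != ToyQuery::Status::Waiting)
      return;
    query_->error = std::move(why);
    query_->status = ToyQuery::Status::Failed;
  }

private:
  std::shared_ptr<ToyQuery> query_;
};

// Start a lookup that stays in the Waiting state until the responsibility
// object is used to resolve or fail it.
inline std::pair<std::shared_ptr<ToyQuery>, ToyResponsibility>
startTentativeLookup() {
  auto q = std::make_shared<ToyQuery>();
  return {q, ToyResponsibility(q)};
}
```

The caller keeps the ToyQuery, the tentative definition keeps the ToyResponsibility, and nothing forces an error until failMaterialization is called explicitly.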

 3.) Isn’t the process of renaming those symbols like… really memory costly?
Could I run into limitations of creating a too long function name?

It depends on how long you expect your symbol names to get, and how memory
constrained you are. Orc uses a string pool, so each string value is usually
only held in one place in memory. This should keep the overhead pretty low.
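
To show why a string pool keeps the cost of long names low, here is a minimal interning sketch. ToyStringPool is a hypothetical class written for illustration and is much simpler than ORC's own SymbolStringPool; it only demonstrates that equal names share one stored copy and can be compared by pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Toy interning pool; hypothetical and not thread-safe.
class ToyStringPool {
public:
  // Returns a pointer to the single pooled copy of `s`. Elements of an
  // unordered_set keep their address for as long as they stay in the set.
  const std::string *intern(const std::string &s) {
    return &*pool_.insert(s).first;
  }

  std::size_t size() const { return pool_.size(); }

private:
  std::unordered_set<std::string> pool_;
};
```

Interning "Parent_Child_Child_Dino_?Sampler@@YAXXZ" twice yields the same pointer, so every further reference to that name costs one pointer rather than another copy of the string.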

-- Lang.

On Wed, Sep 30, 2020 at 12:05 AM Gaier, Bjoern <Bjoern.Gaier at horiba.com> wrote:
Hey Lang,
> Do you mean that the object file is produced by another process and is
> being loaded into your JIT process for execution, or that you want your JIT to
> produce code for several different processes? These are different problems with
> different solutions. I'll wait until I understand your use case to answer
> further.

In the current state we don’t have a JIT, only a handcrafted object loader. All
our object files are pre-compiled but will be loaded by different processes into
the same shared memory for the main process to execute them.

We would like to still have the files pre-compiled but we want to remove the
handcrafted object loader. However, since we already create those files with
Clang, it is not such a big change to compile them to IR.
> You're trying to do all this on Hard Mode. ;)

Yeah it was a silly thought… sorry xO

> The part of your use case that is the most opaque to me is the renaming.
> When you see a reference to "test" in some object, how do you decide
> that it should resolve to the definition of "test" in, for example,
> Planschi, as opposed to some other module? Do you just have a list of modules
> that you check in-order until you find a matching symbol name?

You wouldn’t find a reference to “test”, you would find “Planschi_test”. We
would first try resolving that with the standard C library functions. If this
fails, then we will ask the object file “Planschi” if it has “test” and resolve
it. If “Planschi” is not loaded for some reason, then we go on with the next
symbol and mark “Planschi_test” as undefined. So… we fail linking. However, our
code allows to just try the linking again – so we do and do until it succeeds.
Only when the undefined references hit 0 we mark the object file as executable.
However, if the module you depend on has still undefined references you can be
screwed. Cause our code will not check the dependencies. Another reason why we
want to move away.

I think… but this is probably not thought through well:
I need to do those steps on the IR level now – because the DefinitionGenerator
needs to resolve all symbols, but I might have to wait for a different module to
pop up first.

So I would iterate over the globals and the functions of the Module:
I would rename each symbol I encounter to have a unique name.
This would be easy because when we load stuff, we give them a hierarchical name
like “Parent_Child_Child_Dino”
So "?Sampler@@YAXXZ" Could be renamed to
"Parent_Child_Child_Dino_?Sampler@@YAXXZ".

If I encounter an undefined reference that is not part of the standard library
like "?_Plansch_test@@3HA" Then I would extract the relative path and
convert it to an absolute one, getting: “Parent_Child__Plansch_? test@@3HA”.
When the “Plansch” module was loaded, I would have given it a unique name as
well so they would fit. I would then add them to their own DyLibs and would be
happy… would I?
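
The prefixing part of the renaming scheme described above could look roughly like the following. It is only a sketch: qualifySymbol is a hypothetical helper, the "_" separator is taken from the example names in this mail, and the conversion of a relative reference into an absolute path is deliberately left out because its rules are not spelled out here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: joins the hierarchical module path with '_' and
// puts it in front of the original (mangled) symbol name.
inline std::string qualifySymbol(const std::vector<std::string> &path,
                                 const std::string &symbol) {
  std::string out;
  for (const auto &part : path) {
    out += part;
    out += '_';
  }
  return out + symbol;
}
```

For the path Parent, Child, Child, Dino and the symbol "?Sampler@@YAXXZ" this produces "Parent_Child_Child_Dino_?Sampler@@YAXXZ", matching the example above.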

1.) When all symbols have such a unique name now, couldn’t I just add them all
to the same DyLib and use the ResourceTracker to unload them later? This would
spare me searching the right DyLibs.
2.) Can I ask a DyLib if it has something for the “Parent_Child__Plansch_?
test@@3HA" loaded?
My biggest worry is, I do all those fancy renaming and such, but I still have no
guarantee that the missing symbols are already present, but when I look up my
symbols of interest it is too late… cause if something is missing because the
module was not loaded yet, then I get in an error state and I’m stuck there :c
3.) Isn’t the process of renaming those symbols like… really memory costly?
Could I run into limitations of creating a too long function name?

I hope that makes sense… cause I’m not good at explaining anything, especially
not when it is in a different language :c

Thank you again :D

Kind greetings
Björn

From: Lang Hames <lhames at gmail.com<mailto:lhames at gmail.com>>
Sent: 29 September 2020 18:55
To: Gaier, Bjoern <Bjoern.Gaier at horiba.com<mailto:Bjoern.Gaier at
horiba.com>>
Cc: LLVM Developers Mailing List <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>
Subject: Re: [llvm-dev] ORC JIT - different behaviour of
ExecutionSession.lookup?

Hi Bjoern,

However, another thing of our system is, that each object file was loaded from a
different process,

Do you mean that the object file is produced by another process and is being
loaded into your JIT process for execution, or that you want your JIT to produce
code for several different processes? These are different problems with
different solutions. I'll wait until I understand your use case to answer
further.

Writing this… I actually wondered about something else but this feels like a
terrible approach… But now I’m curious xD
So… if I load an IR-Module, could I use the static LLVM compiler to compile it
to an object file and then use the source code of the LLD to load and resolve
the symbols the same way/kind we did in the past?
That sounds more like the “addObjectFile” function of the LLJIT… And then I
guess I have to write a LinkLayer or something? That is where my knowledge ends…
Disclaimer: I don’t like that approach but it would be interesting to know (also
cause some people here would be happy with it .w.”)

You're trying to do all this on Hard Mode. ;)

ORC takes care of all this kind of stuff for you:
  Don't re-write IR. Leave references as symbolic -- they will be fixed up
in the JIT linker.
  You don't need to write your own JIT linker. LLJIT has one built in.

When you add things to the JIT:
  - If you have a program representation (module, object file, etc.) and you
want to add it then just go ahead.
  - If your program representations contain external references then they must
be resolvable or linking will fail (there's no getting around that, in a JIT
or in a regular compile), BUT...
  - You can always add new definitions in response to a query by using a
definition generator. If your definition generator can find/create a definition
then great. If it can't then the reference really is unresolved and linking
really should fail.

The part of your use case that is the most opaque to me is the renaming. When
you see a reference to "test" in some object, how do you decide that
it should resolve to the definition of "test" in, for example,
Planschi, as opposed to some other module? Do you just have a list of modules
that you check in-order until you find a matching symbol name?

-- Lang.

On Tue, Sep 29, 2020 at 2:30 AM Gaier, Bjoern <Bjoern.Gaier at
horiba.com<mailto:Bjoern.Gaier at horiba.com>> wrote:
Hey Lang,

Thank you for your help and your patience – also for your answers in the “ORC
JIT - Can modules independently managed with one LLJIT instance? + problems with
ExecutionSession.lookup” mail. Both problems have the same origin so I keep
writing about it here, to avoid duplication.
My big problem is still handling cross references between modules with “our”
name scheme. Since our old loader loads object files, we resolved those
references with object files and since they were already compiled, we knew all
addresses right away. With the LLJIT as I finally understand, I will only get
the addresses when I have resolved every reference, which makes the code way
safer.
However, another thing of our system is, that each object file was loaded from a
different process, so sometimes not all symbols for ModuleA were present because
ModuleB was not loaded/requested yet. That was okay, so we kept resolving the
undefined references of ModuleA until ModuleB was loaded and everything was
fine.

If I get it right… This would change now to having a single LLJIT representing
the entire system. Each process would get its own DyLib for their module.
However, I would need to check on IR-Level now which symbols would be undefined
– correct? Because if I wait until “DefinitionGenerator::tryToGenerate” is
called and have to wait for a module that might never be loaded, then I’m stuck
there forever.
1.) If I find a symbol that is undefined – and it has our name scheme, then I
would jump to for example ModuleB, which is also not jitted yet and would do a
“replaceAllUsesWith” on the Symbol of ModuleA to ModuleB – right?
                - Would that mean, when I add ModuleA to DyLibA – is ModuleB
then part of DyLibA as well?
1.1.) Alternatively I could rename the symbol
2.) If the ModuleB is already jitted, then I can take the address to  do the
“replaceAllUsesWith” right?

When I resolved all those references, then I can add the IR Module to my DyLib
and compile it. However is it a good idea to use “replaceAllUsesWith” with
addresses? Seems like the DefinitionGenerator would be jobless…

Writing this… I actually wondered about something else but this feels like a
terrible approach… But now I’m curious xD
So… if I load an IR-Module, could I use the static LLVM compiler to compile it
to an object file and then use the source code of the LLD to load and resolve
the symbols the same way/kind we did in the past?
That sounds more like the “addObjectFile” function of the LLJIT… And then I
guess I have to write a LinkLayer or something? That is where my knowledge ends…
Disclaimer: I don’t like that approach but it would be interesting to know (also
cause some people here would be happy with it .w.”)

Thank you so far!

Kind greetings
Björn

From: Lang Hames <lhames at gmail.com<mailto:lhames at gmail.com>>
Sent: 29 September 2020 01:47
To: Gaier, Bjoern <Bjoern.Gaier at horiba.com<mailto:Bjoern.Gaier at
horiba.com>>
Cc: LLVM Developers Mailing List <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>
Subject: Re: [llvm-dev] ORC JIT - different behaviour of
ExecutionSession.lookup?

Hi Bjoern,

Even though the "tryToGenerate" function of my DefinitionGenerator
returned a "llvm::orc::SymbolsNotFound" for the
"?_Plansch_test@@3HA", I got an address for "?Sampler@@YAXXZ".

That's because you're issuing the lookup with RequiredState ==
SymbolState::Resolved. This means that your query will return as soon as
"?Sampler@@YAXXZ" is assigned an address. In the JIT linker(s)
addresses are assigned before external references are looked up. So after your
lookup returns the linker attempts to find "?_Plansch_test@@3HA",
fails, and so moves "?Sampler@@YAXXZ" to the error state.

You almost always want to issue your lookups with RequiredState ==
SymbolState::Ready. This ensures that the query will not return until / unless
the requested symbols (and all their dependencies) are successfully linked into
the target process and ready to execute.

Question 1.)
Is there any way to reset the error state of "?Sampler@@YAXXZ" at this
point?

No. However, the removable code feature will allow you to remove failed
materialization units once it lands in the mainline.

- After my first call I used the "define" function of the JITDylib to
define ?_Plansch_test@@3HA and then I tried calling the lookup function again
and again, however I only got the error: "Failed to materialize
symbols" even though "?_Plansch_test@@3HA" was defined now...
- Changing the order of the "define" and the "lookup" call
works of course, but I'm interested in the case where I don't know the
address yet.

The JIT doesn't re-try linking. Once a symbol has failed to link it remains
in the error state. In theory, once removable code is added you could choose to
remove and then re-add "?Sampler@@YAXXZ" after
"?_Plansch_test@@3HA" is defined. The real solution though is just to
make sure that "?_Plansch_test@@3HA" is defined (either directly or
via a generator) before you look up "?Sampler@@YAXXZ".

Out of curiosity I repeated the previous scenario - but added
"?_Plansch_test@@3HA" to the "lookupSet" which changed
things drastically. When executing "lookup" I now get the
"llvm::orc::SymbolsNotFound" error from my DefinitionGenerator...

Yes. Because "?_Plansch_test@@3HA" is not defined. You should see a
SymbolsNotFound error sent to your error reporter in the first scenario too,
followed by a failure-to-materialize error for "?Sampler@@YAXXZ".

... and "?Sampler@@YAXXZ" is stuck as a pending query in the
MaterializingInfos entries.

Huh. That sounds like a bug: All references to the query should be removed from
the state machine before it returns its result (in this case an error). I'll
see if I can reproduce this locally and fix it up, but it doesn't affect the
discussion here.

When I then add a definition for "?_Plansch_test@@3HA" and call
"lookup" the second time, it will succeed and give me the addresses.
Also I'm able to execute the code now. This is great! However...

When a lookup fails we try to restore the ExecutionSession state to what it was
prior to the query. This is why the sequence "lookup -> symbols not
found -> define -> lookup again" worked.

Question 2.)
Why did the first call to lookup not return the address of
"?Sampler@@YAXXZ" like in the first scenario? I expected it would
return an address for it.

A lookup must match against all symbols before anything is JIT'd. When it
failed to match "?_Plansch_test@@3HA" we immediately bailed out with
an error. There was no further attempt to compile "?Sampler@@YAXXZ".
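
The "match everything first, then do the work" behaviour can be sketched as a two-phase function. This is a toy under stated assumptions: toyLookup and ToyLookupResult are hypothetical, a plain map stands in for the JITDylib, and copying addresses stands in for compiling. It is not how ExecutionSession is implemented.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct ToyLookupResult {
  bool ok = false;
  std::map<std::string, unsigned long> addresses; // filled only when ok
  std::vector<std::string> notFound;              // filled only on failure
};

// Phase 1 matches every requested name against the definitions.
// Phase 2 (the stand-in for compiling) runs only if nothing is missing,
// so a failed lookup leaves no partial result behind.
inline ToyLookupResult
toyLookup(const std::map<std::string, unsigned long> &defs,
          const std::vector<std::string> &names) {
  ToyLookupResult r;
  for (const auto &n : names)
    if (defs.find(n) == defs.end())
      r.notFound.push_back(n);
  if (!r.notFound.empty())
    return r; // bail out before doing any work
  for (const auto &n : names)
    r.addresses[n] = defs.at(n);
  r.ok = true;
  return r;
}
```

Looking up both "?Sampler@@YAXXZ" and "?_Plansch_test@@3HA" while only the former is defined reports the latter as not found and returns no addresses at all. Adding the missing definition and repeating the same lookup then succeeds, which mirrors the "lookup, symbols not found, define, lookup again" sequence.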

Question 3.)
Can I somehow combine both behaviours? Getting the address for all the symbols
(like in scenario 1) while still being able to provide definitions later (like
in scenario 2)?

Sort of.

Definition generators allow you to provide a definition at the last minute (i.e.
in response to a query). The best mental model though is: "All definitions
that a generator can generate are part of the interface of the dylib". E.g.
if you use a DynamicLibrarySearchGenerator to mirror symbols from a dynamic
library containing "foo", "bar" and "baz" then you
should think of your JITDylib as containing definitions for "foo",
"bar" and "baz", even if the generator hasn't actually
added them to the JITDylib yet. The reason is that it will add them in response
to any query for them, so it's indistinguishable (except for timing and
debug logging) from the case where they're already present.
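
The mental model of a generator's definitions being part of the dylib's interface can be shown with a small standalone sketch. ToyDylib and its Generator callback are hypothetical stand-ins, not the JITDylib or DefinitionGenerator classes; the only point is that a miss gives the generator one chance to add the definition before the lookup fails.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy model of a dylib with a last-minute definition generator.
class ToyDylib {
public:
  // The generator returns true if it added a definition for `name`.
  using Generator = std::function<bool(ToyDylib &, const std::string &)>;

  void define(const std::string &name, unsigned long addr) {
    defs_[name] = addr;
  }
  void setGenerator(Generator g) { gen_ = std::move(g); }
  bool isDefined(const std::string &name) const {
    return defs_.count(name) != 0;
  }

  // Returns true and writes `out` if the name is already defined or the
  // generator can define it in response to this query.
  bool lookup(const std::string &name, unsigned long &out) {
    auto it = defs_.find(name);
    if (it == defs_.end() && gen_ && gen_(*this, name))
      it = defs_.find(name);
    if (it == defs_.end())
      return false;
    out = it->second;
    return true;
  }

private:
  std::map<std::string, unsigned long> defs_;
  Generator gen_;
};
```

From the caller's side, a lookup of "foo" behaves the same whether "foo" was defined up front or added by the generator during the lookup; only names the generator cannot produce are really missing.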

If you need to be able to defer adding a "real" definition beyond the
initial lookup then your only option (and this only applies to functions) is a
lazy-reexport. This allows you to provide a definition for a function while
deferring lookup until the first execution of the re-export at runtime. I
wouldn't generally use this to break dependencies though: You want a
definition of the real function body for "?_Plansch_test@@3HA" already
added to your JIT because (in general) you never know when JIT'd code will
need it.

My turn to ask a question: How is "?_Plansch_test@@3HA" created, and
why not just add it up-front? :)

-- Lang.

On Mon, Sep 28, 2020 at 4:57 AM Gaier, Bjoern via llvm-dev <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:
Hey everyone,

I felt this question is different from my other question - hope this is okay.

So - I was playing around with the lookup function of the ExecutionSession and
there are some things I don't understand.
I have a .BC file with a function "?Sampler@@YAXXZ" referencing a
value "?_Plansch_test@@3HA" that is not defined in that module itself.
I first planned on not providing an address for "?_Plansch_test@@3HA"
but wanted to know the address of "?Sampler@@YAXXZ". So I issued
something like that:

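    auto &ES = this->jit->getExecutionSession();
    llvm::orc::SymbolLookupSet lookupSet;

    // SymbolLookupSet stores interned names, so intern the string first.
    lookupSet.add(ES.intern("?Sampler@@YAXXZ"),
                  llvm::orc::SymbolLookupFlags::WeaklyReferencedSymbol);
    // Resolved: returns as soon as the symbol has an address, not when it
    // is safe to execute.
    auto result = ES.lookup(
        {{&this->jit->getMainJITDylib(),
          llvm::orc::JITDylibLookupFlags::MatchAllSymbols}},
        lookupSet, llvm::orc::LookupKind::Static,
        llvm::orc::SymbolState::Resolved);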

Even though the "tryToGenerate" function of my DefinitionGenerator
returned a "llvm::orc::SymbolsNotFound" for the
"?_Plansch_test@@3HA", I got an address for
"?Sampler@@YAXXZ". Dumping the "MainJITDylib" I saw, that
the "?Sampler@@YAXXZ" was in an Error state. Which made sense - I
guess.

Question 1.)
Is there any way to reset the error state of "?Sampler@@YAXXZ" at this
point?
- After my first call I used the "define" function of the JITDylib to
define ?_Plansch_test@@3HA and then I tried calling the lookup function again
and again, however I only got the error: "Failed to materialize
symbols" even though "?_Plansch_test@@3HA" was defined now...
- Changing the order of the "define" and the "lookup" call
works of course, but I'm interested in the case where I don't know the
address yet.

Out of curiosity I repeated the previous scenario - but added
"?_Plansch_test@@3HA" to the "lookupSet" which changed
things drastically. When executing "lookup" I now get the
"llvm::orc::SymbolsNotFound" error from my DefinitionGenerator and
"?Sampler@@YAXXZ" is stuck as a pending query in the
MaterializingInfos entries. When I then add a definition for
"?_Plansch_test@@3HA" and call "lookup" the second time, it
will succeed and give me the addresses. Also I'm able to execute the code
now. This is great! However...

Question 2.)
Why did the first call to lookup not return the address of
"?Sampler@@YAXXZ" like in the first scenario? I expected it would
return an address for it.

Question 3.)
Can I somehow combine both behaviours? Getting the address for all the symbols
(like in scenario 1) while still being able to provide definitions later (like
in scenario 2)?

Thank you in advance and kind greetings,
Björn
Registered as a GmbH in the commercial register of Bad Homburg v.d.H., HRB 9816,
VAT ID no. DE 114 165 789. Managing directors: Dr. Hiroshi Nakamura, Dr. Robert
Plank, Markus Bode, Heiko Lampert, Takashi Nagano, Junichi Tajika, Ergin Cansiz.
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Lang Hames via llvm-dev

2020-Oct-01 17:38 UTC

[llvm-dev] ORC JIT - different behaviour of ExecutionSession.lookup?

Hi Bjoern,

> Woah! That mail contains a lot of information and things I never tried
> yet… Actually… the entire MaterializationUnit and
> MaterializationResponsibility part is… quite… overwhelming >O<

The names sound intimidating, but these classes are actually relatively
simple:

-- MaterializationUnit wraps a program representation, a map describing
the symbols that the program representation provides, and a callback to
materialize the program representation. For example a materialization unit
for an object file would contain the object file memory buffer, a list of
symbols defined in the object file, and a callback to an object linking
layer to link the object file. The most important method on
MaterializationUnit is the materialize method: The JIT will call this when
you look up any symbols that the unit provides.

-- MaterializationResponsibility objects get created when the JIT calls the
materialize method on a MaterializationUnit. They track the symbols being
materialized and provide a way to update the ExecutionSession, unblocking
any queries waiting on these particular symbols. Usually you just have to
pass them along to the JIT linker, but you can also interact with them
directly. The most important methods are notifyResolved, which assigns
addresses to symbols; notifyEmitted, which tells the ExecutionSession that
these symbols have been completely emitted; addDependencies, which
describes dependencies between symbols so that the ExecutionSession can
hold queries until symbols *and all of their dependencies* are safe to
access; and finally failMaterialization which can be used to tell
ExecutionSession that some symbols really couldn't be provided.
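
As a rough illustration of how notifyResolved, addDependencies and
notifyEmitted interact, here is a toy model in plain C++. It is only a sketch
under loud assumptions: ToySession and ToySymbol are hypothetical names, none
of this is the ORC API, and the real ExecutionSession tracks far more state.
The point is just that "resolved" means an address exists, while "ready" also
requires the symbol and everything it depends on to be emitted.

```cpp
#include <map>
#include <set>
#include <string>

// Toy model only: these types are hypothetical and are not part of LLVM.
struct ToySymbol {
  unsigned long long Address = 0;
  bool Resolved = false; // an address has been assigned
  bool Emitted = false;  // the symbol's own content is finished
  std::set<std::string> Dependencies;
};

class ToySession {
public:
  // Models the idea behind notifyResolved: assign an address.
  void notifyResolved(const std::string &Name, unsigned long long Address) {
    ToySymbol &S = Symbols[Name];
    S.Address = Address;
    S.Resolved = true;
  }

  // Models the idea behind addDependencies: Name is not safe before Dep is.
  void addDependency(const std::string &Name, const std::string &Dep) {
    Symbols[Name].Dependencies.insert(Dep);
  }

  // Models the idea behind notifyEmitted.
  void notifyEmitted(const std::string &Name) { Symbols[Name].Emitted = true; }

  bool isResolved(const std::string &Name) const {
    auto It = Symbols.find(Name);
    return It != Symbols.end() && It->second.Resolved;
  }

  // Ready: emitted, and everything reachable through dependencies is
  // emitted as well.
  bool isReady(const std::string &Name) const {
    std::set<std::string> Visited;
    return allEmitted(Name, Visited);
  }

private:
  bool allEmitted(const std::string &Name,
                  std::set<std::string> &Visited) const {
    if (!Visited.insert(Name).second)
      return true; // already being checked, so cycles do not loop forever
    auto It = Symbols.find(Name);
    if (It == Symbols.end() || !It->second.Emitted)
      return false;
    for (const std::string &Dep : It->second.Dependencies)
      if (!allEmitted(Dep, Visited))
        return false;
    return true;
  }

  std::map<std::string, ToySymbol> Symbols;
};
```

In this model a symbol "A" that depends on a not-yet-emitted "B" is resolved
but not ready; once both are emitted, both become ready, even if "B" also
depends on "A".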

> With “pop up” I mean… the process which is waiting for Module “Planschi”
> to “pop up” can not do a thing about it. It just waits until there is an table
> entry for it, indicating that the object file was loaded by another process
> – or not.

How do you decide when to stop waiting? There must be some way to detect
that the process has been completed otherwise a missing module would cause
you to wait forever. Do you just have a timeout? Or does it complete when
all processes have been asked whether they can produce Planschi and have
returned "no"?

> To understand the MU and MR things better….  I try now to explain the
> flow of program...

Yes -- Everything you described is correct, so I'll skip ahead to where
things get fuzzy.

> Normally I would call the “define” function of the DyLib I got (in
> the tryToGenerate function) and providing an absolute value. But I can’t
> because “Planschi” didn’t popped up yet. So I create instead a
> “MyTentativeDefinitionMaterializationUnit" and stuff it into the define
> function right?
> „MyTentativeDefinitionMaterializationUnit“ is something that inherits from
> „TentativeDefinitionMaterializationUnit" right? Now I have those overloaded
> functions where I need to provide implantation for it. This is kinda shady
> to me, but this will make waaaay more sense when I looked at it I guess.

Let's define a basic MyTentativeDefinitionMaterializationUnit (there's no
TentativeDefinitionMaterializationUnit in ORC -- that name was just an example
to convey the concept):

class MyTentativeDefinitionMaterializationUnit
    : public MaterializationUnit {
public:

  // Constructor tells us the name of the module (e.g. Planschi) that
  // we want to try to load, and the ObjectLinkingLayer that we will
  // use to load it.
  MyTentativeDefinitionMaterializationUnit(StringRef ModuleName,
                                           ObjectLinkingLayer &LinkLayer,
                                           SymbolFlagsMap InitialSymbolFlags,
                                           SymbolStringPtr InitSymbol,
                                           VModuleKey K)
    : MaterializationUnit(std::move(InitialSymbolFlags),
                          std::move(InitSymbol), K),
      ModuleName(ModuleName.str()), LinkLayer(LinkLayer) {}

  // getName is used in JIT debug logging.
  StringRef getName() const override {
    return "MyTentativeDefinitionMaterializationUnit";
  }

  // Our materialize method will be called by the JIT outside the session
  // lock to materialize the symbols contained in R->getSymbols() (which is
  // the same set of symbols passed in to the constructor as
  // InitialSymbolFlags).
  void materialize(
      std::unique_ptr<MaterializationResponsibility> R) override {
    dbgs() << "I am a tentative definition unit trying to materialize "
              "the following symbols for module " << ModuleName << ": "
           << R->getSymbols() << "\n";

    // Try to produce an object file for ModuleName. This is where all the
    // work specific to your use-case will happen.
    Expected<std::unique_ptr<MemoryBuffer>> ObjectForModule =
        requestObjectForModule(ModuleName);

    // If we didn't get an object file then register the failure and bail
    // out.
    if (!ObjectForModule) {
      LinkLayer.getExecutionSession().reportError(
          ObjectForModule.takeError());
      R->failMaterialization();
      return;
    }

    // If we did get an object file back then we need to check the set of
    // definitions that it provided: There might be more than just the
    // symbols we were searching for (e.g. if module Planschi defines "test"
    // and "foo" and we only created this tentative unit to cover "test" then
    // we need to tell the JIT that we're taking responsibility for "foo"
    // too). getObjectSymbolInfo takes a MemoryBufferRef, so pass a
    // reference to the buffer rather than the owning pointer.
    auto ProvidedSymbols =
        getObjectSymbolInfo(LinkLayer.getExecutionSession(),
                            (*ObjectForModule)->getMemBufferRef());

    // If there was something wrong with the object and we couldn't get the
    // set of symbols that it defines then bail out.
    if (!ProvidedSymbols) {
      LinkLayer.getExecutionSession().reportError(
          ProvidedSymbols.takeError());
      R->failMaterialization();
      return;
    }

    // Otherwise we can define the set of new symbols as the set of all
    // defined symbols minus the ones we already knew about:
    SymbolFlagsMap NewSymbols = std::move(ProvidedSymbols->first);
    for (auto &KV : R->getSymbols())
      NewSymbols.erase(KV.first);

    // If the set of new symbols is non-empty then notify the JIT about the
    // new symbols.
    if (!NewSymbols.empty()) {
      if (auto Err = R->defineMaterializing(std::move(NewSymbols))) {
        // If there was an error, for example one of our new symbols clashed
        // with an existing definition, then bail out.
        LinkLayer.getExecutionSession().reportError(std::move(Err));
        R->failMaterialization();
        return;
      }
    }

    // Otherwise we're all done: hand the object off to be linked.
    LinkLayer.emit(std::move(R), std::move(*ObjectForModule));
  }

private:
  // MaterializationUnit also requires a discard override (called if one of
  // our symbols is overridden by another definition). Nothing has been
  // loaded at that point, so there is nothing to drop.
  void discard(const JITDylib &JD, const SymbolStringPtr &Name) override {}

  // Owned copy: the unit may outlive the string the caller passed in.
  std::string ModuleName;
  ObjectLinkingLayer &LinkLayer;
};

All the interesting stuff for your specific system goes in to the call
"requestObjectForModule". It should return an object file buffer for the
module if it is able, or an error otherwise (if the module can't be found).

If you want to make all of this asynchronous you can always write
requestObjectForModule as an asynchronous operation and do the rest of the
work contained in the materialize method above in a callback.
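
A minimal sketch of that asynchronous shape, in plain C++ with no LLVM
dependency, might look like the following. Everything in it is an assumption
made for illustration: ToyExecutor stands in for whatever runs work later (a
thread pool, an IPC event loop), ToyObject stands in for the object buffer,
and requestObjectForModuleAsync and materializeAsync are hypothetical names.
The callback is the place where the real unit would call failMaterialization
or hand the object to the link layer.

```cpp
#include <functional>
#include <memory>
#include <queue>
#include <string>
#include <utility>

// Hypothetical stand-ins; none of these names exist in LLVM.
using ToyObject = std::string;
using OnObjectFn = std::function<void(std::unique_ptr<ToyObject>)>;

// Runs posted work at some later point (here: when runAll is called).
class ToyExecutor {
public:
  void post(std::function<void()> Work) { Queue.push(std::move(Work)); }
  void runAll() {
    while (!Queue.empty()) {
      std::function<void()> Work = std::move(Queue.front());
      Queue.pop();
      Work();
    }
  }

private:
  std::queue<std::function<void()>> Queue;
};

enum class Outcome { Pending, HandedToLinker, Failed };

// Returns immediately; OnObject is called later with the object, or with
// nullptr if the module could not be found.
void requestObjectForModuleAsync(ToyExecutor &Exec, std::string ModuleName,
                                 OnObjectFn OnObject) {
  Exec.post([ModuleName = std::move(ModuleName),
             OnObject = std::move(OnObject)] {
    if (ModuleName == "Planschi")
      OnObject(std::make_unique<ToyObject>("<object bytes>"));
    else
      OnObject(nullptr);
  });
}

// Stand-in for materialize(): start the request and return without
// blocking. The rest of the work happens in the callback.
void materializeAsync(ToyExecutor &Exec, const std::string &ModuleName,
                      std::shared_ptr<Outcome> Result) {
  requestObjectForModuleAsync(
      Exec, ModuleName, [Result](std::unique_ptr<ToyObject> Object) {
        // Real code would call failMaterialization or emit the object here.
        *Result = Object ? Outcome::HandedToLinker : Outcome::Failed;
      });
}
```

Until the executor runs the posted work the outcome stays Pending, which
corresponds to a lookup that is still waiting on the symbol.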

> However… Now I’m lost…
>
> What will happen now? So I gave „MyTentativeDefinitionMaterializationUnit“
> To “Planschi_test” while doing the lookup and…. For some reason “Planschi”
> has not popped up yet. Where will the waiting for it happen? In one of the
> “MyTentativeDefinitionMaterializationUnit" functions? Will the lookup I did
> in Step 3 return or will it be blocked until I used the
> MaterializationResponsibility To finally resolve the symbol?
> I’m also not sure yet where exactly the code snippet you showed would go
> to…. But I guess it is in one of the functions of
> “MyTentativeDefinitionMaterializationUnit" right?

The MyTentativeDefinitionMaterializationUnit represents an attempt to
materialize the symbol you're looking up ("test") from an as-yet-unknown
module ("Planschi"). The waiting happens inside its materialize method (in
requestObjectForModule), and a lookup that needs the symbol stays blocked
until the MaterializationResponsibility either resolves it or fails it. The
motivation for doing this in a MaterializationUnit rather than the definition
generator is that the definition generator runs under the session lock, so it
blocks any other JIT operations from continuing. If you have any circular
dependencies between modules, waiting in the generator will deadlock the JIT.

> Uh… Thank you again and sorry for any stupid question in this…

Nope. No stupid questions -- this is tricky (and not as well documented as
I'd like). Hopefully the discussion has been helpful to you, and it's
definitely useful to me to hear what JIT clients are trying to do and where
the APIs are unclear.

-- Lang.

On Thu, Oct 1, 2020 at 12:53 AM Gaier, Bjoern <Bjoern.Gaier at horiba.com>
wrote:
> Hey Lang,
>
>
>
> Woah! That mail contains a lot of information and things I never tried
> yet… Actually… the entire MaterializationUnit and
> MaterializationResponsibility part is… quite… overwhelming >O<
>
>
>
> With “pop up” I mean… the process which is waiting for Module “Planschi”
> to “pop up” can not do a thing about it. It just waits until there is an
> table entry for it, indicating that the object file was loaded by another
> process – or not.
>
>
>
> To understand the MU and MR things better….  I try now to explain the flow
> of program, when somebody would add an IR file to it – not caring about how
> that file comes to that central process.
>
>
>
>    1. I go over all symbols and grab the names I’m caring about for
>    execution.
>    2. I create a DyLib with a unique name and stuff the IR module in it
>    3. I call lookup from LLJIT and pass the DyLib mentioned in 2. with my
>    symbols, get the addresses and can execute them
>
>
>
> This would be the case where everything went well.
>
> Okay…. So I use the “LLJITBuilder” to get my LLJIT instance, this would be
> before step 1.)
>
>
>
> In step 2.) when I created the new DyLib, I would call addGenerator and
> pass a class to it that inherits from “DefinitionGenerator” right? Let’s
> call it “MyDefinitionGenerator" then.
>
> “MyDefinitionGenerator" needs to provide an implementation for
> “tryToGenerate” which contains stuff which I ignore at this point.
>
>
>
> Now I’m in step 3.) right? And oh nooo! Planschi_test is undefined.
>
> Now the tryToGenerate function is executed and I realize there, that
> “Planschi_test” is something coming from a different module.
>
>
>
> Normally I would call the “define” function of the DyLib I got (in the
> tryToGenerate function) and providing an absolute value. But I can’t
> because “Planschi” didn’t popped up yet. So I create instead a
> “MyTentativeDefinitionMaterializationUnit"
> and stuff it into the define function right?
>
> „MyTentativeDefinitionMaterializationUnit“ is something that inherits from
> „TentativeDefinitionMaterializationUnit" right? Now I have those overloaded
> functions where I need to provide implantation for it. This is kinda shady
> to me, but this will make waaaay more sense when I looked at it I guess.
>
>
>
> However… Now I’m lost…
>
>
>
> What will happen now? So I gave „MyTentativeDefinitionMaterializationUnit“
> To “Planschi_test” while doing the lookup and…. For some reason “Planschi”
> has not popped up yet. Where will the waiting for it happen? In one of the
> “MyTentativeDefinitionMaterializationUnit" functions? Will the lookup I did
> in Step 3 return or will it be blocked until I used the
> MaterializationResponsibility To finally resolve the symbol?
>
> I’m also not sure yet where exactly the code snippet you showed would go
> to…. But I guess it is in one of the functions of
> “MyTentativeDefinitionMaterializationUnit"
> right?
>
>
>
> Uh… Thank you again and sorry for any stupid question in this…
>
>
>
> Kind greetings
>
> Björn
>
>
>
>
>
> *From:* Lang Hames <lhames at gmail.com>
> *Sent:* 30 September 2020 19:41
> *To:* Gaier, Bjoern <Bjoern.Gaier at horiba.com>
> *Cc:* LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] ORC JIT - different behaviour of
> ExecutionSession.lookup?
>
>
>
> Hi Bjoern,
>
>
>
> In the current state we don’t have a JIT only an handcrafted object
> loader. All our object files are pre-compiled but will be loaded by
> different processes into the same shared memory for the main process to
> execute them.
>
>
>
> Ok -- this helps a lot. As a general observation (not ORC specific): If
> you have multiple source processes targeting a single process then either
> everything needs to be lazy or your source processes will need to
> communicate via IPC to make things safe (if modules in different processes
> can directly depend on one another). ORC is not a good fit for this model:
> It aims to solve the cross-module dependency problem with a dependency
> graph in one process, rather than IPC between multiple processes.
>
>
>
> I think you could make ORC work for you here by changing your IPC just a
> little bit. Instead of:
>
>
>
> Source Process 1 (with JIT) -- Sends-linked-memory-to -\
>
> Source Process 2 (with JIT) -- Sends-linked-memory-to --> Target Process
>
> Source Process 3 (with JIT) -- Sends-linked-memory-to -/
>
>
>
> You would use:
>
>
>
> Source Process 1 -- Sends-relocatable-object-to -\
>
> Source Process 2 -- Sends-relocatable-object-to --> Target Process (With
> JIT)
>
> Source Process 3 -- Sends-relocatable-object-to -/
>
>
>
> In the second scenario you're sending relocatable objects via shared
> memory and then linking them with the JIT in the target process where we
> can fully track dependencies. You can link on multiple threads concurrently
> in the target process so you needn't worry about this fully serializing
> your link steps. In fact it is likely to cut down on some of the IPC needed
> between source processes.
>
>
>
> If you don't want the JIT linked into your target process then you can add
> one extra step:
>
>
>
> Source Process 1 -- Sends-relocatable-object-to -\
>
> Source Process 2 -- Sends-relocatable-object-to --> JIT Process --> Target
> Process
>
> Source Process 3 -- Sends-relocatable-object-to -/
>
>
>
> We would like to still have the files pre-compiled but we want to remove
> the handcrafted object loader. However, since we already create those files
> with Clang, it is not such a big change to compile them to IR.
>
>
>
> If you want to keep pre-compiling them to objects that should be fine too.
> The JIT can just load objects directly. The namespacing issue still needs
> to be solved, but that's doable at the object level.
>
>
>
> > You're trying to do all this on Hard Mode. ;)
> Yeah it was a silly thought… sorry xO
>
>
>
> No worries -- Just want to make sure you don't do more work than you need
> to. :)
>
>
>
> You wouldn’t find a reference to “test”, you would find “Planschi_test”.
> We would first try resolving that with the standard C library functions. If
> this fails, then we will ask the object file “Planschi” if it has “test”
> and resolve it. If “Planschi” is not loaded for some reason, then we go on
> with the next symbol and mark “Planschi_test” as undefined. So… we fail
> linking. However, our code allows to just try the linking again – so we do
> and do until it succeeds. Only when the undefined references hit 0 we mark
> the object file as executable. However, if the module you depend on has
> still undefined references you can be screwed. Cause our code will not
> check the dependencies. Another reason why we want to move away.
>
>
>
> Great. I think I understand this now.
>
>
>
> I think… but this is probably not thought through well:
> I need to do those steps on the IR level now – because the
> DefinitionGenerator needs to resolve all symbols, but I might have to wait
> for a different module to pop up first.
>
>
>
> What causes a module to "pop up"?
>
>
>
> Either way I think you can solve this with a custom MaterializationUnit.
> Here's one scheme for solving this:
>
> If your definition generator receives a lookup for an external symbol
> (e.g. Planschi_test) then we'll call it "not-resolved-elsewhere" and
> create a TentativeDefinitionMaterializationUnit for it (this is something
> you would need to write up). The TentativeDefinitionMaterializationUnit
> says "I provide Planschi_test" (by including it in its symbol flags map)
> but has no idea how it will actually do that yet. The materialize method on
> your TentativeDefinitionMaterializationUnit will be called immediately to
> satisfy the query, and be given a MaterializationResponsibility object
> covering Planschi_test. You hold on to this MaterializationResponsibility
> object while you do whatever it is that might cause the Planschi module to
> "pop up". If/when Planschi does pop up you can add the object file to
> its own "Planschi" JITDylib, then do the following:
>
>
>
> (1) Look up the address of test in the Planschi JITDylib. This time you do
> want to use the Resolved state (rather than ready) because we don't want to
> create any cycles due to queries*.
>
>
>
> * E.g. "A" depends on "B", "B" depends on "A". If you try to produce each
> by looking up the other and waiting for it to be "Ready" then you'll
> deadlock. But you can look up the other and wait for it to be Resolved and
> that will always succeed (addresses are assigned without waiting for
> dependencies to be resolved). Then you add "B" as a dependence of "A" and
> vice-versa and let Orc's dependence tracking take care of the rest. Anybody
> issuing a query looking for "A" or "B" to be ready will find that their
> query doesn't return until both symbols are fully-materialized and safe to
> access.
>
>
>
> (2) Update the JIT state using your MaterializationResponsibility object:
>
>
>
> if (auto TestSym = ES.lookup({&PlanschiJD}, ES.intern("test"),
>                              SymbolState::Resolved)) {
>   // Tell the JIT that the address of Planschi_test is the same
>   // as the address of "test" in the Planschi JITDylib.
>   // (notifyResolved and notifyEmitted return an Error that real code
>   // must check.)
>   MR.notifyResolved({
>       { ES.intern("Planschi_test"),
>         JITEvaluatedSymbol(TestSym->getAddress(),
>                            JITSymbolFlags::Exported) }
>   });
>
>   // Tell the JIT that "Planschi_test" depends on the "test" symbol in
>   // the Planschi JITDylib:
>   MR.addDependencies(ES.intern("Planschi_test"),
>                      {{&PlanschiJD, { ES.intern("test") }}});
>
>   // Tell the JIT that "Planschi_test" has been emitted. We did not
>   // actually have to do any work for this because Planschi_test is
>   // just an alias.
>   MR.notifyEmitted();
>
>   // And you're done.
> } else {
>   // If we get here then the "test" symbol couldn't be resolved for some
>   // reason, so we need to report that error and notify everyone that
>   // "Planschi_test" failed too.
>   ES.reportError(TestSym.takeError());
>   MR.failMaterialization();
> }
>
>
>
> The code above takes care of the case where the Planschi module can be
> found. The other possibility is that you get to the end of your process
> that causes modules to "pop up" (whatever this is) and there's still no
> Planschi module. In this case you just report an error and call
> MR.failMaterialization() to tell everyone who needed Planschi_test that it
> really couldn't be found.
>
>
>
> 1.) When all symbols have such a unique name now, couldn’t I just add them
> all to the same DyLib and use the RessourceTracker to unload them later?
> This would spare me searching the right DyLibs.
>
>
>
> The scheme I just described uses JITDylibs to namespace things, but you
> could also (1) do everything at the IR level as you've described and rename
> them there, or (2) use a JITLink plugin to rename everything at the object
> file level (this is doable, as long as you have a JITLink implementation
> for your target platform. So far we only have MachO and ELF on x86-64. I
> think you need COFF, right?).
>
>
>
> 2.) Can I ask a DyLib if it has something for the
> “Parent_Child__Plansch_?test@@3HA" loaded?
>
>
>
> Sort of. You can look up the symbol flags for
> “Parent_Child__Plansch_?test@@3HA". This will not trigger materialization,
> but will call your
> definition generator to try to generate a definition. In the scheme I
> described above you shouldn't need to do this at all.
>
>
>
> My biggest worry is, I do all those fancy renaming and such, but I still
> have no guarantee that the missing symbols are already present, but when I
> look up my symbols of interest it is to late… cause if something is missing
> because the module was not loaded yet, then I get in an error state and I’m
> stuck there :c
>
>
>
> You shouldn't need to worry about this. By using a custom materialization
> unit as described above you can keep a query waiting as long as you like
> before either failing it or providing a definition. You don't need to error
> out until you've determined that a symbol really can't be provided.
>
>
>
>  3.) Isn’t the process of renaming those symbols like… really memory
> costly? Could I run into limitations of creating a too long function name?
>
>
>
> It depends on how long you expect your symbol names to get, and how memory
> constrained you are. Orc uses a string pool, so each string value is
> usually only held in one place in memory. This should keep the overhead
> pretty low.
>
>
>
> -- Lang.
>
>
>
> On Wed, Sep 30, 2020 at 12:05 AM Gaier, Bjoern <Bjoern.Gaier at horiba.com>
> wrote:
>
> Hey Lang,
>
>
>
> > Do you mean that the object file is produced by another process and is
> being loaded into your JIT process for execution, or that you want your JIT
> to produce code for several different processes? These are different
> problems with different solutions. I'll wait until I understand your use
> case to answer further.
>
> In the current state we don’t have a JIT only an handcrafted object
> loader. All our object files are pre-compiled but will be loaded by
> different processes into the same shared memory for the main process to
> execute them.
>
>
>
> We would like to still have the files pre-compiled but we want to remove
> the handcrafted object loader. However, since we already create those files
> with Clang, it is not such a big change to compile them to IR.
>
>
>
> > You're trying to do all this on Hard Mode. ;)
>
> Yeah it was a silly thought… sorry xO
>
>
>
> > The part of your use case that is the most opaque to me is the renaming.
> When you see a reference to "test" in some object, how do you decide that
> it should resolve to the definition of "test" in, for example, Planschi, as
> opposed to some other module? Do you just have a list of modules that you
> check in-order until you find a matching symbol name?
>
> You wouldn’t find a reference to “test”, you would find “Planschi_test”.
> We would first try resolving that with the standard C library functions. If
> this fails, then we will ask the object file “Planschi” if it has “test”
> and resolve it. If “Planschi” is not loaded for some reason, then we go on
> with the next symbol and mark “Planschi_test” as undefined. So… we fail
> linking. However, our code allows to just try the linking again – so we do
> and do until it succeeds. Only when the undefined references hit 0 we mark
> the object file as executable. However, if the module you depend on has
> still undefined references you can be screwed. Cause our code will not
> check the dependencies. Another reason why we want to move away.
>
>
>
> I think… but this is probably not thought through well:
>
> I need to do those steps on the IR level now – because the
> DefinitionGenerator needs to resolve all symbols, but I might have to wait
> for a different module to pop up first.
>
>
>
> So I would iterate over the globals and the functions of the Module:
>
> I would rename each symbol I encounter to have a unique name.
>
> This would be easy because when we load stuff, we give them a hierarchical
> name like “Parent_Child_Child_Dino”
>
> So "?Sampler@@YAXXZ" Could be renamed to
> "Parent_Child_Child_Dino_?Sampler@@YAXXZ".
>
> If I encounter an undefined reference that is not part of the standard
> library like "?_Plansch_test@@3HA" Then I would extract the relative path
> and convert it to an absolute one, getting:
> “Parent_Child__Plansch_?test@@3HA”.
> When the “Plansch” module was loaded, I would have give it an unique name
> as well so they would fit. I would then add them to there own DyLibs and
> would be happy… would I?
>
>
>
> 1.) When all symbols have such a unique name now, couldn’t I just add them
> all to the same DyLib and use the RessourceTracker to unload them later?
> This would spare me searching the right DyLibs.
>
> 2.) Can I ask a DyLib if it has something for the
> “Parent_Child__Plansch_?test@@3HA" loaded?
>
> My biggest worry is, I do all those fancy renaming and such, but I still
> have no guarantee that the missing symbols are already present, but when I
> look up my symbols of interest it is to late… cause if something is missing
> because the module was not loaded yet, then I get in an error state and I’m
> stuck there :c
>
> 3.) Isn’t the process of renaming those symbols like… really memory
> costly? Could I run into limitations of creating a too long function name?
>
>
>
> I hope that makes sense… cause I’m not good in explaining anything,
> especially not when it is in a different language :c
>
>
>
> Thank you again :D
>
>
>
> Kind greetings
>
> Björn
>
>
>
> *From:* Lang Hames <lhames at gmail.com>
> *Sent:* 29 September 2020 18:55
> *To:* Gaier, Bjoern <Bjoern.Gaier at horiba.com>
> *Cc:* LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] ORC JIT - different behaviour of
> ExecutionSession.lookup?
>
>
>
> Hi Bjoern,
>
>
>
> However, another thing of our system is, that each object file was loaded
> from a different process,
>
>
>
> Do you mean that the object file is produced by another process and is
> being loaded into your JIT process for execution, or that you want your JIT
> to produce code for several different processes? These are different
> problems with different solutions. I'll wait until I understand your use
> case to answer further.
>
>
>
> Writing this… I actually wondered about something else but this feels like
> a terrible approach… But now I’m curious xD
> So… if I load an IR-Module, could I use the static LLVM compiler to
> compile it to an object file and then use the source code of the LLD to
> load and resolve the symbols the same way/kind we did in the past?
> That sounds more like the “addObjectFile” function of the LLJIT… And then
> I guess I have to write a LinkLayer or something? That is where my
> knowledge ends…
> Disclaimer: I don’t like that approach but it would be interesting to know
> (also cause some people here would be happy with it .w.”)
>
>
>
> You're trying to do all this on Hard Mode. ;)
>
>
>
> ORC takes care of all this kind of stuff for you:
>
>   Don't re-write IR. Leave references as symbolic -- they will be fixed up
> in the JIT linker.
>
>   You don't need to write your own JIT linker. LLJIT has one built in.
>
>
>
> When you add things to the JIT:
>
>   - If you have a program representation (module, object file, etc.) and
> you want to add it then just go ahead.
>
>   - If your program representations contain external references then they
> must be resolvable or linking will fail (there's no getting around that, in
> a JIT or in a regular compile), BUT...
>
>   - You can always add new definitions in response to a query by using a
> definition generator. If your definition generator can find/create a
> definition then great. If it can't then the reference really is unresolved
> and linking really should fail.
>
>
>
> The part of your use case that is the most opaque to me is the renaming.
> When you see a reference to "test" in some object, how do you decide that
> it should resolve to the definition of "test" in, for example, Planschi, as
> opposed to some other module? Do you just have a list of modules that you
> check in-order until you find a matching symbol name?
>
>
>
> -- Lang.
>
>
>
> On Tue, Sep 29, 2020 at 2:30 AM Gaier, Bjoern <Bjoern.Gaier at horiba.com>
> wrote:
>
> Hey Lang,
>
>
>
> Thank you for your help and your patience – also for your answers in the
> “ORC JIT - Can modules independently managed with one LLJIT instance? +
> problems with ExecutionSession.lookup” mail. Both problems have the same
> origin so I keep writing about it here, to avoid duplication.
>
> My big problem is still handling cross references between modules with
> “our” name scheme. Since our old loader loads object files, we resolved
> those references with object files and since they were already compiled, we
> knew all addresses right away. With the LLJIT as I finally understand, I
> will only get the addresses when I have resolved every references, which
> makes the code way safer.
>
> However, another thing of our system is, that each object file was loaded
> from a different process, so sometimes not all symbols for ModuleA were
> present because ModuleB was not loaded/requested yet. That was okay, so we
> kept resolving the undefined references of ModuleA until ModuleB was loaded
> and everything was fine.
>
>
>
> If I get it right… This would change now to having a single LLJIT
> representing the entire system. Each process would get it’s own DyLib for
> there module. However, I would need to check on IR-Level now which symbols
> would be undefined – correct? Because if I wait until
> “DefinitionGenerator::tryToGenerate” is called and have to wait for a
> module that might never be loaded, then I’m stuck there forever.
>
> 1.) If I find a symbol that is undefined – and it has our name scheme,
> then I would jump to for example ModuleB, which is also not jitted yet and
> would do a “replaceAllUsesWith” on the Symbol of ModuleA to ModuleB –
> right?
>
>                 - Would that mean, when I add ModuleA to DyLibA – is
> ModuleB then part of DyLibA as well?
>
> 1.1.) Alternatively I could rename the symbol
>
> 2.) If the ModuleB is already jitted, then I can take the address to  do
> the “replaceAllUsesWith” right?
>
>
>
> When I resolved all those references, then I can add the IR Module to my
> DyLib and compile it. However is it a good idea to use “replaceAllUsesWith”
> with addresses? Seems like the DefinitionGenerator would be jobless…
>
>
>
> Writing this… I actually wondered about something else but this feels like
> a terrible approach… But now I’m curious xD
>
> So… if I load an IR-Module, could I use the static LLVM compiler to
> compile it to an object file and then use the source code of the LLD to
> load and resolve the symbols the same way/kind we did in the past?
>
> That sounds more like the “addObjectFile” function of the LLJIT… And then
> I guess I have to write a LinkLayer or something? That is where my
> knowledge ends…
> Disclaimer: I don’t like that approach but it would be interesting to know
> (also cause some people here would be happy with it .w.”)
>
>
>
> Thank you so far!
>
>
>
> Kind greetings
>
> Björn
>
>
>
>
>
> *From:* Lang Hames <lhames at gmail.com>
> *Sent:* 29 September 2020 01:47
> *To:* Gaier, Bjoern <Bjoern.Gaier at horiba.com>
> *Cc:* LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] ORC JIT - different behaviour of
> ExecutionSession.lookup?
>
>
>
> Hi Bjoern,
>
>
>
> Even though the "tryToGenerate" function of my DefinitionGenerator
> returned a "llvm::orc::SymbolsNotFound" for the "?_Plansch_test@@3HA", I
> got an address for "?
>
>
>
> That's because you're issuing the lookup with RequiredState =
> SymbolState::Resolved. This means that your query will return as soon as
> "?Sampler@@YAXXZ" is assigned an address. In the JIT linker(s) addresses
> are assigned before external references are looked up. So after your lookup
> returns the linker attempts to find "?_Plansch_test@@3HA", fails, and so
> moves "?Sampler@@YAXXZ" to the error state.
>
>
>
> You almost always want to issue your lookups with RequiredState =
> SymbolState::Ready. This ensures that the query will not return until /
> unless the requested symbols (and all their dependencies) are successfully
> linked into the target process and ready to execute.
>
>
>
> Question 1.)
> Is there any way to reset the error state of "?Sampler@@YAXXZ" at this
> point?
>
>
>
> No. However, the removable code feature will allow you to remove failed
> materialization units once it lands in the mainline.
>
>
>
> - After my first call I used the "define" function of the JITDylib to
> define ?_Plansch_test@@3HA and then I tried calling the lookup function
> again and again, however I only got the error: "Failed to materialize
> symbols" even though "?_Plansch_test@@3HA" was defined now...
> - Changing the order of the "define" and the "lookup" call works of
> course, but I'm interested in the case where I don't know the address yet.
>
>
>
> The JIT doesn't re-try linking. Once a symbol has failed to link it
> remains in the error state. In theory, once removable code is added you
> could choose to remove and then re-add "?Sampler@@YAXXZ"
> after "?_Plansch_test@@3HA" is defined. The real solution though is just
> to make sure that "?_Plansch_test@@3HA" is defined (either directly or
> via a generator) before you look up "?Sampler@@YAXXZ".
>
>
>
> Out of curiosity I repeated the previous scenario - but added
> "?_Plansch_test@@3HA" to the "lookupSet" which changed things drastically.
> When executing "lookup" I now get the "llvm::orc::SymbolsNotFound" error
> from my DefinitionGenerator...
>
>
>
> Yes. Because "?_Plansch_test@@3HA" is not defined. You should see a
> SymbolsNotFound error sent to your error reporter in the first scenario
> too, followed by a failure-to-materialize error for "?Sampler@@YAXXZ".
>
>
>
> ... and "?Sampler@@YAXXZ" is stuck as a pending query in the
> MaterializingInfos entries.
>
>
>
> Huh. That sounds like a bug: All references to the query should be removed
> from the state machine before it returns its result (in this case an
> error). I'll see if I can reproduce this locally and fix it up, but it
> doesn't affect the discussion here.
>
>
>
> When I then add a definition for "?_Plansch_test@@3HA" and call "lookup"
> the second time, it will succeed and give me the addresses. Also I'm able
> to execute the code now. This is great! However...
>
>
>
> When a lookup fails we try to restore the ExecutionSession state to what
> it was prior to the query. This is why the sequence "lookup -> symbols not
> found -> define -> lookup again" worked.
>
>
>
> Question 2.)
> Why did the first call to lookup not return the address of "?Sampler@@YAXXZ"
> like in the first scenario? I expected it would return an address for it.
>
>
>
> A lookup must match against all symbols before anything is JIT'd. When it
> failed to match "?_Plansch_test@@3HA" we immediately bailed out with an
> error. There was no further attempt to compile "?Sampler@@YAXXZ".
>
>
>
> Question 3.)
> Can I somehow combine both behaviours? Getting the address for all the
> symbols (like in scenario 1) while still being able to provide definitions
> later (like in scenario 2)?
>
>
>
> *Sort of.*
>
>
>
> Definition generators allow you to provide a definition at the last minute
> (i.e. in response to a query). The best mental model though is: "All
> definitions that a generator can generate are part of the interface of the
> dylib". E.g. if you use a DynamicLibrarySearchGenerator to mirror symbols
> from a dynamic library containing "foo", "bar" and "baz" then you should
> think of your JITDylib as containing definitions for "foo", "bar" and
> "baz", even if the generator hasn't actually added them to the JITDylib
> yet. The reason is that it will add them in response to any query for them,
> so it's indistinguishable (except for timing and debug logging) from the
> case where they're already present.
>
>
>
> If you need to be able to defer adding a "real" definition beyond the
> initial lookup then your only option (and this only applies to functions)
> is a lazy-reexport. This allows you to provide a definition for a function
> while deferring lookup until the first execution of the re-export at
> runtime. I wouldn't generally use this to break dependencies though: You
> want a definition of the real function body for "?_Plansch_test@@3HA"
> already added to your JIT because (in general) you never know when JIT'd
> code will need it.
>
>
>
> My turn to ask a question: How is "?_Plansch_test@@3HA" created, and why
> not just add it up-front? :)
>
>
>
> -- Lang.
>
>
>
> On Mon, Sep 28, 2020 at 4:57 AM Gaier, Bjoern via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Hey everyone,
>
>
>
> I felt this question is different from my other question - hope this is
> okay.
>
>
>
> So - I was playing around with the lookup function of the ExecutionSession
> and there are some things I don't understand.
>
> I have a .BC file with a function "?Sampler@@YAXXZ" referencing a value
> "?_Plansch_test@@3HA" that is not defined in that module itself. I first
> planned on not providing an address for "?_Plansch_test@@3HA" but wanted
> to know the address of "?Sampler@@YAXXZ". So I issued something like that:
>
>     auto &ES = this->jit->getExecutionSession();
>     SymbolLookupSet lookupSet;
>
>     lookupSet.add("?Sampler@@YAXXZ",
>                   llvm::orc::SymbolLookupFlags::WeaklyReferencedSymbol);
>
>     ES.lookup({{&jit->getMainJITDylib(),
>                 llvm::orc::JITDylibLookupFlags::MatchAllSymbols}},
>               lookupSet, llvm::orc::LookupKind::Static,
>               llvm::orc::SymbolState::Resolved);
>
>
>
> Even though the "tryToGenerate" function of my DefinitionGenerator
> returned a "llvm::orc::SymbolsNotFound" for the "?_Plansch_test@@3HA", I
> got an address for "?Sampler@@YAXXZ". Dumping the "MainJITDylib" I saw
> that the "?Sampler@@YAXXZ" was in an Error state. Which made sense - I
> guess.
>
>
>
> Question 1.)
>
> Is there any way to reset the error state of "?Sampler@@YAXXZ" at this
> point?
>
> - After my first call I used the "define" function of the JITDylib to
> define ?_Plansch_test@@3HA and then I tried calling the lookup function
> again and again, however I only got the error: "Failed to materialize
> symbols" even though "?_Plansch_test@@3HA" was defined now...
>
> - Changing the order of the "define" and the "lookup" call works of
> course, but I'm interested in the case where I don't know the address yet.
>
>
>
> Out of curiosity I repeated the previous scenario - but added
> "?_Plansch_test@@3HA" to the "lookupSet" which changed things drastically.
> When executing "lookup" I now get the "llvm::orc::SymbolsNotFound" error
> from my DefinitionGenerator and "?Sampler@@YAXXZ" is stuck as a pending
> query in the MaterializingInfos entries. When I then add a definition for
> "?_Plansch_test@@3HA" and call "lookup" the second time, it will succeed
> and give me the addresses. Also I'm able to execute the code now. This is
> great! However...
>
>
>
> Question 2.)
>
> Why did the first call to lookup not return the address of "?Sampler@@YAXXZ"
> like in the first scenario? I expected it would return an address for it.
>
>
>
> Question 3.)
>
> Can I somehow combine both behaviours? Getting the address for all the
> symbols (like in scenario 1) while still being able to provide definitions
> later (like in scenario 2)?
>
>
>
> Thank you in advance and kind greetings,
>
> Björn
>
> Registered as a GmbH in the commercial register of Bad Homburg v.d.H., HRB 9816,
> VAT ID No. DE 114 165 789. Managing directors: Dr. Hiroshi Nakamura, Dr. Robert
> Plank, Markus Bode, Heiko Lampert, Takashi Nagano, Junichi Tajika, Ergin
> Cansiz.
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

Gaier, Bjoern via llvm-dev

2020-Oct-02 14:18 UTC

[llvm-dev] ORC JIT - different behaviour of ExecutionSession.lookup?

Hey Lang,

I’m happy that this discussion is useful for you as well. I tried implementing
everything we talked about but… I’m stuck again…

This time I attach the source file of my experiment, hoping that this makes
things easier. First of all, I wanted to start simple and use IR files instead
of object files, but with the same motivation – “CM_Schwimm.bc” has a
reference to “CM_Plansch.bc”, but that one is not loaded yet.

I followed the idea that every module has a name and that name is used for the
DyLib. (bool addModule(const char *name, const char *file) Line: 240)
However, I wanted to use the “MainDyLib” as a central library to define stuff
from the C-Library. So I used the function “static void
defineCRTFunctions(llvm::orc::JITDylib &lib)” on the MainDyLib and when I
loaded a new module, I did the following:
curLib.addToLinkOrder(LLJIT->getMainJITDylib());

Where “curLib” was the freshly created DyLib. However, when I then did a lookup
of the 3 functions I’m interested in on the “curLib”, the C runtime functions
were not resolved. I marked the code related to this with a comment containing
‘#1’ in the source file.

My second big issue was… instantiating
“MyTentativeDefinitionMaterializationUnit”.
I had no idea how to obtain the ObjectLinkingLayer…
Since I decided to use IR modules, I thought it is okay to remove that code for
now… so I commented out everything you wrote basically and tried… doing my own
thing.

I failed… bringing me to issue number 3.
Starting at line 65 I ‘noticed’ that a module is missing, loaded it and looked
up the missing reference… However I had no idea what to do with it then and….
Stopped…

Besides from those issues I noticed some things…

  1.  If I do “R.failMaterialization();”, my call to the LLJIT::lookup will
return with an error and subsequent calls to the function will keep failing
while the “MyTentativeDefinitionMaterializationUnit” was never invoked again. Am
I right, that this is because I failed linking and when linking failed I can’t
undo it?
  2.  If I don’t do anything in the materialize function of
“MyTentativeDefinitionMaterializationUnit” then the LLJIT::lookup will not
return – actually… nothing happens. The code seems to be in an endless loop
somewhere?

And… more about our old Code Module Loader… We have a tool that allows you to
load the object files into our system. You would usually invoke the commands
like that:
Load Planschi CM_Planschi.bc
Load Schwimm CM_Schwimm.bc

Resolve Schwimm
Resolve Planschi

You could swap the order of load and also could do as many resolves as you feel
like. This is what I mean, that maybe a missing reference will not be resolved.
If I would use that tool and wouldn’t load Planschi, but Schwimm, then I could
call Resolve again and again. If I decide to load Planschi after this and call
Resolve, then I would actually resolve something. So there are scenarios where
dependencies are never resolved. However, in that case the module is marked as
not executable.
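
For readers who want the Load/Resolve semantics spelled out, here is a minimal sketch in plain standard C++. It is only a toy model of the tool as described above and not ORC code: the class ToyLoader, its methods, and the (module, symbol) representation of a reference are all assumptions made for illustration. A failed resolve leaves no permanent mark, so it can be repeated after more modules are loaded, and, as in the description, the state of the providing module is not checked.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

// Toy model of the Load/Resolve tool; all names here are hypothetical.
class ToyLoader {
public:
  // A reference names the module expected to provide the symbol.
  using Reference = std::pair<std::string, std::string>; // (module, symbol)

  // "Load": register a module's definitions and its external references.
  void load(const std::string &name, std::set<std::string> defines,
            std::set<Reference> references) {
    Module m;
    m.defines = std::move(defines);
    m.unresolved = std::move(references);
    modules[name] = std::move(m);
  }

  // "Resolve": drop every reference whose providing module is loaded and
  // defines the symbol. Returns true once no undefined references remain.
  // A failed resolve can simply be repeated later.
  bool resolve(const std::string &name) {
    auto it = modules.find(name);
    if (it == modules.end())
      return false;
    Module &m = it->second;
    for (auto ref = m.unresolved.begin(); ref != m.unresolved.end();) {
      auto provider = modules.find(ref->first);
      if (provider != modules.end() &&
          provider->second.defines.count(ref->second))
        ref = m.unresolved.erase(ref);
      else
        ++ref;
    }
    m.executable = m.unresolved.empty();
    return m.executable;
  }

  bool isExecutable(const std::string &name) const {
    auto it = modules.find(name);
    return it != modules.end() && it->second.executable;
  }

private:
  struct Module {
    std::set<std::string> defines;
    std::set<Reference> unresolved;
    bool executable = false;
  };
  std::map<std::string, Module> modules;
};
```

With this model, loading Schwimm alone and resolving it fails as often as you like; loading Planschi afterwards makes the next resolve of Schwimm succeed.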

Now… I try to “map” the Load and Resolve instructions of that tool to the
source code I sent in this mail.
Load would map to “bool addModule(const char *name, const char *file)” because I
loaded code but it is not executable yet.
Resolve, I felt, would be “LLJIT::lookup” because when everything goes well
the code is executable. However, the major difference is… when “our Resolve”
fails, it returns telling you that things are not okay, but this is all right!
You can just do another resolve whenever you want, or not.
However, LLJIT::lookup does not give you that chance: when it tells you it
failed, then that’s it.

With the input you gave me, I felt like resolve maps more to the “materialize”
function, because I have all the time there to wait for every module to appear
or not.
However… This function is invoked via the LLJIT::lookup, which means that this
function will not return until “materialize” did. If I would recreate our tool
with that behaviour, it would mean, when I execute the “Resolve Schwimm” command
(and I couldn’t resolve a thing) I would not get the chance to execute any other
command.
So… I guess the best would be then, to use a thread instead for this? If I have
a thread doing the lookup, then it could set a flag to “resolved” when the
lookup succeeded without blocking the user of the tool to load something else in
the meantime.
Is there no other way?
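
The thread idea can be sketched in plain standard C++ as follows. This is only an illustration under assumptions: ResolveStatus, ResolveJob and startResolve are hypothetical names, and the blocking call is passed in as a std::function so the sketch does not depend on any LLVM API. In a real program the callable would wrap the blocking lookup and return whether it succeeded.

```cpp
#include <atomic>
#include <functional>
#include <future>
#include <memory>
#include <utility>

// Hypothetical status flag the tool can poll between commands.
enum class ResolveStatus { Pending, Resolved, Failed };

struct ResolveJob {
  std::atomic<ResolveStatus> status{ResolveStatus::Pending};
  // Destroying the job waits for the worker, because a future obtained from
  // std::async blocks in its destructor until the task has finished.
  std::future<void> worker;
};

// Runs the blocking lookup on another thread and records the outcome.
inline std::shared_ptr<ResolveJob>
startResolve(std::function<bool()> blockingLookup) {
  auto job = std::make_shared<ResolveJob>();
  ResolveJob *raw = job.get();
  job->worker = std::async(
      std::launch::async, [raw, lookup = std::move(blockingLookup)]() {
        raw->status.store(lookup() ? ResolveStatus::Resolved
                                   : ResolveStatus::Failed);
      });
  return job;
}
```

The caller keeps the returned job, continues to accept commands, and checks job->status when it wants to know whether the module became executable. Note that this only removes the blocking; it does not make a failed lookup retryable.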

I really really hope this all makes sense… Cause I’m still trying to understand
“our world” and the “ORC JIT world” – but things are getting clearer and clearer
to me.

Thank you again!

Kind greetings
Björn

From: Lang Hames <lhames at gmail.com>
Sent: 01 October 2020 19:38
To: Gaier, Bjoern <Bjoern.Gaier at horiba.com>
Cc: LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] ORC JIT - different behaviour of
ExecutionSession.lookup?

Hi Bjoern,

Woah! That mail contains a lot of information and things I never tried yet…
Actually… the entire MaterializationUnit and MaterializationResponsibility part
is… quite… overwhelming >O<

The names sound intimidating, but these classes are actually relatively simple:

  -- MaterializationUnit wraps a program representation, a map describing the
symbols that the program representation provides, and a callback to materialize
the program representation. For example a materialization unit for an object
file would contain the object file memory buffer, a list of symbols defined in
the object file, and a callback to an object linking layer to link the object
file. The most important method on MaterializationUnit is the materialize
method: The JIT will call this when you look up any symbols that the unit
provides.

-- MaterializationResponsibility objects get created when the JIT calls the
materialize method on a MaterializationUnit. They track the symbols being
materialized and provide a way to update the ExecutionSession, unblocking any
queries waiting on these particular symbols. Usually you just have to pass them
along to the JIT linker, but you can also interact with them directly. The most
important methods are notifyResolved, which assigns addresses to symbols;
notifyEmitted, which tells the ExecutionSession that these symbols have been
completely emitted; addDependencies, which describes dependencies between
symbols so that the ExecutionSession can hold queries until symbols and all of
their dependencies are safe to access; and finally failMaterialization which can
be used to tell ExecutionSession that some symbols really couldn't be
provided.

With “pop up” I mean… the process which is waiting for Module “Planschi” to “pop
up” can not do a thing about it. It just waits until there is an table entry for
it, indicating that the object file was loaded by another process – or not.

How do you decide when to stop waiting? There must be some way to detect that
the process has been completed otherwise a missing module would cause you to
wait forever. Do you just have a timeout? Or does it complete when all processes
have been asked whether they can produce Planschi and have returned
"no"?

To understand the MU and MR things better….  I try now to explain the flow of
program...

Yes -- Everything you described is correct, so I'll skip ahead to where
things get fuzzy.

Normally I would call the “define” function of the DyLib I got (in the
tryToGenerate function) and providing an absolute value. But I can’t because
“Planschi” didn’t popped up yet. So I create instead a
“MyTentativeDefinitionMaterializationUnit" and stuff it into the define
function right?
„MyTentativeDefinitionMaterializationUnit“ is something that inherits from
„TentativeDefinitionMaterializationUnit" right? Now I have those overloaded
functions where I need to provide implantation for it. This is kinda shady to
me, but this will make waaaay more sense when I looked at it I guess.

Let's define a basic MyTentativeDefinitionMaterializationUnit (there's no
TentativeDefinitionMaterializationUnit in ORC -- that name was just an example
to convey the concept):

class MyTentativeDefinitionMaterializationUnit
    : public MaterializationUnit {
public:

  // Constructor tells us the name of the module (e.g. Planschi) that
  // we want to try to load, and the ObjectLinkingLayer that we will
  // use to load it.
  MyTentativeDefinitionMaterializationUnit(StringRef ModuleName,
                                           ObjectLinkingLayer &LinkLayer,
                                           SymbolFlagsMap InitialSymbolFlags,
                                           SymbolStringPtr InitSymbol,
                                           VModuleKey K)
    : MaterializationUnit(std::move(InitialSymbolFlags),
                          std::move(InitSymbol), K),
      ModuleName(ModuleName), LinkLayer(LinkLayer) {}

  // getName is used in JIT debug logging.
  StringRef getName() const override {
    return "MyTentativeDefinitionMaterializationUnit";
  }

  // Our materialize method will be called by the JIT outside the session lock
  // to materialize the symbols contained in R->getSymbols() (which is the same
  // set of symbols passed in to the constructor as InitialSymbolFlags).
  void materialize(std::unique_ptr<MaterializationResponsibility> R) override {
    dbgs() << "I am a tentative definition unit trying to materialize "
              "the following symbols for module " << ModuleName << ": "
           << R->getSymbols() << "\n";

    // Try to produce an object file for ModuleName. This is where all the work
    // specific to your use-case will happen.
    Expected<std::unique_ptr<MemoryBuffer>> ObjectForModule =
        requestObjectForModule(ModuleName);

    // If we didn't get an object file then register the failure and bail out.
    if (!ObjectForModule) {
      LinkLayer.getExecutionSession().reportError(ObjectForModule.takeError());
      R->failMaterialization();
      return;
    }

    // If we did get an object file back then we need to check the set of
    // definitions that it provided: There might be more than just the symbols
    // we were searching for (e.g. if module Planschi defines "test" and "foo"
    // and we only created this tentative unit to cover "test" then we need to
    // tell the JIT that we're taking responsibility for "foo" too).
    auto ProvidedSymbols =
        getObjectSymbolInfo(LinkLayer.getExecutionSession(),
                            (*ObjectForModule)->getMemBufferRef());

    // If there was something wrong with the object and we couldn't get the set
    // of symbols that it defines then bail out.
    if (!ProvidedSymbols) {
      LinkLayer.getExecutionSession().reportError(ProvidedSymbols.takeError());
      R->failMaterialization();
      return;
    }

    // Otherwise we can define the set of new symbols as the set of all defined
    // symbols minus the ones we already knew about:
    SymbolFlagsMap NewSymbols = std::move(ProvidedSymbols->first);
    for (auto &KV : R->getSymbols())
      NewSymbols.erase(KV.first);

    // If the set of new symbols is non-empty then notify the JIT about the
    // new symbols.
    if (!NewSymbols.empty()) {
      if (auto Err = R->defineMaterializing(std::move(NewSymbols))) {
        // If there was an error, for example one of our new symbols clashed
        // with an existing definition, then bail out.
        LinkLayer.getExecutionSession().reportError(std::move(Err));
        R->failMaterialization();
        return;
      }
    }

    // Otherwise we're all done: hand the object off to be linked.
    LinkLayer.emit(std::move(R), std::move(*ObjectForModule));
  }
private:
  StringRef ModuleName;
  ObjectLinkingLayer &LinkLayer;
};

All the interesting stuff for your specific system goes in to the call
"requestObjectForModule". It should return an object file buffer for
the module if it is able, or an error otherwise (if the module can't be
found).

If you want to make all of this asynchronous you can always write
requestObjectForModule as an asynchronous operation and do the rest of the work
contained in the materialize method above in a callback.
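
As a rough illustration of the asynchronous variant, here is a sketch in plain standard C++ of how a request for a module that has not popped up yet could be parked and completed later. Everything in it is an assumption made for the example: ModuleRegistry, moduleLoaded and noMoreModules are hypothetical names, the object file is modelled as a std::string, and the class is not thread-safe as written (a real version would need a mutex). It is not part of LLVM.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical registry of modules that may "pop up" later.
class ModuleRegistry {
public:
  // The callback receives the object bytes, or std::nullopt on failure.
  using Callback = std::function<void(std::optional<std::string>)>;

  // Asynchronous request: completes at once if the module is already known,
  // otherwise parks the callback until the module appears or we give up.
  void requestObjectForModule(const std::string &name, Callback cb) {
    assert(cb && "a completion callback is required");
    auto it = loaded.find(name);
    if (it != loaded.end()) {
      cb(it->second);
      return;
    }
    pending[name].push_back(std::move(cb));
  }

  // Called when a module pops up: completes every waiting request for it.
  void moduleLoaded(const std::string &name, std::string objectBytes) {
    loaded[name] = std::move(objectBytes);
    auto it = pending.find(name);
    if (it == pending.end())
      return;
    std::vector<Callback> waiters = std::move(it->second);
    pending.erase(it);
    for (auto &cb : waiters)
      cb(loaded[name]);
  }

  // Called once no further modules can appear: fails everything still waiting.
  void noMoreModules() {
    auto all = std::move(pending);
    pending.clear();
    for (auto &entry : all)
      for (auto &cb : entry.second)
        cb(std::nullopt);
  }

private:
  std::map<std::string, std::string> loaded;
  std::map<std::string, std::vector<Callback>> pending;
};
```

In the materialize method, the callback would be the place to do the remaining steps (symbol check, defineMaterializing, emit) on success, or to call failMaterialization when it receives std::nullopt.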

However… Now I’m lost…

What will happen now? So I gave „MyTentativeDefinitionMaterializationUnit“ To
“Planschi_test” while doing the lookup and…. For some reason “Planschi” has not
popped up yet. Where will the waiting for it happen? In one of the
“MyTentativeDefinitionMaterializationUnit" functions? Will the lookup I did
in Step 3 return or will it be blocked until I used the
MaterializationResponsibility To finally resolve the symbol?
I’m also not sure yet where exactly the code snippet you showed would go to….
But I guess it is in one of the functions of
“MyTentativeDefinitionMaterializationUnit" right?

The MyTentativeDefinitionMaterializationUnit represents an attempt to
materialize the symbol you're looking up ("test") from an
as-yet-unknown module ("Planschi"). The motivation for doing this in a
MaterializationUnit rather than the definition generator is that the definition
generator runs under the session lock, so it blocks any other JIT operations
from continuing. If you have any circular dependencies between modules, waiting
inside the generator will deadlock the JIT.

Uh… Thank you again and sorry for any stupid question in this…

Nope. No stupid questions -- this is tricky (and not as well documented as
I'd like). Hopefully the discussion has been helpful to you, and it's
definitely useful to me to hear what JIT clients are trying to do and where the
APIs are unclear.

-- Lang.

On Thu, Oct 1, 2020 at 12:53 AM Gaier, Bjoern <Bjoern.Gaier at horiba.com> wrote:
Hey Lang,

Woah! That mail contains a lot of information and things I never tried yet…
Actually… the entire MaterializationUnit and MaterializationResponsibility part
is… quite… overwhelming >O<

With “pop up” I mean… the process which is waiting for Module “Planschi” to “pop
up” can not do a thing about it. It just waits until there is an table entry for
it, indicating that the object file was loaded by another process – or not.

To understand the MU and MR things better….  I try now to explain the flow of
program, when somebody would add an IR file to it – not caring about how that
file comes to that central process.

  1.  I go over all symbols and grab the names I’m caring about for execution.
  2.  I create a DyLib with a unique name and stuff the IR module in it
  3.  I call lookup from LLJIT and pass the DyLib mentioned in 2. with my
symbols, get the addresses and can execute them

This would be the case where everything went well.
Okay…. So I use the “LLJITBuilder” to get my LLJIT instance, this would be
before step 1.)

In step 2.) when I created the new DyLib, I would call addGenerator and pass a
class to it that inherits from “DefinitionGenerator” right? Let’s call it
“MyDefinitionGenerator" then.
“MyDefinitionGenerator" needs to provide an implementation for
“tryToGenerate” which contains stuff which I ignore at this point.

Now I’m in step 3.) right? And oh nooo! Planschi_test is undefined.
Now the tryToGenerate function is executed and I realize there, that
“Planschi_test” is something coming from a different module.

Normally I would call the “define” function of the DyLib I got (in the
tryToGenerate function) and providing an absolute value. But I can’t because
“Planschi” didn’t popped up yet. So I create instead a
“MyTentativeDefinitionMaterializationUnit" and stuff it into the define
function right?
„MyTentativeDefinitionMaterializationUnit“ is something that inherits from
„TentativeDefinitionMaterializationUnit" right? Now I have those overloaded
functions where I need to provide implantation for it. This is kinda shady to
me, but this will make waaaay more sense when I looked at it I guess.

However… Now I’m lost…

What will happen now? So I gave „MyTentativeDefinitionMaterializationUnit“ To
“Planschi_test” while doing the lookup and…. For some reason “Planschi” has not
popped up yet. Where will the waiting for it happen? In one of the
“MyTentativeDefinitionMaterializationUnit" functions? Will the lookup I did
in Step 3 return or will it be blocked until I used the
MaterializationResponsibility To finally resolve the symbol?
I’m also not sure yet where exactly the code snippet you showed would go to….
But I guess it is in one of the functions of
“MyTentativeDefinitionMaterializationUnit" right?

Uh… Thank you again and sorry for any stupid question in this…

Kind greetings
Björn

From: Lang Hames <lhames at gmail.com>
Sent: 30 September 2020 19:41
To: Gaier, Bjoern <Bjoern.Gaier at horiba.com>
Cc: LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] ORC JIT - different behaviour of
ExecutionSession.lookup?

Hi Bjoern,

In the current state we don’t have a JIT only an handcrafted object loader. All
our object files are pre-compiled but will be loaded by different processes into
the same shared memory for the main process to execute them.

Ok -- this helps a lot. As a general observation (not ORC specific): If you have
multiple source processes targeting a single process then either everything
needs to be lazy or your source processes will need to communicate via IPC to
make things safe (if modules in different processes can directly depend on one
another). ORC is not a good fit for this model: It aims to solve the
cross-module dependency problem with a dependency graph in one process, rather
than IPC between multiple processes.

I think you could make ORC work for you here by changing your IPC just a little
bit. Instead of:

Source Process 1 (with JIT) -- Sends-linked-memory-to -\
Source Process 2 (with JIT) -- Sends-linked-memory-to --> Target Process
Source Process 3 (with JIT) -- Sends-linked-memory-to -/

You would use:

Source Process 1 -- Sends-relocatable-object-to -\
Source Process 2 -- Sends-relocatable-object-to --> Target Process (With JIT)
Source Process 3 -- Sends-relocatable-object-to -/

In the second scenario you're sending relocatable objects via shared memory
and then linking them with the JIT in the target process where we can fully
track dependencies. You can link on multiple threads concurrently in the target
process so you needn't worry about this fully serializing your link steps.
In fact it is likely to cut down on some of the IPC needed between source
processes.

If you don't want the JIT linked into your target process then you can add
one extra step:

Source Process 1 -- Sends-relocatable-object-to -\
Source Process 2 -- Sends-relocatable-object-to --> JIT Process --> Target
Process
Source Process 3 -- Sends-relocatable-object-to -/

We would like to still have the files pre-compiled but we want to remove the
handcrafted object loader. However, since we already create those files with
Clang, it is not such a big change to compile them to IR.

If you want to keep pre-compiling them to objects that should be fine too. The
JIT can just load objects directly. The namespacing issue still needs to be
solved, but that's doable at the object level.

> You're trying to do all this on Hard Mode. ;)

Yeah it was a silly thought… sorry xO

No worries -- Just want to make sure you don't do more work than you need
to. :)

You wouldn’t find a reference to “test”, you would find “Planschi_test”. We
would first try resolving that with the standard C library functions. If this
fails, then we will ask the object file “Planschi” if it has “test” and resolve
it. If “Planschi” is not loaded for some reason, then we go on with the next
symbol and mark “Planschi_test” as undefined. So… we fail linking. However, our
code allows to just try the linking again – so we do and do until it succeeds.
Only when the undefined references hit 0 we mark the object file as executable.
However, if the module you depend on has still undefined references you can be
screwed. Cause our code will not check the dependencies. Another reason why we
want to move away.

Great. I think I understand this now.

I think… but this is probably not thought through well:
I need to do those steps on the IR level now – because the DefinitionGenerator
needs to resolve all symbols, but I might have to wait for a different module to
pop up first.

What causes a module to "pop up"?

Either way I think you can solve this with a custom MaterializationUnit.
Here's one scheme for solving this:
If your definition generator receives a lookup for an external symbol (e.g.
Planschi_test)  then we'll call it "not-resolved-elsewhere" and
create a TentativeDefinitionMaterializationUnit for it (this is something you
would need to write up). The TentativeDefinitionMaterializationUnit says "I
provide Planschi_test" (by including it in its symbol flags map) but has no
idea how it will actually do that yet. The materialize method on your
TentativeDefinitionMaterializationUnit will be called immediately to satisfy the
query, and be given a MaterializationResponsibility object covering
Planschi_test. You hold on to this MaterializationResponsibility object while
you do whatever it is that might cause the Planschi module to "pop up".
If/when Planschi does pop up you can add the object file to its own
"Planschi" JITDylib, then do the following:

(1) Look up the address of test in the Planschi JITDylib. This time you do want
to use the Resolved state (rather than Ready) because we don't want to
create any cycles due to queries*.

* E.g. "A" depends on "B", "B" depends on "A". If you try to produce each by
looking up the other and waiting for it to be "Ready" then you'll deadlock.
But you can look up the other and wait for it to be Resolved and that will
always succeed (addresses are assigned without waiting for dependencies to be
resolved). Then you add "B" as a dependence of "A" and vice-versa and let Orc's
dependence tracking take care of the rest. Anybody issuing a query looking for
"A" or "B" to be ready will find that their query doesn't return until both
symbols are fully-materialized and safe to access.

(2) Update the JIT state using your MaterializationResponsibility object:

if (auto TestSym = ES.lookup({&PlanschiJD}, ES.intern("test"),
                             SymbolState::Resolved)) {
  // Tell the JIT that the address of Planschi_test is the same
  // as the address of "test" in the Planschi JITDylib.
  MR.notifyResolved({
      { ES.intern("Planschi_test"),
        JITEvaluatedSymbol(TestSym->getAddress(),
                           JITSymbolFlags::Exported) }
  });

  // Tell the JIT that "Planschi_test" depends on the "test" symbol in
  // the Planschi JITDylib:
  MR.addDependencies(ES.intern("Planschi_test"),
                     {{&PlanschiJD, { ES.intern("test") }}});

  // Tell the JIT that "Planschi_test" has been emitted. We did not
  // actually have to do any work for this because Planschi_test is
  // just an alias.
  MR.notifyEmitted();

  // And you're done.
} else {

  // If we get here then the "test" symbol couldn't be resolved for some
  // reason, so we need to report that error and notify everyone that
  // "Planschi_test" failed too.
  ES.reportError(TestSym.takeError());
  MR.failMaterialization();
}

The code above takes care of the case where the Planschi module can be found.
The other possibility is that you get to the end of your process that causes
modules to "pop up" (whatever this is) and there's still no
Planschi module. In this case you just report an error and call
MR.failMaterialization() to tell everyone who needed Planschi_test that it
really couldn't be found.
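
To make the Resolved versus Ready distinction from the "A"/"B" footnote concrete, here is a toy model in plain standard C++. It is not the ORC implementation and none of its names are LLVM API; ToySession and its methods are assumptions chosen to mirror the vocabulary used above. A symbol counts as resolved once it has an address, and as ready only once it and everything reachable through its dependency edges has been emitted, so a cycle is harmless as long as nobody waits for Ready while producing the symbols.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy model of symbol states and dependence tracking; hypothetical names.
struct ToySession {
  struct Info {
    bool resolved = false; // an address has been assigned
    bool emitted = false;  // the symbol's own content is in place
    std::set<std::string> deps;
  };
  std::map<std::string, Info> symbols;

  void notifyResolved(const std::string &name) { symbols[name].resolved = true; }

  void addDependency(const std::string &name, const std::string &dep) {
    symbols[name].deps.insert(dep);
  }

  void notifyEmitted(const std::string &name) {
    assert(symbols[name].resolved && "assign an address before emitting");
    symbols[name].emitted = true;
  }

  // What a query for the Resolved state waits for.
  bool isResolved(const std::string &name) const {
    auto it = symbols.find(name);
    return it != symbols.end() && it->second.resolved;
  }

  // What a query for the Ready state waits for: the symbol and all of its
  // transitive dependencies are emitted.
  bool isReady(const std::string &name) const {
    std::set<std::string> visited;
    return allEmitted(name, visited);
  }

private:
  bool allEmitted(const std::string &name,
                  std::set<std::string> &visited) const {
    if (!visited.insert(name).second)
      return true; // already on the current walk, so a cycle is fine
    auto it = symbols.find(name);
    if (it == symbols.end() || !it->second.emitted)
      return false;
    for (const auto &dep : it->second.deps)
      if (!allEmitted(dep, visited))
        return false;
    return true;
  }
};
```

With "A" and "B" depending on each other, both can be resolved immediately, neither is ready after only "A" is emitted, and both become ready once "B" is emitted as well.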

1.) When all symbols have such a unique name now, couldn’t I just add them all
to the same DyLib and use the RessourceTracker to unload them later? This would
spare me searching the right DyLibs.

The scheme I just described uses JITDylibs to namespace things, but you could
also (1) do everything at the IR level as you've described and rename them
there, or (2) use a JITLink plugin to rename everything at the object file level
(this is doable, as long as you have a JITLink implementation for your target
platform. So far we only have MachO and ELF on x86-64. I think you need COFF,
right?).

2.) Can I ask a DyLib if it has something for the “Parent_Child__Plansch_?
test@@3HA" loaded?

Sort of. You can look up the symbol flags for “Parent_Child__Plansch_?
test@@3HA". This will not trigger materialization, but will call your
definition generator to try to generate a definition. In the scheme I described
above you shouldn't need to do this at all.

My biggest worry is, I do all that fancy renaming and such, but I still have no
guarantee that the missing symbols are already present, and when I look up my
symbols of interest it is too late… cause if something is missing because the
module was not loaded yet, then I end up in an error state and I’m stuck there :c

You shouldn't need to worry about this. By using a custom materialization
unit as described above you can keep a query waiting as long as you like before
either failing it or providing a definition. You don't need to error out
until you've determined that a symbol really can't be provided.
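
To make the "keep the query waiting" idea concrete, here is a minimal toy
sketch in plain C++. It deliberately does not use the ORC API: the class
ToyPendingSymbol and its methods are hypothetical names invented for this
illustration, and the callback stands in for whatever is waiting on the
symbol. The point is only that a waiter stays parked until the owner either
resolves the symbol or fails it.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Toy model only (not ORC): a symbol whose outcome is decided later.
class ToyPendingSymbol {
public:
  using Callback = std::function<void(bool Ok, std::uint64_t Addr)>;

  // Register interest. If the outcome is already known the callback runs
  // immediately, otherwise it is parked until resolve() or fail().
  void lookup(Callback CB) {
    if (State == Pending)
      Waiters.push_back(std::move(CB));
    else
      CB(State == Resolved, Addr);
  }

  // Provide a definition: every parked waiter is told the address.
  void resolve(std::uint64_t A) { finish(Resolved, A); }

  // Give up: every parked waiter is told that the symbol failed.
  void fail() { finish(Failed, 0); }

private:
  enum StateKind { Pending, Resolved, Failed };

  void finish(StateKind S, std::uint64_t A) {
    if (State != Pending)
      return; // The outcome is final; later calls are ignored.
    State = S;
    Addr = A;
    for (auto &CB : Waiters)
      CB(S == Resolved, A);
    Waiters.clear();
  }

  StateKind State = Pending;
  std::uint64_t Addr = 0;
  std::vector<Callback> Waiters;
};
```

In this sketch, a waiter registered with lookup() hears nothing until the
owner calls resolve() (module arrived) or fail() (module never arrived).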

 3.) Isn’t the process of renaming those symbols like… really memory costly?
Could I run into limitations of creating a too long function name?

It depends on how long you expect your symbol names to get, and how memory
constrained you are. Orc uses a string pool, so each string value is usually
only held in one place in memory. This should keep the overhead pretty low.
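
As a rough illustration of why pooling keeps the overhead low, here is a
minimal interning sketch in plain C++. It is not ORC's actual string pool;
ToyStringPool is a hypothetical name, and the real implementation differs
(for example, it reference-counts entries). It only shows the basic idea that
equal strings share one stored copy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Toy interning pool (not ORC's implementation): each distinct string value
// is stored once, and every caller gets a pointer to that single copy.
class ToyStringPool {
public:
  const std::string *intern(const std::string &S) {
    // insert() leaves the set unchanged if an equal string already exists
    // and returns an iterator to the existing element in that case.
    return &*Pool.insert(S).first;
  }

  std::size_t size() const { return Pool.size(); }

private:
  // Pointers to elements of an unordered_set stay valid across rehashing.
  std::unordered_set<std::string> Pool;
};
```

With this scheme a long name such as
"Parent_Child_Child_Dino_?Sampler@@YAXXZ" costs its full length once, and
every further use costs one pointer.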

-- Lang.

On Wed, Sep 30, 2020 at 12:05 AM Gaier, Bjoern <Bjoern.Gaier at
horiba.com<mailto:Bjoern.Gaier at horiba.com>> wrote:
Hey Lang,
> Do you mean that the object file is produced by another process and is
> being loaded into your JIT process for execution, or that you want your JIT to
> produce code for several different processes? These are different problems with
> different solutions. I'll wait until I understand your use case to answer
> further.

In the current state we don’t have a JIT, only a handcrafted object loader. All
our object files are pre-compiled but will be loaded by different processes into
the same shared memory for the main process to execute them.

We would like to still have the files pre-compiled but we want to remove the
handcrafted object loader. However, since we already create those files with
Clang, it is not such a big change to compile them to IR.
> You're trying to do all this on Hard Mode. ;)

Yeah it was a silly thought… sorry xO

> The part of your use case that is the most opaque to me is the renaming.
> When you see a reference to "test" in some object, how do you decide
> that it should resolve to the definition of "test" in, for example,
> Planschi, as opposed to some other module? Do you just have a list of modules
> that you check in-order until you find a matching symbol name?

You wouldn’t find a reference to “test”, you would find “Planschi_test”. We
would first try resolving that with the standard C library functions. If this
fails, then we will ask the object file “Planschi” if it has “test” and resolve
it. If “Planschi” is not loaded for some reason, then we go on with the next
symbol and mark “Planschi_test” as undefined. So… we fail linking. However, our
code allows us to just try the linking again – so we retry until it succeeds.
Only when the undefined references hit 0 do we mark the object file as executable.
However, if the module you depend on still has undefined references you can be
screwed, because our code will not check the dependencies. Another reason why we
want to move away.

I think… but this is probably not thought through well:
I need to do those steps on the IR level now – because the DefinitionGenerator
needs to resolve all symbols, but I might have to wait for a different module to
pop up first.

So I would iterate over the globals and the functions of the Module:
I would rename each symbol I encounter to have a unique name.
This would be easy because when we load stuff, we give them a hierarchical name
like “Parent_Child_Child_Dino”
So "?Sampler@@YAXXZ" Could be renamed to
"Parent_Child_Child_Dino_?Sampler@@YAXXZ".

If I encounter an undefined reference that is not part of the standard library
like "?_Plansch_test@@3HA" Then I would extract the relative path and
convert it to an absolute one, getting: “Parent_Child__Plansch_? test@@3HA”.
When the “Plansch” module was loaded, I would have given it a unique name as
well so they would fit. I would then add them to their own DyLibs and would be
happy… would I?
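
The renaming of defined symbols described above can be sketched as a small
string helper. This is only an illustration in plain C++: qualifySymbol is a
hypothetical name, and the single "_" separator is an assumption taken from
the "Parent_Child_Child_Dino_?Sampler@@YAXXZ" example. The conversion of
relative references such as "?_Plansch_test@@3HA" is left out on purpose,
because its exact rules are not spelled out here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: prefix a symbol with the hierarchical name of the
// module that defines it, so that equal names from different modules no
// longer collide.
inline std::string qualifySymbol(const std::string &ModulePath,
                                 const std::string &Symbol) {
  return ModulePath + "_" + Symbol;
}
```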

1.) When all symbols have such a unique name now, couldn’t I just add them all
to the same DyLib and use the ResourceTracker to unload them later? This would
spare me searching for the right DyLibs.
2.) Can I ask a DyLib if it has something for the “Parent_Child__Plansch_?
test@@3HA" loaded?
My biggest worry is, I do all that fancy renaming and such, but I still have no
guarantee that the missing symbols are already present, and when I look up my
symbols of interest it is too late… cause if something is missing because the
module was not loaded yet, then I end up in an error state and I’m stuck there :c
3.) Isn’t the process of renaming those symbols like… really memory costly?
Could I run into limitations of creating a too long function name?

I hope that makes sense… cause I’m not good in explaining anything, especially
not when it is in a different language :c

Thank you again :D

Kind greetings
Björn

From: Lang Hames <lhames at gmail.com<mailto:lhames at gmail.com>>
Sent: 29 September 2020 18:55
To: Gaier, Bjoern <Bjoern.Gaier at horiba.com<mailto:Bjoern.Gaier at
horiba.com>>
Cc: LLVM Developers Mailing List <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>
Subject: Re: [llvm-dev] ORC JIT - different behaviour of
ExecutionSession.lookup?

Hi Bjoern,

However, another thing of our system is, that each object file was loaded from a
different process,

Do you mean that the object file is produced by another process and is being
loaded into your JIT process for execution, or that you want your JIT to produce
code for several different processes? These are different problems with
different solutions. I'll wait until I understand your use case to answer
further.

Writing this… I actually wondered about something else but this feels like a
terrible approach… But now I’m curious xD
So… if I load an IR-Module, could I use the static LLVM compiler to compile it
to an object file and then use the source code of the LLD to load and resolve
the symbols the same way/kind we did in the past?
That sounds more like the “addObjectFile” function of the LLJIT… And then I
guess I have to write a LinkLayer or something? That is where my knowledge ends…
Disclaimer: I don’t like that approach but it would be interesting to know (also
cause some people here would be happy with it .w.”)

You're trying to do all this on Hard Mode. ;)

ORC takes care of all this kind of stuff for you:
  Don't re-write IR. Leave references as symbolic -- they will be fixed up
in the JIT linker.
  You don't need to write your own JIT linker. LLJIT has one built in.

When you add things to the JIT:
  - If you have a program representation (module, object file, etc.) and you
want to add it then just go ahead.
  - If your program representations contain external references then they must
be resolvable or linking will fail (there's no getting around that, in a JIT
or in a regular compile), BUT...
  - You can always add new definitions in response to a query by using a
definition generator. If your definition generator can find/create a definition
then great. If it can't then the reference really is unresolved and linking
really should fail.

The part of your use case that is the most opaque to me is the renaming. When
you see a reference to "test" in some object, how do you decide that
it should resolve to the definition of "test" in, for example,
Planschi, as opposed to some other module? Do you just have a list of modules
that you check in-order until you find a matching symbol name?

-- Lang.

On Tue, Sep 29, 2020 at 2:30 AM Gaier, Bjoern <Bjoern.Gaier at
horiba.com<mailto:Bjoern.Gaier at horiba.com>> wrote:
Hey Lang,

Thank you for your help and your patience – also for your answers in the “ORC
JIT - Can modules independently managed with one LLJIT instance? + problems with
ExecutionSession.lookup” mail. Both problems have the same origin so I keep
writing about it here, to avoid duplication.
My big problem is still handling cross references between modules with “our”
name scheme. Since our old loader loads object files, we resolved those
references with object files and since they were already compiled, we knew all
addresses right away. With the LLJIT, as I finally understand, I will only get
the addresses when I have resolved every reference, which makes the code way
safer.
However, another thing of our system is, that each object file was loaded from a
different process, so sometimes not all symbols for ModuleA were present because
ModuleB was not loaded/requested yet. That was okay, so we kept resolving the
undefined references of ModuleA until ModuleB was loaded and everything was
fine.

If I get it right… This would change now to having a single LLJIT representing
the entire system. Each process would get its own DyLib for its module.
However, I would need to check on IR-Level now which symbols would be undefined
– correct? Because if I wait until “DefinitionGenerator::tryToGenerate” is
called and have to wait for a module that might never be loaded, then I’m stuck
there forever.
1.) If I find a symbol that is undefined – and it has our name scheme, then I
would jump to for example ModuleB, which is also not jitted yet and would do a
“replaceAllUsesWith” on the Symbol of ModuleA to ModuleB – right?
                - Would that mean, when I add ModuleA to DyLibA – is ModuleB
then part of DyLibA as well?
1.1.) Alternatively I could rename the symbol
2.) If the ModuleB is already jitted, then I can take the address to  do the
“replaceAllUsesWith” right?

When I resolved all those references, then I can add the IR Module to my DyLib
and compile it. However is it a good idea to use “replaceAllUsesWith” with
addresses? Seems like the DefinitionGenerator would be jobless…

Writing this… I actually wondered about something else but this feels like a
terrible approach… But now I’m curious xD
So… if I load an IR-Module, could I use the static LLVM compiler to compile it
to an object file and then use the source code of the LLD to load and resolve
the symbols the same way/kind we did in the past?
That sounds more like the “addObjectFile” function of the LLJIT… And then I
guess I have to write a LinkLayer or something? That is where my knowledge ends…
Disclaimer: I don’t like that approach but it would be interesting to know (also
cause some people here would be happy with it .w.”)

Thank you so far!

Kind greetings
Björn

From: Lang Hames <lhames at gmail.com<mailto:lhames at gmail.com>>
Sent: 29 September 2020 01:47
To: Gaier, Bjoern <Bjoern.Gaier at horiba.com<mailto:Bjoern.Gaier at
horiba.com>>
Cc: LLVM Developers Mailing List <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>
Subject: Re: [llvm-dev] ORC JIT - different behaviour of
ExecutionSession.lookup?

Hi Bjoern,

Even though the "tryToGenerate" function of my DefinitionGenerator
returned a "llvm::orc::SymbolsNotFound" for the
"?_Plansch_test@@3HA", I got an address for "?

That's because you're issuing the lookup with RequiredState ==
SymbolState::Resolved. This means that your query will return as soon as
"?Sampler@@YAXXZ" is assigned an address. In the JIT linker(s)
addresses are assigned before external references are looked up. So after your
lookup returns the linker attempts to find "?_Plansch_test@@3HA",
fails, and so moves "?Sampler@@YAXXZ" to the error state.

You almost always want to issue your lookups with RequiredState ==
SymbolState::Ready. This ensures that the query will not return until / unless
the requested symbols (and all their dependencies) are successfully linked into
the target process and ready to execute.
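
The difference between the two required states can be pictured with a small
toy model in plain C++. This is not ORC code: ToyState, ToySymbol and the
helper functions are hypothetical names, and the real state machine has more
states than shown. The sketch only mirrors the ordering described above:
the address is assigned first, external references are checked afterwards.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: a reduced set of symbol states, in linking order.
enum class ToyState { Materializing, Resolved, Ready, Error };

struct ToySymbol {
  ToyState State = ToyState::Materializing;
  std::uint64_t Addr = 0;
};

// Step 1 of the toy linker: the symbol is assigned an address.
inline void assignAddress(ToySymbol &S, std::uint64_t A) {
  S.Addr = A;
  S.State = ToyState::Resolved;
}

// Step 2: external references are looked up. If one is missing, the symbol
// moves to the error state even though it already has an address.
inline void finishLink(ToySymbol &S, bool ExternalsFound) {
  S.State = ExternalsFound ? ToyState::Ready : ToyState::Error;
}

// A query is satisfied once the symbol has reached the required state.
inline bool satisfies(const ToySymbol &S, ToyState Required) {
  if (S.State == ToyState::Error)
    return false;
  return S.State >= Required;
}
```

In this model a query that only requires Resolved is satisfied right after
step 1, before step 2 has had a chance to fail, which matches the behaviour
seen with "?Sampler@@YAXXZ". A query that requires Ready is not satisfied
until step 2 succeeds.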

Question 1.)
Is there any way to reset the error state of "?Sampler@@YAXXZ" at this
point?

No. However, the removable code feature will allow you to remove failed
materialization units once it lands in the mainline.

- After my first call I used the "define" function of the JITDylib to
define ?_Plansch_test@@3HA and then I tried calling the lookup function again
and again, however I only got the error: "Failed to materialize
symbols" even though "?_Plansch_test@@3HA" was defined now...
- Changing the order of the "define" and the "lookup" call
works of course, but I'm interested in the case where I don't know the
address yet.

The JIT doesn't re-try linking. Once a symbol has failed to link it remains
in the error state. In theory, once removable code is added you could choose to
remove and then re-add "?Sampler@@YAXXZ" after
"?_Plansch_test@@3HA" is defined. The real solution though is just to
make sure that "?_Plansch_test@@3HA" is defined (either directly or
via a generator) before you look up "?Sampler@@YAXXZ".

Out of curiosity I repeated the previous scenario - but added
"?_Plansch_test@@3HA" to the "lookupSet" which changed
things drastically. When executing "lookup" I now get the
"llvm::orc::SymbolsNotFound" error from my DefinitionGenerator...

Yes. Because "?_Plansch_test@@3HA" is not defined. You should see a
SymbolsNotFound error sent to your error reporter in the first scenario too,
followed by a failure-to-materialize error for "?Sampler@@YAXXZ".

... and "?Sampler@@YAXXZ" is stuck as a pending query in the
MaterializingInfos entries.

Huh. That sounds like a bug: All references to the query should be removed from
the state machine before it returns its result (in this case an error). I'll
see if I can reproduce this locally and fix it up, but it doesn't affect the
discussion here.

When I then add a definition for "?_Plansch_test@@3HA" and call
"lookup" the second time, it will succeed and give me the addresses.
Also I'm able to execute the code now. This is great! However...

When a lookup fails we try to restore the ExecutionSession state to what it was
prior to the query. This is why the sequence "lookup -> symbols not
found -> define -> lookup again" worked.

Question 2.)
Why did the first call to lookup not return the address of
"?Sampler@@YAXXZ" like in the first scenario? I expected it would
return an address for it.

A lookup must match against all symbols before anything is JIT'd. When it
failed to match "?_Plansch_test@@3HA" we immediately bailed out with
an error. There was no further attempt to compile "?Sampler@@YAXXZ".
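
The "match everything first, then JIT" rule can be sketched with a toy
lookup in plain C++. Nothing here is the ORC API: ToyLookupResult and
toyLookup are hypothetical names, and "materialized" just means "would have
been handed to the compiler" in this model.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model only: the outcome of a two-phase lookup.
struct ToyLookupResult {
  std::vector<std::string> Missing;      // names that matched nothing
  std::vector<std::string> Materialized; // names that would be compiled
};

inline ToyLookupResult toyLookup(const std::set<std::string> &Defined,
                                 const std::vector<std::string> &Wanted) {
  ToyLookupResult R;
  // Phase 1: match every requested name before doing any work.
  for (const auto &Name : Wanted)
    if (!Defined.count(Name))
      R.Missing.push_back(Name);
  // Any miss aborts the whole lookup; nothing gets materialized.
  if (!R.Missing.empty())
    return R;
  // Phase 2: only now is work started for the requested names.
  R.Materialized = Wanted;
  return R;
}
```

With only "?Sampler@@YAXXZ" defined, asking for both names reports
"?_Plansch_test@@3HA" as missing and materializes nothing, so no address
comes back for "?Sampler@@YAXXZ" either.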

Question 3.)
Can I somehow combine both behaviours? Getting the address for all the symbols
(like in scenario 1) while still being able to provide definitions later (like
in scenario 2)?

Sort of.

Definition generators allow you to provide a definition at the last minute (i.e.
in response to a query). The best mental model though is: "All definitions
that a generator can generate are part of the interface of the dylib". E.g.
if you use a DynamicLibrarySearchGenerator to mirror symbols from a dynamic
library containing "foo", "bar" and "baz" then you
should think of your JITDylib as containing definitions for "foo",
"bar" and "baz", even if the generator hasn't actually
added them to the JITDylib yet. The reason is that it will add them in response
to any query for them, so it's indistinguishable (except for timing and
debug logging) from the case where they're already present.
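
A toy version of that mental model, in plain C++ rather than the ORC API
(ToyDylib and its Generator callback are hypothetical names for this sketch):
the generator is consulted only when a name is not yet present, and whatever
it produces is added to the table, so a caller cannot tell the difference
from a definition that was there all along.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy model only: a symbol table with a last-minute definition generator.
class ToyDylib {
public:
  // Returns true and sets Addr if the generator can provide Name.
  using Generator =
      std::function<bool(const std::string &Name, std::uint64_t &Addr)>;

  explicit ToyDylib(Generator G) : Gen(std::move(G)) {}

  bool lookup(const std::string &Name, std::uint64_t &Addr) {
    auto It = Defs.find(Name);
    if (It == Defs.end()) {
      // Not defined yet: give the generator a chance to add a definition.
      std::uint64_t Generated = 0;
      if (!Gen || !Gen(Name, Generated))
        return false; // Really not found.
      It = Defs.emplace(Name, Generated).first;
    }
    Addr = It->second;
    return true;
  }

  std::size_t numDefined() const { return Defs.size(); }

private:
  std::map<std::string, std::uint64_t> Defs;
  Generator Gen;
};
```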

If you need to be able to defer adding a "real" definition beyond the
initial lookup then your only option (and this only applies to functions) is a
lazy-reexport. This allows you to provide a definition for a function while
deferring lookup until the first execution of the re-export at runtime. I
wouldn't generally use this to break dependencies though: You want a
definition of the real function body for "?_Plansch_test@@3HA" already
added to your JIT because (in general) you never know when JIT'd code will
need it.

My turn to ask a question: How is "?_Plansch_test@@3HA" created, and
why not just add it up-front? :)

-- Lang.

On Mon, Sep 28, 2020 at 4:57 AM Gaier, Bjoern via llvm-dev <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:
Hey everyone,

I felt this question is different from my other question - hope this is okay.

So - I was playing around with the lookup function of the ExecutionSession and
there are some things I don't understand.
I have a .BC file with a function "?Sampler@@YAXXZ" referencing a
value "?_Plansch_test@@3HA" that is not defined in that module itself.
I first planned on not providing an address for "?_Plansch_test@@3HA"
but wanted to know the address of "?Sampler@@YAXXZ". So I issued
something like that:

                auto &ES = this->jit->getExecutionSession();
                SymbolLookupSet lookupSet;

                lookupSet.add("?Sampler@@YAXXZ",
                              llvm::orc::SymbolLookupFlags::WeaklyReferencedSymbol);
                ES.lookup({{&jit->getMainJITDylib(),
                            llvm::orc::JITDylibLookupFlags::MatchAllSymbols}},
                          lookupSet, llvm::orc::LookupKind::Static,
                          llvm::orc::SymbolState::Resolved);

Even though the "tryToGenerate" function of my DefinitionGenerator
returned a "llvm::orc::SymbolsNotFound" for the
"?_Plansch_test@@3HA", I got an address for
"?Sampler@@YAXXZ". Dumping the "MainJITDylib" I saw, that
the "?Sampler@@YAXXZ" was in an Error state. Which made sense - I
guess.

Question 1.)
Is there any way to reset the error state of "?Sampler@@YAXXZ" at this
point?
- After my first call I used the "define" function of the JITDylib to
define ?_Plansch_test@@3HA and then I tried calling the lookup function again
and again, however I only got the error: "Failed to materialize
symbols" even though "?_Plansch_test@@3HA" was defined now...
- Changing the order of the "define" and the "lookup" call
works of course, but I'm interested in the case where I don't know the
address yet.

Out of curiosity I repeated the previous scenario - but added
"?_Plansch_test@@3HA" to the "lookupSet" which changed
things drastically. When executing "lookup" I now get the
"llvm::orc::SymbolsNotFound" error from my DefinitionGenerator and
"?Sampler@@YAXXZ" is stuck as a pending query in the
MaterializingInfos entries. When I then add a definition for
"?_Plansch_test@@3HA" and call "lookup" the second time, it
will succeed and give me the addresses. Also I'm able to execute the code
now. This is great! However...

Question 2.)
Why did the first call to lookup not return the address of
"?Sampler@@YAXXZ" like in the first scenario? I expected it would
return an address for it.

Question 3.)
Can I somehow combine both behaviours? Getting the address for all the symbols
(like in scenario 1) while still being able to provide definitions later (like
in scenario 2)?

Thank you in advance and kind greetings,
Björn
Als GmbH eingetragen im Handelsregister Bad Homburg v.d.H. HRB 9816, USt.ID-Nr.
DE 114 165 789 Geschäftsführer: Dr. Hiroshi Nakamura, Dr. Robert Plank, Markus
Bode, Heiko Lampert, Takashi Nagano, Junichi Tajika, Ergin Cansiz.
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20201002/3a166084/attachment-0001.html>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: LeviathanORC.cpp
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20201002/3a166084/attachment-0001.ksh>
