thr3ads.net - llvm dev - [llvm-dev] IRMover asserts "mapping to a source type" when repeatedly linking

If this information is useful, please help other people find it:
Share via:

Andres Freund via llvm-dev

2018-Mar-23 23:11 UTC

[llvm-dev] IRMover asserts "mapping to a source type" when repeatedly linking - usage or LLVM bug?

Hi,

(sorry if the CC's are inappropriate, they seemed relevant based on a
git log of IRMover.cpp)

I'm using LLVM to implement Just-in-Time compilation for PostgreSQL. One
part of that is doing inlining of operators. For that I'm using bitcode
pre-generated using clang.

The current code uses a single LLVMContext & Orc to generate the
code. That largely workes well.  But inlining presents a bit of a
problem.

(end of setup)

I've tried two approaches to inlining:

In the first I'm, lazily, caching the source modules, and CloneModule()
them before handing them to IRMover to inline necessary function into
the target module.  That works well in release builds, but occasionally
triggers an assert in debug mode ("mapping to a source type"). More on
the assert later.

The second approach is to *not* cache modules but re-read them from disk
(or memory, but that's irrelevant here). That works without any sort of
asserts, but "leaks" memory because everytime a module is re-read from
disk it re-creates types (cf BitcodeReader.cpp:parseTypeTableBody()).
Creating a different LLVMContext for every single expression used seems
infeasible, and fixing LLVM to have type lifetimes bound to Modules
seems daunting.

Therefore the first approach seems preferable by far.

What I'm observing is that creating the same Module contents (modulo
some constants) twice and then inlining a number of Modules
(selectively, but that seems unrelated), works the first time, but
crashes the second with the "mapping to a source type" assertion.

I'm not 100% sure yet, but my understanding of what happens is the
following:

1) "main1" module gets created
2) inlinining loads module "a", clones it, links module
"a'" into
   "main1". This module contains an opaque struct definition
"t" that is now
   referenced by "main1'".
3) inlinining loads module "b", clones it, links module
"b'" into
   "main1'". This module has a *non-opaque* definition of
"t". Thus "t",
   as referenced by "main1'", gets its type "t"
completed (via
   linkDefinedTypeBodies()).
4) "main2" module gets created
5) inlining clones module "a". As types are global to a context,
"t" now
   has a body, *and* the body appears to be from the source module
   (actually "b'"). "mapping to a source type" is the
result.

As far as I can tell, which doesn't say much, I don't know this code
well, the code actually handles this situation correctly. It's just the
assert that triggers that is problematic.

Am I doing something unsupported here or is this a bug?

As far as I can tell there unfortunately is no tool in LLVM that allows
to conveniently create a reproducer, unfortunately. I can write a
short-ish C++ program if that helps.

Greetings,

Andres Freund

Duncan P. N. Exon Smith via llvm-dev

2018-Mar-26 23:44 UTC

head link

[llvm-dev] IRMover asserts "mapping to a source type" when repeatedly linking - usage or LLVM bug?

> On Mar 23, 2018, at 16:11, Andres Freund via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> Hi,
> 
> (sorry if the CC's are inappropriate, they seemed relevant based on a
> git log of IRMover.cpp)
> 
> I'm using LLVM to implement Just-in-Time compilation for PostgreSQL.
One
> part of that is doing inlining of operators. For that I'm using bitcode
> pre-generated using clang.
> 
> The current code uses a single LLVMContext & Orc to generate the
> code. That largely workes well.  But inlining presents a bit of a
> problem.
> 
> (end of setup)
> 
> I've tried two approaches to inlining:
> 
> In the first I'm, lazily, caching the source modules, and CloneModule()
> them before handing them to IRMover to inline necessary function into
> the target module.  That works well in release builds, but occasionally
> triggers an assert in debug mode ("mapping to a source type").
More on
> the assert later.
> 
> The second approach is to *not* cache modules but re-read them from disk
> (or memory, but that's irrelevant here). That works without any sort of
> asserts, but "leaks" memory because everytime a module is re-read
from
> disk it re-creates types (cf BitcodeReader.cpp:parseTypeTableBody()).
> Creating a different LLVMContext for every single expression used seems
> infeasible, and fixing LLVM to have type lifetimes bound to Modules
> seems daunting.
The idea would be to fix LLVM to have LLVMContext lifetime bound to llvm::Module
(perhaps, reference-counted), but yes, it's a big project.

Can you GC the types in the LLVMContext?  IIRC, LLVMContext has a list of the
modules using it.  You could GC "every so often" to clean up unused
types (assuming they get discarded).  (It's possible we even have API for
that already.)
> Therefore the first approach seems preferable by far.
> 
> What I'm observing is that creating the same Module contents (modulo
> some constants) twice and then inlining a number of Modules
It's not exactly clear to me what you mean by "inlining a module".
Are you linking one module into another, and then inlining those functions?  Or
just linking?
> (selectively, but that seems unrelated), works the first time, but
> crashes the second with the "mapping to a source type" assertion.
> 
> I'm not 100% sure yet, but my understanding of what happens is the
> following:
> 
> 1) "main1" module gets created
> 2) inlinining loads module "a", clones it, links module
"a'" into
>   "main1". This module contains an opaque struct definition
"t" that is now
>   referenced by "main1'".
> 3) inlinining loads module "b", clones it, links module
"b'" into
>   "main1'". This module has a *non-opaque* definition of
"t". Thus "t",
>   as referenced by "main1'", gets its type "t"
completed (via
>   linkDefinedTypeBodies()).
> 4) "main2" module gets created
> 5) inlining clones module "a". As types are global to a context,
"t" now
>   has a body, *and* the body appears to be from the source module
>   (actually "b'"). "mapping to a source type" is
the result.
When exactly does "mapping to a source type" fire?
- When cloning?
- When linking the clone into another module?
If the latter, which module did you link the cloned "a" into?
> As far as I can tell, which doesn't say much, I don't know this
code
> well, the code actually handles this situation correctly. It's just the
> assert that triggers that is problematic.
The assertion seems valid to me for the usual case of linking one module into
another.
> Am I doing something unsupported here or is this a bug?
It seems strange to link module "X" into module "Y" more
than once, even transitively.  It sounds like you're doing that, and I
suppose that's likely what's causing the trouble.
> As far as I can tell there unfortunately is no tool in LLVM that allows
> to conveniently create a reproducer, unfortunately.
Contributions welcome :).
> I can write a
> short-ish C++ program if that helps.
> 
> Greetings,
> 
> Andres Freund
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Andres Freund via llvm-dev

2018-Mar-27 00:09 UTC

head link

[llvm-dev] IRMover asserts "mapping to a source type" when repeatedly linking - usage or LLVM bug?

Hi,

On 2018-03-26 16:44:05 -0700, Duncan P. N. Exon Smith
wrote:> > The second approach is to *not* cache modules but re-read them from
disk
> > (or memory, but that's irrelevant here). That works without any
sort of
> > asserts, but "leaks" memory because everytime a module is
re-read from
> > disk it re-creates types (cf BitcodeReader.cpp:parseTypeTableBody()).
> > Creating a different LLVMContext for every single expression used
seems
> > infeasible, and fixing LLVM to have type lifetimes bound to Modules
> > seems daunting.
> 
> The idea would be to fix LLVM to have LLVMContext lifetime bound to
> llvm::Module (perhaps, reference-counted), but yes, it's a big
> project.
Right. Sounds like there's a number of hairy problems around doing
so... It's currently trivial to have types around persistently that
exist in memory but aren't in any module, and that's useful. So it'd
have to be refcounting that allows for that, not just on a per-module
basis...

> Can you GC the types in the LLVMContext?
Not without modifying LLVM I think, afaict all the necessary
datastructures are hidden in LLVMContextImpl, which isn't exposed.

> IIRC, LLVMContext has a list of the modules using it.
It does.

> You could GC
> "every so often" to clean up unused types (assuming they get
> discarded).  (It's possible we even have API for that already.)
grepping around I only found additions or lookups in the relevant
datastructures.

> 
> > Therefore the first approach seems preferable by far.
> > 
> > What I'm observing is that creating the same Module contents
(modulo
> > some constants) twice and then inlining a number of Modules
> 
> It's not exactly clear to me what you mean by "inlining a
module".
> Are you linking one module into another, and then inlining those
> functions?  Or just linking?
Sorry for the imprecision. Yes, I'm using IRMover to link subsets of a
module into another one.  The inlining is then done by the normal
inlining passes (that works).

> > (selectively, but that seems unrelated), works the first time, but
> > crashes the second with the "mapping to a source type"
assertion.
> > 
> > I'm not 100% sure yet, but my understanding of what happens is the
> > following:
> > 
> > 1) "main1" module gets created
> > 2) inlinining loads module "a", clones it, links module
"a'" into
> >   "main1". This module contains an opaque struct definition
"t" that is now
> >   referenced by "main1'".
> > 3) inlinining loads module "b", clones it, links module
"b'" into
> >   "main1'". This module has a *non-opaque* definition of
"t". Thus "t",
> >   as referenced by "main1'", gets its type "t"
completed (via
> >   linkDefinedTypeBodies()).
> > 4) "main2" module gets created
> > 5) inlining clones module "a". As types are global to a
context, "t" now
> >   has a body, *and* the body appears to be from the source module
> >   (actually "b'"). "mapping to a source type"
is the result.
> 
> When exactly does "mapping to a source type" fire?
> - When cloning?
> - When linking the clone into another module?
> If the latter, which module did you link the cloned "a" into?
It fires when linking the clone. I'm trying to:

1) IRMover(main1).move(CloneModule(a), list-of-globals1)
2) IRMover(main1).move(CloneModule(b), list-of-globals2)
3) IRMover(main2).move(CloneModule(a), list-of-globals1)
4) IRMover(main2).move(CloneModule(b), list-of-globals2)

The assert fires in 3). The reason is that an opaque type in a (which is
the same as the type in CloneModule(a), because cloning doesn't recreate
types), got linked into main1 in 1). Then 2) completes that type,
because b has a definition of the struct. 3) then gets confused why the
source module knows the type.  At least that's my theory.

> > Am I doing something unsupported here or is this a bug?
> 
> It seems strange to link module "X" into module "Y"
more than once,
> even transitively.  It sounds like you're doing that, and I suppose
> that's likely what's causing the trouble.
Well, I'm not really doing that, am I? The only reason there's some
"persistent" behaviour is that types that are modified in a cloned
module are also modified in the source.

> > As far as I can tell there unfortunately is no tool in LLVM that
allows
> > to conveniently create a reproducer, unfortunately.
> 
> Contributions welcome :).
Happy to do so (I've 5 outstanding reviews.llvm.org submissions ;)). Not
sure if there's a good place to integrate that.  Might be easiest to
write a unittest for it?  Don't quite see how sensibly add to llvm-link
the capability to clone modules without being too tailored to this.

Greetings,

Andres Freund

Possibly Parallel Threads

Search for more seemingly similar threads

llvm dev - Mar 2018 - IRMover asserts "mapping to a source type" when repeatedly linking - usage or LLVM bug?

[llvm-dev] IRMover asserts "mapping to a source type" when repeatedly linking - usage or LLVM bug?

[llvm-dev] IRMover asserts "mapping to a source type" when repeatedly linking - usage or LLVM bug?

[llvm-dev] IRMover asserts "mapping to a source type" when repeatedly linking - usage or LLVM bug?

Possibly Parallel Threads