thr3ads.net - llvm dev - [llvm-dev] IRMover asserts "mapping to a source type" when repeatedly linking

If this information is useful, please help other people find it:
Share via:

Andres Freund via llvm-dev

2018-Mar-27 00:09 UTC

[llvm-dev] IRMover asserts "mapping to a source type" when repeatedly linking - usage or LLVM bug?

Hi,

On 2018-03-26 16:44:05 -0700, Duncan P. N. Exon Smith
wrote:> > The second approach is to *not* cache modules but re-read them from
disk
> > (or memory, but that's irrelevant here). That works without any
sort of
> > asserts, but "leaks" memory because everytime a module is
re-read from
> > disk it re-creates types (cf BitcodeReader.cpp:parseTypeTableBody()).
> > Creating a different LLVMContext for every single expression used
seems
> > infeasible, and fixing LLVM to have type lifetimes bound to Modules
> > seems daunting.
> 
> The idea would be to fix LLVM to have LLVMContext lifetime bound to
> llvm::Module (perhaps, reference-counted), but yes, it's a big
> project.
Right. Sounds like there's a number of hairy problems around doing
so... It's currently trivial to have types around persistently that
exist in memory but aren't in any module, and that's useful. So it'd
have to be refcounting that allows for that, not just on a per-module
basis...

> Can you GC the types in the LLVMContext?
Not without modifying LLVM I think, afaict all the necessary
datastructures are hidden in LLVMContextImpl, which isn't exposed.

> IIRC, LLVMContext has a list of the modules using it.
It does.

> You could GC
> "every so often" to clean up unused types (assuming they get
> discarded).  (It's possible we even have API for that already.)
grepping around I only found additions or lookups in the relevant
datastructures.

> 
> > Therefore the first approach seems preferable by far.
> > 
> > What I'm observing is that creating the same Module contents
(modulo
> > some constants) twice and then inlining a number of Modules
> 
> It's not exactly clear to me what you mean by "inlining a
module".
> Are you linking one module into another, and then inlining those
> functions?  Or just linking?
Sorry for the imprecision. Yes, I'm using IRMover to link subsets of a
module into another one.  The inlining is then done by the normal
inlining passes (that works).

> > (selectively, but that seems unrelated), works the first time, but
> > crashes the second with the "mapping to a source type"
assertion.
> > 
> > I'm not 100% sure yet, but my understanding of what happens is the
> > following:
> > 
> > 1) "main1" module gets created
> > 2) inlinining loads module "a", clones it, links module
"a'" into
> >   "main1". This module contains an opaque struct definition
"t" that is now
> >   referenced by "main1'".
> > 3) inlinining loads module "b", clones it, links module
"b'" into
> >   "main1'". This module has a *non-opaque* definition of
"t". Thus "t",
> >   as referenced by "main1'", gets its type "t"
completed (via
> >   linkDefinedTypeBodies()).
> > 4) "main2" module gets created
> > 5) inlining clones module "a". As types are global to a
context, "t" now
> >   has a body, *and* the body appears to be from the source module
> >   (actually "b'"). "mapping to a source type"
is the result.
> 
> When exactly does "mapping to a source type" fire?
> - When cloning?
> - When linking the clone into another module?
> If the latter, which module did you link the cloned "a" into?
It fires when linking the clone. I'm trying to:

1) IRMover(main1).move(CloneModule(a), list-of-globals1)
2) IRMover(main1).move(CloneModule(b), list-of-globals2)
3) IRMover(main2).move(CloneModule(a), list-of-globals1)
4) IRMover(main2).move(CloneModule(b), list-of-globals2)

The assert fires in 3). The reason is that an opaque type in a (which is
the same as the type in CloneModule(a), because cloning doesn't recreate
types), got linked into main1 in 1). Then 2) completes that type,
because b has a definition of the struct. 3) then gets confused why the
source module knows the type.  At least that's my theory.

> > Am I doing something unsupported here or is this a bug?
> 
> It seems strange to link module "X" into module "Y"
more than once,
> even transitively.  It sounds like you're doing that, and I suppose
> that's likely what's causing the trouble.
Well, I'm not really doing that, am I? The only reason there's some
"persistent" behaviour is that types that are modified in a cloned
module are also modified in the source.

> > As far as I can tell there unfortunately is no tool in LLVM that
allows
> > to conveniently create a reproducer, unfortunately.
> 
> Contributions welcome :).
Happy to do so (I've 5 outstanding reviews.llvm.org submissions ;)). Not
sure if there's a good place to integrate that.  Might be easiest to
write a unittest for it?  Don't quite see how sensibly add to llvm-link
the capability to clone modules without being too tailored to this.

Greetings,

Andres Freund

Duncan P. N. Exon Smith via llvm-dev

2018-Mar-27 00:46 UTC

head link

[llvm-dev] IRMover asserts "mapping to a source type" when repeatedly linking - usage or LLVM bug?

> On Mar 26, 2018, at 17:09, Andres Freund <andres at anarazel.de>
wrote:
> 
> Hi,
> 
> On 2018-03-26 16:44:05 -0700, Duncan P. N. Exon Smith wrote:
>>> The second approach is to *not* cache modules but re-read them from
disk
>>> (or memory, but that's irrelevant here). That works without any
sort of
>>> asserts, but "leaks" memory because everytime a module is
re-read from
>>> disk it re-creates types (cf
BitcodeReader.cpp:parseTypeTableBody()).
>>> Creating a different LLVMContext for every single expression used
seems
>>> infeasible, and fixing LLVM to have type lifetimes bound to Modules
>>> seems daunting.
>> 
>> The idea would be to fix LLVM to have LLVMContext lifetime bound to
>> llvm::Module (perhaps, reference-counted), but yes, it's a big
>> project.
> 
> Right. Sounds like there's a number of hairy problems around doing
> so... It's currently trivial to have types around persistently that
> exist in memory but aren't in any module, and that's useful. So
it'd
> have to be refcounting that allows for that, not just on a per-module
> basis...
> 
> 
>> Can you GC the types in the LLVMContext?
> 
> Not without modifying LLVM I think, afaict all the necessary
> datastructures are hidden in LLVMContextImpl, which isn't exposed.
If you want to make this approach work, I think you can add an API to
LLVMContext that garbage collects types, keeping only the union of those
explicitly passed in and the ones used by modules the LLVMContext knows about.
> 
>> 
>>> Therefore the first approach seems preferable by far.
>>> 
>>> What I'm observing is that creating the same Module contents
(modulo
>>> some constants) twice and then inlining a number of Modules
>> 
>> It's not exactly clear to me what you mean by "inlining a
module".
>> Are you linking one module into another, and then inlining those
>> functions?  Or just linking?
> 
> Sorry for the imprecision. Yes, I'm using IRMover to link subsets of a
> module into another one.  The inlining is then done by the normal
> inlining passes (that works).
> 
> 
>>> (selectively, but that seems unrelated), works the first time, but
>>> crashes the second with the "mapping to a source type"
assertion.
>>> 
>>> I'm not 100% sure yet, but my understanding of what happens is
the
>>> following:
>>> 
>>> 1) "main1" module gets created
>>> 2) inlinining loads module "a", clones it, links module
"a'" into
>>>  "main1". This module contains an opaque struct
definition "t" that is now
>>>  referenced by "main1'".
>>> 3) inlinining loads module "b", clones it, links module
"b'" into
>>>  "main1'". This module has a *non-opaque* definition
of "t". Thus "t",
>>>  as referenced by "main1'", gets its type
"t" completed (via
>>>  linkDefinedTypeBodies()).
>>> 4) "main2" module gets created
>>> 5) inlining clones module "a". As types are global to a
context, "t" now
>>>  has a body, *and* the body appears to be from the source module
>>>  (actually "b'"). "mapping to a source
type" is the result.
>> 
>> When exactly does "mapping to a source type" fire?
>> - When cloning?
>> - When linking the clone into another module?
>> If the latter, which module did you link the cloned "a" into?
> 
> It fires when linking the clone. I'm trying to:
> 
> 1) IRMover(main1).move(CloneModule(a), list-of-globals1)
> 2) IRMover(main1).move(CloneModule(b), list-of-globals2)
> 3) IRMover(main2).move(CloneModule(a), list-of-globals1)
> 4) IRMover(main2).move(CloneModule(b), list-of-globals2)
(Ah... now I'm following your example.)
> The assert fires in 3). The reason is that an opaque type in a (which is
> the same as the type in CloneModule(a), because cloning doesn't
recreate
> types), got linked into main1 in 1). Then 2) completes that type,
> because b has a definition of the struct. 3) then gets confused why the
> source module knows the type.  At least that's my theory.
This does look like it should work, but I'm not surprised it doesn't. 
The main use case for IRMover is LTO, where there's only one destination
module per LLVMContext (and a given source module would only be imported once
after being loaded from bitcode).

The assertion is protecting against a hypothetical case where part of the
mapping gets confused between types in the source and destination module.  I
agree that it's possible that the current code is correct for your case,
although some tests would have to demonstrate that... and it would be good to
keep a relaxed (possibly more expensive) assertion if possible.
>>> Am I doing something unsupported here or is this a bug?
>> 
>> It seems strange to link module "X" into module "Y"
more than once,
>> even transitively.  It sounds like you're doing that, and I suppose
>> that's likely what's causing the trouble.
> 
> Well, I'm not really doing that, am I? The only reason there's some
> "persistent" behaviour is that types that are modified in a
cloned
> module are also modified in the source.
Agreed; I had misunderstood the example before.
>>> As far as I can tell there unfortunately is no tool in LLVM that
allows
>>> to conveniently create a reproducer, unfortunately.
>> 
>> Contributions welcome :).
> 
> Happy to do so (I've 5 outstanding reviews.llvm.org submissions ;)).
Not
> sure if there's a good place to integrate that.  Might be easiest to
> write a unittest for it?  Don't quite see how sensibly add to llvm-link
> the capability to clone modules without being too tailored to this.
llvm-link sounds like the right place for this.  The weirdest thing about this
(for llvm-link) is that you have multiple destination modules but for JIT use
cases this may not be so odd.

Here's a possibility:
$ llvm-link -clone -manual \
	main1.ll main2.ll a.ll b.ll \
	-link main1:a \
	-link main1:b \
	-link main2:a \
	-link main2:b \
	-output-file main1:main1-linked.ll \
	-output-file main2:main2-linked.ll

Where:
	-manual
		means "don't link by default"
	-clone
		means "clone before linking"
	-link x,y
		means "link y into x"
		can be specified multiple times

Another option, instead of "-clone", would be to have both
"-link" and "-clone-into", and for your example you'd
use "-clone-into".
$ llvm-link -manual \
	main1.ll main2.ll a.ll b.ll \
	-clone-into main1:a \
	-clone-into main1:b \
	-clone-into main2:a \
	-clone-into main2:b \
	-output-file main1:main1-linked.ll \
	-output-file main2:main2-linked.ll

Andres Freund via llvm-dev

2018-Mar-27 00:58 UTC

head link

[llvm-dev] IRMover asserts "mapping to a source type" when repeatedly linking - usage or LLVM bug?

Hi,

On 2018-03-26 17:46:06 -0700, Duncan P. N. Exon Smith
wrote:> > On Mar 26, 2018, at 17:09, Andres Freund <andres at anarazel.de>
wrote:
> >> Can you GC the types in the LLVMContext?
> > 
> > Not without modifying LLVM I think, afaict all the necessary
> > datastructures are hidden in LLVMContextImpl, which isn't exposed.
> 
> If you want to make this approach work, I think you can add an API to
> LLVMContext that garbage collects types, keeping only the union of
> those explicitly passed in and the ones used by modules the
> LLVMContext knows about.
I prefer making the cloning situation work - it's a lot more efficient
to copy the portion of the module that gets inlined, rather than
re-loading the module from disk (even when lazy loading).  I can imagine
this being useful independent of my use-case however.

> > The assert fires in 3). The reason is that an opaque type in a (which
is
> > the same as the type in CloneModule(a), because cloning doesn't
recreate
> > types), got linked into main1 in 1). Then 2) completes that type,
> > because b has a definition of the struct. 3) then gets confused why
the
> > source module knows the type.  At least that's my theory.
> 
> This does look like it should work, but I'm not surprised it
doesn't.
> The main use case for IRMover is LTO, where there's only one
> destination module per LLVMContext (and a given source module would
> only be imported once after being loaded from bitcode).
Right. I was wondering about emulating that, but it'd add quite some
complications / slowdowns.

> The assertion is protecting against a hypothetical case where part of
> the mapping gets confused between types in the source and destination
> module.  I agree that it's possible that the current code is correct
> for your case, although some tests would have to demonstrate
> that... and it would be good to keep a relaxed (possibly more
> expensive) assertion if possible.
I'll try to come up with a minimal example. The regression test in
postgres that triggers this, when frobbing the cross-module heuristics
to be overly aggressive, hits the assert ~15 deep into a struct
equivalency check.  With that we hopefully can come up with a better
assertion.

> > Happy to do so (I've 5 outstanding reviews.llvm.org submissions
;)). Not
> > sure if there's a good place to integrate that.  Might be easiest
to
> > write a unittest for it?  Don't quite see how sensibly add to
llvm-link
> > the capability to clone modules without being too tailored to this.
> 
> llvm-link sounds like the right place for this.
Ok. 

> Here's a possibility:
> $ llvm-link -clone -manual \
> 	main1.ll main2.ll a.ll b.ll \
> 	-link main1:a \
> 	-link main1:b \
> 	-link main2:a \
> 	-link main2:b \
> 	-output-file main1:main1-linked.ll \
> 	-output-file main2:main2-linked.ll
> 
> Where:
> 	-manual
> 		means "don't link by default"
> 	-clone
> 		means "clone before linking"
> 	-link x,y
> 		means "link y into x"
> 		can be specified multiple times
> 
> Another option, instead of "-clone", would be to have both
"-link" and "-clone-into", and for your example you'd
use "-clone-into".
> $ llvm-link -manual \
> 	main1.ll main2.ll a.ll b.ll \
> 	-clone-into main1:a \
> 	-clone-into main1:b \
> 	-clone-into main2:a \
> 	-clone-into main2:b \
> 	-output-file main1:main1-linked.ll \
> 	-output-file main2:main2-linked.ll
I'll play around with it, shouldn't be too difficult.


Thanks!

Andres Freund

Seemingly Similar Threads

Search for more seemingly similar threads

llvm dev - Mar 2018 - IRMover asserts "mapping to a source type" when repeatedly linking - usage or LLVM bug?

[llvm-dev] IRMover asserts "mapping to a source type" when repeatedly linking - usage or LLVM bug?

[llvm-dev] IRMover asserts "mapping to a source type" when repeatedly linking - usage or LLVM bug?

[llvm-dev] IRMover asserts "mapping to a source type" when repeatedly linking - usage or LLVM bug?

Seemingly Similar Threads