> On Oct 21, 2014, at 12:29 AM, Sanjoy Das <sanjoy at
playingwithpointers.com> wrote:
>
>> I've never realy understood how the llvm.invariant intrinsics could
be
>> put into practice. There is the problem that "end" can occur
anywhere
>> as you suggested fixing with a flag.
>
> I was under this impression too, but after re-reading the language
> reference I'm not so sure -- it says about invariant.start: "This
> intrinsic indicates that
> until an llvm.invariant.end that uses the return value, the referenced
> memory location is constant and unchanging.". I think this implies
> that if the result of an `llvm.invariant.start` doesn't escape, we can
> safely conclude that there can be no matching `llvm.invariant.end`.
>
> I'm still a little hazy on all of the use cases you want to support,
> but one problem with llvm.safe_cast is, as you note, stores to the
> pointer passed to safe_cast will not be forwarded to loads from the
> pointer it returns. I think this issue can be solved by modeling
> `llvm.safe_cast` not as a transformation on a pointer, but as a check.
> Specifically, I imagine a readonly intrinsic `llvm.assert_immutable`
> that "asserts" that the pointer passed to it is immutable, and
unwinds
> if that fact isn't true (these are only imaginary semantics -- we know
> the intrinsic will never actually unwind). This means
> `llvm.assert_immutable` can't be marked nounwind (otherwise it would
> just get optimized away); but since it doesn't need to unwind to the
> same frame, a CallInst is sufficient (an InvokeInst would've increased
> the CFG's complexity).
>
> So
>
> %p = unaliased address
> store 42, %p
> %a = llvm.safe_cast %p ; just made this up
> %val = load %a !invariant
>
> becomes
>
> %p = unaliased address
> store 42, %p
> call void @llvm.assert_immutable(%p)
> %val = load %p !invariant
>
> AFAICT, "load %p" can not, in general, be re-ordered to before
the
> call to @llvm.assert_immutable because we don't know what condition it
> uses to decide if it should unwind. There are cases where I think
> such a re-ordering would be legal (an unused %val is one example,
> another example is where the only use of %val is a comparison with
> undef) but if I understand correctly, re-ordering a load with the call
> to @llvm.assert_immutable can only be a performance loss -- it will
> change dominance in a way that we'll "forget" that a load was
> immutable. And I think in most of the cases that are interesting to
> optimize, the re-ordering will not be legal.
>
> It is still legal to value forward the store though:
>
> %p = unaliased address
> store 42, %p
> call void @llvm.assert_immutable(%p)
> %val = 42
>
> because *if* assert_immutable does not unwind, %val will see the
> dominating store, because assert_immutable is readonly.
>
> Again,
>
> %p1 = ...
> %p2 = llvm.safe_cast %p1
> %a = select i1 %is_immutable, %p1, %p2
> load %a !invariant
>
> could become
>
> %p1 = ...
> if (not %is_immutable) {
> call llvm.assert_immutable(%p1)
> }
> load %p1
>
> or, if we wish to reduce control flow, we could define
> `llvm.assert_immutable` to be a no-op on null:
>
> %p1 = ...
> %a = select i1 %is_immutable, null, %p1,
> call llvm.assert_immutable(%a)
> load %p1 !invariant
Just so you know, I am not a big fan of adding cast nodes into the use-def
chain. It has to be a last resort. I proposed it this time because I don't
see another way to fully express the semantics, and in many cases the casts are
limited to the initialization code. I think the optimizer could be taught to
store->load forward across them with some effort.
AFAICT The only difference between assert_immutable and invariant.start is that
assert_immutable "maythrow", and we don't need to find all uses of
the pointer.
assert_immutable will handle your use case safely as long as you don't mark
it "readonly" (loads can move above readonly calls).
http://llvm.1065342.n5.nabble.com/Does-nounwind-have-semantics-td59631.html
The problems that we will face with assert_immutable are:
- Because it is maythrow and is not readonly, there is a dependence between
assert_immutable and *all* subsequent loads. This could seriously impact
optimization.
- Since the invariant load is not actually marked invariant, it can't be
reordered with other stores and nonreadonly calls.
- It does not allow us in the future to speculatively hoist certain invariant
loads above maythrow calls and branches.
So, I'm ok with assert_immutable as a solution to some subset of problems.
But it is incomplete and suboptimal. I suspect that introducing non-readonly
calls will do more harm than the benefit you get from marking some loads
invariant.
Also note that this could get interesting when we introduce TBAA on calls:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-October/066214.html
We would like to hoist loads above calls that have non-interfering TBAA tags,
but also may throw (patchpoints and stackmaps are marked maythrow).
Say we declare that loads can be hoisted above maythrow calls with TBAA. If TBAA
is dropped it becomes conservative. Then you could give your assert_immutable
calls TBAA that only interferes with the array length load. That would solve
some of your optimization issues, but only really works well if most of your
loads and calls have precise TBAA tags.
The more aggressive solution I was proposing was to mark loads that are guarded
with explicit safety checks as "maySpeculate". If we do that we
definitely need a way to create a data dependence on a cast - this approach is
incompatible with assert_immutable.
> `llvm.assert_immutable` does with control dependencies what
> `llvm.safe_cast` does with data dependencies.
>
>> llvm.invariant is inherently control dependent, which doesn't
really
>> allow the optimizer to hoist the invariant loads the way we want it
>> to.
>
> I'm curious about why you think so -- do you have examples that
> demonstrate this? And is this a flaw in how llvm implements llvm.invariant
> or something fundamental about control dependencies?
I am saying exactly what you said just above the quote: llvm.invariant and
llvm.assert_immutable guarantee safety by relying on implicit control
dependence. That doesn't really give us full freedom to optimize the guarded
invariant loads. They still can't be optimized across other non-readonly
calls or hoisted above branches or out of loops. This is fundamental to the
representation - it's not an implementation detail.
-Andy
>
> -- Sanjoy