thr3ads.net - llvm dev - [LLVMdev] RFC: implicit null checks in llvm [Apr 2015]


Sanjoy Das

2015-Apr-23 05:05 UTC

[LLVMdev] RFC: implicit null checks in llvm

Hi all,

I would like to propose a mechanism that would allow LLVM to fold null
pointer checks into "nearby" memory operations, subject to runtime
support.  This is related to but not exactly the same as a proposal
floated by Peter Collingbourne earlier [1].  The obvious use cases are
managed languages like Java, C# and Go that require a null check on
pointers before they're used in certain ways (loads, stores, virtual
dispatches etc.).  I'm sure there are other less obvious and more
interesting use cases.

I plan to break the design into two parts, roughly following the
statepoint philosophy:

# invokable @llvm.(load|store)_with_trap intrinsics

We introduce two new intrinsic families

  T @llvm.load_with_trap(T*)        [modulo name mangling]
  void @llvm.store_with_trap(T, T*) [modulo name mangling]

They cannot be `call`ed, they can only be `invoke`d.

Semantically, they try to load from or store to the pointer passed to
them as normal load / store instructions do.  @llvm.load_with_trap
returns the loaded value on the normal return path.  If the load or
store traps then they dispatch to their unwind destination.  The
landingpad for the unwind destination can only be a cleanup
landingpad, and the result of the landingpad instruction itself is
always undef.  The personality function in the landingpad instruction
is ignored.

These intrinsics require support from the language runtime to work.
During code generation, the invokes are lowered into normal load or
store instructions, followed by a branch (explicit `jmp` or just
fall-through) to the normal destination.  The PC for the unwind
destination is recorded in a side-table along with the PC for the load
or store.  When a load or store traps or segfaults at runtime, the
runtime searches this table to see if the trap is from a PC recorded
in the side-table.  If so, the runtime jumps to the unwind
destination, otherwise it aborts.
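For concreteness, here is a minimal sketch of the runtime-side lookup. It assumes the side-table is available as an array of (faulting PC, unwind PC) pairs sorted by faulting PC. The struct and function names are hypothetical and not part of any existing LLVM or runtime API, and a return value of 0 stands for "not found", on the assumption that no unwind destination lives at address 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical side-table entry: PC of the trapping load/store and the PC
   of the unwind destination the runtime should resume at. */
typedef struct {
    uintptr_t fault_pc;
    uintptr_t unwind_pc;
} ImplicitCheckEntry;

/* Binary search over entries sorted by fault_pc. Returns the unwind PC for
   an exact match, or 0 if the trapping PC is not in the table, in which case
   the runtime should treat the fault as a genuine crash and abort. */
uintptr_t lookup_unwind_pc(const ImplicitCheckEntry *table, size_t n,
                           uintptr_t trap_pc) {
    assert(table != NULL || n == 0);
    size_t lo = 0, hi = n; /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].fault_pc == trap_pc)
            return table[mid].unwind_pc;
        if (table[mid].fault_pc < trap_pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}
```

A real signal handler would additionally have to rewrite the PC in the saved machine context before returning, which is platform-specific and not shown.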

Note that the signal handler / runtime do not themselves raise
exceptions at the ABI level (though they do so conceptually), but the
landing pad block can raise one if it wants to.

The table mapping load/store PCs to unwind PCs can be reported to the
language runtime via an __llvm_stackmaps like section.  I am strongly
in favor of making this section as easy to parse as possible.
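As one illustration of an easy-to-parse encoding, the sketch below assumes a hypothetical flat layout: a little-endian 32-bit entry count followed by that many pairs of 64-bit (faulting PC, unwind PC) values, with no padding. This is not the actual __llvm_stackmaps format or anything LLVM emits today; it only shows how little code a runtime would need if the section were kept this simple.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout (NOT an existing LLVM format):
     uint32 num_entries
     num_entries x { uint64 fault_pc; uint64 unwind_pc; }
   All fields little-endian, no padding. */

static uint32_t read_u32le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t read_u64le(const uint8_t *p) {
    return (uint64_t)read_u32le(p) | ((uint64_t)read_u32le(p + 4) << 32);
}

/* Decodes the entries into fault_pcs/unwind_pcs. Returns the number of
   entries, or -1 if the buffer is truncated or holds more than max_out. */
long parse_null_check_section(const uint8_t *buf, size_t len,
                              uint64_t *fault_pcs, uint64_t *unwind_pcs,
                              size_t max_out) {
    assert(fault_pcs != NULL && unwind_pcs != NULL);
    if (len < 4)
        return -1;
    uint32_t n = read_u32le(buf);
    if (n > max_out)
        return -1;
    if ((len - 4) / 16 < n) /* overflow-free size check */
        return -1;
    const uint8_t *p = buf + 4;
    for (uint32_t i = 0; i < n; i++, p += 16) {
        fault_pcs[i] = read_u64le(p);
        unwind_pcs[i] = read_u64le(p + 8);
    }
    return (long)n;
}
```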



# optimization pass to create invokes to @llvm.(load|store)_with_trap

With the @llvm.(load|store)_with_trap intrinsics in place, we can
write an LLVM pass that folds null checks into nearby memory
operations on that same pointer.  As an example, we can turn

  // r0 is a local register
  if (p != null) {
    r0 += 5;
    *(p + 16) = 42;
    ...
  } else {
    throw_NullPointerException();
    unreachable;
  }

into

  // r0 is a local register
  r0 += 5;
  invoke @llvm.store_with_trap(42, p + 16) to label %ok, unwind label %not_ok

 not_ok:
  %unused = landingpad .....
  throw_NullPointerException();
  unreachable;

 ok:
  ...

A slight subtlety here is that the store to (p + 16) can trap (and we
can branch to not_ok) even if p is not null.  However, in that case
the store would have happened in the original program anyway, and the
behavior of the original program is undefined.

A prerequisite for this optimization is that in the address space we
operate on, loads from and stores to pointers in some small
neighborhood starting from address `null` trap deterministically.  The
above transform is correct only if we know that *(null + 16) will trap
synchronously and deterministically.  Even on platforms where the 0th
page is always unmapped, we cannot fold null checks into memory
operations on an arbitrary offset away from the pointer to be null
checked (e.g. array accesses are not suitable for implicit null
checks).
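The offset restriction can be stated as a simple range check. This is a minimal sketch with a hypothetical helper name, assuming a platform-specific guard size (for example, one 4 KiB page at address 0): when p is null, every byte of the access at p + offset must land inside the unmapped region. An array access with a variable index fails this test because its offset is not a compile-time constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can a null check on p be folded into an access of
   `size` bytes at p + offset? Only if, when p == NULL, every byte touched
   lies in [0, guard_size), the region assumed to be unmapped and to trap
   synchronously and deterministically. */
bool can_fold_null_check(int64_t offset, uint64_t size, uint64_t guard_size) {
    assert(size > 0);
    if (offset < 0)
        return false; /* null + negative offset wraps to the top of memory */
    uint64_t start = (uint64_t)offset;
    if (start >= guard_size)
        return false;
    return size <= guard_size - start; /* start + size <= guard_size, no overflow */
}
```

With guard_size = 4096, the store to *(p + 16) from the example above qualifies, while an access at offset 8192 does not.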

Running this pass sufficiently late in the optimization pipeline will
allow for all the usual memory related optimization passes to work as
is -- they won't have to learn about the special semantics for the new
(load|store)_with_trap intrinsics to be effective.

This pass will have to be a profile-guided optimization pass for
fundamental reasons: implicit null checks are a pessimization even if
a small fraction of the implicit null checks fail.  Typically language
runtimes that use page-fault based null checks recompile methods with
failed implicit null checks to use an explicit null check instead (e.g. [2]).


What do you think?  Does this make sense?

[1]: https://groups.google.com/d/msg/llvm-dev/mMQzIt_8z1Y/cnE7WH1HNaoJ
[2]: https://github.com/openjdk-mirror/jdk7u-hotspot/blob/master/src/share/vm/opto/lcm.cpp#L90

-- Sanjoy

Andrew Trick

2015-Apr-23 06:44 UTC


[LLVMdev] RFC: implicit null checks in llvm

> On Apr 22, 2015, at 10:05 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
> 
> Hi all,
> 
> I would like to propose a mechanism that would allow LLVM to fold null
> pointer checks into "nearby" memory operations, subject to runtime
> support.  This is related to but not exactly the same as a proposal
> floated by Peter Collingbourne earlier [1].  The obvious use cases are
> managed languages like Java, C# and Go that require a null check on
> pointers before they're used in certain ways (loads, stores, virtual
> dispatches etc.).  I'm sure there are other less obvious and more
> interesting use cases.
This feature will keep being requested. I agree LLVM should support it, and am
happy to see it being done right.
> I plan to break the design into two parts, roughly following the
> statepoint philosophy:
> 
> # invokable @llvm.(load|store)_with_trap intrinsics
> 
> We introduce two new intrinsic families
> 
>  T @llvm.load_with_trap(T*)        [modulo name mangling]
>  void @llvm.store_with_trap(T, T*) [modulo name mangling]
> 
> They cannot be `call`ed, they can only be `invoke`d.
> 
> Semantically, they try to load from or store to the pointer passed to
> them as normal load / store instructions do.  @llvm.load_with_trap
> returns the loaded value on the normal return path.  If the load or
> store traps then they dispatch to their unwind destination.  The
> landingpad for the unwind destination can only be a cleanup
> landingpad, and the result of the landingpad instruction itself is
> always undef.  The personality function in the landingpad instruction
> is ignored.
> 
> These intrinsics require support from the language runtime to work.
> During code generation, the invokes are lowered into normal load or
> store instructions, followed by a branch (explicit `jmp` or just
> fall-through) to the normal destination.  The PC for the unwind
> destination is recorded in a side-table along with the PC for the load
> or store.  When a load or store traps or segfaults at runtime, the
> runtime searches this table to see if the trap is from a PC recorded
> in the side-table.  If so, the runtime jumps to the unwind
> destination, otherwise it aborts.
The intrinsics need to be lowered to a pseudo instruction just like patchpoint
(so that a stackmap can be emitted). In my mind the real issue here is how to
teach this pseudo instruction to emit the proper load/store for the target.
> Note that the signal handler / runtime do not themselves raise
> exceptions at the ABI level (though they do so conceptually), but the
> landing pad block can raise one if it wants to.
> 
> The table mapping load/store PCs to unwind PCs can be reported to the
> language runtime via an __llvm_stackmaps like section.  I am strongly
> in favor of making this section as easy to parse as possible.
Let’s just be clear that it is not recommended for the frontend to produce these
intrinsics. They are a compiler backend convenience. (I don’t want InstCombine
or any other standard pass to start trafficking in these.)
> # optimization pass to create invokes to @llvm.(load|store)_with_trap
> 
> With the @llvm.(load|store)_with_trap intrinsics in place, we can
> write an LLVM pass that folds null checks into nearby memory
> operations on that same pointer.  As an example, we can turn
> 
>  // r0 is a local register
>  if (p != null) {
>    r0 += 5;
>    *(p + 16) = 42;
>    ...
>  } else {
>    throw_NullPointerException();
>    unreachable;
>  }
> 
> into
> 
>  // r0 is a local register
>  r0 += 5;
>  invoke @llvm.store_with_trap(42, p + 16) to label %ok, unwind label %not_ok
> 
> not_ok:
>  %unused = landingpad .....
>  throw_NullPointerException();
>  unreachable;
> 
> ok:
>  ...
> 
> A slight subtlety here is that the store to (p + 16) can trap (and we
> can branch to not_ok) even if p is not null.  However, in that case
> the store would have happened in the original program anyway, and the
> behavior of the original program is undefined.
> 
> A prerequisite for this optimization is that in the address space we
> operate on, loads from and stores to pointers in some small
> neighborhood starting from address `null` trap deterministically.  The
> above transform is correct only if we know that *(null + 16) will trap
> synchronously and deterministically.  Even on platforms where the 0th
> page is always unmapped, we cannot fold null checks into memory
> operations on an arbitrary offset away from the pointer to be null
> checked (e.g. array accesses are not suitable for implicit null
> checks).
This is a platform-dependent intrinsic. There’s nothing wrong with a
platform-specific size for the unmapped page zero if we don’t already have one.
> Running this pass sufficiently late in the optimization pipeline will
> allow for all the usual memory related optimization passes to work as
> is -- they won't have to learn about the special semantics for the new
> (load|store)_with_trap intrinsics to be effective.
Good. This is a codegen feature. We can’t say that enough. If you really cared
about the best codegen this would be done in machine IR after scheduling and
target LoadStore opts.
> This pass will have to be a profile-guided optimization pass for
> fundamental reasons: implicit null checks are a pessimization even if
> a small fraction of the implicit null checks fail.  Typically language
> runtimes that use page-fault based null checks recompile methods with
> failed implicit null checks to use an explicit null check instead (e.g. [2]).
I don’t think making it profile-guided is important. Program behavior can change
after compilation and you’re back to the same problem. I think recovering from
repeated traps is important. That’s why you need to combine this feature with
either code invalidation points or patching implemented via llvm.stackmap,
patchpoint, (or statepoint) — they’re all the same thing.
> What do you think?  Does this make sense?
Well, you need the features that patchpoint gives you (stackmaps entries) and
you’ll need to use patchpoints or stackmaps anyway for invalidation or patching.
So why are you bothering with a totally new, independent intrinsic? Why not just
extend the existing intrinsics? We could have a variant that

- emits a load instead of a call

- looks at the landing pad to generate a special stackmap entry in addition to
the normal exception table (I don’t even see why you need this, except that the
runtime doesn’t know how to parse an exception table.)

Andy
> [1]: https://groups.google.com/d/msg/llvm-dev/mMQzIt_8z1Y/cnE7WH1HNaoJ
> [2]: https://github.com/openjdk-mirror/jdk7u-hotspot/blob/master/src/share/vm/opto/lcm.cpp#L90
> 
> -- Sanjoy

Reid Kleckner

2015-Apr-23 17:18 UTC


[LLVMdev] RFC: implicit null checks in llvm

On Wed, Apr 22, 2015 at 11:44 PM, Andrew Trick <atrick at apple.com> wrote:
>
> This feature will keep being requested. I agree LLVM should support it,
> and am happy to see it being done right.

+1

> > I plan to break the design into two parts, roughly following the
> > statepoint philosophy:
> >
> > # invokable @llvm.(load|store)_with_trap intrinsics
> >
> > We introduce two new intrinsic families
> >
> >  T @llvm.load_with_trap(T*)        [modulo name mangling]
> >  void @llvm.store_with_trap(T, T*) [modulo name mangling]
> >
> > They cannot be `call`ed, they can only be `invoke`d.
>
Why not allow non-nounwind calls, in other words, an intrinsic call that
may throw?

In most languages with implicit null checks, there are far more functions
that do field accesses and method calls than there are functions that catch
exceptions. The common case is that the frame with the load will have
nothing to do other than propagate the exception to the parent frame, and
we should allow the runtime to handle that efficiently.

Essentially, in this model, the signal handler is responsible for
identifying the signal as a null pointer exception (i.e. SIGSEGVs on a
small pointer value with a PC in code known to use this EH personality) and
transitioning to the exception handling machinery in the language runtime.
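A sketch of the classification step described above, in C. It assumes the handler has already extracted the faulting address (on POSIX systems, from si_addr in the siginfo_t passed to an SA_SIGINFO handler) and the faulting PC (from the platform-specific ucontext), and that the runtime keeps a hypothetical list of code ranges compiled with this personality. Only the decision is shown, not handler installation or the transfer into the EH machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a code region compiled with implicit null
   checks (e.g. registered by a JIT or read from an object file). */
typedef struct {
    uintptr_t code_start; /* inclusive */
    uintptr_t code_end;   /* exclusive */
} CodeRange;

/* Should a SIGSEGV at `pc` on address `fault_addr` be treated as a null
   pointer exception rather than a crash? The faulting address must be small
   (inside the unmapped guard region) and the PC must lie in code known to
   use implicit null checks. */
bool is_implicit_null_fault(uintptr_t fault_addr, uintptr_t pc,
                            const CodeRange *ranges, size_t n_ranges,
                            uintptr_t guard_size) {
    assert(ranges != NULL || n_ranges == 0);
    if (fault_addr >= guard_size)
        return false;
    for (size_t i = 0; i < n_ranges; i++)
        if (pc >= ranges[i].code_start && pc < ranges[i].code_end)
            return true;
    return false;
}
```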

> > Semantically, they try to load from or store to the pointer passed to
> > them as normal load / store instructions do.  @llvm.load_with_trap
> > returns the loaded value on the normal return path.  If the load or
> > store traps then they dispatch to their unwind destination.  The
> > landingpad for the unwind destination can only be a cleanup
> > landingpad, and the result of the landingpad instruction itself is
> > always undef.  The personality function in the landingpad instruction
> > is ignored.
>
The landingpad personality normally controls what kind of EH tables are
emitted, so if you want something other than the __gxx_personality_v0 LSDA
table, you could invent your own personality and use that to control what
gets emitted. This might be useful for interoperating with existing
language runtimes.

> > These intrinsics require support from the language runtime to work.
> > During code generation, the invokes are lowered into normal load or
> > store instructions, followed by a branch (explicit `jmp` or just
> > fall-through) to the normal destination.  The PC for the unwind
> > destination is recorded in a side-table along with the PC for the load
> > or store.  When a load or store traps or segfaults at runtime, the
> > runtime searches this table to see if the trap is from a PC recorded
> > in the side-table.  If so, the runtime jumps to the unwind
> > destination, otherwise it aborts.
>
> The intrinsics need to be lowered to a pseudo instruction just like
> patchpoint (so that a stackmap can be emitted). In my mind the real issue
> here is how to teach this pseudo instruction to emit the proper
> load/store for the target.

Does it really have to be a per-target pseudo? The way I see it, we can
handle this all in selection dag. All we need to do is emit the before
label, the load/store operation, and the end label, and establish control
dependence between them all to prevent folding. Does that seem reasonable,
or is this an overly simplistic perspective? :-)

> > Note that the signal handler / runtime do not themselves raise
> > exceptions at the ABI level (though they do so conceptually), but the
> > landing pad block can raise one if it wants to.
> >
> > The table mapping load/store PCs to unwind PCs can be reported to the
> > language runtime via an __llvm_stackmaps like section.  I am strongly
> > in favor of making this section as easy to parse as possible.
>
> Let’s just be clear that it is not recommended for the frontend to produce
> these intrinsics. They are a compiler backend convenience. (I don’t want
> InstCombine or any other standard pass to start trafficking in these.)

Would you be OK with simply documenting that these intrinsics are
optimization-hostile, in the same way that early safepoint insertion is?
There are some language constructs (__try / __except) that allow catching
memory faults like this. Such constructs are rare and don't really need to
be optimized. I just want to make sure that mid-level optimizations don't
actively break these.

> > Running this pass sufficiently late in the optimization pipeline will
> > allow for all the usual memory related optimization passes to work as
> > is -- they won't have to learn about the special semantics for the new
> > (load|store)_with_trap intrinsics to be effective.
>
> Good. This is a codegen feature. We can’t say that enough. If you really
> cared about the best codegen this would be done in machine IR after
> scheduling and target LoadStore opts.

I agree, with the caveat above. Mid-level passes shouldn't actively break
these intrinsics.

> > This pass will have to be a profile-guided optimization pass for
> > fundamental reasons: implicit null checks are a pessimization even if
> > a small fraction of the implicit null checks fail.  Typically language
> > runtimes that use page-fault based null checks recompile methods with
> > failed implicit null checks to use an explicit null check instead
> > (e.g. [2]).
>
> I don’t think making it profile-guided is important. Program behavior can
> change after compilation and you’re back to the same problem. I think
> recovering from repeated traps is important. That’s why you need to combine
> this feature with either code invalidation points or patching implemented
> via llvm.stackmap, patchpoint, (or statepoint) — they’re all the same
> thing.
>
> > What do you think?  Does this make sense?
>
> Well, you need the features that patchpoint gives you (stackmaps entries)
> and you’ll need to use patchpoints or stackmaps anyway for invalidation or
> patching. So why are you bothering with a totally new, independent
> intrinsic? Why not just extend the existing intrinsics? We could have a
> variant that
>
> - emits a load instead of a call
>
> - looks at the landing pad to generate a special stackmap entry in
> addition to the normal exception table (I don’t even see why you need this,
> except that the runtime doesn’t know how to parse an exception table.)

Sanjoy Das

2015-Apr-23 20:29 UTC


[LLVMdev] RFC: implicit null checks in llvm

> The intrinsics need to be lowered to a pseudo instruction just like
> patchpoint (so that a stackmap can be emitted). In my mind the real issue here
> is how to teach this pseudo instruction to emit the proper load/store for the
> target.
Do you have a specific set of issues that you think could be problematic?
> Let’s just be clear that it is not recommended for the frontend to produce
> these intrinsics. They are a compiler backend convenience. (I don’t want
> InstCombine or any other standard pass to start trafficking in these.)
Agreed.
> This is a platform dependent intrinsic. There’s nothing wrong with a
> platform specific size for the unmapped page zero if we don’t already have one.
Agreed.
> Good. This is a codegen feature. We can’t say that enough. If you really
> cared about the best codegen this would be done in machine IR after scheduling
> and target LoadStore opts.
I thought about doing this very late, but I figured that pattern
matching on control flow will be much easier while we are still in
LLVM IR.
> I don’t think making it profile-guided is important. Program behavior can
> change after compilation and you’re back to the same problem. I think recovering
> from repeated traps is important. That’s why you need to combine this feature
> with either code invalidation points or patching implemented via llvm.stackmap,
> patchpoint, (or statepoint) -- they’re all the same thing.
Without profiling information, how will the frontend communicate to
the backend that an explicit null check "never" fails and hence may be
replaced with an implicit null check (or that it has been observed to
fail, and should not be replaced with an implicit null check)?  In the
scheme I'm thinking of, a null check is made implicit only if the (branch)
probability of it failing is approximately zero.
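The criterion can be written down directly against branch profile counts. In this sketch, should_make_implicit and the max_fail_ratio threshold are hypothetical stand-ins for "approximately zero"; keeping the explicit check when there is no profile data at all is also an assumption, chosen to be conservative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy: make an explicit null check implicit only if the
   profile says the null (failing) edge was taken with observed frequency at
   most max_fail_ratio. With no profile data we keep the explicit check. */
bool should_make_implicit(uint64_t null_edge_count, uint64_t nonnull_edge_count,
                          double max_fail_ratio) {
    assert(max_fail_ratio >= 0.0 && max_fail_ratio <= 1.0);
    if (null_edge_count == 0 && nonnull_edge_count == 0)
        return false; /* no profile: stay explicit */
    /* Sum in double to avoid uint64_t overflow on huge counts. */
    double total = (double)null_edge_count + (double)nonnull_edge_count;
    return (double)null_edge_count <= max_fail_ratio * total;
}
```

With max_fail_ratio = 0, a single observed null keeps the check explicit.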
> Well, you need the features that patchpoint gives you (stackmaps entries)
> and you’ll need to use patchpoints or stackmaps anyway for invalidation or
> patching. So why are you bothering with a totally new, independent intrinsic?
> Why not just extend the existing intrinsics? We could have a variant that
>
> - emits a load instead of a call
>
> - looks at the landing pad to generate a special stackmap entry in addition
> to the normal exception table (I don’t even see why you need this, except that
> the runtime doesn’t know how to parse an exception table.)
This sounds good -- reusing the patchpoint intrinsics will save me a
lot of work.

-- Sanjoy

Joseph Tremoulet

2015-Apr-29 00:19 UTC


[LLVMdev] RFC: implicit null checks in llvm

Hi,

I'd like to make sure this is headed in a direction that we'll be able
to use/extend it for LLILC/CoreCLR (C#).  A few questions:

1) We've got a runtime that will generate an ABI-level exception object in
response to the machine trap.  I'd like the compiler to be able to support
targeting a runtime with that behavior (if nothing else, it saves us the code
size of a call instruction per [group of] never-dynamically-failing null
check[s]).  So, given an explicit check that looks like this:

        %NullCheck = icmp eq %Ty* %2, null
        br i1 %NullCheck, label %ThrowNullRef, label %3

      ; <label>:3                                       ; preds = %1
        %4 = <index into %2 by constant offset smaller than guard page>
        %5 = load i32, i32* %4, align 8
        ... (code that uses %5)

      ThrowNullRef:                                     ; preds = %1
        invoke void ThrowNullReferenceException() #2
                to label %6 unwind label %ExceptionDispatch

      ; <label>:6                                       ; preds = %ThrowNullRef
        unreachable

      ExceptionDispatch:                                ; preds = %ThrowNullRef
        %ExnData = landingpad { i8*, i32 } personality void ()* @ProcessCLRException
      ...


I understand the proposal here is to add a pass that changes this to:

        %5 = invoke  @llvm.load_with_trap(...) %2, <index>
              to label %ok, unwind label %not_ok
      ok:
        ... (code that uses %5)

      not_ok:
        %unused = landingpad { i8*, i32 } personality void ()* @__llvm_implicit_null_check cleanup
        br label %ThrowNullRef

      ThrowNullRef:                                     ; preds = %1
        invoke void ThrowNullReferenceException() #2
                to label %6 unwind label %ExceptionDispatch

      ; <label>:6                                       ; preds = %ThrowNullRef
        unreachable

      ExceptionDispatch:                                ; preds = %ThrowNullRef
        %ExnData = landingpad { i8*, i32 } personality void ()* @ProcessCLRException
      ...

But what I eventually need is to also eliminate the call:

        %5 = invoke  @llvm.load_with_trap(...) %2, <index>
              to label %ok, unwind label %ExceptionDispatch
      ok:
        ... (code that uses %5)

      ExceptionDispatch:                                ; preds = %ThrowNullRef
        %ExnData = landingpad { i8*, i32 } personality void ()* @ProcessCLRException
      ...

And I'm trying to understand how I might achieve that.  Allowing a target
configuration to direct the null check folding to also fold away the call seems
most straightforward; is that the right idea?  Or would it need to be something
more like a separate pass that can be run for such targets immediately after
this pass, which does the extra folding?


Related, regarding these restrictions:

      > The landingpad for the unwind destination can only be a cleanup
      > landingpad, and the result of the landingpad instruction itself is always undef.

will they be enforced by the verifier?  Could we perhaps make that conditional
on the personality routine?

 

2) Is the signature

        T @llvm.load_with_trap(T*)        [modulo name mangling]
        void @llvm.store_with_trap(T, T*) [modulo name mangling]

meant to imply that the pointer must be in address space zero, or that any
address space is acceptable?  In our case, since we're planning to follow
the statepoint design and use address spaces to distinguish GC pointers, the
pointers whose null checks we'll be wanting to fold will most often not be
in address space zero.


3) I didn't see a follow-up to this point:

      > If you really cared about the best codegen this would be done in
      > machine IR after scheduling and target LoadStore opts.

Can you elaborate on whether/why this is/isn't the plan?  I'd hate for
the use of implicit checks to come at the cost of worse load/store codegen,
especially since we'll have null checks on most heap loads/stores.


Thanks
-Joseph



-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
Behalf Of Sanjoy Das
Sent: Wednesday, April 22, 2015 10:05 PM
To: LLVM Developers Mailing List; Andrew Trick; Reid Kleckner; David Majnemer;
Peter Collingbourne; Hal Finkel; Philip Reames; Russell Hadley
Subject: [LLVMdev] RFC: implicit null checks in llvm


Sanjoy Das

2015-Apr-29 04:27 UTC


[LLVMdev] RFC: implicit null checks in llvm

> I'd like to make sure this is headed in a direction that we'll be
> able to use/extend it for LLILC/CoreCLR (C#).
Great!
> And I'm trying to understand how I might achieve that.  Allowing a
> target configuration to direct the null check folding to also fold away the call
> seems most straightforward; is that the right idea?  Or would it need to be
> something more like a separate pass that can be run for such targets immediately
> after this pass, which does the extra folding?
I'd say it depends on the semantics of your runtime.  If this is
strictly an optimization then I'd have a preference for the latter.
If this is a correctness issue then I think the former makes more
sense, since otherwise the GenerateImplicitNullChecks (or
whatever we call it) pass won't be meaning preserving.
> Related, regarding these restrictions:
>
>       > The landingpad for the unwind destination can only be a cleanup
>       > landingpad, and the result of the landingpad instruction itself is always undef.
>
> will they be enforced by the verifier?  Could we perhaps make that
> conditional on the personality routine?
Yes, we can make these dependent on the personality routine.
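A minimal sketch of what a personality-dependent rule could look like, modeled in plain C rather than against the real LLVM Verifier. It assumes the dedicated personality is named __llvm_implicit_null_check (the name used in Joseph's example above) and enforces the cleanup-only restriction only for landing pads using that personality; any other personality, such as ProcessCLRException, is left unconstrained. The LandingPadInfo struct and the function name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical summary of the landing pad at the unwind destination of a
   *_with_trap invoke. */
typedef struct {
    const char *personality;    /* name of the personality function */
    bool is_cleanup;            /* does the landingpad carry `cleanup`? */
    unsigned num_catch_clauses; /* catch/filter clauses */
} LandingPadInfo;

static const char *const kImplicitNullCheckPersonality =
    "__llvm_implicit_null_check";

/* Returns true if the landing pad is acceptable. The cleanup-only rule is
   enforced only for the dedicated personality. */
bool verify_trap_landingpad(const LandingPadInfo *lp) {
    assert(lp != NULL && lp->personality != NULL);
    if (strcmp(lp->personality, kImplicitNullCheckPersonality) != 0)
        return true; /* other personalities: not constrained by this rule */
    return lp->is_cleanup && lp->num_catch_clauses == 0;
}
```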
> 2) Is the signature
>
>         T @llvm.load_with_trap(T*)        [modulo name mangling]
>         void @llvm.store_with_trap(T, T*) [modulo name mangling]
>
> meant to imply that the pointer must be in address space zero, or that any
> address space is acceptable?
We use statepoints also, so non-zero address space pointers will have to work.
> 3) I didn't see a follow-up to this point:
>
>       > If you really cared about the best codegen this would be done in
>       > machine IR after scheduling and target LoadStore opts.
>
> Can you elaborate on whether/why this is/isn't the plan?  I'd hate
> for the use of implicit checks to come at the cost of worse load/store codegen,
> especially since we'll have null checks on most heap loads/stores.
So there are two related issues here:

 1. it may just be too difficult to do this correctly once we're out
    of LLVM IR.  For instance, we may need to call on alias analysis:

      if (x != null) {
        x.f = 42;
        y.f = 100;
      }

    can be transformed, by LLVM's normal optimization pipeline, to

      if (x != null) {
        y.f = 100;
        x.f = 42;
      }

    if it can prove that x does not alias y.

    Now we cannot use the store to x.f to throw an NPE
    (NullPointerException) because then we'd have thrown an NPE after
    the store to y.f while the original program had the NPE throw
    before the store to y.f.  To implicit-ify this null check we'd
    have to swap the order of the two stores and essentially reverse
    the above transform.  To do this safely, we'd need alias analysis
    to prove that x does not alias y.

    Re-discovering if pointer X is pointer Y with an offset of 24
    bytes may also be difficult.  This is important since in many cases
    the load doing the null check will be off a pointer derived from
    the pointer we want to check for nullness.

 2. we may be able to get away with representing the trapping load in
    a way that looks like a normal load to the backend.  I'm not sure
    if we can do this safely (the backend cannot assume that the load
    won't trap) but in principle this should allow most of the
    load/store optimizations to kick in.
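Both points in item 1 can be illustrated with a toy model of IR-level matching. This is a hedged sketch, not the planned pass: the instruction kinds, the PtrDef table and find_fold_candidate are all hypothetical. It walks constant-offset pointer derivations back to their root (so "x + 24" is recognized as a use of x) and folds the null check only into the first instruction with a memory effect, giving up wherever reordering, and hence alias analysis, would be needed. Access size is ignored for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy IR for illustration only. Pointer values are small integer ids;
   defs[id] says whether pointer `id` was computed as parent + constant
   offset (parent >= 0) or is a root value (parent < 0). Derivations are
   assumed acyclic, as in SSA. */
typedef struct { int parent; int64_t offset; } PtrDef;

typedef enum { INST_PURE, INST_LOAD, INST_STORE, INST_SIDE_EFFECT } InstKind;
typedef struct { InstKind kind; int ptr; } Inst; /* ptr used by loads/stores */

/* Walk constant-offset derivations back to the root pointer. */
static int strip_offsets(const PtrDef *defs, int id, int64_t *total) {
    *total = 0;
    while (defs[id].parent >= 0) {
        *total += defs[id].offset;
        id = defs[id].parent;
    }
    return id;
}

/* Returns the index of the instruction the null check on `checked` can be
   folded into, or -1. Conservatively, it must be the FIRST instruction with
   any memory effect: otherwise an observable effect (like the store to y.f
   in the reordered example) would happen before the NPE. */
int find_fold_candidate(const Inst *insts, size_t n, const PtrDef *defs,
                        int checked, uint64_t guard_size) {
    assert(insts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (insts[i].kind == INST_PURE)
            continue; /* e.g. r0 += 5: harmless to execute early */
        if (insts[i].kind == INST_SIDE_EFFECT)
            return -1;
        int64_t off;
        int root = strip_offsets(defs, insts[i].ptr, &off);
        if (root == checked && off >= 0 && (uint64_t)off < guard_size)
            return (int)i;
        return -1; /* first access is elsewhere: reordering would need AA */
    }
    return -1;
}
```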

-- Sanjoy
