thr3ads.net - llvm dev - [llvm-dev] RFC: Strong GC References in LLVM [Jul 2016]

If this information is useful, please help other people find it:
Share via:

Sanjoy Das via llvm-dev

2016-Jul-14 22:40 UTC

[llvm-dev] RFC: Strong GC References in LLVM

Hi Andy,

Andrew Trick wrote:
 >> But for the purposes of this discussion, only the legality (or lack
 >> thereof) of the above transform matters, not whether it is profitable
 >> or not.
 >
 > Given that you will need to disable the transform for GCRefs, it’s 
interesting that if it’s only something that needs to run before ISEL 
then you’re not actually losing any optimization.

Ah, okay; that makes sense.

 >>> If you can come up with an IR design for tracking your GCRef load
 >>> depedencies (usually on specific null and type checks), then that
could
 >> That sounds orthogonal to this proposal -- as a starting point we need
 >> a simple way to mark a GCRef load as control dependent on every
 >> condition executed since the start of the program.  Something like
 >> cast-to-dereferenceable can possibly be added once we start
 >> upstreaming analyses that prove that a GCRef load is safe to
 >> speculate, but that's for better performance not correctness.
 >
 > I thought your frontend would know which conditions guard each field 
access.

At a theoretical level it does, but in practice that kind of control
dependence can be difficult to split apart from "normal" control
dependence.

For instance, if we have:

   interface I {
     void func();
   }

   class F {
     Object o;
     void func() {
       Object val = this.o;
     }
   }

   class G {
     public static void f(I instance) {
       instance.func();
     }
   }

then when compiling func() separately there is nothing the load of val
is control dependent on, but in G::f if we do some form of predicated
devirtualization and then inline the body of F::func then the safety
of the load is predicated on the devirtualization predicate passing
too.  If the predicated devirt. was itself conditional on some control
flow based type refinement, then the safety of the load is control
dependent on that control flow as well, and so on.

[snip]

-- Sanjoy

Andrew Trick via llvm-dev

2016-Jul-14 23:03 UTC

head link

[llvm-dev] RFC: Strong GC References in LLVM

> On Jul 14, 2016, at 3:40 PM, Sanjoy Das <sanjoy at
playingwithpointers.com> wrote:
> 
>  interface I {
>    void func();
>  }
> 
>  class F {
>    Object o;
>    void func() {
>      Object val = this.o;
>    }
>  }
> 
>  class G {
>    public static void f(I instance) {
>      instance.func();
>    }
>  }
> 
> then when compiling func() separately there is nothing the load of val
> is control dependent on, but in G::f if we do some form of predicated
> devirtualization and then inline the body of F::func then the safety
> of the load is predicated on the devirtualization predicate passing
> too.  If the predicated devirt. was itself conditional on some control
> flow based type refinement, then the safety of the load is control
> dependent on that control flow as well, and so on.
Yeah, the devirtualizer needs to insert a type cast, then inlining the load just
inherits it. Isn’t that all done before LLVM IR?
-Andy
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160714/de80a323/attachment-0001.html>

Sanjoy Das via llvm-dev

2016-Jul-14 23:48 UTC

head link

[llvm-dev] RFC: Strong GC References in LLVM

Hi all,

It looks like the key controversial point is the bit about the extra
control dependence on loads and stores[0].  Generally the consensus is
that (please chime if you think otherwise) it is not reasonable to
make the safety (or semantics) of a load instruction depend on the
type it is loading.  Introducing such a thing would involve changing
the IR semantics in a fundamental way, and therefore has a high bar
for acceptance.

Here is a summary of the alternative solutions that were proposed here
and on IRC (thanks Chandler, Andy, Eli!):

  1. Model loads and stores of GC references as intrinsic calls: add
     llvm.gc_load, llvm.gc_store intrinsics, and optimize them as loads
     and stores whenever appropriate and legal.

  2. Introduce a flag on load and stores that either
       a. Denotes a "gc_safety" control dependence.
       b. Denotes a "blackbox_safety" control dependence.  In this
case
          we will probably have some optional metadata on loads and
          stores to indicate that the control dependence is actually on
          GC safety.

     As a starting point, LLVM will conservatively not speculate such
     loads and stores; and will leave open the potential to upstream
     logic that will have a more precise sense of when these loads and
     stores are safe to speculate.

  3. Introduce a general way to denote control dependence on loads and
     stores.  This can be helpful to LLVM in general, and will let us
     basically implement a more precise version of (2).

# Tradeoffs

(1) is the easiest to implement initially, but I think it will be bad
for LLVM in the long term -- every place that looks at loads and
stores will have to now look at this additional set of intrinsics.
This won't be terribly complex, but it will be noisy.

I personally like (2) the most.  It fits in well with the current
framework: since most (all?) load/store speculation has to do some
sort of safety check anyway we can fold the "gc_safety" or
"blackbox_safety" logic into those safety checks.  In practice maybe
we can even factor this into one of the `isSimple` or
`isUnordered` -like checks.

(3) is probably the cleanest solution, but I'm worried about its scope
-- given that this will be large investment, I'm worried I'll spin my
wheels on this for a long time and ultimately realize that it isn't
really the right fix.

Thoughts?
-- Sanjoy


[0]: I may have (incorrectly) mentioned otherwise on IRC, but we need
to model safety properties of stores as well, to avoid transforms
like:

   %x = malloc()  ;; known thread local
   if (cond_0) {
     store GCREF %val to %x
   }
   if (cond_1) {
     store i64 %val to %x
   }

to

   %x = malloc()  ;; known thread local
   if (cond_0 || cond_1) {
     store GCREF %val to %x  ;; This "speculative" store is bad
     if (cond_1) {
       store i64 %val to %x
     }
   }

Sanjoy Das via llvm-dev

2016-Jul-15 00:45 UTC

head link

[llvm-dev] RFC: Strong GC References in LLVM

Hi Andy,

Andrew Trick wrote:
 >
 >> On Jul 14, 2016, at 3:40 PM, Sanjoy Das
 >> <sanjoy at playingwithpointers.com
 >> <mailto:sanjoy at playingwithpointers.com>> wrote:
 >>
 >> interface I {
 >> void func();
 >> }
 >>
 >> class F {
 >> Object o;
 >> void func() {
 >> Object val = this.o;
 >> }
 >> }
 >>
 >> class G {
 >> public static void f(I instance) {
 >> instance.func();
 >> }
 >> }
 >>
 >> then when compiling func() separately there is nothing the load of val
 >> is control dependent on, but in G::f if we do some form of predicated
 >> devirtualization and then inline the body of F::func then the safety
 >> of the load is predicated on the devirtualization predicate passing
 >> too. If the predicated devirt. was itself conditional on some control
 >> flow based type refinement, then the safety of the load is control
 >> dependent on that control flow as well, and so on.
 >
 > Yeah, the devirtualizer needs to insert a type cast, then inlining the
 > load just inherits it. Isn’t that all done before LLVM IR?

We do some devirtualization "incrementally" (i.e. while inlining /
optimizing the IR), but I think I was over-complicating -- this looks
like it is doable.  Intuitively, since Java's type system is
verifiable, we should be able to model it directly in the IR without
too many complications.

However, I'm still worried about the scope.  Going by your suggestion,
IIUC this is how the above devirtualization transform will look like:

void G::f(I instance) {
   invoke_interface(I::func, instance);
}

== predicated devirt ==>

void G::f(I instance) {
   if (get_klass(instance) == F.klass) {
     instance_casted = cast_to_type(instance, !type)
     F::func(instance_casted);
   } else {
     blah();
   }
}

!type = !descriptor_for_F

== inline F::func etc ==>

...

and to get back to our "baseline" performance (i.e. hoist loads of
non-GCREF type as usual, but don't touch GCREF loads), we'd have to
teach LLVM that it is okay to speculate non-GCREF loads through
cast_to_type, but not GCREF loads.  We will also have to teach LLVM
about aliasing properties of these intrinsics.

I think cast_to_type is a good second step we can take when we start
modeling Java's type system in LLVM IR, but it looks too complex to do
as an initial step.

-- Sanjoy

Chandler Carruth via llvm-dev

2016-Jul-15 00:53 UTC

head link

[llvm-dev] RFC: Strong GC References in LLVM

On Thu, Jul 14, 2016 at 4:48 PM Sanjoy Das via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi all,
>
> It looks like the key controversial point is the bit about the extra
> control dependence on loads and stores[0].  Generally the consensus is
> that (please chime if you think otherwise) it is not reasonable to
> make the safety (or semantics) of a load instruction depend on the
> type it is loading.  Introducing such a thing would involve changing
> the IR semantics in a fundamental way, and therefore has a high bar
> for acceptance.
>
> Here is a summary of the alternative solutions that were proposed here
> and on IRC (thanks Chandler, Andy, Eli!):
>
Thanks for writing up the summary, sorry I didn't summarize #2 for you as I
had promised. ;]

>   2. Introduce a flag on load and stores that either
>        a. Denotes a "gc_safety" control dependence.
>        b. Denotes a "blackbox_safety" control dependence.  In
this case
>           we will probably have some optional metadata on loads and
>           stores to indicate that the control dependence is actually on
>           GC safety.
>
Suggested name for 2b: "unsafe" or "predicated".
Essentially, that the load
is not generally safe to do even if the *memory* is generally safe to load
from because of some external effect.

>
>      As a starting point, LLVM will conservatively not speculate such
>      loads and stores; and will leave open the potential to upstream
>      logic that will have a more precise sense of when these loads and
>      stores are safe to speculate.
>
>   3. Introduce a general way to denote control dependence on loads and
>      stores.  This can be helpful to LLVM in general, and will let us
>      basically implement a more precise version of (2).
>
I actually think there is a pretty clean incremental path from 2a to 3
here. We can start with just the basic "there is something control
dependent about this" and use metadata to experiment with communicating it
to the optimizer. If we come up with something really general and precise,
we can fold it into the blanket thing, but until then we don't scope creep
or put something into the IR.


One thing you didn't mention is that this requires a matching function
attribute too so that we can distinguish between a call site that reads
memory and a call site that reads memory in a way that is potentially
unsafe.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160715/87bb6bf0/attachment.html>

Daniel Berlin via llvm-dev

2016-Jul-15 18:24 UTC

head link

[llvm-dev] RFC: Strong GC References in LLVM

On Thu, Jul 14, 2016 at 4:48 PM, Sanjoy Das <sanjoy at
playingwithpointers.com>
wrote:
> Hi all,
>
> It looks like the key controversial point is the bit about the extra
> control dependence on loads and stores[0].  Generally the consensus is
> that (please chime if you think otherwise) it is not reasonable to
> make the safety (or semantics) of a load instruction depend on the
> type it is loading.  Introducing such a thing would involve changing
> the IR semantics in a fundamental way, and therefore has a high bar
> for acceptance.
>
> Here is a summary of the alternative solutions that were proposed here
> and on IRC (thanks Chandler, Andy, Eli!):
>
>  1. Model loads and stores of GC references as intrinsic calls: add
>     llvm.gc_load, llvm.gc_store intrinsics, and optimize them as loads
>     and stores whenever appropriate and legal.
>
>  2. Introduce a flag on load and stores that either
>       a. Denotes a "gc_safety" control dependence.
>       b. Denotes a "blackbox_safety" control dependence.  In this
case
>          we will probably have some optional metadata on loads and
>          stores to indicate that the control dependence is actually on
>          GC safety.
>
>     As a starting point, LLVM will conservatively not speculate such
>     loads and stores; and will leave open the potential to upstream
>     logic that will have a more precise sense of when these loads and
>     stores are safe to speculate.
>
I think you need to define what you mean by control dependence here.  If
you mean speculation, you should say speculation :)
As you describe below, it is not enough to simply not speculate them. You
also are saying you don't want to change the conditions on which they
execute.
That is very different from speculation.

>  3. Introduce a general way to denote control dependence on loads and
>     stores.  This can be helpful to LLVM in general, and will let us
>     basically implement a more precise version of (2).
>
>
>
> [0]: I may have (incorrectly) mentioned otherwise on IRC, but we need
> to model safety properties of stores as well, to avoid transforms
> like:
>
>   %x = malloc()  ;; known thread local
>   if (cond_0) {
>     store GCREF %val to %x
>   }
>   if (cond_1) {
>     store i64 %val to %x
>   }
>
> to
>
>   %x = malloc()  ;; known thread local
>   if (cond_0 || cond_1) {
>     store GCREF %val to %x  ;; This "speculative" store is bad
>     if (cond_1) {
>       store i64 %val to %x
>     }
>   }
>

FWIW: This raises one of the same issues we have now with may-throw, which
is that, if all you have is a flag on the instruction, now you have to look
at every instruction in every block to know whether a *CFG* transform is
correct.

That means any pass that wants to just touch the CFG can't do so without
also looking at the instruction stream.  It will also make a bunch of
things currently O(N), O(N^2) (see the sets of patches fixing may-throw
places, and extrapolate to more places).

This is theoretically fixable in each pass by taking a single pass over the
instruction stream and marking which blocks have these instructions, etc,
and then using that info.

But we shouldn't have to do that in each pass,  especially if they just
want to make CFG manipulations.

The TL;DR  I would really like to see this also made a BB level flag that
says whether the block contains instructions with unknown control
dependences (or really, 1 bb level flag for each attribute you introduce) .
 I don't think this is terribly difficult, and at the very least, will keep
us from having to look at every instruction in the block in the common case
that there is nothing in the block to worry about ;)

Also note that these CFG transforms will also now need post-dominance info
in a bunch of cases to sanely determine if they are changing the control
dependence structure.

Let me ask another question:

Given

  %x = malloc()  ;; known thread local
  if (cond_0) {
    store GCREF %val to %x
  }
  if (cond_1) {
    store i64 %val to %x
  }
Assume i can prove cond0 and cond1 the same.

I change this to:
  %x = malloc()  ;; known thread local
  if (cond_0) {
    store i64 %val to %x
    store GCREF %val to %x
  }

Is this okay to happen?
Note that i did not actually change the control dependence of it by the
normal definition of control dependence.

But you can end up "hoisting" or "sinking" instructions by
moving every
other instruction :)

Is that okay, or are they really barriers as well (like may throw), in
which case, they probably need some real representation in control flow if
you want to make most algorithms O(N) (you can get away with the bb level
flags if you are willing to make some algorithms N^2 in cases where these
things exist).
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160715/dfc1973b/attachment.html>

Possibly Parallel Threads

Search for more possibly parallel threads

llvm dev - Jul 2016 - RFC: Strong GC References in LLVM

[llvm-dev] RFC: Strong GC References in LLVM

[llvm-dev] RFC: Strong GC References in LLVM

[llvm-dev] RFC: Strong GC References in LLVM

[llvm-dev] RFC: Strong GC References in LLVM

[llvm-dev] RFC: Strong GC References in LLVM

[llvm-dev] RFC: Strong GC References in LLVM

Possibly Parallel Threads