Eli Friedman via llvm-dev
2016-Jul-11 22:44 UTC
[llvm-dev] RFC: Strong GC References in LLVM
On Mon, Jul 11, 2016 at 2:28 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:> ping! > > Sanjoy Das wrote: ># Proposed Solution:>> >> We introduce a "new" LLVM type. I will still refer to it as GCREF >> here, but it may actually still be "<ty> addrspace(k)*" where k is >> specially noted in the datalayout. >> >> Semantics: >> >> 1. GCREF represents an equivalence class of values (equivalence >> relation being "points to a fixed semantic object"). The bitwise >> representation fluctuates constantly outside the compiler's >> control (the dual of `undef`), but may have invariants (in >> particular, we'd like to be able to specify alignment, nonnull >> etc.). At any given point in time all GCREF instances pointing to >> the same object have the same bitwise representation (we need this >> to make `icmp eq` is well-defined). >> >> 2. GCREF instances can only contain a valid gc reference (otherwise >> they can't meaningfully "fluctuate" among the various possible >> bitwise representations of a reference). >> >> 3. Converting GCREF to integers is fine in general, but you'll get an >> arbitrary "snapshot" of the bitwise value that will generally not >> be meaningful (unless you are colluding with the GC in >> implementation defined ways). >> >> 4. Converting integers to GCREF is allowed only if source integer is >> a bitwise representation of a valid GC reference that is not an >> out of bounds derived reference. However, this is difficult for >> the compiler to infer since it typically will have no fundamental >> knowledge of what bitwise representation can be a valid GC >> reference. >> >> 5. Operations that use a GCREF-typed value are "atomic" in using the >> bitwise representation, i.e., loading from a GCREF typed value >> does not "internally" convert the GCREF to a normal >> integer-pointer and then use the integer-pointer, since that would >> mean there is a window in which the integer-pointer can become >> stale[1]. >> >> 6. A GCREF stored to a location in the heap continues to fluctuate, >> and keeps itself in sync with the right bitwise representation. >> In a way, there isn't a large distinction between the GC and the >> heap -- the heap is part of (or managed by) the GC. >> >> I think (6) is the most controversial of the semantics above, but it >> isn't very different from how `undef` stored to the heap remains >> `undef` (i.e. a non-deterministic N-bit value) and a later load can >> recover `undef` instead of getting a normal N-bit value. >> >I'm not really convinced that the GCREF type is really necessary... consider an alternate model: 1. A GCREF is never loaded into a register; it's either on the heap, or in an alloca. 2. Add an intrinsic gcref.copy which copies a gcref between two allocas. 3. Add intrinsics gcref.load_gcref(GCREF*, GCREF*, offset) and gcref.store_gcref(GCREF*, GCREF*, offset, value) which load and store a gcref through a gcref. 4. Add intrinsics gcref.load_value(GCREF*, offset) and gcref.store_value(GCREF*, offset, value) which load and store normal values a gcref. 5. The statepoint lowering pass gets rid of the allocas. Keeping GCREFs exclusively in memory means the LLVM optimizer will handle them conservatively, but correctly. I guess the problem with this is precisely that the LLVM optimizer will handle them conservatively... but on the flip side, I think you're going to end up chasing down weird problems forever if a "load" from an alloca has side-effects. -Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160711/b628a523/attachment.html>
Chandler Carruth via llvm-dev
2016-Jul-11 22:56 UTC
[llvm-dev] RFC: Strong GC References in LLVM
On Mon, Jul 11, 2016 at 3:44 PM Eli Friedman <eli.friedman at gmail.com> wrote:> On Mon, Jul 11, 2016 at 2:28 PM, Sanjoy Das < > sanjoy at playingwithpointers.com> wrote: > >> ping! >> >> Sanjoy Das wrote: >> > # Proposed Solution: >>> >>> We introduce a "new" LLVM type. I will still refer to it as GCREF >>> here, but it may actually still be "<ty> addrspace(k)*" where k is >>> specially noted in the datalayout. >>> >>> Semantics: >>> >>> 1. GCREF represents an equivalence class of values (equivalence >>> relation being "points to a fixed semantic object"). The bitwise >>> representation fluctuates constantly outside the compiler's >>> control (the dual of `undef`), but may have invariants (in >>> particular, we'd like to be able to specify alignment, nonnull >>> etc.). At any given point in time all GCREF instances pointing to >>> the same object have the same bitwise representation (we need this >>> to make `icmp eq` is well-defined). >>> >>> 2. GCREF instances can only contain a valid gc reference (otherwise >>> they can't meaningfully "fluctuate" among the various possible >>> bitwise representations of a reference). >>> >>> 3. Converting GCREF to integers is fine in general, but you'll get an >>> arbitrary "snapshot" of the bitwise value that will generally not >>> be meaningful (unless you are colluding with the GC in >>> implementation defined ways). >>> >>> 4. Converting integers to GCREF is allowed only if source integer is >>> a bitwise representation of a valid GC reference that is not an >>> out of bounds derived reference. However, this is difficult for >>> the compiler to infer since it typically will have no fundamental >>> knowledge of what bitwise representation can be a valid GC >>> reference. >>> >>> 5. Operations that use a GCREF-typed value are "atomic" in using the >>> bitwise representation, i.e., loading from a GCREF typed value >>> does not "internally" convert the GCREF to a normal >>> integer-pointer and then use the integer-pointer, since that would >>> mean there is a window in which the integer-pointer can become >>> stale[1]. >>> >>> 6. A GCREF stored to a location in the heap continues to fluctuate, >>> and keeps itself in sync with the right bitwise representation. >>> In a way, there isn't a large distinction between the GC and the >>> heap -- the heap is part of (or managed by) the GC. >>> >>> I think (6) is the most controversial of the semantics above, but it >>> isn't very different from how `undef` stored to the heap remains >>> `undef` (i.e. a non-deterministic N-bit value) and a later load can >>> recover `undef` instead of getting a normal N-bit value. >>> >> > I'm not really convinced that the GCREF type is really necessary... > consider an alternate model: > > 1. A GCREF is never loaded into a register; it's either on the heap, or in > an alloca. > 2. Add an intrinsic gcref.copy which copies a gcref between two allocas. > 3. Add intrinsics gcref.load_gcref(GCREF*, GCREF*, offset) and > gcref.store_gcref(GCREF*, GCREF*, offset, value) which load and store a > gcref through a gcref. > 4. Add intrinsics gcref.load_value(GCREF*, offset) and > gcref.store_value(GCREF*, offset, value) which load and store normal values > a gcref. > 5. The statepoint lowering pass gets rid of the allocas. > > Keeping GCREFs exclusively in memory means the LLVM optimizer will handle > them conservatively, but correctly. > > I guess the problem with this is precisely that the LLVM optimizer will > handle them conservatively... but on the flip side, I think you're going to > end up chasing down weird problems forever if a "load" from an alloca has > side-effects. >I think everything but this last weird aspect we already get from address spaces. I misread the proposal originally and didn't understand that the problem was loading from an alloca *holding* the GC pointer, and thus it was a normal and boring load that somehow has to have side-effects. I fundamentally think that we can't do that. I can see several ways to make the result work without that. - Teach the statepoint rewriting to handle hoisted loads in some way (haven't thought too much about how feasible this is) - Tell LLVM that the load has this weird control dependence with some mechanism (make it a special gc load intrinsic, or a volatile load, or ....) -Chandler -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160711/313e8daa/attachment.html>
Sanjoy Das via llvm-dev
2016-Jul-11 23:26 UTC
[llvm-dev] RFC: Strong GC References in LLVM
Hi Eli, Eli Friedman wrote: > > I'm not really convinced that the GCREF type is really necessary... > consider an alternate model: > > 1. A GCREF is never loaded into a register; it's either on the heap, or > in an alloca. > 2. Add an intrinsic gcref.copy which copies a gcref between two allocas. > 3. Add intrinsics gcref.load_gcref(GCREF*, GCREF*, offset) and > gcref.store_gcref(GCREF*, GCREF*, offset, value) which load and store a > gcref through a gcref. > 4. Add intrinsics gcref.load_value(GCREF*, offset) and > gcref.store_value(GCREF*, offset, value) which load and store normal > values a gcref. > 5. The statepoint lowering pass gets rid of the allocas. This is similar to what we used to call "early safepoint insertion", but ... > > Keeping GCREFs exclusively in memory means the LLVM optimizer will > handle them conservatively, but correctly. > > I guess the problem with this is precisely that the LLVM optimizer will > handle them conservatively... ... yup. We _want_ LLVM to optimize well. > but on the flip side, I think you're going > to end up chasing down weird problems forever if a "load" from an alloca > has side-effects. I need to carefully think through this before committing to it, but I think we're generally okay with completely disallowing allocas of GCREF type (like we do for token types today). If we could do that is your concern addressed? -- Sanjoy
Sanjoy Das via llvm-dev
2016-Jul-11 23:56 UTC
[llvm-dev] RFC: Strong GC References in LLVM
Hi Chandler, Chandler Carruth wrote: > I think everything but this last weird aspect we already get from > address spaces. > > I misread the proposal originally and didn't understand that the problem > was loading from an alloca *holding* the GC pointer, and thus it was a > normal and boring load that somehow has to have side-effects. > > I fundamentally think that we can't do that. I can see several ways to > make the result work without that. > > - Teach the statepoint rewriting to handle hoisted loads in some way > (haven't thought too much about how feasible this is) I think that is impossible in general, given LLVM's IR. If we start with: // IR_1 for (;;) { if (condition) { use(this->field); // static type of(&(this->field)) == GCREF* } } and LICM it to: // IR_2 GCREF val = this->field for (;;) { if (condition) { use(val); } } then when rewriting it is incorrect to go from IR_2 to IR_1, assuming that IR_2 is all we see, since in IR_2 use sees a single unique value for every iteration whereas in IR_1 it may see different values in different iterations. > - Tell LLVM that the load has this weird control dependence with some > mechanism (make it a special gc load intrinsic, or a volatile load, or ....) A gc_load intrinsic is not a reasonable idea actually, but I think it will be worse in the long run. To start off, we'd need to at least add gc_load to gc_load and store to gc_load forwarding (the former for control equivalent gc_loads). We have a set of analyses locally (downstream) that can infer Java types, which can be used to justify hoisting gc_loads as well (by proving that the gc_load will result in a GC reference in the new location) that we'd like to eventually upstream. Once all the dust settles, I think we'll end up with too many cases of: if (auto *LI = dyn_cast<LoadInst>(V)) { process LI } if (auto *II = dyn_cast<IntrinsicInst>(V)) if (II->getIntrinsicID() == gc_load) { prrocess II; // almost duplicate of process LI } whereas a GCREF type (however it is represented, either as <ty> addrspace(k)* or a new type) will let us fold most of the interesting logic into the safety checks we already do for load instructions. Slightly unrelated to the above, but here is another example (apart from the LICM example above) of why doing "repair" in the rewrite pass is difficult, after letting the optimizer run its course: If we have: if (always_false) { GCREF val = this->field; use(val); } safepoint(); if (always_false) { GCREF val = this->field; use(val); } clobber_this_field(); and we hoist and CSE this to: GCREF val = this->field; if (always_false) { use(val); } safepoint(); clobber_this_field(); if (always_false) { use(val); } then we have "val" live over the safepoint(), but there is no guarantee that "val" actually contains a GC reference. Moreover, we can't sink the load into the two blocks that use it, since we've clobber_this_field() in a way that precludes that. -- Sanjoy
Eli Friedman via llvm-dev
2016-Jul-12 01:49 UTC
[llvm-dev] RFC: Strong GC References in LLVM
On Mon, Jul 11, 2016 at 4:26 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:> Hi Eli, > > Eli Friedman wrote: > > > > I'm not really convinced that the GCREF type is really necessary... > > consider an alternate model: > > > > 1. A GCREF is never loaded into a register; it's either on the heap, or > > in an alloca. > > 2. Add an intrinsic gcref.copy which copies a gcref between two allocas. > > 3. Add intrinsics gcref.load_gcref(GCREF*, GCREF*, offset) and > > gcref.store_gcref(GCREF*, GCREF*, offset, value) which load and store a > > gcref through a gcref. > > 4. Add intrinsics gcref.load_value(GCREF*, offset) and > > gcref.store_value(GCREF*, offset, value) which load and store normal > > values a gcref. > > 5. The statepoint lowering pass gets rid of the allocas. > > This is similar to what we used to call "early safepoint insertion", > but ... > > > > > Keeping GCREFs exclusively in memory means the LLVM optimizer will > > handle them conservatively, but correctly. > > > > I guess the problem with this is precisely that the LLVM optimizer will > > handle them conservatively... > > ... yup. We _want_ LLVM to optimize well. >What optimizations are you missing? The very fact that it isn't in SSA form hurts to some extent... and I guess you want to leverage existing optimizations for loads and stores?> > but on the flip side, I think you're going > > to end up chasing down weird problems forever if a "load" from an alloca > > has side-effects. > > I need to carefully think through this before committing to it, but I > think we're generally okay with completely disallowing allocas of > GCREF type (like we do for token types today). If we could do that > is your concern addressed? >In general, if I think of a GCREF as an unsized handle to an object rather than a pointer with weird properties, it's somewhat less scary. Also, it's more clear how to deal with a GCREF if you explicitly list out operations which are legal rather than which operations are illegal. For example "A GCREF is a new type of value. It does not have a representation in memory. You can use GEP on a GCREF. You can load or store through a GCREF; this resolves the GCREF to a specific address in memory, then acts like a load/store from that address. GCREFs have these aliasing rules: [...]. There are GCREF intrinsics for constructing and manipulating GCREFs. No other operations on GCREFs are allowed." I guess the bit I'm most worried about is using plain loads to create GCREFs, and plain stores to store them. As long as we don't have that, existing optimization passes will essentially just work, I think; we won't try to duplicate or hoist loads in situations where that isn't allowed, and we won't try to perform weird value manipulations like trying to construct one using an inttoptr. (I guess there are some passes that assume the operand of a load is specifically a PointerType, but that's not really a semantic difference.) -Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160711/24ad079a/attachment.html>