On Aug 14, 2007, at 00:23, Chris Lattner wrote:> On Mon, 13 Aug 2007, Gordon Henriksen wrote: > >> Changing these structures breaks binary compatibility (including C >> interop). > > If that is so, and if there is no way around this, then it makes > sense to develop some compatibility mode. How does native C code > generate these tables?I might've conflated two related points. The frametable isn't really at issue with respect to C interop. Rather, the decision to get into the runtime business is. To be specific, C interop doesn't generate frame table entries. It anchors roots by explicitly registering/unregistering root pads with the runtime. Very similar to the paper you cited, although not a stack. C interop does (from macros) use runtime information directly derived from the data/text bracketing symbols I also mentioned.>>> and then walk the stack with the llvm_cg_walk_gcroots function. >>> Note that these are not well tested and probably have holes in >>> them, but these are the interfaces we want to use :) >> >> But here I have to disagree. Quite by design, LLVM lacks an >> existing runtime to leverage: LLVM CLR. In light of that, it is >> difficult to justify ignoring a mature runtime that does exist in >> order to avoid building a simple data structure. > > Sure. Given a lack of current implementation, if the ocaml > implementation would work and meet our needs, I have no problem > with adopting/stealing it :)I'm not sure you want to incorporate ocaml's runtime per se; its object model is very unique. :) How do you envision LLVM GC support going? My thinking was to merely extend LLVM to be a less "uncooperative environment" for GC (as the paper you referenced put it), while maintaining neutrality to the runtime and object model. The two major problems I had really boil down to identifying GC points in machine code and statically identifying live roots at those GC points, both problems common to many collection techniques. Looking at the problem from that perspective makes the problem much more tractable, actually…>> [2] ocaml is licensed under the QPL [Trolltech/KDE], which has an >> onerous distribution clause which prohibits forks. My current work >> leaves the existing code generator in place, touching only a few >> files to integrate LLVM code generation as a runtime option; this >> improves the possibility that a patch would be accepted, and >> otherwise makes patch maintenance manageable. > > We can work around this on our end going forward.How so?> What this means in the short term is that we can't accept any Ocaml- > project code into LLVM. This does not prevent the ocaml people from > using LLVM code though.Exactly. ocaml is written in itself, so there's little likelihood of that becoming a problem. :) — Gordon -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20070814/9be2c498/attachment.html>
On Aug 14, 2007, at 06:24, Gordon Henriksen wrote:> The two major problems I had really boil down to identifying GC > points in machine code and statically identifying live roots at > those GC points, both problems common to many collection > techniques. Looking at the problem from that perspective makes the > problem much more tractable, actually…Chris, This is much more generic than what I was originally thinking, but what do you think of providing a facility like this? enum GCPointKind { GCPSubroutineReturn, GCPFunctionExit, GCPSubroutineCall, GCPBackBranch }; class ??? { public: // Allows the code generator to avoid inserting unnecessary GC points. bool RequiresGCPointsOfKind(GCPointKind Kind) const; // If false, instructs the code generator to keep all GC roots on the // stack at GC points. bool SupportsRegisterRoots() const; // false for now... // Visits a GC point, including information necessary to identify all // live GC roots in the stack frame. virtual bool VisitGCPoint(MachineInstruction *Point, GCPointKind Kind, intptr_t *StackRoots, size_t StackCount, int *RegisterRoots, size_t RegisterCount); }; That could enable several kinds of conservative collectors. The code generator would need to cooperate by: 1. Inserting nodes for GC points if required, recording the corresponding original Instruction*. 2. Recording the physical location of each llvm.gcroot'd alloca. Later, a MachineFunctionPass could walk the MachineFunction to lower these nodes. Live roots can be identified using domination information. In addition to providing a stack and register map for building tables, such an interface offers a collector a chance to insert any necessary landing pads at exit/call/back-branch. If the collector supports it, LLVM could be allowed to scalarize GC roots down the road. — Gordon -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20070814/62d71db8/attachment.html>
On Aug 14, 2007, at 3:24 AM, Gordon Henriksen wrote:> On Aug 14, 2007, at 00:23, Chris Lattner wrote: >> On Mon, 13 Aug 2007, Gordon Henriksen wrote: >>> Changing these structures breaks binary compatibility (including >>> C interop). >> If that is so, and if there is no way around this, then it makes >> sense to develop some compatibility mode. How does native C code >> generate these tables? > > I might've conflated two related points. The frametable isn't > really at issue with respect to C interop. Rather, the decision to > get into the runtime business is.Ok.> To be specific, C interop doesn't generate frame table entries. It > anchors roots by explicitly registering/unregistering root pads > with the runtime. Very similar to the paper you cited, although not > a stack.Ok, that seems easy to support. LLVM doesn't get in the runtime business either, the code generators just provide an opaque mechanism to walk pointers in the stack. The GC runtime library discussed on the accurate GC page is nice (if it existed ;-) but optional.> C interop does (from macros) use runtime information directly > derived from the data/text bracketing symbols I also mentioned.Ok.>>>> and then walk the stack with the llvm_cg_walk_gcroots function. >>>> Note that these are not well tested and probably have holes in >>>> them, but these are the interfaces we want to use :) >>> >>> But here I have to disagree. Quite by design, LLVM lacks an >>> existing runtime to leverage: LLVM CLR. In light of that, it is >>> difficult to justify ignoring a mature runtime that does exist in >>> order to avoid building a simple data structure. >> >> Sure. Given a lack of current implementation, if the ocaml >> implementation would work and meet our needs, I have no problem >> with adopting/stealing it :) > > I'm not sure you want to incorporate ocaml's runtime per se; its > object model is very unique. :) How do you envision LLVM GC support > going?No idea - I know little about the ocaml GC. I assumed it was generic enough to be reused in other contexts. If not, then disregard my comments.> My thinking was to merely extend LLVM to be a less "uncooperative > environment" for GC (as the paper you referenced put it), while > maintaining neutrality to the runtime and object model.Yep, sounds good.> The two major problems I had really boil down to identifying GC > points in machine code and statically identifying live roots at > those GC points, both problems common to many collection > techniques. Looking at the problem from that perspective makes the > problem much more tractable, actually…Yep.>>> [2] ocaml is licensed under the QPL [Trolltech/KDE], which has an >>> onerous distribution clause which prohibits forks. My current >>> work leaves the existing code generator in place, touching only a >>> few files to integrate LLVM code generation as a runtime option; >>> this improves the possibility that a patch would be accepted, and >>> otherwise makes patch maintenance manageable. >> >> We can work around this on our end going forward. > > How so?If it becomes important, we can re-implement pieces that are needed.>> What this means in the short term is that we can't accept any >> Ocaml-project code into LLVM. This does not prevent the ocaml >> people from using LLVM code though. > > Exactly. ocaml is written in itself, so there's little likelihood > of that becoming a problem. :):) -Chris
On Aug 14, 2007, at 4:35 AM, Gordon Henriksen wrote:> On Aug 14, 2007, at 06:24, Gordon Henriksen wrote: > >> The two major problems I had really boil down to identifying GC >> points in machine code and statically identifying live roots at >> those GC points, both problems common to many collection >> techniques. Looking at the problem from that perspective makes the >> problem much more tractable, actually… > > Chris, > > This is much more generic than what I was originally thinking, but > what do you think of providing a facility like this? > That could enable several kinds of conservative collectors.I'm not sure I follow what you mean.> The code generator would need to cooperate by: > > 1. Inserting nodes for GC points if required, recording the > corresponding original Instruction*. > 2. Recording the physical location of each llvm.gcroot'd alloca.This is somewhat implicit in the design we want. The only question is how to identify these at runtime. In theory, the code generator already knows what the gc points are and where the pointers on the stack are located. For a "cooperative" code generator, the codegen should emit tables (similar to EH tables) that describe this, and the llvm code generator callback should enumerate these.> Later, a MachineFunctionPass could walk the MachineFunction to > lower these nodes. Live roots can be identified using domination > information.These are optimizations that can be implemented in the code generator, there are many others as well. For the time being, we don't even have real dominator info in the code generator though.. -Chris
Chris Lattner wrote:> On Aug 14, 2007, at 3:24 AM, Gordon Henriksen wrote:>> I'm not sure you want to incorporate ocaml's runtime per se; its >> object model is very unique. :) How do you envision LLVM GC support >> going? > > No idea - I know little about the ocaml GC. I assumed it was generic > enough to be reused in other contexts. If not, then disregard my > comments.The OCaml GC is quite generic. Basically it exposes a model where values all uniformly represented by a N-bit machine word (N=32 or 64) which can denote either a (N-1)-bit integer (with the lsb set to 1), a pointer to a block controlled by the GC, or a pointer outside the GC area (though the last possibility is considered bad style). A GC block comes with a 1-word header that describes its length and also contain a small 8-bit tag (plus a few bits for the GC). In OCaml, the tag is used in particular to store the constructor for union types (discriminated sums), but the runtime system is really agnostic about the meaning of the tag except for a few special values which denote special kinds of blocks such as: - byte buffers (not traced by the GC, used to represent OCaml strings), - custom blocks (native data not controlled by the GC, plus a pointer to a table of custom functions to implement finalization and generic operations like equality/hashing/serialization), - lazy pointers (the GC is able to shortcut chains of forwarding pointers to forced values), - arrays of weak pointers,... Of course, this model makes some assumptions, but it could certainly be useful to support languages other than OCaml! -- Alain