thr3ads.net - llvm dev - [LLVMdev] ocaml+llvm [Aug 2007]

If this information is useful, please help other people find it:
Share via:

Gordon Henriksen

2007-Aug-14 10:24 UTC

[LLVMdev] ocaml+llvm

On Aug 14, 2007, at 00:23, Chris Lattner wrote:
> On Mon, 13 Aug 2007, Gordon Henriksen wrote:
>
>> Changing these structures breaks binary compatibility (including C  
>> interop).
>
> If that is so, and if there is no way around this, then it makes  
> sense to develop some compatibility mode.  How does native C code  
> generate these tables?
I might've conflated two related points. The frametable isn't really  
at issue with respect to C interop. Rather, the decision to get into  
the runtime business is.

To be specific, C interop doesn't generate frame table entries. It  
anchors roots by explicitly registering/unregistering root pads with  
the runtime. Very similar to the paper you cited, although not a stack.

C interop does (from macros) use runtime information directly derived  
from the data/text bracketing symbols I also mentioned.
>>>  and then walk the stack with the llvm_cg_walk_gcroots function.  
>>> Note that  these are not well tested and probably have holes in  
>>> them, but these are  the interfaces we want to use :)
>>
>> But here I have to disagree. Quite by design, LLVM lacks an  
>> existing runtime to leverage: LLVM  CLR. In light of that, it is  
>> difficult to justify ignoring a mature runtime that does exist in  
>> order to avoid building a simple data structure.
>
> Sure.  Given a lack of current implementation, if the ocaml  
> implementation would work and meet our needs, I have no problem  
> with adopting/stealing it :)
I'm not sure you want to incorporate ocaml's runtime per se; its  
object model is very unique. :) How do you envision LLVM GC support  
going?

My thinking was to merely extend LLVM to be a less "uncooperative  
environment" for GC (as the paper you referenced put it), while  
maintaining neutrality to the runtime and object model.

The two major problems I had really boil down to identifying GC  
points in machine code and statically identifying live roots at those  
GC points, both problems common to many collection techniques.  
Looking at the problem from that perspective makes the problem much  
more tractable, actually…
>> [2] ocaml is licensed under the QPL [Trolltech/KDE], which has an  
>> onerous distribution clause which prohibits forks. My current work  
>> leaves the existing code generator in place, touching only a few  
>> files to integrate LLVM code generation as a runtime option; this  
>> improves the possibility that a patch would be accepted, and  
>> otherwise makes patch maintenance manageable.
>
> We can work around this on our end going forward.
How so?
> What this means in the short term is that we can't accept any Ocaml- 
> project code into LLVM. This does not prevent the ocaml people from  
> using LLVM code though.
Exactly. ocaml is written in itself, so there's little likelihood of  
that becoming a problem. :)

— Gordon

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20070814/9be2c498/attachment.html>

Gordon Henriksen

2007-Aug-14 11:35 UTC

head link

[LLVMdev] ocaml+llvm

On Aug 14, 2007, at 06:24, Gordon Henriksen wrote:
> The two major problems I had really boil down to identifying GC  
> points in machine code and statically identifying live roots at  
> those GC points, both problems common to many collection  
> techniques. Looking at the problem from that perspective makes the  
> problem much more tractable, actually…
Chris,

This is much more generic than what I was originally thinking, but  
what do you think of providing a facility like this?

     enum GCPointKind {
       GCPSubroutineReturn,
       GCPFunctionExit,
       GCPSubroutineCall,
       GCPBackBranch
     };

     class ??? {
     public:
       // Allows the code generator to avoid inserting unnecessary GC  
points.
       bool RequiresGCPointsOfKind(GCPointKind Kind) const;

       // If false, instructs the code generator to keep all GC roots  
on the
       // stack at GC points.
       bool SupportsRegisterRoots() const; // false for now...

       // Visits a GC point, including information necessary to  
identify all
       // live GC roots in the stack frame.
       virtual bool VisitGCPoint(MachineInstruction *Point,
                                 GCPointKind Kind,
                                 intptr_t *StackRoots, size_t  
StackCount,
                                 int *RegisterRoots, size_t  
RegisterCount);
     };

That could enable several kinds of conservative collectors.

The code generator would need to cooperate by:

  1. Inserting nodes for GC points if required, recording the  
corresponding original Instruction*.
  2. Recording the physical location of each llvm.gcroot'd alloca.

Later, a MachineFunctionPass could walk the MachineFunction to lower  
these nodes. Live roots can be identified using domination information.

In addition to providing a stack and register map for building  
tables, such an interface offers a collector a chance to insert any  
necessary landing pads at exit/call/back-branch.

If the collector supports it, LLVM could be allowed to scalarize GC  
roots down the road.

— Gordon

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20070814/62d71db8/attachment.html>

Chris Lattner

2007-Aug-20 00:40 UTC

head link

[LLVMdev] ocaml+llvm

On Aug 14, 2007, at 3:24 AM, Gordon Henriksen wrote:> On Aug 14, 2007, at 00:23, Chris Lattner wrote:
>> On Mon, 13 Aug 2007, Gordon Henriksen wrote:
>>> Changing these structures breaks binary compatibility (including  
>>> C interop).
>> If that is so, and if there is no way around this, then it makes  
>> sense to develop some compatibility mode.  How does native C code  
>> generate these tables?
>
> I might've conflated two related points. The frametable isn't  
> really at issue with respect to C interop. Rather, the decision to  
> get into the runtime business is.
Ok.
> To be specific, C interop doesn't generate frame table entries. It  
> anchors roots by explicitly registering/unregistering root pads  
> with the runtime. Very similar to the paper you cited, although not  
> a stack.
Ok, that seems easy to support. LLVM doesn't get in the runtime  
business either, the code generators just provide an opaque mechanism  
to walk pointers in the stack.  The GC runtime library discussed on  
the accurate GC page is nice (if it existed ;-) but optional.
> C interop does (from macros) use runtime information directly  
> derived from the data/text bracketing symbols I also mentioned.
Ok.
>>>>  and then walk the stack with the llvm_cg_walk_gcroots
function.
>>>> Note that  these are not well tested and probably have holes in
>>>> them, but these are  the interfaces we want to use :)
>>>
>>> But here I have to disagree. Quite by design, LLVM lacks an  
>>> existing runtime to leverage: LLVM  CLR. In light of that, it is  
>>> difficult to justify ignoring a mature runtime that does exist in  
>>> order to avoid building a simple data structure.
>>
>> Sure.  Given a lack of current implementation, if the ocaml  
>> implementation would work and meet our needs, I have no problem  
>> with adopting/stealing it :)
>
> I'm not sure you want to incorporate ocaml's runtime per se; its  
> object model is very unique. :) How do you envision LLVM GC support  
> going?
No idea - I know little about the ocaml GC.  I assumed it was generic  
enough to be reused in other contexts.  If not, then disregard my  
comments.
> My thinking was to merely extend LLVM to be a less "uncooperative  
> environment" for GC (as the paper you referenced put it), while  
> maintaining neutrality to the runtime and object model.
Yep, sounds good.
> The two major problems I had really boil down to identifying GC  
> points in machine code and statically identifying live roots at  
> those GC points, both problems common to many collection  
> techniques. Looking at the problem from that perspective makes the  
> problem much more tractable, actually…
Yep.
>>> [2] ocaml is licensed under the QPL [Trolltech/KDE], which has an  
>>> onerous distribution clause which prohibits forks. My current  
>>> work leaves the existing code generator in place, touching only a  
>>> few files to integrate LLVM code generation as a runtime option;  
>>> this improves the possibility that a patch would be accepted, and  
>>> otherwise makes patch maintenance manageable.
>>
>> We can work around this on our end going forward.
>
> How so?
If it becomes important, we can re-implement pieces that are needed.
>> What this means in the short term is that we can't accept any  
>> Ocaml-project code into LLVM. This does not prevent the ocaml  
>> people from using LLVM code though.
>
> Exactly. ocaml is written in itself, so there's little likelihood  
> of that becoming a problem. :)
:)

-Chris

Chris Lattner

2007-Aug-20 00:43 UTC

head link

[LLVMdev] ocaml+llvm

On Aug 14, 2007, at 4:35 AM, Gordon Henriksen wrote:
> On Aug 14, 2007, at 06:24, Gordon Henriksen wrote:
>
>> The two major problems I had really boil down to identifying GC  
>> points in machine code and statically identifying live roots at  
>> those GC points, both problems common to many collection  
>> techniques. Looking at the problem from that perspective makes the  
>> problem much more tractable, actually…
>
> Chris,
>
> This is much more generic than what I was originally thinking, but  
> what do you think of providing a facility like this?
> That could enable several kinds of conservative collectors.
I'm not sure I follow what you mean.
> The code generator would need to cooperate by:
>
>  1. Inserting nodes for GC points if required, recording the  
> corresponding original Instruction*.
>  2. Recording the physical location of each llvm.gcroot'd alloca.
This is somewhat implicit in the design we want.  The only question  
is how to identify these at runtime.  In theory, the code generator  
already knows what the gc points are and where the pointers on the  
stack are located.  For a "cooperative" code generator, the codegen  
should emit tables (similar to EH tables) that describe this, and the  
llvm code generator callback should enumerate these.
> Later, a MachineFunctionPass could walk the MachineFunction to  
> lower these nodes. Live roots can be identified using domination  
> information.
These are optimizations that can be implemented in the code  
generator, there are many others as well.  For the time being, we  
don't even have real dominator info in the code generator though..

-Chris

Alain Frisch

2007-Aug-20 06:04 UTC

head link

[LLVMdev] ocaml+llvm

Chris Lattner wrote:> On Aug 14, 2007, at 3:24 AM, Gordon Henriksen wrote:
>> I'm not sure you want to incorporate ocaml's runtime per se;
its
>> object model is very unique. :) How do you envision LLVM GC support  
>> going?
> 
> No idea - I know little about the ocaml GC.  I assumed it was generic  
> enough to be reused in other contexts.  If not, then disregard my  
> comments.
The OCaml GC is quite generic. Basically it exposes a model where values 
  all uniformly represented by a N-bit machine word (N=32 or 64) which 
can denote either a (N-1)-bit integer (with the lsb set to 1), a pointer 
to a block controlled by the GC, or a pointer outside the GC area 
(though the last possibility is considered bad style).

A GC block comes with a 1-word header that describes its length and also 
contain a small 8-bit tag (plus a few bits for the GC). In OCaml, the 
tag is used in particular to store the constructor for union types 
(discriminated sums), but the runtime system is really agnostic about 
the meaning of the tag except for a few special values which denote 
special kinds of blocks such as:
- byte buffers (not traced by the GC, used to represent OCaml strings), 
- custom blocks (native data not controlled by the GC, plus a pointer to 
a table of custom functions to implement finalization and generic 
operations like equality/hashing/serialization),
- lazy pointers (the GC is able to shortcut chains of forwarding 
pointers to forced values),
- arrays of weak pointers,...

Of course, this model makes some assumptions, but it could certainly be 
useful to support languages other than OCaml!


-- Alain

Possibly Parallel Threads

Search for more apparently analagous threads

llvm dev - Aug 2007 - [LLVMdev] ocaml+llvm

[LLVMdev] ocaml+llvm

[LLVMdev] ocaml+llvm

[LLVMdev] ocaml+llvm

[LLVMdev] ocaml+llvm

[LLVMdev] ocaml+llvm

Possibly Parallel Threads