Good afternoon!
I'm trying to write an LLVM codegen for a Standard ML compiler
(MLton). So far things seem to match up quite nicely, but I have hit
two sticking points. I'm hoping LLVM experts might know how to handle
these two cases better.
1: In ML we have some types that are actually one of several possible
types. Expressed in C this might be thought of as a union. The codegen
only ever accesses these 'union types' via pointer. Before actually
indexing into the type, it always casts from the 'union pointer type'
to a specific pointer type.
As a concrete example, I have two types %a and %b. I want to express a
third type %c that is either %a* or %b*. Later I'll cast the %c to
either %a* or %b*.
Currently I just represent %c as i8*. I assume that this can have
consequences in terms of aliasing. I tried opaque*, but llvm-as didn't
like that. Is there any way to better represent the type %c to LLVM?
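In C++ terms, the access pattern described above (an untyped "union pointer" that is always cast to a concrete pointer type before any field is touched) looks roughly like the following sketch. The names A, B, C, as_a and as_b are hypothetical stand-ins for %a, %b and %c. The sketch only shows the shape of the generated accesses; it makes no claim about how LLVM's alias analysis treats them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the LLVM types %a and %b.
struct A {
  int32_t x;
  int32_t y;
};

struct B {
  double d;
};

// Stand-in for %c: an untyped pointer that is never dereferenced directly,
// mirroring the current choice of i8* in the generated IR.
using C = void*;

// The codegen already knows (from the ML constructor) which concrete type a
// C value refers to, so these casts are trusted rather than checked.
inline A* as_a(C c) {
  assert(c != nullptr);
  return static_cast<A*>(c);
}

inline B* as_b(C c) {
  assert(c != nullptr);
  return static_cast<B*>(c);
}
```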
2: In the ML heap we have objects that are garbage collected. Objects
are preceded by a header that describes the object to the garbage
collector. However, pointers to the objects point past the header and
at the actual object. Sometimes, however, the program itself accesses
the header. For example, to determine the length of an array (the
length is in the header). For every type, I output a definition like this:
%opt_33 = type { i32, %opt_45*, float }
I could also create another type that includes the header, something like:
%opt_33_with_header = type { i32, %opt_33 }
Is there any way to express that a pointer is actually a pointer to an
interior element of a type? Something like %opt_33_in_heap %opt_33_with_header:1?
Currently when I want to read the header of an %opt_33, I cast it to an
i32* and then use getelementptr -1. Is there a better way?
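A C++ analogue of the current approach (a single 32-bit header word sitting just before the object, read by viewing the object pointer as a uint32_t* and indexing -1) might look like this. The allocation helpers are hypothetical, and the sketch assumes the payload needs no more than 4-byte alignment; MLton's real allocator and header format are not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical layout: one 32-bit header word (here, an array length)
// immediately followed by the payload. Assumes the payload elements need
// no more than 4-byte alignment, since the payload starts 4 bytes into
// the malloc'd block.
inline void* alloc_with_header(uint32_t length, std::size_t elem_size) {
  char* base = static_cast<char*>(
      std::malloc(sizeof(uint32_t) + length * elem_size));
  assert(base != nullptr);
  std::memcpy(base, &length, sizeof length);  // write the header word
  return base + sizeof(uint32_t);             // callers see only the payload
}

// The trick from the question: view the object pointer as uint32_t* and
// index -1, the C++ analogue of "bitcast to i32*, then getelementptr -1".
inline uint32_t header_word(void* obj) {
  return static_cast<uint32_t*>(obj)[-1];
}

// Frees the whole block, starting from the header rather than the payload.
inline void free_with_header(void* obj) {
  std::free(static_cast<char*>(obj) - sizeof(uint32_t));
}
```

Usage: `void* arr = alloc_with_header(5, sizeof(int32_t));` after which `header_word(arr)` yields 5.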
On Jun 13, 2009, at 3:54 AM, Wesley W. Terpstra wrote:
> Currently I just represent %c as i8*. I assume that this can have
> consequences in terms of aliasing. I tried opaque*, but llvm-as didn't
> like that. Is there any way to better represent the type %c to LLVM?

I assume this is for tagged sums. Logically, what you want is a distinct LLVM type for every ML union type and each of its constructors. Unfortunately, LLVM does structural uniquing of types, so that won't work. What you can do is abuse address spaces, giving every distinct type its own address space and casting back and forth between address spaces as necessary.

> Is there any way to express that a pointer is actually a pointer to an
> interior element of a type? Something like %opt_33_in_heap
> %opt_33_with_header:1 ?

Something like an ungetelementptr? No, sorry. That would be a pretty nice extension, though obviously unsound, of course.

> Currently when I want to read the header of an %opt_33, I cast it to a
> i32* and then use getelementptr -1. Is there a better way?

I think it depends on (1) exactly how your runtime environment lays out the header and (2) whether you're trying to create portable IR (as opposed to creating IR portably). Personally, I would create a struct type (hereafter "HeaderType") for the entire GC header; when you want to access a header field, just cast the base pointer to i8*, subtract the allocation size of HeaderType, cast the result to HeaderType*, and getelementptr from there. That doesn't make portable IR, of course, but in the absence of an ungetelementptr instruction, I'm not sure how you could do that better. You can portably get the allocation size of a type using TargetData::getTypeSizeInBits().

John.
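John's recipe (byte pointer, step back by the header size, reinterpret as HeaderType*) can be sketched in C++ as follows. HeaderType here is a hypothetical one-field struct, not MLton's actual header, and place_with_header is a hypothetical helper that only exists to build a test object.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Hypothetical GC header; MLton's real header layout may differ.
struct HeaderType {
  uint32_t length;
};

// John's recipe: treat the object pointer as a byte pointer, step back by
// the size of the header, and reinterpret the result as HeaderType*.
// sizeof already includes any tail padding, so it plays the role of the
// "allocation size" of HeaderType.
inline HeaderType* header_ptr(void* obj) {
  assert(obj != nullptr);
  char* bytes = static_cast<char*>(obj);
  return reinterpret_cast<HeaderType*>(bytes - sizeof(HeaderType));
}

// Constructs a header at the start of caller-provided storage and returns
// the interior pointer just past it, i.e. what the mutator would see.
inline void* place_with_header(void* storage, uint32_t length) {
  HeaderType* h = new (storage) HeaderType{length};
  return reinterpret_cast<char*>(h) + sizeof(HeaderType);
}
```

The advantage over indexing an i32* by -1 is that adding fields to HeaderType only changes the struct, not every access site.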
On Sat, Jun 13, 2009 at 9:44 PM, John McCall <rjmccall at apple.com> wrote:
> On Jun 13, 2009, at 3:54 AM, Wesley W. Terpstra wrote:
>> Currently I just represent %c as i8*. I assume that this can have
>> consequences in terms of aliasing. I tried opaque*, but llvm-as didn't
>> like that. Is there any way to better represent the type %c to LLVM?
>
> I assume this is for tagged sums.

Yes.

> Logically, what you want is a distinct LLVM type for every ML union type
> and each of its constructors. Unfortunately, LLVM does structural
> uniquing of types, so that won't work.

Is there absolutely no way to generate a new type? Not even an 'opaque' one?

> What you can do is abuse address
> spaces, giving every distinct type its own address space and casting
> back and forth between address spaces as necessary.

The manual indicates that only addresses in space 0 can have GC intrinsics used on them. Also, I get the impression that this would be a pretty unsafe idea. ;)

>> Is there any way to express that a pointer is actually a pointer to an
>> interior element of a type? Something like %opt_33_in_heap
>> %opt_33_with_header:1 ?
>
> Something like an ungetelementptr? No, sorry. That would be a
> pretty nice extension, though obviously unsound, of course.

Well, ungetelementptr could be nice, but I was hoping for something even better: a way to refer to the whole object type (including the header) even though my pointer doesn't point to the start of the object, i.e. "this is a pointer to 8 bytes past type X". That way, for normal access I punch down to the object part of the type and do my business. For access to the header, I just punch into that part of the type (which happens to involve a negative offset from the address). However, it seems that LLVM pointers always have to point to the start of an object.

> Personally, I would create a struct type (hereafter "HeaderType") for the
> entire GC header; when you want to access a header field, just cast the
> base pointer to i8*, subtract the allocation size of HeaderType, cast the
> result to HeaderType*, and getelementptr from there.

That's what I'm doing right now; the HeaderType happens to be i32. ;) I am assuming that casting in and out of i8* will cost me in terms of the optimizations LLVM can apply...?

Also, I couldn't find a no-op instruction in LLVM. In some places it would be convenient to say '%x = %y'. For the moment I'm doing a bitcast from the type back to itself, which is rather awkward.
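The "pointer to N bytes past type X" idea resembles what C programmers do by hand with the container_of idiom (used, for example, in the Linux kernel): given a pointer to a member, subtract the member's offset to recover the enclosing struct. The sketch below mirrors %opt_33 and %opt_33_with_header with hypothetical C++ structs; it is an analogy for the requested "ungetelementptr", not an LLVM feature.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical C++ mirrors of %opt_33 = type { i32, %opt_45*, float }
// and %opt_33_with_header = type { i32, %opt_33 }.
struct Opt45;  // left incomplete; only pointers to it are used

struct Opt33 {
  int32_t a;
  Opt45* next;
  float f;
};

struct Opt33WithHeader {
  uint32_t header;
  Opt33 obj;
};

// A hand-written "ungetelementptr": from a pointer to the obj member,
// recover the enclosing Opt33WithHeader by subtracting the member's
// offset. Nothing verifies that p really sits inside an Opt33WithHeader,
// which is exactly why such an operation is unsound in general.
inline Opt33WithHeader* enclosing(Opt33* p) {
  assert(p != nullptr);
  char* bytes = reinterpret_cast<char*>(p);
  return reinterpret_cast<Opt33WithHeader*>(
      bytes - offsetof(Opt33WithHeader, obj));
}
```

Once the enclosing pointer is recovered, both the header and the object are reached with ordinary positive field offsets, which is the access style Wesley describes.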
On Saturday 13 June 2009 11:54:06 Wesley W. Terpstra wrote:
> Good afternoon!
>
> I'm trying to write an LLVM codegen for a Standard ML compiler
> (MLton)...

You may be interested in my HLVM project, a VM built upon LLVM and designed for MLs:

  http://hlvm.forge.ocamlcore.org

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
Does this help?
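#include <cstdint>
#include <cstdio>

struct TaggedUnionType {
  // The tag records which union member currently holds a valid value.
  enum { Byte, Char, Int } Type;
  union {
    uint8_t b;
    char c;
    uint32_t i;
  };
};

int main(int argc, char *argv[])
{
    TaggedUnionType t;
    t.Type = TaggedUnionType::Int;
    t.i = 0xAA55;
    // Dispatch on the tag and read only the member it names.
    switch (t.Type)
    {
    case TaggedUnionType::Byte:
      {
        printf("Byte = %02x\n", t.b);
        break;
      }
    case TaggedUnionType::Char:
      {
        printf("Char = %c\n", t.c);
        break;
      }
    case TaggedUnionType::Int:
      {
        printf("Int = %08x\n", static_cast<unsigned>(t.i));
        break;
      }
    }
    return 0;
}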
Output from the LLVM disassembler:
; ModuleID = '/tmp/webcompile/_613_0.bc'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32"
target triple = "i386-pc-linux-gnu"
  %struct.TaggedUnionType = type { i32, %"struct.TaggedUnionType::._11" }
  %"struct.TaggedUnionType::._11" = type { i32 }
@.str = internal constant [13 x i8] c"Byte = %0x2\0A\00"  ; <[13 x i8]*> [#uses=1]
@.str1 = internal constant [11 x i8] c"Char = %c\0A\00"  ; <[11 x i8]*> [#uses=1]
@.str2 = internal constant [12 x i8] c"Int = %0x8\0A\00"  ; <[12 x i8]*> [#uses=1]

define i32 @main(i32 %argc, i8** %argv) {
entry:
  %argc_addr = alloca i32  ; <i32*> [#uses=1]
  %argv_addr = alloca i8**  ; <i8***> [#uses=1]
  %retval = alloca i32  ; <i32*> [#uses=2]
  %t = alloca %struct.TaggedUnionType  ; <%struct.TaggedUnionType*> [#uses=6]
  %0 = alloca i32  ; <i32*> [#uses=2]
  %"alloca point" = bitcast i32 0 to i32  ; <i32> [#uses=0]
  store i32 %argc, i32* %argc_addr
  store i8** %argv, i8*** %argv_addr
  %1 = getelementptr %struct.TaggedUnionType* %t, i32 0, i32 0  ; <i32*> [#uses=1]
  store i32 2, i32* %1, align 4
  %2 = getelementptr %struct.TaggedUnionType* %t, i32 0, i32 1  ; <%"struct.TaggedUnionType::._11"*> [#uses=1]
  %3 = getelementptr %"struct.TaggedUnionType::._11"* %2, i32 0, i32 0  ; <i32*> [#uses=1]
  store i32 43605, i32* %3, align 4
  %4 = getelementptr %struct.TaggedUnionType* %t, i32 0, i32 0  ; <i32*> [#uses=1]
  %5 = load i32* %4, align 4  ; <i32> [#uses=1]
  switch i32 %5, label %bb3 [
    i32 0, label %bb
    i32 1, label %bb1
    i32 2, label %bb2
  ]

bb:  ; preds = %entry
  %6 = getelementptr %struct.TaggedUnionType* %t, i32 0, i32 1  ; <%"struct.TaggedUnionType::._11"*> [#uses=1]
  %7 = getelementptr %"struct.TaggedUnionType::._11"* %6, i32 0, i32 0  ; <i32*> [#uses=1]
  %8 = bitcast i32* %7 to i8*  ; <i8*> [#uses=1]
  %9 = load i8* %8, align 4  ; <i8> [#uses=1]
  %10 = zext i8 %9 to i32  ; <i32> [#uses=1]
  %11 = call i32 (i8*, ...)* @printf(i8* noalias getelementptr ([13 x i8]* @.str, i32 0, i32 0), i32 %10)  ; <i32> [#uses=0]
  br label %bb3

bb1:  ; preds = %entry
  %12 = getelementptr %struct.TaggedUnionType* %t, i32 0, i32 1  ; <%"struct.TaggedUnionType::._11"*> [#uses=1]
  %13 = getelementptr %"struct.TaggedUnionType::._11"* %12, i32 0, i32 0  ; <i32*> [#uses=1]
  %14 = bitcast i32* %13 to i8*  ; <i8*> [#uses=1]
  %15 = load i8* %14, align 4  ; <i8> [#uses=1]
  %16 = sext i8 %15 to i32  ; <i32> [#uses=1]
  %17 = call i32 (i8*, ...)* @printf(i8* noalias getelementptr ([11 x i8]* @.str1, i32 0, i32 0), i32 %16)  ; <i32> [#uses=0]
  br label %bb3

bb2:  ; preds = %entry
  %18 = getelementptr %struct.TaggedUnionType* %t, i32 0, i32 1  ; <%"struct.TaggedUnionType::._11"*> [#uses=1]
  %19 = getelementptr %"struct.TaggedUnionType::._11"* %18, i32 0, i32 0  ; <i32*> [#uses=1]
  %20 = load i32* %19, align 4  ; <i32> [#uses=1]
  %21 = call i32 (i8*, ...)* @printf(i8* noalias getelementptr ([12 x i8]* @.str2, i32 0, i32 0), i32 %20)  ; <i32> [#uses=0]
  br label %bb3

bb3:  ; preds = %bb2, %bb1, %bb, %entry
  store i32 0, i32* %0, align 4
  %22 = load i32* %0, align 4  ; <i32> [#uses=1]
  store i32 %22, i32* %retval, align 4
  br label %return

return:  ; preds = %bb3
  %retval4 = load i32* %retval  ; <i32> [#uses=1]
  ret i32 %retval4
}

declare i32 @printf(i8* noalias, ...)
Use

    http://llvm.org/demo/index.cgi

to convert the code, making sure optimization is turned off; otherwise the optimizer is good enough to fold most of this example away!
The other approach is C++-style inheritance: a base class that holds the
tag, with each subtype as a derived class.
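A minimal sketch of that inheritance approach, reusing the Byte/Char/Int cases from the union example. The class names Tagged, ByteCase, CharCase, IntCase and the function payload are hypothetical; the point is that the tag lives in the base class and the code switches on it before an unchecked static_cast downcast (no RTTI needed).

```cpp
#include <cassert>
#include <cstdint>

// Base class carries the tag shared by every constructor of the sum type.
struct Tagged {
  enum class Kind : uint8_t { Byte, Char, Int };
  Kind kind;
  explicit Tagged(Kind k) : kind(k) {}
};

struct ByteCase : Tagged {
  uint8_t b;
  explicit ByteCase(uint8_t v) : Tagged(Kind::Byte), b(v) {}
};

struct CharCase : Tagged {
  char c;
  explicit CharCase(char v) : Tagged(Kind::Char), c(v) {}
};

struct IntCase : Tagged {
  uint32_t i;
  explicit IntCase(uint32_t v) : Tagged(Kind::Int), i(v) {}
};

// Dispatch on the tag, then downcast to the concrete case. The downcast is
// unchecked; reading the tag first is what makes it safe.
inline uint32_t payload(const Tagged& t) {
  switch (t.kind) {
    case Tagged::Kind::Byte:
      return static_cast<const ByteCase&>(t).b;
    case Tagged::Kind::Char:
      // Go through unsigned char so negative chars do not sign-extend.
      return static_cast<unsigned char>(static_cast<const CharCase&>(t).c);
    case Tagged::Kind::Int:
      return static_cast<const IntCase&>(t).i;
  }
  assert(false && "unknown tag");
  return 0;
}
```

Unlike the union version, each case only occupies as much space as its own payload, at the cost of needing pointers (or references) to the base class wherever the sum type is passed around.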
Hope this helps,
Aaron
2009/6/13 Wesley W. Terpstra <wesley at terpstra.ca>