thr3ads.net - llvm dev - [LLVMdev] New Type System Questions [Jul 2011]


Talin

2011-Jul-25 05:59 UTC

[LLVMdev] New Type System Questions

So far I'm really liking the new type system -- I've been able to simplify
my code generator in a number of areas. And the IR is now vastly more
readable, both in the debugger (using dump()) and when printing modules via
llvm-dis. It's a tremendous improvement.

I do have a few comments / questions:

-- I think I may be misunderstanding how named structs are supposed to be
combined in the linker. Say we have a type that is defined in two modules
with the same name, however in one of the modules the type is abstract and
in the other module it has a body. The behavior I would expect is that it
would merge the two definitions, so that now you have one type with a body.
However, instead what I am getting is a lot of renamed types
- %tart.reflect.NameTable.3562 and so on. This is puzzling, as I shouldn't
have any renamed types in my modules at all.

-- I notice that BitReader now catches some errors that are missed by the
module verifier. (I submitted a bug report on this). Basically, you can
create an abstract type and have a GEP instruction that uses that type - and
it will pass through the module verifier and the bitcode writer, but the
bitcode reader will assert when it tries to load it in. Yes, I can now
create modules that cause llvm-dis to abort :)

-- Self-referential vs. anonymous types. This is more of a comment than a
question: in my language, String literals are implemented as anonymous types
because the string data follows the header struct in memory. So basically
there's a named type with the format:

   tart.core.String = { %ObjectHeader, %tart.core.String*, int32, [0 x char] }

And then for a string literal of length N there's an anonymous type:

   { %ObjectHeader, %tart.core.String*, int32, [N x char] }

It's anonymous because it doesn't make sense to generate a new named type
for each different length of string. Now, the reason for the
%tart.core.String* field in the middle there is to support substring
references - substrings point to the original string, whereas non-substrings
point to themselves (I've left out a few fields for purposes of this
example.)

So to make the string literal we need to create a Constant whose type is an
anonymous struct and which has a pointer to itself embedded within it.
Turns out that you can do this with UndefValue as a placeholder, as long as
when you refine the undef, you pointer-cast the anon struct to the named
struct.
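
For illustration, here is roughly what the finished constant could look like in textual IR, emitted by a small Python helper. This is a sketch, not Tart's actual code: the helper name `string_literal_ir`, the symbol name, and the `zeroinitializer` header are placeholders, `i32`/`i8` stand in for the `int32`/`char` shorthand above, and the typed-pointer `bitcast` syntax assumes an LLVM release that still has typed pointers.

```python
def string_literal_ir(symbol: str, text: str) -> str:
    """Emit typed-pointer IR text for one self-referential string literal."""
    data = text.encode("utf-8")
    n = len(data)
    # Anonymous literal struct: same layout as %tart.core.String, but with
    # [N x i8] instead of [0 x i8] for the trailing character data.
    anon = f"{{ %ObjectHeader, %tart.core.String*, i32, [{n} x i8] }}"
    # Escape bytes for c"..." syntax: anything unsafe becomes a \XX hex pair.
    chars = "".join(
        chr(b) if 0x20 <= b < 0x7F and b not in (0x22, 0x5C) else f"\\{b:02X}"
        for b in data
    )
    # A non-substring literal points at itself, cast to the named type.
    self_ptr = (
        f"%tart.core.String* bitcast ({anon}* @{symbol} "
        f"to %tart.core.String*)"
    )
    return (
        f"@{symbol} = internal constant {anon} {{ "
        f"%ObjectHeader zeroinitializer, {self_ptr}, "
        f'i32 {n}, [{n} x i8] c"{chars}" }}'
    )


# Hypothetical symbol name, purely for demonstration.
print(string_literal_ir("str.hello", "hello"))
```

In textual IR the constant can refer to its own symbol directly, so the UndefValue placeholder step only comes up when building the constant through the C++ API.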

I only mention this because it took me a while to figure out, and it's the
kind of recipe that you might want to consider mentioning in the Programmer's
Manual, along with the recipe for creating self-referential named structs.

-- 
-- Talin

Garrison Venn

2011-Jul-25 13:35 UTC

Hi Talin,

On Jul 25, 2011, at 1:59, Talin wrote:
> So far I'm really liking the new type system -- I've been able to
> simplify my code generator in a number of areas. And the IR is now vastly
> more readable, both in the debugger (using dump()) and when printing
> modules via llvm-dis. It's a tremendous improvement.
>
> I do have a few comments / questions:
>
> -- I think I may be misunderstanding how named structs are supposed to be
> combined in the linker. Say we have a type that is defined in two modules
> with the same name, however in one of the modules the type is abstract and
> in the other module it has a body. The behavior I would expect is that it
> would merge the two definitions, so that now you have one type with a body.
> However, instead what I am getting is a lot of renamed types -
> %tart.reflect.NameTable.3562 and so on. This is puzzling, as I shouldn't
> have any renamed types in my modules at all.
> 
From an implementation perspective at least:

The named struct types are stored in the context. As soon as you create a new
StructType with a name that is already taken, the new StructType is given a
new name, as above. I believe you instead need to pull the "old" named struct
type from the context, and then set the body or just reference it. Since your
modules are sharing the same context, your issue is manifesting itself.
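
A toy Python model of the behaviour described above, as a sketch only: `Context`, `create_named`, and `get_or_create` are hypothetical names rather than LLVM's API, and the numeric suffix scheme is an assumption. It shows why creating a second struct under a taken name yields a renamed type, while looking the name up first lets you fill in the existing one.

```python
class Context:
    """Toy stand-in for a type context: at most one struct per name."""

    def __init__(self):
        self.named = {}       # name -> struct record
        self.next_suffix = 0  # assumed counter used for rename suffixes

    def create_named(self, name, body=None):
        # Always makes a NEW struct; a taken name gets a numeric suffix.
        unique = name
        while unique in self.named:
            self.next_suffix += 1
            unique = f"{name}.{self.next_suffix}"
        struct = {"name": unique, "body": body}  # body None means opaque
        self.named[unique] = struct
        return struct

    def get_or_create(self, name):
        # Reuse the struct already registered under this name, if any.
        existing = self.named.get(name)
        return existing if existing is not None else self.create_named(name)


ctx = Context()
fwd = ctx.create_named("tart.reflect.NameTable")           # opaque declaration
dup = ctx.create_named("tart.reflect.NameTable", ["i32"])  # collides: renamed
same = ctx.get_or_create("tart.reflect.NameTable")         # finds fwd again
same["body"] = ["i32"]                                     # body set in place
print(dup["name"], same is fwd)
```

The lookup-first path is the one that avoids suffixes; the create-always path is what produces names such as `%tart.reflect.NameTable.3562`.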

Garrison

[snip]

Talin

2011-Jul-25 15:52 UTC

On Mon, Jul 25, 2011 at 6:35 AM, Garrison Venn <gvenn.cfe.dev at gmail.com> wrote:

> Hi Talin,
>
> On Jul 25, 2011, at 1:59, Talin wrote:
>
> > So far I'm really liking the new type system -- I've been able to
> > simplify my code generator in a number of areas. And the IR is now
> > vastly more readable, both in the debugger (using dump()) and when
> > printing modules via llvm-dis. It's a tremendous improvement.
> >
> > I do have a few comments / questions:
> >
> > -- I think I may be misunderstanding how named structs are supposed to
> > be combined in the linker. Say we have a type that is defined in two
> > modules with the same name, however in one of the modules the type is
> > abstract and in the other module it has a body. The behavior I would
> > expect is that it would merge the two definitions, so that now you have
> > one type with a body. However, instead what I am getting is a lot of
> > renamed types - %tart.reflect.NameTable.3562 and so on. This is
> > puzzling, as I shouldn't have any renamed types in my modules at all.
>
> From an implementation perspective at least:
>
> The named struct types are stored in the context. As soon as you create a
> new StructType with the same name it will instead name your new
> "StructType" to new name as above. I believe you instead need to pull the
> "old" named struct type from the context, and then set the body or just
> reference it. Since your modules are sharing the same context, your issue
> is manifesting itself.
>
I guess I wasn't clear. I'm talking about two bitcode files which I then
feed into llvm-ld. None of my code is involved at this point.

I should mention that I'm seeing these renamed types in the debugger -
because of the other issue I mentioned (abort in BitReader) I haven't
actually seen what the merged output looks like.

> Garrison
>
> [snip]
>

-- 
-- Talin

Chris Lattner

2011-Jul-26 04:43 UTC

On Jul 24, 2011, at 10:59 PM, Talin wrote:
> So far I'm really liking the new type system -- I've been able to
> simplify my code generator in a number of areas. And the IR is now vastly
> more readable, both in the debugger (using dump()) and when printing
> modules via llvm-dis. It's a tremendous improvement.

Great! It was long overdue. Someone should have done it right back in 2002.
;-)

> I do have a few comments / questions:
>
> -- I think I may be misunderstanding how named structs are supposed to be
> combined in the linker. Say we have a type that is defined in two modules
> with the same name, however in one of the modules the type is abstract and
> in the other module it has a body. The behavior I would expect is that it
> would merge the two definitions, so that now you have one type with a body.
> However, instead what I am getting is a lot of renamed types -
> %tart.reflect.NameTable.3562 and so on. This is puzzling, as I shouldn't
> have any renamed types in my modules at all.

As I responded on the other thread, name preservation is best-effort but not
guaranteed.

Consider if you linked these two modules:

%a = type { i32 }
@G1 = internal global %a ...
...and...
%a = type { float }
@G2 = internal global %a ...

G1 and G2 are just "static" globals with no relation and no linkage to
each other. When the linker produces a result file, it needs both versions of
"%a", so one *must* be renamed. There are also issues when modules
have conflicting definitions and there *is* linkage.

Beyond this inherent issue, the place that type uniquing happens is at the LLVM
Context level. This is the place that holds "the one true i32" and
thus "the one true i32*" etc. Because this is where uniquing happens,
this is now also where named struct uniquing happens. This means that you
can't have two different types named the same thing in the same context.
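
As a rough picture of context-level uniquing, here is a toy Python model (the `Type` and `Context` classes and their methods are hypothetical, not LLVM's API). Asking the same context twice for the same type returns the same object, while a separate context holds its own copy.

```python
class Type:
    """A uniqued type; object identity (not equality) is what matters."""

    def __init__(self, key):
        self.key = key


class Context:
    def __init__(self):
        self._interned = {}  # structural key -> the single Type object

    def _intern(self, key):
        if key not in self._interned:
            self._interned[key] = Type(key)
        return self._interned[key]

    def int_type(self, bits):
        return self._intern(("int", bits))

    def pointer_to(self, pointee):
        # Keyed on the pointee object itself, which is already unique.
        return self._intern(("ptr", pointee))


ctx = Context()
i32 = ctx.int_type(32)
same_context = ctx.pointer_to(i32) is ctx.pointer_to(ctx.int_type(32))
other_context = Context().int_type(32) is i32
print(same_context, other_context)
```

Because everything is keyed per context, a name can only be bound to one struct within that context, which is the constraint behind the renaming.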

Linking bitcode necessarily requires loading multiple modules into the same
context, so when the second module is loaded (but before linking happens) any
conflicts in the second module are auto-renamed. The linker then tries to (best
effort) rewrite the second module's types in terms of the first module's
types where possible.
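
The load-then-remap sequence can be sketched with a toy Python model. Everything here is an assumption for illustration: `load_module` and `link_types` are hypothetical helpers, the suffix scheme is invented, and the compatibility rule (opaque on either side, or identical bodies) is far simpler than anything a real linker would use.

```python
def load_module(context, module_types):
    """Register a module's named structs; return {original name: name used}."""
    names = {}
    for name, body in module_types.items():
        unique, n = name, 0
        while unique in context:   # name already taken: auto-rename
            n += 1
            unique = f"{name}.{n}"
        context[unique] = body     # body None means opaque (no body yet)
        names[name] = unique
    return names


def link_types(context, names):
    """Best effort: fold renamed types back onto the earlier same-named type."""
    mapping = {}
    for original, unique in names.items():
        if unique == original:
            continue               # no conflict, nothing to remap
        first, second = context[original], context[unique]
        if first is None or second is None or first == second:
            # Compatible: keep one type, taking whichever body exists.
            context[original] = first if first is not None else second
            del context[unique]
            mapping[unique] = original
    return mapping


context = {}
load_module(context, {"a": ["i32"], "NameTable": None})
second = load_module(context, {"a": ["float"], "NameTable": ["i8*", "i32"]})
merged = link_types(context, second)
print(second, merged, sorted(context))
```

In this model the opaque `NameTable` picks up its body and loses the suffix, while the two unrelated `a` types cannot be merged, so one of them stays renamed.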

> -- I notice that BitReader now catches some errors that are missed by the
> module verifier. (I submitted a bug report on this). Basically, you can
> create an abstract type and have a GEP instruction that uses that type -
> and it will pass through the module verifier and the bitcode writer, but
> the bitcode reader will assert when it tries to load it in. Yes, I can now
> create modules that cause llvm-dis to abort :)

Cool, I'll take a look at the PR when I get some cycles. Thanks for the
advice on building strings!

-Chris
