So far I'm really liking the new type system -- I've been able to simplify my code generator in a number of areas. And the IR is now vastly more readable, both in the debugger (using dump()) and when printing modules via llvm-dis. It's a tremendous improvement.

I do have a few comments / questions:

-- I think I may be misunderstanding how named structs are supposed to be combined in the linker. Say we have a type that is defined in two modules with the same name, however, in one of the modules the type is abstract and in the other module it has a body. The behavior I would expect is that the linker would merge the two definitions, so that you end up with one type with a body. Instead, what I am getting is a lot of renamed types - %tart.reflect.NameTable.3562 and so on. This is puzzling, as I shouldn't have any renamed types in my modules at all.

-- I notice that BitReader now catches some errors that are missed by the module verifier (I submitted a bug report on this). Basically, you can create an abstract type and have a GEP instruction that uses that type, and it will pass through the module verifier and the bitcode writer, but the bitcode reader will assert when it tries to load it in. Yes, I can now create modules that cause llvm-dis to abort :)

-- Self-referential vs. anonymous types. This is more of a comment than a question: in my language, String literals are implemented as anonymous types because the string data follows the header struct in memory. So basically there's a named type with the format:

tart.core.String = { %ObjectHeader, %tart.core.String*, int32, [0 x char] }

And then for a string literal of length N there's an anonymous type:

{ %ObjectHeader, %tart.core.String*, int32, [N x char] }

It's anonymous because it doesn't make sense to generate a new named type for each different length of string.
Now, the reason for the %tart.core.String* field in the middle is to support substring references: substrings point to the original string, whereas non-substrings point to themselves. (I've left out a few fields for purposes of this example.) So to make a string literal we need to create a Constant whose type is an anonymous struct and which has a pointer to itself embedded within it. It turns out that you can do this with UndefValue, as long as, when you refine the undef, you pointer-cast the anonymous struct to the named struct. I only mention this because it took me a while to figure out, and it's the kind of recipe that you might want to consider mentioning in the Programmer's Manual, along with the recipe for creating self-referential named structs.

--
-- Talin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110724/3adcb0f0/attachment.html>
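For reference, the self-referential string literal recipe described above can be sketched by its end result: a small Python function that emits the corresponding textual LLVM IR (the typed-pointer syntax of that era). This is not Talin's UndefValue-based C++ code. In IR text a global may refer to its own address inside its initializer, so the self-pointer is simply a bitcast of the literal's anonymous struct type to %tart.core.String*. Assumptions: `char` is i8, the header is zero-initialized, only the fields from the example are present, and `string_literal_ir`, `llvm_c_string`, the placeholder `%ObjectHeader` body and the global name are all hypothetical.

```python
# Hypothetical sketch: assumes 8-bit chars (i8) and a zero-initialized header.
TYPE_DECLS = (
    "%ObjectHeader = type { i8* }  ; placeholder header body, assumed\n"
    "%tart.core.String = type { %ObjectHeader, %tart.core.String*, i32, [0 x i8] }\n"
)


def llvm_c_string(data: bytes) -> str:
    """Escape raw bytes as an LLVM IR c"..." array constant."""
    parts = []
    for b in data:
        ch = chr(b)
        if 0x20 <= b < 0x7F and ch not in '"\\':
            parts.append(ch)            # printable ASCII is emitted as-is
        else:
            parts.append("\\%02X" % b)  # everything else becomes a \XX hex escape
    return 'c"' + "".join(parts) + '"'


def string_literal_ir(name: str, text: str) -> str:
    """Emit a literal whose anonymous type has an [N x i8] tail and a self-pointer."""
    data = text.encode("utf-8")
    n = len(data)
    # Anonymous struct: same leading fields as %tart.core.String, but [N x i8] data.
    anon = f"{{ %ObjectHeader, %tart.core.String*, i32, [{n} x i8] }}"
    # The global's own address, cast to the named type, fills the "original" field.
    self_ptr = f"bitcast ({anon}* @{name} to %tart.core.String*)"
    fields = [
        "%ObjectHeader zeroinitializer",
        f"%tart.core.String* {self_ptr}",
        f"i32 {n}",
        f"[{n} x i8] {llvm_c_string(data)}",
    ]
    return f"@{name} = internal constant {anon} {{ " + ", ".join(fields) + " }"


print(TYPE_DECLS + string_literal_ir("str0", "hello"))
```

Code that only needs the named view of the literal can use the same bitcast expression that the self-pointer field already holds.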
Hi Talin,

On Jul 25, 2011, at 1:59, Talin wrote:
> [snip]
>
> -- I think I may be misunderstanding how named structs are supposed to be combined in the linker. Say we have a type that is defined in two modules with the same name, however in one of the modules the type is abstract and in the other module it has a body. The behavior I would expect is that it would merge the two definitions, so that now you have one type with a body. However, instead what I am getting is a lot of renamed types - %tart.reflect.NameTable.3562 and so on. This is puzzling, as I shouldn't have any renamed types in my modules at all.

From an implementation perspective at least:

The named struct types are stored in the context. As soon as you create a new StructType with a name that is already in use, your new StructType is given a different name instead, like the one above. I believe you instead need to pull the "old" named struct type from the context, and then set its body or just reference it. Since your modules are sharing the same context, this is why your issue shows up.

Garrison

[snip]
> --
> -- Talin
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
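The renaming described above can be illustrated with a toy Python model. It is not LLVM's API: `ToyContext`, `create_struct`, `get_struct` and `set_body` are hypothetical names, and the counter is only a stand-in for the context-wide counter that, as far as I know, LLVM uses to build rename suffixes (which is why numbers such as .3562 can get large). In the C++ API of that era, the look-up-then-fill pattern roughly corresponds to Module::getTypeByName() followed by StructType::setBody().

```python
import itertools


class ToyContext:
    """Hypothetical model of per-context named-struct uniquing (not LLVM's API)."""

    def __init__(self):
        # name -> body (a tuple of field type strings), or None while opaque
        self.named = {}
        # Stand-in for the context-wide counter used to build rename suffixes.
        self._counter = itertools.count(1)

    def create_struct(self, name, body=None):
        """Always make a NEW type; if the name is taken, append '.N' to it."""
        unique = name
        while unique in self.named:
            unique = f"{name}.{next(self._counter)}"
        self.named[unique] = body
        return unique

    def get_struct(self, name):
        """Return the existing type with this exact name, or None."""
        return name if name in self.named else None

    def set_body(self, name, body):
        """Give an existing (typically opaque) type its fields."""
        self.named[name] = body


ctx = ToyContext()
opaque = ctx.create_struct("tart.reflect.NameTable")         # abstract, no body yet
dup = ctx.create_struct("tart.reflect.NameTable", ("i32",))  # renamed, not merged
print(opaque, dup)  # tart.reflect.NameTable tart.reflect.NameTable.1

existing = ctx.get_struct("tart.reflect.NameTable")  # reuse the original instead
ctx.set_body(existing, ("i32",))
```

Creating the type a second time silently yields a new, renamed type; looking it up first and then filling in the body keeps a single type in the context.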
On Mon, Jul 25, 2011 at 6:35 AM, Garrison Venn <gvenn.cfe.dev at gmail.com> wrote:
> Hi Talin,
>
> [snip]
>
> From an implementation perspective at least:
>
> The named struct types are stored in the context. [snip] Since your modules are sharing
> the same context, your issue is manifesting itself.
>
I guess I wasn't clear. I'm talking about two bitcode files which I then feed into llvm-ld. None of my code is involved at this point. I should mention that I'm seeing these renamed types in the debugger; because of the other issue I mentioned (the abort in BitReader), I haven't actually seen what the merged output looks like.
--
-- Talin
On Jul 24, 2011, at 10:59 PM, Talin wrote:
> So far I'm really liking the new type system -- I've been able to simplify my code generator in a number of areas. And the IR is now vastly more readable, both in the debugger (using dump()) and when printing modules via llvm-dis. It's a tremendous improvement.

Great! It was long overdue. Someone should have done it right back in 2002. ;-)

> I do have a few comments / questions:
>
> -- I think I may be misunderstanding how named structs are supposed to be combined in the linker. Say we have a type that is defined in two modules with the same name, however in one of the modules the type is abstract and in the other module it has a body. The behavior I would expect is that it would merge the two definitions, so that now you have one type with a body. However, instead what I am getting is a lot of renamed types - %tart.reflect.NameTable.3562 and so on. This is puzzling, as I shouldn't have any renamed types in my modules at all.

As I responded on the other thread, name preservation is best-effort but not guaranteed. Consider if you linked these two modules:

%a = type { i32 }
@G1 = internal global %a ...

...and...

%a = type { float }
@G2 = internal global %a ...

G1 and G2 are just "static" globals with no relation and no linkage to each other. When the linker produces a result file, it needs both versions of "%a", so one *must* be renamed. There are also issues when modules have conflicting definitions and there *is* linkage.

Beyond this inherent issue, type uniquing happens at the LLVMContext level. This is the place that holds "the one true i32" and thus "the one true i32*", etc. Because this is where uniquing happens, this is now also where named struct uniquing happens. This means that you can't have two different types with the same name in the same context.
Linking bitcode necessarily requires loading multiple modules into the same context, so when the second module is loaded (but before linking happens) any conflicting type names in the second module are auto-renamed. The linker then tries (best effort) to rewrite the second module's types in terms of the first module's types where possible.

> -- I notice that BitReader now catches some errors that are missed by the module verifier. (I submitted a bug report on this). Basically, you can create an abstract type and have a GEP instruction that uses that type - and it will pass through the module verifier and the bitcode writer, but the bitcode reader will assert when it tries to load it in. Yes, I can now create modules that cause llvm-dis to abort :)

Cool, I'll take a look at the PR when I get some cycles.

Thanks for the advice on building strings!

-Chris
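The load-then-map step Chris describes can be modeled in a few lines of Python. This is a hypothetical, simplified sketch, not the linker's actual algorithm: `link_struct_types` is an invented name, struct bodies are compared as flat tuples of type names, and an opaque type on either side is treated as compatible (the merge Talin expected). The real linker works on actual Type objects, has to recurse through nested and pointer types, and must also account for linkage.

```python
def link_struct_types(dest, loaded):
    """Hypothetical, simplified model of best-effort named-struct merging.

    dest   -- first module's structs: name -> body tuple, or None if opaque.
    loaded -- second module's structs after loading into the same context:
              auto-renamed name -> (name it had in its own file, body or None).
    Returns (mapping, dest): mapping sends each loaded name to the name used
    in the linked module; dest is updated in place.
    """
    mapping = {}
    for loaded_name, (orig_name, body) in loaded.items():
        if orig_name not in dest:
            # No clash with the first module: keep the original name.
            dest[orig_name] = body
            mapping[loaded_name] = orig_name
            continue
        dest_body = dest[orig_name]
        if dest_body is None or body is None or dest_body == body:
            # Compatible: reuse the first module's type, filling in an opaque body.
            if dest_body is None:
                dest[orig_name] = body
            mapping[loaded_name] = orig_name
        else:
            # Different layouts (e.g. { i32 } vs { float }): both must survive,
            # so the second one keeps its auto-generated name.
            dest[loaded_name] = body
            mapping[loaded_name] = loaded_name
    return mapping, dest


dest = {"a": ("i32",), "tart.reflect.NameTable": None}
loaded = {
    "a.1": ("a", ("float",)),
    "tart.reflect.NameTable.2": ("tart.reflect.NameTable", ("i32", "i8*")),
}
mapping, merged = link_struct_types(dest, loaded)
print(mapping)
# {'a.1': 'a.1', 'tart.reflect.NameTable.2': 'tart.reflect.NameTable'}
```

In this model the conflicting %a keeps its suffixed name, while the abstract-vs-defined pair collapses back into one type with a body.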