Several people have been proding me to write up my thoughts on how to fix the IR type system for LLVM 3.0. Here are some (fairly stream of conscious) thoughts on the matter: http://nondot.org/sabre/LLVMNotes/TypeSystemRewrite.txt Comments welcome! -Chris
On 26 February 2011 21:25, Chris Lattner <clattner at apple.com> wrote:> Several people have been proding me to write up my thoughts on how to fix the IR type system for LLVM 3.0. Here are some (fairly stream of conscious) thoughts on the matter: > http://nondot.org/sabre/LLVMNotes/TypeSystemRewrite.txtHi Chris, I don't see such a bad problem with PATypeHolder, but I get what you say about the type refinement. In the end, the logic is the same. You hold the type in a temporary state, build it as you go and mark it as "complete" when you finish. If you use type holders or the types themselves, it doesn't matter, as long as there is a way to mark them as complete. In the end, you still need to reject incomplete types on validation. For IR readability, I welcome types being handled by name. This has been a bit of a nuisance, but again, nothing terrible. Since you're refactoring types, how about a quick detour to see if it's possible to include unions and bitfields in the IR as well? I'm not saying to write a full-blown spec, but would be good at least to consider, so when someone finally have time to implement them, it's not impossible... I think there are plenty of people in this list, from different backgrounds and languages, that would welcome it. I can send the first round of proposals, but you guys will have to help me with the back-end implementation, as it's not my strong side. cheers, --renato
Hey Chris, On Sat, Feb 26, 2011 at 1:25 PM, Chris Lattner <clattner at apple.com> wrote:> Several people have been proding me to write up my thoughts on how to fix the IR type system for LLVM 3.0. Here are some (fairly stream of conscious) thoughts on the matter: > http://nondot.org/sabre/LLVMNotes/TypeSystemRewrite.txtAwesome! This is definitely long overdue... Proposal seems very reasonable to me, I think the linker logic will become much nicer overall by making things more explicit. Strictly speaking, does this need to be a 3.0 feature? I haven't thought about it, and don't know that part of the code particularly well, but is the idea it would be too hard to auto-upgrade to this? - Daniel> > Comments welcome! > > -Chris > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
On Sat, Feb 26, 2011 at 10:25 PM, Chris Lattner <clattner at apple.com> wrote:> Several people have been proding me to write up my thoughts on how to fix the IR type system for LLVM 3.0. Here are some (fairly stream of conscious) thoughts on the matter: > http://nondot.org/sabre/LLVMNotes/TypeSystemRewrite.txt > > Comments welcome!So struct types would unique by name. How important would it be for the linker to preserve those names? Because I can think of a few examples where it could be problematic. For instance, if you want to link %Foo = type { i32 } %Bar = type { i32 } %myFoo = global %Foo declare void @useBar(%Bar* %arg) ; ... with %Baz = type { i32 } %myFoo = global %Baz declare void @useBar(%Baz* %par) ; ... then AFAICS the linker would either have to insert some casts (ugly) or it would need to somehow recognize that %Baz, %Foo *and* %Bar need to be merged into one type. Inserting casts would mean that a way to cast between different struct types is needed (for instance, bitcast could be extended to allow casting between struct types). Which raises the question of what casts to allow: a) between struct types that are structurally identical (which leads us back to graph isomorphism, though with only two types at a time to worry about), or b) between struct types with the same size (which might not be knowable without target data), or c) some other criterium I haven't thought of yet. Merging pre-existing struct types (i.e. types in the source module) would likely be problematic as well unless something like a "use list" of types is still maintained. In other words, you still wouldn't be able to get rid of PATypeHolder and friends. Also, this wouldn't preserve the names of some of the struct types, which you mentioned as a design goal. Is there some other solution for this I'm not seeing, or is one of the ones I mentioned less problematic than it appears at first glance?
On Feb 26, 2011, at 2:40 PM, Daniel Dunbar wrote:> Hey Chris, > > On Sat, Feb 26, 2011 at 1:25 PM, Chris Lattner <clattner at apple.com> wrote: >> Several people have been proding me to write up my thoughts on how to fix the IR type system for LLVM 3.0. Here are some (fairly stream of conscious) thoughts on the matter: >> http://nondot.org/sabre/LLVMNotes/TypeSystemRewrite.txt > > Awesome! This is definitely long overdue... > > Proposal seems very reasonable to me, I think the linker logic will > become much nicer overall by making things more explicit. > > Strictly speaking, does this need to be a 3.0 feature? I haven't > thought about it, and don't know that part of the code particularly > well, but is the idea it would be too hard to auto-upgrade to this?No, I expect it to be quite compatible with existing .ll files (at least those generated by the asmwriter, it's not an accident that it does certain things certain wayts). However, implementing it in a rush before the 2.9 branch seems unwise. It will also remove some IR features (like \1*) which seems unfriendly in a 2.x. -Chris
On Feb 26, 2011, at 2:35 PM, Renato Golin wrote:> On 26 February 2011 21:25, Chris Lattner <clattner at apple.com> wrote: >> Several people have been proding me to write up my thoughts on how to fix the IR type system for LLVM 3.0. Here are some (fairly stream of conscious) thoughts on the matter: >> http://nondot.org/sabre/LLVMNotes/TypeSystemRewrite.txt > > Hi Chris, > > I don't see such a bad problem with PATypeHolder, but I get what you > say about the type refinement. > > In the end, the logic is the same. You hold the type in a temporary > state, build it as you go and mark it as "complete" when you finish. > If you use type holders or the types themselves, it doesn't matter, as > long as there is a way to mark them as complete. In the end, you still > need to reject incomplete types on validation. > > For IR readability, I welcome types being handled by name. This has > been a bit of a nuisance, but again, nothing terrible.Right, I don't see either as a showstopper. The top things for me personally are getType() being slow, and type refinement being slow.> Since you're refactoring types, how about a quick detour to see if > it's possible to include unions and bitfields in the IR as well? I'm > not saying to write a full-blown spec, but would be good at least to > consider, so when someone finally have time to implement them, it's > not impossible... I think there are plenty of people in this list, > from different backgrounds and languages, that would welcome it.These are quite orthogonal from the changes proposed. I think that unions are well understood (we just need someone who cares enough about non-C languages to make it happen). I don't think that bitfields should be in IR. -Chris
On Feb 26, 2011, at 2:53 PM, Frits van Bommel wrote:> On Sat, Feb 26, 2011 at 10:25 PM, Chris Lattner <clattner at apple.com> wrote: >> Several people have been proding me to write up my thoughts on how to fix the IR type system for LLVM 3.0. Here are some (fairly stream of conscious) thoughts on the matter: >> http://nondot.org/sabre/LLVMNotes/TypeSystemRewrite.txt >> >> Comments welcome! > > So struct types would unique by name. How important would it be for > the linker to preserve those names?Not at all. The linker will do a best approximation, but it will autorename on unresolvable conflicts, just like we do for internal functions when linking.> Because I can think of a few > examples where it could be problematic. For instance, if you want to > link > > %Foo = type { i32 } > %Bar = type { i32 } > > %myFoo = global %Foo > declare void @useBar(%Bar* %arg) > ; ... > > with > > %Baz = type { i32 } > > %myFoo = global %Baz > declare void @useBar(%Baz* %par) > ; ... > > then AFAICS the linker would either have to insert some casts (ugly) > or it would need to somehow recognize that %Baz, %Foo *and* %Bar need > to be merged into one type.In the system I'm envisioning, you'd get a bitcast. The same thing happens today when linking modules that contain things like: a.ll: declare void @foo(i32) ... uses of @foo ... b.ll: define void @foo(i8*) ...> Inserting casts would mean that a way to cast between different struct > types is needed (for instance, bitcast could be extended to allow > casting between struct types).We already have this logic. The function itself is cast and instcombine attempts to resolve conflicts where reasonable. There is no IR extension needed for this. -Chris
+1 On Sat, Feb 26, 2011 at 1:25 PM, Chris Lattner <clattner at apple.com> wrote:> Several people have been proding me to write up my thoughts on how to fix > the IR type system for LLVM 3.0. Here are some (fairly stream of > conscious) thoughts on the matter: > http://nondot.org/sabre/LLVMNotes/TypeSystemRewrite.txt > > Comments welcome! > > -Chris > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-- -- Talin -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110226/baacdaeb/attachment.html>
With the named/unnamed/anonymous distinctions there seems to be two things going on: 1. Giving an ID to a struct type. 2. Naming a type for readability. Would the new system force anonymous structs to be printed inline at every occurrence? The code might become hard to read with large anonymous types. Would it make sense to have both ID and Name? If a type has a name it could be printed using that. Perhaps there could be an extension to indicate that a type has an ID (something like "%'foo" vs. "%foo"). When printing with -strip any type with an ID (maybe a number) would be printed using the ID and a type without ID would have to be printed inline. My opinion about the syntax is that if there is a semantic difference between two types it should be very clear in the syntax (maybe a "'" isn't clear enough). In my mind, naming things in LLVM is used more for convenience or readability, rather than specifying semantics. - Jan ----- Original Message ----> From: Chris Lattner <clattner at apple.com> > To: llvmdev Mailing List <llvmdev at cs.uiuc.edu> > Sent: Sat, February 26, 2011 4:25:32 PM > Subject: [LLVMdev] LLVM IR Type System Rewrite > > Several people have been proding me to write up my thoughts on how to fix the >IR type system for LLVM 3.0. Here are some (fairly stream of conscious) >thoughts on the matter: > http://nondot.org/sabre/LLVMNotes/TypeSystemRewrite.txt > > Comments welcome! > > -Chris > _______________________________________________ > LLVM M Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
On 11-02-26 4:25 PM, Chris Lattner wrote:> Several people have been proding me to write up my thoughts on how to fix the IR type system for LLVM 3.0. Here are some (fairly stream of conscious) thoughts on the matter: > http://nondot.org/sabre/LLVMNotes/TypeSystemRewrite.txtThanks!> Comments welcome!Having fewer types makes the job of the passes easier. Right now we maintain a small number of types as we go, which is expensive. In the proposed change, would it still be legal to write a type merging pass? For example, lets say we have %Foo = type { i32 } %Bar = type { i32 } @a = unnamed_addr constant %Foo { i32 42 } @a = unnamed_addr constant %Bar { i32 42 } The typemerge pass could turn this into @a = unnamed_addr constant {i32} { i32 42 } @b = unnamed_addr constant {i32} { i32 42 } And constmerge will then merge them as it does today. Another comment is that getting anonymous types would still be a bit expensive. Can we get away with just Named types (including Unnamed) and trusting typemerge to do its job?> -ChrisCheers, Rafael
On Feb 27, 2011, at 2:42 AM, Jan Sjodin wrote:> With the named/unnamed/anonymous distinctions there seems to be two things going > on: > 1. Giving an ID to a struct type. > 2. Naming a type for readability. > > Would the new system force anonymous structs to be printed inline at every > occurrence?Yes, but I don't expect them to be commonly used. Things like complex numbers and simple tuples in a frontend might want to use them, but the frontend could always choose to name them as well. I strongly considered just making all struct types have an "identity", thus eliminating anonymous types. This is effectively what we get in the .ll printer today, but it is a nuisance for simple pairs - like multi-return intrinsics. It still may not be worth the complexity to have them.> The code might become hard to read with large anonymous types. Would > it make sense to have both ID and Name? If a type has a name it could be printed > using that. Perhaps there could be an extension to indicate that a type has an > ID (something like "%'foo" vs. "%foo"). When printing with -strip any type with > an ID (maybe a number) would be printed using the ID and a type without ID would > have to be printed inline.I think the problem is solved by having frontend authors not generate types that are hard to read.> My opinion about the syntax is that if there is a semantic difference between > two types it should be very clear in the syntax (maybe a "'" isn't clear > enough). In my mind, naming things in LLVM is used more for convenience or > readability, rather than specifying semantics.Fair point, -Chris
On Feb 27, 2011, at 9:48 AM, Rafael Ávila de Espíndola wrote:>> Comments welcome! > > Having fewer types makes the job of the passes easier. Right now we > maintain a small number of types as we go, which is expensive. In the > proposed change, would it still be legal to write a type merging pass? > For example, lets say we have > > %Foo = type { i32 } > %Bar = type { i32 } > @a = unnamed_addr constant %Foo { i32 42 } > @a = unnamed_addr constant %Bar { i32 42 } > > The typemerge pass could turn this into > > @a = unnamed_addr constant {i32} { i32 42 } > @b = unnamed_addr constant {i32} { i32 42 } > > And constmerge will then merge them as it does today.I don't see this as requiring a type merging pass. A better way to handle this (IMO) would be to generalize constant merging to handle bit identities, which would allow it to merge things like: @a = unnamed constant i32 0 @b = unnamed constant float 0.0 This can be done today, and then the type system change won't affect constant merging.> Another comment is that getting anonymous types would still be a bit > expensive. Can we get away with just Named types (including Unnamed) and > trusting typemerge to do its job?Getting an anonymous type should be the same cost as getting a struct type today: just look it up in a (redesigned) uniquing table. However, as I just mentioned to Jan, it might be better to just drop the concept of anonymous types entirely. This would simplify the model and is even closer to what we have today. -Chris
Hi Chris I have questions about new IR.> Named structs. > Unnamed structs (named structs with no names) > Anonymous structs (things like complex, which cannot be cyclic) > >Given this, the rules above would be that "only non-anonymous structs can have >cycles" for example. Arrays, pointers and other non-namable types can be >considered to be anonymous as well. > >When printing out .ll files, anonymous structs (which cannot have cycles) are >just printed inline, as in "{i32, i32}*". Named structs are printed as their >name ("%foo*"), and unnamed structs are printed with a number ("%42*"). This is >nice and compatible with existing .ll files.When initializer type of Global Variable with StructType is different from type of Global Variable, current front-end makes an temporary StructType for initializer as unnamed struct ("%42"). In this case, Is temporary StructType for initializer Unnamed sturct (named structs with no names)? And will be key(StructValType) of StructTypes(map) the number of elements and isPacked in current IR plus a name? (if so, isPacked is likely to be redundant in some types.) Thanks, Jin-Gu Kang _______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
This looks like a good extension to the type system. One thing that I expected to work (when I first tried using opaque types) was for the optimization passes and verification to operate correctly when using opaque types, while the type could be made concrete later on - after optimization. This turned out to be impossible, and in combination with the lack of naming for opaque types, I decided to avoid their use entirely. In particular, could you address whether this overhaul of the type system will address the following: - The ability to create constants for named types without a defined structure - Optimization involving PHI nodes for named types without a defined structure - The ability to pass in named types to external function call arguments and return values Basically, I'd like to be able to construct valid programs involving named types (without a defined structure), optimize/verify them, and then provide a structure for the type later on. Andrew On 02/26/2011 04:25 PM, Chris Lattner wrote:> Several people have been proding me to write up my thoughts on how to fix the IR type system for LLVM 3.0. Here are some (fairly stream of conscious) thoughts on the matter: > http://nondot.org/sabre/LLVMNotes/TypeSystemRewrite.txt > > Comments welcome! > > -Chris > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Feb 28, 2011, at 9:47 AM, Andrew Clinton wrote:> This looks like a good extension to the type system. One thing that I > expected to work (when I first tried using opaque types) was for the > optimization passes and verification to operate correctly when using > opaque types, while the type could be made concrete later on - after > optimization. This turned out to be impossible, and in combination with > the lack of naming for opaque types, I decided to avoid their use > entirely. In particular, could you address whether this overhaul of the > type system will address the following: > > - The ability to create constants for named types without a defined > structureThere is no change, this won't be allowed.> - Optimization involving PHI nodes for named types without a defined > structureNot sure what you mean here, but I don't anticipate any change.> - The ability to pass in named types to external function call arguments > and return valuesThis is representable in the IR, but will blow up at codegen time, just like today.> Basically, I'd like to be able to construct valid programs involving > named types (without a defined structure), optimize/verify them, and > then provide a structure for the type later on.I don't expect this series of changes to meaningfully impact this usage scenario, sorry! -Chris
On Feb 27, 2011, at 12:59 PM, Jin Gu Kang wrote:>> Given this, the rules above would be that "only non-anonymous structs can have >> cycles" for example. Arrays, pointers and other non-namable types can be >> considered to be anonymous as well. >> >> When printing out .ll files, anonymous structs (which cannot have cycles) are >> just printed inline, as in "{i32, i32}*". Named structs are printed as their >> name ("%foo*"), and unnamed structs are printed with a number ("%42*"). This is >> nice and compatible with existing .ll files. > > When initializer type of Global Variable with StructType is different from type of > Global Variable, current front-end makes an temporary StructType for initializer > as unnamed struct ("%42"). In this case, Is temporary StructType for initializer > Unnamed sturct (named structs with no names)? > > And will be key(StructValType) of StructTypes(map) the number of elements and > isPacked in current IR plus a name? > (if so, isPacked is likely to be redundant in some types.)isPacked is still needed because it affects layout of the struct. Having a name (or not) doesn't impact layout. I don't know exactly what policy the frontend will use, but it would make sense for it to be an anonymous struct type. -Chris
Chris Lattner wrote:> Several people have been proding me to write up my thoughts on how to fix the IR type system for LLVM 3.0. Here are some (fairly stream of conscious) thoughts on the matter: > http://nondot.org/sabre/LLVMNotes/TypeSystemRewrite.txt > > Comments welcome!I like this change a lot. I'm not sure if it is feasible or generally desirable, but one thing that would have been very helpful in my use case would have been to be able to construct structure types incrementally and to be able to access slots of the un-finished type, for example: StructType *NewSTy = StructType::get("mylist", TheModule); NewSTy->addField(Type::getInt32Ty(NewSTy->getContext()); Then given a Value of type NewSTy, do a gep on slot 0.> > -Chris > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >============================================================================michaelMuller = mmuller at enduden.com | http://www.mindhog.net/~mmuller ----------------------------------------------------------------------------- Society in every state is a blessing, but government even in its best state is but a necessary evil; in its worst state an intolerable one... - Thomas Paine =============================================================================
On Sat, Feb 26, 2011 at 1:25 PM, Chris Lattner <clattner at apple.com> wrote:> Several people have been proding me to write up my thoughts on how to fix > the IR type system for LLVM 3.0. Here are some (fairly stream of > conscious) thoughts on the matter: > http://nondot.org/sabre/LLVMNotes/TypeSystemRewrite.txt > > Comments welcome! > > Chris - some comments in the other thread about garbage collection rootsmade me realize that I'd like to see the ability to associate metadata nodes with a struct type. Right now it makes no sense to add any additional traits, attributes, properties, or whatnot with a type, because the type may get merged with a structurally identical type and the attributes would thereby be lost. (Either that or the attribute has to become part of the type's signature, which gets complicated.) Under your scheme, however, as long as I can come up with a unique name for a struct, I can prevent the merging from happening, which now means that I ought to be able to hang attributes off the type and have them be preserved. So basically what I want is to add a metadata node pointer for structs. Metadata is open-ended enough to be able to represent anything we would want. Of course, this feature only holds for struct types - what about hanging metadata off of other types? Well, I think I can live with just structs, since in the worst case I can always wrap the other type in a struct. -- -- Talin -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110401/398f2fbb/attachment.html>