I've been playing around with clang/LLVM, looking at adding partial support for the draft technical report for embedded C extensions (TR18037, http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf), specifically named address spaces.

Named address spaces need to be tracked in LLVM in essentially all the same places that alignment is tracked, which necessitates adding the information to the .bc format. Given that Apple has shipped .bc files, I'm guessing that backwards compatibility is very important. Given this, and the work I see happening on the newish serialize/deserialize infrastructure, what is the pattern for extending the .bc format in a backwards-compatible way? Is it safe to add records to the writer for an instruction, and to predicate parts of the reader on the number of records present, so that old .bc files with fewer records for that entry can still be read?

-- Christopher Lamb
On Nov 11, 2007, at 02:07, Christopher Lamb wrote:

> I've been playing around with clang/LLVM looking at adding partial
> support for the draft technical report for embedded C extensions
> (TR18037, http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf),
> specifically named address spaces.
>
> Named address spaces need to be tracked in LLVM in essentially all
> the same places that alignment is tracked, which necessitates adding
> the information to the .bc format. Given that Apple has shipped .bc
> files I'm guessing that backwards compatibility is very important.
> Given this and the work I see happening on using the newish
> serialize/deserialize infrastructure what is the pattern for
> extending the .bc format in a backwards compatible way? Is it safe
> to add records to the writer for an instruction and predicate parts
> of the reader based on the number of records present so that old .bc
> files with fewer records for that entry should still be able to be
> read?

It's easy enough to extend a bitcode record in a compatible manner:

• The writer should place new fields only at the end of a record. Earlier readers will ignore them.
• If a record comes up short, the reader should select a backwards-compatible default.

This provides both backwards and forwards compatibility, which is great, and it is surprisingly simple to accomplish.

It sounds like you're adding fields to load and store nodes. To reduce the cost for programs that do not use memory spaces, you might optimize the representation by taking advantage of the default value when encoding the record, e.g. by leaving the trailing field out entirely when it holds the default.

-- Gordon
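A minimal C++ sketch of the two rules above, plus the default-value optimization. The store record layout, the field positions, and the names `StoreFields`, `writeStore` and `readStore` are all hypothetical; a record is modeled as a plain `std::vector<uint64_t>` so the example stands alone, rather than using the real bitcode reader/writer classes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// A bitcode record is modeled here as a flat list of unsigned integers.
using Record = std::vector<uint64_t>;

// Hypothetical store layout (illustration only, not LLVM's real encoding):
//   old form:      [ptr, val, align, vol]
//   extended form: [ptr, val, align, vol, addrspace]
constexpr std::size_t kOldStoreFields = 4;
constexpr uint64_t kDefaultAddrSpace = 0;

struct StoreFields {
  uint64_t Ptr, Val, Align, Volatile, AddrSpace;
};

// Writer: the new field goes only at the end of the record, and is left out
// entirely when it holds the default, so code without address spaces pays nothing.
Record writeStore(const StoreFields &S) {
  Record R = {S.Ptr, S.Val, S.Align, S.Volatile};
  if (S.AddrSpace != kDefaultAddrSpace)
    R.push_back(S.AddrSpace);
  return R;
}

// Reader: a short record gets the backwards-compatible default, and any fields
// beyond the ones this reader knows about are simply ignored.
bool readStore(const Record &R, StoreFields &Out) {
  if (R.size() < kOldStoreFields)
    return false; // malformed: even the old form is incomplete
  Out.Ptr = R[0];
  Out.Val = R[1];
  Out.Align = R[2];
  Out.Volatile = R[3];
  Out.AddrSpace = R.size() > kOldStoreFields ? R[4] : kDefaultAddrSpace;
  return true;
}
```

With this scheme a store in the default address space is written as exactly the old four-field record, so an older reader parses it unchanged, while a newer reader fills in address space 0 for it.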
On Nov 10, 2007, at 11:07 PM, Christopher Lamb wrote:

> Given this and the work I see happening on using the newish
> serialize/deserialize infrastructure what is the pattern for
> extending the .bc format in a backwards compatible way?

FYI, there are no current plans to replace the implementation of the LLVM bitcode reader/writer with something that uses the new serialize/deserialize infrastructure. It is possible, however, that it could be used as a convenient tool to add new kinds of records to the bitcode.

The serialize/deserialize infrastructure is intended to be another API that sits just above the Bitstream reader/writer (which the LLVM bitcode reader/writer is built on). Its role is to serialize arbitrary objects using logic provided via C++ trait classes. The serializer keeps track of pointers and references, so objects that are pointed to from multiple places, and even cyclic data structures, can be serialized safely and transparently. The serializer also hides almost all of the underlying bitstream format (including the notion of blocks and records), although the goal is to provide an interface to those details should the client need it (this is gradually taking form).

The serializer's main role right now is to support serialization of data structures in the new C frontend. This includes ASTs and all the supporting metadata needed to serialize out a C program and read it back in. We have made a good deal of progress on this project so far. Thus, initially, our work on the serializer doesn't have to contend with backwards compatibility with an existing application. Our goal is to first get serialization "right" for clang, but at the same time it isn't being engineered as an API that will only be useful in the new frontend.
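To make the pointer-tracking idea concrete, here is a minimal C++ sketch of trait-driven serialization that gives each object an ID the first time it is written and emits only a back-reference afterwards, which is what lets shared and cyclic structures terminate. `SerializeTrait`, `PtrSerializer`, `emitPtr` and the tag encoding are hypothetical names chosen for illustration; this is not the API of the clang/LLVM serializer.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Clients specialize this trait to say how the fields of a T are written.
template <typename T> struct SerializeTrait;

class PtrSerializer {
  std::unordered_map<const void *, uint64_t> IDs; // object address -> ID
  uint64_t NextID = 1;

public:
  // Tag written before each pointer (hypothetical encoding).
  enum Tag : uint64_t { Null = 0, BackRef = 1, Def = 2 };
  std::vector<uint64_t> Out; // stand-in for the underlying bitstream

  void emitInt(uint64_t V) { Out.push_back(V); }

  template <typename T> void emitPtr(const T *P) {
    if (!P) {
      emitInt(Null);
      return;
    }
    auto It = IDs.find(P);
    if (It != IDs.end()) { // already written: emit a reference only
      emitInt(BackRef);
      emitInt(It->second);
      return;
    }
    // Register the ID *before* writing the body, so a cycle that leads back
    // to P becomes a back-reference instead of infinite recursion.
    uint64_t ID = NextID++;
    IDs[P] = ID;
    emitInt(Def);
    emitInt(ID);
    SerializeTrait<T>::emit(*this, *P);
  }
};

// Example client type: a singly linked node that may form a cycle.
struct Node {
  uint64_t Value;
  const Node *Next;
};

template <> struct SerializeTrait<Node> {
  static void emit(PtrSerializer &S, const Node &N) {
    S.emitInt(N.Value);
    S.emitPtr(N.Next);
  }
};
```

Serializing the two-node cycle A -> B -> A with this sketch writes A's definition, then B's definition, and finally a back-reference to A's ID rather than recursing forever.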
On Nov 10, 2007, at 11:07 PM, Christopher Lamb wrote:

> I've been playing around with clang/LLVM looking at adding partial
> support for the draft technical report for embedded C extensions
> (TR18037, http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf),
> specifically named address spaces.
>
> Named address spaces need to be tracked in LLVM in essentially all
> the same places that alignment is tracked,

Others have addressed your other questions, but here is one (surprising?) thing I'd recommend:

Unlike alignment and volatility, I think that the address space qualifier should be represented explicitly in the type system. The reason for this is primarily that pointers to different address spaces are really very different sorts of beasties: for example, they can be codegen'd to have different sizes. Any property that affects how the value is stored in registers needs to be in the type instead of on the load/store instruction. Also, unlike volatile, it is not common to cast a pointer between two different address spaces.

The good thing about this is that I think it will make it substantially easier to update the various LLVM optimizations. The meat of the project boils down to adding a new address space qualifier field to PointerType, making sure PointerType takes this field into account when it is being uniqued, and adding the address space qualifier to things like global variables.

Does this sound reasonable?

-Chris
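A minimal C++ sketch of the uniquing point: because LLVM types are uniqued, so that type equality is pointer equality, the address space has to be part of the key used to look up a pointer type; otherwise pointers to the same pointee in two different address spaces would collapse into one type. `Type`, `PointerType` and `TypeTable` below are simplified, hypothetical stand-ins, not LLVM's real classes.

```cpp
#include <map>
#include <memory>

struct Type {
  virtual ~Type() = default;
};

class PointerType : public Type {
  const Type *Elt;
  unsigned AddrSpace; // the new qualifier field; 0 is the default space
  PointerType(const Type *E, unsigned AS) : Elt(E), AddrSpace(AS) {}
  friend class TypeTable; // only the table may create pointer types

public:
  const Type *getElementType() const { return Elt; }
  unsigned getAddressSpace() const { return AddrSpace; }
};

class TypeTable {
  // Keyed on (pointee, address space): the address space takes part in uniquing.
  std::map<const Type *, std::map<unsigned, std::unique_ptr<PointerType>>> Ptrs;

public:
  const PointerType *getPointer(const Type *Elt, unsigned AddrSpace = 0) {
    std::unique_ptr<PointerType> &Slot = Ptrs[Elt][AddrSpace];
    if (!Slot)
      Slot.reset(new PointerType(Elt, AddrSpace));
    return Slot.get();
  }
};
```

With this keying, `getPointer(&I32, 27)` and `getPointer(&I32)` return distinct objects, so any pass that compares types by identity automatically treats the two pointers as different types, while repeated requests for the same pair still yield the same object.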
On Nov 11, 2007, at 9:52 AM, Chris Lattner wrote:

> On Nov 10, 2007, at 11:07 PM, Christopher Lamb wrote:
>
>> I've been playing around with clang/LLVM looking at adding partial
>> support for the draft technical report for embedded C extensions
>> (TR18037, http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf),
>> specifically named address spaces.
>>
>> Named address spaces need to be tracked in LLVM in essentially all
>> the same places that alignment is tracked,
>
> Others addressed the other questions, one (surprising?) thing I'd
> recommend:
>
> Unlike alignment and volatility, I think that the address space
> qualifier should be represented explicitly in the type system. The
> reason for this is primarily that pointers to different address
> spaces are really very different sorts of beasties: for example
> they can be codegen'd to have different sizes.

Very true.

> Any property that affects how the value is stored in registers
> needs to be in the type instead of on the load/store instruction.
> Also, unlike volatile, it is not common to cast a pointer between
> two different address spaces.

Though perhaps infrequent, casting between address spaces is allowed, based on rules that the target defines indicating which address spaces are subsets of others. Does supporting those casts require an explicit operation (i.e., an intrinsic)?

> The good thing about this is that I think it will make it
> substantially easier to update the various llvm optimizations if
> you do this.

Bonus!

> The meat of the project boils down to adding a new address space
> qualifier field to PointerType, making sure PointerType takes this
> field into account when it is being uniqued, and adding the address
> space qualifier to things like global variables.
>
> Does this sound reasonable?

That sounds like it should be easier than adding the address space ID to all the instructions and SDNodes. I'll give it a try and see what happens.
I can see that adding it to the type system makes it easier on the optimizer, but I don't yet understand all the consequences for the code generator.

-- Christopher Lamb
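One way to picture the target-defined subset rules mentioned above is a small table that the target fills in and that the front end or optimizer queries before allowing a cast. This is a hypothetical sketch: `AddrSpaceRules`, `declareSubset`, `canCastImplicitly` and `canCastAtAll` are invented names, and it assumes that converting a pointer from space A to space B is safe when A is a subset of B, and that spaces related in neither direction are disjoint. Whether the IR needs an explicit operation for such casts is a separate question the sketch does not settle.

```cpp
#include <set>
#include <utility>

// Hypothetical target description of address-space containment.
class AddrSpaceRules {
  // (A, B) in this set means address space A is a subset of address space B.
  // The target is expected to list every pair; no transitive closure is computed.
  std::set<std::pair<unsigned, unsigned>> SubsetOf;

public:
  void declareSubset(unsigned A, unsigned B) { SubsetOf.insert({A, B}); }

  // Every address space is trivially a subset of itself.
  bool isSubset(unsigned A, unsigned B) const {
    return A == B || SubsetOf.count({A, B}) != 0;
  }

  // Assumed rule: converting From -> To is safe when every location in From
  // is also a location in To.
  bool canCastImplicitly(unsigned From, unsigned To) const {
    return isSubset(From, To);
  }

  // Assumed rule: spaces related in neither direction are treated as
  // disjoint, so no cast between them is allowed at all.
  bool canCastAtAll(unsigned From, unsigned To) const {
    return isSubset(From, To) || isSubset(To, From);
  }
};
```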
On Nov 11, 2007, at 9:52 AM, Chris Lattner wrote:

> Unlike alignment and volatility, I think that the address space
> qualifier should be represented explicitly in the type system. The
> reason for this is primarily that pointers to different address
> spaces are really very different sorts of beasties: for example
> they can be codegen'd to have different sizes. Any property that
> affects how the value is stored in registers needs to be in the
> type instead of on the load/store instruction. Also, unlike
> volatile, it is not common to cast a pointer between two different
> address spaces.
>
> The good thing about this is that I think it will make it
> substantially easier to update the various llvm optimizations if
> you do this. The meat of the project boils down to adding a new
> address space qualifier field to PointerType, making sure
> PointerType takes this field into account when it is being uniqued,
> and adding the address space qualifier to things like global variables.

Any suggestions on type syntax in .ll files for address-spaced pointers? I was thinking of a postfix on the type name, but I'm up in the air about what a good separator would be. Simply whitespace?

i32$27*
i32 27*
i32(27)*
i32{27}*
i32@27*

-- Christopher Lamb
On Nov 11, 2007, at 9:52 AM, Chris Lattner wrote:

> On Nov 10, 2007, at 11:07 PM, Christopher Lamb wrote:
>
>> I've been playing around with clang/LLVM looking at adding partial
>> support for the draft technical report for embedded C extensions
>> (TR18037, http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf),
>> specifically named address spaces.
>>
>> Named address spaces need to be tracked in LLVM in essentially all
>> the same places that alignment is tracked,
>
> Others addressed the other questions, one (surprising?) thing I'd
> recommend:
>
> Unlike alignment and volatility, I think that the address space
> qualifier should be represented explicitly in the type system. The
> reason for this is primarily that pointers to different address
> spaces are really very different sorts of beasties: for example
> they can be codegen'd to have different sizes. Any property that
> affects how the value is stored in registers needs to be in the
> type instead of on the load/store instruction. Also, unlike
> volatile, it is not common to cast a pointer between two different
> address spaces.
>
> The good thing about this is that I think it will make it
> substantially easier to update the various llvm optimizations if
> you do this. The meat of the project boils down to adding a new
> address space qualifier field to PointerType, making sure
> PointerType takes this field into account when it is being uniqued,
> and adding the address space qualifier to things like global variables.
>
> Does this sound reasonable?

I've come across a hitch. Store instructions do not reference the pointer type in the .bc format, only the stored type. The .bc reader constructs the pointer type from the stored value's type. This means that the address space information doesn't come along for the ride. I see three solutions:

1) Change how stores are written/read in .bc to store the pointer type rather than the stored type.
   This is the most straightforward, but I think it also breaks .bc compatibility in a way that's impossible to work around: the old and new records would have the same shape, so there's no way to differentiate the new and old forms.

2) Have an extended record form of stores that carries the address space information for the pointer type, which then gets restored by the reader. This preserves backwards compatibility, but is kind of ugly.

3) Store address space information on all types (not just pointers), even though it only really affects how pointers are handled. This ensures that address spaces go wherever the type goes. This is pretty invasive, and I'd like to avoid that overhead if at all possible.

My suggestion would be 2 for now, with the intention of changing to 1 in LLVM 3.0.

-- Christopher Lamb
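A minimal sketch of the problem and of option 2, assuming option 2 is implemented as a separate record code. The record codes, the field layout and `PtrTypeKey` are hypothetical, not LLVM's real store encoding. The reader rebuilds the pointer type from the stored value's type, so the old form can only ever yield the default address space; a distinct code makes the extended form explicit while old records keep their meaning. Option 1, by contrast, would reuse the same slot for a different kind of type ID, leaving both forms as integer lists of the same length.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical record codes and layouts, for illustration only:
//   STORE_OLD       : [valty, ptr, val, align, vol]
//   STORE_ADDRSPACE : [valty, ptr, val, align, vol, addrspace]
enum StoreCode : unsigned { STORE_OLD = 1, STORE_ADDRSPACE = 2 };

// Stand-in for a uniqued pointer type: pointee type ID plus address space.
struct PtrTypeKey {
  uint64_t PointeeTypeID;
  uint64_t AddrSpace;
};

// The reader rebuilds the pointer operand's type from the stored value's type.
// With STORE_OLD alone the address space can only ever be the default (0),
// which is the information loss described above.
std::optional<PtrTypeKey> readStorePtrType(unsigned Code,
                                           const std::vector<uint64_t> &R) {
  switch (Code) {
  case STORE_OLD:
    if (R.size() < 5)
      return std::nullopt;
    return PtrTypeKey{R[0], 0};
  case STORE_ADDRSPACE:
    if (R.size() < 6)
      return std::nullopt;
    return PtrTypeKey{R[0], R[5]};
  default:
    return std::nullopt; // unknown record code
  }
}

// Writer: only use the extended form when the pointer is not in space 0,
// so existing readers keep working on code that doesn't use address spaces.
unsigned chooseStoreCode(uint64_t AddrSpace) {
  return AddrSpace == 0 ? STORE_OLD : STORE_ADDRSPACE;
}
```

The design choice here is to pay for the extended form only when it carries information, which keeps the common case byte-for-byte identical to what older readers expect.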