thr3ads.net - llvm dev - [LLVMdev] C embedded extensions and LLVM [Nov 2007]


Christopher Lamb

2007-Nov-11 07:07 UTC

[LLVMdev] C embedded extensions and LLVM

I've been playing around with clang/LLVM looking at adding partial  
support for the draft technical report for embedded C extensions  
(TR18037, http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf),  
specifically named address spaces.

Named address spaces need to be tracked in LLVM in essentially all
the same places that alignment is tracked, which necessitates adding
the information to the .bc format. Given that Apple has shipped .bc
files, I'm guessing that backwards compatibility is very important.
Given this, and the work I see happening on the newish
serialize/deserialize infrastructure, what is the pattern for extending
the .bc format in a backwards-compatible way? Is it safe to add records to
the writer for an instruction and predicate parts of the reader on
the number of records present, so that old .bc files with fewer
records for that entry can still be read?

--
Christopher Lamb




Gordon Henriksen

2007-Nov-11 14:55 UTC

[LLVMdev] C embedded extensions and LLVM

On Nov 11, 2007, at 02:07, Christopher Lamb wrote:
> I've been playing around with clang/LLVM looking at adding partial  
> support for the draft technical report for embedded C extensions  
> (TR18037, http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf),
> specifically named address spaces.
>
> Named address spaces need to be tracked in LLVM in essentially all  
> the same places that alignment is tracked, which necessitates adding  
> the information to the .bc format. Given that Apple has shipped .bc
> files I'm guessing that backwards compatibility is very important.  
> Given this and the work I see happening on using the newish  
> serialize/deserialize infrastructure what is the pattern for  
> extending the .bc format in a backwards compatible way? Is it safe  
> to add records to the writer for an instruction and predicate parts  
> of the reader based on the number of records present so that old .bc  
> files with fewer records for that entry should still be able to be  
> read?

It's easy enough to extend a bitcode record in a compatible manner.

   • The writer should place new fields only at the end of a record.
     Earlier readers will ignore them.
   • If a record comes up short, the reader should select a
     backwards-compatible default.

This provides backwards and forwards compatibility, which is great,  
and surprisingly simple to accomplish.
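
The reader side of these two rules can be sketched as follows. This is a minimal illustration, not LLVM's actual bitcode reader: the record layout, the `LoadFields` struct, and `readLoadRecord` are all hypothetical names invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical decoded form of a load record; not LLVM's real layout.
struct LoadFields {
  uint64_t pointerId;
  uint64_t alignment;
  bool isVolatile;
  uint64_t addressSpace; // the newly appended field
};

// Decode a record that may come from an older or a newer writer.
LoadFields readLoadRecord(const std::vector<uint64_t> &record) {
  // The three original fields are mandatory in every version.
  assert(record.size() >= 3 && "malformed load record");
  LoadFields fields;
  fields.pointerId = record[0];
  fields.alignment = record[1];
  fields.isVolatile = record[2] != 0;
  // Old files stop after three fields: fall back to the default space 0.
  // Fields past index 3 (from some future writer) are simply never read.
  fields.addressSpace = record.size() > 3 ? record[3] : 0;
  return fields;
}
```

Because the reader indexes only the fields it knows about and checks the length before touching the new one, the same function accepts short, current, and longer records.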

Sounds like you're adding fields to load and store nodes. To reduce  
the cost for programs that do not use memory spaces, you might try to  
optimize the representation by taking advantage of the default value  
when encoding the record.
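
One way to take advantage of the default when encoding is to drop the trailing field whenever it holds the default value. The sketch below assumes a made-up three-field load record with the address space appended as a fourth field; `writeLoadRecord` is a hypothetical helper, not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical writer for an illustrative load record.
std::vector<uint64_t> writeLoadRecord(uint64_t pointerId, uint64_t alignment,
                                      bool isVolatile, uint64_t addressSpace) {
  std::vector<uint64_t> record = {pointerId, alignment,
                                  static_cast<uint64_t>(isVolatile)};
  // Emit the trailing field only when it differs from the default (0),
  // so programs that never use address spaces pay no extra bits.
  if (addressSpace != 0)
    record.push_back(addressSpace);
  return record;
}

// Usage: the default space costs nothing, a non-default one costs one field.
inline void exampleUsage() {
  assert(writeLoadRecord(7, 4, false, 0).size() == 3);
  assert(writeLoadRecord(7, 4, false, 5).size() == 4);
}
```

A record written for the default address space is then identical to what an older writer would have produced.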

— Gordon

Ted Kremenek

2007-Nov-11 16:13 UTC

[LLVMdev] C embedded extensions and LLVM

On Nov 10, 2007, at 11:07 PM, Christopher Lamb wrote:
>  Given this and the work I see happening on using the newish  
> serialize/deserialize infrastructure what is the pattern for  
> extending the .bc format in a backwards compatible way?
FYI, there are no current plans to replace the implementation of the
LLVM bitcode reader/writer with something that uses the new
serialize/deserialize infrastructure.  It is possible, however, that it
could be used as a convenient tool to add new kinds of records to the bitcode.

The serialize/deserialize infrastructure is intended to be another API  
that sits just above the Bitstream reader/writer (which the LLVM  
bitcode reader/writer is built on), and its role is to serialize  
arbitrary objects using logic provided via C++ trait classes.  The  
serializer keeps track of pointers and references (allowing objects  
with multiple pointers to them to be safely and transparently  
serialized, or even cyclic data structures).  The serializer also  
allows almost complete transparency of the underlying bitstream format  
(including the notion of blocks and records), although the goal is to  
provide an interface to those details should the client need it (this  
is gradually taking form).
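
To illustrate the pointer tracking described above, here is a minimal sketch of the general technique. It is not the clang serializer's API: `Node`, `PointerTable`, and `serialize` are hypothetical names, and the flat output format is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical node type with a possibly cyclic link.
struct Node {
  uint64_t value;
  Node *next;
};

// Assigns each distinct pointer a stable ID the first time it is seen.
class PointerTable {
  std::unordered_map<const void *, uint64_t> ids;

public:
  // ID 0 is reserved for null; real objects are numbered from 1.
  // Sets isNew when the object has not been seen before.
  uint64_t idFor(const void *ptr, bool &isNew) {
    isNew = false;
    if (!ptr)
      return 0;
    auto inserted = ids.emplace(ptr, ids.size() + 1);
    isNew = inserted.second;
    return inserted.first->second;
  }
};

// Flatten a linked structure into (value, nextId) pairs, one pair per object.
std::vector<uint64_t> serialize(const Node *root) {
  PointerTable table;
  std::vector<uint64_t> out;
  bool isNew = false;
  table.idFor(root, isNew);
  assert(root == nullptr || isNew); // a non-null root is always unseen
  for (const Node *n = root; n && isNew;) {
    out.push_back(n->value);
    // A pointer to an already-seen object is written as its ID only,
    // so shared and cyclic references terminate instead of looping.
    out.push_back(table.idFor(n->next, isNew));
    n = n->next;
  }
  return out;
}
```

A deserializer would do the reverse: keep a table from ID to reconstructed object and patch each `next` link once its target exists.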

The serializer's big role right now is to support serialization of
data structures in the new C frontend.  This includes ASTs and all
the supporting metadata needed to serialize out a C program and read
it back in.  We have made a good deal of progress on this
project.  Thus, initially, our goals with the serializer don't have to
contend with problems of backwards compatibility with an existing
application.  Our goal is to first get serialization "right" for
clang, but at the same time it isn't being engineered as an API that
will only be useful in the new frontend.

Chris Lattner

2007-Nov-11 17:52 UTC

[LLVMdev] C embedded extensions and LLVM

On Nov 10, 2007, at 11:07 PM, Christopher Lamb wrote:
> I've been playing around with clang/LLVM looking at adding partial  
> support for the draft technical report for embedded C extensions  
> (TR18037, http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf),
> specifically named address spaces.
>
> Named address spaces need to be tracked in LLVM in essentially all  
> the same places that alignment is tracked,
Others addressed the other questions, but there is one (surprising?)
thing I'd recommend:

Unlike alignment and volatility, I think that the address space  
qualifier should be represented explicitly in the type system.  The  
reason for this is primarily that pointers to different address spaces  
are really very different sorts of beasties: for example they can be  
codegen'd to have different sizes.  Any property that affects how the  
value is stored in registers needs to be in the type instead of on the  
load/store instruction.  Also, unlike volatile, it is not common to  
cast a pointer between two different address spaces.

The good thing about this is that I think it will make it  
substantially easier to update the various llvm optimizations if you  
do this.  The meat of the project boils down to adding a new address space
qualifier field to PointerType, making sure PointerType takes this
field into account when it is being uniqued, and adding the address
space qualifier to things like global variables.
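
The uniquing requirement can be sketched with stand-in classes. This is only an illustration of the idea under stated assumptions: the `Type`, `IntType`, and `PointerType` classes below are simplified placeholders written for this example, not LLVM's real class hierarchy or uniquing tables.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>

// Stand-ins for LLVM's classes; names and layout are illustrative only.
struct Type {
  virtual ~Type() = default;
};
struct IntType : Type {};

class PointerType : public Type {
  const Type *element;
  unsigned addressSpace;
  PointerType(const Type *e, unsigned as) : element(e), addressSpace(as) {}

public:
  const Type *getElementType() const { return element; }
  unsigned getAddressSpace() const { return addressSpace; }

  // Uniquing: the key includes the address space, so a pointer in space 0
  // and one in space 27 are distinct types, while equal requests share
  // a single object.
  static const PointerType *get(const Type *element,
                                unsigned addressSpace = 0) {
    assert(element && "pointee type must not be null");
    static std::map<std::pair<const Type *, unsigned>,
                    std::unique_ptr<PointerType>>
        table;
    auto &slot = table[{element, addressSpace}];
    if (!slot)
      slot.reset(new PointerType(element, addressSpace));
    return slot.get();
  }
};
```

With the address space in the key, pointer equality on types keeps working as a type-equality test, and code that ignores address spaces sees only the default space 0.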

Does this sound reasonable?

-Chris

Christopher Lamb

2007-Nov-11 19:18 UTC

[LLVMdev] C embedded extensions and LLVM

On Nov 11, 2007, at 9:52 AM, Chris Lattner wrote:
>
> On Nov 10, 2007, at 11:07 PM, Christopher Lamb wrote:
>
>> I've been playing around with clang/LLVM looking at adding partial
>> support for the draft technical report for embedded C extensions  
>> (TR18037, http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf),
>> specifically named address spaces.
>>
>> Named address spaces need to be tracked in LLVM in essentially all  
>> the same places that alignment is tracked,
>
> Others addressed the other questions, one (surprising?) thing I'd  
> recommend:
>
> Unlike alignment and volatility, I think that the address space  
> qualifier should be represented explicitly in the type system.  The  
> reason for this is primarily that pointers to different address  
> spaces are really very different sorts of beasties: for example  
> they can be codegen'd to have different sizes.
Very true.
> Any property that affects how the value is stored in registers  
> needs to be in the type instead of on the load/store instruction.   
> Also, unlike volatile, it is not common to cast a pointer between  
> two different address spaces.
Though perhaps infrequent, casting between address spaces is allowed
based on rules that the target defines indicating which address
spaces are subsets of others. Does supporting those casts require an
explicit operation (i.e., an intrinsic)?
> The good thing about this is that I think it will make it  
> substantially easier to update the various llvm optimizations if  
> you do this.
Bonus!
> The meat of the project boils down to adding a new address space
> qualifier field to PointerType, making sure PointerType takes this
> field into account when it is being uniqued, and adding the address
> space qualifier to things like global variables.
>
> Does this sound reasonable?
That sounds like it should be easier than adding the address space ID  
to all the instructions and SDNodes.

I'll give it a try and see what happens. I can see that adding it to  
the type system makes it easier on the optimizer, but I don't yet  
understand all the consequences for the code generator.
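
As a rough illustration of the target-defined subset rules mentioned earlier in this message, a cast check might look like the sketch below. Everything here is hypothetical: `AddressSpaceRules`, its methods, and the space numbers are invented for the example, and it takes the conservative reading that only inner-to-outer casts are accepted.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Hypothetical target description: records which address spaces are
// subsets of which others. Space numbers are made up for illustration.
class AddressSpaceRules {
  std::set<std::pair<unsigned, unsigned>> subsetOf; // (inner, outer)

public:
  void addSubset(unsigned inner, unsigned outer) {
    assert(inner != outer && "a space is trivially a subset of itself");
    subsetOf.insert({inner, outer});
  }

  // A cast is accepted when the source space is the destination itself
  // or a declared subset of it. Transitive closure is not computed here.
  bool canCast(unsigned from, unsigned to) const {
    return from == to || subsetOf.count({from, to}) != 0;
  }
};
```

Whether such a cast needs its own operation or can reuse an existing cast is exactly the open question above; the sketch only shows where a target's rules could be consulted.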

--
Christopher Lamb




Christopher Lamb

2007-Nov-21 05:47 UTC

[LLVMdev] C embedded extensions and LLVM

On Nov 11, 2007, at 9:52 AM, Chris Lattner wrote:
> Unlike alignment and volatility, I think that the address space  
> qualifier should be represented explicitly in the type system.  The  
> reason for this is primarily that pointers to different address  
> spaces are really very different sorts of beasties: for example  
> they can be codegen'd to have different sizes.  Any property that  
> affects how the value is stored in registers needs to be in the  
> type instead of on the load/store instruction.  Also, unlike  
> volatile, it is not common to cast a pointer between two different  
> address spaces.
>
> The good thing about this is that I think it will make it  
> substantially easier to update the various llvm optimizations if  
> you do this.  The meat of the project boils down to adding a new
> address space qualifier field to PointerType, making sure
> PointerType takes this field into account when it is being uniqued,
> and adding the address space qualifier to things like global variables.
Any suggestions on type syntax in .ll files for pointers qualified with
an address space? I was thinking of a postfix on the type name, but I'm
up in the air about what a good separator would be. Simply whitespace?

i32$27*

i32 27*

i32(27)*

i32{27}*

i32@27*

--
Christopher Lamb




Christopher Lamb

2007-Nov-21 23:25 UTC

[LLVMdev] C embedded extensions and LLVM

On Nov 11, 2007, at 9:52 AM, Chris Lattner wrote:
>
> On Nov 10, 2007, at 11:07 PM, Christopher Lamb wrote:
>
>> I've been playing around with clang/LLVM looking at adding partial
>> support for the draft technical report for embedded C extensions  
>> (TR18037, http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf),
>> specifically named address spaces.
>>
>> Named address spaces need to be tracked in LLVM in essentially all  
>> the same places that alignment is tracked,
>
> Others addressed the other questions, one (surprising?) thing I'd  
> recommend:
>
> Unlike alignment and volatility, I think that the address space  
> qualifier should be represented explicitly in the type system.  The  
> reason for this is primarily that pointers to different address  
> spaces are really very different sorts of beasties: for example  
> they can be codegen'd to have different sizes.  Any property that  
> affects how the value is stored in registers needs to be in the  
> type instead of on the load/store instruction.  Also, unlike  
> volatile, it is not common to cast a pointer between two different  
> address spaces.
>
> The good thing about this is that I think it will make it  
> substantially easier to update the various llvm optimizations if  
> you do this.  The meat of the project boils down to adding a new
> address space qualifier field to PointerType, making sure
> PointerType takes this field into account when it is being uniqued,
> and adding the address space qualifier to things like global variables.
>
> Does this sound reasonable?
I've come across a hitch. Store instructions do not reference the  
pointer type in the .bc format, only the stored type. The .bc reader  
constructs the pointer type from the stored value's type. This means  
that the address space information doesn't come along for the ride.

I see three solutions:

1) Change how stores are written/read in .bc to store the pointer
type rather than the stored type. This is the most straightforward,
but I think it also breaks .bc compatibility in a way that's
impossible to work around. There's no way to differentiate the new
and old forms.

2) Have an extended record form of stores that carries the address  
space information for the pointer type which then gets restored by  
the reader. This preserves backwards compatibility, but is kind of ugly.

3) Store address space information on all types (not just pointers),  
but it only really affects how pointers are handled. This ensures  
that address spaces go wherever the type goes. This is pretty  
invasive, and I'd like to avoid that overhead if at all possible.

My suggestion would be 2 for now with an intention to change to 1 in  
LLVM 3.0.
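
For concreteness, option 2 might look roughly like the sketch below on the reader side. The record layout, `PointerTypeRef`, and `pointerTypeForStore` are assumptions made up for illustration; the real store record in the .bc format may be laid out differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative stand-in for a pointer type: pointee type ID plus address space.
struct PointerTypeRef {
  uint64_t pointeeTypeId;
  unsigned addressSpace;
};

// Hypothetical store record: [valueTypeId, valueId, pointerId] in the old
// form, with an optional fourth field carrying the address space in the
// extended form.
PointerTypeRef pointerTypeForStore(const std::vector<uint64_t> &record) {
  assert(record.size() >= 3 && "malformed store record");
  // As before, the pointee type is derived from the stored value's type.
  PointerTypeRef type{record[0], 0};
  // Only the extended form says which address space the pointer lives in.
  if (record.size() > 3)
    type.addressSpace = static_cast<unsigned>(record[3]);
  return type;
}
```

Old files, which never carry the fourth field, decode to the default address space 0, so their meaning is unchanged.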

--
Christopher Lamb



