Hi everyone, Nick and Philip suggested something yesterday that I'd also thought about: supporting attributes on values (http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140908/234323.html). The primary motivation for this is to provide a way of attaching pointer information, such as noalias, nonnull and dereferenceable(n), to pointers generated by loads. Doing this for pointers generated by inttoptr also seems potentially useful. A problem that we currently have with C++ lambda captures from Clang (and will have with other similar constructs, like outlined OpenMP loop bodies), is that we end up packing things that would be function parameters into structures to pass to the generated function. Unfortunately, this means that, in the generated anonymous function, these values are generated by loads (not true function parameters), and so we lose our ability to assert things about them (like noalias, nonnull, etc.). How this might work: 1. Instead of only CallInst/InvokeInst having an AttributeSet member variable, all instructions would have one. Accessor functions like getAttributes would be moved from CallInst/InvokeInst to Instruction. Only 'AttributeSet::ReturnIndex' would be meaningful on most instructions. 2. For the text IR format: Like with call/invoke currently, we'd optionally parse attributes in between the instruction name ('load', etc.) and the first type parameter. So load would become: <result> = load [ret attrs] [volatile] <ty>* <pointer>[, align <alignment>] (...) allowing you to write: %val = load dereferenceable(32) i32** %ptr 3. The bitcode format extension would mirror the setup for metadata. A subblock type bitc::ATTRIBUTES_ATTACHMENT_ID would be added with a record array of [Instruction #, Attribute Set #]. This will allow attaching attribute sets to any instruction without increasing the bitcode record sizes for the instructions themselves. Obviously there is some potential functionality overlap here with metadata, and an alternate approach would be to add a metadata type that mirrors each attribute we'd like to model on values. Potential downsides to using metadata are: - It could be a lot of metadata (consider adding nonnull/dereferenceable(n) to every load of a C++ reference type) - We already have code that handles these as return attributes on call instructions; extending that to look at all instructions seems straightforward. Adding alternate code to also look at metadata also would require additional development for each attribute. One thing to keep in mind is that, like with metadata, the attributes can have control-flow dependencies, and we'd generally need to clear them when hoisting loads, etc. We already do this with metadata, at least for loads, so the places where we'd need to do this should hopefully be relatively-easy to find. What do you think? Thanks again, Hal -- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
>From a very high level, I wonder if maybe we should have encoded nonnulland dereferenceable as metadata instead of attributes. To me it seems like the dividing line is what is an optimization hint vs. what is a requirement for correctness or functionality. For example, sanitize_address, byval, sret, inreg, and inalloca cannot be dropped without changing program semantics. On the other hand, noalias, nonnull, readonly, readnone, and dereferenceable can all be dropped in an -O0 build. I think the bar for LangRef changes is sufficiently high that we can suffer a little bit of implementation complexity and use the existing metadata support for values instead of attributes. We could probably add a pass that infers metadata from call site and function prototype attributes. Just my two cents, you don't need listen to me, I don't write optimizations, I just break them. ;-) On Tue, Sep 9, 2014 at 1:36 AM, Hal Finkel <hfinkel at anl.gov> wrote:> Hi everyone, > > Nick and Philip suggested something yesterday that I'd also thought about: > supporting attributes on values ( > http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140908/234323.html). > The primary motivation for this is to provide a way of attaching pointer > information, such as noalias, nonnull and dereferenceable(n), to pointers > generated by loads. Doing this for pointers generated by inttoptr also > seems potentially useful. > > A problem that we currently have with C++ lambda captures from Clang (and > will have with other similar constructs, like outlined OpenMP loop bodies), > is that we end up packing things that would be function parameters into > structures to pass to the generated function. Unfortunately, this means > that, in the generated anonymous function, these values are generated by > loads (not true function parameters), and so we lose our ability to assert > things about them (like noalias, nonnull, etc.). > > How this might work: > > 1. Instead of only CallInst/InvokeInst having an AttributeSet member > variable, all instructions would have one. Accessor functions like > getAttributes would be moved from CallInst/InvokeInst to Instruction. Only > 'AttributeSet::ReturnIndex' would be meaningful on most instructions. > > 2. For the text IR format: Like with call/invoke currently, we'd > optionally parse attributes in between the instruction name ('load', etc.) > and the first type parameter. So load would become: > <result> = load [ret attrs] [volatile] <ty>* <pointer>[, align > <alignment>] (...) > allowing you to write: > %val = load dereferenceable(32) i32** %ptr > > 3. The bitcode format extension would mirror the setup for metadata. A > subblock type bitc::ATTRIBUTES_ATTACHMENT_ID would be added with a record > array of [Instruction #, Attribute Set #]. This will allow attaching > attribute sets to any instruction without increasing the bitcode record > sizes for the instructions themselves. > > Obviously there is some potential functionality overlap here with > metadata, and an alternate approach would be to add a metadata type that > mirrors each attribute we'd like to model on values. Potential downsides to > using metadata are: > - It could be a lot of metadata (consider adding > nonnull/dereferenceable(n) to every load of a C++ reference type) > - We already have code that handles these as return attributes on call > instructions; extending that to look at all instructions seems > straightforward. Adding alternate code to also look at metadata also would > require additional development for each attribute. > > One thing to keep in mind is that, like with metadata, the attributes can > have control-flow dependencies, and we'd generally need to clear them when > hoisting loads, etc. We already do this with metadata, at least for loads, > so the places where we'd need to do this should hopefully be > relatively-easy to find. > > What do you think? > > Thanks again, > Hal > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140909/9563ef7c/attachment.html>
> On Sep 9, 2014, at 1:36 AM, Hal Finkel <hfinkel at anl.gov> wrote: > > Hi everyone, > > Nick and Philip suggested something yesterday that I'd also thought about: supporting attributes on values (http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140908/234323.html). The primary motivation for this is to provide a way of attaching pointer information, such as noalias, nonnull and dereferenceable(n), to pointers generated by loads. Doing this for pointers generated by inttoptr also seems potentially useful. > > A problem that we currently have with C++ lambda captures from Clang (and will have with other similar constructs, like outlined OpenMP loop bodies), is that we end up packing things that would be function parameters into structures to pass to the generated function. Unfortunately, this means that, in the generated anonymous function, these values are generated by loads (not true function parameters), and so we lose our ability to assert things about them (like noalias, nonnull, etc.). > > How this might work: > > 1. Instead of only CallInst/InvokeInst having an AttributeSet member variable, all instructions would have one. Accessor functions like getAttributes would be moved from CallInst/InvokeInst to Instruction. Only 'AttributeSet::ReturnIndex' would be meaningful on most instructions.Do you have any idea what impact this would have on memory use? We’ve been seeing a lot of small regressions in unoptimized compile times, and they’re really starting to add up. Adding a member to all instructions will have a cost even when not optimizing. We need to be more careful about that.> > 2. For the text IR format: Like with call/invoke currently, we'd optionally parse attributes in between the instruction name ('load', etc.) and the first type parameter. So load would become: > <result> = load [ret attrs] [volatile] <ty>* <pointer>[, align <alignment>] (...) > allowing you to write: > %val = load dereferenceable(32) i32** %ptr > > 3. The bitcode format extension would mirror the setup for metadata. A subblock type bitc::ATTRIBUTES_ATTACHMENT_ID would be added with a record array of [Instruction #, Attribute Set #]. This will allow attaching attribute sets to any instruction without increasing the bitcode record sizes for the instructions themselves. > > Obviously there is some potential functionality overlap here with metadata, and an alternate approach would be to add a metadata type that mirrors each attribute we'd like to model on values. Potential downsides to using metadata are: > - It could be a lot of metadata (consider adding nonnull/dereferenceable(n) to every load of a C++ reference type)Metadata is really expensive but we could simply omit it when not optimizing. The tradeoff isn’t obvious to me, but it seems like it would be OK if limited to C++ lambda captures and things like outlined OpenMP loop bodies. If you’re talking about moving the attributes that are currently on call/invoke instructions to metadata, the cost may be too high.> - We already have code that handles these as return attributes on call instructions; extending that to look at all instructions seems straightforward. Adding alternate code to also look at metadata also would require additional development for each attribute. > > One thing to keep in mind is that, like with metadata, the attributes can have control-flow dependencies, and we'd generally need to clear them when hoisting loads, etc. We already do this with metadata, at least for loads, so the places where we'd need to do this should hopefully be relatively-easy to find. > > What do you think? > > Thanks again, > Hal > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
----- Original Message -----> From: "Bob Wilson" <bob.wilson at apple.com> > To: "Hal Finkel" <hfinkel at anl.gov> > Cc: "LLVM-Dev" <llvmdev at cs.uiuc.edu> > Sent: Tuesday, September 9, 2014 2:34:05 PM > Subject: Re: [LLVMdev] [RFC] Attributes on Values > > > > On Sep 9, 2014, at 1:36 AM, Hal Finkel <hfinkel at anl.gov> wrote: > > > > Hi everyone, > > > > Nick and Philip suggested something yesterday that I'd also thought > > about: supporting attributes on values > > (http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140908/234323.html). > > The primary motivation for this is to provide a way of attaching > > pointer information, such as noalias, nonnull and > > dereferenceable(n), to pointers generated by loads. Doing this for > > pointers generated by inttoptr also seems potentially useful. > > > > A problem that we currently have with C++ lambda captures from > > Clang (and will have with other similar constructs, like outlined > > OpenMP loop bodies), is that we end up packing things that would > > be function parameters into structures to pass to the generated > > function. Unfortunately, this means that, in the generated > > anonymous function, these values are generated by loads (not true > > function parameters), and so we lose our ability to assert things > > about them (like noalias, nonnull, etc.). > > > > How this might work: > > > > 1. Instead of only CallInst/InvokeInst having an AttributeSet > > member variable, all instructions would have one. Accessor > > functions like getAttributes would be moved from > > CallInst/InvokeInst to Instruction. Only > > 'AttributeSet::ReturnIndex' would be meaningful on most > > instructions. > > Do you have any idea what impact this would have on memory use? We’ve > been seeing a lot of small regressions in unoptimized compile times, > and they’re really starting to add up. Adding a member to all > instructions will have a cost even when not optimizing. We need to > be more careful about that.I don't know, but I'll see if I can make some reasonable measurements. Another possibility would be to add the AttributeSet only to LoadInst, IntToPtr, etc. corresponding to the use cases I currently understand.> > > > > 2. For the text IR format: Like with call/invoke currently, we'd > > optionally parse attributes in between the instruction name > > ('load', etc.) and the first type parameter. So load would become: > > <result> = load [ret attrs] [volatile] <ty>* <pointer>[, align > > <alignment>] (...) > > allowing you to write: > > %val = load dereferenceable(32) i32** %ptr > > > > 3. The bitcode format extension would mirror the setup for > > metadata. A subblock type bitc::ATTRIBUTES_ATTACHMENT_ID would be > > added with a record array of [Instruction #, Attribute Set #]. > > This will allow attaching attribute sets to any instruction > > without increasing the bitcode record sizes for the instructions > > themselves. > > > > Obviously there is some potential functionality overlap here with > > metadata, and an alternate approach would be to add a metadata > > type that mirrors each attribute we'd like to model on values. > > Potential downsides to using metadata are: > > - It could be a lot of metadata (consider adding > > nonnull/dereferenceable(n) to every load of a C++ reference type) > > Metadata is really expensive but we could simply omit it when not > optimizing. The tradeoff isn’t obvious to me, but it seems like it > would be OK if limited to C++ lambda captures and things like > outlined OpenMP loop bodies. If you’re talking about moving the > attributes that are currently on call/invoke instructions to > metadata, the cost may be too high.We could omit the attributes also when not optimizing (maybe we should be doing that now for the function parameter attributes we do emit). I'm not proposing to move attributes currently on call/invoke to metadata (although Reid did suggest that, also noting that they could be omitted at -O0), but if we had these attributes (nonnull, dereferenceable, etc.) on loads, for example, we could certainly emit them in more places in Clang's CodeGen -- this would likely provide improvements not just in lambda captures, etc. (although that is a key use case for this functionality, however we implement it). Thanks again, Hal> > > > - We already have code that handles these as return attributes on > > call instructions; extending that to look at all instructions > > seems straightforward. Adding alternate code to also look at > > metadata also would require additional development for each > > attribute. > > > > One thing to keep in mind is that, like with metadata, the > > attributes can have control-flow dependencies, and we'd generally > > need to clear them when hoisting loads, etc. We already do this > > with metadata, at least for loads, so the places where we'd need > > to do this should hopefully be relatively-easy to find. > > > > What do you think? > > > > Thanks again, > > Hal > > > > -- > > Hal Finkel > > Assistant Computational Scientist > > Leadership Computing Facility > > Argonne National Laboratory > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
On 09/09/2014 01:36 AM, Hal Finkel wrote:> Hi everyone, > > Nick and Philip suggested something yesterday that I'd also thought about: supporting attributes on values (http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140908/234323.html). The primary motivation for this is to provide a way of attaching pointer information, such as noalias, nonnull and dereferenceable(n), to pointers generated by loads. Doing this for pointers generated by inttoptr also seems potentially useful. > > A problem that we currently have with C++ lambda captures from Clang (and will have with other similar constructs, like outlined OpenMP loop bodies), is that we end up packing things that would be function parameters into structures to pass to the generated function. Unfortunately, this means that, in the generated anonymous function, these values are generated by loads (not true function parameters), and so we lose our ability to assert things about them (like noalias, nonnull, etc.). > > How this might work: > > 1. Instead of only CallInst/InvokeInst having an AttributeSet member variable, all instructions would have one. Accessor functions like getAttributes would be moved from CallInst/InvokeInst to Instruction. Only 'AttributeSet::ReturnIndex' would be meaningful on most instructions.From a usability perspective, I want to rename the enum (or maybe provide an alias?). More generally, the interface around attributes is an utter mess to work with. Before attempting to replace metadata (either in part or completely), I'd want to invest in creating an attributes API which was easier to understand and extend. To address the size question asked in a followup, I'll respond somewhat glibly. We already have metadata on a Value, how is having Attributes in their place any different? (This is assuming we completely merge metadata and attributes. No one has seriously proposed doing that yet. Alternatively, we could merge the storage and preserve the interface separation if we thought that was useful.)> > 2. For the text IR format: Like with call/invoke currently, we'd optionally parse attributes in between the instruction name ('load', etc.) and the first type parameter. So load would become: > <result> = load [ret attrs] [volatile] <ty>* <pointer>[, align <alignment>] (...) > allowing you to write: > %val = load dereferenceable(32) i32** %ptr > > 3. The bitcode format extension would mirror the setup for metadata. A subblock type bitc::ATTRIBUTES_ATTACHMENT_ID would be added with a record array of [Instruction #, Attribute Set #]. This will allow attaching attribute sets to any instruction without increasing the bitcode record sizes for the instructions themselves. > > Obviously there is some potential functionality overlap here with metadata, and an alternate approach would be to add a metadata type that mirrors each attribute we'd like to model on values. Potential downsides to using metadata are: > - It could be a lot of metadata (consider adding nonnull/dereferenceable(n) to every load of a C++ reference type)I can't really comment on the implications of this. I don't actually know how wasteful metadata is space wise. So far, I've never cared in practice.> - We already have code that handles these as return attributes on call instructions; extending that to look at all instructions seems straightforward. Adding alternate code to also look at metadata also would require additional development for each attribute.I honestly don't see there being that much duplication. I'd prefer to see us taking an approach of getting it working, then generalizing. If we do find there's a lot of duplication over time and that simple functional abstraction doesn't abstract it well, we can revisit.> > One thing to keep in mind is that, like with metadata, the attributes can have control-flow dependencies, and we'd generally need to clear them when hoisting loads, etc. We already do this with metadata, at least for loads, so the places where we'd need to do this should hopefully be relatively-easy to find.Agreed. This is non-trivial.> What do you think?Overall, I think supporting one unified metadata/attribute model long term is a good idea. We need to support all the various use cases (well known properties, prototyping, easy extensions, required vs dropable, etc..), but having two mechanisms is less than ideal. I don't believe the separation is an urgent problem. I don't believe that work on improving our ability to optimize by proving hints using metadata should be held on this idea goal. (Unless you're volunteering to do the work on this in the near future. In which case, yeah!) Philip
On Tue, Sep 9, 2014 at 8:22 PM, Philip Reames <listmail at philipreames.com> wrote:> To address the size question asked in a followup, I'll respond somewhat > glibly. We already have metadata on a Value, how is having Attributes in > their place any different? (This is assuming we completely merge metadata > and attributes. No one has seriously proposed doing that yet. > Alternatively, we could merge the storage and preserve the interface > separation if we thought that was useful.)Attributes and metadata aren't completely interchangeable. Metadata is semantically unimportant and can always be dropped (TBAA, debug info, stuff). Attributes can control semantically important things like how to pass a parameter, and can't necessarily be dropped. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140909/d91cff0c/attachment.html>