Peter Collingbourne via llvm-dev
2016-Oct-11 03:12 UTC
[llvm-dev] RFC: Absolute or "fixed address" symbols as immediate operands
Hi all,

I wanted to summarise some discussion on llvm-commits [0,1] as an RFC, as I felt it demanded wider circulation.

Our support for references to absolute symbols is not very good. The symbol will be resolved accurately in non-PIC code, but suboptimally: the symbol reference cannot currently appear as the immediate operand of an instruction, and the code generator cannot make any assumptions about the value of the symbol (so for example, it could not use a R_X86_64_8 relocation if the value is known to be in the range 0..255).

In PIC mode, if the reference is not known to be DSO-local, the value is loaded from the GOT (or a synthetic GOT entry), which again means suboptimal code. If the reference is known to be DSO-local, the symbol will be referenced with a PC-relative relocation and therefore cannot be resolved properly to an absolute value (cf. https://reviews.llvm.org/D19844). The latter case in particular would seem to indicate that a representational change is required for correctness, to distinguish references to absolute symbols from references to regular symbols.

The specific change I have in mind is to allow !range metadata on GlobalObjects. This would be similar to existing !range metadata, but it would apply to the "address" of the attached GlobalObject, rather than any value loaded from it. Its presence on a GlobalObject would also imply that the address of the GlobalObject is "fixed" at link time. Alongside !range we could potentially use other sources of information, such as the relocation model, code model and visibility, to identify "fixed" globals, although that can be done separately.

I have been experimenting with a number of approaches to representation in SDAG, and I have found one that seems to work best, and would be the least intrusive (unfortunately most approaches to this problem are somewhat intrusive).

Specifically, I want to:
1) move most of the body of ConstantSDNode to a new class, ConstantIntSDNode, which would derive from ConstantSDNode. ConstantSDNode would act as the base class for immediates-post-static-linking. Change most references to ConstantSDNode in C++ code to refer to ConstantIntSDNode. However, "imm" in tblgen code would continue to match ConstantSDNode.
2) introduce a new derived class of ConstantSDNode for references to globals with !range metadata, and teach SDAG to use this new derived class for fixed address references.

I will shortly be sending out a patch that implements 1.

Thanks,
--
Peter

[0] http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160926/394194.html
[1] http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161003/394983.html
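To make the R_X86_64_8 example above concrete, here is a minimal sketch of how a known address range could select the narrowest absolute relocation. It is not taken from any patch in this thread: `AddressRange` and `narrowestAbsoluteReloc` are hypothetical names, the range is treated as half-open [lo, hi) in the way !range metadata is written, and wrapped ranges are deliberately not modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a !range attached to a global's address.
// The interval is half-open: lo is included, hi is excluded.
struct AddressRange {
  std::uint64_t lo;
  std::uint64_t hi;
};

// Hypothetical helper: name the narrowest x86-64 ELF absolute relocation
// that can hold every address in the range (unsigned interpretation).
std::string narrowestAbsoluteReloc(AddressRange r) {
  assert(r.lo < r.hi && "empty or wrapped ranges are not modelled here");
  const std::uint64_t maxValue = r.hi - 1;  // largest address in the range
  if (maxValue <= 0xFFu) return "R_X86_64_8";
  if (maxValue <= 0xFFFFu) return "R_X86_64_16";
  if (maxValue <= 0xFFFFFFFFu) return "R_X86_64_32";
  return "R_X86_64_64";
}
```

With this sketch, a symbol whose address is known to lie in 0..255 (range [0, 256)) would select R_X86_64_8, while a symbol with no usable range information would have to fall back to a full-width form.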
Chris Lattner via llvm-dev
2016-Oct-11 03:31 UTC
[llvm-dev] RFC: Absolute or "fixed address" symbols as immediate operands
> On Oct 10, 2016, at 8:12 PM, Peter Collingbourne via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Hi all,
>
> I wanted to summarise some discussion on llvm-commits [0,1] as an RFC, as I felt it demanded wider circulation.
>
> Our support for references to absolute symbols is not very good. The symbol will be resolved accurately in non-PIC code, but suboptimally: the symbol reference cannot currently appear as the immediate operand of an instruction, and the code generator cannot make any assumptions about the value of the symbol (so for example, it could not use a R_X86_64_8 relocation if the value is known to be in the range 0..255).
>
> In PIC mode, if the reference is not known to be DSO-local, the value is loaded from the GOT (or a synthetic GOT entry), which again means suboptimal code. If the reference is known to be DSO-local, the symbol will be referenced with a PC-relative relocation and therefore cannot be resolved properly to an absolute value (cf. https://reviews.llvm.org/D19844). The latter case in particular would seem to indicate that a representational change is required for correctness, to distinguish references to absolute symbols from references to regular symbols.
>
> The specific change I have in mind is to allow !range metadata on GlobalObjects. This would be similar to existing !range metadata, but it would apply to the "address" of the attached GlobalObject, rather than any value loaded from it. Its presence on a GlobalObject would also imply that the address of the GlobalObject is "fixed" at link time. Alongside !range we could potentially use other sources of information, such as the relocation model, code model and visibility, to identify "fixed" globals, although that can be done separately.

Ok, I think I understand the use-case.

> I have been experimenting with a number of approaches to representation in SDAG, and I have found one that seems to work best, and would be the least intrusive (unfortunately most approaches to this problem are somewhat intrusive).
>
> Specifically, I want to:
> 1) move most of the body of ConstantSDNode to a new class, ConstantIntSDNode, which would derive from ConstantSDNode. ConstantSDNode would act as the base class for immediates-post-static-linking. Change most references to ConstantSDNode in C++ code to refer to ConstantIntSDNode. However, "imm" in tblgen code would continue to match ConstantSDNode.
> 2) introduce a new derived class of ConstantSDNode for references to globals with !range metadata, and teach SDAG to use this new derived class for fixed address references

ConstantSDNode is poorly named, and renaming it to ConstantIntSDNode is probably the right thing to do independently of the other changes.

That said, I don’t understand why you’d keep ConstantSDNode around and introduce a new derived class of it. This seems like something that a new "imm" immediate matcher would handle: it would match constants in a certain range, or a GlobalAddressSDNode known-to-be-small.

-Chris
Peter Collingbourne via llvm-dev
2016-Oct-11 07:04 UTC
[llvm-dev] RFC: Absolute or "fixed address" symbols as immediate operands
On Mon, Oct 10, 2016 at 8:31 PM, Chris Lattner <clattner at apple.com> wrote:

> On Oct 10, 2016, at 8:12 PM, Peter Collingbourne via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> > Hi all,
> >
> > I wanted to summarise some discussion on llvm-commits [0,1] as an RFC, as I felt it demanded wider circulation.
> >
> > Our support for references to absolute symbols is not very good. The symbol will be resolved accurately in non-PIC code, but suboptimally: the symbol reference cannot currently appear as the immediate operand of an instruction, and the code generator cannot make any assumptions about the value of the symbol (so for example, it could not use a R_X86_64_8 relocation if the value is known to be in the range 0..255).
> >
> > In PIC mode, if the reference is not known to be DSO-local, the value is loaded from the GOT (or a synthetic GOT entry), which again means suboptimal code. If the reference is known to be DSO-local, the symbol will be referenced with a PC-relative relocation and therefore cannot be resolved properly to an absolute value (cf. https://reviews.llvm.org/D19844). The latter case in particular would seem to indicate that a representational change is required for correctness, to distinguish references to absolute symbols from references to regular symbols.
> >
> > The specific change I have in mind is to allow !range metadata on GlobalObjects. This would be similar to existing !range metadata, but it would apply to the "address" of the attached GlobalObject, rather than any value loaded from it. Its presence on a GlobalObject would also imply that the address of the GlobalObject is "fixed" at link time. Alongside !range we could potentially use other sources of information, such as the relocation model, code model and visibility, to identify "fixed" globals, although that can be done separately.
>
> Ok, I think I understand the use-case.
>
> > I have been experimenting with a number of approaches to representation in SDAG, and I have found one that seems to work best, and would be the least intrusive (unfortunately most approaches to this problem are somewhat intrusive).
> >
> > Specifically, I want to:
> > 1) move most of the body of ConstantSDNode to a new class, ConstantIntSDNode, which would derive from ConstantSDNode. ConstantSDNode would act as the base class for immediates-post-static-linking. Change most references to ConstantSDNode in C++ code to refer to ConstantIntSDNode. However, "imm" in tblgen code would continue to match ConstantSDNode.
> > 2) introduce a new derived class of ConstantSDNode for references to globals with !range metadata, and teach SDAG to use this new derived class for fixed address references
>
> ConstantSDNode is poorly named, and renaming it to ConstantIntSDNode is probably the right thing to do independently of the other changes.
>
> That said, I don’t understand why you’d keep ConstantSDNode around and introduce a new derived class of it. This seems like something that a new "imm" immediate matcher would handle: it would match constants in a certain range, or a GlobalAddressSDNode known-to-be-small.

To begin with: I'm not sure that GlobalAddressSDNode is the right node to use for these types of immediates. It seems that we have two broad classes of globals here: those with a fixed-at-link-time address (e.g. regular non-PIC symbols, absolute symbols) and those where the address needs to be computed (e.g. PC-relative addresses, TLS variables). To me it seems like the first class is much more similar to immediates than to the second class. That suggested to me that there ought to be two separate representations for global variables, where the former are "morally" immediates, and the latter are not (i.e. the existing GlobalAddressSDNode).

I went over a couple of approaches for representing "moral" immediates in my llvm-commits post. The first one seems to be more like what you're suggesting:

> - Introduce a new opcode for absolute symbol constants. This intuitively seemed like the least risky approach, as individual instructions could "opt in" to the new absolute symbol references. However, this seems hard to fit into the existing SDAG pattern matching engine, as the engine expects each "variable" to have a specific opcode. I tried adding special support for "either of the two constant opcodes" to the matcher, but I could not see a good way to do it without making fundamental changes to how patterns are matched.
>
> - Use the ISD::Constant opcode for absolute symbol constants, but introduce a separate class for them. This also seemed problematic, as there is a strong assumption (both in existing SDAG code and in generated code) of a many-to-one mapping from opcodes to classes.

We can solve part of the problem with the second approach by using a base class for ISD::Constant. As I worked on that approach, I found that it did turn out to be a good fit overall: in many cases we're already adhering to a principle that an unrestricted immediate maps onto potentially relocatable bytes in the output file. The X86 and ARM backends illustrate this quite well: the X86 instruction set generally uses power-of-2 wide immediate forms that neatly map onto instruction bytes, and ARM generally uses compressed immediate forms (e.g. "mod_imm") which would naturally match only real constant integers.

Using that principle, we can restrict (e.g.) ImmLeaf to constant integers (see https://reviews.llvm.org/D25355). In cases where this mapping isn't quite right, we can use more restrictive matchers.

I'm still a little uneasy about the second approach, and would be interested in my first approach, but I'm not sure if it would be practical.

Thanks,
--
Peter
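The base-class idea described above can be sketched with toy classes. This is only an illustration under stated assumptions, not the real llvm::SDNode hierarchy or the patch under review: `FixedAddressSDNode`, `matchesImm` and `matchesImmLeaf` are hypothetical names, and the kind tag stands in for whatever mechanism the real code would use to tell the derived classes apart.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Toy base class: anything that is an immediate after static linking.
class ConstantSDNode {
public:
  enum class Kind { Int, FixedAddress };
  virtual ~ConstantSDNode() = default;
  Kind getKind() const { return kind_; }

protected:
  explicit ConstantSDNode(Kind k) : kind_(k) {}

private:
  Kind kind_;
};

// A real constant integer, holding what ConstantSDNode holds today.
class ConstantIntSDNode : public ConstantSDNode {
public:
  explicit ConstantIntSDNode(std::int64_t v)
      : ConstantSDNode(Kind::Int), value_(v) {}
  std::int64_t getSExtValue() const { return value_; }
  static bool classof(const ConstantSDNode *n) {
    return n->getKind() == Kind::Int;
  }

private:
  std::int64_t value_;
};

// Hypothetical node for a global whose address is fixed at link time
// and known to lie in the half-open range [lo, hi).
class FixedAddressSDNode : public ConstantSDNode {
public:
  FixedAddressSDNode(std::string symbol, std::uint64_t lo, std::uint64_t hi)
      : ConstantSDNode(Kind::FixedAddress), symbol_(std::move(symbol)),
        lo_(lo), hi_(hi) {}
  const std::string &getSymbol() const { return symbol_; }
  std::uint64_t getLo() const { return lo_; }
  std::uint64_t getHi() const { return hi_; }
  static bool classof(const ConstantSDNode *n) {
    return n->getKind() == Kind::FixedAddress;
  }

private:
  std::string symbol_;
  std::uint64_t lo_;
  std::uint64_t hi_;
};

// Models "imm": any post-link immediate is acceptable.
bool matchesImm(const ConstantSDNode *n) { return n != nullptr; }

// Models an ImmLeaf-style matcher restricted to real integers: the
// predicate needs a concrete value, which a fixed address does not have.
template <typename Pred>
bool matchesImmLeaf(const ConstantSDNode *n, Pred pred) {
  if (n == nullptr || !ConstantIntSDNode::classof(n)) return false;
  return pred(static_cast<const ConstantIntSDNode *>(n)->getSExtValue());
}
```

In this model an unrestricted "imm" operand accepts both node kinds, while a value-dependent predicate only ever sees a ConstantIntSDNode, which mirrors the principle that unrestricted immediates map onto relocatable bytes and compressed forms need a real integer.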
Peter Collingbourne via llvm-dev
2016-Oct-24 20:07 UTC
[llvm-dev] RFC: Absolute or "fixed address" symbols as immediate operands
On Mon, Oct 10, 2016 at 8:12 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:

> The specific change I have in mind is to allow !range metadata on GlobalObjects. This would be similar to existing !range metadata, but it would apply to the "address" of the attached GlobalObject, rather than any value loaded from it. Its presence on a GlobalObject would also imply that the address of the GlobalObject is "fixed" at link time.

Going back to IR-level representation: here is an alternative representation based on a suggestion from Eli.

Introduce a new type of GlobalValue called GlobalConstant. GlobalConstant would fit into the GlobalValue hierarchy like this:

- GlobalValue
  - GlobalConstant
  - GlobalPointer
    - GlobalIndirectSymbol
      - GlobalAlias
      - GlobalIFunc
    - GlobalObject
      - Function
      - GlobalVariable

GlobalValue would no longer be assumed to be of pointer type. The getType() overload that takes a PointerType, as well as getValueType(), would be moved down to GlobalPointer. (A nice side benefit of this is that it would help flush out cases where we are unnecessarily depending on global pointee types.)

A GlobalConstant can either be a definition or a declaration. A definition would look like this:

    @foo = globalconst i32 42

while a declaration would look like this:

    @foo = external globalconst i32

GlobalConstant could also hold a linkage and visibility. Looking at the other attributes that a GlobalValue can hold, many of them do not seem appropriate for GlobalConstant and could potentially be moved to GlobalPointer.

Thoughts?

Thanks,
--
Peter
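As a rough illustration of the proposed hierarchy, the following sketch mirrors it with empty C++ types. These are toy declarations, not the real LLVM classes; `hasPointerType` is a hypothetical helper that just encodes the rule that only GlobalPointer and its descendants would be pointer typed under this proposal.

```cpp
#include <type_traits>

// Toy mirror of the proposed hierarchy; GlobalConstant and GlobalPointer
// are the two classes the proposal would add.
struct GlobalValue { virtual ~GlobalValue() = default; };
struct GlobalConstant : GlobalValue {};        // proposed, not pointer typed
struct GlobalPointer : GlobalValue {};         // proposed, pointer typed
struct GlobalIndirectSymbol : GlobalPointer {};
struct GlobalAlias : GlobalIndirectSymbol {};
struct GlobalIFunc : GlobalIndirectSymbol {};
struct GlobalObject : GlobalPointer {};
struct Function : GlobalObject {};
struct GlobalVariable : GlobalObject {};

// Hypothetical helper: under this design, pointer-typed accessors such as
// getValueType() would only be available below GlobalPointer.
template <typename T>
constexpr bool hasPointerType() {
  return std::is_base_of<GlobalPointer, T>::value;
}
```

Code that today assumes every GlobalValue has a pointer type would, in this model, have to name GlobalPointer explicitly, which is the "flush out" side benefit mentioned above.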
Friedman, Eli via llvm-dev
2016-Oct-24 20:36 UTC
[llvm-dev] RFC: Absolute or "fixed address" symbols as immediate operands
On 10/24/2016 1:07 PM, Peter Collingbourne via llvm-dev wrote:

> On Mon, Oct 10, 2016 at 8:12 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
>
> > The specific change I have in mind is to allow !range metadata on GlobalObjects. This would be similar to existing !range metadata, but it would apply to the "address" of the attached GlobalObject, rather than any value loaded from it. Its presence on a GlobalObject would also imply that the address of the GlobalObject is "fixed" at link time.
>
> Going back to IR-level representation: here is an alternative representation based on a suggestion from Eli.
>
> Introduce a new type of GlobalValue called GlobalConstant. GlobalConstant would fit into the GlobalValue hierarchy like this:
>
> * GlobalValue
>   o GlobalConstant
>   o GlobalPointer
>     + GlobalIndirectSymbol
>       # GlobalAlias
>       # GlobalIFunc
>     + GlobalObject
>       # Function
>       # GlobalVariable
>
> GlobalValue would no longer be assumed to be of pointer type. The getType() overload that takes a PointerType, as well as getValueType(), would be moved down to GlobalPointer. (A nice side benefit of this is that it would help flush out cases where we are unnecessarily depending on global pointee types.)
>
> A GlobalConstant can either be a definition or a declaration. A definition would look like this:
>
>     @foo = globalconst i32 42

This is equivalent to writing "foo = 42" in assembly?

> while a declaration would look like this:
>
>     @foo = external globalconst i32
>
> GlobalConstant could also hold a linkage and visibility. Looking at the other attributes that a GlobalValue can hold, many of them do not seem appropriate for GlobalConstant and could potentially be moved to GlobalPointer.
>
> Thoughts?

How do you plan to use this? The concept makes sense, but I've never actually seen anyone use symbols this way.

-Eli

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Chris Lattner via llvm-dev
2016-Oct-25 05:46 UTC
[llvm-dev] RFC: Absolute or "fixed address" symbols as immediate operands
> On Oct 24, 2016, at 1:07 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
>
> On Mon, Oct 10, 2016 at 8:12 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
>
> > The specific change I have in mind is to allow !range metadata on GlobalObjects. This would be similar to existing !range metadata, but it would apply to the "address" of the attached GlobalObject, rather than any value loaded from it. Its presence on a GlobalObject would also imply that the address of the GlobalObject is "fixed" at link time.
>
> Going back to IR-level representation: here is an alternative representation based on a suggestion from Eli.
>
> Introduce a new type of GlobalValue called GlobalConstant. GlobalConstant would fit into the GlobalValue hierarchy like this:
>
> GlobalValue
>   GlobalConstant
>   GlobalPointer
>     GlobalIndirectSymbol
>       GlobalAlias
>       GlobalIFunc
>     GlobalObject
>       Function
>       GlobalVariable
>
> GlobalValue would no longer be assumed to be of pointer type. The getType() overload that takes a PointerType, as well as getValueType(), would be moved down to GlobalPointer. (A nice side benefit of this is that it would help flush out cases where we are unnecessarily depending on global pointee types.)

Hi Peter,

I agree that it makes sense to introduce a new GlobalConstant IR node for this sort of thing. That said, have you considered a design where GlobalConstant is still required to be a pointer type? If you did this, you would end up with a simpler and less invasive design of:

GlobalValue
  GlobalConstant
  GlobalIndirectSymbol
    GlobalAlias
    GlobalIFunc
  GlobalObject
    Function
    GlobalVariable

I think that this would be better for (e.g.) the X86 backend anyway, since global objects can be assigned to specific addresses with linker maps, and thus have small addresses (and this is expressible with the range metadata). This means that GlobalConstant and other GlobalValues should all be usable in the same places in principle.

-Chris
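One way to picture the flatter design is that every global stays pointer typed and may optionally carry a known address range, so a single query works for GlobalConstant and GlobalObject alike. The sketch below is a toy model under that assumption; `setAddressRange` and `addressFitsInBits` are hypothetical names, and the range is half-open [lo, hi) as with !range metadata.

```cpp
#include <cstdint>

// Toy model: every GlobalValue is pointer typed and may carry an
// optional, half-open address range [lo, hi).
class GlobalValue {
public:
  virtual ~GlobalValue() = default;

  void setAddressRange(std::uint64_t lo, std::uint64_t hi) {
    hasRange_ = true;
    lo_ = lo;
    hi_ = hi;
  }

  // True if every possible address fits in an unsigned immediate of the
  // given width. Without range information nothing can be assumed.
  bool addressFitsInBits(unsigned bits) const {
    if (!hasRange_ || lo_ >= hi_) return false;
    if (bits >= 64) return true;
    return (hi_ - 1) < (std::uint64_t{1} << bits);
  }

private:
  bool hasRange_ = false;
  std::uint64_t lo_ = 0;
  std::uint64_t hi_ = 0;
};

// In the flatter hierarchy these share the same interface.
struct GlobalConstant : GlobalValue {};
struct GlobalObject : GlobalValue {};
struct GlobalVariable : GlobalObject {};
```

Here a GlobalConstant and a GlobalVariable placed at a small address by the linker would both answer the same query, which is the sense in which they "should all be usable in the same places in principle".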