Rafael Espíndola
2014-May-23 00:13 UTC
[LLVMdev] Changing the design of GlobalAliases to represent what is actually possible in object files.
Bringing the discussion to llvmdev.

For the purposes of this discussion, object files can be thought of as
having just a few things we care about: data, labels and relocations.

Data is what at the LLVM IR level would be just the contents of
variables or functions.

Relocations are utilities to compute the data at link time when it is
not possible to do so earlier. For example, to compute a pcrel
relocation we need to know the offset of a given symbol from the
current position.

Relocations in LLVM IR are represented with ConstantExpr. There is a
point to be made that that representation could be better, but I think
that is not too important for this discussion. Whatever we turn
ConstantExpr into, it will always have to be able to represent the
relocations we want to create.

Now for the main part of this proposal: the labels.

Some labels are implicitly created for other LLVM constructs. A
Function or a GlobalVariable will have one, for example.

But labels in object files are not constrained to point to what at the
LLVM level is a Function or a GlobalVariable. We need a way to ask for
other labels. We need to

* Be able to create labels with an absolute value.
* Be able to create labels inside a GlobalVariable or pointing to the
  start of a GlobalVariable or Function.

Note that it is still just a label. No relocations are involved.

The tool we have in LLVM for creating these extra labels is the
GlobalAlias.

One way of representing the above uses is to be explicit: a label is
created with a specific value or at an offset from another.

Another way of representing it is with a ConstantExpr, since those two
cases are a subset of what a ConstantExpr can represent.

My preference is for having an explicit offset that is just an
integer. Using a ConstantExpr seems like a conflation of two different
things: labels and relocations. The fact that some relocations are as
simple as a label plus an offset seems incidental.

From an implementation perspective, having a representation that uses
just a GlobalObject and an offset seems beneficial too. Any attempt to
create a non-representable label (GlobalAlias) will fail immediately,
instead of leaving the IR in a state that will fail down the line.

It also makes general IR operations like RAUW easier to reason about.
Since ConstantExprs are uniqued, they have a more complex replace
implementation where they have to be replaced one level at a time. We
would have to wait until the replacement reaches the GlobalAlias to
see if it still is one of the ConstantExprs that happen to be just a
label and an offset, and if it is not we would have no easy way of
knowing what went wrong.

Cheers,
Rafael
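
As a rough illustration of the explicit "GlobalObject plus integer offset" representation proposed above, the following C++ sketch rejects a non-representable label at the moment it is created. It does not use LLVM: `ObjectModel`, `LabelResult` and `makeLabel` are hypothetical names, and the acceptance rule (offset 0 is always allowed, any other offset must fall inside a variable) is an assumption drawn from the two bullet points in the message.

```cpp
#include <cstdint>

// Hypothetical stand-in for a GlobalObject; only the facts this sketch needs.
struct ObjectModel {
    std::uint64_t sizeInBytes; // size of the data the object defines
    bool isFunction;           // true for a function, false for a variable
};

// Outcome of trying to create a label at `object + offset`.
struct LabelResult {
    bool ok;              // false means creation was rejected on the spot
    std::uint64_t offset; // meaningful only when ok is true
};

// Assumed rule: offset 0 (the start of the object) is always representable;
// any other offset must lie strictly inside a variable's data.
constexpr LabelResult makeLabel(ObjectModel object, std::uint64_t offset) {
    return (offset == 0 || (!object.isFunction && offset < object.sizeInBytes))
               ? LabelResult{true, offset}
               : LabelResult{false, 0};
}
```

With a 16-byte variable, `makeLabel(var, 8)` succeeds, while a label at offset 4 into a function is refused immediately rather than surviving in the IR until code generation.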
John McCall
2014-May-23 01:18 UTC
[LLVMdev] Changing the design of GlobalAliases to represent what is actually possible in object files.
On May 22, 2014, at 5:13 PM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:

> Bringing the discussion to llvmdev.

Thanks for doing this.

> For the purposes of this discussion, object files can be thought of as
> having just a few things we care about: data, labels and relocations.
>
> Data is what at the LLVM IR level would be just the contents of
> variables or functions.
>
> Relocations are utilities to compute the data at link time when it is
> not possible to do so earlier. For example, to compute a pcrel
> relocation we need to know the offset of a given symbol from the
> current position.
>
> Relocations in LLVM IR are represented with ConstantExpr. There is
> a point to be made that that representation could be better, but I
> think that is not too important for this discussion. Whatever we turn
> ConstantExpr into, it will always have to be able to represent the
> relocations we want to create.
>
> Now for the main part of this proposal: the labels.
>
> Some labels are implicitly created for other LLVM constructs. A
> Function or a GlobalVariable will have one, for example.
>
> But labels in object files are not constrained to point to what at
> the LLVM level is a Function or a GlobalVariable. We need a way to ask
> for other labels. We need to
>
> * Be able to create labels with an absolute value.
> * Be able to create labels inside a GlobalVariable or pointing to the
>   start of a GlobalVariable or Function.

I agree that this accurately summarizes both (1) what’s expressible in
current object file formats and (2) what we’re likely to need from
global aliases.

> The tool we have in LLVM for creating these extra labels is the
> GlobalAlias.
>
> One way of representing the above uses is to be explicit: a label is
> created with a specific value or at an offset from another.

Also important: in this model, the label has its own LLVM type, which is
permitted to differ from the LLVM type of the aliasee (if present).

I will note that this model does require absolute symbols to be literal
values. That eliminates a lot of things that are at least theoretically
useful.

For example, it would not be possible to define an absolute symbol to be
the offset between two symbols. In some restricted circumstances (if
both symbols are global variables in the same section and defined in the
same translation unit) this could be worked around.

But I’ll gladly admit that I don’t have a use case in mind for that
feature. Absolute symbols are useful, and storing offsets between
symbols into global memory is useful, but I don’t know why you’d combine
them.

> Another way of representing it is with a ConstantExpr, since those two
> cases are a subset of what a ConstantExpr can represent.
>
> My preference is for having an explicit offset that is just an
> integer. Using a ConstantExpr seems like a conflation of two different
> things: labels and relocations. The fact that some relocations are as
> simple as a label plus an offset seems incidental.

I don’t think I accept that ConstantExpr just means “relocation” in IR,
either in principle or as a description of reality. A constant used only
as an instruction operand is definitely not limited to what’s
expressible with relocations.

> From an implementation perspective, having a representation that uses
> just a GlobalObject and an offset seems beneficial too. Any attempt to
> create a non-representable label (GlobalAlias) will fail immediately,
> instead of leaving the IR in a state that will fail down the line.
>
> It also makes general IR operations like RAUW easier to reason about.
> Since ConstantExprs are uniqued, they have a more complex replace
> implementation where they have to be replaced one level at a time. We
> would have to wait until the replacement reaches the GlobalAlias to
> see if it still is one of the ConstantExprs that happen to be just a
> label and an offset, and if it is not we would have no easy way of
> knowing what went wrong.

Is this not still true under the global-and-offset model? If you replace
the target of a GlobalAlias with a ConstantExpr, RAUW will have to
evaluate the expression down to a global and an offset in exactly the
way that you’re worried about the backend having to do. Except, of
course, RAUW has to worry about working with a module that lacks data
layout.

John.
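
The restricted case mentioned above, where the offset between two symbols can be worked out ahead of time, can be sketched in C++ as follows. This is only a model under stated assumptions: `SymbolModel`, `Difference` and `symbolDifference` are hypothetical names, and a symbol is reduced to a section identifier, an offset within that section, and a flag saying whether it is defined in the current translation unit.

```cpp
#include <cstdint>

// Hypothetical model of a symbol as the object file would see it.
struct SymbolModel {
    int sectionId;       // which section holds the symbol
    std::int64_t offset; // position of the symbol within that section
    bool definedHere;    // defined in the current translation unit
};

// Result of trying to compute `a - b` without a relocation.
struct Difference {
    bool known;         // true when the difference is a plain constant
    std::int64_t value; // meaningful only when known is true
};

// The difference is a constant only if both symbols are defined in this
// translation unit and live in the same section.
constexpr Difference symbolDifference(SymbolModel a, SymbolModel b) {
    return (a.definedHere && b.definedHere && a.sectionId == b.sectionId)
               ? Difference{true, a.offset - b.offset}
               : Difference{false, 0};
}
```

In this model, moving one of the two symbols to a different section is enough to turn a known constant into a value that has to be left to a relocation.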
Rafael Espíndola
2014-May-23 02:14 UTC
[LLVMdev] Changing the design of GlobalAliases to represent what is actually possible in object files.
> I agree that this accurately summarizes both (1) what’s expressible in
> current object file formats and (2) what we’re likely to need from
> global aliases.
>
>> The tool we have in LLVM for creating these extra labels is the
>> GlobalAlias.
>>
>> One way of representing the above uses is to be explicit: a label is
>> created with a specific value or at an offset from another.
>
> Also important: in this model, the label has its own LLVM type, which is
> permitted to differ from the LLVM type of the aliasee (if present).
>
> I will note that this model does require absolute symbols to be literal
> values. That eliminates a lot of things that are at least theoretically
> useful.
>
> For example, it would not be possible to define an absolute symbol to be
> the offset between two symbols. In some restricted circumstances (if
> both symbols are global variables in the same section and defined in
> the same translation unit) this could be worked around.
>
> But I’ll gladly admit that I don’t have a use case in mind for that
> feature. Absolute symbols are useful, and storing offsets between
> symbols into global memory is useful, but I don’t know why you’d
> combine them.

That is funny. I, on the other hand, think that this is the best
argument I have seen so far for keeping aliases pointing to
ConstantExpr.

While labels and relocations are very different things at the object
level, LLVM is not currently in a position to know when a relocation is
needed or not. I would like for that not to be the case, but that is a
far bigger change.

It also points out that whether an expression is a valid label
definition can change in a way that is hard to see during the change
itself: we can have an arbitrarily nested expression that goes from
evaluatable to requiring a relocation when the section of a global
object is changed. That in turn puts the validity check in the verifier,
even if we constrain ConstantExprs.

In other words, another possible representation would be

* GlobalAliases point to ConstantExpr.
* The expression is completely unconstrained in the current
  implementation of ConstantExpr.
* There is no notion of an aliased symbol.

Things like detecting cycles go from "A == A->getAliasedSymbol()" to
"A->getAliasee().uses(A)", but even that seems questionable outside of
special cases like clang, which knows the kinds of alias it creates.

This would greatly diminish our ability to report invalid uses, since
the first thing to notice they are invalid is MC.

It would also require the alias to weak alias problem to be handled
directly in the IR linker. Here we would have to approximate: do our
best to evaluate it, but if the expression still has discarded globals,
error.

In other words, painful but reasonable.

I will have to look at the code to see how painful it is to generalize
every user of the "aliased symbol" to work with an arbitrary expression.
I will experiment with it tomorrow and report.

> I don’t think I accept that ConstantExpr just means “relocation” in IR,
> either in principle or as a description of reality. A constant used
> only as an instruction operand is definitely not limited to what’s
> expressible with relocations.

Yes. I know there are disagreements about ConstantExpr, but I think we
all agree that they are *at least* as general as any relocation we want
to represent.

>> It also makes general IR operations like RAUW easier to reason about.
>> Since ConstantExprs are uniqued, they have a more complex replace
>> implementation where they have to be replaced one level at a time. We
>> would have to wait until the replacement reaches the GlobalAlias to
>> see if it still is one of the ConstantExprs that happen to be just a
>> label and an offset, and if it is not we would have no easy way of
>> knowing what went wrong.
>
> Is this not still true under the global-and-offset model? If you replace
> the target of a GlobalAlias with a ConstantExpr, RAUW will have to
> evaluate the expression down to a global and an offset in exactly the
> way that you’re worried about the backend having to do. Except,
> of course, RAUW has to worry about working with a module that
> lacks data layout.

But here RAUW is seeing the actual replacement. It is seeing the
GlobalObject that is directly used by a GlobalAlias being replaced with
an expression.

If the alias points to a Constant, it is seeing the result of a
perfectly valid run of replaceUsesOfWithOnConstant, which may or may not
be a valid aliasee.

Cheers,
Rafael
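
The cycle check written above as "A->getAliasee().uses(A)" can be pictured as a search through the aliasee expression for a reference back to the alias itself. The C++ sketch below models that idea only; it is not LLVM code. `ExprModel` and `usesSymbol` are hypothetical names, symbols are identified by small integers, and an expression is assumed to be a binary tree whose leaves refer to symbols.

```cpp
// Hypothetical expression node: a leaf refers to a symbol by id, an interior
// node combines up to two sub-expressions and carries symbolId == -1.
struct ExprModel {
    int symbolId;         // >= 0 for a symbol reference, -1 for an operator
    const ExprModel *lhs; // left operand, or nullptr
    const ExprModel *rhs; // right operand, or nullptr
};

// Returns true if the expression refers to the given symbol at any depth.
// An alias whose own aliasee expression uses it would form a cycle.
constexpr bool usesSymbol(const ExprModel *expr, int symbolId) {
    return expr != nullptr &&
           (expr->symbolId == symbolId ||
            usesSymbol(expr->lhs, symbolId) ||
            usesSymbol(expr->rhs, symbolId));
}
```

Unlike the single comparison "A == A->getAliasedSymbol()", this walk has to visit the whole expression, which is part of the extra cost of dropping the notion of an aliased symbol.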