Rafael Espíndola <rafael.espindola at gmail.com> writes:

> On 25 May 2014 21:29, Ben Gamari <bgamari.foss at gmail.com> wrote:
>>
>> For a while now LLVM has started rejecting aliases referring to things
>> other than definitions[1].
>
> We started checking for it. Aliases are just another label in an
> object file. The linker itself doesn't know they exist and therefore
> there is no way to represent an alias from foo to bar if bar is
> undefined.
>
Sure. I think the only reason our use of aliases worked previously was
that the optimizer elided them long before they could make it into an
object file.

>>
>> 1. As place-holders for external symbols. As the code generator does
>>    not know the type of these symbols until the point of usage (nor
>>    does it need to), i8* aliases are defined at the end of the
>>    compilation unit,
>>
As it turns out this wasn't quite right; there are some cases where we
don't know the type of the reference even at the point of usage (namely
when we refer to a function's entrypoint label without calling it, as
only C--'s call node contains the signature).

>>        @newCAF = external global i8
>>        @newCAF$alias = alias private i8* @newCAF
>>
>>    and functions in the current compilation unit calling `newCAF` invoke
>>    it through `@newCAF$alias`,
>>
>>        ...
>>        %lnYi = bitcast i8* @newCAF$alias to i8* (i8*, i8*)*
>>        ...
>
> Sorry, I don't see what this buys you. The types of newCAF and
> newCAF$alias are the same.
>
It seems you are right, you could just define the external symbol,

    @newCAF$alias = external global i8

Unfortunately this still leaves the problem of local symbols.

>> 2. As place-holders for local symbols. All symbol references in
>>    emitted functions are replaced with references to aliases. This is
>>    done so that the compiler can emit LLVM IR definitions for
>>    functions without waiting for symbols they reference to become
>>    available (as our internal representation, Core, allows references
>>    in any order without forward declarations).
>>    This theoretically
>>    offers a performance improvement and somewhat simplifies the code
>>    generator. Here we emit aliases like,
>>
>>        @SWn_srt$alias = alias private i8* bitcast (%SWn_srt_struct* @SWn_srt to i8*)
>>
>>    again, using the `$alias` in all references,
>
> That should also work in llvm IR. You can create a function without a
> body or a GlobalVariable without an initializer and add it afterwards.
>
I'm not sure I follow. If I attempt to compile,

    declare i32 @main()
    define i32 @main() {
        ret i32 0
    }

It fails with,

    llc: test.ll:3:12: error: invalid redefinition of function 'main'
    define i32 @main() {
               ^

> Check for example what llvm-as does when a variable or a function is
> used before it is defined. Doesn't that work for you?
>
The problem here is that we don't know the type of the symbol at the
point of use, so I need to assume it is something (e.g. i8*). Take for
instance the following example,

    define i32 @main() {
        ; We don't know the type of f a priori, thus we assume
        ; it is i8*
        %f = bitcast i8* @f$alias to i32 ()*
        call i32 %f()
        ret i32 0
    }

Say then later in GHC's Core representation, we get a definition for
`f`. We have two ways of dealing with this,

  1. Declare it as @f and create an alias as we currently do,

         define i32 @f() {
             ret i32 0
         }
         @f$alias = alias private i8* @f

     but then we fail with recent LLVMs

  2. Declare it as @f$alias directly as I think you might be suggesting

         define i32 @f$alias() {
             ret i32 0
         }

     but then we get a type mismatch at the point of usage as we claim
     that @f$alias is of type i8*.

>>
>> Is our (ab)use of aliases reasonable? If so, what options do we have to
>> fix this before LLVM 3.5? If not, what other mechanisms are there for
>> addressing the use-cases above in GHC?
>
> It looks fairly likely llvm will accept arbitrary expressions as
> aliasees again (see thread on llvmdev), but the restrictions inherent
> from what aliases are at the object level will remain, just be reworded
> a bit.
> For example, we will have something along the lines of "the
> aliasee expression cannot contain an undefined GlobalValue".
>
Alright. I'll put my GHC work aside until this is resolved in that
case. My current goal is to implement tables-next-to-code using the
recently merged prefix data syntax and symbol offset support that I sent
to the list yesterday. It would be great if you could ping me when this
is resolved so I know when this work can be continued.

Thanks,

- Ben
>> On 25 May 2014 21:29, Ben Gamari <bgamari.foss at gmail.com> wrote:
>>>
>>> For a while now LLVM has started rejecting aliases referring to things
>>> other than definitions[1].
>>
>> We started checking for it. Aliases are just another label in an
>> object file. The linker itself doesn't know they exist and therefore
>> there is no way to represent an alias from foo to bar if bar is
>> undefined.
>>
> Sure. I think the only reason our use of aliases worked previously was
> that the optimizer elided them long before they could make it into an
> object file.

If that is the case, you should be able to just directly replace alias
with aliasee, no? In general you should not depend on an optimization
being run to produce correct code.

>>> 2. As place-holders for local symbols. All symbol references in
>>>    emitted functions are replaced with references to aliases. This is
>>>    done so that the compiler can emit LLVM IR definitions for
>>>    functions without waiting for symbols they reference to become
>>>    available (as our internal representation, Core, allows references
>>>    in any order without forward declarations). This theoretically
>>>    offers a performance improvement and somewhat simplifies the code
>>>    generator. Here we emit aliases like,
>>>
>>>        @SWn_srt$alias = alias private i8* bitcast (%SWn_srt_struct* @SWn_srt to i8*)
>>>
>>>    again, using the `$alias` in all references,
>>
>> That should also work in llvm IR. You can create a function without a
>> body or a GlobalVariable without an initializer and add it afterwards.
>>
> I'm not sure I follow. If I attempt to compile,
>
>     declare i32 @main()
>     define i32 @main() {
>         ret i32 0
>     }
>
> It fails with,
>
>     llc: test.ll:3:12: error: invalid redefinition of function 'main'
>     define i32 @main() {

There are no redeclarations in LLVM IR.
You can just put top level entities in any order:

    define void @f() {
      call void @g()
      call void @h()
      ret void
    }
    declare void @g()
    define void @h() {
      ret void
    }

> The problem here is that we don't know the type of the symbol at the
> point of use so I need to assume it is something (e.g. i8*). Take for
> instance the following example,
>
>
>     define i32 @main() {
>         ; We don't know the type of f a priori, thus we assume
>         ; it is i8*
>         %f = bitcast i8* @f$alias to i32 ()*
>         call i32 %f()
>         ret i32 0
>     }

Instead of having an f$alias, you could just have produced a

    declare i32 @f()

since you know the type it is being called with.

> Say then later in GHC's Core representation, we get a definition for
> `f`. We have two ways of dealing with this,
>
>   1. Declare it as @f and create an alias as we currently do,
>
>          define i32 @f() {
>              ret i32 0
>          }
>          @f$alias = alias private i8* @f
>
>      but then we fail with recent LLVMs
>
>
>   2. Declare it as @f$alias directly as I think you might be suggesting
>
>          define i32 @f$alias() {
>              ret i32 0
>          }
>
>      but then we get a type mismatch at the point of usage as we claim
>      that @f$alias is of type i8*.
>
>
No, the idea is to not have f$alias at all. Once you find that f has
to be defined, you just set its body (which turns it into a
definition).

I guess a better example might have been clang compiling:

    void f(void);
    void g(void) {
      f();
    }
    void f(void) {
    }

In here f will be converted from a declaration to a definition.

Cheers,
Rafael
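To make the "set its body later" idea above concrete, here is a minimal Python sketch. It is only a toy model and not the LLVM API: the `ToyModule` class and its `get_or_declare`, `set_body` and `render` methods are hypothetical names invented for this illustration. It shows a function entry that starts out body-less (printed as `declare`) and becomes a definition once a body is attached, so no second entity and no alias is needed.

```python
from typing import Dict, List


class ToyModule:
    """Toy stand-in for an IR module (hypothetical; not the LLVM API)."""

    def __init__(self) -> None:
        # name -> [return type, body lines or None for a declaration]
        self.functions: Dict[str, List] = {}

    def get_or_declare(self, name: str, ret: str = "void") -> str:
        # The first use of a name creates a body-less entry (a declaration).
        self.functions.setdefault(name, [ret, None])
        return f"@{name}"

    def set_body(self, name: str, body: List[str], ret: str = "void") -> None:
        # Attaching a body turns the existing entry into a definition.
        # `ret` is only used when the name has not been seen before.
        entry = self.functions.setdefault(name, [ret, None])
        if entry[1] is not None:
            raise ValueError(f"invalid redefinition of function '{name}'")
        entry[1] = body

    def render(self) -> str:
        out: List[str] = []
        for name, (ret, body) in self.functions.items():
            if body is None:
                out.append(f"declare {ret} @{name}()")
            else:
                out.append(f"define {ret} @{name}() {{")
                out.extend(f"  {line}" for line in body)
                out.append("}")
        return "\n".join(out)


m = ToyModule()
callee = m.get_or_declare("f")   # f is used before it is defined
m.set_body("g", [f"call void {callee}()", "ret void"])
m.set_body("f", ["ret void"])    # the earlier declaration becomes a definition
print(m.render())
```

Note that this only works when the type is known at the first use, which is the part the thread is still debating.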
Rafael Espíndola <rafael.espindola at gmail.com> writes:

> On 25 May 2014 21:29, Ben Gamari <bgamari.foss at gmail.com> wrote:
>> Sure. I think the only reason our use of aliases worked previously was
>> that the optimizer elided them long before they could make it into an
>> object file.
>
> If that is the case, you should be able to just directly replace alias
> with aliasee, no? In general you should not depend on an optimization
> being run to produce correct code.
>
I absolutely agree. As far as I understand, we ended up with the current
situation because no one could figure out a better way to resolve the
typing issue I clarify below. I'm trying to find a better solution.

>>> That should also work in llvm IR. You can create a function without a
>>> body or a GlobalVariable without an initializer and add it afterwards.
>>>
>> I'm not sure I follow. If I attempt to compile,
>>
>>     declare i32 @main()
>>     define i32 @main() {
>>         ret i32 0
>>     }
>>
>> It fails with,
>>
>>     llc: test.ll:3:12: error: invalid redefinition of function 'main'
>>     define i32 @main() {
>
> There are no redeclarations in LLVM IR. You can just put top level
> entities in any order:
>
Alright, I misunderstood your point in that case. Thanks for the
clarification!

>> The problem here is that we don't know the type of the symbol at the
>> point of use so I need to assume it is something (e.g. i8*). Take for
>> instance the following example,
>>
>>
>>     define i32 @main() {
>>         ; We don't know the type of f a priori, thus we assume
>>         ; it is i8*
>>         %f = bitcast i8* @f$alias to i32 ()*
>>         call i32 %f()
>>         ret i32 0
>>     }
>
> Instead of having an f$alias, you could just have produced a
>
>     declare i32 @f()
>
> since you know the type it is being called with.
>
Bah, that was a poor choice of example on my part. A better one might be
the following: Our C-- representation might refer to `f` without calling
it (e.g. when building a thunk).
In this case we don't have access to the function's signature, which we
can only infer from a `call` node. For this reason, we currently demote
all pointers to some common type (i8* currently) so we don't run into
LLVM's type system in cases where we can't infer a value's type.

> No, the idea is to not have f$alias at all. Once you find that f has
> to be defined, you just set its body (which turns it into a
> definition).
>
See above for why I believe we need the alias.

> I guess a better example might have been clang compiling:
>
>     void f(void);
>     void g(void) {
>       f();
>     }
>     void f(void) {
>     }
>
> In here f will be converted from a declaration to a definition.
>
That is to say that nothing is emitted until the entire compilation unit
is parsed (so we know which items are definitions and which are
declarations)?

Cheers,

- Ben
To maybe clarify a bit - this is about a pass in GHC that translates an
intermediate program representation called "Cmm" into LLVM code. The
trouble stems from the fact that

 1) Cmm is untyped, so whenever we see a label it might refer to data or
    functions, of pretty much arbitrary type

 2) The Cmm code is generated iteratively, and as a good consumer we
    would like to "stream" the LLVM code as well. These intermediate
    representations can easily go up to millions of lines, after all.

This all works out fairly well, with the only stumbling block being the
types. After all, at the point where we emit a reference to a label we
don't know whether it is going to be defined later on in the output
file. We especially have no idea what its LLVM type is going to be.

To get around that we use aliases to essentially "strip" type
information from label references: If we refer to "label$alias" instead
of "label", we are still free to define "label" later on in whatever way
we see fit. Then we just set "label$alias" to a suitable cast, and let
the LLVM infrastructure handle the resolution.

That being said - there are actually a number of possible solutions
here, and we are currently trying to settle on what the "right thing to
do" is. In case we are being too tricky here, we might instead try to do
two passes over the output file, or scrap streaming altogether. It all
depends on whether or not this kind of usage is likely to remain
possible in future LLVM versions.

Greetings,
  Peter Wortmann
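The two-pass alternative mentioned above could look roughly like the following Python sketch. This is an illustration under stated assumptions and not GHC's actual code generator: the `Definition` tuple shape and the function names `collect_types`, `external_decls` and `ref_as_i8ptr` are hypothetical. The first pass records the LLVM type of every label defined in the unit; the second pass emits each reference as a cast from the now-known type to i8*, so no `$alias` is needed. The emitted text uses the typed-pointer syntax seen elsewhere in this thread.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input shape: (label, LLVM value type, labels it references).
Definition = Tuple[str, str, List[str]]


def collect_types(defs: List[Definition]) -> Dict[str, str]:
    """Pass 1: record the LLVM type of every label defined in this unit."""
    return {label: ty for label, ty, _ in defs}


def external_decls(defs: List[Definition], types: Dict[str, str]) -> List[str]:
    """Labels referenced but never defined become opaque i8 externals."""
    missing: List[str] = []
    for _, _, refs in defs:
        for ref in refs:
            if ref not in types and ref not in missing:
                missing.append(ref)
    return [f"@{ref} = external global i8" for ref in missing]


def ref_as_i8ptr(label: str, types: Dict[str, str]) -> str:
    """Pass 2: emit a reference to `label` as an i8* constant expression."""
    ty: Optional[str] = types.get(label)
    if ty is None or ty == "i8":
        return f"@{label}"  # external or i8 global: already of type i8*
    return f"bitcast ({ty}* @{label} to i8*)"


defs = [
    ("SWn_srt", "%SWn_srt_struct", []),
    ("main", "i32 ()", ["SWn_srt", "newCAF"]),
]
types = collect_types(defs)
print(external_decls(defs, types))
print(ref_as_i8ptr("SWn_srt", types))
```

The cost is the one already named in the message: the whole unit has to be seen before the first reference can be emitted, which gives up streaming.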