David Terei
2010-Jun-08 10:42 UTC
[LLVMdev] Adding support to LLVM for data & code layout (needed by GHC)
Hi All,

The GHC developers would like to add support to llvm so that the user can define the order in which code and data are laid out in the assembly code llvm produces. The reason we would like to have this feature is explained in the blog post on GHC's use of llvm here: http://blog.llvm.org/2010/05/glasgow-haskell-compiler-and-llvm.html, specifically under the title, 'Problems with backend'.

Basically we want to be able to produce code using llvm that looks like this:

    .text
        .align 4,0x90
        .long _ZCMain_main_srt-(_ZCMain_main_info)+0
        .long 0
        .long 196630
    .globl _ZCMain_main_info
    _ZCMain_main_info:
    .Lcg6:
        leal -12(%ebp),%eax
        cmpl 84(%ebx),%eax
        [...]

So in the above code we can access the code for the function '_ZCMain_main_info' and the metadata for it needed by the runtime with just the one label. At the moment llvm just outputs all global variables at the end.

It seems to me that there are three slightly different ways to support this in llvm:

1) Have llvm preserve the order of data and code from the input file when they are in the same section.

2) Use a new special '@llvm.foo' variable that takes a list of functions and globals. The order they appear in the array is the order they should be output in, as one contiguous block.

3) Have llvm be specifically aware of the desire to associate some global variable with a function. So a function definition could take a global variable as an attribute. llvm would then output the function and variable together like in the code above.

I was thinking that the first option is the easiest, both for llvm and its users. My simple idea was to just somehow store the order in which functions and globals are read in by AsmParser or created through the API. You could use a list to do this, or just give each global/function a number representing its order that could be sorted on. When it comes time for AsmPrinter to write out the module, it does so in that order. Any new functions or globals created by optimisations are simply added to the end of the sort order. This would produce the above code but also with a label for the data, like this:

    .text
        .align 4,0x90
    _ZCMain_main_info_table:
        .long _ZCMain_main_srt-(_ZCMain_main_info)+0
        .long 0
        .long 196630
    .globl _ZCMain_main_info
    _ZCMain_main_info:
    .Lcg6:
        leal -12(%ebp),%eax
        cmpl 84(%ebx),%eax

The problem could be optimisations though. This is an area I'm not very knowledgeable in, so please point out any issues. The main problems I can think of are inlining and dead code removal. Inlining I believe should be OK as long as it doesn't remove the original function, since we need that label present to access the data before it. I wouldn't think this will happen though, since the function will be accessed both as a tail call to it and using pointer arithmetic with subtraction to get the data before it. The pointer arithmetic would stop llvm removing it. The other issue is dead code removal removing the data ('_ZCMain_main_info_table') since there are no references to it. That can be easily fixed using @llvm.used.

The biggest problem with this approach is that it limits the optimisations llvm can do on this code. If the third approach was taken, for example, llvm could optimise more aggressively and specifically for the situation. I'm also not exactly sure how link time optimisation would figure into this at the moment, so perhaps that's a big issue.

So thoughts, criticisms, alternative suggestions please.

Cheers,
David
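The numbering idea behind the first option can be sketched roughly as follows. This is only an illustration under stated assumptions, not LLVM code: `SketchModule`, `addFunction`, `addGlobal` and `emissionOrder` are hypothetical names. The one thing borrowed from LLVM is that functions and globals live in separate lists, which is why a shared creation number is needed to recover their relative order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical module that keeps functions and globals in separate lists
// but stamps each one with a shared creation number.
class SketchModule {
public:
    // Anything read by the parser or created through the API takes the next
    // number, so symbols added later by an optimisation sort to the end.
    void addFunction(const std::string& name) { functions_.emplace_back(next_++, name); }
    void addGlobal(const std::string& name) { globals_.emplace_back(next_++, name); }

    // Names in the order a printer would write them out: merge both lists
    // and sort on the creation number.
    std::vector<std::string> emissionOrder() const {
        std::vector<std::pair<std::size_t, std::string>> all(functions_);
        all.insert(all.end(), globals_.begin(), globals_.end());
        std::sort(all.begin(), all.end());  // numbers are unique, so names never break ties
        std::vector<std::string> names;
        for (const auto& entry : all) names.push_back(entry.second);
        assert(names.size() == functions_.size() + globals_.size());
        return names;
    }

private:
    std::vector<std::pair<std::size_t, std::string>> functions_;
    std::vector<std::pair<std::size_t, std::string>> globals_;
    std::size_t next_ = 0;  // next creation number to hand out
};
```

Adding the info table global and then its function, in that order, would make `emissionOrder()` list the table directly before the function, which is the adjacency the assembly above relies on. The sketch says nothing about sections or about what optimisations may do to either symbol.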
Sebastian Redl
2010-Jun-08 14:50 UTC
[LLVMdev] Adding support to LLVM for data & code layout (needed by GHC)
On Tue, 8 Jun 2010 11:42:41 +0100, David Terei <davidterei at gmail.com> wrote:
> Hi All,
>
> The GHC developers would like to add support to llvm to enable the
> order that code and data are laid out in, in the resulting assembly
> code produced by llvm to be defined by the user. The reason we would
> like to have this feature is explained in the blog post on GHC's use
> of llvm here:
> http://blog.llvm.org/2010/05/glasgow-haskell-compiler-and-llvm.html,
> specifically under the title, 'Problems with backend'.
>

Whichever way is chosen, the ability to reorder and intermingle functions and data arbitrarily is interesting to more than just GHC. In particular, I would like to point out the efforts by Mozilla to make Firefox startup faster, which essentially came down to reordering stuff in the executables so that everything is ordered by the sequence of accesses during program startup. This means that programs can be read sequentially from the front to the end, thus reducing I/O latency.

Tools for automating this process would probably benefit from being able to specify the layout this way.

Sebastian
John McCall
2010-Jun-08 18:35 UTC
[LLVMdev] Adding support to LLVM for data & code layout (needed by GHC)
On Jun 8, 2010, at 3:42 AM, David Terei wrote:
> The GHC developers would like to add support to llvm to enable the
> order that code and data are laid out in, in the resulting assembly
> code produced by llvm to be defined by the user. The reason we would
> like to have this feature is explained in the blog post on GHC's use
> of llvm here: http://blog.llvm.org/2010/05/glasgow-haskell-compiler-and-llvm.html,
> specifically under the title, 'Problems with backend'.
>
> Basically we want to be able to produce code using llvm that looks like this:
>
> .text
>     .align 4,0x90
>     .long _ZCMain_main_srt-(_ZCMain_main_info)+0
>     .long 0
>     .long 196630
> .globl _ZCMain_main_info
> _ZCMain_main_info:
> .Lcg6:
>     leal -12(%ebp),%eax
>     cmpl 84(%ebx),%eax
>     [...]
>
> So in the above code we can access the code for the function
> '_ZCMain_main_info' and the metadata for it need by the runtime with
> just the one label. At the moment llvm just outputs all global
> variables at the end.
>
> It seems to me that there are three slightly different ways to support
> this in llvm:
>
> 1) Have llvm preserve order of data and code from input file when in
> the same section

I dislike this approach; implicit requirements are bad. Many clients don't care about the order in which variables are emitted, and indeed GHC doesn't care either outside of a very narrow range of constraints.

It seems to me that a module property (or special global value) holding a list of ordering lists would be reasonably appropriate. Constraints: values can only appear in a single list, values in a list must be definitions, and (for now) values in a list should not have merging linkage.

John.
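The three constraints on a "list of ordering lists" could be checked along these lines. This is a rough sketch, not an LLVM API: `Value`, `Linkage`, `OrderingError` and `checkOrderingLists` are hypothetical names, and treating weak, linkonce and common as the "merging" linkages is an assumption, since the message does not enumerate them.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical linkage kinds; which ones count as "merging" is an assumption.
enum class Linkage { External, Internal, Weak, LinkOnce, Common };

// Hypothetical stand-in for a function or global variable.
struct Value {
    std::string name;
    bool isDefinition;  // false for a bare declaration
    Linkage linkage;
};

enum class OrderingError { None, DuplicateValue, NotADefinition, MergingLinkage };

// Assumption: weak, linkonce and common symbols may be merged with copies
// from other modules, so this module cannot pin down their final position.
bool hasMergingLinkage(Linkage l) {
    return l == Linkage::Weak || l == Linkage::LinkOnce || l == Linkage::Common;
}

// Check a list of ordering lists against the three constraints and report
// the first violation found.
OrderingError checkOrderingLists(const std::vector<std::vector<Value>>& lists) {
    std::set<std::string> seen;  // names already placed in some list
    for (const std::vector<Value>& list : lists) {
        for (const Value& v : list) {
            assert(!v.name.empty() && "values are identified by name in this sketch");
            // A repeat is rejected whether it is in another list or the same one.
            if (!seen.insert(v.name).second) return OrderingError::DuplicateValue;
            if (!v.isDefinition) return OrderingError::NotADefinition;
            if (hasMergingLinkage(v.linkage)) return OrderingError::MergingLinkage;
        }
    }
    return OrderingError::None;
}
```

A single list holding an info table and then its function would pass; listing the same function again in a second list would be reported as `DuplicateValue`.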
Török Edwin
2010-Jun-08 19:09 UTC
[LLVMdev] Adding support to LLVM for data & code layout (needed by GHC)
On 06/08/2010 09:35 PM, John McCall wrote:
> On Jun 8, 2010, at 3:42 AM, David Terei wrote:
>> The GHC developers would like to add support to llvm to enable the
>> order that code and data are laid out in, in the resulting assembly
>> code produced by llvm to be defined by the user. The reason we would
>> like to have this feature is explained in the blog post on GHC's use
>> of llvm here: http://blog.llvm.org/2010/05/glasgow-haskell-compiler-and-llvm.html,
>> specifically under the title, 'Problems with backend'.
>>
>> Basically we want to be able to produce code using llvm that looks like this:
>>
>> .text
>>     .align 4,0x90
>>     .long _ZCMain_main_srt-(_ZCMain_main_info)+0
>>     .long 0
>>     .long 196630
>> .globl _ZCMain_main_info
>> _ZCMain_main_info:
>> .Lcg6:
>>     leal -12(%ebp),%eax
>>     cmpl 84(%ebx),%eax
>>     [...]
>>
>> So in the above code we can access the code for the function
>> '_ZCMain_main_info' and the metadata for it need by the runtime with
>> just the one label. At the moment llvm just outputs all global
>> variables at the end.
>>
>> It seems to me that there are three slightly different ways to support
>> this in llvm:
>>
>> 1) Have llvm preserve order of data and code from input file when in
>> the same section
>
> I dislike this approach; implicit requirements are bad. Many clients don't care
> about the order in which variables are emitted, and indeed GHC doesn't care
> either outside of a very narrow range of constraints.
>
> It seems to me that a module property (or special global value) holding a list of
> ordering lists would be reasonably appropriate. Constraints: values can only
> appear in a single list, values in a list must be definitions, and (for now) values
> in a list should not have merging linkage.

FWIW this would also help in implementing -fno-toplevel-reorder, which is needed if llvm/clang wants to build (e)glibc. A list holding the order of module level assembly and functions should suffice.

Best regards,
--Edwin
Eugene Toder
2010-Jun-08 21:15 UTC
[LLVMdev] Adding support to LLVM for data & code layout (needed by GHC)
Let me point out that projects using a standard toolchain (e.g. binutils) can already reorder code and data pretty much arbitrarily using sections and linker scripts. I think it's still attractive to have reordering in LLVM so that it is independent of the external toolchain. This will allow reordering in JIT and other interesting things.

I agree with John that a special global with an ordered list looks like the clean approach. However, for the specific need in GHC I'm not sure that this is enough. What it's trying to do is place a global variable (a struct) and a function next to each other, so that it can use pointer arithmetic to go from one address to the other. So, say, we take the address of the struct, add its size and call into the resulting address. A sufficiently smart optimizer can spot that we dereference a pointer pointing outside of the object. This probably has undefined semantics and can be replaced with unreachable? It's also a missed opportunity for optimizations -- if the optimizer knew where the outside-of-the-struct pointer is really going it could make a direct call instead of an indirect one -- however, I don't know if this is a big deal.

Eugene

On Tue, Jun 8, 2010 at 3:50 PM, Sebastian Redl <sebastian.redl at getdesigned.at> wrote:
>
> On Tue, 8 Jun 2010 11:42:41 +0100, David Terei <davidterei at gmail.com>
> wrote:
>> Hi All,
>>
>> The GHC developers would like to add support to llvm to enable the
>> order that code and data are laid out in, in the resulting assembly
>> code produced by llvm to be defined by the user. The reason we would
>> like to have this feature is explained in the blog post on GHC's use
>> of llvm here:
>> http://blog.llvm.org/2010/05/glasgow-haskell-compiler-and-llvm.html,
>> specifically under the title, 'Problems with backend'.
>>
>
> Whichever way is chosen, the ability to reorder and intermingle functions
> and data arbitrarily is interesting to more than just GHC. In particular, I
> would like to point out the efforts by Mozilla to make Firefox startup
> faster, which essentially came down to reordering stuff in the executables
> so that everything is ordered by the sequence of accesses during program
> startup. This means that programs can be read sequentially from the front
> to the end, thus reducing I/O latency.
>
> Tools for automating this process would probably benefit from being able
> to specify the layout this way.
>
> Sebastian
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
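The adjacency that GHC depends on, and that Eugene's concern is about, can be pictured with a toy model. The sketch below uses one flat buffer of 32-bit words so that all the index arithmetic stays inside a single object and is well defined; it is not how LLVM or GHC represent anything. `InfoTable`, its field names, `emitWithInfoTable` and `infoFor` are all hypothetical, and the three-word table size is taken only from the assembly example earlier in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: three 32-bit words of metadata sit directly before the entry
// point, as in the assembly example in the thread.
constexpr std::size_t kInfoWords = 3;

// Hypothetical field names; only the first word's meaning (an offset to the
// SRT relative to the info label) is suggested by the example.
struct InfoTable {
    std::uint32_t srtOffset;
    std::uint32_t word1;
    std::uint32_t word2;
};

// Append an info table and then the code words to the image, returning the
// index of the entry point (the position the single label would name).
std::size_t emitWithInfoTable(std::vector<std::uint32_t>& image,
                              const InfoTable& info,
                              const std::vector<std::uint32_t>& code) {
    image.push_back(info.srtOffset);
    image.push_back(info.word1);
    image.push_back(info.word2);
    const std::size_t entry = image.size();
    image.insert(image.end(), code.begin(), code.end());
    return entry;
}

// Recover the metadata from the entry index alone by stepping backwards.
InfoTable infoFor(const std::vector<std::uint32_t>& image, std::size_t entry) {
    assert(entry >= kInfoWords && entry <= image.size());
    return InfoTable{image[entry - 3], image[entry - 2], image[entry - 1]};
}
```

Here `infoFor` only works because table and code share one buffer. In the IR they would be two separate objects, a global and a function, which is where the question of whether an optimizer has to respect the arithmetic comes from.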