David Terei
2010-Jun-15 13:18 UTC
[LLVMdev] Adding support to LLVM for data & code layout (needed by GHC)
Hi all,

Just wanted to report that I've found a second way to achieve data/code layout (the first being the linker script that Eugene mentioned).

The key is that GNU as supports a feature called subsections.

http://sourceware.org/binutils/docs-2.20/as/Sub_002dSections.html#Sub_002dSections

The way this works is that you can put stuff into a section like '.text 2', where 2 is a subsection of .text. When run, 'as' orders the subsections by number. So all you need to do is arrange for the sidetable to be in section '.text n' and the code in section '.text n+1'. Each sidetable and its code goes in its own subsection. The nice thing is, this is purely a GNU as feature. When it assembles to object code, the subsections aren't present in the object code, so you don't get hundreds of sections that take up space and slow down linking.

There is one complication though. LLVM (and GCC as well) doesn't support subsections. While you can define what section globals and functions are in, this doesn't support defining the subsection. If you tell LLVM to put function f in section "text 12", it produces assembly like:

    .section text 12,"rw" @progbits
    f:
    [..]

Which causes gas to report a syntax error. Gas only accepts subsections through a specific syntax, so it needs to be:

    .text 12
    f:
    [...]

We can convert between them though with just a simple regex.

We are going to use this approach for the moment in GHC; we've tested it and it's working great so far. I prefer this method over the linker script, as implementing the linker script approach would affect all the backends GHC supports while this approach is contained to the LLVM backend.

I'm still planning on adding support to LLVM for side tables in some manner so we can just depend on pure LLVM.
Cheers,
David

On 10 June 2010 18:08, Andrew Lenharth <andrewl at lenharth.org> wrote:
> On Thu, Jun 10, 2010 at 11:34 AM, David Terei <davidterei at gmail.com> wrote:
>> It's good to see that a feature of this nature would be useful to a
>> whole range of people, I wasn't aware of that.
>>
>> On 9 June 2010 22:40, Andrew Lenharth <andrewl at lenharth.org> wrote:
>>> My argument amounts to express side tables as side tables in the IR
>>> rather than as an ordering on globals. I think that would simplify
>>> the backend (a side table is something you discover from the function
>>> rather than having to check another global). Also, if well specified,
>>> I think you could allow basic block labels into structures which makes
>>> them more interesting for other uses.
>>
>> Sure. I wasn't set on the third approach I suggested, which is to have
>> them expressed as side tables in the IR as I didn't realise other
>> users would be interested in them so I didn't think it would be
>> appropriate to add new language constructs for one user. I don't think
>> it would be simpler to implement in the backend though and this approach
>> would need changes to the frontend, so a lot more work.
>
> The backend already can sort of do this with the GCMetadataPrinter.
> Generalizing that to arbitrary side tables might be easier than adding
> a new construct (granted sidetables might not replace the ability to
> output assembly by that class, but they might do a lot of the heavy
> lifting). Since GC lowering happens on the IR level (from the docs I
> looked at, I haven't personally dealt with GC yet), it may be possible
> to do a lot of lowering to generalized tables rather than complex
> GCMetadataPrinter implementations. This is just speculation on my
> part though. This is one of the reasons I thought labels in the
> constant structs could be handy. Perhaps a general side table
> representation in the backend could be used by EH too?
>
> Andrew
>
>> What I am hoping someone may be able to give an answer to though is
>> what issues there may be if the second approach was taken (using the
>> special glob var)? Would the optimiser be tempted at some point to
>> replace a load instruction to an unknown address created by a negative
>> offset from a function with unreachable for example as Eugene
>> suggested may be possible?
>>
>> Also, what are you gaining going with the third approach? I guess the
>> optimiser could do things like constant propagation using the third
>> approach but not the second although I think that's unlikely to give
>> much benefit in the kind of code GHC produces but there is everyone
>> else to think of :).
>>
>> Thanks for all the responses though, I'm going to start playing around
>> with some code and see what happens.
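The regex conversion David mentions, from the `.section text 12,...` form to gas's `.text 12` form, might look roughly like the Python sketch below. The input shape is taken from the message; the pattern, the function name `rewrite_subsections`, and the choice of Python are assumptions for illustration and are not GHC's actual post-processing code.

```python
import re

# Assumed input shape (taken from the message above):
#     .section text 12,"rw" @progbits
# Desired output shape accepted by gas:
#     .text 12
# The pattern and names here are illustrative, not GHC's real implementation.
SECTION_WITH_SUBSECTION = re.compile(
    r'^(?P<indent>[ \t]*)\.section[ \t]+\.?text[ \t]+(?P<num>\d+)\b.*$',
    re.MULTILINE,
)


def rewrite_subsections(asm: str) -> str:
    """Rewrite '.section text N,...' lines into the '.text N' form."""
    # Keep the original indentation and the subsection number, drop the rest.
    return SECTION_WITH_SUBSECTION.sub(r'\g<indent>.text \g<num>', asm)
```

Lines that name any other section are left untouched, so the pass can be run over a whole assembly file.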
Eugene Toder
2010-Jun-15 22:08 UTC
[LLVMdev] Adding support to LLVM for data & code layout (needed by GHC)
Subsections is a very good idea. You can even do without post-processing by using carefully crafted section names, e.g.

    __attribute__((section(".text,\"ax\",@progbits\n\t.subsection 1 #")))
    void foo()
    {
    }

(Note that you need ".subsection n" commands on ELF targets and ".section name, n" commands on COFF targets; it seems that the latter was supported on all targets in old versions of gas, but no longer is.)

Eugene
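To see why the crafted section name in Eugene's message works, it can help to simulate what the compiler prints. The sketch below assumes the compiler emits a directive of the form `.section <name>,"<flags>",@progbits` and that `#` starts a comment for the target assembler (true for x86 gas, not for every target). Both function names are hypothetical.

```python
def emit_section_directive(name: str, flags: str = "ax") -> str:
    """Mimic (as an assumption) how a compiler prints a named ELF section."""
    return f'\t.section\t{name},"{flags}",@progbits'


def effective_directives(directive: str) -> list[str]:
    """Return what the assembler sees once '#' comments are stripped."""
    return [line.split("#", 1)[0].strip() for line in directive.splitlines()]


# The section name from the message: it smuggles in its own flags, a newline,
# a .subsection directive, and a trailing '#' that comments out whatever the
# compiler appends after the name.
CRAFTED_NAME = '.text,"ax",@progbits\n\t.subsection 1 #'
```

With these definitions, `effective_directives(emit_section_directive(CRAFTED_NAME))` yields a plain `.section .text` directive followed by `.subsection 1`; the flags the compiler appends end up inside the comment.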
David Terei
2010-Jun-15 23:27 UTC
[LLVMdev] Adding support to LLVM for data & code layout (needed by GHC)
On 15 June 2010 23:08, Eugene Toder <eltoder at gmail.com> wrote:
> Subsections is a very good idea. You can even do without
> post-processing by using carefully crafted section names, e.g.
>
> __attribute__((section(".text,\"ax\",@progbits\n\t.subsection 1 #")))
> void foo()
> {
> }

hehe cool, this is great news. Thanks for letting me know.

David
Anton Korobeynikov
2010-Jun-16 12:43 UTC
[LLVMdev] Adding support to LLVM for data & code layout (needed by GHC)
> (Note that you need ".subsection n" commands on ELF targets and
> ".section name, n" commands on COFF targets; seems that the latter was
> supported on all targets in old versions of gas, but not any longer).

Btw, will this work on Mach-O?

-- 
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University
Anton Korobeynikov
2010-Jun-16 12:45 UTC
[LLVMdev] Adding support to LLVM for data & code layout (needed by GHC)
> There is one complication though. LLVM (and GCC as well) don't support
> subsections. While you can define what section globals and functions
> are in, this doesn't support defining the subsection. If you say to
> LLVM, put function f in section "text 12", it produces assembly like:
>
> .section text 12,"rw" @progbits

This seems easy to fix during asm printing. E.g. if the section name is an integer from 0 to 8192 => emit as a subsection.

Side q: what will you do when you run out of subsections?

-- 
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University
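Anton's suggested rule can be sketched as a small decision function. This is only an illustration in Python of the rule as stated in the message, not LLVM's AsmPrinter code; the function name, the `"ax"` flags on the fallback path, and the exact output formatting are assumptions.

```python
MAX_SUBSECTION = 8192  # upper bound quoted in the message above


def section_directive(name: str) -> str:
    """Pick a directive for a function placed in section `name`.

    A purely numeric name within range is treated as a .text subsection;
    anything else falls back to an ordinary named section.
    """
    if name.isascii() and name.isdigit() and int(name) <= MAX_SUBSECTION:
        return f"\t.text {int(name)}"
    # Fallback flags are an assumption made for this sketch.
    return f'\t.section {name},"ax",@progbits'
```

A rule like this only has the section name to work with, so it cannot express "this named section, plus this subsection" at the same time.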
Andrew Lenharth
2010-Jun-16 12:58 UTC
[LLVMdev] Adding support to LLVM for data & code layout (needed by GHC)
On Wed, Jun 16, 2010 at 7:45 AM, Anton Korobeynikov <anton at korobeynikov.info> wrote:
>> There is one complication though. LLVM (and GCC as well) don't support
>> subsections. While you can define what section globals and functions
>> are in, this doesn't support defining the subsection. If you say to
>> LLVM, put function f in section "text 12", it produces assembly like:
>>
>> .section text 12,"rw" @progbits
> This seems easy to fix during asm printing. E.g. if the section name is
> an integer from 0 to 8192 => emit as a subsection. Side q: what
> will you do when you run out of subsections?

It seems easy to fix for functions, but for globals you already have to overwrite their section in LLVM, so the section won't be just an integer.

Andrew
Eugene Toder
2010-Jun-16 20:23 UTC
[LLVMdev] Adding support to LLVM for data & code layout (needed by GHC)
I have no idea how the GNU toolchain works on Mach-O platforms. My guess is that it goes via the COFF path there, because the other path is ELF-specific.

As Andrew already said, for the table we need both section and subsection. To solve the problem of running out, we can put each function into a separate section (C++ compilers have been doing this for a while) and only use 2 subsections per section: 0 for the table and 1 for the function.

Eugene
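The layout Eugene proposes, one section per function with subsection 0 for the table and subsection 1 for the code, could be emitted along these lines. This is a hedged sketch for ELF targets: the `.text.<name>` section naming, the use of `.long` for table entries, and the function name `emit_function_with_table` are assumptions chosen for illustration.

```python
def emit_function_with_table(name: str, table_words: list[int],
                             body_lines: list[str]) -> str:
    """Emit gas text placing a side table directly before a function.

    Each function gets its own section (assumed name: .text.<name>), so only
    two subsections are ever needed: 0 for the table, 1 for the code.
    """
    lines = [
        f'\t.section\t.text.{name},"ax",@progbits',
        "\t.subsection 0",                       # table goes first
        *[f"\t.long\t{word}" for word in table_words],
        "\t.subsection 1",                       # code follows the table
        f"{name}:",
        *[f"\t{insn}" for insn in body_lines],
    ]
    return "\n".join(lines) + "\n"
```

Because the subsection numbers restart in every section, the 8192 limit no longer bounds the number of functions.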
Sam Martin
2010-Jun-17 11:00 UTC
[LLVMdev] Adding support to LLVM for data & code layout (needed by GHC)
Hi,

Does anyone know whether subsections are specific to the GNU assembler or whether they are supported by other assemblers, such as masm?

Or put another way, will this limit the assembly output to the GNU toolchain?

Cheers,
Sam
David Terei
2010-Jun-17 17:25 UTC
[LLVMdev] Adding support to LLVM for data & code layout (needed by GHC)
On 15 June 2010 23:08, Eugene Toder <eltoder at gmail.com> wrote:
> Subsections is a very good idea. You can even do without
> post-processing by using carefully crafted section names, e.g.
>
> __attribute__((section(".text,\"ax\",@progbits\n\t.subsection 1 #")))
> void foo()
> {
> }

There is one problem with the section name used here: 'llvm-as' doesn't support it. LLVM itself does, so if you compile the above with clang then it works fine. If you try to use that section name in a .ll file and call one of the tools, it fails as the parser doesn't support escaping quotes. It also doesn't interpret '\n' as a newline and instead outputs each character into the assembly file.

Anyway, you can get around this by using a section name like:

    ".text;.subsection 1 #"

instead. If you're using the LLVM API then this isn't a problem.

David
Mark Lacey
2010-Jun-18 03:34 UTC
[LLVMdev] Adding support to LLVM for data & code layout (needed by GHC)
The Microsoft COFF format and linker support what they call section groupings, which are very similar. You can have an object file with a section like ".text$foo" and another (or the same) with ".text$bar" and they will be ordered alphabetically in the final image (and merged into .text, after all of the "regular" .text sections). This is documented in the Microsoft COFF documentation.

Mark
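The ordering Mark describes can be modelled in a few lines. The sketch below is a toy simulation in Python, not the Microsoft linker's algorithm: it assumes contributions are grouped by the part of the name before `$` and sorted by the part after it, with plain ".text" (empty suffix) sorting first. The function name and data shapes are hypothetical.

```python
from collections import defaultdict


def layout_grouped_sections(contributions: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Merge 'name$suffix' contributions into 'name', ordered by suffix.

    `contributions` is a list of (section_name, payload) pairs in the order
    the object files were given to the (simulated) linker.
    """
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for section_name, payload in contributions:
        base, _, suffix = section_name.partition("$")
        groups[base].append((suffix, payload))
    # sorted() is stable, so equal suffixes keep their input order.
    return {
        base: [payload for _, payload in sorted(items, key=lambda item: item[0])]
        for base, items in groups.items()
    }
```

For a side-table scheme, this suggests naming the table's section so that its suffix sorts immediately before the suffix of the function it belongs to.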