thr3ads.net - llvm dev - [LLVMdev] Adding support to LLVM for data & code layout (needed by GHC) [Jun 2010]

David Terei

2010-Jun-15 13:18 UTC

[LLVMdev] Adding support to LLVM for data & code layout (needed by GHC)

Hi all,

Just wanted to report that I've found a second way to achieve
data/code layout (the first being the linker script that Eugene
mentioned).

The key is that gnu as supports a feature called subsections.

http://sourceware.org/binutils/docs-2.20/as/Sub_002dSections.html#Sub_002dSections

The way this works is that you can put stuff into a section like
'.text 2', where 2 is a subsection of .text. When run, 'as' orders the
subsections in numeric order. So all you need to do is arrange for the
sidetable to be in section '.text n' and the code in section '.text n+1'.
Each sidetable and its code get their own pair of subsections. The nice
thing is, this is purely a gnu as feature. When it assembles the source to
object code, the subsections aren't present in the object code, so you
don't get 100's of sections that take up space and slow down linking.
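
As a rough illustration of the numbering scheme described above, here is a small Python sketch that generates assembly text with each side table in an even-numbered subsection and its code in the following odd-numbered one. The function name `emit_with_subsections` and the input shape are hypothetical, chosen only for this example; this is not GHC's actual code generator.

```python
def emit_with_subsections(functions):
    """Return assembly text for a list of (name, table_lines, code_lines).

    Hypothetical helper: function i gets its side table in subsection 2*i
    and its code in subsection 2*i + 1, so that gas, which orders
    subsections numerically, places each table directly before its code.
    """
    out = []
    for i, (name, table_lines, code_lines) in enumerate(functions):
        out.append(f"\t.text {2 * i}")          # side table: '.text n'
        out.extend(f"\t{line}" for line in table_lines)
        out.append(f"\t.text {2 * i + 1}")      # code: '.text n+1'
        out.append(f"{name}:")
        out.extend(f"\t{line}" for line in code_lines)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(emit_with_subsections([
        ("f", [".long 1"], ["ret"]),
        ("g", [".long 2"], ["ret"]),
    ]))
```

Because the ordering is done by the assembler, the generator is free to emit the pieces in any order as long as the subsection numbers are right.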

There is one complication though. LLVM (and GCC as well) don't support
subsections. While you can define what section globals and functions
are in, this doesn't support defining the subsection. If you say to
LLVM, put function f in section "text 12", it produces assembly like:

.section text 12,"rw" @progbits
f:
 [..]

This causes gas to spit out a syntax error. Gas only accepts
subsections through a very specific syntax, so it needs to be:

.text 12
f:
  [...]

We can convert between them though with just a simple regex.
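
The exact regex used is not given in this message, so the following is only a minimal Python sketch of what such a post-processing step could look like. It assumes the directive has the shape shown above (`.section text N,...`) and rewrites it to the `.text N` form; the names `_SUBSECTION_RE` and `rewrite_subsections` are made up for this example.

```python
import re

# Assumed input shape (from the example above):
#     .section text 12,"rw" @progbits
# Everything after the subsection number on that line is dropped.
_SUBSECTION_RE = re.compile(
    r'^([ \t]*)\.section[ \t]+\.?text[ \t]+(\d+)\b.*$',
    re.MULTILINE,
)


def rewrite_subsections(asm: str) -> str:
    """Rewrite '.section text N,...' lines into the '.text N' form gas accepts."""
    return _SUBSECTION_RE.sub(r'\1.text \2', asm)


if __name__ == "__main__":
    src = '.section text 12,"rw" @progbits\nf:\n\tret\n'
    print(rewrite_subsections(src))
```

Other `.section` directives (for example `.section .rodata,...`) do not match the pattern and pass through unchanged.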

We are going to use this approach for the moment in GHC; we've tested
it and it's working great so far. I prefer this method over the linker
script, as implementing the linker script approach would affect all the
backends GHC supports, while this approach is contained to the LLVM
backend.

I'm still planning on adding support for side tables to LLVM in some
manner so we can just depend on pure LLVM.

Cheers,
David

On 10 June 2010 18:08, Andrew Lenharth <andrewl at lenharth.org> wrote:
> On Thu, Jun 10, 2010 at 11:34 AM, David Terei <davidterei at gmail.com> wrote:
>> It's good to see that a feature of this nature would be useful to a
>> whole range of people, I wasn't aware of that.
>>
>> On 9 June 2010 22:40, Andrew Lenharth <andrewl at lenharth.org> wrote:
>>> My argument amounts to expressing side tables as side tables in the IR
>>> rather than as an ordering on globals.  I think that would simplify
>>> the backend (a side table is something you discover from the function
>>> rather than having to check another global).  Also, if well specified,
>>> I think you could allow basic block labels into structures, which makes
>>> them more interesting for other uses.
>>
>> Sure. I wasn't set on the third approach I suggested, which is to have
>> them expressed as side tables in the IR, as I didn't realise other
>> users would be interested in them, so I didn't think it would be
>> appropriate to add new language constructs for one user. I don't think
>> it would be simpler to implement in the backend though, and this approach
>> would need changes to the frontend, so a lot more work.
>
> The backend already can sort of do this with the GCMetadataPrinter.
> Generalizing that to arbitrary side tables might be easier than adding
> a new construct (granted sidetables might not replace the ability to
> output assembly by that class, but they might do a lot of the heavy
> lifting).  Since GC lowering happens on the IR level (from the docs I
> looked at, I haven't personally dealt with GC yet), it may be possible
> to do a lot of lowering to generalized tables rather than complex
> GCMetadataPrinter implementations.  This is just speculation on my
> part though.  This is one of the reasons I thought labels in the
> constant structs could be handy.  Perhaps a general side table
> representation in the backend could be used by EH too?
>
> Andrew
>
>> What I am hoping someone may be able to give an answer to though is
>> what issues there may be if the second approach was taken (using the
>> special glob var)? Would the optimiser be tempted at some point to
>> replace a load instruction to an unknown address created by a negative
>> offset from a function with unreachable, for example, as Eugene
>> suggested may be possible?
>>
>> Also, what are you gaining going with the third approach? I guess the
>> optimiser could do things like constant propagation using the third
>> approach but not the second, although I think that's unlikely to give
>> much benefit in the kind of code GHC produces, but there is everyone
>> else to think of :).
>>
>> Thanks for all the responses though, I'm going to start playing around
>> with some code and see what happens.
>>
>

Eugene Toder

2010-Jun-15 22:08 UTC

Subsections is a very good idea. You can even do without
post-processing by using carefully crafted section names, e.g.

__attribute__((section(".text,\"ax\",@progbits\n\t.subsection 1 #")))
void foo()
{
}

(Note that you need ".subsection n" commands on ELF targets and
".section name, n" commands on COFF targets; seems that the latter was
supported on all targets in old versions of gas, but not any longer).

Eugene

David Terei

2010-Jun-15 23:27 UTC

On 15 June 2010 23:08, Eugene Toder <eltoder at gmail.com> wrote:
> Subsections is a very good idea. You can even do without
> post-processing by using carefully crafted section names, e.g.
>
> __attribute__((section(".text,\"ax\",@progbits\n\t.subsection 1 #")))
> void foo()
> {
> }

hehe cool, this is great news. Thanks for letting me know.

David

Anton Korobeynikov

2010-Jun-16 12:43 UTC

> (Note that you need ".subsection n" commands on ELF targets and
> ".section name, n" commands on COFF targets; seems that the latter was
> supported on all targets in old versions of gas, but not any longer).

Btw, will this work on Mach-O?

-- 
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University

Anton Korobeynikov

2010-Jun-16 12:45 UTC

> There is one complication though. LLVM (and GCC as well) don't support
> subsections. While you can define what section globals and functions
> are in, this doesn't support defining the subsection. If you say to
> LLVM, put function f in section "text 12", it produces assembly like:
>
> .section text 12,"rw" @progbits

This seems easy to fix during asm printing. E.g. if the section name is
an integer from 0 to 8192 => emit it as a subsection. Side q: what
will you do when you run out of subsections?

-- 
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University

Andrew Lenharth

2010-Jun-16 12:58 UTC

On Wed, Jun 16, 2010 at 7:45 AM, Anton Korobeynikov <anton at korobeynikov.info> wrote:
>> There is one complication though. LLVM (and GCC as well) don't support
>> subsections. While you can define what section globals and functions
>> are in, this doesn't support defining the subsection. If you say to
>> LLVM, put function f in section "text 12", it produces assembly like:
>>
>> .section text 12,"rw" @progbits
> This seems easy to fix during asm printing. E.g. if the section name is
> an integer from 0 to 8192 => emit it as a subsection. Side q: what
> will you do when you run out of subsections?

It seems easy to fix for functions, but for globals you already have
to overwrite their section in LLVM so the section won't be just an
integer.

Andrew
> --
> With best regards, Anton Korobeynikov
> Faculty of Mathematics and Mechanics, Saint Petersburg State University
>

Eugene Toder

2010-Jun-16 20:23 UTC

I have no idea how the GNU toolchain works on Mach-O platforms. My guess
is that it goes via the COFF path there, because the other path is
ELF-specific.

As Andrew already said, for the table we need both section and
subsection. To solve the problem with running out, we can put each
function into a separate section (C++ compilers were doing this for a
while) and only use 2 subsections per section: 0 for the table and 1
for function.
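
A minimal Python sketch of that layout for an ELF target, where the `.subsection n` directive is used: each function gets its own section, with subsection 0 for the table and subsection 1 for the code. The `.text.<name>` section naming and the helper name `emit_function` are assumptions made for this example only.

```python
def emit_function(name, table_lines, code_lines):
    """Emit one function in its own section with two subsections (ELF gas syntax).

    Assumption: the per-function section is called '.text.<name>'; the
    thread does not specify a naming scheme.
    """
    out = [f'\t.section .text.{name},"ax",@progbits']
    out.append("\t.subsection 0")               # side table goes first
    out.extend(f"\t{line}" for line in table_lines)
    out.append("\t.subsection 1")               # function body follows it
    out.append(f"{name}:")
    out.extend(f"\t{line}" for line in code_lines)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(emit_function("f", [".long 1"], ["ret"]))
```

Since only subsections 0 and 1 are ever used, the subsection number limit no longer depends on how many functions the module has.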

Eugene

Sam Martin

2010-Jun-17 11:00 UTC

Hi,

Does anyone know whether subsections are specific to the gnu assembler
or whether they are supported by other assemblers, such as masm? 

Or put another way, will this limit the assembly output to the gnu
toolchain?

Cheers,
Sam

David Terei

2010-Jun-17 17:25 UTC

On 15 June 2010 23:08, Eugene Toder <eltoder at gmail.com> wrote:
> Subsections is a very good idea. You can even do without
> post-processing by using carefully crafted section names, e.g.
>
> __attribute__((section(".text,\"ax\",@progbits\n\t.subsection 1 #")))
> void foo()
> {
> }

There is one problem with the section name used here: 'llvm-as'
doesn't support it. LLVM itself does, so if you compile the above with
clang then it works fine. If you try to use that section name in a .ll
file and call one of the tools, it fails as the parser doesn't support
escaping quotes. It also doesn't interpret '\n' as a newline; it
outputs the characters literally into the assembly file. Anyway, you can
get around this by using a section name like:

".text;.subsection 1 #"

instead. If you're using the LLVM API then this isn't a problem.
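
To see why such a name works, here is a small Python sketch that mimics what happens to the emitted line. It assumes the section directive is printed as `.section <name>,<flags>` and that, as in x86 gas syntax, ';' separates statements and '#' starts a comment. Both helper names are hypothetical, and the splitting is deliberately naive (it ignores quoted strings).

```python
def section_directive(name, flags='"ax",@progbits'):
    """Assumed shape of the printed directive: '.section <name>,<flags>'."""
    return f"\t.section\t{name},{flags}"


def statements_seen_by_gas(line):
    """Naive model of x86 gas parsing: drop the '#' comment, split on ';'."""
    code = line.split("#", 1)[0]
    return [stmt.strip() for stmt in code.split(";") if stmt.strip()]


if __name__ == "__main__":
    line = section_directive(".text;.subsection 1 #")
    print(repr(line))
    print(statements_seen_by_gas(line))
```

Under those assumptions the trailing flags end up inside the comment, leaving a plain `.section .text` followed by `.subsection 1`. Comment and separator characters differ between targets, so this only models the x86 case.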

David

Mark Lacey

2010-Jun-18 03:34 UTC

The Microsoft COFF format and linker support what they call section
groupings, which are very similar. You can have an object file with a
section like ".text$foo" and another (or the same) with ".text$bar"
and they will be ordered alphabetically in the final image (and merged
into .text, after all of the "regular" .text sections).

This is documented in the Microsoft COFF documentation.
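
A toy Python model of that ordering rule, to show how it could place a side table directly before its code: contributions are sorted by their full object-section name, then merged into the image section named by the part before the '$'. The function name `layout_grouped_sections` and the suffixes `$f_0table` / `$f_1code` are invented for this sketch; it models only the ordering, not a real linker.

```python
def layout_grouped_sections(contributions):
    """contributions: list of (object_section_name, label) pairs.

    Returns {image_section: [labels in final order]}. Sorting is by the
    full object-section name, so '.text' sorts before '.text$anything'.
    """
    image = {}
    # sorted() is stable, so equal names keep their input order.
    for name, label in sorted(contributions, key=lambda c: c[0]):
        base = name.split("$", 1)[0]    # the '$' and what follows are discarded
        image.setdefault(base, []).append(label)
    return image


if __name__ == "__main__":
    print(layout_grouped_sections([
        (".text$f_1code", "f"),
        (".text$f_0table", "f_info"),
        (".text", "main"),
    ]))
```

With suffixes chosen this way the table for f sorts immediately before the code for f, and both land after the plain .text contribution.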

Mark
