Nick Kledzik
2014-Aug-25 16:54 UTC
[LLVMdev] How to tell whether a GlobalValue is user-defined
On Aug 25, 2014, at 8:26 AM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:

> On 21 August 2014 19:32, Akira Hatanaka <ahatanak at gmail.com> wrote:
>> Is there a way to distinguish between GlobalValues that are user-defined and
>> those that are compiler-defined? I am looking for a function that I can use
>> to tell if a GlobalValue is user-defined, something like
>> "GlobalValue::isUserDefined", which returns true for a user-defined
>> GlobalValue.
>>
>> I'm trying to make changes to prevent llvm from placing user-defined
>> constant arrays in the mergeable constant sections. Currently, clang places
>> 16-byte constant arrays that are marked "unnamed_addr" into __literal16 for
>> macho (see following example).
>>
>> $ cat test1.c
>> static const int s_dashArraysSize1[4] = {2, 2, 4, 6};
>>
>> int foo1(int a) {
>>   return s_dashArraysSize1[a];
>> }
>>
>> $ clang test1.c -S -O3 -o - | tail -n 10
>>         .section        __TEXT,__literal16,16byte_literals
>>         .align  4                       ## @s_dashArraysSize1
>> _s_dashArraysSize1:
>>         .long   2                       ## 0x2
>>         .long   2                       ## 0x2
>>         .long   4                       ## 0x4
>>         .long   6                       ## 0x6
>>
>> This is not desirable because the macho linker wasn't originally designed to
>> handle user-defined symbols in those sections and having to handle them
>> complicates the linker. Also, there is no benefit in doing so, since the
>> linker currently doesn't try to merge user-defined variables anyway.
>
> What does "user-defined" mean here? Since the linker is
> involved, I assume it has something to do with the final symbol name.
>
> At the linker level (symbol names, sections, atoms, relocations, etc),
> what exactly is not supported?

The literalN sections were developed long ago to support coalescing of unnamed constants like 9.897 in source code for architectures that could not embed large constants in instructions. The linker knew how to break up the section (e.g. __literal8 is always 8-byte chunks) and coalesce copies by content.

~6 years ago we discovered that gcc would sometimes put user-named constants into the literal sections (e.g. const double foo = 9.897). This was an issue because C language rules say &a != &b, but if ‘a’ and ‘b’ contain the same literal value in different translation units, the linker could merge them to the same address. For whatever reason, we could not fix gcc, so we changed the linker to never coalesce an item in a literal section if there was a (non-‘L’ and non-‘l’) symbol on it.

The current state of LLVM is that it is going out of its way to move “named” constants from the __const section to the __literalN sections. But the only possible advantage to doing that is the hope that the linker might coalesce them. But the linker won’t coalesce them because they are named. So, is there a way to keep the named values in the __const section?

-Nick
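The coalescing rule described in this message can be sketched as a small standalone model. This is a toy illustration, not ld64 code: the `Literal` struct and the functions `isLocalLabel`, `coalesce`, and `coalesceExample` are hypothetical names, and the "absent, or starts with 'L' or 'l'" test is only an approximation of the rule as stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of one fixed-size entry in a __literalN section.
struct Literal {
  std::string symbol;  // empty if the chunk carries no symbol at all
  std::string bytes;   // raw content, always chunkSize bytes long
};

// Assumption: a chunk counts as unnamed when it has no symbol or its symbol
// starts with 'L' or 'l', following the description in the message above.
bool isLocalLabel(const std::string &symbol) {
  return symbol.empty() || symbol[0] == 'L' || symbol[0] == 'l';
}

// Returns, for each input literal, the index of the output slot it lands in.
// Unnamed literals with identical bytes share a slot; named ones never do.
std::vector<std::size_t> coalesce(const std::vector<Literal> &input,
                                  std::size_t chunkSize) {
  std::map<std::string, std::size_t> slotByContent;
  std::vector<std::size_t> slots;
  std::size_t nextSlot = 0;
  for (const Literal &lit : input) {
    assert(lit.bytes.size() == chunkSize);  // section is cut into equal chunks
    if (!isLocalLabel(lit.symbol)) {
      slots.push_back(nextSlot++);  // named: always gets its own address
      continue;
    }
    auto result = slotByContent.emplace(lit.bytes, nextSlot);
    if (result.second)
      ++nextSlot;  // first time this content is seen
    slots.push_back(result.first->second);
  }
  return slots;
}

// Usage example: two unnamed copies merge, a named copy does not.
inline void coalesceExample() {
  std::vector<Literal> in = {{"", "AAAAAAAA"}, {"L1", "AAAAAAAA"},
                             {"_foo", "AAAAAAAA"}};
  std::vector<std::size_t> out = coalesce(in, 8);
  assert(out[0] == out[1]);
  assert(out[2] != out[0]);
}
```

In this model a named constant placed in a literal section still occupies its own slot, which is why moving it there gains nothing.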
David Majnemer
2014-Aug-25 17:37 UTC
[LLVMdev] How to tell whether a GlobalValue is user-defined
On Mon, Aug 25, 2014 at 9:54 AM, Nick Kledzik <kledzik at apple.com> wrote:

> The current state of LLVM is that it is going out of its way to move
> “named” constants from the __const section to the __literalN sections. But
> the only possible advantage to doing that is the hope that the linker might
> coalesce them. But the linker won’t coalesce them because they are named.
> So, is there a way to keep the named values in the __const section?

I believe the following patch would be the minimal change needed to do this; there is some dead code that could be removed as well.

diff --git a/lib/CodeGen/TargetLoweringObjectFileImpl.cpp b/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
index 55e1756..bf78ce1 100644
--- a/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
+++ b/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
@@ -667,12 +667,6 @@ TargetLoweringObjectFileMachO::getSectionForConstant(SectionKind Kind,
   if (Kind.isDataRel() || Kind.isReadOnlyWithRel())
     return ConstDataSection;

-  if (Kind.isMergeableConst4())
-    return FourByteConstantSection;
-  if (Kind.isMergeableConst8())
-    return EightByteConstantSection;
-  if (Kind.isMergeableConst16())
-    return SixteenByteConstantSection;
   return ReadOnlySection;  // .const
 }
Akira Hatanaka
2014-Aug-25 18:08 UTC
[LLVMdev] How to tell whether a GlobalValue is user-defined
I think this prevents constants in the constant pool (e.g., floating-point literals) from being placed in the mergeable constant sections?

We want to keep the const arrays declared in the program (s_dashArraysSize1) out of the mergeable constant sections, but don't mind placing constants in the constant pool, or constant arrays that the compiler defines, such as switch.table and memset_pattern, in the mergeable sections.

On Mon, Aug 25, 2014 at 10:37 AM, David Majnemer <david.majnemer at gmail.com> wrote:

> I believe the following patch would be the minimal needed to do this,
> there is some dead code that could be removed as well.
>
> diff --git a/lib/CodeGen/TargetLoweringObjectFileImpl.cpp b/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
> index 55e1756..bf78ce1 100644
> --- a/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
> +++ b/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
> @@ -667,12 +667,6 @@ TargetLoweringObjectFileMachO::getSectionForConstant(SectionKind Kind,
>    if (Kind.isDataRel() || Kind.isReadOnlyWithRel())
>      return ConstDataSection;
>
> -  if (Kind.isMergeableConst4())
> -    return FourByteConstantSection;
> -  if (Kind.isMergeableConst8())
> -    return EightByteConstantSection;
> -  if (Kind.isMergeableConst16())
> -    return SixteenByteConstantSection;
>    return ReadOnlySection;  // .const
>  }
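The distinction asked for here can be sketched as a section-selection policy that takes one extra bit of information. The model below is standalone C++ and does not use LLVM's real classes: `Kind`, `Section`, `ConstantInfo`, `sectionFor`, and especially the `compilerGenerated` flag are hypothetical stand-ins. How a backend would actually learn that bit is the open question of this thread.

```cpp
#include <cassert>

// Hypothetical stand-ins for the mergeable kinds and the Mach-O sections
// named in the thread; these are not LLVM types.
enum class Kind { MergeableConst4, MergeableConst8, MergeableConst16, ReadOnly };
enum class Section { Literal4, Literal8, Literal16, Const };

struct ConstantInfo {
  Kind kind;
  bool compilerGenerated;  // assumed flag: constant-pool entry, switch.table, ...
};

// User-declared constants always go to __const; only compiler-generated ones
// are allowed into the fixed-size literal sections.
Section sectionFor(const ConstantInfo &c) {
  if (!c.compilerGenerated)
    return Section::Const;
  switch (c.kind) {
  case Kind::MergeableConst4:
    return Section::Literal4;
  case Kind::MergeableConst8:
    return Section::Literal8;
  case Kind::MergeableConst16:
    return Section::Literal16;
  case Kind::ReadOnly:
    break;
  }
  return Section::Const;
}

// Usage example: a user-declared 16-byte array stays in __const, while a
// compiler-generated constant of the same kind may use __literal16.
inline void sectionForExample() {
  assert(sectionFor({Kind::MergeableConst16, false}) == Section::Const);
  assert(sectionFor({Kind::MergeableConst16, true}) == Section::Literal16);
}
```

Compared with removing the three mergeable cases outright, a policy of this shape would keep constant-pool entries mergeable, at the cost of needing a reliable source for the extra flag.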
Reid Kleckner
2014-Aug-25 18:38 UTC
[LLVMdev] How to tell whether a GlobalValue is user-defined
On Mon, Aug 25, 2014 at 9:54 AM, Nick Kledzik <kledzik at apple.com> wrote:

> The literalN sections were developed long ago to support coalescing of
> unnamed constants like 9.897 in source code for architectures that could
> not embed large constants in instructions. The linker knew how to
> break up the section (e.g. __literal8 is always 8-byte chunks) and coalesce
> copies by content.
>
> ~6 years ago we discovered that gcc would sometimes put user-named
> constants into the literal sections (e.g. const double foo = 9.897). This
> was an issue because C language rules say &a != &b, but if ‘a’ and ‘b’
> contain the same literal value in different translation units, the
> linker could merge them to the same address. For whatever reason, we could
> not fix gcc, so we changed the linker to never coalesce an item in a literal
> section if there was a (non-‘L’ and non-‘l’) symbol on it.

Thanks for the info!

> The current state of LLVM is that it is going out of its way to move
> “named” constants from the __const section to the __literalN sections. But
> the only possible advantage to doing that is the hope that the linker might
> coalesce them. But the linker won’t coalesce them because they are named.
> So, is there a way to keep the named values in the __const section?

Right, LLVM has proven that the address of the data is insignificant, hence it is "unnamed", and can be placed in a mergeable section. Is there any reason not to change the linker to merge this stuff, if gcc is no longer supported? We won't violate the semantics of C. Is there some immediate problem with keeping the data in these sections?
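The language rule at stake can be shown in a few lines. In the sketch below (the names `kFoo`, `kBar`, `addressesDiffer`, `valuesEqual`, and `addressExample` are illustrative), two named constants hold the same value, yet the language guarantees that they are distinct objects with distinct addresses. Folding them is only safe when nothing in the program observes the addresses, which is what "unnamed_addr" records.

```cpp
#include <cassert>

// Two named constants with identical contents (illustrative names).
static const double kFoo = 9.897;
static const double kBar = 9.897;

// Distinct objects compare unequal by address even though their values are
// equal. Taking the addresses here is what makes them significant.
bool addressesDiffer() {
  const double *pFoo = &kFoo;
  const double *pBar = &kBar;
  return pFoo != pBar;
}

// The contents, on the other hand, are indistinguishable.
bool valuesEqual() { return kFoo == kBar; }

// Usage example: both properties hold at the same time.
inline void addressExample() {
  assert(addressesDiffer());
  assert(valuesEqual());
}
```

A linker that merged `kFoo` and `kBar` by content would make `addressesDiffer` return false, which is the gcc-era problem described earlier in the thread.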
Rafael Espíndola
2014-Aug-27 20:58 UTC
[LLVMdev] How to tell whether a GlobalValue is user-defined
>> The literalN sections were developed long ago to support coalescing of
>> unnamed constants like 9.897 in source code for architectures that could not
>> embed large constants in instructions. The linker knew how to break
>> up the section (e.g. __literal8 is always 8-byte chunks) and coalesce copies
>> by content.
>>
>> ~6 years ago we discovered that gcc would sometimes put user-named
>> constants into the literal sections (e.g. const double foo = 9.897). This was
>> an issue because C language rules say &a != &b, but if ‘a’ and ‘b’
>> contain the same literal value in different translation units, the linker
>> could merge them to the same address. For whatever reason, we could not fix
>> gcc, so we changed the linker to never coalesce an item in a literal section
>> if there was a (non-‘L’ and non-‘l’) symbol on it.
>
> Thanks for the info!

+1

>> The current state of LLVM is that it is going out of its way to move
>> “named” constants from the __const section to the __literalN sections. But
>> the only possible advantage to doing that is the hope that the linker might
>> coalesce them. But the linker won’t coalesce them because they are named.
>> So, is there a way to keep the named values in the __const section?
>
> Right, LLVM has proven that the address of the data is insignificant, hence
> it is "unnamed", and can be placed in a mergeable section. Is there any
> reason not to change the linker to merge this stuff, if gcc is no longer
> supported? We won't violate the semantics of C. Is there some immediate
> problem with keeping the data in these sections?

Agreed. If ld64 can drop support for .o files produced by the old gcc, that would be awesome.

Failing that, what is really needed is that LLVM put constants in mergeable sections only if (among other things) they require only symbols that start with 'l' or 'L'. Correct?

Cheers,
Rafael
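The condition proposed in this last message can be sketched as a predicate on symbol names. This is a standalone illustration, not LLVM code: `startsWithLocalPrefix`, `mayUseMergeableSection`, and `mergeableExample` are hypothetical names, the symbol names in the example are made up, and the other requirements ("among other things") are collapsed into a single boolean parameter.

```cpp
#include <cassert>
#include <string>
#include <vector>

// True if the symbol name starts with 'l' or 'L', the prefixes named in the
// message above.
bool startsWithLocalPrefix(const std::string &name) {
  return !name.empty() && (name[0] == 'l' || name[0] == 'L');
}

// A constant may go into a mergeable section only if every symbol it needs
// has such a prefix. otherConditionsHold is a placeholder for the remaining
// requirements, which this sketch does not model.
bool mayUseMergeableSection(const std::vector<std::string> &symbols,
                            bool otherConditionsHold) {
  if (!otherConditionsHold)
    return false;
  for (const std::string &name : symbols) {
    if (!startsWithLocalPrefix(name))
      return false;
  }
  return true;
}

// Usage example with made-up symbol names.
inline void mergeableExample() {
  assert(mayUseMergeableSection({"Ltmp0"}, true));
  assert(!mayUseMergeableSection({"_s_dashArraysSize1"}, true));
}
```

Under this rule the array from the original test case, which needs the symbol `_s_dashArraysSize1`, would stay out of `__literal16` without any notion of "user-defined" in the IR.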