Fāng-ruì Sòng via llvm-dev
2021-May-26 07:54 UTC
[llvm-dev] PGO GC inefficiency due to link.exe's IMAGE_COMDAT_SELECT_ASSOCIATIVE limitation
I have studied the GC state of Windows (IR/frontend) PGO a bit. Below is my understanding. Hope folks can clarify it I make a mistake:) For PGO, we need cnts/data/vals/names sections. cnts/data are main metadata sections. vals is for value profiling. For Windows, the cnts section is named .lprfc$M and the data section is named .lprfd$M. The garbage collection story is unfortunate. If an IMAGE_COMDAT_SELECT_ASSOCIATIVE section defines an external symbol, MSVC link.exe may report a spurious duplicate symbol error (error LNK2005), even if the associative section would be discarded after handling the leader symbol. lld-link doesn't have this limitation. However, a portable implementation needs to work around MSVC link.exe. For a COMDAT .lprfd$M, its symbol must be external (linkonce_odr), otherwise references to a non-prevailing symbol would cause an error. Due to the limitation, .lprfd$M has to reside in its own COMDAT, no sharing with .lprfc$M. Different COMDAT groups mean that the liveness of one .lprfc$M does not make its associative .lprfd$M live. Since a .lprfd$M may be unreferenced, we have to conservatively assume all COMDAT .lprfd$M live. Since .lprfc$M input sections parallel .lprfd$M input sections, we have to conservatively assume all COMDAT .lprfc$M live. For an external symbol, we use a /INCLUDE: directive in .drectve to mark it as a GC root. As a result, .drectve may have many /INCLUDE: directives, just to work around the link.exe limitation. IIRC Chrome folks have found that the /INCLUDE: directives can cause huge memory usage. Note: for ELF we can use R_*_NONE to establish an artificial dependency edge between two sections. I don't think PE-COFF provides a similar feature. So does link.exe accept feature requests? The external symbol limitation in an IMAGE_COMDAT_SELECT_ASSOCIATIVE section looks quite unfortunate... It seems that /OPT:REF has another inefficiency - it cannot discard non-COMDAT sections. IIUC Fuchsia folks found that GC on non-COMDAT functions can save a lot of space (40%? of PGO metadata sections). It is unfortunate that link.exe doesn't discard non-COMDAT sections... -- 宋方睿
Reid Kleckner via llvm-dev
2021-May-26 20:51 UTC
[llvm-dev] PGO GC inefficiency due to link.exe's IMAGE_COMDAT_SELECT_ASSOCIATIVE limitation
On Wed, May 26, 2021 at 12:54 AM Fāng-ruì Sòng via llvm-dev < llvm-dev at lists.llvm.org> wrote:> I have studied the GC state of Windows (IR/frontend) PGO a bit. > Below is my understanding. Hope folks can clarify it I make a mistake:) > > For PGO, we need cnts/data/vals/names sections. cnts/data are main > metadata sections. vals is for value profiling. > For Windows, the cnts section is named .lprfc$M and the data section > is named .lprfd$M. > The garbage collection story is unfortunate. > > If an IMAGE_COMDAT_SELECT_ASSOCIATIVE section defines an external > symbol, MSVC link.exe may report a spurious duplicate symbol error > (error LNK2005), even if the associative section would be discarded > after handling the leader symbol. > lld-link doesn't have this limitation. > However, a portable implementation needs to work around MSVC link.exe. >Yes, that's pretty much how I remember this working.> For a COMDAT .lprfd$M, its symbol must be external (linkonce_odr), > otherwise references to a non-prevailing symbol would cause an error. > Due to the limitation, .lprfd$M has to reside in its own COMDAT, no > sharing with .lprfc$M. >Yes. I'm sure the instrumented code references the counters, so the counters must be external. However, if the __profd_ symbol is not referenced directly from anywhere, perhaps we could make it internal and have it join the __profc_* comdat to reverse the edge. If there are explicit __profd_ references from code, this won't work. Different COMDAT groups mean that the liveness of one .lprfc$M does> not make its associative .lprfd$M live. > Since a .lprfd$M may be unreferenced, we have to conservatively assume > all COMDAT .lprfd$M live. > Since .lprfc$M input sections parallel .lprfd$M input sections, we > have to conservatively assume all COMDAT .lprfc$M live. > For an external symbol, we use a /INCLUDE: directive in .drectve to > mark it as a GC root. > As a result, .drectve may have many /INCLUDE: directives, just to work > around the link.exe limitation. > IIRC Chrome folks have found that the /INCLUDE: directives can cause > huge memory usage. >Yes, I have wanted to implement an LLD-specific extension for this feature. The general problem is that symbol names are long, and we should avoid repeating them in the object file if at all possible. I think llvm.used and __attribute__((used)) should be compatible with link.exe, but it would be reasonable to make PGO an LLD-only feature to get these object file size savings. Specifically, we can copy the design of .llvm_addrsig to mark symbols as GC roots.> Note: for ELF we can use R_*_NONE to establish an artificial > dependency edge between two sections. > I don't think PE-COFF provides a similar feature. >Yep, not so far as I am aware.> So does link.exe accept feature requests? The external symbol > limitation in an IMAGE_COMDAT_SELECT_ASSOCIATIVE section looks quite > unfortunate... >I have a few points of contact at Microsoft, but there isn't a good general way to submit these kinds of low-level ABI feature requests. The way the feature is designed, it only really works well when the associative data is static, unreferenced, and discovered only by section concatenation and iteration. This supports very many use cases: debug info, unwind info, dynamic initializers, asan global metadata, control flow guard metadata, others. In the PGO use case, I think we would be violating a hidden invariant that all the associative data should be static and unreferenced.> It seems that /OPT:REF has another inefficiency - it cannot discard > non-COMDAT sections. > IIUC Fuchsia folks found that GC on non-COMDAT functions can save a > lot of space (40%? of PGO metadata sections). It is unfortunate that > link.exe doesn't discard non-COMDAT sections... >This is not a major issue in practice because with -ffunction-sections / -fdata-sections, everything is in a comdat section, they just all use IMAGE_COMDAT_SELECT_NODUPLICATES. I guess I really only have one idea here: find a way to avoid __profd_* references, and make __profc_ the comdat leader. ---- Separately, I have noticed that PGO greatly inhibits GC for other reasons. See the example from my last patch: https://reviews.llvm.org/D102818 For most linkonce_odr functions (so basically everything), the __profd_ global references the function, and then __profd_ is added to llvm.used. This prevents GC from working on any linkonce_odr function, which is bad for code size, so I removed the feature from most coverage builds. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210526/3a187b06/attachment.html>
Fāng-ruì Sòng via llvm-dev
2021-Jun-04 22:50 UTC
[llvm-dev] PGO GC inefficiency due to link.exe's IMAGE_COMDAT_SELECT_ASSOCIATIVE limitation
On Wed, May 26, 2021 at 1:51 PM Reid Kleckner <rnk at google.com> wrote:> > On Wed, May 26, 2021 at 12:54 AM Fāng-ruì Sòng via llvm-dev <llvm-dev at lists.llvm.org> wrote: >> >> I have studied the GC state of Windows (IR/frontend) PGO a bit. >> Below is my understanding. Hope folks can clarify it I make a mistake:) >> >> For PGO, we need cnts/data/vals/names sections. cnts/data are main >> metadata sections. vals is for value profiling. >> For Windows, the cnts section is named .lprfc$M and the data section >> is named .lprfd$M. >> The garbage collection story is unfortunate. >> >> If an IMAGE_COMDAT_SELECT_ASSOCIATIVE section defines an external >> symbol, MSVC link.exe may report a spurious duplicate symbol error >> (error LNK2005), even if the associative section would be discarded >> after handling the leader symbol. >> lld-link doesn't have this limitation. >> However, a portable implementation needs to work around MSVC link.exe. > > > Yes, that's pretty much how I remember this working. > >> >> For a COMDAT .lprfd$M, its symbol must be external (linkonce_odr), >> otherwise references to a non-prevailing symbol would cause an error. >> Due to the limitation, .lprfd$M has to reside in its own COMDAT, no >> sharing with .lprfc$M. > > > Yes. I'm sure the instrumented code references the counters, so the counters must be external. However, if the __profd_ symbol is not referenced directly from anywhere, perhaps we could make it internal and have it join the __profc_* comdat to reverse the edge. If there are explicit __profd_ references from code, this won't work. > >> Different COMDAT groups mean that the liveness of one .lprfc$M does >> not make its associative .lprfd$M live. >> Since a .lprfd$M may be unreferenced, we have to conservatively assume >> all COMDAT .lprfd$M live. >> Since .lprfc$M input sections parallel .lprfd$M input sections, we >> have to conservatively assume all COMDAT .lprfc$M live. >> For an external symbol, we use a /INCLUDE: directive in .drectve to >> mark it as a GC root. >> As a result, .drectve may have many /INCLUDE: directives, just to work >> around the link.exe limitation. >> IIRC Chrome folks have found that the /INCLUDE: directives can cause >> huge memory usage. > > > Yes, I have wanted to implement an LLD-specific extension for this feature. The general problem is that symbol names are long, and we should avoid repeating them in the object file if at all possible. I think llvm.used and __attribute__((used)) should be compatible with link.exe, but it would be reasonable to make PGO an LLD-only feature to get these object file size savings. Specifically, we can copy the design of .llvm_addrsig to mark symbols as GC roots. > >> >> Note: for ELF we can use R_*_NONE to establish an artificial >> dependency edge between two sections. >> I don't think PE-COFF provides a similar feature. > > > Yep, not so far as I am aware. > >> >> So does link.exe accept feature requests? The external symbol >> limitation in an IMAGE_COMDAT_SELECT_ASSOCIATIVE section looks quite >> unfortunate... > > > I have a few points of contact at Microsoft, but there isn't a good general way to submit these kinds of low-level ABI feature requests. > > The way the feature is designed, it only really works well when the associative data is static, unreferenced, and discovered only by section concatenation and iteration. This supports very many use cases: debug info, unwind info, dynamic initializers, asan global metadata, control flow guard metadata, others. In the PGO use case, I think we would be violating a hidden invariant that all the associative data should be static and unreferenced. > >> >> It seems that /OPT:REF has another inefficiency - it cannot discard >> non-COMDAT sections. >> IIUC Fuchsia folks found that GC on non-COMDAT functions can save a >> lot of space (40%? of PGO metadata sections). It is unfortunate that >> link.exe doesn't discard non-COMDAT sections... > > > This is not a major issue in practice because with -ffunction-sections / -fdata-sections, everything is in a comdat section, they just all use IMAGE_COMDAT_SELECT_NODUPLICATES. > > I guess I really only have one idea here: find a way to avoid __profd_* references, and make __profc_ the comdat leader. > > ---- > > Separately, I have noticed that PGO greatly inhibits GC for other reasons. See the example from my last patch: https://reviews.llvm.org/D102818 > For most linkonce_odr functions (so basically everything), the __profd_ global references the function, and then __profd_ is added to llvm.used. This prevents GC from working on any linkonce_odr function, which is bad for code size, so I removed the feature from most coverage builds.Circling back on this. Two patches have landed, decreasing the object file/linked image sizes for ELF and Windows a bit. * https://reviews.llvm.org/D103355 [InstrProfiling] Delete linkage/visibility toggling for Windows * https://reviews.llvm.org/D103372 [InstrProfiling] If no value profiling, make data variable private and (for Windows) use one comdat Windows still suffers from /INCLUDE:__covrec_DB956436E78DD5FAu strings in the .drectve section. Since .lcovfun$M is in a comdat for deduplication, it is subject to default -opt:ref. We have to use /INCLUDE because there is no cheaper ELF-style R_*_NONE. -- 宋方睿