thr3ads.net - llvm dev - [llvm-dev] RFC: Moving the module summary into the irsymtab [May 2017]

If this information is useful, please help other people find it:
Share via:

Peter Collingbourne via llvm-dev

2017-May-01 19:45 UTC

[llvm-dev] RFC: Moving the module summary into the irsymtab

On Mon, May 1, 2017 at 11:06 AM, Mehdi AMINI <joker.eph at gmail.com>
wrote:
>
>
> 2017-04-25 12:11 GMT-07:00 Peter Collingbourne <peter at pcc.me.uk>:
>
>> Hi all,
>>
>> I've been making a number of changes to the summary representation
>> recently, and I wanted to lay out some of my plans so that folks are
aware
>> of my ultimate direction with this.
>>
>> Basically I want to move the summary into the irsymtab that we will be
>> storing to disk after D32061 lands. This would help solve a number of
>> problems:
>> - To read a summary, you need to read all summaries in a module. For
>> example, if a client only wants to read summaries for prevailing
symbols,
>> it still needs to read summaries for all symbols. We should be able to
>> design an API that lets clients avoid reading summaries for known
>> non-prevailing symbols.
>>
>
> Why should we? Efficiency?
>
Where in the process would we benefit from that?>
Efficiency, better dead code stripping (can more easily strip dependencies
of non-prevailing symbols) and it would also help us address the
correctness problem where we re-implement the linker's prevailing
definition logic in the summary and sometimes get it wrong (for non-MachO
at least).

I also want to try to simplify the summary so that each index entry can
only map onto a single definition (by changing the key type from GUID to
symbol name), but that's a separate discussion.

Right now we may still cross-module import and inline a
non-prevailing> symbols, if the prevailing symbol isn't IR defined.
>
We can handle this in the same way as we do for regular LTO: we can add
non-prevailing ODR definitions to the index, and change their linkage to
available_externally, and if we later see a prevailing definition, we can
replace the non-prevailing definition with it.

- Regular LTO modules do not have summaries. This means that dead
stripping>> is less effective if the module contains both regular and thin LTO
modules.
>> (Although this is a somewhat orthogonal problem, since I am making a
format
>> change, I'd like to take care of it as part of the same change.)
>>
>
> How do you want do deal with that?
> Do you mind giving a quick example (LTO defines foo, used from bar in a
> ThinLTO module, but bar is dead).
>
Sure. In your example, we would create summaries (and store them in the
bitcode files) that look like this:
ThinLTO: foo -> {bar}
FullLTO: bar -> {}

The FullLTO summaries would have a flag set to prevent importing (at least
to start with).

At LTO time those summaries would be loaded into a single summary index and
we would run computeDeadSymbols over the index. In this case, bar would be
added to the set of dead symbols because it is not reachable from a GC
root. We would move the code that handles loading regular LTO modules to
LTO::runRegularLTO. In that function we would check whether bar is dead.
Because it is, we would not add it to the list passed to IRMover.

Also I assume code size isn't an issue since -ffunction-sections
always> allow to dead-strip post-LTO (even though we're losing compile time
> efficiency and potentially a few optimisations).
>
It is an issue for CFI, because the CFI pass works by merging several
vtable globals into a small number of larger globals, which would prevent
post-LTO stripping of the unused vtables and virtual functions. Early dead
code stripping would allow the unused vtable globals to be stripped before
they are merged.

>
>>
>> Basically, the summaries would be stored in an auxiliary structure like
>> storage::Uncommon with a flag in the storage::Symbol indicating whether
a
>> given symbol has a summary.
>>
>> Currently we use the presence of a summary to indicate whether to
compile
>> a module with regular or thin LTO. This will need to change if we want
to
>> store summaries for regular LTO modules. To that end, I want to add a
>> record to all bitcode modules to be compiled with thin LTO that marks
them
>> as such. This will be used in place of the presence of the summary. For
>> backwards compatibility, the presence of a summary in bitcode format
will
>> be used to mark modules as needing to be compiled with thin LTO.
>>
>> Because the contents of the summaries, unlike the irsymtab, for the
most
>> part do not need to be accurate for correctness,
>>
>
> I'm not sure what you mean about accuracy and correctness. The
"for the
> most part" part especially is raising the question that if there is a
part
> that needs to be accurate for correctness, that's enough to requires
> accuracy for the summary period.
>
What I was trying to express was that a global's summary must, for
correctness, contain a superset of the accurate information for the global.
In that sense, the summary that I mention below, which contains a reference
edge from each defined symbol to every other symbol in the symbol table,
would lead to correct behaviour. When the module is loaded, it will contain
fewer symbol references than what is indicated by the summary, but that can
be seen as an "optimisation" performed by the bitcode reader.

For example, suppose that we have this module:

void g();
void h();
void f() {
  g();
}

An equivalent implementation of f would be:

void f() {
  void (*p)() = g;
  p();
  if (0)
    h();
}

and an accurate summary for f would contain ref edges from f to both g and
h. When the first module is read, its definition of f will contain a call
edge from f to g and no edge from f to h, but conceptually that can be seen
as similar to dead code elimination and constant propagation being applied
to the second module's f when it is read.

I think we don't need as strict rules as we do for the rest of
the>> irsymtab. I.e. we don't need to rebuild the summary entirely if the
LLVM
>> revision changes, as I am doing for the irsymtab in D32061. However,
the
>> summary must have a correct set of reference edges in order to
implement
>> dead stripping.
>>
>
> Note that in the Apple ecosystem, we claim backward compatibility and want
> ThinLTO static archive built with 4.0 to work seamlessly with 5.0, 6.0,
> etc.
>
Sure, the design expressly allows for that. The result will not necessarily
be optimal, but it will work.

So the solution I have in mind is to pessimise dead stripping *for
that>> module* if the LLVM revision is out of date. I.e. the upgraded summary
>> would contain a reference edge from every defined symbol to every other
>> symbol in the module. Because we had already regenerated the irsymtab
as a
>> result of the revision change, we will have an accurate symbol table
for
>> the module, so this seems sound to me.
>>
>
>> All other parts of the summary would be preserved, as long as the
summary
>> format does not change. This means that the function size and hotness
for
>> example would be copied from the existing summary. To make this work, I
>> would add a format version number field to the irsymtab header as part
of
>> D32061, and copy the remaining information from the existing summary as
>> long as the version number is the same.
>>
>
> The last two paragraphs aren't totally clear to me (on the why it is
> needed and what is the practical impact).
>
I hope the above clarifies things.

Thanks,
-- 
-- 
Peter
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170501/c61625ac/attachment.html>

Peter Collingbourne via llvm-dev

2017-May-02 02:52 UTC

head link

[llvm-dev] RFC: Moving the module summary into the irsymtab

On Mon, May 1, 2017 at 12:45 PM, Peter Collingbourne <peter at pcc.me.uk>
wrote:
> On Mon, May 1, 2017 at 11:06 AM, Mehdi AMINI <joker.eph at gmail.com>
wrote:
>
>>
>>
>> 2017-04-25 12:11 GMT-07:00 Peter Collingbourne <peter at
pcc.me.uk>:
>>
>>> Hi all,
>>>
>>> I've been making a number of changes to the summary
representation
>>> recently, and I wanted to lay out some of my plans so that folks
are aware
>>> of my ultimate direction with this.
>>>
>>> Basically I want to move the summary into the irsymtab that we will
be
>>> storing to disk after D32061 lands. This would help solve a number
of
>>> problems:
>>> - To read a summary, you need to read all summaries in a module.
For
>>> example, if a client only wants to read summaries for prevailing
symbols,
>>> it still needs to read summaries for all symbols. We should be able
to
>>> design an API that lets clients avoid reading summaries for known
>>> non-prevailing symbols.
>>>
>>
>> Why should we? Efficiency?
>>
> Where in the process would we benefit from that?
>>
>
> Efficiency, better dead code stripping (can more easily strip dependencies
> of non-prevailing symbols) and it would also help us address the
> correctness problem where we re-implement the linker's prevailing
> definition logic in the summary and sometimes get it wrong (for non-MachO
> at least).
>
I realised that there is a potential problem with adding only prevailing
definitions to the summary if we re-use non-prevailing definitions as
available_externally definitions in the backends. The problem is that if
the non-prevailing definition is less well optimised than the prevailing
definition, it may contain more global refs than the prevailing definition,
which would make it unsound to apply dead stripping based solely on the
refs in the prevailing definitions.

e.g. 1.cc:

void a() {}
inline void b() { a(); }
void c();
int main() {
  b();
  c();
}

2.cc:

void a();
inline void b() { a(); }
void c() {
  b();
}

Supposing that a is inlined into b in 1.cc and the definition of b in 1.cc
is prevailing, it would be unsound to remove a, because it may be used by
2.cc.

The solution I have in mind is to store the transitive closure of
references (stopping at interposable -- i.e. GlobalValue::isInterposable --
edges) in each global's ref list.

This naturally raises the question of whether it's still worth trying to
avoid adding non-prevailing symbols to the summary, i.e. we could
alternatively address the correctness problem of re-implementing prevailing
in the summary by adding a prevailing flag to GlobalValueSummary. I'll have
to think about it more.

I also want to try to simplify the summary so that each index entry
can> only map onto a single definition (by changing the key type from GUID to
> symbol name), but that's a separate discussion.
>
> Right now we may still cross-module import and inline a non-prevailing
>> symbols, if the prevailing symbol isn't IR defined.
>>
>
> We can handle this in the same way as we do for regular LTO: we can add
> non-prevailing ODR definitions to the index, and change their linkage to
> available_externally, and if we later see a prevailing definition, we can
> replace the non-prevailing definition with it.
>
FWIW, I think this would still work with the transitive closure solution.

> - Regular LTO modules do not have summaries. This means that dead
>>> stripping is less effective if the module contains both regular and
thin
>>> LTO modules. (Although this is a somewhat orthogonal problem, since
I am
>>> making a format change, I'd like to take care of it as part of
the same
>>> change.)
>>>
>>
>> How do you want do deal with that?
>> Do you mind giving a quick example (LTO defines foo, used from bar in a
>> ThinLTO module, but bar is dead).
>>
>
> Sure. In your example, we would create summaries (and store them in the
> bitcode files) that look like this:
> ThinLTO: foo -> {bar}
> FullLTO: bar -> {}
>
> The FullLTO summaries would have a flag set to prevent importing (at least
> to start with).
>
> At LTO time those summaries would be loaded into a single summary index
> and we would run computeDeadSymbols over the index. In this case, bar would
> be added to the set of dead symbols because it is not reachable from a GC
> root. We would move the code that handles loading regular LTO modules to
> LTO::runRegularLTO. In that function we would check whether bar is dead.
> Because it is, we would not add it to the list passed to IRMover.
>
> Also I assume code size isn't an issue since -ffunction-sections always
>> allow to dead-strip post-LTO (even though we're losing compile time
>> efficiency and potentially a few optimisations).
>>
>
> It is an issue for CFI, because the CFI pass works by merging several
> vtable globals into a small number of larger globals, which would prevent
> post-LTO stripping of the unused vtables and virtual functions. Early dead
> code stripping would allow the unused vtable globals to be stripped before
> they are merged.
>
>
>>
>>>
>>> Basically, the summaries would be stored in an auxiliary structure
like
>>> storage::Uncommon with a flag in the storage::Symbol indicating
whether a
>>> given symbol has a summary.
>>>
>>> Currently we use the presence of a summary to indicate whether to
>>> compile a module with regular or thin LTO. This will need to change
if we
>>> want to store summaries for regular LTO modules. To that end, I
want to add
>>> a record to all bitcode modules to be compiled with thin LTO that
marks
>>> them as such. This will be used in place of the presence of the
summary.
>>> For backwards compatibility, the presence of a summary in bitcode
format
>>> will be used to mark modules as needing to be compiled with thin
LTO.
>>>
>>> Because the contents of the summaries, unlike the irsymtab, for the
most
>>> part do not need to be accurate for correctness,
>>>
>>
>> I'm not sure what you mean about accuracy and correctness. The
"for the
>> most part" part especially is raising the question that if there
is a part
>> that needs to be accurate for correctness, that's enough to
requires
>> accuracy for the summary period.
>>
>
> What I was trying to express was that a global's summary must, for
> correctness, contain a superset of the accurate information for the global.
> In that sense, the summary that I mention below, which contains a reference
> edge from each defined symbol to every other symbol in the symbol table,
> would lead to correct behaviour. When the module is loaded, it will contain
> fewer symbol references than what is indicated by the summary, but that can
> be seen as an "optimisation" performed by the bitcode reader.
>
> For example, suppose that we have this module:
>
> void g();
> void h();
> void f() {
>   g();
> }
>
> An equivalent implementation of f would be:
>
> void f() {
>   void (*p)() = g;
>   p();
>   if (0)
>     h();
> }
>
> and an accurate summary for f would contain ref edges from f to both g and
> h. When the first module is read, its definition of f will contain a call
> edge from f to g and no edge from f to h, but conceptually that can be seen
> as similar to dead code elimination and constant propagation being applied
> to the second module's f when it is read.
>
> I think we don't need as strict rules as we do for the rest of the
>>> irsymtab. I.e. we don't need to rebuild the summary entirely if
the LLVM
>>> revision changes, as I am doing for the irsymtab in D32061.
However, the
>>> summary must have a correct set of reference edges in order to
implement
>>> dead stripping.
>>>
>>
>> Note that in the Apple ecosystem, we claim backward compatibility and
>> want ThinLTO static archive built with 4.0 to work seamlessly with 5.0,
>> 6.0, etc.
>>
>
> Sure, the design expressly allows for that. The result will not
> necessarily be optimal, but it will work.
>
> So the solution I have in mind is to pessimise dead stripping *for that
>>> module* if the LLVM revision is out of date. I.e. the upgraded
summary
>>> would contain a reference edge from every defined symbol to every
other
>>> symbol in the module. Because we had already regenerated the
irsymtab as a
>>> result of the revision change, we will have an accurate symbol
table for
>>> the module, so this seems sound to me.
>>>
>>
>>> All other parts of the summary would be preserved, as long as the
>>> summary format does not change. This means that the function size
and
>>> hotness for example would be copied from the existing summary. To
make this
>>> work, I would add a format version number field to the irsymtab
header as
>>> part of D32061, and copy the remaining information from the
existing
>>> summary as long as the version number is the same.
>>>
>>
>> The last two paragraphs aren't totally clear to me (on the why it
is
>> needed and what is the practical impact).
>>
>
> I hope the above clarifies things.
>
> Thanks,
> --
> --
> Peter
>


-- 
-- 
Peter
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170501/eb44ddc7/attachment.html>

Teresa Johnson via llvm-dev

2017-May-11 16:34 UTC

head link

[llvm-dev] RFC: Moving the module summary into the irsymtab

Sorry for the late response, a few questions/comments below. Teresa

On Mon, May 1, 2017 at 12:45 PM, Peter Collingbourne <peter at pcc.me.uk>
wrote:
> On Mon, May 1, 2017 at 11:06 AM, Mehdi AMINI <joker.eph at gmail.com>
wrote:
>
>>
>>
>> 2017-04-25 12:11 GMT-07:00 Peter Collingbourne <peter at
pcc.me.uk>:
>>
>>> Hi all,
>>>
>>> I've been making a number of changes to the summary
representation
>>> recently, and I wanted to lay out some of my plans so that folks
are aware
>>> of my ultimate direction with this.
>>>
>>> Basically I want to move the summary into the irsymtab that we will
be
>>> storing to disk after D32061 lands. This would help solve a number
of
>>> problems:
>>> - To read a summary, you need to read all summaries in a module.
For
>>> example, if a client only wants to read summaries for prevailing
symbols,
>>> it still needs to read summaries for all symbols. We should be able
to
>>> design an API that lets clients avoid reading summaries for known
>>> non-prevailing symbols.
>>>
>>
>> Why should we? Efficiency?
>>
> Where in the process would we benefit from that?
>>
>
> Efficiency, better dead code stripping (can more easily strip dependencies
> of non-prevailing symbols) and it would also help us address the
> correctness problem where we re-implement the linker's prevailing
> definition logic in the summary and sometimes get it wrong (for non-MachO
> at least).
>
Are we getting this wrong in places now with the new LTO API? I.e. we set
up the PrevailingModuleForGUID map using linker info, which feeds into weak
resolution.

> I also want to try to simplify the summary so that each index entry can
> only map onto a single definition (by changing the key type from GUID to
> symbol name), but that's a separate discussion.
>
> Right now we may still cross-module import and inline a non-prevailing
>> symbols, if the prevailing symbol isn't IR defined.
>>
>
> We can handle this in the same way as we do for regular LTO: we can add
> non-prevailing ODR definitions to the index, and change their linkage to
> available_externally, and if we later see a prevailing definition, we can
> replace the non-prevailing definition with it.
>
> - Regular LTO modules do not have summaries. This means that dead
>>> stripping is less effective if the module contains both regular and
thin
>>> LTO modules. (Although this is a somewhat orthogonal problem, since
I am
>>> making a format change, I'd like to take care of it as part of
the same
>>> change.)
>>>
>>
>> How do you want do deal with that?
>> Do you mind giving a quick example (LTO defines foo, used from bar in a
>> ThinLTO module, but bar is dead).
>>
>
> Sure. In your example, we would create summaries (and store them in the
> bitcode files) that look like this:
> ThinLTO: foo -> {bar}
> FullLTO: bar -> {}
>
It doesn't look like this maps to the description Mehdi gave, but rather
the following (correct me if I'm wrong):
  LTO defines bar, which is used from foo in a ThinLTO module, but foo is
dead (and therefore bar is dead)

>
> The FullLTO summaries would have a flag set to prevent importing (at least
> to start with).
>
> At LTO time those summaries would be loaded into a single summary index
> and we would run computeDeadSymbols over the index. In this case, bar would
> be added to the set of dead symbols because it is not reachable from a GC
> root.
>
I assume foo is therefore also dead as I described above

> We would move the code that handles loading regular LTO modules to
> LTO::runRegularLTO. In that function we would check whether bar is dead.
> Because it is, we would not add it to the list passed to IRMover.
>
> Also I assume code size isn't an issue since -ffunction-sections always
>> allow to dead-strip post-LTO (even though we're losing compile time
>> efficiency and potentially a few optimisations).
>>
>
> It is an issue for CFI, because the CFI pass works by merging several
> vtable globals into a small number of larger globals, which would prevent
> post-LTO stripping of the unused vtables and virtual functions. Early dead
> code stripping would allow the unused vtable globals to be stripped before
> they are merged.
>
>
>>
>>>
>>> Basically, the summaries would be stored in an auxiliary structure
like
>>> storage::Uncommon with a flag in the storage::Symbol indicating
whether a
>>> given symbol has a summary.
>>>
>>> Currently we use the presence of a summary to indicate whether to
>>> compile a module with regular or thin LTO. This will need to change
if we
>>> want to store summaries for regular LTO modules. To that end, I
want to add
>>> a record to all bitcode modules to be compiled with thin LTO that
marks
>>> them as such. This will be used in place of the presence of the
summary.
>>> For backwards compatibility, the presence of a summary in bitcode
format
>>> will be used to mark modules as needing to be compiled with thin
LTO.
>>>
>>> Because the contents of the summaries, unlike the irsymtab, for the
most
>>> part do not need to be accurate for correctness,
>>>
>>
>> I'm not sure what you mean about accuracy and correctness. The
"for the
>> most part" part especially is raising the question that if there
is a part
>> that needs to be accurate for correctness, that's enough to
requires
>> accuracy for the summary period.
>>
>
> What I was trying to express was that a global's summary must, for
> correctness, contain a superset of the accurate information for the global.
> In that sense, the summary that I mention below, which contains a reference
> edge from each defined symbol to every other symbol in the symbol table,
> would lead to correct behaviour. When the module is loaded, it will contain
> fewer symbol references than what is indicated by the summary, but that can
> be seen as an "optimisation" performed by the bitcode reader.
>
> For example, suppose that we have this module:
>
> void g();
> void h();
> void f() {
>   g();
> }
>
> An equivalent implementation of f would be:
>
> void f() {
>   void (*p)() = g;
>   p();
>   if (0)
>     h();
> }
>
> and an accurate summary for f would contain ref edges from f to both g and
> h. When the first module is read, its definition of f will contain a call
> edge from f to g and no edge from f to h, but conceptually that can be seen
> as similar to dead code elimination and constant propagation being applied
> to the second module's f when it is read.
>
> I think we don't need as strict rules as we do for the rest of the
>>> irsymtab. I.e. we don't need to rebuild the summary entirely if
the LLVM
>>> revision changes, as I am doing for the irsymtab in D32061.
However, the
>>> summary must have a correct set of reference edges in order to
implement
>>> dead stripping.
>>>
>>
>> Note that in the Apple ecosystem, we claim backward compatibility and
>> want ThinLTO static archive built with 4.0 to work seamlessly with 5.0,
>> 6.0, etc.
>>
>
> Sure, the design expressly allows for that. The result will not
> necessarily be optimal, but it will work.
>
I think this just pessimizes dead stripping, which seems like a reasonable
tradeoff. Mehdi?

>
> So the solution I have in mind is to pessimise dead stripping *for that
>>> module* if the LLVM revision is out of date. I.e. the upgraded
summary
>>> would contain a reference edge from every defined symbol to every
other
>>> symbol in the module. Because we had already regenerated the
irsymtab as a
>>> result of the revision change, we will have an accurate symbol
table for
>>> the module, so this seems sound to me.
>>>
>>
>>> All other parts of the summary would be preserved, as long as the
>>> summary format does not change. This means that the function size
and
>>> hotness for example would be copied from the existing summary. To
make this
>>> work, I would add a format version number field to the irsymtab
header as
>>> part of D32061, and copy the remaining information from the
existing
>>> summary as long as the version number is the same.
>>>
>>Would this supercede the FS_VERSION record currently written in the global
value summary bitcode sections?

I like the overall idea of combining the summary with the irsymtab.
Ideally, as mentioned in the review thread for D32061, it would be great if
this information is ultimately deduplicated between the bitcode IR and the
irsymtab (e.g. the linkage types etc), but I understand that is a longer
term goal.

My main concern, as noted when we last discussed this, is that the irsymtab
is currently encoded in something other than bitcode, and this would move
the summary from bitcode record format into this custom format embedded in
a blob. I am hoping Mehdi or others will comment on that aspect, as I
originally implemented the summary in bitcode format based on the responses
to the original ThinLTO RFC. I know that there is some community interest
in moving to a more efficient encoding than the current bitcode format, and
maybe the irsymtab is a good way to experiment with what that should be,
but I am a little concerned about re-implementing the summary in something
new before we have community agreement on what that new format should be.


>> The last two paragraphs aren't totally clear to me (on the why it
is
>> needed and what is the practical impact).
>>
>
> I hope the above clarifies things.
>
> Thanks,
> --
> --
> Peter
>


-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |  408-460-2413
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170511/5842016d/attachment.html>

Peter Collingbourne via llvm-dev

2017-Jun-01 02:01 UTC

head link

[llvm-dev] RFC: Moving the module summary into the irsymtab

On Thu, May 11, 2017 at 9:34 AM, Teresa Johnson <tejohnson at google.com>
wrote:
> Sorry for the late response, a few questions/comments below. Teresa
>
> On Mon, May 1, 2017 at 12:45 PM, Peter Collingbourne <peter at
pcc.me.uk>
> wrote:
>
>> On Mon, May 1, 2017 at 11:06 AM, Mehdi AMINI <joker.eph at
gmail.com> wrote:
>>
>>>
>>>
>>> 2017-04-25 12:11 GMT-07:00 Peter Collingbourne <peter at
pcc.me.uk>:
>>>
>>>> Hi all,
>>>>
>>>> I've been making a number of changes to the summary
representation
>>>> recently, and I wanted to lay out some of my plans so that
folks are aware
>>>> of my ultimate direction with this.
>>>>
>>>> Basically I want to move the summary into the irsymtab that we
will be
>>>> storing to disk after D32061 lands. This would help solve a
number of
>>>> problems:
>>>> - To read a summary, you need to read all summaries in a
module. For
>>>> example, if a client only wants to read summaries for
prevailing symbols,
>>>> it still needs to read summaries for all symbols. We should be
able to
>>>> design an API that lets clients avoid reading summaries for
known
>>>> non-prevailing symbols.
>>>>
>>>
>>> Why should we? Efficiency?
>>>
>> Where in the process would we benefit from that?
>>>
>>
>> Efficiency, better dead code stripping (can more easily strip
>> dependencies of non-prevailing symbols) and it would also help us
address
>> the correctness problem where we re-implement the linker's
prevailing
>> definition logic in the summary and sometimes get it wrong (for
non-MachO
>> at least).
>>
>
> Are we getting this wrong in places now with the new LTO API? I.e. we set
> up the PrevailingModuleForGUID map using linker info, which feeds into weak
> resolution.
>
We could get it wrong, but I don't think it currently comes up in practice.
The case I was thinking of was where you have two modules:

module 1:

$foo = comdat
define void @foo() comdat {
  ...
}

module 2:

$foo = comdat
define void @foo() comdat {
  ...
}

In this case we need to make sure that we import the correct version of foo
from whichever comdat was chosen. We handle this correctly right now for
odr functions (because we can choose any definition, as selectCallee does)
but not for non-odr externals. But that can be alternatively addressed by
changing how we feed information from thinLTOResolveWeakForLinkerInIndex
into selectCallee I think.

I also want to try to simplify the summary so that each index entry
can>> only map onto a single definition (by changing the key type from GUID
to
>> symbol name), but that's a separate discussion.
>>
>> Right now we may still cross-module import and inline a non-prevailing
>>> symbols, if the prevailing symbol isn't IR defined.
>>>
>>
>> We can handle this in the same way as we do for regular LTO: we can add
>> non-prevailing ODR definitions to the index, and change their linkage
to
>> available_externally, and if we later see a prevailing definition, we
can
>> replace the non-prevailing definition with it.
>>
>> - Regular LTO modules do not have summaries. This means that dead
>>>> stripping is less effective if the module contains both regular
and thin
>>>> LTO modules. (Although this is a somewhat orthogonal problem,
since I am
>>>> making a format change, I'd like to take care of it as part
of the same
>>>> change.)
>>>>
>>>
>>> How do you want do deal with that?
>>> Do you mind giving a quick example (LTO defines foo, used from bar
in a
>>> ThinLTO module, but bar is dead).
>>>
>>
>> Sure. In your example, we would create summaries (and store them in the
>> bitcode files) that look like this:
>> ThinLTO: foo -> {bar}
>> FullLTO: bar -> {}
>>
>
> It doesn't look like this maps to the description Mehdi gave, but
rather
> the following (correct me if I'm wrong):
>   LTO defines bar, which is used from foo in a ThinLTO module, but foo is
> dead (and therefore bar is dead)
>
Sorry, yes, that was the example that I was trying to illustrate.

>
>> The FullLTO summaries would have a flag set to prevent importing (at
>> least to start with).
>>
>> At LTO time those summaries would be loaded into a single summary index
>> and we would run computeDeadSymbols over the index. In this case, bar
would
>> be added to the set of dead symbols because it is not reachable from a
GC
>> root.
>>
>
> I assume foo is therefore also dead as I described above
>
Right.

>
>
>> We would move the code that handles loading regular LTO modules to
>> LTO::runRegularLTO. In that function we would check whether bar is
dead.
>> Because it is, we would not add it to the list passed to IRMover.
>>
>> Also I assume code size isn't an issue since -ffunction-sections
always
>>> allow to dead-strip post-LTO (even though we're losing compile
time
>>> efficiency and potentially a few optimisations).
>>>
>>
>> It is an issue for CFI, because the CFI pass works by merging several
>> vtable globals into a small number of larger globals, which would
prevent
>> post-LTO stripping of the unused vtables and virtual functions. Early
dead
>> code stripping would allow the unused vtable globals to be stripped
before
>> they are merged.
>>
>>
>>>
>>>>
>>>> Basically, the summaries would be stored in an auxiliary
structure like
>>>> storage::Uncommon with a flag in the storage::Symbol indicating
whether a
>>>> given symbol has a summary.
>>>>
>>>> Currently we use the presence of a summary to indicate whether
to
>>>> compile a module with regular or thin LTO. This will need to
change if we
>>>> want to store summaries for regular LTO modules. To that end, I
want to add
>>>> a record to all bitcode modules to be compiled with thin LTO
that marks
>>>> them as such. This will be used in place of the presence of the
summary.
>>>> For backwards compatibility, the presence of a summary in
bitcode format
>>>> will be used to mark modules as needing to be compiled with
thin LTO.
>>>>
>>>> Because the contents of the summaries, unlike the irsymtab, for
the
>>>> most part do not need to be accurate for correctness,
>>>>
>>>
>>> I'm not sure what you mean about accuracy and correctness. The
"for the
>>> most part" part especially is raising the question that if
there is a part
>>> that needs to be accurate for correctness, that's enough to
requires
>>> accuracy for the summary period.
>>>
>>
>> What I was trying to express was that a global's summary must, for
>> correctness, contain a superset of the accurate information for the
global.
>> In that sense, the summary that I mention below, which contains a
reference
>> edge from each defined symbol to every other symbol in the symbol
table,
>> would lead to correct behaviour. When the module is loaded, it will
contain
>> fewer symbol references than what is indicated by the summary, but that
can
>> be seen as an "optimisation" performed by the bitcode reader.
>>
>> For example, suppose that we have this module:
>>
>> void g();
>> void h();
>> void f() {
>>   g();
>> }
>>
>> An equivalent implementation of f would be:
>>
>> void f() {
>>   void (*p)() = g;
>>   p();
>>   if (0)
>>     h();
>> }
>>
>> and an accurate summary for f would contain ref edges from f to both g
>> and h. When the first module is read, its definition of f will contain
a
>> call edge from f to g and no edge from f to h, but conceptually that
can be
>> seen as similar to dead code elimination and constant propagation being
>> applied to the second module's f when it is read.
>>
>> I think we don't need as strict rules as we do for the rest of the
>>>> irsymtab. I.e. we don't need to rebuild the summary
entirely if the LLVM
>>>> revision changes, as I am doing for the irsymtab in D32061.
However, the
>>>> summary must have a correct set of reference edges in order to
implement
>>>> dead stripping.
>>>>
>>>
>>> Note that in the Apple ecosystem, we claim backward compatibility
and
>>> want ThinLTO static archive built with 4.0 to work seamlessly with
5.0,
>>> 6.0, etc.
>>>
>>
>> Sure, the design expressly allows for that. The result will not
>> necessarily be optimal, but it will work.
>>
>
> I think this just pessimizes dead stripping, which seems like a reasonable
> tradeoff. Mehdi?
>
>
>>
>> So the solution I have in mind is to pessimise dead stripping *for that
>>>> module* if the LLVM revision is out of date. I.e. the upgraded
summary
>>>> would contain a reference edge from every defined symbol to
every other
>>>> symbol in the module. Because we had already regenerated the
irsymtab as a
>>>> result of the revision change, we will have an accurate symbol
table for
>>>> the module, so this seems sound to me.
>>>>
>>>
>>>> All other parts of the summary would be preserved, as long as
the
>>>> summary format does not change. This means that the function
size and
>>>> hotness for example would be copied from the existing summary.
To make this
>>>> work, I would add a format version number field to the irsymtab
header as
>>>> part of D32061, and copy the remaining information from the
existing
>>>> summary as long as the version number is the same.
>>>>
>>>
> Would this supercede the FS_VERSION record currently written in the global
> value summary bitcode sections?
>
> I like the overall idea of combining the summary with the irsymtab.
> Ideally, as mentioned in the review thread for D32061, it would be great if
> this information is ultimately deduplicated between the bitcode IR and the
> irsymtab (e.g. the linkage types etc), but I understand that is a longer
> term goal.
>
> My main concern, as noted when we last discussed this, is that the
> irsymtab is currently encoded in something other than bitcode, and this
> would move the summary from bitcode record format into this custom format
> embedded in a blob. I am hoping Mehdi or others will comment on that
> aspect, as I originally implemented the summary in bitcode format based on
> the responses to the original ThinLTO RFC. I know that there is some
> community interest in moving to a more efficient encoding than the current
> bitcode format, and maybe the irsymtab is a good way to experiment with
> what that should be, but I am a little concerned about re-implementing the
> summary in something new before we have community agreement on what that
> new format should be.
>
I read the entire original RFC thread and my impression is that people were
more concerned about the value of shoehorning things into the
ELF/COFF/MachO formats, as opposed to specifically about whether our own
format is defined in terms of bitcode or structs or whatever.

That said, I'm not sure whether I can justify spending time on changing the
summary encoding format, given that the problems that it would solve on its
own are relatively minor. I have a prototype [1] which seems to work, but
it will take a lot more work before the new format can entirely replace the
old format. The major issue I'm facing right now is CFI dead stripping, and
I don't want fixing that to be blocked on changing the format. So I'm
contemplating just making a more limited set of changes:
1) start writing bitcode summaries for regular LTO modules
2) apply dead stripping over the combined full/thin LTO summary
and we can think about reformatting the summary at a later date.

Thanks,
-- 
-- 
Peter

[1] https://github.com/pcc/llvm-project/tree/scratch
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170531/0da157e5/attachment.html>

llvm dev - May 2017 - RFC: Moving the module summary into the irsymtab

[llvm-dev] RFC: Moving the module summary into the irsymtab

[llvm-dev] RFC: Moving the module summary into the irsymtab

[llvm-dev] RFC: Moving the module summary into the irsymtab

[llvm-dev] RFC: Moving the module summary into the irsymtab