thr3ads.net - llvm dev - [llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary [Apr 2018]

Teresa Johnson via llvm-dev

2018-Apr-24 14:43 UTC

[llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Hi everyone,

I started working on a long-standing request to have the summary dumped in
a readable format to text, and specifically to emit to LLVM assembly.
Proposal below, please let me know your thoughts.

Thanks,
Teresa

RFC: LLVM Assembly format for ThinLTO Summary
=============================================

Background
----------

ThinLTO operates on small summaries computed during the compile step (i.e.
with “-c -flto=thin”), which are then analyzed and updated during the Thin
Link stage, and utilized to perform IR updates during the post-link ThinLTO
backends. The summaries are emitted as LLVM bitcode, but currently not in
LLVM assembly.

There are two ways to generate a bitcode file containing summary records
for a module:
 1. Compile with “clang -c -flto=thin”
 2. Build from LLVM assembly using “opt -module-summary”
Either of these will result in the ModuleSummaryIndex analysis pass (which
builds the summary index in memory for a module) being added to the pipeline
just before bitcode emission.

Additionally, a combined index is created by merging all the per-module
indexes during the Thin Link, which is optionally emitted as a bitcode file.

Currently, the only way to view these records is via “llvm-bcanalyzer
-dump”, then manually decoding the raw bitcode dumps.

Relatedly, there is YAML reader/writer support for CFI related summary
fields (-wholeprogramdevirt-read-summary and
-wholeprogramdevirt-write-summary). Last summer, GSOC student Charles
Saternos implemented support to dump the summary in YAML from llvm-lto2
(D34080), including the rest of the summary fields (D34063); however, there
was pushback on the related RFC for dumping via YAML or another format
rather than emitting as LLVM assembly.

Goals:
 1. Define LLVM assembly format for summary index
 2. Define interaction between parsing of summary from LLVM assembly and
    synthesis of new summary index from IR.
 3. Implement printing and parsing of summary index LLVM assembly

Proposed LLVM Assembly Format
-----------------------------

There are several top level data structures within the ModuleSummaryIndex:
 1. ModulePathStringTable: Holds the paths to the modules summarized in the
    index (only one entry for per-module indexes and multiple in the
    combined index), along with their hashes (for incremental builds and
    global promotion).
 2. GlobalValueMap: A map from global value GUIDs to the corresponding
    function/variable/alias summary (or summaries for the combined index
    and weak linkage).
 3. CFI-related data structures (TypeIdMap, CfiFunctionDefs, and
    CfiFunctionDecls)

I have a WIP patch to AsmWriter.cpp to print the ModuleSummaryIndex that I
was using to play with the format. It currently prints 1 and 2 above. I’ve
left the CFI related summary data structures as a TODO for now, until the
format is at least conceptually agreed, but from looking at those I don’t
see an issue with using the same format (with a note/question for Peter on
CFI type test representation below).

I modeled the proposed format on metadata, with a few key differences noted
below. Like metadata, I propose enumerating the entries with the
SlotTracker, and prefixing them with a special character. Avoiding
characters already used in some fashion (i.e. “!” for metadata and “#” for
attributes), I initially have chosen “^”. Open to suggestions though.

Consider the following example:

extern void foo();
int X;
int bar() {
  foo();
  return X;
}
void barAlias() __attribute__ ((alias ("bar")));
int main() {
  barAlias();
  return bar();
}

The proposed format has one entry per ModulePathStringTable entry and one
per GlobalValueMap/GUID, and looks like:

^0 = module: {path: testA.o, hash: 5487197307045666224}
^1 = gv: {guid: 1881667236089500162, name: X, summaries: {variable:
    {module: ^0, flags: {linkage: common, notEligibleToImport: 0, live: 0,
    dsoLocal: 1}}}}
^2 = gv: {guid: 6699318081062747564, name: foo}
^3 = gv: {guid: 15822663052811949562, name: main, summaries: {function:
    {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0,
    dsoLocal: 1}, insts: 5, funcFlags: {readNone: 0, readOnly: 0,
    noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^5, hotness:
    unknown}, {callee: ^4, hotness: unknown}}}}}
^4 = gv: {guid: 16434608426314478903, name: bar, summaries: {function:
    {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0,
    dsoLocal: 1}, insts: 3, funcFlags: {readNone: 0, readOnly: 0,
    noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^2, hotness:
    unknown}}, refs: {^1}}}}
^5 = gv: {guid: 18040127437030252312, name: barAlias, summaries: {alias:
    {module: ^0, flags: {linkage: extern, notEligibleToImport: 0, live: 0,
    dsoLocal: 1}, aliasee: ^4}}}

Like metadata, the fields are tagged (currently using lower camel case,
maybe upper camel case would be preferable).

The proposed format has a structure that reflects the data structures in
the summary index. For example, consider the entry “^4”. This corresponds
to the function “bar”. The entry for that GUID in the GlobalValueMap
contains a list of summaries. For per-module summaries such as this, there
will be at most one summary (with no summary list for an external function
like “foo”). In the combined summary there may be multiple, e.g. in the
case of linkonce_odr functions which have definitions in multiple modules.
The summary list for bar (“^4”) contains a FunctionSummary, so the summary
is tagged “function:”. The FunctionSummary contains both a flags structure
(inherited from the base GlobalValueSummary class), and a funcFlags
structure (specific to FunctionSummary). It therefore contains a
brace-enclosed list of flag tags/values for each.

Where a global value summary references another global value summary (e.g.
via a call list, reference list, or aliasee), the entry is referenced by
its slot. E.g. the alias “barAlias” (“^5”) references its aliasee “bar” as
“^4”.

Note that, in comparison, metadata assembly entries tend to be much more
decomposed, since many metadata fields are themselves metadata (so entries
tend to be shorter, with references to other metadata nodes).

Currently, I am emitting the summary entries at the end, after the metadata
nodes. Note that the ModuleSummaryIndex is not currently referenced from
the Module, and isn’t currently created when parsing the Module IR bitcode
(there is a separate derived class for reading the ModuleSummaryIndex from
bitcode). This is because they are not currently used at the same time.
However, in the future there is no reason why we couldn’t tag the global
values in the Module’s LLVM assembly with the corresponding summary entry
if the ModuleSummaryIndex is available when printing the Module in the
assembly writer. I.e. we could do the following for “main” from the above
example when printing the IR definition (note the “^3” at the end):

define  dso_local i32 @main() #0 !dbg !17 ^3 {

For CFI data structures, the format would be similar. It appears that
TypeIds are referred to by string name in the top level TypeIdMap (std::map
indexed by std::string type identifier), whereas they are referenced by
GUID within the FunctionSummary class (i.e. the TypeTests vector and the
VFuncId structure). For the LLVM assembly I think there should be a top
level entry for each TypeIdMap entry, which lists both the type identifier
string and its GUID (followed by its associated information stored in the
map), and the TypeTests/VFuncId references on the FunctionSummary entries
can reference it by summary slot number. I.e. something like:

^1 = typeid: {guid: 12345, identifier: name_of_type, …
^2 = gv: {... {function: {.... typeTests: {^1, …

Peter - is that correct and does that sound ok?

Issues when Parsing of Summaries from Assembly
----------------------------------------------

When reading an LLVM assembly file containing module summary entries, a
ModuleSummaryIndex will be created from the entries.

Things to consider are the behavior when:

 - Invoked with “opt -module-summary” (which currently builds a new summary
   index from the IR). Options:
    a. recompute summary and throw away summary in the assembly file
    b. ignore -module-summary and build the summary from the LLVM assembly
    c. give an error
    d. compare the two summaries (one created from the assembly and the new
       one created by the analysis phase from the IR), and error if they
       are different.
   My opinion is to do a), so that the behavior using -module-summary
   doesn’t change. We also need a way to force building of a fresh module
   summary for cases where the user has modified the LLVM assembly of the
   IR (see below).

 - How to handle older LLVM assembly files that don’t contain new summary
   fields. Options:
    a. Force the LLVM assembly file to be recreated with a new summary.
       I.e. “opt -module-summary -o - | llvm-dis”.
    b. Auto-upgrade, by silently creating conservative values for the new
       summary entries.
   I lean towards b) (when possible) for user-friendliness and to reduce
   required churn on test inputs.

 - How to handle partial or incorrect LLVM assembly summary entries. How to
   handle partial summaries depends in part on how we answer the prior
   question about auto-upgrading. I think the best option there is likely
   to handle it automatically when possible. However, I do think we should
   error on glaring errors like obviously missing information. For example,
   when there is summary data in the LLVM assembly, but summary entries are
   missing for some global values. E.g. if the user modified the assembly
   to add a function but forgot to add a corresponding summary entry. We
   could still have subtle issues (e.g. user adds a new call but forgets to
   update the caller’s summary call list), but it will be harder to detect
   those.
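
To make the shape of the proposed syntax concrete, here is a minimal sketch of a reader for entries like the ones in the example. It is not the WIP AsmWriter patch or any eventual LLVM parser support: the function names (`tokenize`, `parse_braces`, `parse_entry`) and the choice to map tagged fields to a dict and untagged items to a list are assumptions made for illustration, slot references are simply kept as strings such as "^4", and quoted or unusual names are not handled.

```python
import re

# Slot references (^N), punctuation, or bare atoms such as names and numbers.
TOKEN = re.compile(r"\^\d+|[{}:,=]|[^\s{}:,=]+")


def tokenize(text):
    return TOKEN.findall(text)


def parse_value(toks, i):
    """Parse one value starting at toks[i]; return (value, next_index)."""
    if toks[i] == "{":
        return parse_braces(toks, i)
    if toks[i].isdigit():
        return int(toks[i]), i + 1
    return toks[i], i + 1  # identifier or slot reference such as "^4"


def parse_braces(toks, i):
    """Parse '{...}': tagged fields become a dict, untagged items a list."""
    if toks[i] != "{":
        raise ValueError("expected '{'")
    i += 1
    fields, items = {}, []
    while toks[i] != "}":
        if toks[i] != "{" and toks[i + 1] == ":":
            key = toks[i]
            fields[key], i = parse_value(toks, i + 2)
        else:
            value, i = parse_value(toks, i)
            items.append(value)
        if toks[i] == ",":
            i += 1
    if fields and items:
        raise ValueError("mixed tagged and untagged items")
    return (items if items else fields), i + 1


def parse_entry(text):
    """Parse '^N = kind: {...}' into (slot, kind, body)."""
    toks = tokenize(text)
    if (len(toks) < 5 or not toks[0].startswith("^")
            or toks[1] != "=" or toks[3] != ":"):
        raise ValueError("malformed summary entry")
    body, end = parse_braces(toks, 4)
    if end != len(toks):
        raise ValueError("trailing tokens after entry")
    return int(toks[0][1:]), toks[2], body


# The "^4" entry (function "bar") from the example.
slot, kind, body = parse_entry(
    "^4 = gv: {guid: 16434608426314478903, name: bar, summaries: "
    "{function: {module: ^0, flags: {linkage: extern, "
    "notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 3, "
    "funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, "
    "returnDoesNotAlias: 0}, calls: {{callee: ^2, hotness: unknown}}, "
    "refs: {^1}}}}"
)
function = body["summaries"]["function"]
print(slot, kind, body["name"], function["calls"], function["refs"])
```

Because whitespace is ignored by the tokenizer, an entry wrapped across several lines parses the same way as one written on a single line.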

-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |  408-460-2413
Steven Wu via llvm-dev

2018-Apr-25 20:13 UTC

Hi Teresa

Thanks for the proposal. Serializing out the summary in a readable format is
very helpful for debugging and development. Some comments inline.

> On Apr 24, 2018, at 7:43 AM, Teresa Johnson via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Relatedly, there is YAML reader/writer support for CFI related summary
> fields (-wholeprogramdevirt-read-summary and
> -wholeprogramdevirt-write-summary). Last summer, GSOC student Charles
> Saternos implemented support to dump the summary in YAML from llvm-lto2
> (D34080), including the rest of the summary fields (D34063), however, there
> was pushback on the related RFC for dumping via YAML or another format
> rather than emitting as LLVM assembly.

Can you elaborate on the reasons for the pushback on YAML support? I want to
know what concerns people had with the YAML format so we can address them in
the LLVM assembly format. I could not find much context reading through the
review and the mailing list.
> 
> I modeled the proposed format on metadata, with a few key differences noted
> below. Like metadata, I propose enumerating the entries with the
> SlotTracker, and prefixing them with a special character. Avoiding
> characters already used in some fashion (i.e. “!” for metadata and “#” for
> attributes), I initially have chosen “^”. Open to suggestions though.

Is there any reason not to just use metadata for the summary, or any downside
to doing so? We could just stream the summary-related metadata into the
summary block in bitcode.
> 
> Currently, I am emitting the summary entries at the end, after the metadata
> nodes. Note that the ModuleSummaryIndex is not currently referenced from
> the Module, and isn’t currently created when parsing the Module IR bitcode
> (there is a separate derived class for reading the ModuleSummaryIndex from
> bitcode). This is because they are not currently used at the same time.
> However, in the future there is no reason why we couldn’t tag the global
> values in the Module’s LLVM assembly with the corresponding summary entry
> if the ModuleSummaryIndex is available when printing the Module in the
> assembly writer. I.e. we could do the following for “main” from the above
> example when printing the IR definition (note the “^3” at the end):
>
> define  dso_local i32 @main() #0 !dbg !17 ^3 {

I don't have any real preference regarding the syntax. Tagging the IR
definition with its summary is nice and increases readability, but it might
also cause problems. The summary is currently standalone, and the IR
definition doesn't really hold a reference to the summary; you have to look
it up through the GUID. If you make the summary tightly coupled with the IR,
should we verify the state of the summary as part of the IR verifier? I guess
this is related to the concern you raise at the end of the email.
> 
> Things to consider are the behavior when:
> Invoked with “opt -module-summary” (which currently builds a new summary
> index from the IR). Options:
>  a. recompute summary and throw away summary in the assembly file
>  b. ignore -module-summary and build the summary from the LLVM assembly
>  c. give an error
>  d. compare the two summaries (one created from the assembly and the new
>     one created by the analysis phase from the IR), and error if they are
>     different.
> My opinion is to do a), so that the behavior using -module-summary doesn’t
> change. We also need a way to force building of a fresh module summary for
> cases where the user has modified the LLVM assembly of the IR (see below).

I prefer a). d) can be achieved with a different pass.
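
As a rough illustration of what option d) would have to do, whether inside "opt -module-summary" or in a separate pass, the following sketch compares a summary read from assembly against a freshly computed one. The data layout (plain dicts keyed by GUID, with `insts` and `calls` fields) and the function name `diff_summaries` are hypothetical stand-ins, not the ModuleSummaryIndex API; the GUIDs are the ones from the example in the RFC.

```python
def diff_summaries(from_assembly, recomputed):
    """Return readable differences between two {guid: summary} maps."""
    differences = []
    for guid in sorted(from_assembly.keys() | recomputed.keys()):
        old = from_assembly.get(guid)
        new = recomputed.get(guid)
        if old is None:
            differences.append(f"{guid}: only in recomputed summary")
        elif new is None:
            differences.append(f"{guid}: only in assembly summary")
        else:
            for field in sorted(old.keys() | new.keys()):
                if old.get(field) != new.get(field):
                    differences.append(
                        f"{guid}: {field}: {old.get(field)!r} != {new.get(field)!r}"
                    )
    return differences


MAIN = 15822663052811949562
BAR = 16434608426314478903
BAR_ALIAS = 18040127437030252312
FOO = 6699318081062747564

# Summary as written in the assembly file.
from_assembly = {MAIN: {"insts": 5, "calls": [BAR_ALIAS, BAR]}}
# Pretend the user added a call to foo() in main's IR without updating the
# summary, so the analysis now sees one more instruction and one more callee.
recomputed = {MAIN: {"insts": 6, "calls": [BAR_ALIAS, BAR, FOO]}}

differences = diff_summaries(from_assembly, recomputed)
for line in differences:
    print(line)
```

A non-empty result is what option d) would turn into an error.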
> 
> How to handle older LLVM assembly files that don’t contain new summary
> fields. Options:
>  a. Force the LLVM assembly file to be recreated with a new summary. I.e.
>     “opt -module-summary -o - | llvm-dis”.
>  b. Auto-upgrade, by silently creating conservative values for the new
>     summary entries.
> I lean towards b) (when possible) for user-friendliness and to reduce
> required churn on test inputs.

Assembly files don't need to be compatible. For older LLVM assembly files
that don't contain the new summary fields, we can just error out. The story
is different for bitcode files, which should be auto-upgraded for
compatibility. This is also necessary for debugging the summary; otherwise,
if you llvm-dis older bitcode, you will get regenerated summary info.
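
The two policies under discussion here (error out on missing fields versus auto-upgrade with conservative values) can be contrasted in a small sketch. Everything in it is hypothetical: the `load_flags` helper, the `auto_upgrade` switch, and in particular the default values chosen as "conservative" are assumptions for illustration, not what LLVM does.

```python
# Hypothetical conservative defaults for flags an older file might lack:
# do not import, assume live, and do not assume dso-local.
CONSERVATIVE_FLAGS = {"notEligibleToImport": 1, "live": 1, "dsoLocal": 0}


def load_flags(flags, auto_upgrade):
    """Return a complete flag set, or raise if fields are missing."""
    missing = sorted(set(CONSERVATIVE_FLAGS) - set(flags))
    if missing and not auto_upgrade:
        raise ValueError("summary flags missing: " + ", ".join(missing))
    upgraded = dict(CONSERVATIVE_FLAGS)
    upgraded.update(flags)  # values present in the file always win
    return upgraded


# An entry written before "dsoLocal" existed.
old_flags = {"linkage": "extern", "notEligibleToImport": 0, "live": 0}

print(load_flags(old_flags, auto_upgrade=True))
try:
    load_flags(old_flags, auto_upgrade=False)
except ValueError as err:
    print("error:", err)
```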
> 
> How to handle partial or incorrect LLVM assembly summary entries. How to
> handle partial summaries depends in part on how we answer the prior
> question about auto-upgrading. I think the best option there is likely to
> handle it automatically when possible. However, I do think we should error
> on glaring errors like obviously missing information. For example, when
> there is summary data in the LLVM assembly, but summary entries are missing
> for some global values. E.g. if the user modified the assembly to add a
> function but forgot to add a corresponding summary entry. We could still
> have subtle issues (e.g. user adds a new call but forgets to update the
> caller’s summary call list), but it will be harder to detect those.

I would prefer to serialize them as they are, in the error state, if
possible. We can rely on a verifier pass to catch the errors, so you have the
option to skip the verifier and see the wrong summary when debugging. This
aligns with LLVM IR as well.
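
A separate verifier of the kind suggested here could catch the "glaring" cases from the quoted paragraph. The sketch below checks two of them: slot references that point at no entry, and IR definitions with no summary. The entry layout (`name`, `has_summary`, `refs`) and the function `verify_summary` are hypothetical simplifications, not an existing LLVM pass; the slots mirror the example in the RFC.

```python
def verify_summary(entries, ir_definitions):
    """Return a list of error strings; an empty list means no glaring errors."""
    errors = []
    for slot, entry in sorted(entries.items()):
        for ref in entry.get("refs", []):
            if ref not in entries:
                errors.append(
                    f"^{slot} ({entry['name']}) references undefined slot ^{ref}"
                )
    summarized = {e["name"] for e in entries.values() if e.get("has_summary")}
    for name in sorted(set(ir_definitions) - summarized):
        errors.append(f"@{name} is defined in the IR but has no summary")
    return errors


# Slots ^1..^5 from the example; "refs" collects calls, refs and aliasee.
entries = {
    1: {"name": "X", "has_summary": True, "refs": []},
    2: {"name": "foo", "has_summary": False, "refs": []},
    3: {"name": "main", "has_summary": True, "refs": [5, 4]},
    4: {"name": "bar", "has_summary": True, "refs": [2, 1]},
    5: {"name": "barAlias", "has_summary": True, "refs": [4]},
}

print(verify_summary(entries, {"X", "main", "bar", "barAlias"}))
# A user adds a function "baz" to the IR but forgets its summary entry.
print(verify_summary(entries, {"X", "main", "bar", "barAlias", "baz"}))
```

The subtler case mentioned in the quote (a new call missing from the caller's call list) is not detectable this way without recomputing the summary from the IR.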

Steven

Teresa Johnson via llvm-dev

2018-Apr-25 20:32 UTC

Hi Steven,

Thanks for your comments! Replies inline.

Teresa

On Wed, Apr 25, 2018 at 1:13 PM Steven Wu <stevenwu at apple.com> wrote:
>
> Can you elaborate on the reasons for the pushback on YAML support? I want
> to know what concerns people had with the YAML format so we can address
> them in the LLVM assembly format. I could not find much context reading
> through the review and the mailing list.
>
The comments on the RFC ("[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO
summary dump format") and in the discussion on https://reviews.llvm.org/D34080
(the latter contains a more abbreviated version of the discussion on the
RFC, so that's a good place to look) were essentially that we should
prioritize a round-trippable assembly format, and that doing YAML (or any
other dumper format) is going in the wrong direction from that.

> Is there any reason not to just use metadata for the summary, or any
> downside to doing so? We could just stream the summary-related metadata
> into the summary block in bitcode.
>
I assume you mean use the metadata printing format, i.e. "!"? I considered
that but it seemed a little odd to me since this isn't in fact metadata. We
could presumably use summary-specific tags that would indicate to the
parser that it is not in fact metadata but rather summary, but it seemed
cleaner to me to use a separate format.

> I don't have any real preference regarding the syntax. Tagging the IR
> definition with its summary is nice and increases readability, but it might
> also cause problems. The summary is currently standalone, and the IR
> definition doesn't really hold a reference to the summary; you have to look
> it up through the GUID. If you make the summary tightly coupled with the
> IR, should we verify the state of the summary as part of the IR verifier? I
> guess this is related to the concern you raise at the end of the email.
>
Right - tagging the IR is something we can do in the future, assuming we
make the index available to the Module structure when printing, but there
isn't currently a reference to it. Presumably it would be optional, i.e. if
the summary index is available while printing, look up the global's entry
via its GUID, which is easily accessed from the GlobalValue, and print that.
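
As a rough illustration of that lookup, here is a minimal Python sketch. It
assumes the GUID is the low 64 bits of the MD5 digest of the global's name
(my understanding of how GlobalValue::getGUID behaves for externally visible
symbols; local symbols get the source file name mixed in first). The helper
name guid_for_name and the one-entry table are made up for illustration.

```python
import hashlib


def guid_for_name(name: str) -> int:
    """Return the 64-bit GUID for an externally visible global name.

    Assumption: the GUID is the first 8 bytes of the MD5 digest of the
    name, read as a little-endian unsigned integer.
    """
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


# Summary slots from the RFC example, keyed by GUID as in GlobalValueMap.
summary_slot_by_guid = {15822663052811949562: "^3"}

# When printing the IR definition of "main", look up its slot by GUID
# and append it to the definition line.
slot = summary_slot_by_guid.get(guid_for_name("main"))
print(f"define dso_local i32 @main() #0 !dbg !17 {slot} {{")
```

If the index is not available while printing, the lookup returns nothing and
the tag would simply be omitted.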

Regarding verification, I tend to think that it should be a separately
available option (as discussed below in your comments).

> *For CFI data structures, the format would be similar. It appears that
> TypeIds are referred to by string name in the top level TypeIdMap (std::map
> indexed by std::string type identifier), whereas they are referenced by
> GUID within the FunctionSummary class (i.e. the TypeTests vector and the
> VFuncId structure). For the LLVM assembly I think there should be a top
> level entry for each TypeIdMap, which lists both the type identifier string
> and its GUID (followed by its associated information stored in the map),
> and the TypeTests/VFuncId references on the FunctionSummary entries can
> reference it by summary slot number. I.e. something like:
>
> ^1 = typeid: {guid: 12345, identifier: name_of_type, …
> ^2 = gv: {... {function: {.... typeTests: {^1, …
>
> Peter - is that correct and does that sound ok?
>
> Issues when Parsing of Summaries from Assembly
> --------------------------------------------------------------------
> When reading an LLVM assembly file containing module summary entries, a
> ModuleSummaryIndex will be created from the entries.
>
> Things to consider are the behavior when:
>
> - Invoked with “opt -module-summary” (which currently builds a new summary
> index from the IR). Options:
> a) recompute summary and throw away summary in the assembly file
> b) ignore -module-summary and build the summary from the LLVM assembly
> c) give an error
> d) compare the two summaries (one created from the assembly and the new one
> created by the analysis phase from the IR), and error if they are different.
>
> My opinion is to do a), so that the behavior using -module-summary doesn’t
> change. We also need a way to force building of a fresh module summary for
> cases where the user has modified the LLVM assembly of the IR (see below).*
>
>
> I prefer a). d) can be achieved with a different pass.
>
Great, agreed.

> * - How to handle older LLVM assembly files that don’t contain new summary
> fields. Options:
> a) Force the LLVM assembly file to be recreated with a new summary. I.e.
> “opt -module-summary -o - | llvm-dis”.
> b) Auto-upgrade, by silently creating conservative values for the new
> summary entries.
>
> I lean towards b) (when possible) for user-friendliness and to reduce
> required churn on test inputs.*
>
>
> The assembly file doesn't need to be compatible. For older LLVM assembly
> files that don't contain new summary fields, we can just error out. The
> story is different for bitcode files, which should be auto-upgraded for
> compatibility. It is also necessary in order to debug the summary.
> Otherwise, if you llvm-dis older bitcode, you will get regenerated summary
> info.
>
Right, the bitcode summary format is auto-upgraded. I suppose there are few
enough test .ll files containing summaries that it wouldn't be too difficult
to require that they be upgraded by the patches that introduce new fields. I
could go either way here.
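
To make option b) concrete, here is a rough Python sketch of what
auto-upgrade could look like on the parsing side: any flag missing from an
older entry is filled in with a conservative value. The flag names are the
ones from the funcFlags example in the RFC; the defaults table and the
upgrade_func_flags helper are hypothetical, not existing LLVM code.

```python
# Conservative defaults: claim nothing about the function's behavior.
CONSERVATIVE_FUNC_FLAGS = {
    "readNone": 0,
    "readOnly": 0,
    "noRecurse": 0,
    "returnDoesNotAlias": 0,
}


def upgrade_func_flags(parsed: dict) -> dict:
    """Fill in flags that are absent from an older assembly file."""
    unknown = set(parsed) - set(CONSERVATIVE_FUNC_FLAGS)
    if unknown:
        # Unrecognized tags are a hard error rather than silently dropped.
        raise ValueError(f"unknown funcFlags: {sorted(unknown)}")
    return {**CONSERVATIVE_FUNC_FLAGS, **parsed}


# A hypothetical older entry that is missing one of the flags.
old_entry = {"readNone": 0, "readOnly": 1, "noRecurse": 0}
print(upgrade_func_flags(old_entry))
```

The alternative (erroring out on any missing field) would replace the merge
with a check that every expected tag is present.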

> * - How to handle partial or incorrect LLVM assembly summary entries. How
> to handle partial summaries depends in part on how we answer the prior
> question about auto-upgrading. I think the best option, as there, is to
> handle it automatically when possible. However, I do think we should error
> on glaring errors like obviously missing information. For example, when
> there is summary data in the LLVM assembly, but summary entries are missing
> for some global values. E.g. if the user modified the assembly to add a
> function but forgot to add a corresponding summary entry. We could still
> have subtle issues (e.g. user adds a new call but forgets to update the
> caller’s summary call list), but it will be harder to detect those.*
>
>
> I would prefer to serialize them as they are, in the error state, if
> possible. We can rely on a verifier pass to catch the errors, so you have
> the option to skip the verifier to see the wrong summary when debugging.
> This aligns with LLVM IR as well.
>
So, deserialize it into bitcode (assuming that the summary in assembly
isn't so corrupted that we can't), then rely on a new summary verifier that
runs afterwards (possibly optional) - did I summarize that correctly?
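
As a rough illustration of that split, here is a Python sketch of a
standalone check that runs after parsing, over a toy in-memory form of the
example from the RFC. The dictionary layout and the verify_summary helper are
invented for illustration and are not the ModuleSummaryIndex API. It reports
dangling slot references and defined globals with no summary entry.

```python
def verify_summary(entries: dict, defined_globals: set) -> list:
    """Return a list of error strings; an empty list means nothing was found."""
    errors = []
    summarized = set()
    for slot, entry in entries.items():
        summary = entry.get("summary")
        if summary is None:
            continue  # declaration only, e.g. the external function "foo"
        summarized.add(entry["name"])
        refs = list(summary.get("calls", [])) + list(summary.get("refs", []))
        if "aliasee" in summary:
            refs.append(summary["aliasee"])
        for ref in refs:
            if ref not in entries:
                errors.append(f"{slot} references missing slot {ref}")
    for name in sorted(defined_globals - summarized):
        errors.append(f"no summary entry for defined global {name}")
    return errors


# Toy form of the RFC example (^1..^5), with slots as strings.
entries = {
    "^1": {"name": "X", "summary": {}},
    "^2": {"name": "foo"},
    "^3": {"name": "main", "summary": {"calls": ["^5", "^4"]}},
    "^4": {"name": "bar", "summary": {"calls": ["^2"], "refs": ["^1"]}},
    "^5": {"name": "barAlias", "summary": {"aliasee": "^4"}},
}
defined = {"X", "main", "bar", "barAlias"}

print(verify_summary(entries, defined))            # consistent input
print(verify_summary(entries, defined | {"baz"}))  # "baz" added to the IR only
```

A check like this catches the glaring cases; a call added to the IR without a
matching entry in the caller's call list would need a comparison against a
freshly computed summary.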

Teresa

>
> Steven
-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |  408-460-2413

Peter Collingbourne via llvm-dev

2018-Apr-25 21:08 UTC


[llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Hi Teresa,

Thanks for sending this proposal out.

I would again like to register my disagreement with the whole idea of
writing summaries in LLVM assembly format. In my view it is clear that this
is not the right direction, as it only invites additional complexity and
more ways for things to go wrong for no real benefit. However, I don't have
the energy to argue that point any further, so I won't stand in the way
here.

On Tue, Apr 24, 2018 at 7:43 AM, Teresa Johnson <tejohnson at google.com>
wrote:
> Hi everyone,
>
> I started working on a long-standing request to have the summary dumped in
> a readable format to text, and specifically to emit to LLVM assembly.
> Proposal below, please let me know your thoughts.
>
> Thanks,
> Teresa
>
> [...]
>
> For CFI data structures, the format would be similar. It appears that
> TypeIds are referred to by string name in the top level TypeIdMap (std::map
> indexed by std::string type identifier), whereas they are referenced by
> GUID within the FunctionSummary class (i.e. the TypeTests vector and the
> VFuncId structure). For the LLVM assembly I think there should be a top
> level entry for each TypeIdMap, which lists both the type identifier string
> and its GUID (followed by its associated information stored in the map),
> and the TypeTests/VFuncId references on the FunctionSummary entries can
> reference it by summary slot number. I.e. something like:
>
> ^1 = typeid: {guid: 12345, identifier: name_of_type, …
> ^2 = gv: {... {function: {.... typeTests: {^1, …
>
> Peter - is that correct and does that sound ok?
>
I don't think that would work because the purpose of the top-level
TypeIdMap is to contain resolutions for each type identifier, and
per-module summaries do not contain resolutions (only the combined summary
does). What that means in practice is that we would not be able to recover
and write out a type identifier name for per-module summaries as part of ^1
in your example (well, we could in principle, because the name is stored
somewhere in the function's IR, but that could get complicated). Probably
the easiest thing to do is to keep the type identifiers as GUIDs in the
function summaries and write out the mapping of type identifiers as a
top-level entity.
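
A small Python sketch of that arrangement, assuming type identifier GUIDs are
derived the same way as global value GUIDs (low 64 bits of the MD5 digest of
the name). The type names, the guid_for_name helper and the type_id_names
table are made up for illustration.

```python
import hashlib


def guid_for_name(name: str) -> int:
    """Assumed GUID scheme: first 8 MD5 bytes, little-endian."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


# Function summaries keep referring to type identifiers by GUID only.
function_summary = {
    "name": "main",
    "typeTests": [guid_for_name("_ZTS1A"), guid_for_name("_ZTS1B")],
}

# Separate top-level entity: GUID -> type identifier string.
type_id_names = {guid_for_name(n): n for n in ("_ZTS1A", "_ZTS1B")}

# A dumper can then recover the readable name for each type test.
for guid in function_summary["typeTests"]:
    print(guid, "->", type_id_names[guid])
```

This keeps the per-module function summaries unchanged and puts the names in
one place, independent of whether resolutions exist yet.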

Peter



-- 
Peter

Teresa Johnson via llvm-dev

2018-Apr-30 15:32 UTC


[llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Hi Peter,
Thanks for your comments, replies below.
Teresa

On Wed, Apr 25, 2018 at 2:08 PM Peter Collingbourne <peter at pcc.me.uk>
wrote:
> Hi Teresa,
>
> Thanks for sending this proposal out.
>
> I would again like to register my disagreement with the whole idea of
> writing summaries in LLVM assembly format. In my view it is clear that this
> is not the right direction, as it only invites additional complexity and
> more ways for things to go wrong for no real benefit. However, I don't have
> the energy to argue that point any further, so I won't stand in the way
> here.
>
I assume you are most concerned with the re-assembly/deserialization of the
summary. My main goal is to get this dumped into a text format, and I went
this route since the last dumper RFC was blocked, with the LLVM assembly
direction pushed instead.

> On Tue, Apr 24, 2018 at 7:43 AM, Teresa Johnson <tejohnson at google.com>
> wrote:
>
>> Hi everyone,
>>
>> I started working on a long-standing request to have the summary dumped
>> in a readable format to text, and specifically to emit to LLVM assembly.
>> Proposal below, please let me know your thoughts.
>>
>> Thanks,
>> Teresa
>> [...]
>>
>> For CFI data structures, the format would be similar. It appears that
>> TypeIds are referred to by string name in the top level TypeIdMap
>> (std::map indexed by std::string type identifier), whereas they are
>> referenced by GUID within the FunctionSummary class (i.e. the TypeTests
>> vector and the VFuncId structure). For the LLVM assembly I think there
>> should be a top level entry for each TypeIdMap, which lists both the type
>> identifier string and its GUID (followed by its associated information
>> stored in the map), and the TypeTests/VFuncId references on the
>> FunctionSummary entries can reference it by summary slot number. I.e.
>> something like:
>>
>> ^1 = typeid: {guid: 12345, identifier: name_of_type, …
>> ^2 = gv: {... {function: {.... typeTests: {^1, …
>>
>> Peter - is that correct and does that sound ok?
>>
>
> I don't think that would work because the purpose of the top-level
> TypeIdMap is to contain resolutions for each type identifier, and
> per-module summaries do not contain resolutions (only the combined summary
> does). What that means in practice is that we would not be able to recover
> and write out a type identifier name for per-module summaries as part of ^1
> in your example (well, we could in principle, because the name is stored
> somewhere in the function's IR, but that could get complicated).
>
Ah ok. I guess the top-level map then is generated by the regular LTO
portion of the link (since it presumably requires IR during the Thin Link
to get into the combined summary)?

> Probably the easiest thing to do is to keep the type identifiers as GUIDs
> in the function summaries and write out the mapping of type identifiers as
> a top-level entity.
>
To confirm, you mean during the compile step create a top-level entity that
maps GUID -> identifier?

Thanks,
Teresa

>
> Peter
>

-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |  408-460-2413

David Blaikie via llvm-dev

2018-Apr-30 18:52 UTC


[llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Hi Teresa,

Awesome to see - looking forward to it!

On Tue, Apr 24, 2018 at 7:44 AM Teresa Johnson via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi everyone,
>
> I started working on a long-standing request to have the summary dumped in
> a readable format to text, and specifically to emit to LLVM assembly.
> Proposal below, please let me know your thoughts.
>
> Thanks,
> Teresa
>
> *RFC: LLVM Assembly format for ThinLTO Summary
> ========================================
>
> Background
> -----------------
> ThinLTO operates on small summaries computed during the compile step (i.e.
> with “-c -flto=thin”), which are then analyzed and updated during the Thin
> Link stage, and utilized to perform IR updates during the post-link ThinLTO
> backends. The summaries are emitted as LLVM Bitcode, however, not currently
> in the LLVM assembly.
>
> There are two ways to generate a bitcode file containing summary records
> for a module:
> 1. Compile with “clang -c -flto=thin”*
>
As an aside - I seem to recall that at least internally at Google some kind
of summary-only bitcode files are used (so that the whole bitcode file
(especially in builds with debug info) doesn't have to be shipped to the
node doing the summary merging). How are those summary-only files
produced? Is that upstream? Or done in a more low-level way (like an
objcopy, llvm-* tool invocation done as a post-processing step, etc)?

> * 1. Build from LLVM assembly using “opt -module-summary”Either of these
> will result in the ModuleSummaryIndex analysis pass (which builds the
> summary index in memory for a module) to be added to the pipeline just
> before bitcode emission.Additionally, a combined index is created by
> merging all the per-module indexes during the Thin Link, which is
> optionally emitted as a bitcode file.Currently, the only way to view these
> records is via “llvm-bcanalyzer -dump”, then manually decoding the raw
> bitcode dumps.Relatedly, there is YAML reader/writer support for CFI
> related summary fields (-wholeprogramdevirt-read-summary and
> -wholeprogramdevirt-write-summary). Last summer, GSOC student Charles
> Saternos implemented support to dump the summary in YAML from llvm-lto2
> (D34080), including the rest of the summary fields (D34063), however, there
> was pushback on the related RFC for dumping via YAML or another format
> rather than emitting as LLVM assembly.Goals: 1. Define LLVM assembly format
> for summary index2. Define interaction between parsing of summary from LLVM
> assembly and synthesis of new summary index from IR.3. Implement printing
> and parsing of summary index LLVM assemblyProposed LLVM Assembly
> Format----------------------------------------------There are several top
> level data structures within the ModuleSummaryIndex: 1.
> ModulePathStringTable: Holds the paths to the modules summarized in the
> index (only one entry for per-module indexes and multiple in the combined
> index), along with their hashes (for incremental builds and global
> promotion).2. GlobalValueMap: A map from global value GUIDs to the
> corresponding function/variable/alias summary (or summaries for the
> combined index and weak linkage).3. CFI-related data structures (TypeIdMap,
> CfiFunctionDefs, and CfiFunctionDecls)I have a WIP patch to AsmWriter.cpp
> to print the ModuleSummaryIndex that I was using to play with the format.
> It currently prints 1 and 2 above. I’ve left the CFI related summary data
> structures as a TODO for now, until the format is at least conceptually
> agreed, but from looking at those I don’t see an issue with using the same
> format (with a note/question for Peter on CFI type test representation
> below).I modeled the proposed format on metadata, with a few key
> differences noted below. Like metadata, I propose enumerating the entries
> with the SlotTracker, and prefixing them with a special character. Avoiding
> characters already used in some fashion (i.e. “!” for metadata and “#” for
> attributes), I initially have chosen “^”. Open to suggestions
> though.
>
> Consider the following example:
>
> extern void foo();
> int X;
> int bar() {
>   foo();
>   return X;
> }
> void barAlias() __attribute__ ((alias ("bar")));
> int main() {
>   barAlias();
>   return bar();
> }
>
> The proposed format has one entry per ModulePathStringTable entry and one
> per GlobalValueMap/GUID, and looks like:
>
> ^0 = module: {path: testA.o, hash: 5487197307045666224}
> ^1 = gv: {guid: 1881667236089500162, name: X, summaries: {variable: {module: ^0, flags: {linkage: common, notEligibleToImport: 0, live: 0, dsoLocal: 1}}}}
> ^2 = gv: {guid: 6699318081062747564, name: foo}
> ^3 = gv: {guid: 15822663052811949562, name: main, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 5, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^5, hotness: unknown}, {callee: ^4, hotness: unknown}}}}}
> ^4 = gv: {guid: 16434608426314478903, name: bar, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 3, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^2, hotness: unknown}}, refs: {^1}}}}
> ^5 = gv: {guid: 18040127437030252312, name: barAlias, summaries: {alias: {module: ^0, flags: {linkage: extern, notEligibleToImport: 0, live: 0, dsoLocal: 1}, aliasee: ^4}}}
>
Syntax seems pretty good to me!
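
To make the slot-reference scheme in the quoted example concrete, here is a minimal Python sketch that reads the six "^N = ..." entries and resolves the "callee: ^N" references back to names. It is only an illustration of how the proposed text could be consumed: it assumes one entry per line, and the regular expressions and the helper names (parse_entries, call_graph) are hypothetical, not part of any LLVM tool or of the WIP patch.

```python
import re

# The six entries from the quoted example, one per line.
SUMMARY = """\
^0 = module: {path: testA.o, hash: 5487197307045666224}
^1 = gv: {guid: 1881667236089500162, name: X, summaries: {variable: {module: ^0, flags: {linkage: common, notEligibleToImport: 0, live: 0, dsoLocal: 1}}}}
^2 = gv: {guid: 6699318081062747564, name: foo}
^3 = gv: {guid: 15822663052811949562, name: main, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 5, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^5, hotness: unknown}, {callee: ^4, hotness: unknown}}}}}
^4 = gv: {guid: 16434608426314478903, name: bar, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 3, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^2, hotness: unknown}}, refs: {^1}}}}
^5 = gv: {guid: 18040127437030252312, name: barAlias, summaries: {alias: {module: ^0, flags: {linkage: extern, notEligibleToImport: 0, live: 0, dsoLocal: 1}, aliasee: ^4}}}
"""

# Hypothetical pattern: "^<slot> = <kind>: <body>" on a single line.
ENTRY = re.compile(r"^\^(\d+) = (\w+): (.*)$", re.MULTILINE)


def parse_entries(text):
    """Map slot number -> (kind, raw body text) for each top-level entry."""
    return {int(m.group(1)): (m.group(2), m.group(3)) for m in ENTRY.finditer(text)}


def call_graph(text):
    """Map each global value name to the names of the callees in its summary."""
    entries = parse_entries(text)
    # First pass: slot -> name, for the "gv" entries only.
    names = {
        slot: re.search(r"name: (\w+)", body).group(1)
        for slot, (kind, body) in entries.items()
        if kind == "gv"
    }
    # Second pass: resolve every "callee: ^N" through the slot table.
    graph = {}
    for slot, (kind, body) in entries.items():
        if kind == "gv":
            callees = [int(s) for s in re.findall(r"callee: \^(\d+)", body)]
            graph[names[slot]] = [names[c] for c in callees]
    return graph


if __name__ == "__main__":
    for caller, callees in call_graph(SUMMARY).items():
        print(caller, "->", callees)
```

Note that the alias entry contributes no call edge: its reference to "bar" is carried by "aliasee: ^4", not by a "callee:" field.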

>
> *Like metadata, the fields are tagged (currently using lower camel case,
> maybe upper camel case would be preferable).The proposed format has a
> structure that reflects the data structures in the summary index. For
> example, consider the entry “^4”. This corresponds to the function “bar”.
> The entry for that GUID in the GlobalValueMap contains a list of summaries.
> For per-module summaries such as this, there will be at most one summary
> (with no summary list for an external function like “foo”). In the combined
> summary there may be multiple, e.g. in the case of linkonce_odr functions
> which have definitions in multiple modules. The summary list for bar (“^4”)
> contains a FunctionSummary, so the summary is tagged “function:”. The
> FunctionSummary contains both a flags structure (inherited from the base
> GlobalValueSummary class), and a funcFlags structure (specific to
> FunctionSummary). It therefore contains a brace-enclosed list of flag
> tags/values for each.Where a global value summary references another global
> value summary (e.g. via a call list, reference list, or aliasee), the entry
> is referenced by its slot. E.g. the alias “barAlias” (“^5”) references its
> aliasee “bar” as “^4”.Note that in comparison metadata assembly entries
> tend to be much more decomposed since many metadata fields are themselves
> metadata (so then entries tend to be shorter with references to other
> metadata nodes).Currently, I am emitting the summary entries at the end,
> after the metadata nodes. Note that the ModuleSummaryIndex is not currently
> referenced from the Module, and isn’t currently created when parsing the
> Module IR bitcode (there is a separate derived class for reading the
> ModuleSummaryIndex from bitcode). This is because they are not currently
> used at the same time. However, in the future there is no reason why we
> couldn’t tag the global values in the Module’s LLVM assembly with the
> corresponding summary entry if the ModuleSummaryIndex is available when
> printing the Module in the assembly writer. I.e. we could do the following
> for “main” from the above example when printing the IR definition (note the
> “^3” at the end):define  dso_local i32 @main() #0 !dbg !17 ^3 {For CFI data
> structures, the format would be similar. It appears that TypeIds are
> referred to by string name in the top level TypeIdMap (std::map indexed by
> std::string type identifier), whereas they are referenced by GUID within
> the FunctionSummary class (i.e. the TypeTests vector and the VFuncId
> structure). For the LLVM assembly I think there should be a top level entry
> for each TypeIdMap, which lists both the type identifier string and its
> GUID (followed by its associated information stored in the map), and the
> TypeTests/VFuncId references on the FunctionSummary entries can reference
> it by summary slot number. I.e. something like:^1 = typeid: {guid: 12345,
> identifier: name_of_type, …^2 = gv: {... {function: {.... typeTests: {^1,
> …
>
> Peter - is that correct and does that sound ok?
>
> Issues when Parsing of Summaries from Assembly
> ----------------------------------------------
>
> When reading an LLVM assembly file containing module summary entries, a
> ModuleSummaryIndex will be created from the entries.
>
> Things to consider are the behavior when:
>
> - Invoked with “opt -module-summary” (which currently builds a new summary
>   index from the IR). Options:
>   a) recompute summary and throw away summary in the assembly file
>
What happens currently if you run `opt -module-summary` on a bitcode file
that already contains a summary? I feel like the behavior should be the
same when run on a textual IR file containing a summary, probably?

>
>   b) ignore -module-summary and build the summary from the LLVM assembly
>   c) give an error
>   d) compare the two summaries (one created from the assembly and the new
>      one created by the analysis phase from the IR), and error if they are
>      different.
>
> My opinion is to do a), so that the behavior using -module-summary doesn’t
> change. We also need a way to force building of a fresh module summary for
> cases where the user has modified the LLVM assembly of the IR (see below).
>
> - How to handle older LLVM assembly files that don’t contain new summary
>   fields. Options:
>
Same thoughts would apply here for "what do we do in the bitcode case" -
with the option to not support old/difficult textual IR. If there are
easy/obvious defaults, I'd say it's probably worth baking those in (&
baking them in even for the existing fields we know about, to make it
easier to write more terse test cases that don't have to
verbosely/redundantly specify lots of default values?) to the
parsing/loading logic?

>
>   a) Force the LLVM assembly file to be recreated with a new summary. I.e.
>      “opt -module-summary -o - | llvm-dis”.
>   b) Auto-upgrade, by silently creating conservative values for the new
>      summary entries.
>
> I lean towards b) (when possible) for user-friendliness and to reduce
> required churn on test inputs.
>
> - How to handle partial or incorrect LLVM assembly summary entries.
>
> How to handle partial summaries depends in part on how we answer the prior
> question about auto-upgrading. I think the best option like there is to
> handle it automatically when possible. However, I do think we should error
> on glaring errors like obviously missing information. For example, when
> there is summary data in the LLVM assembly, but summary entries are missing
> for some global values. E.g. if the user modified the assembly to add a
> function but forgot to add a corresponding summary entry. We could still
> have subtle issues (e.g. user adds a new call but forgets to update the
> caller’s summary call list), but it will be harder to detect those.
>
I'd be OK with the summary being validated by the IR validator (same way
other properties of IR are validated & even simple things like if you use
the wrong IR type to refer to an IR value, you get a parse error, etc) -
which, I realize, would make it feel like the textual summary was entirely
redundant (except in cases of standalone summaries - which I imagine will
be the common case in tests, because the summary processing should be
tested in isolation (except for testing things like this validation logic
itself, etc)).

- Dave


Teresa Johnson via llvm-dev

2018-May-01 17:48 UTC

[llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Hi David,
Thanks for the comments, replies below.
Teresa

On Mon, Apr 30, 2018 at 11:52 AM David Blaikie <dblaikie at gmail.com> wrote:
> Hi Teresa,
>
> Awesome to see - looking forward to it!
>
> On Tue, Apr 24, 2018 at 7:44 AM Teresa Johnson via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi everyone,
>>
>> I started working on a long-standing request to have the summary dumped
>> in a readable format to text, and specifically to emit to LLVM assembly.
>> Proposal below, please let me know your thoughts.
>>
>> Thanks,
>> Teresa
>>
>> RFC: LLVM Assembly format for ThinLTO Summary
>> =============================================
>>
>> Background
>> ----------
>>
>> ThinLTO operates on small summaries computed during the compile step (i.e.
>> with “-c -flto=thin”), which are then analyzed and updated during the Thin
>> Link stage, and utilized to perform IR updates during the post-link ThinLTO
>> backends. The summaries are emitted as LLVM Bitcode, however, not currently
>> in the LLVM assembly.
>>
>> There are two ways to generate a bitcode file containing summary records
>> for a module:
>> 1. Compile with “clang -c -flto=thin”
>>
>
> As an aside - I seem to recall that at least internally at Google some
> kind of summary-only bitcode files are used (so that the whole bitcode file
> (especially in builds with debug info) doesn't have to be shipped to the
> node doing the summary merging). How are those summary-only files
> produced? Is that upstream? Or done in a more low-level way (like an
> objcopy, llvm-* tool invocation done as a post-processing step, etc)?
>
This is done upstream, under a special clang option that can be given in
addition to -flto=thin, so that the compile step emits both the full
IR+summary (for the distributed backends) as well as a minimized bitcode
file with summary (for the thin link). Note that the distributed backends
don't actually need the summary with the IR (as they get all the info they
need from the combined summary index written out by the thin link), so we
could theoretically improve this to suppress the summary write for that
first file under that option.

>
>> 2. Build from LLVM assembly using “opt -module-summary”
>>
>> Either of these will result in the ModuleSummaryIndex analysis pass (which
>> builds the summary index in memory for a module) to be added to the
>> pipeline just before bitcode emission.
>>
>> Additionally, a combined index is created by merging all the per-module
>> indexes during the Thin Link, which is optionally emitted as a bitcode
>> file.
>>
>> Currently, the only way to view these records is via “llvm-bcanalyzer
>> -dump”, then manually decoding the raw bitcode dumps.
>>
>> Relatedly, there is YAML reader/writer support for CFI related summary
>> fields (-wholeprogramdevirt-read-summary and
>> -wholeprogramdevirt-write-summary). Last summer, GSOC student Charles
>> Saternos implemented support to dump the summary in YAML from llvm-lto2
>> (D34080), including the rest of the summary fields (D34063), however, there
>> was pushback on the related RFC for dumping via YAML or another format
>> rather than emitting as LLVM assembly.
>>
>> Goals:
>> 1. Define LLVM assembly format for summary index
>> 2. Define interaction between parsing of summary from LLVM assembly and
>>    synthesis of new summary index from IR.
>> 3. Implement printing and parsing of summary index LLVM assembly
>>
>> Proposed LLVM Assembly Format
>> -----------------------------
>>
>> There are several top level data structures within the ModuleSummaryIndex:
>> 1. ModulePathStringTable: Holds the paths to the modules summarized in the
>>    index (only one entry for per-module indexes and multiple in the
>>    combined index), along with their hashes (for incremental builds and
>>    global promotion).
>> 2. GlobalValueMap: A map from global value GUIDs to the corresponding
>>    function/variable/alias summary (or summaries for the combined index
>>    and weak linkage).
>> 3. CFI-related data structures (TypeIdMap, CfiFunctionDefs, and
>>    CfiFunctionDecls)
>>
>> I have a WIP patch to AsmWriter.cpp to print the ModuleSummaryIndex that I
>> was using to play with the format. It currently prints 1 and 2 above. I’ve
>> left the CFI related summary data structures as a TODO for now, until the
>> format is at least conceptually agreed, but from looking at those I don’t
>> see an issue with using the same format (with a note/question for Peter on
>> CFI type test representation below).
>>
>> I modeled the proposed format on metadata, with a few key differences
>> noted below. Like metadata, I propose enumerating the entries with the
>> SlotTracker, and prefixing them with a special character. Avoiding
>> characters already used in some fashion (i.e. “!” for metadata and “#” for
>> attributes), I initially have chosen “^”. Open to suggestions though.
>>
>> Consider the following example:
>>
>> extern void foo();
>> int X;
>> int bar() {
>>   foo();
>>   return X;
>> }
>> void barAlias() __attribute__ ((alias ("bar")));
>> int main() {
>>   barAlias();
>>   return bar();
>> }
>>
>> The proposed format has one entry per ModulePathStringTable entry and one
>> per GlobalValueMap/GUID, and looks like:
>>
>> ^0 = module: {path: testA.o, hash: 5487197307045666224}
>> ^1 = gv: {guid: 1881667236089500162, name: X, summaries: {variable: {module: ^0, flags: {linkage: common, notEligibleToImport: 0, live: 0, dsoLocal: 1}}}}
>> ^2 = gv: {guid: 6699318081062747564, name: foo}
>> ^3 = gv: {guid: 15822663052811949562, name: main, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 5, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^5, hotness: unknown}, {callee: ^4, hotness: unknown}}}}}
>> ^4 = gv: {guid: 16434608426314478903, name: bar, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 3, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^2, hotness: unknown}}, refs: {^1}}}}
>> ^5 = gv: {guid: 18040127437030252312, name: barAlias, summaries: {alias: {module: ^0, flags: {linkage: extern, notEligibleToImport: 0, live: 0, dsoLocal: 1}, aliasee: ^4}}}
>>
>
> Syntax seems pretty good to me!
>
Great!

>
>> Like metadata, the fields are tagged (currently using lower camel case,
>> maybe upper camel case would be preferable).
>>
>> The proposed format has a structure that reflects the data structures in
>> the summary index. For example, consider the entry “^4”. This corresponds
>> to the function “bar”. The entry for that GUID in the GlobalValueMap
>> contains a list of summaries. For per-module summaries such as this, there
>> will be at most one summary (with no summary list for an external function
>> like “foo”). In the combined summary there may be multiple, e.g. in the
>> case of linkonce_odr functions which have definitions in multiple modules.
>> The summary list for bar (“^4”) contains a FunctionSummary, so the summary
>> is tagged “function:”. The FunctionSummary contains both a flags structure
>> (inherited from the base GlobalValueSummary class), and a funcFlags
>> structure (specific to FunctionSummary). It therefore contains a
>> brace-enclosed list of flag tags/values for each.
>>
>> Where a global value summary references another global value summary (e.g.
>> via a call list, reference list, or aliasee), the entry is referenced by
>> its slot. E.g. the alias “barAlias” (“^5”) references its aliasee “bar” as
>> “^4”.
>>
>> Note that in comparison metadata assembly entries tend to be much more
>> decomposed since many metadata fields are themselves metadata (so then
>> entries tend to be shorter with references to other metadata nodes).
>>
>> Currently, I am emitting the summary entries at the end, after the
>> metadata nodes. Note that the ModuleSummaryIndex is not currently
>> referenced from the Module, and isn’t currently created when parsing the
>> Module IR bitcode (there is a separate derived class for reading the
>> ModuleSummaryIndex from bitcode). This is because they are not currently
>> used at the same time. However, in the future there is no reason why we
>> couldn’t tag the global values in the Module’s LLVM assembly with the
>> corresponding summary entry if the ModuleSummaryIndex is available when
>> printing the Module in the assembly writer. I.e. we could do the following
>> for “main” from the above example when printing the IR definition (note
>> the “^3” at the end):
>>
>> define  dso_local i32 @main() #0 !dbg !17 ^3 {
>>
>> For CFI data structures, the format would be similar. It appears that
>> TypeIds are referred to by string name in the top level TypeIdMap
>> (std::map indexed by std::string type identifier), whereas they are
>> referenced by GUID within the FunctionSummary class (i.e. the TypeTests
>> vector and the VFuncId structure). For the LLVM assembly I think there
>> should be a top level entry for each TypeIdMap, which lists both the type
>> identifier string and its GUID (followed by its associated information
>> stored in the map), and the TypeTests/VFuncId references on the
>> FunctionSummary entries can reference it by summary slot number. I.e.
>> something like:
>>
>> ^1 = typeid: {guid: 12345, identifier: name_of_type, …
>> ^2 = gv: {... {function: {.... typeTests: {^1, …
>>
>> Peter - is that correct and does that sound ok?
>>
>> Issues when Parsing of Summaries from Assembly
>> ----------------------------------------------
>>
>> When reading an LLVM assembly file containing module summary entries, a
>> ModuleSummaryIndex will be created from the entries.
>>
>> Things to consider are the behavior when:
>>
>> - Invoked with “opt -module-summary” (which currently builds a new summary
>>   index from the IR). Options:
>>   a) recompute summary and throw away summary in the assembly file
>>
>
> What happens currently if you run `opt -module-summary` on a bitcode file
> that already contains a summary? I feel like the behavior should be the
> same when run on a textual IR file containing a summary, probably?
>
We rebuild the summary. Note that this in part is due to the fact mentioned
above that we have separate readers for the Module IR and the summary. The
opt tool does not even read the summary if present. We currently only read
the summary during the thin link (when building the combined index for
analysis), and in the distributed backends where we read the combined
summary index file emitted for that file by the distributed thin link.

>
>>   b) ignore -module-summary and build the summary from the LLVM assembly
>>   c) give an error
>>   d) compare the two summaries (one created from the assembly and the new
>>      one created by the analysis phase from the IR), and error if they are
>>      different.
>>
>> My opinion is to do a), so that the behavior using -module-summary doesn’t
>> change. We also need a way to force building of a fresh module summary for
>> cases where the user has modified the LLVM assembly of the IR (see below).
>>
>> - How to handle older LLVM assembly files that don’t contain new summary
>>   fields. Options:
>>
>
> Same thoughts would apply here for "what do we do in the bitcode case" -
> with the option to not support old/difficult textual IR. If there are
> easy/obvious defaults, I'd say it's probably worth baking those in (&
> baking them in even for the existing fields we know about, to make it
> easier to write more terse test cases that don't have to
> verbosely/redundantly specify lots of default values?) to the
> parsing/loading logic?
>
So we do emit an index version in the bitcode, and auto-upgrade in a
conservative manner anything that wasn't emitted prior. We could presumably
serialize out the version number and handle auto-upgrading from textual
assembly the same way (as the version is bumped beyond the current version
at least). If we want to allow omission of some fields for test simplicity,
we could do a similar thing and apply conservative values where possible
for omitted fields (e.g. the flags). That seems fine to me, in which case I
don't think we need a version number. Although this has implications for
the validator, see below.
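
As a rough illustration of "apply conservative values where possible for omitted fields", the Python sketch below fills in whatever flags a textual summary entry left out. The specific default values are assumptions chosen for the example only (the thread does not settle what the conservative value of each flag is), and fill_flags is a hypothetical helper, not an LLVM API.

```python
# Assumed defaults, for illustration only: the thread does not fix these values.
# The field names are the ones used in the "flags:" structure of the proposal.
CONSERVATIVE_FLAGS = {
    "linkage": "extern",
    "notEligibleToImport": 1,  # assumption: do not import unless told otherwise
    "live": 0,
    "dsoLocal": 0,             # assumption: do not claim dso_local unless stated
}


def fill_flags(parsed):
    """Return a complete flags dict: values present in the text win, defaults fill gaps."""
    unknown = set(parsed) - set(CONSERVATIVE_FLAGS)
    if unknown:
        # Misspelled or unsupported fields are reported, not silently dropped.
        raise ValueError(f"unknown flag(s): {sorted(unknown)}")
    return {**CONSERVATIVE_FLAGS, **parsed}


if __name__ == "__main__":
    # A terse test input that only spells out the two fields it cares about.
    print(fill_flags({"linkage": "common", "dsoLocal": 1}))
```

With this shape a test input can omit every field it does not care about, which is the terseness benefit raised above; the cost, as noted, is that the filled-in values may not match what a recomputed summary would contain.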

>
>>   a) Force the LLVM assembly file to be recreated with a new summary. I.e.
>>      “opt -module-summary -o - | llvm-dis”.
>>   b) Auto-upgrade, by silently creating conservative values for the new
>>      summary entries.
>>
>> I lean towards b) (when possible) for user-friendliness and to reduce
>> required churn on test inputs.
>>
>> - How to handle partial or incorrect LLVM assembly summary entries.
>>
>> How to handle partial summaries depends in part on how we answer the prior
>> question about auto-upgrading. I think the best option like there is to
>> handle it automatically when possible. However, I do think we should error
>> on glaring errors like obviously missing information. For example, when
>> there is summary data in the LLVM assembly, but summary entries are missing
>> for some global values. E.g. if the user modified the assembly to add a
>> function but forgot to add a corresponding summary entry. We could still
>> have subtle issues (e.g. user adds a new call but forgets to update the
>> caller’s summary call list), but it will be harder to detect those.
>>
>
> I'd be OK with the summary being validated by the IR validator (same way
> other properties of IR are validated & even simple things like if you use
> the wrong IR type to refer to an IR value, you get a parse error, etc) -
> which, I realize, would make it feel like the textual summary was entirely
> redundant
>
It is redundant when the IR is also available, which relates to Peter and
others' objections to serializing this back in. An issue with validation
would be if we allowed omission of some fields and/or auto-upgrading as
discussed above. The applied conservative values might very well not match
the recomputed values. But as I mentioned here we may just want to validate
for glaring errors like required info - i.e. I think we should require that
every GV has an associated summary entry.
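
A minimal Python sketch of the "every GV has an associated summary entry" check follows. It works on text with rough regular expressions purely for illustration; the IR snippet, the trimmed summary entries, and the helper names are all hypothetical, and a real check would operate on the parsed Module and ModuleSummaryIndex instead.

```python
import re

# Hypothetical textual IR: the example module plus a function "added" that the
# user inserted by hand without writing a summary entry for it.
IR = """\
@X = common dso_local global i32 0
declare dso_local void @foo()
define dso_local i32 @bar() {
define dso_local i32 @main() {
define dso_local void @added() {
"""

# Summary entries trimmed to guid and name for brevity.
SUMMARY = """\
^0 = module: {path: testA.o, hash: 5487197307045666224}
^1 = gv: {guid: 1881667236089500162, name: X}
^2 = gv: {guid: 6699318081062747564, name: foo}
^3 = gv: {guid: 15822663052811949562, name: main}
^4 = gv: {guid: 16434608426314478903, name: bar}
"""


def ir_globals(ir_text):
    """Names of globals introduced at top level of the IR text (rough regex match)."""
    names = set()
    for line in ir_text.splitlines():
        m = re.match(r"(?:define|declare)\b.*?@([\w.$]+)\s*\(", line) or re.match(
            r"@([\w.$]+)\s*=", line
        )
        if m:
            names.add(m.group(1))
    return names


def summarized_names(summary_text):
    """Names that appear in a top-level "gv" summary entry."""
    return set(
        re.findall(r"^\^\d+ = gv: \{[^\n]*?name: (\w+)", summary_text, re.MULTILINE)
    )


def missing_summaries(ir_text, summary_text):
    """Global values present in the IR that have no summary entry."""
    return sorted(ir_globals(ir_text) - summarized_names(summary_text))


if __name__ == "__main__":
    print(missing_summaries(IR, SUMMARY))
```

This only catches the glaring case of a missing entry. The subtler case mentioned earlier, a new call that is absent from the caller's call list, would need the summary to be recomputed and compared.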

> (except in cases of standalone summaries - which I imagine will be the
> common case in tests, because the summary processing should be tested in
> isolation (except for testing things like this validation logic itself,
> etc)).
>
Yes, I suspect the biggest usage in tests would be a standalone combined
summary file that we can use to test the application of the thin link
optimizations on a single IR file in the LTO backend pipeline. I.e. the
input to the test would be one module IR assembly file (no summary) and one
combined index assembly file, it would run just the ThinLTO backend
pipeline, and check the resulting IR via llvm-dis to ensure the
optimization is applied effectively.


> - Dave

Peter Collingbourne via llvm-dev

2018-May-03 21:58 UTC

[llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Hi Teresa,

I have re-read your proposal, and I'm not getting how you plan to represent
combined summaries with this. Unless I'm missing something, there doesn't
seem to be a way to write out summaries that is independent of the global
values that they relate to. Is that something that you plan to address
later?

Peter

On Tue, Apr 24, 2018 at 7:43 AM, Teresa Johnson <tejohnson at google.com> wrote:
> Hi everyone,
>
> I started working on a long-standing request to have the summary dumped in
> a readable format to text, and specifically to emit to LLVM assembly.
> Proposal below, please let me know your thoughts.
>
> Thanks,
> Teresa
> RFC: LLVM Assembly format for ThinLTO Summary
> =============================================
>
> Background
> ----------
>
> ThinLTO operates on small summaries computed during the compile step (i.e. with “-c
> -flto=thin”), which are then analyzed and updated during the Thin Link
> stage, and utilized to perform IR updates during the post-link ThinLTO
> backends. The summaries are emitted as LLVM Bitcode, however, not currently
> in the LLVM assembly.There are two ways to generate a bitcode file
> containing summary records for a module: 1. Compile with “clang -c
> -flto=thin”2. Build from LLVM assembly using “opt -module-summary”Either of
> these will result in the ModuleSummaryIndex analysis pass (which builds the
> summary index in memory for a module) to be added to the pipeline just
> before bitcode emission.Additionally, a combined index is created by
> merging all the per-module indexes during the Thin Link, which is
> optionally emitted as a bitcode file.Currently, the only way to view these
> records is via “llvm-bcanalyzer -dump”, then manually decoding the raw
> bitcode dumps.Relatedly, there is YAML reader/writer support for CFI
> related summary fields (-wholeprogramdevirt-read-summary and
> -wholeprogramdevirt-write-summary). Last summer, GSOC student Charles
> Saternos implemented support to dump the summary in YAML from llvm-lto2
> (D34080), including the rest of the summary fields (D34063), however, there
> was pushback on the related RFC for dumping via YAML or another format
> rather than emitting as LLVM assembly.Goals: 1. Define LLVM assembly format
> for summary index2. Define interaction between parsing of summary from LLVM
> assembly and synthesis of new summary index from IR.3. Implement printing
> and parsing of summary index LLVM assemblyProposed LLVM Assembly
> Format----------------------------------------------There are several top
> level data structures within the ModuleSummaryIndex: 1.
> ModulePathStringTable: Holds the paths to the modules summarized in the
> index (only one entry for per-module indexes and multiple in the combined
> index), along with their hashes (for incremental builds and global
> promotion).2. GlobalValueMap: A map from global value GUIDs to the
> corresponding function/variable/alias summary (or summaries for the
> combined index and weak linkage).3. CFI-related data structures (TypeIdMap,
> CfiFunctionDefs, and CfiFunctionDecls)I have a WIP patch to AsmWriter.cpp
> to print the ModuleSummaryIndex that I was using to play with the format.
> It currently prints 1 and 2 above. I’ve left the CFI related summary data
> structures as a TODO for now, until the format is at least conceptually
> agreed, but from looking at those I don’t see an issue with using the same
> format (with a note/question for Peter on CFI type test representation
> below).I modeled the proposed format on metadata, with a few key
> differences noted below. Like metadata, I propose enumerating the entries
> with the SlotTracker, and prefixing them with a special character. Avoiding
> characters already used in some fashion (i.e. “!” for metadata and “#” for
> attributes), I initially have chosen “^”. Open to suggestions
> though.Consider the following example:extern void foo();int X;int bar() {
>  foo();  return X;}void barAlias() __attribute__ ((alias ("bar")));int
> main() {  barAlias();  return bar();}The proposed format has one entry per
> ModulePathStringTable entry and one per GlobalValueMap/GUID, and looks
> like:^0 = module: {path: testA.o, hash: 5487197307045666224}^1 = gv: {guid:
> 1881667236089500162, name: X, summaries: {variable: {module: ^0, flags:
> {linkage: common, notEligibleToImport: 0, live: 0, dsoLocal: 1}}}}^2 = gv:
> {guid: 6699318081062747564, name: foo}^3 = gv: {guid: 15822663052811949562,
> name: main, summaries: {function: {module: ^0, flags: {linkage: extern,
> notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 5, funcFlags:
> {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls:
> {{callee: ^5, hotness: unknown}, {callee: ^4, hotness: unknown}}}}}
> ^4 = gv: {guid: 16434608426314478903, name: bar, summaries: {function:
> {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0,
> dsoLocal: 1}, insts: 3, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0,
> returnDoesNotAlias: 0}, calls: {{callee: ^2, hotness: unknown}}, refs:
> {^1}}}}
> ^5 = gv: {guid: 18040127437030252312, name: barAlias, summaries: {alias:
> {module: ^0, flags: {linkage: extern, notEligibleToImport: 0, live: 0,
> dsoLocal: 1}, aliasee: ^4}}}
>
> Like metadata, the fields are tagged (currently using lower camel case,
> maybe upper camel case would be preferable).
>
> The proposed format has a structure that reflects the data structures in
> the summary index. For example, consider the entry “^4”. This corresponds
> to the function “bar”. The entry for that GUID in the GlobalValueMap
> contains a list of summaries. For per-module summaries such as this, there
> will be at most one summary (with no summary list for an external function
> like “foo”). In the combined summary there may be multiple, e.g. in the
> case of linkonce_odr functions which have definitions in multiple modules.
> The summary list for bar (“^4”) contains a FunctionSummary, so the summary
> is tagged “function:”. The FunctionSummary contains both a flags structure
> (inherited from the base GlobalValueSummary class), and a funcFlags
> structure (specific to FunctionSummary). It therefore contains a
> brace-enclosed list of flag tags/values for each.
>
> Where a global value summary references another global value summary (e.g.
> via a call list, reference list, or aliasee), the entry is referenced by
> its slot. E.g. the alias “barAlias” (“^5”) references its aliasee “bar” as
> “^4”.
>
> Note that, in comparison, metadata assembly entries tend to be much more
> decomposed since many metadata fields are themselves metadata (so the
> entries tend to be shorter, with references to other metadata nodes).
>
> Currently, I am emitting the summary entries at the end, after the
> metadata nodes. Note that the ModuleSummaryIndex is not currently
> referenced from the Module, and isn’t currently created when parsing the
> Module IR bitcode (there is a separate derived class for reading the
> ModuleSummaryIndex from bitcode). This is because they are not currently
> used at the same time. However, in the future there is no reason why we
> couldn’t tag the global values in the Module’s LLVM assembly with the
> corresponding summary entry if the ModuleSummaryIndex is available when
> printing the Module in the assembly writer. I.e. we could do the following
> for “main” from the above example when printing the IR definition (note the
> “^3” at the end):
>
> define  dso_local i32 @main() #0 !dbg !17 ^3 {
>
> For CFI data structures, the format would be similar. It appears that
> TypeIds are referred to by string name in the top level TypeIdMap (std::map
> indexed by std::string type identifier), whereas they are referenced by
> GUID within the FunctionSummary class (i.e. the TypeTests vector and the
> VFuncId structure). For the LLVM assembly I think there should be a top
> level entry for each TypeIdMap entry, which lists both the type identifier
> string and its GUID (followed by its associated information stored in the
> map), and the TypeTests/VFuncId references on the FunctionSummary entries
> can reference it by summary slot number. I.e. something like:
>
> ^1 = typeid: {guid: 12345, identifier: name_of_type, …
> ^2 = gv: {... {function: {.... typeTests: {^1, …
>
> Peter - is that correct and does that sound ok?
>
> Issues when Parsing of Summaries from Assembly
> ----------------------------------------------
>
> When reading an LLVM assembly file containing module summary entries, a
> ModuleSummaryIndex will be created from the entries.
>
> Things to consider are the behavior when:
>
>  - Invoked with “opt -module-summary” (which currently builds a new summary
>    index from the IR). Options:
>    a) recompute summary and throw away summary in the assembly file
>    b) ignore -module-summary and build the summary from the LLVM assembly
>    c) give an error
>    d) compare the two summaries (one created from the assembly and the new
>       one created by the analysis phase from the IR), and error if they are
>       different.
>
>    My opinion is to do a), so that the behavior using -module-summary
>    doesn’t change. We also need a way to force building of a fresh module
>    summary for cases where the user has modified the LLVM assembly of the
>    IR (see below).
>
>  - How to handle older LLVM assembly files that don’t contain new summary
>    fields. Options:
>    a) Force the LLVM assembly file to be recreated with a new summary.
>       I.e. “opt -module-summary -o - | llvm-dis”.
>    b) Auto-upgrade, by silently creating conservative values for the new
>       summary entries.
>
>    I lean towards b) (when possible) for user-friendliness and to reduce
>    required churn on test inputs.
>
>  - How to handle partial or incorrect LLVM assembly summary entries. How to
>    handle partial summaries depends in part on how we answer the prior
>    question about auto-upgrading. I think the best option, like there, is
>    to handle it automatically when possible. However, I do think we should
>    error on glaring errors like obviously missing information, for example
>    when there is summary data in the LLVM assembly but summary entries are
>    missing for some global values (e.g. if the user modified the assembly
>    to add a function but forgot to add a corresponding summary entry). We
>    could still have subtle issues (e.g. user adds a new call but forgets to
>    update the caller’s summary call list), but it will be harder to detect
>    those.*
>
> --
> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
>  408-460-2413
>
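
For readers following the quoted example, the slot references (“^5” pointing at “^4”, and so on) can be walked mechanically. Below is a minimal sketch in Python, assuming a hand-transcribed dictionary that mirrors entries ^0 through ^5 of the example; the dictionary layout and the helper names (`name_of`, `resolve_alias`, `callees`) are illustrative assumptions and not part of LLVM or of the proposed syntax.

```python
# Hand-transcribed model of the example summary entries (^0..^5).
# The dictionary layout is an assumption made for illustration only;
# it is not the LLVM in-memory representation.
entries = {
    0: {"kind": "module", "path": "testA.o"},
    1: {"kind": "gv", "name": "X",
        "summary": {"type": "variable", "module": 0}},
    2: {"kind": "gv", "name": "foo", "summary": None},  # external: no summary
    3: {"kind": "gv", "name": "main",
        "summary": {"type": "function", "module": 0, "calls": [5, 4]}},
    4: {"kind": "gv", "name": "bar",
        "summary": {"type": "function", "module": 0, "calls": [2], "refs": [1]}},
    5: {"kind": "gv", "name": "barAlias",
        "summary": {"type": "alias", "module": 0, "aliasee": 4}},
}


def name_of(slot):
    """Return the global value name stored at a slot."""
    return entries[slot]["name"]


def resolve_alias(slot):
    """Follow aliasee references until a non-alias entry is reached."""
    summary = entries[slot]["summary"]
    while summary is not None and summary["type"] == "alias":
        slot = summary["aliasee"]
        summary = entries[slot]["summary"]
    return slot


def callees(slot):
    """Names of the functions called from the entry at `slot`, aliases resolved."""
    summary = entries[slot]["summary"]
    return [name_of(resolve_alias(c)) for c in summary.get("calls", [])]


print(callees(3))  # calls made by main, with barAlias resolved to its aliasee
print(callees(4))  # calls made by bar
```

Because every cross-reference is a slot number, a reader (or a parser) only needs one table lookup per reference, which is the property the proposal borrows from metadata numbering.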


-- 
-- 
Peter

Teresa Johnson via llvm-dev

2018-May-03 22:10 UTC

[llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

On Thu, May 3, 2018 at 2:58 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> Hi Teresa,
>
> I have re-read your proposal, and I'm not getting how you plan to
> represent combined summaries with this. Unless I'm missing something, there
> doesn't seem to be a way to write out summaries that is independent of the
> global values that they relate to. Is that something that you plan to
> address later?
>
I envisioned that the combined index assembly files would only contain
GUIDs, not GV names, just as we do in the combined index bitcode files.
Does that answer your question?
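
As a rough illustration of that idea (a sketch only, under stated assumptions, not the proposed syntax or LLVM's implementation): per-module entries carry a name, while a combined index could be keyed purely by GUID, with one summary per defining module. The `combine` helper, the `testB.o` module, and the `inlineHelper` entry with GUID 12345 are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Per-module entries as (module path, GUID, name, summary kind).
# The first two GUIDs are taken from the example in the RFC; the
# "inlineHelper" entries and testB.o are hypothetical, standing in for a
# linkonce_odr function defined in two modules.
per_module = [
    ("testA.o", 15822663052811949562, "main", "function"),
    ("testA.o", 16434608426314478903, "bar", "function"),
    ("testA.o", 12345, "inlineHelper", "function"),
    ("testB.o", 12345, "inlineHelper", "function"),
]


def combine(entries):
    """Build a GUID-keyed map of summaries, dropping global value names."""
    combined = defaultdict(list)
    for module, guid, _name, kind in entries:
        combined[guid].append({"module": module, "kind": kind})
    return dict(combined)


combined = combine(per_module)
for guid, summaries in combined.items():
    print(guid, summaries)
```

In this sketch the combined map never stores a name, so an entry is identified only by its GUID, and a GUID defined in several modules simply ends up with several summaries in its list.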

Thanks,
Teresa

-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |  408-460-2413