thr3ads.net - llvm dev - [llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format [Jun 2017]

Mehdi AMINI via llvm-dev

2017-Jun-06 21:21 UTC

[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format

2017-06-06 13:38 GMT-07:00 David Blaikie <dblaikie at gmail.com>:
>
>
> On Tue, Jun 6, 2017 at 1:26 PM Mehdi AMINI <joker.eph at gmail.com> wrote:
>
>> 2017-06-05 14:27 GMT-07:00 David Blaikie via llvm-dev <
>> llvm-dev at lists.llvm.org>:
>>
>>> I know there's been a bunch of discussion here already, but I was
>>> wondering if perhaps someone (probably Teresa? Peter?) could:
>>>
>>> 1) summarize the current state
>>> 2) describe the end-goal
>>> 3) describe what steps (& how this patch relates) are planned to get to
>>> (2)
>>>
>>> My naive thoughts, not being intimately familiar with any of this:
>>> Usually bitcode and textual IR support go in together or around the same
>>> time, and designed that way from the start (take r211920 for example,
>>> which added an explicit representation of COMDATs to the IR). This seems to
>>> have been an oversight in the implementation of IR summaries (is that an
>>> accurate representation/statement?)
>>>
>>
>> More or less: it was not an oversight.
>> The summaries are not really part of the IR, it is more like an "analysis
>> result" that is serialized. It can always be recomputed from the IR. This
>> aspect makes it quite "special", it is the only analysis result that I know
>> of that we serialize.
>>
>
> The use list work seems pretty similar in some ways (granted, can't be
> recomputed to match, hence the desire to serialize it for test case
> implementation).
>
I see use-list as a leaky implementation detail of the IR that we
serialized because it impacts the processing of the IR.

Summaries are more like serializing the CFG for example.

> But it looks like the same is true here to a degree - there are test cases
> that exercise the summary handling, so they want summaries for input (for
> now, I think, I've seen test cases that run another LLVM tool to
> insert/create a summary to then feed that back in for a test), or to test
> that the resulting summary is correct.
>
We have cases where we want summaries as an input and check a combined
summary as an output, and for these having the YAML representation will be
useful (we didn't have it before).

>
> Can summaries be standalone? I thought they could (that'd be ideal for the
> distributed situation - only the summary needs to go to the 'thin link'
> step, I think? (currently maybe only the debug info is stripped for that -
> but ideally other unused IR wouldn't be shipped there as well, I would
> think)
>
Yes conceptually they can be standalone.

>
>
>>
>>
>>> & now there's an effort to correct that.
>>>
>>
>> The main motivation here, I believe, is more to help dev to have human
>> readable/understandable dump for ThinLTO bitcodes. Having to inspect
>> separately summaries is a pain.
>>
>
> Not sure I quite follow - inspect separately?
>
llvm-dis does not display summaries today, so you can't just use llvm-dis
like a "regular" flow.

> How are they inspected today?
>
llvm-bcanalyzer? And now the YAML dump as well.

> & also, I think there are test cases that want to/are currently testing
> summary input but do so somewhat awkwardly by using another tool to produce
> the summary first. Ideally the test case would have the summary written in
> to start, I would think, if that's a codepath worth testing?
>
The IR already contains all the information, so why repeat it? This
makes the test case harder to maintain. In the vast majority of cases, I
expect that if a test needs IR then it shouldn't need to include a summary
as well (and vice-versa).

In the majority of the tests we have, we want to check that the importing
does what it is supposed to do, and that the linkage is correctly adjusted.
With a YAML (or other) serialization for the summaries this could indeed be
done purely with summaries, without any IR involved.

-- 
Mehdi





>
> - Dave
>
>
>>
>>  --
>> Mehdi
>>
>>> So it seems like that would start with a discussion of what the right
>>> end-state would be: What the syntax in textual IR should be, then
>>> implementing it. I can understand implementing such a thing in steps - it's
>>> perhaps more involved than the COMDAT situation. In that case starting on
>>> either side seems fine - implementing the emission first (hidden behind a
>>> flag, so as not to break round-tripping in the interim) or the parsing
>>> first (no need to hide it behind any flags - manually written examples can
>>> be used as input tests).
>>>
>>> (& it sounds like there's some partially implemented functionality using
>>> a YAML format that was intended to address how some test cases could be
>>> written? & this might be a good basis for the syntax - but seems to me like
>>> it might be a bit disjointed/out of place in the textual IR format that's
>>> not otherwise YAML-based?)
>>>
>>> - Dave
>>>
>>> On Fri, Jun 2, 2017 at 8:46 AM Charles Saternos via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> Hey all,
>>>>
>>>> Below is the proposed format for the dump of the ThinLTO module summary
>>>> in the llvm-dis utility:
>>>>
>>>> > ../build/bin/llvm-dis t.o && cat t.o.ll
>>>> ; ModuleID = '2.o'
>>>> source_filename = "2.ll"
>>>> target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
>>>> target triple = "x86_64-unknown-linux-gnu"
>>>>
>>>> @X = constant i32 42, section "foo", align 4
>>>>
>>>> @a = weak alias i32, i32* @X
>>>>
>>>> define void @afun() {
>>>>   %1 = load i32, i32* @a
>>>>   ret void
>>>> }
>>>>
>>>> define void @testtest() {
>>>>   tail call void @boop()
>>>>   ret void
>>>> }
>>>>
>>>> declare void @boop()
>>>>
>>>> ; Module summary:
>>>> ;  testtest (External linkage)
>>>> ;    Function (2 instructions)
>>>> ;    Calls: boop
>>>> ;  X (External linkage)
>>>> ;    Global Variable
>>>> ;  afun (External linkage)
>>>> ;    Function (2 instructions)
>>>> ;    Refs:
>>>> ;      a
>>>> ;  a (Weak any linkage)
>>>> ;    Alias (aliasee X)
>>>>
>>>> I've implemented the above format in the llvm-dis utility, since there
>>>> currently isn't really a way of getting ThinLTO summaries in a
>>>> human-readable format.
>>>>
>>>> Let me know what you think of this format, and what information you
>>>> think should be added/removed.
>>>>
>>>> Thanks,
>>>> Charles
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>

Charles Saternos via llvm-dev

2017-Jun-07 15:58 UTC

Alright, now it outputs YAML in the following format:

---
NamedGlobalValueMap:
  X:
    - Kind:            GlobalVar
      Linkage:         ExternalLinkage
      NotEligibleToImport: false
      Live:            false
  a:
    - Kind:            Alias
      Linkage:         WeakAnyLinkage
      NotEligibleToImport: false
      Live:            false
      AliaseeGUID:     1881667236089500162
  afun:
    - Kind:            Function
      Linkage:         ExternalLinkage
      NotEligibleToImport: false
      Live:            false
      InstCount:       2
  testtest:
    - Kind:            Function
      Linkage:         ExternalLinkage
      NotEligibleToImport: false
      Live:            false
      InstCount:       2
      Calls:
        - Function:        14471680721094503013
TypeIdMap:
WithGlobalValueDeadStripping: false
...

Any thoughts on the new format?

Thanks,
Charles
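
For readers following along, here is a minimal sketch (not part of the patch) of how the dump above could be walked to list which GUIDs each entry refers to. The dict is hand-transcribed from the YAML; a real script would load the file with a YAML parser. The helper name `referenced_guids` is hypothetical. It only looks at the `AliaseeGUID` and `Calls` keys that appear in this dump.

```python
# Hand-transcribed from the NamedGlobalValueMap section of the YAML dump.
# Assumption: a real script would parse the file instead of using a literal.
summary = {
    "X": [{"Kind": "GlobalVar", "Linkage": "ExternalLinkage",
           "NotEligibleToImport": False, "Live": False}],
    "a": [{"Kind": "Alias", "Linkage": "WeakAnyLinkage",
           "NotEligibleToImport": False, "Live": False,
           "AliaseeGUID": 1881667236089500162}],
    "afun": [{"Kind": "Function", "Linkage": "ExternalLinkage",
              "NotEligibleToImport": False, "Live": False, "InstCount": 2}],
    "testtest": [{"Kind": "Function", "Linkage": "ExternalLinkage",
                  "NotEligibleToImport": False, "Live": False, "InstCount": 2,
                  "Calls": [{"Function": 14471680721094503013}]}],
}


def referenced_guids(named_map):
    """Return {global name: [GUIDs it refers to]} for entries that have edges."""
    edges = {}
    for name, entries in named_map.items():
        guids = []
        for entry in entries:
            # Alias entries point at their aliasee by GUID.
            if "AliaseeGUID" in entry:
                guids.append(entry["AliaseeGUID"])
            # Function entries list callees by GUID.
            guids.extend(call["Function"] for call in entry.get("Calls", []))
        if guids:
            edges[name] = guids
    return edges


if __name__ == "__main__":
    for name, guids in referenced_guids(summary).items():
        print(name, "->", guids)
```

The output contains only raw numbers for the alias target and the callee, so a reader cannot tell which globals they denote without some separate GUID-to-name mapping.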


Teresa Johnson via llvm-dev

2017-Jun-07 16:38 UTC

On Wed, Jun 7, 2017 at 8:58 AM, Charles Saternos <charles.saternos at gmail.com> wrote:
> Alright, now it outputs YAML in the following format:
>
> ---
> NamedGlobalValueMap:
>   X:
>     - Kind:            GlobalVar
>       Linkage:         ExternalLinkage
>       NotEligibleToImport: false
>       Live:            false
>   a:
>     - Kind:            Alias
>       Linkage:         WeakAnyLinkage
>       NotEligibleToImport: false
>       Live:            false
>       AliaseeGUID:     1881667236089500162
>   afun:
>     - Kind:            Function
>       Linkage:         ExternalLinkage
>       NotEligibleToImport: false
>       Live:            false
>       InstCount:       2
>   testtest:
>     - Kind:            Function
>       Linkage:         ExternalLinkage
>       NotEligibleToImport: false
>       Live:            false
>       InstCount:       2
>       Calls:
>         - Function:        14471680721094503013
> TypeIdMap:
> WithGlobalValueDeadStripping: false
> ...
>
> Any thoughts on the new format?
>
Thanks, Charles. The main improvement I think we would want is to output
value names instead of the GUID. Can you build up a map from GUID -> name
ahead of time and use it as you did in your initial patch? Actually,
I also think it would be useful to emit both the GUID and the name, since
the combined index will eventually only have the GUID, so this would give a
mapping to use for at least the visual inspection of the combined index.

Also, it would be good to see an example with FDO, to make sure the hotness
info of the calls is emitted.

Teresa
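
A rough sketch of the GUID -> name idea, assuming the map has already been built elsewhere. The two GUID/name pairs are inferred by comparing the two dumps in this thread (the first dump shows `a` aliasing `X` and `testtest` calling `boop`). The helper names `label` and `annotate` and the output key `Aliasee` are hypothetical, not what llvm-dis emits.

```python
# GUID -> name pairs inferred from the two dumps in this thread.
# Assumption: in a real tool this map would be built from the module's
# globals ahead of time rather than written by hand.
GUID_TO_NAME = {
    1881667236089500162: "X",
    14471680721094503013: "boop",
}


def label(guid, guid_to_name):
    """Format a GUID together with its name when known, else the bare GUID."""
    name = guid_to_name.get(guid)
    return f"{name} ({guid})" if name is not None else str(guid)


def annotate(entry, guid_to_name):
    """Return a copy of one summary entry with GUID fields made readable."""
    out = dict(entry)
    if "AliaseeGUID" in out:
        out["Aliasee"] = label(out.pop("AliaseeGUID"), guid_to_name)
    if "Calls" in out:
        out["Calls"] = [label(c["Function"], guid_to_name) for c in out["Calls"]]
    return out


if __name__ == "__main__":
    alias = {"Kind": "Alias", "AliaseeGUID": 1881667236089500162}
    func = {"Kind": "Function", "Calls": [{"Function": 14471680721094503013}]}
    print(annotate(alias, GUID_TO_NAME))
    print(annotate(func, GUID_TO_NAME))
```

Printing the name and the GUID side by side keeps the dump readable while still giving a value that can be matched against a combined index, which only carries GUIDs.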

-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |  408-460-2413

David Blaikie via llvm-dev

2017-Jun-07 16:44 UTC

On Tue, Jun 6, 2017 at 2:21 PM Mehdi AMINI <joker.eph at gmail.com> wrote:
> We have cases where we want summaries as an input and check a combined
> summary as an output, and for these having the YAML representation will be
> useful (we didn't have it before).
>
What I'm suggesting is that this is an (optional) IR feature as much as any
other - so it seems slightly odd that it'd be YAML rather than something
that looked more like the rest of the IR. Though I'm not outright opposed
to YAML here - just want to make sure this information is being treated as
a first class IR construct (as much as use order, comdats, etc are for
rough examples)

>> Can summaries be standalone? I thought they could (that'd be ideal for
>> the distributed situation - only the summary needs to go to the 'thin link'
>> step, I think? (currently maybe only the debug info is stripped for that -
>> but ideally other unused IR wouldn't be shipped there as well, I would
>> think)
>>
>
> Yes conceptually they can be standalone.
>
This seems to provide the strongest/clear motivation for having summaries
as a first class (though optional) IR construct.

>>>> & now there's an effort to correct that.
>>>>
>>>
>>> The main motivation here, I believe, is more to help dev to have human
>>> readable/understandable dump for ThinLTO bitcodes. Having to inspect
>>> separately summaries is a pain.
>>>
>>
>> Not sure I quite follow - inspect separately?
>>
>
> llvm-dis does not display summaries today, so you can't just use llvm-dis
> like a "regular" flow.
>
>
>> How are they inspected today?
>>
>
> llvm-bcanalyzer? And now the YAML dump as well.
>
>
>> & also, I think there are test cases that want to/are currently testing
>> summary input but do so somewhat awkwardly by using another tool to produce
>> the summary first. Ideally the test case would have the summary written in
>> to start, I would think, if that's a codepath worth testing?
>>
>
> The IR already contains all the information, so why repeat it?
>
For the same reason that it's relevant to test cases which way it's
encoded, etc. (in the same way that the LLVM IR repeats types of uses, for
example - even though they're totally redundant from a "does this have all
the semantic information required" standpoint) & because it can be standalone.

> This makes the test case harder to maintain, in the vast majority, I
> expect that if a test needs IR then it shouldn't need to include a summary
> as well (and vice-versa).
>
Ah, sorry, I'm not suggesting it should be required - in the same way it's
not required in the bitcode. But if you want a summary in the bitcode when
assembling a .ll file it seems OK to say you write it in the IR, and
equally if there is a summary in the bitcode it seems reasonable that it be
printed in the .ll file by llvm-dis.

> In the majority of the tests we have, we want to check that the importing
> does what it is supposed to do, and that the linkage is correctly adjusted.
> With a YAML (or other) serialization for the summaries this could indeed be
> done purely with summaries, without any IR involved.
>
I'm not sure I understand - you mean for executions of tools that don't
need the rest of the IR, there could be a different/separate tool that
consumes YAML summaries and produces YAML summaries and that would be
tested - but the "consuming a summary in a bitcode file" would not be?

I'm not sure I understand the benefit of this separation and asymmetry with
the bitcode form of the same data.

- Dave

>
> --
> Mehdi
>
>
>
>
>
>
>>
>> - Dave
>>
>>
>>>
>>>  --
>>> Mehdi
>>>
>>> So it seems like that would start with a discussion of what the
right
>>>> end-state would be: What the syntax in textual IR should be,
then
>>>> implementing it. I can understand implementing such a thing in
steps - it's
>>>> perhaps more involved than the COMDAT situation. In that case
starting on
>>>> either side seems fine - implementing the emission first
(hidden behind a
>>>> flag, so as not to break round-tripping in the interim) or the
parsing
>>>> first (no need to hide it behind any flags - manually written
examples can
>>>> be used as input tests).
>>>>
>>>> (& it sounds like there's some partially implemented
functionality
>>>> using a YAML format that was intended to address how some test
cases could
>>>> be written? & this might be a good basis for the syntax -
but seems to me
>>>> like it might be a bit disjointed/out of place in the textual
IR format
>>>> that's not otherwise YAML-based?)
>>>>
>>>> - Dave
>>>>
>>>> On Fri, Jun 2, 2017 at 8:46 AM Charles Saternos via llvm-dev
<
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>> Hey all,
>>>>>
>>>>> Below is the proposed format for the dump of the ThinLTO
module
>>>>> summary in the llvm-dis utility:
>>>>>
>>>>> > ../build/bin/llvm-dis t.o && cat t.o.ll
>>>>> ; ModuleID = '2.o'
>>>>> source_filename = "2.ll"
>>>>> target datalayout =
"e-m:e-i64:64-f80:128-n8:16:32:64-S128"
>>>>> target triple = "x86_64-unknown-linux-gnu"
>>>>>
>>>>> @X = constant i32 42, section "foo", align 4
>>>>>
>>>>> @a = weak alias i32, i32* @X
>>>>>
>>>>> define void @afun() {
>>>>>   %1 = load i32, i32* @a
>>>>>   ret void
>>>>> }
>>>>>
>>>>> define void @testtest() {
>>>>>   tail call void @boop()
>>>>>   ret void
>>>>> }
>>>>>
>>>>> declare void @boop()
>>>>>
>>>>> ; Module summary:
>>>>> ;  testtest (External linkage)
>>>>> ;    Function (2 instructions)
>>>>> ;    Calls: boop
>>>>> ;  X (External linkage)
>>>>> ;    Global Variable
>>>>> ;  afun (External linkage)
>>>>> ;    Function (2 instructions)
>>>>> ;    Refs:
>>>>> ;      a
>>>>> ;  a (Weak any linkage)
>>>>> ;    Alias (aliasee X)
>>>>>
>>>>> I've implemented the above format in the llvm-dis utility, since
>>>>> there currently isn't really a way of getting ThinLTO summaries in a
>>>>> human-readable format.
>>>>>
>>>>> Let me know what you think of this format, and what information you
>>>>> think should be added/removed.
>>>>>
>>>>> Thanks,
>>>>> Charles
>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>
>>>>

Mehdi AMINI via llvm-dev

2017-Jun-07 17:01 UTC


[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format

2017-06-07 9:44 GMT-07:00 David Blaikie <dblaikie at gmail.com>:
>
>
> On Tue, Jun 6, 2017 at 2:21 PM Mehdi AMINI <joker.eph at gmail.com>
> wrote:
>
>> 2017-06-06 13:38 GMT-07:00 David Blaikie <dblaikie at gmail.com>:
>>
>>>
>>>
>>> On Tue, Jun 6, 2017 at 1:26 PM Mehdi AMINI <joker.eph at gmail.com>
>>> wrote:
>>>
>>>> 2017-06-05 14:27 GMT-07:00 David Blaikie via llvm-dev <
>>>> llvm-dev at lists.llvm.org>:
>>>>
>>>>> I know there's been a bunch of discussion here already, but I was
>>>>> wondering if perhaps someone (probably Teresa? Peter?) could:
>>>>>
>>>>> 1) summarize the current state
>>>>> 2) describe the end-goal
>>>>> 3) describe what steps (& how this patch relates) are planned to get
>>>>> to (2)
>>>>>
>>>>> My naive thoughts, not being intimately familiar with any of this:
>>>>> Usually bitcode and textual IR support go in together or around the
>>>>> same time, and designed that way from the start (take r211920 for
>>>>> example, which added an explicit representation of COMDATs to the
>>>>> IR). This seems to have been an oversight in the implementation of IR
>>>>> summaries (is that an accurate representation/statement?)
>>>>>
>>>>
>>>> More or less: it was not an oversight.
>>>> The summaries are not really part of the IR, it is more like an
>>>> "analysis result" that is serialized. It can always be recomputed from
>>>> the IR. This aspect makes it quite "special", it is the only analysis
>>>> result that I know of that we serialize.
>>>>
>>>
>>> The use list work seems pretty similar in some ways (granted, can't be
>>> recomputed to match, hence the desire to serialize it for test case
>>> implementation).
>>>
>>
>> I see use-list as a leaky implementation detail of the IR that we
>> serialized because it impacts the processing of the IR.
>>
>> Summaries are more like serializing the CFG for example.
>>
>>
>>> But it looks like the same is true here to a degree - there are test
>>> cases that exercise the summary handling, so they want summaries for
>>> input (for now, I think, I've seen test cases that run another LLVM
>>> tool to insert/create a summary to then feed that back in for a test),
>>> or to test that the resulting summary is correct.
>>>
>>
>> We have cases where we want summaries as an input and check a combined
>> summary as an output, and for these having the YAML representation will
>> be useful (we didn't have it before).
>>
>
> What I'm suggesting is that this is an (optional) IR feature as much as
> any other
>
Well I disagree with this at this point, because I haven't read anything
that would support it.
I'd be happy to revise my position if you provided any argument that would
make this hold in the face of any other analysis result.

> - so it seems slightly odd that it'd be YAML rather than something that
> looked more like the rest of the IR. Though I'm not outright opposed to
> YAML here - just want to make sure this information is being treated as a
> first class IR construct (as much as use order, comdats, etc are for rough
> examples)
>
YAML was pushed forward as an easy way to get there IIRC. It wasn't set in
stone and it was clearly open to change to a more integrated format.
So I'm supportive of anyone who would replace this with a more "textual-IR
integrated" format. I haven't proposed this in this thread because Teresa
is interested in getting something readable "quickly". My point was more
that as an intermediate step, I'd rather reuse the existing YAML
serialization than create yet another dump.


>
>>> Can summaries be standalone? I thought they could (that'd be ideal for
>>> the distributed situation - only the summary needs to go to the 'thin
>>> link' step, I think? (currently maybe only the debug info is stripped
>>> for that - but ideally other unused IR wouldn't be shipped there as
>>> well, I would think)
>>>
>>
>> Yes conceptually they can be standalone.
>>
>
> This seems to provide the strongest/clear motivation for having summaries
> as a first class (though optional) IR construct.
>
No, this provides a strong motivation to have a proper serialization; I
don't see how you connect this to the rest of the IR.


>
>>>>> & now there's an effort to correct that.
>>>>>
>>>>
>>>> The main motivation here, I believe, is more to help devs have a
>>>> human-readable/understandable dump for ThinLTO bitcode. Having to
>>>> inspect summaries separately is a pain.
>>>>
>>>
>>> Not sure I quite follow - inspect separately?
>>>
>>
>> llvm-dis does not display summaries today, so you can't just use
>> llvm-dis like a "regular" flow.
>>
>>
>>> How are they inspected today?
>>>
>>
>> llvm-bcanalyzer? And now the YAML dump as well.
>>
>>
>>> & also, I think there are test cases that want to/are currently testing
>>> summary input but do so somewhat awkwardly by using another tool to
>>> produce the summary first. Ideally the test case would have the summary
>>> written in to start, I would think, if that's a codepath worth testing?
>>>
>>
>> The IR already contains all the information, so why repeat it?
>>
>
> For the same reason that it's relevant to test cases which way it's
> encoded, etc (in the same way that the LLVM IR repeats types of uses, for
> example - even though they're totally redundant from a "does this have all
> the semantic information required" standpoint) & because it can be
> standalone.
>
>
>> This makes the test case harder to maintain. In the vast majority of
>> cases, I expect that if a test needs IR then it shouldn't need to include
>> a summary as well (and vice-versa).
>>
>
> Ah, sorry, I'm not suggesting it should be required - in the same way it's
> not required in the bitcode. But if you want a summary in the bitcode when
> assembling a .ll file it seems OK to say you write it in the IR,
>
No, it does not seem OK to me to write summaries alongside the IR in tests
in general (outside of specific needs like testing the round-trip, of
course).
It is entirely redundant and I don't perceive any benefit. Why would you
want to do that?


> and equally if there is a summary in the bitcode it seems reasonable that
> it be printed in the .ll file by llvm-dis.
>
I agree and I advocated for this earlier.

>
>
>> In the majority of tests we have, we want to check if the importing does
>> what it is supposed to do, and if the linkages are correctly adjusted.
>> With a YAML (or other) serialization for the summaries this could indeed
>> be done purely with summaries, without any IR involved.
>>
>
> I'm not sure I understand - you mean for executions of tools that don't
> need the rest of the IR, there could be a different/separate tool that
> consumes YAML summaries and produces YAML
>
It does not have to be a separate tool: a tool that is looking to operate
purely on summaries should just ask to get the summaries out of the input
file. The input being textual or bitcode shouldn't matter much at this
point.
This is exactly how 'opt' and 'llc' operate.

> summaries and that would be tested - but the "consuming a summary in a
> bitcode file" would not be?
>
This is exactly what we're doing with (almost) *all* of the .ll tests: we
write them as textual, and read them back as textual, and not as bitcode.

> I'm not sure I understand the benefit of this separation and asymmetry
> with the bitcode form of the same data.
>
Have you tried to write a test directly in bitcode? ;)

I'm not sure we're talking about the same thing right now.

-- 
Mehdi
