thr3ads.net - llvm dev - [llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format [Jun 2017]

If this information is useful, please help other people find it:
Share via:

Mehdi AMINI via llvm-dev

2017-Jun-07 17:01 UTC

[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format

2017-06-07 9:44 GMT-07:00 David Blaikie <dblaikie at gmail.com>:
>
>
> On Tue, Jun 6, 2017 at 2:21 PM Mehdi AMINI <joker.eph at gmail.com>
wrote:
>
>> 2017-06-06 13:38 GMT-07:00 David Blaikie <dblaikie at gmail.com>:
>>
>>>
>>>
>>> On Tue, Jun 6, 2017 at 1:26 PM Mehdi AMINI <joker.eph at
gmail.com> wrote:
>>>
>>>> 2017-06-05 14:27 GMT-07:00 David Blaikie via llvm-dev <
>>>> llvm-dev at lists.llvm.org>:
>>>>
>>>>> I know there's been a bunch of discussion here already,
but I was
>>>>> wondering if perhaps someone (probably Teresa? Peter?)
could:
>>>>>
>>>>> 1) summarize the current state
>>>>> 2) describe the end-goal
>>>>> 3) describe what steps (& how this patch relates) are
planned to get
>>>>> to (2)
>>>>>
>>>>> My naive thoughts, not being intimately familiar with any
of this:
>>>>> Usually bitcode and textual IR support go in together or
around the same
>>>>> time, and designed that way from the start (take r211920
for examaple,
>>>>> which added an explicit representation of COMDATs to the
IR). This seems to
>>>>> have been an oversight in the implementation of IR
summaries (is that an
>>>>> accurate representation/statement?)
>>>>>
>>>>
>>>> More or less: it was not an oversight.
>>>> The summaries are not really part of the IR, it is more like an
>>>> "analysis result" that is serialized. It can always
be recomputed from the
>>>> IR. This aspect makes it quite "special", it is the
only analysis result
>>>> that I know of that we serialize.
>>>>
>>>
>>> The use list work seems pretty similar in some ways (granted,
can't be
>>> recomputed to match, hence the desire to serialize it for test case
>>> implementation).
>>>
>>
>> I see use-list as a leaky implementation detail of the IR that we
>> serialized because it impact the processing of the IR.
>>
>> Summaries are more like serializing the CFG for example.
>>
>>
>>> But it looks like the same is true here to a degree - there are
test
>>> cases that exercise the summary handling, so they want summaries
for input
>>> (for now, I think, I've seen test cases that run another LLVM
tool to
>>> insert/create a summary to then feed that back in for a test), or
to test
>>> that the resulting summary is correct.
>>>
>>
>> We have cases were we want summaries as an input and check a combined
>> summary as an output, and for these having the YAML representation will
be
>> useful (we didn't have it before).
>>
>
> What I'm suggesting is that this is an (optional) IR feature as much as
> any other
>
Well I disagree with this at this point, because I haven't read anything
that would support it.
I'd be happy to revise my position if you were providing any argument that
would make this holds in face of any other analysis result.

> - so it seems slightly odd that it'd be YAML rather than something that
> looked more like the rest of the IR. Though I'm not outright opposed to
> YAML here - just want to make sure this information is being treated as a
> first class IR construct (as much as use order, comdats, etc are for rough
> examples)
>
YAML was pushed forward as an easy way to get there IIRC. It wasn't set in
stone and it was clearly open to change it to a more integrate format.
So I'm supportive of anyone who would replace this with a more
"textual-IR
integrated" format, I haven't proposed this in this thread because
Teresa
is interested in getting something readable "quickly". My point was
more
that as an intermediate step, I rather reuse the existing YAML
serialization than creating yet another dump.


>
> Can summaries be standalone? I thought they could (that'd be ideal for
the
>>> distributed situation - only the summary needs to go to the
'thin link'
>>> step, I think? (currently maybe only the debug info is stripped for
that -
>>> but ideally other unused IR wouldn't be shipped there as well,
I would
>>> think)
>>>
>>
>> Yes conceptually they can be standalone.
>>
>
> This seems to provide the strongest/clear motivation for having summaries
> as a first class (though optional) IR construct.
>
No, this provide a strong motivation to have a proper serialization, I
don't see how you connect this to the rest of the IR.


>
> & now there's an effort to correct that.
>>>>>
>>>>
>>>> The main motivation here, I believe, is more to help dev to
have human
>>>> readable/understandable dump for ThinLTO bitcodes. Having to
inspect
>>>> separately summaries is a pain.
>>>>
>>>
>>> Not sure I quite follow - inspect separately?
>>>
>>
>> llvm-dis does not display summaries today, so you can't just use
llvm-dis
>> like a "regular" flow.
>>
>>
>>> How are they inspected today?
>>>
>>
>> llvm-bcanalyzer? And now the YAML dump as well.
>>
>>
>>> & also, I think there are test cases that want to/are currently
testing
>>> summary input but do so somewhat awkwardly by using another tool to
produce
>>> the summary first. Ideally the test case would have the summary
written in
>>> to start, I would think, if that's a codepath worth testing?
>>>
>>
>> The IR already contains all the information, so why repeating it?
>>
>
> For the same reason that it's relevant to test cases which way it's
> encoded, etc (in the same way that the LLVM IR repeats types of uses, for
> example - even though they're totally redundant from a "does this
have all
> the semantic information required) & because it can be standalone.
>
>
>> This makes the test case harder to maintain, in the vast majority, I
>> expect that if a test needs IR then it shouldn't need to include a
summary
>> as well (and vice-versa).
>>
>
> Ah, sorry, I'm not suggesting it should be required - in the same way
it's
> not required in the bitcode. But if you want a summary in the bitcode when
> assembling a .ll file it seems OK To say you write it in the IR,
>
No it does not seem OK to me to write summaries alongside the IR in tests
in general (outside of specific need like testing the round-trip of
course).
It is entirely redundant and I don't perceive any benefit, I don't see
why
you would want to do that?


> and equally if there is a summary in the bitcode it seems reasonable that
> it be printed in the .ll file by llvm-dis.
>
I agree and I advocated for this earlier.

>
>
>> In the majority of test we have we want to check if the importing does
>> what it is supposed to do, and if the linkage are correctly adjusted.
With
>> a YAML (or other) serialization for the summaries this could indeed
been
>> done purely with summaries, without any IR involved.
>>
>
> I'm not sure I understand - you mean for executions of tools that
don't
> need the rest of the IR, there could be a different/separate tool that
> consumes YAML summaries and produces YAML
>
It does not have to be a separate tool: a tool that is looking to operate
purely on summary should just ask to get the summaries out of the input
file. The input being textual or bitcode shouldn't matter much at this
point.
This is exactly how 'opt' and 'llc' operate.

summaries and that would be tested - but the "consuming a summary in
a> bitcode file" would not be?
>
This is exactly what we're doing with (almost) *all* of the .ll test: we
write them as textual, and read them back as textual, and not as bitcode.

> I'm not sure I understand the benefit of this separation and asymmetry
> with the bitcode form of the same data.
>
Have you tried to write a test directly in bitcode? ;)

I'm not sure we're talking about the same thing right now.

-- 
Mehdi
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170607/1dc3ab3c/attachment.html>

Mehdi AMINI via llvm-dev

2017-Jun-07 17:04 UTC

head link

[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format

2017-06-07 10:01 GMT-07:00 Mehdi AMINI <joker.eph at gmail.com>:
>
>
> 2017-06-07 9:44 GMT-07:00 David Blaikie <dblaikie at gmail.com>:
>
>>
>>
>> On Tue, Jun 6, 2017 at 2:21 PM Mehdi AMINI <joker.eph at
gmail.com> wrote:
>>
>>> 2017-06-06 13:38 GMT-07:00 David Blaikie <dblaikie at
gmail.com>:
>>>
>>>>
>>>>
>>>> On Tue, Jun 6, 2017 at 1:26 PM Mehdi AMINI <joker.eph at
gmail.com> wrote:
>>>>
>>>>> 2017-06-05 14:27 GMT-07:00 David Blaikie via llvm-dev <
>>>>> llvm-dev at lists.llvm.org>:
>>>>>
>>>>>> I know there's been a bunch of discussion here
already, but I was
>>>>>> wondering if perhaps someone (probably Teresa? Peter?)
could:
>>>>>>
>>>>>> 1) summarize the current state
>>>>>> 2) describe the end-goal
>>>>>> 3) describe what steps (& how this patch relates)
are planned to get
>>>>>> to (2)
>>>>>>
>>>>>> My naive thoughts, not being intimately familiar with
any of this:
>>>>>> Usually bitcode and textual IR support go in together
or around the same
>>>>>> time, and designed that way from the start (take
r211920 for examaple,
>>>>>> which added an explicit representation of COMDATs to
the IR). This seems to
>>>>>> have been an oversight in the implementation of IR
summaries (is that an
>>>>>> accurate representation/statement?)
>>>>>>
>>>>>
>>>>> More or less: it was not an oversight.
>>>>> The summaries are not really part of the IR, it is more
like an
>>>>> "analysis result" that is serialized. It can
always be recomputed from the
>>>>> IR. This aspect makes it quite "special", it is
the only analysis result
>>>>> that I know of that we serialize.
>>>>>
>>>>
>>>> The use list work seems pretty similar in some ways (granted,
can't be
>>>> recomputed to match, hence the desire to serialize it for test
case
>>>> implementation).
>>>>
>>>
>>> I see use-list as a leaky implementation detail of the IR that we
>>> serialized because it impact the processing of the IR.
>>>
>>> Summaries are more like serializing the CFG for example.
>>>
>>>
>>>> But it looks like the same is true here to a degree - there are
test
>>>> cases that exercise the summary handling, so they want
summaries for input
>>>> (for now, I think, I've seen test cases that run another
LLVM tool to
>>>> insert/create a summary to then feed that back in for a test),
or to test
>>>> that the resulting summary is correct.
>>>>
>>>
>>> We have cases were we want summaries as an input and check a
combined
>>> summary as an output, and for these having the YAML representation
will be
>>> useful (we didn't have it before).
>>>
>>
>> What I'm suggesting is that this is an (optional) IR feature as
much as
>> any other
>>
>
> Well I disagree with this at this point, because I haven't read
anything
> that would support it.
> I'd be happy to revise my position if you were providing any argument
that
> would make this holds in face of any other analysis result.
>
>
>> - so it seems slightly odd that it'd be YAML rather than something
that
>> looked more like the rest of the IR. Though I'm not outright
opposed to
>> YAML here - just want to make sure this information is being treated as
a
>> first class IR construct (as much as use order, comdats, etc are for
rough
>> examples)
>>
>
> YAML was pushed forward as an easy way to get there IIRC. It wasn't set
in
> stone and it was clearly open to change it to a more integrate format.
> So I'm supportive of anyone who would replace this with a more
"textual-IR
> integrated" format, I haven't proposed this in this thread because
Teresa
> is interested in getting something readable "quickly". My point
was more
> that as an intermediate step, I rather reuse the existing YAML
> serialization than creating yet another dump.
>
>
>
>>
>> Can summaries be standalone? I thought they could (that'd be ideal
for
>>>> the distributed situation - only the summary needs to go to the
'thin link'
>>>> step, I think? (currently maybe only the debug info is stripped
for that -
>>>> but ideally other unused IR wouldn't be shipped there as
well, I would
>>>> think)
>>>>
>>>
>>> Yes conceptually they can be standalone.
>>>
>>
>> This seems to provide the strongest/clear motivation for having
summaries
>> as a first class (though optional) IR construct.
>>
>
> No, this provide a strong motivation to have a proper serialization, I
> don't see how you connect this to the rest of the IR.
>
> On the topic of how it is / isn't part of the IR: given a Module orFunction you can't get a summary. It lives on the side in the bitcode, but
there isn't any connection in memory: it is purely an analysis result that
is serialized for convenience.

Note that we considered attaching it to the Module, however it causes
problem around invalidation (like every analysis result, this is a
"view"
of the IR at a given point, changing the IR invalidates it).

-- 
Mehdi


>
>
>>
>> & now there's an effort to correct that.
>>>>>>
>>>>>
>>>>> The main motivation here, I believe, is more to help dev to
have human
>>>>> readable/understandable dump for ThinLTO bitcodes. Having
to inspect
>>>>> separately summaries is a pain.
>>>>>
>>>>
>>>> Not sure I quite follow - inspect separately?
>>>>
>>>
>>> llvm-dis does not display summaries today, so you can't just
use
>>> llvm-dis like a "regular" flow.
>>>
>>>
>>>> How are they inspected today?
>>>>
>>>
>>> llvm-bcanalyzer? And now the YAML dump as well.
>>>
>>>
>>>> & also, I think there are test cases that want to/are
currently testing
>>>> summary input but do so somewhat awkwardly by using another
tool to produce
>>>> the summary first. Ideally the test case would have the summary
written in
>>>> to start, I would think, if that's a codepath worth
testing?
>>>>
>>>
>>> The IR already contains all the information, so why repeating it?
>>>
>>
>> For the same reason that it's relevant to test cases which way
it's
>> encoded, etc (in the same way that the LLVM IR repeats types of uses,
for
>> example - even though they're totally redundant from a "does
this have all
>> the semantic information required) & because it can be standalone.
>>
>
>>
>>> This makes the test case harder to maintain, in the vast majority,
I
>>> expect that if a test needs IR then it shouldn't need to
include a summary
>>> as well (and vice-versa).
>>>
>>
>> Ah, sorry, I'm not suggesting it should be required - in the same
way
>> it's not required in the bitcode. But if you want a summary in the
bitcode
>> when assembling a .ll file it seems OK To say you write it in the IR,
>>
>
> No it does not seem OK to me to write summaries alongside the IR in tests
> in general (outside of specific need like testing the round-trip of
> course).
> It is entirely redundant and I don't perceive any benefit, I don't
see why
> you would want to do that?
>
>
>
>> and equally if there is a summary in the bitcode it seems reasonable
that
>> it be printed in the .ll file by llvm-dis.
>>
>
> I agree and I advocated for this earlier.
>
>
>>
>>
>>> In the majority of test we have we want to check if the importing
does
>>> what it is supposed to do, and if the linkage are correctly
adjusted. With
>>> a YAML (or other) serialization for the summaries this could indeed
been
>>> done purely with summaries, without any IR involved.
>>>
>>
>> I'm not sure I understand - you mean for executions of tools that
don't
>> need the rest of the IR, there could be a different/separate tool that
>> consumes YAML summaries and produces YAML
>>
>
> It does not have to be a separate tool: a tool that is looking to operate
> purely on summary should just ask to get the summaries out of the input
> file. The input being textual or bitcode shouldn't matter much at this
> point.
> This is exactly how 'opt' and 'llc' operate.
>
> summaries and that would be tested - but the "consuming a summary in a
>> bitcode file" would not be?
>>
>
> This is exactly what we're doing with (almost) *all* of the .ll test:
we
> write them as textual, and read them back as textual, and not as bitcode.
>
>
>> I'm not sure I understand the benefit of this separation and
asymmetry
>> with the bitcode form of the same data.
>>
>
> Have you tried to write a test directly in bitcode? ;)
>
> I'm not sure we're talking about the same thing right now.
>
> --
> Mehdi
>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170607/e6d79464/attachment.html>

David Blaikie via llvm-dev

2017-Jun-07 17:19 UTC

head link

[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format

On Wed, Jun 7, 2017 at 10:01 AM Mehdi AMINI <joker.eph at gmail.com>
wrote:
> 2017-06-07 9:44 GMT-07:00 David Blaikie <dblaikie at gmail.com>:
>
>>
>>
>> On Tue, Jun 6, 2017 at 2:21 PM Mehdi AMINI <joker.eph at
gmail.com> wrote:
>>
>>> 2017-06-06 13:38 GMT-07:00 David Blaikie <dblaikie at
gmail.com>:
>>>
>>>>
>>>>
>>>> On Tue, Jun 6, 2017 at 1:26 PM Mehdi AMINI <joker.eph at
gmail.com> wrote:
>>>>
>>>>> 2017-06-05 14:27 GMT-07:00 David Blaikie via llvm-dev <
>>>>> llvm-dev at lists.llvm.org>:
>>>>>
>>>>>> I know there's been a bunch of discussion here
already, but I was
>>>>>> wondering if perhaps someone (probably Teresa? Peter?)
could:
>>>>>>
>>>>>> 1) summarize the current state
>>>>>> 2) describe the end-goal
>>>>>> 3) describe what steps (& how this patch relates)
are planned to get
>>>>>> to (2)
>>>>>>
>>>>>> My naive thoughts, not being intimately familiar with
any of this:
>>>>>> Usually bitcode and textual IR support go in together
or around the same
>>>>>> time, and designed that way from the start (take
r211920 for examaple,
>>>>>> which added an explicit representation of COMDATs to
the IR). This seems to
>>>>>> have been an oversight in the implementation of IR
summaries (is that an
>>>>>> accurate representation/statement?)
>>>>>>
>>>>>
>>>>> More or less: it was not an oversight.
>>>>> The summaries are not really part of the IR, it is more
like an
>>>>> "analysis result" that is serialized. It can
always be recomputed from the
>>>>> IR. This aspect makes it quite "special", it is
the only analysis result
>>>>> that I know of that we serialize.
>>>>>
>>>>
>>>> The use list work seems pretty similar in some ways (granted,
can't be
>>>> recomputed to match, hence the desire to serialize it for test
case
>>>> implementation).
>>>>
>>>
>>> I see use-list as a leaky implementation detail of the IR that we
>>> serialized because it impact the processing of the IR.
>>>
>>> Summaries are more like serializing the CFG for example.
>>>
>>>
>>>> But it looks like the same is true here to a degree - there are
test
>>>> cases that exercise the summary handling, so they want
summaries for input
>>>> (for now, I think, I've seen test cases that run another
LLVM tool to
>>>> insert/create a summary to then feed that back in for a test),
or to test
>>>> that the resulting summary is correct.
>>>>
>>>
>>> We have cases were we want summaries as an input and check a
combined
>>> summary as an output, and for these having the YAML representation
will be
>>> useful (we didn't have it before).
>>>
>>
>> What I'm suggesting is that this is an (optional) IR feature as
much as
>> any other
>>
>
> Well I disagree with this at this point, because I haven't read
anything
> that would support it.
> I'd be happy to revise my position if you were providing any argument
that
> would make this holds in face of any other analysis result.
>
I'm not intending to make an argument that would generalize over other
analysis results, nor following why there should be one. The bitcode
support for this doesn't generalize over other analysis results so I'm
not
sure why the textual IR would.

> - so it seems slightly odd that it'd be YAML rather than something that
>> looked more like the rest of the IR. Though I'm not outright
opposed to
>> YAML here - just want to make sure this information is being treated as
a
>> first class IR construct (as much as use order, comdats, etc are for
rough
>> examples)
>>
>
> YAML was pushed forward as an easy way to get there IIRC. It wasn't set
in
> stone and it was clearly open to change it to a more integrate format.
> So I'm supportive of anyone who would replace this with a more
"textual-IR
> integrated" format, I haven't proposed this in this thread because
Teresa
> is interested in getting something readable "quickly".
>
& I'm interested in pushing back on that a bit to discuss what the
end/desired/ideal goal is here & to understand whether such an intermediate
state is beneficial or whether it'd be better to look at the end goal &
increments to approach that.

> My point was more that as an intermediate step, I rather reuse the
> existing YAML serialization than creating yet another dump.
>
>
>
>>
>> Can summaries be standalone? I thought they could (that'd be ideal
for
>>>> the distributed situation - only the summary needs to go to the
'thin link'
>>>> step, I think? (currently maybe only the debug info is stripped
for that -
>>>> but ideally other unused IR wouldn't be shipped there as
well, I would
>>>> think)
>>>>
>>>
>>> Yes conceptually they can be standalone.
>>>
>>
>> This seems to provide the strongest/clear motivation for having
summaries
>> as a first class (though optional) IR construct.
>>
>
> No, this provide a strong motivation to have a proper serialization, I
> don't see how you connect this to the rest of the IR.
>
Because it can appear both with the rest of the IR and without it? So "it
can be derived from the IR" doesn't seem to hold as a generalization -
you're right that there could be no way to write it beside/with IR, and a
non-IR-like syntax to write it separately. But that doesn't seem like
what's needed/desired for testing - since current tests work around this by
generating bitcode with summaries to test - so bitcode with summary seems
like something that's desired as an input to tests. Feels like IR to me.

> & now there's an effort to correct that.
>>>>>>
>>>>>
>>>>> The main motivation here, I believe, is more to help dev to
have human
>>>>> readable/understandable dump for ThinLTO bitcodes. Having
to inspect
>>>>> separately summaries is a pain.
>>>>>
>>>>
>>>> Not sure I quite follow - inspect separately?
>>>>
>>>
>>> llvm-dis does not display summaries today, so you can't just
use
>>> llvm-dis like a "regular" flow.
>>>
>>>
>>>> How are they inspected today?
>>>>
>>>
>>> llvm-bcanalyzer? And now the YAML dump as well.
>>>
>>>
>>>> & also, I think there are test cases that want to/are
currently testing
>>>> summary input but do so somewhat awkwardly by using another
tool to produce
>>>> the summary first. Ideally the test case would have the summary
written in
>>>> to start, I would think, if that's a codepath worth
testing?
>>>>
>>>
>>> The IR already contains all the information, so why repeating it?
>>>
>>
>> For the same reason that it's relevant to test cases which way
it's
>> encoded, etc (in the same way that the LLVM IR repeats types of uses,
for
>> example - even though they're totally redundant from a "does
this have all
>> the semantic information required) & because it can be standalone.
>>
>
>>
>>> This makes the test case harder to maintain, in the vast majority,
I
>>> expect that if a test needs IR then it shouldn't need to
include a summary
>>> as well (and vice-versa).
>>>
>>
>> Ah, sorry, I'm not suggesting it should be required - in the same
way
>> it's not required in the bitcode. But if you want a summary in the
bitcode
>> when assembling a .ll file it seems OK To say you write it in the IR,
>>
>
> No it does not seem OK to me to write summaries alongside the IR in tests
> in general (outside of specific need like testing the round-trip of
> course).
>
I'm not sure I understand "in tests in general" - the only things
that
would include summaries are tests testing summary handling, in which case
it seems sort of useful to have that be explicit in the IR - at the moment
that's being worked around in tests by running IR through LLVM to generate
a summary, then feeding that back in to test the summary. That doesn't seem
to be ideal testing (in the same way that LLVM isn't tested by running
source code through clang and testing the resulting IR - the two features
are separated and tested independently)

> It is entirely redundant and I don't perceive any benefit, I don't
see why
> you would want to do that?
>
>
>
>> and equally if there is a summary in the bitcode it seems reasonable
that
>> it be printed in the .ll file by llvm-dis.
>>
>
> I agree and I advocated for this earlier.
>
That sounds like IR to me - it's a feature that appears in bitcode and IR
files, roundtrips through either.

> In the majority of test we have we want to check if the importing does
>>> what it is supposed to do, and if the linkage are correctly
adjusted. With
>>> a YAML (or other) serialization for the summaries this could indeed
been
>>> done purely with summaries, without any IR involved.
>>>
>>
>> I'm not sure I understand - you mean for executions of tools that
don't
>> need the rest of the IR, there could be a different/separate tool that
>> consumes YAML summaries and produces YAML
>>
>
> It does not have to be a separate tool: a tool that is looking to operate
> purely on summary should just ask to get the summaries out of the input
> file. The input being textual or bitcode shouldn't matter much at this
> point.
>
Agreed, but I'm not understanding how this isn't IR if it can be textual
or
bitcode and roundtrip through both.

> This is exactly how 'opt' and 'llc' operate.
>
> summaries and that would be tested - but the "consuming a summary in a
>> bitcode file" would not be?
>>
>
> This is exactly what we're doing with (almost) *all* of the .ll test:
we
> write them as textual, and read them back as textual, and not as bitcode.
>
>
>> I'm not sure I understand the benefit of this separation and
asymmetry
>> with the bitcode form of the same data.
>>
>
> Have you tried to write a test directly in bitcode? ;)
>
I didn't mean to suggest writing tests in bitcode - I meant I didn't
understand the asymmetry that it sounded like you were suggesting - where a
summary could be explicitly read from a textual file, but not from a .ll
file (that a .ll file would only contain an implicit summary, I guess,
never an explicit one - unlike bitcode which could contain an explicit
summary, standalone or alongside the rest of the IR).

- Dave

>
> I'm not sure we're talking about the same thing right now.
>
> --
> Mehdi
>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170607/5f537e81/attachment-0001.html>

Mehdi AMINI via llvm-dev

2017-Jun-07 18:00 UTC

head link

[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format

2017-06-07 10:19 GMT-07:00 David Blaikie <dblaikie at gmail.com>:
>
>
> On Wed, Jun 7, 2017 at 10:01 AM Mehdi AMINI <joker.eph at gmail.com>
wrote:
>
>> 2017-06-07 9:44 GMT-07:00 David Blaikie <dblaikie at gmail.com>:
>>
>>>
>>>
>>> On Tue, Jun 6, 2017 at 2:21 PM Mehdi AMINI <joker.eph at
gmail.com> wrote:
>>>
>>>> 2017-06-06 13:38 GMT-07:00 David Blaikie <dblaikie at
gmail.com>:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Jun 6, 2017 at 1:26 PM Mehdi AMINI <joker.eph at
gmail.com>
>>>>> wrote:
>>>>>
>>>>>> 2017-06-05 14:27 GMT-07:00 David Blaikie via llvm-dev
<
>>>>>> llvm-dev at lists.llvm.org>:
>>>>>>
>>>>>>> I know there's been a bunch of discussion here
already, but I was
>>>>>>> wondering if perhaps someone (probably Teresa?
Peter?) could:
>>>>>>>
>>>>>>> 1) summarize the current state
>>>>>>> 2) describe the end-goal
>>>>>>> 3) describe what steps (& how this patch
relates) are planned to get
>>>>>>> to (2)
>>>>>>>
>>>>>>> My naive thoughts, not being intimately familiar
with any of this:
>>>>>>> Usually bitcode and textual IR support go in
together or around the same
>>>>>>> time, and designed that way from the start (take
r211920 for examaple,
>>>>>>> which added an explicit representation of COMDATs
to the IR). This seems to
>>>>>>> have been an oversight in the implementation of IR
summaries (is that an
>>>>>>> accurate representation/statement?)
>>>>>>>
>>>>>>
>>>>>> More or less: it was not an oversight.
>>>>>> The summaries are not really part of the IR, it is more
like an
>>>>>> "analysis result" that is serialized. It can
always be recomputed from the
>>>>>> IR. This aspect makes it quite "special", it
is the only analysis result
>>>>>> that I know of that we serialize.
>>>>>>
>>>>>
>>>>> The use list work seems pretty similar in some ways
(granted, can't be
>>>>> recomputed to match, hence the desire to serialize it for
test case
>>>>> implementation).
>>>>>
>>>>
>>>> I see use-list as a leaky implementation detail of the IR that
we
>>>> serialized because it impact the processing of the IR.
>>>>
>>>> Summaries are more like serializing the CFG for example.
>>>>
>>>>
>>>>> But it looks like the same is true here to a degree - there
are test
>>>>> cases that exercise the summary handling, so they want
summaries for input
>>>>> (for now, I think, I've seen test cases that run
another LLVM tool to
>>>>> insert/create a summary to then feed that back in for a
test), or to test
>>>>> that the resulting summary is correct.
>>>>>
>>>>
>>>> We have cases were we want summaries as an input and check a
combined
>>>> summary as an output, and for these having the YAML
representation will be
>>>> useful (we didn't have it before).
>>>>
>>>
>>> What I'm suggesting is that this is an (optional) IR feature as
much as
>>> any other
>>>
>>
>> Well I disagree with this at this point, because I haven't read
anything
>> that would support it.
>> I'd be happy to revise my position if you were providing any
argument
>> that would make this holds in face of any other analysis result.
>>
>
> I'm not intending to make an argument that would generalize over other
> analysis results, nor following why there should be one. The bitcode
> support for this doesn't generalize over other analysis results so
I'm not
> sure why the textual IR would.
>
>
>> - so it seems slightly odd that it'd be YAML rather than something
that
>>> looked more like the rest of the IR. Though I'm not outright
opposed to
>>> YAML here - just want to make sure this information is being
treated as a
>>> first class IR construct (as much as use order, comdats, etc are
for rough
>>> examples)
>>>
>>
>> YAML was pushed forward as an easy way to get there IIRC. It wasn't
set
>> in stone and it was clearly open to change it to a more integrate
format.
>> So I'm supportive of anyone who would replace this with a more
>> "textual-IR integrated" format, I haven't proposed this
in this thread
>> because Teresa is interested in getting something readable
"quickly".
>>
>
> & I'm interested in pushing back on that a bit to discuss what the
> end/desired/ideal goal is here & to understand whether such an
intermediate
> state is beneficial or whether it'd be better to look at the end goal
&
> increments to approach that.
>
>
>> My point was more that as an intermediate step, I rather reuse the
>> existing YAML serialization than creating yet another dump.
>>
>>
>>
>>>
>>> Can summaries be standalone? I thought they could (that'd be
ideal for
>>>>> the distributed situation - only the summary needs to go to
the 'thin link'
>>>>> step, I think? (currently maybe only the debug info is
stripped for that -
>>>>> but ideally other unused IR wouldn't be shipped there
as well, I would
>>>>> think)
>>>>>
>>>>
>>>> Yes conceptually they can be standalone.
>>>>
>>>
>>> This seems to provide the strongest/clear motivation for having
>>> summaries as a first class (though optional) IR construct.
>>>
>>
>> No, this provide a strong motivation to have a proper serialization, I
>> don't see how you connect this to the rest of the IR.
>>
>
> Because it can appear both with the rest of the IR and without it? So
"it
> can be derived from the IR" doesn't seem to hold as a
generalization -
> you're right that there could be no way to write it beside/with IR, and
a
> non-IR-like syntax to write it separately.
>
I haven't advocated for this.
But the fact that this is an analysis result that we serialize alongside
the IR, whether in bitcode form or textual form, does not make it
automatically part of the IR to me.
We just bundle the IR with one of the analysis result because there exist a
consumer for the analysis result that benefit from not having to read all
the IR to get the information. The only reason for the summaries to exist
is a performance one. ThinLTO could be implemented without summaries,
reconstructing them on the fly when needed.


> But that doesn't seem like what's needed/desired for testing -
since
> current tests work around this by generating bitcode with summaries to test
> - so bitcode with summary seems like something that's desired as an
input
> to tests. Feels like IR to me.
>
No it isn't IR, unless you consider that any other analysis result is
"IR".

Another analogy is that if you have an analysis that is trying to find all
the SCCs on the callgraph, its input is the result of the callgraphy
analysis, and its output is a list of SCCs. You could write an input test
as IR and opt would run the callgraph analysis and feed it into the SCC
analysis and you would get the list of SCC as an output for your test. But
you could also have a textual format to write a callgraph and feed it to
your test.
It wouldn't come to my mind to write IR *and* the callgraph that
corresponds to this IR together as an input to the SCC analysis test.




>
>
>> & now there's an effort to correct that.
>>>>>>>
>>>>>>
>>>>>> The main motivation here, I believe, is more to help
dev to have
>>>>>> human readable/understandable dump for ThinLTO
bitcodes. Having to inspect
>>>>>> separately summaries is a pain.
>>>>>>
>>>>>
>>>>> Not sure I quite follow - inspect separately?
>>>>>
>>>>
>>>> llvm-dis does not display summaries today, so you can't
just use
>>>> llvm-dis like a "regular" flow.
>>>>
>>>>
>>>>> How are they inspected today?
>>>>>
>>>>
>>>> llvm-bcanalyzer? And now the YAML dump as well.
>>>>
>>>>
>>>>> & also, I think there are test cases that want to/are
currently
>>>>> testing summary input but do so somewhat awkwardly by using
another tool to
>>>>> produce the summary first. Ideally the test case would have
the summary
>>>>> written in to start, I would think, if that's a
codepath worth testing?
>>>>>
>>>>
>>>> The IR already contains all the information, so why repeating
it?
>>>>
>>>
>>> For the same reason that it's relevant to test cases which way
it's
>>> encoded, etc (in the same way that the LLVM IR repeats types of
uses, for
>>> example - even though they're totally redundant from a
"does this have all
>>> the semantic information required) & because it can be
standalone.
>>>
>>
>>>
>>>> This makes the test case harder to maintain, in the vast
majority, I
>>>> expect that if a test needs IR then it shouldn't need to
include a summary
>>>> as well (and vice-versa).
>>>>
>>>
>>> Ah, sorry, I'm not suggesting it should be required - in the
same way
>>> it's not required in the bitcode. But if you want a summary in
the bitcode
>>> when assembling a .ll file it seems OK To say you write it in the
IR,
>>>
>>
>> No it does not seem OK to me to write summaries alongside the IR in
tests
>> in general (outside of specific need like testing the round-trip of
>> course).
>>
>
> I'm not sure I understand "in tests in general" - the only
things that
> would include summaries are tests testing summary handling, in which case
> it seems sort of useful to have that be explicit in the IR
>
Why?
If you want to test summary handling, why would you need the IR? (cf above).

> - at the moment that's being worked around in tests by running IR
through
> LLVM to generate a summary, then feeding that back in to test the summary.
> That doesn't seem to be ideal testing (in the same way that LLVM
isn't
> tested by running source code through clang and testing the resulting IR -
> the two features are separated and tested independently)
>
This is what we do for every analysis (cf callgraph analogy above).



>
>
>> It is entirely redundant and I don't perceive any benefit, I
don't see
>> why you would want to do that?
>>
>>
>>
>>> and equally if there is a summary in the bitcode it seems
reasonable
>>> that it be printed in the .ll file by llvm-dis.
>>>
>>
>> I agree and I advocated for this earlier.
>>
>
> That sounds like IR to me - it's a feature that appears in bitcode and
IR
> files, roundtrips through either.
>
Let's define "IR" then. That does not pass my bar for IR:
serializing an
analysis result is not enough to change its fundamental properties that it
isn't part of the IR per-se.



>
>
>> In the majority of test we have we want to check if the importing does
>>>> what it is supposed to do, and if the linkage are correctly
adjusted. With
>>>> a YAML (or other) serialization for the summaries this could
indeed been
>>>> done purely with summaries, without any IR involved.
>>>>
>>>
>>> I'm not sure I understand - you mean for executions of tools
that don't
>>> need the rest of the IR, there could be a different/separate tool
that
>>> consumes YAML summaries and produces YAML
>>>
>>
>> It does not have to be a separate tool: a tool that is looking to
operate
>> purely on summary should just ask to get the summaries out of the input
>> file. The input being textual or bitcode shouldn't matter much at
this
>> point.
>>
>
> Agreed, but I'm not understanding how this isn't IR if it can be
textual
> or bitcode and roundtrip through both.
>
My best guess is that we don't have the same definition for "IR".

>
>
>> This is exactly how 'opt' and 'llc' operate.
>>
>> summaries and that would be tested - but the "consuming a summary
in a
>>> bitcode file" would not be?
>>>
>>
>> This is exactly what we're doing with (almost) *all* of the .ll
test: we
>> write them as textual, and read them back as textual, and not as
bitcode.
>>
>>
>>> I'm not sure I understand the benefit of this separation and
asymmetry
>>> with the bitcode form of the same data.
>>>
>>
>> Have you tried to write a test directly in bitcode? ;)
>>
>
> I didn't mean to suggest writing tests in bitcode - I meant I
didn't
> understand the asymmetry that it sounded like you were suggesting - where a
> summary could be explicitly read from a textual file, but not from a .ll
> file (that a .ll file would only contain an implicit summary, I guess,
> never an explicit one - unlike bitcode which could contain an explicit
> summary, standalone or alongside the rest of the IR).
>
I don't think this represent my position. I mentioned earlier in the thread
that we should have "llvm-dis | llvm-as" being able to round-trip
bitcode,
including summaries when present.
>This does not mean that testing how we handle summaries in ThinLTO requires
to write *both* IR and summary in the textual input.
I'm advocating that this would b a terrible way of writing test, which I
believe is an orthogonal concern to *supporting* round-tripping bitcode
files that contain summaries.

-- 
Mehdi
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170607/3cdb74fd/attachment.html>

llvm dev - Jun 2017 - [RFC][ThinLTO] llvm-dis ThinLTO summary dump format

[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format

[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format

[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format

[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format