thr3ads.net - llvm dev - [llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format [Jul 2017]

If this information is useful, please help other people find it:
Share via:

David Blaikie via llvm-dev

2017-Jul-19 15:31 UTC

[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format

On Mon, Jul 17, 2017 at 5:18 PM Mehdi AMINI <joker.eph at gmail.com>
wrote:
> 2017-07-17 16:49 GMT-07:00 David Blaikie via llvm-dev <
> llvm-dev at lists.llvm.org>:
>
>>
>>
>> On Mon, Jul 17, 2017 at 6:11 AM Charles Saternos via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Hey @chandlerc and @dblaikie,
>>>
>>> Any updates on this in relation to "[PATCH] D34080: [ThinLTO]
Add
>>> dump-summary command to llvm-lto2 tool"?
>>>
>>
>> Sorry you've kind of got stuck in the middle of this - but I'm
still
>> hoping to hear/understand the pushback on implementing this as a first
>> class .ll construct with serialization and deserialization support.
>>
>> I think Peter mentioned he didn't think this was the right path
forward
>> in the long term? If that's the case, I'd like to understand
that/reach
>> that conclusion for the project now rather than treating this as a
stop-gap
>> with some idea that in the future someone might implement full
>> serialization support (when it's been over a year already, and
other stop
>> gaps have been implemented (the yaml input support) already).
>>
>
> I'm totally believing we need first class serialization support in .ll,
> and I have a path forward for this (just not a lot of time to dedicate to
> this).
>
What's the rough expectation of time/complexity for this path forward?

> & if a .ll construct with serialization/deserialization is the path
>> forward, understanding the motivation for a something other than going
>> straight for that would be helpful -usually bitcode features come with
.ll
>> support from day 1, not a year later. I'm not clear on what would
make this
>> feature an exception/more expensive to do this for (& why it would
be worth
>> deferring that work, and what/when that work will be motivated/done)
>>
>
>
> We need a debugging tool for summaries ASAP, and the YAML is *already*
> implemented.
>
I'm not sure I understand why the tradeoff is worthwhile - in terms of
needing to add a new feature (even if it's already implemented) and tests,
then porting those tests to a first-class .ll construct later. Usually
adding .ll formats doesn't seem to be terribly expensive/time intensive.

> Making it available through the llvm-lto tool is a no-brainer to me.
>
> This was *not* an oversight but a deliberate choice to not do this in the
> first place. Because summaries are the first bitcode feature I know of that
> isn't attached in any way to a Module (you can't get to it from a
Module).
>
Not sure I quite follow why that difference made the choice/tradeoff here
different (which admittedly is a bit easier to see in retrospect maybe -
now that there's been a need to build serialization and deserialization).
Do you mean it wasn't clear that serialization support was needed when
summaries were first implemented, but it is clear now?

- Dave
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170719/213af952/attachment.html>

Teresa Johnson via llvm-dev

2017-Jul-19 15:43 UTC

head link

[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format

On Wed, Jul 19, 2017 at 8:31 AM, David Blaikie <dblaikie at gmail.com>
wrote:
>
>
> On Mon, Jul 17, 2017 at 5:18 PM Mehdi AMINI <joker.eph at gmail.com>
wrote:
>
>> 2017-07-17 16:49 GMT-07:00 David Blaikie via llvm-dev <
>> llvm-dev at lists.llvm.org>:
>>
>>>
>>>
>>> On Mon, Jul 17, 2017 at 6:11 AM Charles Saternos via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> Hey @chandlerc and @dblaikie,
>>>>
>>>> Any updates on this in relation to "[PATCH] D34080:
[ThinLTO] Add
>>>> dump-summary command to llvm-lto2 tool"?
>>>>
>>>
>>> Sorry you've kind of got stuck in the middle of this - but
I'm still
>>> hoping to hear/understand the pushback on implementing this as a
first
>>> class .ll construct with serialization and deserialization support.
>>>
>>> I think Peter mentioned he didn't think this was the right path
forward
>>> in the long term? If that's the case, I'd like to
understand that/reach
>>> that conclusion for the project now rather than treating this as a
stop-gap
>>> with some idea that in the future someone might implement full
>>> serialization support (when it's been over a year already, and
other stop
>>> gaps have been implemented (the yaml input support) already).
>>>
>>
>> I'm totally believing we need first class serialization support in
.ll,
>> and I have a path forward for this (just not a lot of time to dedicate
to
>> this).
>>
>
> What's the rough expectation of time/complexity for this path forward?
>
>
>> & if a .ll construct with serialization/deserialization is the path
>>> forward, understanding the motivation for a something other than
going
>>> straight for that would be helpful -usually bitcode features come
with .ll
>>> support from day 1, not a year later. I'm not clear on what
would make this
>>> feature an exception/more expensive to do this for (& why it
would be worth
>>> deferring that work, and what/when that work will be
motivated/done)
>>>
>>
>>
>> We need a debugging tool for summaries ASAP, and the YAML is *already*
>> implemented.
>>
>
> I'm not sure I understand why the tradeoff is worthwhile - in terms of
> needing to add a new feature (even if it's already implemented) and
tests,
> then porting those tests to a first-class .ll construct later. Usually
> adding .ll formats doesn't seem to be terribly expensive/time
intensive.
>
The main complication I see is defining the behavior when the serialized
summary is read back in.
1) Do we trust that it is correct and consistent with the IR and blindly
use it? That could cause some issues if someone changes the IR in a .ll
file for testing and doesn't realize they need to also update the summary
correspondingly.
2) Do we always want to build the summary from the IR and check it against
the summary read from the .ll file? In that case, what is even the use of
building a summary from the serialized form?
3) If we want to allow tweaks to the summary in the .ll that override what
is in the IR, for testing purposes, how does any checking we do in 2)
distinguish between the case in 1) (user error) and 3) (intended
difference)?

This is why I suggested emitting as comments in the .ll file initially
(which is useful for debugging purposes, although the YAML works fine for
that too), while the above are hashed out.

>
>> Making it available through the llvm-lto tool is a no-brainer to me.
>>
>> This was *not* an oversight but a deliberate choice to not do this in
the
>> first place. Because summaries are the first bitcode feature I know of
that
>> isn't attached in any way to a Module (you can't get to it from
a Module).
>>
>
> Not sure I quite follow why that difference made the choice/tradeoff here
> different (which admittedly is a bit easier to see in retrospect maybe -
> now that there's been a need to build serialization and
deserialization).
> Do you mean it wasn't clear that serialization support was needed when
> summaries were first implemented, but it is clear now?
>
Speaking for myself, it wasn't clear to me that serialization support was
needed for anything other than debugging/testing, since it is redundant
with and computed from the IR, and I wasn't sure emitting into the .ll file
was the right way for debugging. Which is why the testing used
llvm-bcanalyzer -dump.

Teresa


>
> - Dave
>


-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |  408-460-2413
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170719/90b2a33d/attachment.html>

Mehdi AMINI via llvm-dev

2017-Jul-19 15:47 UTC

head link

[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format

2017-07-19 8:31 GMT-07:00 David Blaikie <dblaikie at gmail.com>:
>
>
> On Mon, Jul 17, 2017 at 5:18 PM Mehdi AMINI <joker.eph at gmail.com>
wrote:
>
>> 2017-07-17 16:49 GMT-07:00 David Blaikie via llvm-dev <
>> llvm-dev at lists.llvm.org>:
>>
>>>
>>>
>>> On Mon, Jul 17, 2017 at 6:11 AM Charles Saternos via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> Hey @chandlerc and @dblaikie,
>>>>
>>>> Any updates on this in relation to "[PATCH] D34080:
[ThinLTO] Add
>>>> dump-summary command to llvm-lto2 tool"?
>>>>
>>>
>>> Sorry you've kind of got stuck in the middle of this - but
I'm still
>>> hoping to hear/understand the pushback on implementing this as a
first
>>> class .ll construct with serialization and deserialization support.
>>>
>>> I think Peter mentioned he didn't think this was the right path
forward
>>> in the long term? If that's the case, I'd like to
understand that/reach
>>> that conclusion for the project now rather than treating this as a
stop-gap
>>> with some idea that in the future someone might implement full
>>> serialization support (when it's been over a year already, and
other stop
>>> gaps have been implemented (the yaml input support) already).
>>>
>>
>> I'm totally believing we need first class serialization support in
.ll,
>> and I have a path forward for this (just not a lot of time to dedicate
to
>> this).
>>
>
> What's the rough expectation of time/complexity for this path forward?
>
It depends, I think that either:

1) We unblock the progress on the independent YAML side, so that folks can
work / debug efficiently. We have time to hammer the right syntax and
implement it for .ll
2) We don't move forward with the independent YAML files, but since we need
something to be able to work with, we start by integrating YAML in .ll so
that we have the round-trip feature available ASAP, and folks can debug as
well. We can then gradually move to another syntax since .ll format isn't
stable.

I'm quite unhappy that this is been stalled right now.


> & if a .ll construct with serialization/deserialization is the path
>>> forward, understanding the motivation for a something other than
going
>>> straight for that would be helpful -usually bitcode features come
with .ll
>>> support from day 1, not a year later. I'm not clear on what
would make this
>>> feature an exception/more expensive to do this for (& why it
would be worth
>>> deferring that work, and what/when that work will be
motivated/done)
>>>
>>
>>
>> We need a debugging tool for summaries ASAP, and the YAML is *already*
>> implemented.
>>
>
> I'm not sure I understand why the tradeoff is worthwhile -
>
Because (as Teresa mentioned separately) debugging issues with
llvm-bc-analyzer is terrible.

> in terms of needing to add a new feature (even if it's already
> implemented) and tests, then porting those tests to a first-class .ll
> construct later.
>
The feature we're talking about here is just the dump part: i.e. it
provides a debugging tool for developer. It does not allow to write test
that would read it.
This is helpful to anyone that has to understand "what's going on"
when
there something unexpected with ThinLTO.

> Usually adding .ll formats doesn't seem to be terribly expensive/time
> intensive.
>
I'm fine if you want to do it.

>
>
>> Making it available through the llvm-lto tool is a no-brainer to me.
>>
>> This was *not* an oversight but a deliberate choice to not do this in
the
>> first place. Because summaries are the first bitcode feature I know of
that
>> isn't attached in any way to a Module (you can't get to it from
a Module).
>>
>
> Not sure I quite follow why that difference made the choice/tradeoff here
> different
>
You keep mentioning that the lack of .ll was an oversight:
1) why are you writing that if it does not make any difference for the
current discussion?
2) I had to rectify the facts at some point.


> (which admittedly is a bit easier to see in retrospect maybe - now that
> there's been a need to build serialization and deserialization). Do you
> mean it wasn't clear that serialization support was needed when
summaries
> were first implemented, but it is clear now?
>
You mean *textual* serialization I assume (we have bitcode serialization
from the beginning of course).
So, no it wasn't clear that it was *needed*, I think we saw it as "nice
to
have". We talked about it multiple times, but we figured that since the
summaries a produced from the textual IR, you can write your test in IR and
generate the summaries on demand. That got us a long way!
Debugging has been quite annoying though (llvm-bc-analyzer and custom
"printf" here and there).

-- 
Mehdi
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170719/7fd94bf6/attachment-0001.html>

David Blaikie via llvm-dev

2017-Jul-19 15:52 UTC

head link

[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format

On Wed, Jul 19, 2017 at 8:43 AM Teresa Johnson <tejohnson at google.com>
wrote:
> On Wed, Jul 19, 2017 at 8:31 AM, David Blaikie <dblaikie at
gmail.com> wrote:
>
>>
>>
>> On Mon, Jul 17, 2017 at 5:18 PM Mehdi AMINI <joker.eph at
gmail.com> wrote:
>>
>>> 2017-07-17 16:49 GMT-07:00 David Blaikie via llvm-dev <
>>> llvm-dev at lists.llvm.org>:
>>>
>>>>
>>>>
>>>> On Mon, Jul 17, 2017 at 6:11 AM Charles Saternos via llvm-dev
<
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>> Hey @chandlerc and @dblaikie,
>>>>>
>>>>> Any updates on this in relation to "[PATCH] D34080:
[ThinLTO] Add
>>>>> dump-summary command to llvm-lto2 tool"?
>>>>>
>>>>
>>>> Sorry you've kind of got stuck in the middle of this - but
I'm still
>>>> hoping to hear/understand the pushback on implementing this as
a first
>>>> class .ll construct with serialization and deserialization
support.
>>>>
>>>> I think Peter mentioned he didn't think this was the right
path forward
>>>> in the long term? If that's the case, I'd like to
understand that/reach
>>>> that conclusion for the project now rather than treating this
as a stop-gap
>>>> with some idea that in the future someone might implement full
>>>> serialization support (when it's been over a year already,
and other stop
>>>> gaps have been implemented (the yaml input support) already).
>>>>
>>>
>>> I'm totally believing we need first class serialization support
in .ll,
>>> and I have a path forward for this (just not a lot of time to
dedicate to
>>> this).
>>>
>>
>> What's the rough expectation of time/complexity for this path
forward?
>>
>>
>>> & if a .ll construct with serialization/deserialization is the
path
>>>> forward, understanding the motivation for a something other
than going
>>>> straight for that would be helpful -usually bitcode features
come with .ll
>>>> support from day 1, not a year later. I'm not clear on what
would make this
>>>> feature an exception/more expensive to do this for (& why
it would be worth
>>>> deferring that work, and what/when that work will be
motivated/done)
>>>>
>>>
>>>
>>> We need a debugging tool for summaries ASAP, and the YAML is
*already*
>>> implemented.
>>>
>>
>> I'm not sure I understand why the tradeoff is worthwhile - in terms
of
>> needing to add a new feature (even if it's already implemented) and
tests,
>> then porting those tests to a first-class .ll construct later. Usually
>> adding .ll formats doesn't seem to be terribly expensive/time
intensive.
>>
>
> The main complication I see is defining the behavior when the serialized
> summary is read back in.
> 1) Do we trust that it is correct and consistent with the IR and blindly
> use it? That could cause some issues if someone changes the IR in a .ll
> file for testing and doesn't realize they need to also update the
summary
> correspondingly.
>
Generally textual IR is assumed to be bogus and is validated for invariants
like this.

> 2) Do we always want to build the summary from the IR and check it against
> the summary read from the .ll file? In that case, what is even the use of
> building a summary from the serialized form?
>
Isn't that why the YAML support was added in the first place - for reading
in summaries for test cases?

> 3) If we want to allow tweaks to the summary in the .ll that override what
> is in the IR, for testing purposes, how does any checking we do in
>
I'm not sure why we would - what would be tested that way? If the
invariants of the summary are violated presumably the behavior of LLVM is
undefined & so there's nothing to test/no expected/defined/usable
behavior
there.

> 2) distinguish between the case in 1) (user error) and 3) (intended
> difference)?
>
> This is why I suggested emitting as comments in the .ll file initially
> (which is useful for debugging purposes, although the YAML works fine for
> that too), while the above are hashed out.
>
>
>>
>>> Making it available through the llvm-lto tool is a no-brainer to
me.
>>>
>>> This was *not* an oversight but a deliberate choice to not do this
in
>>> the first place. Because summaries are the first bitcode feature I
know of
>>> that isn't attached in any way to a Module (you can't get
to it from a
>>> Module).
>>>
>>
>> Not sure I quite follow why that difference made the choice/tradeoff
here
>> different (which admittedly is a bit easier to see in retrospect maybe
-
>> now that there's been a need to build serialization and
deserialization).
>> Do you mean it wasn't clear that serialization support was needed
when
>> summaries were first implemented, but it is clear now?
>>
>
> Speaking for myself, it wasn't clear to me that serialization support
was
> needed for anything other than debugging/testing,
>
The textual IR in its entirety basically exists only for debugging/testing
- but it's a big 'only'. (as has become apparent in this case with
experience, by the looks of it)

(not a criticism of you/your work - just that this should've been caught in
code review, I think)

> since it is redundant with and computed from the IR, and I wasn't sure
> emitting into the .ll file was the right way for debugging. Which is why
> the testing used llvm-bcanalyzer -dump.
>
> Teresa
>
>
>
>>
>> - Dave
>>
>
>
>
> --
> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
> 408-460-2413 <(408)%20460-2413>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170719/8334f12c/attachment.html>

David Blaikie via llvm-dev

2017-Jul-19 16:00 UTC

head link

[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format

On Wed, Jul 19, 2017 at 8:47 AM Mehdi AMINI <joker.eph at gmail.com>
wrote:
> 2017-07-19 8:31 GMT-07:00 David Blaikie <dblaikie at gmail.com>:
>
>>
>>
>> On Mon, Jul 17, 2017 at 5:18 PM Mehdi AMINI <joker.eph at
gmail.com> wrote:
>>
>>> 2017-07-17 16:49 GMT-07:00 David Blaikie via llvm-dev <
>>> llvm-dev at lists.llvm.org>:
>>>
>>>>
>>>>
>>>> On Mon, Jul 17, 2017 at 6:11 AM Charles Saternos via llvm-dev
<
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>> Hey @chandlerc and @dblaikie,
>>>>>
>>>>> Any updates on this in relation to "[PATCH] D34080:
[ThinLTO] Add
>>>>> dump-summary command to llvm-lto2 tool"?
>>>>>
>>>>
>>>> Sorry you've kind of got stuck in the middle of this - but
I'm still
>>>> hoping to hear/understand the pushback on implementing this as
a first
>>>> class .ll construct with serialization and deserialization
support.
>>>>
>>>> I think Peter mentioned he didn't think this was the right
path forward
>>>> in the long term? If that's the case, I'd like to
understand that/reach
>>>> that conclusion for the project now rather than treating this
as a stop-gap
>>>> with some idea that in the future someone might implement full
>>>> serialization support (when it's been over a year already,
and other stop
>>>> gaps have been implemented (the yaml input support) already).
>>>>
>>>
>>> I'm totally believing we need first class serialization support
in .ll,
>>> and I have a path forward for this (just not a lot of time to
dedicate to
>>> this).
>>>
>>
>> What's the rough expectation of time/complexity for this path
forward?
>>
>
> It depends, I think that either:
>
> 1) We unblock the progress on the independent YAML side, so that folks can
> work / debug efficiently. We have time to hammer the right syntax and
> implement it for .ll
> 2) We don't move forward with the independent YAML files, but since we
> need something to be able to work with, we start by integrating YAML in .ll
> so that we have the round-trip feature available ASAP, and folks can debug
> as well. We can then gradually move to another syntax since .ll format
> isn't stable.
>
I'm not wholely against (2), really - though I'm not sure that the
incremental cost of designing something more suitable/fitting in the
textual IR is a huge deal here - is it?

> I'm quite unhappy that this is been stalled right now.
>
>
>
>> & if a .ll construct with serialization/deserialization is the path
>>>> forward, understanding the motivation for a something other
than going
>>>> straight for that would be helpful -usually bitcode features
come with .ll
>>>> support from day 1, not a year later. I'm not clear on what
would make this
>>>> feature an exception/more expensive to do this for (& why
it would be worth
>>>> deferring that work, and what/when that work will be
motivated/done)
>>>>
>>>
>>>
>>> We need a debugging tool for summaries ASAP, and the YAML is
*already*
>>> implemented.
>>>
>>
>> I'm not sure I understand why the tradeoff is worthwhile -
>>
>
> Because (as Teresa mentioned separately) debugging issues with
> llvm-bc-analyzer is terrible.
>
Sure, I understand the state now is bad - I don't understand why the cost
of implementing the usual textual format is so expensive that the tradeoff
of implementing some other format, writing tests, then implementing a new
format and upgrading tests is worth it compared to frontloading that work.

> in terms of needing to add a new feature (even if it's already
>> implemented) and tests, then porting those tests to a first-class .ll
>> construct later.
>>
>
> The feature we're talking about here is just the dump part: i.e. it
> provides a debugging tool for developer.
>
It does not allow to write test that would read it.> This is helpful to anyone that has to understand "what's going
on" when
> there something unexpected with ThinLTO.
>
*nod* I certainly appreciate the value of having an output format.

> Usually adding .ll formats doesn't seem to be terribly expensive/time
>> intensive.
>>
>
> I'm fine if you want to do it.
>
>
>>
>>
>>> Making it available through the llvm-lto tool is a no-brainer to
me.
>>>
>>> This was *not* an oversight but a deliberate choice to not do this
in
>>> the first place. Because summaries are the first bitcode feature I
know of
>>> that isn't attached in any way to a Module (you can't get
to it from a
>>> Module).
>>>
>>
>> Not sure I quite follow why that difference made the choice/tradeoff
here
>> different
>>
>
> You keep mentioning that the lack of .ll was an oversight:
> 1) why are you writing that if it does not make any difference for the
> current discussion?
>
Fair point - I mention it because it makes a difference to me in terms of
how the work would be prioritized - if it's already technical debt and has
been for over a year, I'm less inclined to pile on more/keep going in that
direction compared to if it were "this was the right direction, now
there's
a new direction but a stop-gap would be handy", basically.

> 2) I had to rectify the facts at some point.
>
>
>
>> (which admittedly is a bit easier to see in retrospect maybe - now that
>> there's been a need to build serialization and deserialization). Do
you
>> mean it wasn't clear that serialization support was needed when
summaries
>> were first implemented, but it is clear now?
>>
>
> You mean *textual* serialization I assume (we have bitcode serialization
> from the beginning of course).
>
Yes.

> So, no it wasn't clear that it was *needed*, I think we saw it as
"nice to
> have". We talked about it multiple times, but we figured that since
the
> summaries a produced from the textual IR, you can write your test in IR and
> generate the summaries on demand. That got us a long way!
> Debugging has been quite annoying though (llvm-bc-analyzer and custom
> "printf" here and there).
>
> --
> Mehdi
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170719/fcedeb5b/attachment.html>

llvm dev - Jul 2017 - [RFC][ThinLTO] llvm-dis ThinLTO summary dump format

[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format

[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format

[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format

[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format

[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format