thr3ads.net - llvm dev - [llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format [Jul 2017]


David Blaikie via llvm-dev

2017-Jul-19 15:52 UTC

[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format

On Wed, Jul 19, 2017 at 8:43 AM Teresa Johnson <tejohnson at google.com> wrote:
> On Wed, Jul 19, 2017 at 8:31 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>>
>>
>> On Mon, Jul 17, 2017 at 5:18 PM Mehdi AMINI <joker.eph at gmail.com> wrote:
>>
>>> 2017-07-17 16:49 GMT-07:00 David Blaikie via llvm-dev <
>>> llvm-dev at lists.llvm.org>:
>>>
>>>>
>>>>
>>>> On Mon, Jul 17, 2017 at 6:11 AM Charles Saternos via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>> Hey @chandlerc and @dblaikie,
>>>>>
>>>>> Any updates on this in relation to "[PATCH] D34080: [ThinLTO] Add
>>>>> dump-summary command to llvm-lto2 tool"?
>>>>>
>>>>
>>>> Sorry you've kind of got stuck in the middle of this - but I'm still
>>>> hoping to hear/understand the pushback on implementing this as a first
>>>> class .ll construct with serialization and deserialization support.
>>>>
>>>> I think Peter mentioned he didn't think this was the right path forward
>>>> in the long term? If that's the case, I'd like to understand that/reach
>>>> that conclusion for the project now rather than treating this as a stop-gap
>>>> with some idea that in the future someone might implement full
>>>> serialization support (when it's been over a year already, and other stop
>>>> gaps have been implemented (the yaml input support) already).
>>>>
>>>
>>> I'm totally believing we need first class serialization support in .ll,
>>> and I have a path forward for this (just not a lot of time to dedicate to
>>> this).
>>>
>>
>> What's the rough expectation of time/complexity for this path forward?
>>
>>
>>>> & if a .ll construct with serialization/deserialization is the path
>>>> forward, understanding the motivation for something other than going
>>>> straight for that would be helpful - usually bitcode features come with .ll
>>>> support from day 1, not a year later. I'm not clear on what would make this
>>>> feature an exception/more expensive to do this for (& why it would be worth
>>>> deferring that work, and what/when that work will be motivated/done)
>>>>
>>>
>>>
>>> We need a debugging tool for summaries ASAP, and the YAML is *already*
>>> implemented.
>>>
>>
>> I'm not sure I understand why the tradeoff is worthwhile - in terms of
>> needing to add a new feature (even if it's already implemented) and tests,
>> then porting those tests to a first-class .ll construct later. Usually
>> adding .ll formats doesn't seem to be terribly expensive/time intensive.
>>
>
> The main complication I see is defining the behavior when the serialized
> summary is read back in.
> 1) Do we trust that it is correct and consistent with the IR and blindly
> use it? That could cause some issues if someone changes the IR in a .ll
> file for testing and doesn't realize they need to also update the summary
> correspondingly.
>
Generally textual IR is assumed to be bogus and is validated for invariants
like this.
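
To make the "validate on read" idea concrete, here is a minimal sketch of
checking a summary parsed from a file against one recomputed from the IR.
Everything in it is an assumption for illustration: the toy data model (an
"IR" mapping each function to its callees) and the names build_summary and
verify_summary are hypothetical and are not LLVM's API.

```python
# Hypothetical toy model, not LLVM's API: the "IR" maps each function name
# to the list of functions it calls; a summary maps each function name to
# its set of call edges.

def build_summary(ir):
    """Recompute the summary from the IR, treated as the source of truth."""
    return {name: set(callees) for name, callees in ir.items()}

def verify_summary(ir, parsed_summary):
    """Return a list of mismatches between the parsed summary and the IR."""
    expected = build_summary(ir)
    errors = []
    for name in sorted(expected.keys() | parsed_summary.keys()):
        if name not in parsed_summary:
            errors.append(f"{name}: missing from summary")
        elif name not in expected:
            errors.append(f"{name}: in summary but not in IR")
        elif expected[name] != parsed_summary[name]:
            errors.append(f"{name}: call edges differ from IR")
    return errors

# The IR was edited to add a call from main to bar, but the summary was not.
ir = {"main": ["foo", "bar"], "foo": [], "bar": ["foo"]}
stale = {"main": {"foo"}, "foo": set(), "bar": {"foo"}}
```

Under this sketch a hand-edited .ll whose summary has gone stale is rejected
rather than silently trusted, which is the case 1) scenario quoted above.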

> 2) Do we always want to build the summary from the IR and check it against
> the summary read from the .ll file? In that case, what is even the use of
> building a summary from the serialized form?
>
Isn't that why the YAML support was added in the first place - for reading
in summaries for test cases?

> 3) If we want to allow tweaks to the summary in the .ll that override what
> is in the IR, for testing purposes, how does any checking we do in
>
I'm not sure why we would - what would be tested that way? If the
invariants of the summary are violated presumably the behavior of LLVM is
undefined & so there's nothing to test/no expected/defined/usable behavior
there.

> 2) distinguish between the case in 1) (user error) and 3) (intended
> difference)?
>
> This is why I suggested emitting as comments in the .ll file initially
> (which is useful for debugging purposes, although the YAML works fine for
> that too), while the above are hashed out.
>
>
>>
>>> Making it available through the llvm-lto tool is a no-brainer to me.
>>>
>>> This was *not* an oversight but a deliberate choice to not do this in
>>> the first place. Because summaries are the first bitcode feature I know of
>>> that isn't attached in any way to a Module (you can't get to it from a
>>> Module).
>>>
>>
>> Not sure I quite follow why that difference made the choice/tradeoff here
>> different (which admittedly is a bit easier to see in retrospect maybe -
>> now that there's been a need to build serialization and deserialization).
>> Do you mean it wasn't clear that serialization support was needed when
>> summaries were first implemented, but it is clear now?
>>
>
> Speaking for myself, it wasn't clear to me that serialization support was
> needed for anything other than debugging/testing,
>
The textual IR in its entirety basically exists only for debugging/testing
- but it's a big 'only'. (as has become apparent in this case with
experience, by the looks of it)

(not a criticism of you/your work - just that this should've been caught in
code review, I think)

> since it is redundant with and computed from the IR, and I wasn't sure
> emitting into the .ll file was the right way for debugging. Which is why
> the testing used llvm-bcanalyzer -dump.
>
> Teresa
>
>
>
>>
>> - Dave
>>
>
>
>
> --
> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
> 408-460-2413

Teresa Johnson via llvm-dev

2017-Jul-19 15:58 UTC


[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format

On Wed, Jul 19, 2017 at 8:52 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
> On Wed, Jul 19, 2017 at 8:43 AM Teresa Johnson <tejohnson at google.com>
> wrote:
>
>> On Wed, Jul 19, 2017 at 8:31 AM, David Blaikie <dblaikie at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Mon, Jul 17, 2017 at 5:18 PM Mehdi AMINI <joker.eph at gmail.com> wrote:
>>>
>>>> 2017-07-17 16:49 GMT-07:00 David Blaikie via llvm-dev <
>>>> llvm-dev at lists.llvm.org>:
>>>>
>>>>>
>>>>>
>>>>> On Mon, Jul 17, 2017 at 6:11 AM Charles Saternos via llvm-dev <
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>>> Hey @chandlerc and @dblaikie,
>>>>>>
>>>>>> Any updates on this in relation to "[PATCH] D34080: [ThinLTO] Add
>>>>>> dump-summary command to llvm-lto2 tool"?
>>>>>>
>>>>>
>>>>> Sorry you've kind of got stuck in the middle of this - but I'm still
>>>>> hoping to hear/understand the pushback on implementing this as a first
>>>>> class .ll construct with serialization and deserialization support.
>>>>>
>>>>> I think Peter mentioned he didn't think this was the right path
>>>>> forward in the long term? If that's the case, I'd like to understand
>>>>> that/reach that conclusion for the project now rather than treating this as
>>>>> a stop-gap with some idea that in the future someone might implement full
>>>>> serialization support (when it's been over a year already, and other stop
>>>>> gaps have been implemented (the yaml input support) already).
>>>>>
>>>>
>>>> I'm totally believing we need first class serialization support in .ll,
>>>> and I have a path forward for this (just not a lot of time to dedicate to
>>>> this).
>>>>
>>>
>>> What's the rough expectation of time/complexity for this path forward?
>>>
>>>
>>>>> & if a .ll construct with serialization/deserialization is the path
>>>>> forward, understanding the motivation for something other than going
>>>>> straight for that would be helpful - usually bitcode features come with .ll
>>>>> support from day 1, not a year later. I'm not clear on what would make this
>>>>> feature an exception/more expensive to do this for (& why it would be worth
>>>>> deferring that work, and what/when that work will be motivated/done)
>>>>>
>>>>
>>>>
>>>> We need a debugging tool for summaries ASAP, and the YAML is *already*
>>>> implemented.
>>>>
>>>
>>> I'm not sure I understand why the tradeoff is worthwhile - in terms of
>>> needing to add a new feature (even if it's already implemented) and tests,
>>> then porting those tests to a first-class .ll construct later. Usually
>>> adding .ll formats doesn't seem to be terribly expensive/time intensive.
>>>
>>
>> The main complication I see is defining the behavior when the serialized
>> summary is read back in.
>> 1) Do we trust that it is correct and consistent with the IR and blindly
>> use it? That could cause some issues if someone changes the IR in a .ll
>> file for testing and doesn't realize they need to also update the summary
>> correspondingly.
>>
>
> Generally textual IR is assumed to be bogus and is validated for
> invariants like this.
>
So why have support to read it in from the .ll at all?

>
>
>> 2) Do we always want to build the summary from the IR and check it
>> against the summary read from the .ll file? In that case, what is even the
>> use of building a summary from the serialized form?
>>
>
> Isn't that why the YAML support was added in the first place - for reading
> in summaries for test cases?
>
Peter added it for testing early ThinLTO CFI support since it predated the
summary building support for these features.

>
>
>> 3) If we want to allow tweaks to the summary in the .ll that override
>> what is in the IR, for testing purposes, how does any checking we do in
>>
>
> I'm not sure why we would - what would be tested that way? If the
> invariants of the summary are violated presumably the behavior of LLVM is
> undefined & so there's nothing to test/no expected/defined/usable behavior
> there.
>
So again, why bother having support to serialize it back in at all?

>
>
>> 2) distinguish between the case in 1) (user error) and 3) (intended
>> difference)?
>>
>> This is why I suggested emitting as comments in the .ll file initially
>> (which is useful for debugging purposes, although the YAML works fine for
>> that too), while the above are hashed out.
>>
>
>>
>>>
>>>> Making it available through the llvm-lto tool is a no-brainer to me.
>>>>
>>>> This was *not* an oversight but a deliberate choice to not do this in
>>>> the first place. Because summaries are the first bitcode feature I know of
>>>> that isn't attached in any way to a Module (you can't get to it from a
>>>> Module).
>>>>
>>>
>>> Not sure I quite follow why that difference made the choice/tradeoff
>>> here different (which admittedly is a bit easier to see in retrospect maybe
>>> - now that there's been a need to build serialization and deserialization).
>>> Do you mean it wasn't clear that serialization support was needed when
>>> summaries were first implemented, but it is clear now?
>>>
>>
>> Speaking for myself, it wasn't clear to me that serialization support was
>> needed for anything other than debugging/testing,
>>
>
> The textual IR in its entirety basically exists only for debugging/testing
> - but it's a big 'only'. (as has become apparent in this case with
> experience, by the looks of it)
>
I think there is a big difference - for most of the IR, the information is
not redundant - you simply can't create the same compiler behavior when
reading a .ll. For the summary it is completely redundant as it is
constructed from the IR.

Teresa

> (not a criticism of you/your work - just that this should've been caught
> in code review, I think)
>
>
>> since it is redundant with and computed from the IR, and I wasn't sure
>> emitting into the .ll file was the right way for debugging. Which is why
>> the testing used llvm-bcanalyzer -dump.
>>
>> Teresa
>>
>>
>>
>>>
>>> - Dave
>>>
>>
>>
>>
>> --
>> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
>> 408-460-2413
>>
>

-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |  408-460-2413

David Blaikie via llvm-dev

2017-Jul-19 16:46 UTC


[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format

On Wed, Jul 19, 2017 at 8:58 AM Teresa Johnson <tejohnson at google.com> wrote:
> On Wed, Jul 19, 2017 at 8:52 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>>
>>
>> On Wed, Jul 19, 2017 at 8:43 AM Teresa Johnson <tejohnson at google.com>
>> wrote:
>>
>>> On Wed, Jul 19, 2017 at 8:31 AM, David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Jul 17, 2017 at 5:18 PM Mehdi AMINI <joker.eph at gmail.com>
>>>> wrote:
>>>>
>>>>> 2017-07-17 16:49 GMT-07:00 David Blaikie via llvm-dev <
>>>>> llvm-dev at lists.llvm.org>:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 17, 2017 at 6:11 AM Charles Saternos via llvm-dev <
>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>
>>>>>>> Hey @chandlerc and @dblaikie,
>>>>>>>
>>>>>>> Any updates on this in relation to "[PATCH] D34080: [ThinLTO] Add
>>>>>>> dump-summary command to llvm-lto2 tool"?
>>>>>>>
>>>>>>
>>>>>> Sorry you've kind of got stuck in the middle of this - but I'm still
>>>>>> hoping to hear/understand the pushback on implementing this as a first
>>>>>> class .ll construct with serialization and deserialization support.
>>>>>>
>>>>>> I think Peter mentioned he didn't think this was the right path
>>>>>> forward in the long term? If that's the case, I'd like to understand
>>>>>> that/reach that conclusion for the project now rather than treating this as
>>>>>> a stop-gap with some idea that in the future someone might implement full
>>>>>> serialization support (when it's been over a year already, and other stop
>>>>>> gaps have been implemented (the yaml input support) already).
>>>>>>
>>>>>
>>>>> I'm totally believing we need first class serialization support in
>>>>> .ll, and I have a path forward for this (just not a lot of time to dedicate
>>>>> to this).
>>>>>
>>>>
>>>> What's the rough expectation of time/complexity for this path forward?
>>>>
>>>>
>>>>>> & if a .ll construct with serialization/deserialization is the path
>>>>>> forward, understanding the motivation for something other than going
>>>>>> straight for that would be helpful - usually bitcode features come with .ll
>>>>>> support from day 1, not a year later. I'm not clear on what would make this
>>>>>> feature an exception/more expensive to do this for (& why it would be worth
>>>>>> deferring that work, and what/when that work will be motivated/done)
>>>>>>
>>>>>
>>>>>
>>>>> We need a debugging tool for summaries ASAP, and the YAML is *already*
>>>>> implemented.
>>>>>
>>>>
>>>> I'm not sure I understand why the tradeoff is worthwhile - in terms of
>>>> needing to add a new feature (even if it's already implemented) and tests,
>>>> then porting those tests to a first-class .ll construct later. Usually
>>>> adding .ll formats doesn't seem to be terribly expensive/time intensive.
>>>>
>>>
>>> The main complication I see is defining the behavior when the serialized
>>> summary is read back in.
>>> 1) Do we trust that it is correct and consistent with the IR and blindly
>>> use it? That could cause some issues if someone changes the IR in a .ll
>>> file for testing and doesn't realize they need to also update the summary
>>> correspondingly.
>>>
>>
>> Generally textual IR is assumed to be bogus and is validated for
>> invariants like this.
>>
>
> So why have support to read it in from the .ll at all?
>
Fair question - if there's only a single right answer that's a bit
different from a lot of (but not all) other .ll file constructs.

But, for example - the .ll syntax specifies the types of expressions at
their use and verifies this. There's no ambiguity there, only one right
answer, but it's still written in the IR and verified, makes the test case
easier to read, perhaps (this may be a bad example - I certainly find that
quirk of the IR a bit weird).
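
As a rough illustration of that "redundant but verified" pattern, here is a
small sketch in which each use restates a value's type and the checker
rejects any use that disagrees with the definition. The checker, its name
check_uses, and the tuple layout are assumptions made up for this example;
this is not how LLVM's parser is implemented.

```python
# Hypothetical toy checker, not LLVM's parser: each definition records a
# value's type, and each use restates the type it expects to see.

def check_uses(defs, uses):
    """Return the uses whose restated type disagrees with the definition."""
    bad = []
    for name, stated_type in uses:
        actual = defs.get(name)  # None if the value was never defined
        if actual != stated_type:
            bad.append((name, stated_type, actual))
    return bad

defs = {"%a": "i32", "%b": "i64"}
uses = [("%a", "i32"), ("%b", "i32")]  # the second use states the wrong type
```

The restated type carries no new information, since there is only one right
answer, but it makes each use self-describing and gives the reader of the
file something concrete to check.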

>>> 2) Do we always want to build the summary from the IR and check it against
>>> the summary read from the .ll file? In that case, what is even the use of
>>> building a summary from the serialized form?
>>>
>>
>> Isn't that why the YAML support was added in the first place - for
>> reading in summaries for test cases?
>>
>
> Peter added it for testing early ThinLTO CFI support since it predated the
> summary building support for these features.
>
Ah, that's interesting to understand - thanks for explaining!

So does that make the current YAML reading support unneeded (hopefully
dead)? Or are there old tests still written in it that need to be ported to
some newer technique?

>>> 3) If we want to allow tweaks to the summary in the .ll that override what
>>> is in the IR, for testing purposes, how does any checking we do in
>>>
>>
>> I'm not sure why we would - what would be tested that way? If the
>> invariants of the summary are violated presumably the behavior of LLVM is
>> undefined & so there's nothing to test/no expected/defined/usable behavior
>> there.
>>
>
> So again, why bother having support to serialize it back in at all?
>
>
>>
>>
>>> 2) distinguish between the case in 1) (user error) and 3) (intended
>>> difference)?
>>>
>>> This is why I suggested emitting as comments in the .ll file initially
>>> (which is useful for debugging purposes, although the YAML works fine for
>>> that too), while the above are hashed out.
>>>
>>
>>>
>>>>
>>>>> Making it available through the llvm-lto tool is a no-brainer to me.
>>>>>
>>>>> This was *not* an oversight but a deliberate choice to not do this in
>>>>> the first place. Because summaries are the first bitcode feature I know of
>>>>> that isn't attached in any way to a Module (you can't get to it from a
>>>>> Module).
>>>>>
>>>>
>>>> Not sure I quite follow why that difference made the choice/tradeoff
>>>> here different (which admittedly is a bit easier to see in retrospect maybe
>>>> - now that there's been a need to build serialization and deserialization).
>>>> Do you mean it wasn't clear that serialization support was needed when
>>>> summaries were first implemented, but it is clear now?
>>>>
>>>
>>> Speaking for myself, it wasn't clear to me that serialization support
>>> was needed for anything other than debugging/testing,
>>>
>>
>> The textual IR in its entirety basically exists only for
>> debugging/testing - but it's a big 'only'. (as has become apparent in this
>> case with experience, by the looks of it)
>>
>
> I think there is a big difference - for most of the IR, the information is
> not redundant - you simply can't create the same compiler behavior when
> reading a .ll.
>
I think there are a fair few constructs in the .ll file format that are
redundant and exist for ease of use/readability/self-documenting tests &
this seems like it'd fall under a similar category.

> For the summary it is completely redundant as it is constructed from the
> IR.
>
> Teresa
>
>
>> (not a criticism of you/your work - just that this should've been caught
>> in code review, I think)
>>
>>
>>> since it is redundant with and computed from the IR, and I wasn't sure
>>> emitting into the .ll file was the right way for debugging. Which is why
>>> the testing used llvm-bcanalyzer -dump.
>>>
>>> Teresa
>>>
>>>
>>>
>>>>
>>>> - Dave
>>>>
>>>
>>>
>>>
>>> --
>>> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
>>> 408-460-2413
>>>
>>
>
>
> --
> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
> 408-460-2413
