thr3ads.net - llvm dev - [llvm-dev] (RFC) Encoding code duplication factor in discriminator [Nov 2016]

If this information is useful, please help other people find it:
Share via:

Dehao Chen via llvm-dev

2016-Nov-01 21:14 UTC

[llvm-dev] (RFC) Encoding code duplication factor in discriminator

damn... my english is not readable at all when I try to write fast...
trying to make some clarification below, hopefully can make it more
readable...

On Tue, Nov 1, 2016 at 2:07 PM, Dehao Chen <dehao at google.com> wrote:
> Oops... pressed the wrong button and sent out early...
>
> On Tue, Nov 1, 2016 at 2:01 PM, Dehao Chen <dehao at google.com>
wrote:
>
>> If Hal's proposal is for SamplePGO purpose, let me clarify some
design
>> principles of SamplePGO.
>>
>> The profile for sample pgo uses source location as the key to map the
>> execution count back to IR. This design is based on the principle that
we
>> do not want the profile to be tightly couple with compiler IR. Instead,
>> profile is simple an attribute of the source code. We have been
benefited
>>  a lot from this design that the profile can easily be reused across
>> different source versions and compiler versions, or even compilers.
>>
>> That being said, the design to encode more info into discriminator does
>> not mean that we will change the profile. The encoded info in
discriminator
>> will be handled by the create_llvm_prof tool, which combines counts
from
>> different clones of the same source code and generate the combined
profile
>> data. The output profile will not have any cloning/dupliaction bits at
all.
>> So for the initial example profile I provided, the output profile will
be:
>>
>
> #1: 10
> #3: 80
>
> Not:
>
> #1: 10
> #3.0x400: 70
> #3.0x10400: 5
> #3.0x20400: 3
> #3.0x30400: 2
>
> The goal of the proposed change, is to make profile more accurately
> represent the attribute of the source.
> The non-goal of the proposed change, is to provide more context in the
> profile to present the behavior of program in the context of different
> context.
>
The non-goal of the proposed change, is to provide more context in the
profile to present the behavior of program in different contexts.

>
> If we are pursuing more context-sensitive profile, that would be a very
> different design. In this design, we will need to read profile multiple
> times, and have the profile tightly coupled with the compiler IR/pass
> manager. That is doable, but I don't think that probably better suits
> instrumentation based PGO's domain. Comments?
>
If we are pursuing more context-sensitive profile, that would be a very
different design in which we need to read in profiles multiple times, and
have the profile tightly coupled with the compiler IR/pass manager. That is
doable, but that probably better suits instrumentation based PGO's domain.
Comments?
>
> Thanks,
> Dehao
>
>
>>
>> On Tue, Nov 1, 2016 at 1:04 PM, Hal Finkel <hfinkel at anl.gov>
wrote:
>>
>>>
>>> ------------------------------
>>>
>>> *From: *"Paul Robinson" <paul.robinson at sony.com>
>>> *To: *"Dehao Chen" <dehao at google.com>, "Hal
Finkel" <hfinkel at anl.gov>
>>> *Cc: *"Xinliang David Li" <davidxl at google.com>,
llvm-dev at lists.llvm.org
>>> *Sent: *Tuesday, November 1, 2016 2:15:38 PM
>>> *Subject: *RE: [llvm-dev] (RFC) Encoding code duplication factor
>>> in        discriminator
>>>
>>> As illustrated in the above example, it is not like
"vectorization has a
>>> distinct bit". All different optimizations make clones of code
which will
>>> be labeled by UIDs represented by N (e.g. 8) bits. In this way, the
space
>>> will be capped by the number of clones all optimizations have made,
instead
>>> of # of optimizations that has applied. And it will be capped at
2^N-1. The
>>> cons of using uid is that you will not know if a clone is coming
from
>>> vectorization or unroll or loop distribution.
>>>
>>> Okay, but that kind of semantic mapping is important. How should we
>>> encode/recover that information? To be clear, I'm not saying
that we need
>>> to implement that up front, but there needs to be a clear path to
an
>>> implementation, because I don't want to have two disjoint
schemes.
>>>
>>>
>>>
>>> You mean that you want to know which optimization created the
clone? How
>>> would you use that info? Looks to me this will expose compiler
>>> implementation detail in debug info.
>>>
>>>
>>>
>>> This is still doable, assume we have 15 interesting optimizations
to
>>> track, we can use 4 bits to encode the optimization type that
created the
>>> clone. But this becomes nasty if the a clone is created by more
than one
>>> optimizations. In that way, discriminator may not be fit for this
purpose.
>>>
>>>
>>>
>>> My understanding was that the encoding scheme would allow the
profiling
>>> analysis to correctly map execution data back to the original
source
>>> construct, while preserving the property that each distinct basic
block
>>> would have its own discriminator value.  That is, the execution
data would
>>> be attributed back to the original source construct, not whatever
each
>>> individual optimization had done to it, and the data for the
original
>>> source construct would correctly reflect the execution (e.g.
profiling says
>>> you got 82 hits on the original loop, rather than reporting 20 hits
on the
>>> unrolled-by-4 loop plus 1 each on 2 of the trailing copies).
>>>
>>>
>>>
>>> It sounds like Hal is thinking that the per-discriminator execution
info
>>> would be preserved down to the point where an individual
optimization could
>>> look at the profile for each piece, and make decisions on that
basis.
>>>
>>>
>>>
>>> I'm not clear how that would be possible, as the optimization
would have
>>> to first do the transform (or predict how it would do the
transform) in
>>> order to see which individual-discriminator counts mapped to which
actual
>>> blocks, and then make some kind of decision about whether to do the
>>> transform differently based on that information.  Then, if the
optimization
>>> did choose to do the transform differently, then that leaves the IR
in a
>>> state where the individual discriminators *cannot* map back to it. 
(Say
>>> you unroll by 2 instead of 4; then you have only 1 trailing copy,
not 3,
>>> and a discriminator that maps to the second trailing copy now maps
to
>>> nothing.  The individual-discriminator data becomes useless.)
>>>
>>>
>>>
>>> Am I expressing this well enough to show that what Hal is looking
for is
>>> not feasible?
>>>
>>> Yes, it will need to predict how the transformation would affect
the
>>> blocks produced. That does not seem problematic (at least at a
coarse
>>> level). Yes, if transformations made earlier in the pipeline make
different
>>> decisions, then that will invalidate later fine-grained data (at
least
>>> potentially). I don't see how any of this makes this
infeasible. We just
>>> need a way for the profiling counts, per descriminator, to remain
>>> available, and for the transformations themselves to know which
>>> discriminators (loop ids, or whatever) to consider.
>>>
>>>  -Hal
>>>
>>> --paulr
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Hal Finkel
>>> Lead, Compiler Technology and Programming Languages
>>> Leadership Computing Facility
>>> Argonne National Laboratory
>>>
>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161101/896183c4/attachment.html>

Hal Finkel via llvm-dev

2016-Nov-01 21:26 UTC

head link

[llvm-dev] (RFC) Encoding code duplication factor in discriminator

----- Original Message -----
> From: "Dehao Chen" <dehao at google.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Paul Robinson" <paul.robinson at sony.com>,
"Xinliang David Li"
> <davidxl at google.com>, "llvm-dev" <llvm-dev at
lists.llvm.org>
> Sent: Tuesday, November 1, 2016 4:14:43 PM
> Subject: Re: [llvm-dev] (RFC) Encoding code duplication factor in
> discriminator
> damn... my english is not readable at all when I try to write fast...
> trying to make some clarification below, hopefully can make it more
> readable...
> On Tue, Nov 1, 2016 at 2:07 PM, Dehao Chen < dehao at google.com >
> wrote:
> > Oops... pressed the wrong button and sent out early...
> 
> > On Tue, Nov 1, 2016 at 2:01 PM, Dehao Chen < dehao at google.com
>
> > wrote:
> 
> > > If Hal's proposal is for SamplePGO purpose, let me clarify
some
> > > design principles of SamplePGO.
> > 
> 
> > > The profile for sample pgo uses source location as the key to map
> > > the
> > > execution count back to IR. This design is based on the principle
> > > that we do not want the profile to be tightly couple with
> > > compiler
> > > IR. Instead, profile is simple an attribute of the source code.
> > > We
> > > have been benefited a lot from this design that the profile can
> > > easily be reused across different source versions and compiler
> > > versions, or even compilers.
> > 
> 
> > > That being said, the design to encode more info into
> > > discriminator
> > > does not mean that we will change the profile. The encoded info
> > > in
> > > discriminator will be handled by the create_llvm_prof tool, which
> > > combines counts from different clones of the same source code and
> > > generate the combined profile data. The output profile will not
> > > have
> > > any cloning/dupliaction bits at all. So for the initial example
> > > profile I provided, the output profile will be:
> > 
> 
> > #1: 10
> 
> > #3: 80
> 
> > Not:
> 
> > #1: 10
> 
> > #3.0x400: 70
> 
> > #3.0x10400: 5
> 
> > #3.0x20400: 3
> 
> > #3.0x30400: 2
> 
> > The goal of the proposed change, is to make profile more accurately
> > represent the attribute of the source.
> 
> > The non-goal of the proposed change, is to provide more context in
> > the profile to present the behavior of program in the context of
> > different context.
> 
> The non-goal of the proposed change, is to provide more context in
> the profile to present the behavior of program in different
> contexts.'Okay, but why? The information will be present in the profiles, why not encode
it in the metadata and let the optimizer decide when to sum it up? We should
even provide some utility functions that make this transparent for passes that
don't care about the distinction.
> > If we are pursuing more context-sensitive profile, that would be a
> > very different design. In this design, we will need to read profile
> > multiple times, and have the profile tightly coupled with the
> > compiler IR/pass manager. That is doable, but I don't think that
> > probably better suits instrumentation based PGO's domain.
Comments?
> 
> If we are pursuing more context-sensitive profile, that would be a
> very different design in which we need to read in profiles multiple
> times, and have the profile tightly coupled with the compiler
> IR/pass manager. That is doable, but that probably better suits
> instrumentation based PGO's domain. Comments?I don't understand why you say we'd need to read the profile multiple
times. Can you please explain? I also don't think it needs to be that
tightly coupled; we just need each pass that generates multiple copies of things
to have some way to generate a unique id for what it's doing. It's all
per source location, so I don't even think we need any kind of complicated
hashing scheme.

Thanks again, 
Hal 
> > Thanks,
> 
> > Dehao
> 
> > > On Tue, Nov 1, 2016 at 1:04 PM, Hal Finkel < hfinkel at
anl.gov >
> > > wrote:
> > 
> 
> > > > > From: "Paul Robinson" < paul.robinson at
sony.com >
> > > > 
> > > 
> > 
> 
> > > > > To: "Dehao Chen" < dehao at google.com
>, "Hal Finkel" <
> > > > > hfinkel at anl.gov
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > Cc: "Xinliang David Li" < davidxl at
google.com >,
> > > > > llvm-dev at lists.llvm.org
> > > > 
> > > 
> > 
> 
> > > > > Sent: Tuesday, November 1, 2016 2:15:38 PM
> > > > 
> > > 
> > 
> 
> > > > > Subject: RE: [llvm-dev] (RFC) Encoding code duplication
> > > > > factor
> > > > > in
> > > > > discriminator
> > > > 
> > > 
> > 
> 
> > > > > As illustrated in the above example, it is not like
> > > > > "vectorization
> > > > > has a distinct bit". All different optimizations
make clones
> > > > > of
> > > > > code
> > > > > which will be labeled by UIDs represented by N (e.g. 8)
bits.
> > > > > In
> > > > > this way, the space will be capped by the number of
clones
> > > > > all
> > > > > optimizations have made, instead of # of optimizations
that
> > > > > has
> > > > > applied. And it will be capped at 2^N-1. The cons of
using
> > > > > uid
> > > > > is
> > > > > that you will not know if a clone is coming from
> > > > > vectorization
> > > > > or
> > > > > unroll or loop distribution.
> > > > 
> > > 
> > 
> 
> > > > > Okay, but that kind of semantic mapping is important.
How
> > > > > should
> > > > > we
> > > > > encode/recover that information? To be clear, I'm
not saying
> > > > > that
> > > > > we
> > > > > need to implement that up front, but there needs to be
a
> > > > > clear
> > > > > path
> > > > > to an implementation, because I don't want to have
two
> > > > > disjoint
> > > > > schemes.
> > > > 
> > > 
> > 
> 
> > > > > You mean that you want to know which optimization
created the
> > > > > clone?
> > > > > How would you use that info? Looks to me this will
expose
> > > > > compiler
> > > > > implementation detail in debug info.
> > > > 
> > > 
> > 
> 
> > > > > This is still doable, assume we have 15 interesting
> > > > > optimizations
> > > > > to
> > > > > track, we can use 4 bits to encode the optimization
type that
> > > > > created the clone. But this becomes nasty if the a
clone is
> > > > > created
> > > > > by more than one optimizations. In that way,
discriminator
> > > > > may
> > > > > not
> > > > > be fit for this purpose.
> > > > 
> > > 
> > 
> 
> > > > > My understanding was that the encoding scheme would
allow the
> > > > > profiling analysis to correctly map execution data back
to
> > > > > the
> > > > > original source construct, while preserving the
property that
> > > > > each
> > > > > distinct basic block would have its own discriminator
value.
> > > > > That
> > > > > is, the execution data would be attributed back to the
> > > > > original
> > > > > source construct, not whatever each individual
optimization
> > > > > had
> > > > > done
> > > > > to it, and the data for the original source construct
would
> > > > > correctly reflect the execution (e.g. profiling says
you got
> > > > > 82
> > > > > hits
> > > > > on the original loop, rather than reporting 20 hits on
the
> > > > > unrolled-by-4 loop plus 1 each on 2 of the trailing
copies).
> > > > 
> > > 
> > 
> 
> > > > > It sounds like Hal is thinking that the
per-discriminator
> > > > > execution
> > > > > info would be preserved down to the point where an
individual
> > > > > optimization could look at the profile for each piece,
and
> > > > > make
> > > > > decisions on that basis.
> > > > 
> > > 
> > 
> 
> > > > > I'm not clear how that would be possible, as the
optimization
> > > > > would
> > > > > have to first do the transform (or predict how it would
do
> > > > > the
> > > > > transform) in order to see which
individual-discriminator
> > > > > counts
> > > > > mapped to which actual blocks, and then make some kind
of
> > > > > decision
> > > > > about whether to do the transform differently based on
that
> > > > > information. Then, if the optimization did choose to do
the
> > > > > transform differently, then that leaves the IR in a
state
> > > > > where
> > > > > the
> > > > > individual discriminators *cannot* map back to it. (Say
you
> > > > > unroll
> > > > > by 2 instead of 4; then you have only 1 trailing copy,
not 3,
> > > > > and
> > > > > a
> > > > > discriminator that maps to the second trailing copy now
maps
> > > > > to
> > > > > nothing. The individual-discriminator data becomes
useless.)
> > > > 
> > > 
> > 
> 
> > > > > Am I expressing this well enough to show that what Hal
is
> > > > > looking
> > > > > for
> > > > > is not feasible?
> > > > 
> > > 
> > 
> 
> > > > Yes, it will need to predict how the transformation would
> > > > affect
> > > > the
> > > > blocks produced. That does not seem problematic (at least at
a
> > > > coarse level). Yes, if transformations made earlier in the
> > > > pipeline
> > > > make different decisions, then that will invalidate later
> > > > fine-grained data (at least potentially). I don't see
how any
> > > > of
> > > > this makes this infeasible. We just need a way for the
> > > > profiling
> > > > counts, per descriminator, to remain available, and for the
> > > > transformations themselves to know which discriminators
(loop
> > > > ids,
> > > > or whatever) to consider.
> > > 
> > 
> 
> > > > -Hal
> > > 
> > 
> 
> > > > > --paulr
> > > > 
> > > 
> > 
> 
> > > > --
> > > 
> > 
> 
> > > > Hal Finkel
> > > 
> > 
> 
> > > > Lead, Compiler Technology and Programming Languages
> > > 
> > 
> 
> > > > Leadership Computing Facility
> > > 
> > 
> 
> > > > Argonne National Laboratory
> > > 
> > 
> 
-- 

Hal Finkel 
Lead, Compiler Technology and Programming Languages 
Leadership Computing Facility 
Argonne National Laboratory 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161101/079ed5b1/attachment.html>

Hal Finkel via llvm-dev

2016-Nov-01 21:36 UTC

head link

[llvm-dev] (RFC) Encoding code duplication factor in discriminator

----- Original Message -----
> From: "Hal Finkel via llvm-dev" <llvm-dev at
lists.llvm.org>
> To: "Dehao Chen" <dehao at google.com>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "Xinliang
David Li"
> <davidxl at google.com>
> Sent: Tuesday, November 1, 2016 4:26:17 PM
> Subject: Re: [llvm-dev] (RFC) Encoding code duplication factor in
> discriminator
> ----- Original Message -----
> > From: "Dehao Chen" <dehao at google.com>
> 
> > To: "Hal Finkel" <hfinkel at anl.gov>
> 
> > Cc: "Paul Robinson" <paul.robinson at sony.com>,
"Xinliang David Li"
> > <davidxl at google.com>, "llvm-dev" <llvm-dev at
lists.llvm.org>
> 
> > Sent: Tuesday, November 1, 2016 4:14:43 PM
> 
> > Subject: Re: [llvm-dev] (RFC) Encoding code duplication factor in
> > discriminator
> 
> > damn... my english is not readable at all when I try to write
> > fast...
> > trying to make some clarification below, hopefully can make it more
> > readable...
> 
> > On Tue, Nov 1, 2016 at 2:07 PM, Dehao Chen < dehao at google.com
>
> > wrote:
> 
> > > Oops... pressed the wrong button and sent out early...
> > 
> 
> > > On Tue, Nov 1, 2016 at 2:01 PM, Dehao Chen < dehao at
google.com >
> > > wrote:
> > 
> 
> > > > If Hal's proposal is for SamplePGO purpose, let me
clarify some
> > > > design principles of SamplePGO.
> > > 
> > 
> 
> > > > The profile for sample pgo uses source location as the key
to
> > > > map
> > > > the
> > > > execution count back to IR. This design is based on the
> > > > principle
> > > > that we do not want the profile to be tightly couple with
> > > > compiler
> > > > IR. Instead, profile is simple an attribute of the source
code.
> > > > We
> > > > have been benefited a lot from this design that the profile
can
> > > > easily be reused across different source versions and
compiler
> > > > versions, or even compilers.
> > > 
> > 
> 
> > > > That being said, the design to encode more info into
> > > > discriminator
> > > > does not mean that we will change the profile. The encoded
info
> > > > in
> > > > discriminator will be handled by the create_llvm_prof tool,
> > > > which
> > > > combines counts from different clones of the same source
code
> > > > and
> > > > generate the combined profile data. The output profile will
not
> > > > have
> > > > any cloning/dupliaction bits at all. So for the initial
example
> > > > profile I provided, the output profile will be:
> > > 
> > 
> 
> > > #1: 10
> > 
> 
> > > #3: 80
> > 
> 
> > > Not:
> > 
> 
> > > #1: 10
> > 
> 
> > > #3.0x400: 70
> > 
> 
> > > #3.0x10400: 5
> > 
> 
> > > #3.0x20400: 3
> > 
> 
> > > #3.0x30400: 2
> > 
> 
Also, how does this work for vectorization? For vectorization, you're going
to multiply by the duplication factor first? The issue obviously is that each
vector instruction counts for VF times as many scalar instructions, and so to
get equivalent scalar counts, you need to multiply by VF. I had assumed this is
what you were saying when I read the initial e-mail, but if you're also
using the same duplication scheme for unrolling, then I think we need some way
to differentiate.

-Hal 
> > > The goal of the proposed change, is to make profile more
> > > accurately
> > > represent the attribute of the source.
> > 
> 
> > > The non-goal of the proposed change, is to provide more context
> > > in
> > > the profile to present the behavior of program in the context of
> > > different context.
> > 
> 
> > The non-goal of the proposed change, is to provide more context in
> > the profile to present the behavior of program in different
> > contexts.'
> 
> Okay, but why? The information will be present in the profiles, why
> not encode it in the metadata and let the optimizer decide when to
> sum it up? We should even provide some utility functions that make
> this transparent for passes that don't care about the distinction.
> > > If we are pursuing more context-sensitive profile, that would be
> > > a
> > > very different design. In this design, we will need to read
> > > profile
> > > multiple times, and have the profile tightly coupled with the
> > > compiler IR/pass manager. That is doable, but I don't think
that
> > > probably better suits instrumentation based PGO's domain.
> > > Comments?
> > 
> 
> > If we are pursuing more context-sensitive profile, that would be a
> > very different design in which we need to read in profiles multiple
> > times, and have the profile tightly coupled with the compiler
> > IR/pass manager. That is doable, but that probably better suits
> > instrumentation based PGO's domain. Comments?
> 
> I don't understand why you say we'd need to read the profile
multiple
> times. Can you please explain? I also don't think it needs to be
> that tightly coupled; we just need each pass that generates multiple
> copies of things to have some way to generate a unique id for what
> it's doing. It's all per source location, so I don't even think
we
> need any kind of complicated hashing scheme.
> Thanks again,
> Hal
> > > Thanks,
> > 
> 
> > > Dehao
> > 
> 
> > > > On Tue, Nov 1, 2016 at 1:04 PM, Hal Finkel < hfinkel at
anl.gov >
> > > > wrote:
> > > 
> > 
> 
> > > > > > From: "Paul Robinson" < paul.robinson
at sony.com >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > To: "Dehao Chen" < dehao at
google.com >, "Hal Finkel" <
> > > > > > hfinkel at anl.gov
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > Cc: "Xinliang David Li" < davidxl at
google.com >,
> > > > > > llvm-dev at lists.llvm.org
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > Sent: Tuesday, November 1, 2016 2:15:38 PM
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > Subject: RE: [llvm-dev] (RFC) Encoding code
duplication
> > > > > > factor
> > > > > > in
> > > > > > discriminator
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > As illustrated in the above example, it is not
like
> > > > > > "vectorization
> > > > > > has a distinct bit". All different
optimizations make
> > > > > > clones
> > > > > > of
> > > > > > code
> > > > > > which will be labeled by UIDs represented by N
(e.g. 8)
> > > > > > bits.
> > > > > > In
> > > > > > this way, the space will be capped by the number
of clones
> > > > > > all
> > > > > > optimizations have made, instead of # of
optimizations that
> > > > > > has
> > > > > > applied. And it will be capped at 2^N-1. The cons
of using
> > > > > > uid
> > > > > > is
> > > > > > that you will not know if a clone is coming from
> > > > > > vectorization
> > > > > > or
> > > > > > unroll or loop distribution.
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > Okay, but that kind of semantic mapping is
important. How
> > > > > > should
> > > > > > we
> > > > > > encode/recover that information? To be clear,
I'm not
> > > > > > saying
> > > > > > that
> > > > > > we
> > > > > > need to implement that up front, but there needs
to be a
> > > > > > clear
> > > > > > path
> > > > > > to an implementation, because I don't want to
have two
> > > > > > disjoint
> > > > > > schemes.
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > You mean that you want to know which optimization
created
> > > > > > the
> > > > > > clone?
> > > > > > How would you use that info? Looks to me this will
expose
> > > > > > compiler
> > > > > > implementation detail in debug info.
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > This is still doable, assume we have 15
interesting
> > > > > > optimizations
> > > > > > to
> > > > > > track, we can use 4 bits to encode the
optimization type
> > > > > > that
> > > > > > created the clone. But this becomes nasty if the a
clone is
> > > > > > created
> > > > > > by more than one optimizations. In that way,
discriminator
> > > > > > may
> > > > > > not
> > > > > > be fit for this purpose.
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > My understanding was that the encoding scheme
would allow
> > > > > > the
> > > > > > profiling analysis to correctly map execution data
back to
> > > > > > the
> > > > > > original source construct, while preserving the
property
> > > > > > that
> > > > > > each
> > > > > > distinct basic block would have its own
discriminator
> > > > > > value.
> > > > > > That
> > > > > > is, the execution data would be attributed back to
the
> > > > > > original
> > > > > > source construct, not whatever each individual
optimization
> > > > > > had
> > > > > > done
> > > > > > to it, and the data for the original source
construct would
> > > > > > correctly reflect the execution (e.g. profiling
says you
> > > > > > got
> > > > > > 82
> > > > > > hits
> > > > > > on the original loop, rather than reporting 20
hits on the
> > > > > > unrolled-by-4 loop plus 1 each on 2 of the
trailing
> > > > > > copies).
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > It sounds like Hal is thinking that the
per-discriminator
> > > > > > execution
> > > > > > info would be preserved down to the point where an
> > > > > > individual
> > > > > > optimization could look at the profile for each
piece, and
> > > > > > make
> > > > > > decisions on that basis.
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > I'm not clear how that would be possible, as
the
> > > > > > optimization
> > > > > > would
> > > > > > have to first do the transform (or predict how it
would do
> > > > > > the
> > > > > > transform) in order to see which
individual-discriminator
> > > > > > counts
> > > > > > mapped to which actual blocks, and then make some
kind of
> > > > > > decision
> > > > > > about whether to do the transform differently
based on that
> > > > > > information. Then, if the optimization did choose
to do the
> > > > > > transform differently, then that leaves the IR in
a state
> > > > > > where
> > > > > > the
> > > > > > individual discriminators *cannot* map back to it.
(Say you
> > > > > > unroll
> > > > > > by 2 instead of 4; then you have only 1 trailing
copy, not
> > > > > > 3,
> > > > > > and
> > > > > > a
> > > > > > discriminator that maps to the second trailing
copy now
> > > > > > maps
> > > > > > to
> > > > > > nothing. The individual-discriminator data becomes
> > > > > > useless.)
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > Am I expressing this well enough to show that what
Hal is
> > > > > > looking
> > > > > > for
> > > > > > is not feasible?
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > Yes, it will need to predict how the transformation
would
> > > > > affect
> > > > > the
> > > > > blocks produced. That does not seem problematic (at
least at
> > > > > a
> > > > > coarse level). Yes, if transformations made earlier in
the
> > > > > pipeline
> > > > > make different decisions, then that will invalidate
later
> > > > > fine-grained data (at least potentially). I don't
see how any
> > > > > of
> > > > > this makes this infeasible. We just need a way for the
> > > > > profiling
> > > > > counts, per descriminator, to remain available, and for
the
> > > > > transformations themselves to know which discriminators
(loop
> > > > > ids,
> > > > > or whatever) to consider.
> > > > 
> > > 
> > 
> 
> > > > > -Hal
> > > > 
> > > 
> > 
> 
> > > > > > --paulr
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > --
> > > > 
> > > 
> > 
> 
> > > > > Hal Finkel
> > > > 
> > > 
> > 
> 
> > > > > Lead, Compiler Technology and Programming Languages
> > > > 
> > > 
> > 
> 
> > > > > Leadership Computing Facility
> > > > 
> > > 
> > 
> 
> > > > > Argonne National Laboratory
> > > > 
> > > 
> > 
> 
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-- 

Hal Finkel 
Lead, Compiler Technology and Programming Languages 
Leadership Computing Facility 
Argonne National Laboratory 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161101/c5948553/attachment-0001.html>

llvm dev - Nov 2016 - (RFC) Encoding code duplication factor in discriminator

[llvm-dev] (RFC) Encoding code duplication factor in discriminator

[llvm-dev] (RFC) Encoding code duplication factor in discriminator

[llvm-dev] (RFC) Encoding code duplication factor in discriminator