Rodrigo Caetano Rocha via llvm-dev
2018-Aug-02 15:58 UTC
[llvm-dev] New and more general Function Merging optimization for code size
Hi Hal,

Because my function merging strategy is able to merge any two functions, allowing for different CFGs, different parameters, etc., I am unable to use just a simple hash value to compare whether or not two functions are similar.

Therefore, the idea is to have an infrastructure that allows me to compare whether or not two functions are similar without having to traverse the two functions (basically performing a merge for all pairs).

I'm precomputing a fingerprint of all functions, which is then cached for later use (this might also be useful to enable this function merging with ThinLTO). At the moment, this fingerprint is just a map of opcode -> number of occurrences in the function, which is just a ~64-int array.

Then, for each function being considered for a merge, I'm able to rank the candidates with a PriorityQueue. Hopefully, we are able to do that in a very lightweight manner.

After that, the more expensive bit will be actually performing the merge and then checking for profitability, using the TTI for code size.

I haven't given much thought to adapting this infrastructure for the SLP Vectorizer, but perhaps something similar could also work there.

Cheers,

Rodrigo Rocha

On Thu, 2 Aug 2018 at 16:43 Hal Finkel <hfinkel at anl.gov> wrote:
>
> On 08/02/2018 10:25 AM, Rodrigo Caetano Rocha via llvm-dev wrote:
> > Hi everyone,
> >
> > I'm currently working on a new function merging optimization that is more general than the current LLVM function merging optimization, which works only on identical functions.
> >
> > I would like to know if the community has any interest in having a more powerful function merging optimization.
>
> Yes, I think there is certainly interest in this space.
>
> > ---- More Details ----
> >
> > Up until now, I have been focusing on the quality of the code reduction.
> >
> > Some preliminary results on SPEC'06 in a full LTO fashion:
> > The baseline has no function merging but the optimization pipeline is the same. I am comparing my function merging with LLVM's identical function merging, where everything else in the optimization pipeline is the same as the baseline.
> > Average reduction in the final executable file over the baseline: 5.55%, compared to 0.49% for the identical merge.
> > Average reduction in the total number of instructions over the baseline: 7.04%, compared to 0.47% for the identical merge.
> >
> > The highest reduction in the executable file is about 20% (both 429.mcf and 447.dealII), and the highest reduction in the total number of instructions is about 37% (447.dealII).
> >
> > It has an average slowdown of about 1%, but shows no statistical difference from the baseline in most of the benchmarks in the SPEC'06 suite.
> >
> > Because this new function merging technique is able to merge any pair of functions, except for a few restrictions, the exploration strategy is critical for achieving an acceptable compilation time.
> >
> > At the moment I'm starting to focus more on simplifying the optimization and reducing the compilation-time overhead.
> > My optimization has an exploration threshold which can be tuned to trade off compilation-time overhead for a more aggressive merge.
> > It does *not* perform n^2 merge operations. Instead, I have a ranking strategy based on a similarity metric computed from the function's "fingerprint".
>
> Can you explain in more detail how this works?
>
> (I'm also, as a side note, keeping my eye out for things like this that might also help us do more-compile-time-efficient SLP vectorization.)
>
> -Hal
>
> > The threshold limits the exploration to focus on the top functions of the rank.
> > The idea is to make the ranking mechanism as lightweight as possible.
> >
> > Cheers,
> >
> > Rodrigo Rocha
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
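The fingerprint-and-ranking scheme described in the message above can be sketched roughly as follows. This is only an illustration, not the actual LLVM implementation: the opcode names, the Manhattan-distance similarity metric, and the threshold value are all assumptions.

```python
import heapq
from collections import Counter

def fingerprint(opcodes):
    """Map opcode -> number of occurrences: a cheap, cacheable summary."""
    return Counter(opcodes)

def distance(fp1, fp2):
    """Manhattan distance between two opcode-count vectors (assumed metric).
    Lower means more similar; 0 means identical opcode profiles."""
    keys = set(fp1) | set(fp2)
    return sum(abs(fp1[k] - fp2[k]) for k in keys)

def rank_candidates(target_fp, candidates, threshold=3):
    """Rank every candidate by similarity to the target and keep only the
    top `threshold` entries, mimicking the exploration threshold."""
    heap = [(distance(target_fp, fp), name) for name, fp in candidates.items()]
    return heapq.nsmallest(threshold, heap)

# Hypothetical opcode streams for three functions.
funcs = {
    "f": ["add", "mul", "br", "ret"],
    "g": ["add", "mul", "br", "br", "ret"],   # close to f
    "h": ["call", "load", "store", "ret"],    # very different from f
}
fps = {name: fingerprint(ops) for name, ops in funcs.items()}

ranked = rank_candidates(fps["f"], {n: fp for n, fp in fps.items() if n != "f"})
print(ranked)  # g ranks ahead of h
```

Only the best-ranked pairs would then go through the expensive merge-and-cost-check step; the fingerprints themselves can be computed once per function and cached.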
JF Bastien via llvm-dev
2018-Aug-02 17:30 UTC
[llvm-dev] New and more general Function Merging optimization for code size
> On Aug 2, 2018, at 8:58 AM, Rodrigo Caetano Rocha via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Hi Hal,
>
> Because my function merging strategy is able to merge any two functions, allowing for different CFGs, different parameters, etc.,
> I am unable to use just a simple hash value to compare whether or not two functions are similar.

Can you give us more detail on what criteria you use for “fuzzy” merging? Jason Koenig had uploaded a prototype of something similar in 2015, based on mergefuncs.

> Therefore, the idea is to have an infrastructure which allows me to compare whether or not two functions are similar without having to traverse the two functions (basically performing a merge for all pairs).
> I'm precomputing a fingerprint of all functions, which is then cached for later use (this might also be useful to enable this function merging with ThinLTO).
> At the moment, this fingerprint is just a map of opcode -> number of occurrences in the function, which is just a ~64-int array.

The difficulty with mergefuncs is keeping its comparator / hasher in sync with the IR. All it takes is one new IR property to break it, and I think the only way to fix this issue is to have comparison / hashing be part of the IR definition. How would you solve this issue?

> Then, for each function being considered for a merge, I'm able to rank the candidates with a PriorityQueue.
>
> Hopefully, we are able to do that in a very lightweight manner.
>
> After that, the more expensive bit will be actually performing the merge and then checking for profitability, using the TTI for code size.
>
> I haven't given much thought to adapting this infrastructure for the SLP Vectorizer, but perhaps something similar could also work there.
> Cheers,
>
> Rodrigo Rocha
>
> On Thu, 2 Aug 2018 at 16:43 Hal Finkel <hfinkel at anl.gov> wrote:
>
> On 08/02/2018 10:25 AM, Rodrigo Caetano Rocha via llvm-dev wrote:
>> Hi everyone,
>>
>> I'm currently working on a new function merging optimization that is more general than the current LLVM function merging optimization, which works only on identical functions.
>>
>> I would like to know if the community has any interest in having a more powerful function merging optimization.
>
> Yes, I think there is certainly interest in this space.

Yes.

I'd also be interested in hearing about how this combines with the MachineOutliner. I expect they find some redundant things, but mostly help each other.

>> ---- More Details ----
>>
>> Up until now, I have been focusing on the quality of the code reduction.
>>
>> Some preliminary results on SPEC'06 in a full LTO fashion:
>> The baseline has no function merging but the optimization pipeline is the same. I am comparing my function merging with LLVM's identical function merging, where everything else in the optimization pipeline is the same as the baseline.
>> Average reduction in the final executable file over the baseline: 5.55%, compared to 0.49% for the identical merge.
>> Average reduction in the total number of instructions over the baseline: 7.04%, compared to 0.47% for the identical merge.

IIRC this roughly matches Jason’s results on Chrome / Firefox: a few percentage points of reduction. Can you try large applications like Chrome / Firefox / WebKit to get more real-world numbers? It’s interesting to compare what you get from regular builds as well as LTO builds (which will take forever, but expose much more duplication).

>> The highest reduction in the executable file is about 20% (both 429.mcf and 447.dealII), and the highest reduction in the total number of instructions is about 37% (447.dealII).
>>
>> It has an average slowdown of about 1%, but shows no statistical difference from the baseline in most of the benchmarks in the SPEC'06 suite.

Jason’s (uncommitted) work found *speedups* when compiling all of Chrome. The way he did this was with an early and fast mergefuncs which didn’t try to be fuzzy: it just tried to remove code duplication, which means the optimizer spent less time because there were fewer functions. He then had a later mergefuncs which did fuzzy matching and tried to pick up more things. Keeping it somewhat later is important because merging similar functions might make them less attractive for inlining (because the merged functions are now slightly more complex).

>> Because this new function merging technique is able to merge any pair of functions, except for a few restrictions, the exploration strategy is critical for achieving an acceptable compilation time.
>>
>> At the moment I'm starting to focus more on simplifying the optimization and reducing the compilation-time overhead.
>> My optimization has an exploration threshold which can be tuned to trade off compilation-time overhead for a more aggressive merge.
>> It does *not* perform n^2 merge operations. Instead, I have a ranking strategy based on a similarity metric computed from the function's "fingerprint".
>
> Can you explain in more detail how this works?
>
> (I'm also, as a side note, keeping my eye out for things like this that might also help us do more-compile-time-efficient SLP vectorization.)
>
> -Hal
>
>> The threshold limits the exploration to focus on the top functions of the rank.
>> The idea is to make the ranking mechanism as lightweight as possible.
>>
>> Cheers,
>>
>> Rodrigo Rocha
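The "fuzzy" merging and the inlining concern discussed above can be illustrated with a toy source-level example. This is only a sketch of the general idea, not LLVM's actual IR-level transformation: two nearly identical functions are folded into one parameterized function plus thin wrappers, with a selector argument choosing the differing operation. It also shows why the merged function is slightly more complex than either original, which is what makes it less attractive for inlining.

```python
# Two similar functions that differ in only one operation.
def scale_add(xs, k):
    return [x * k + 1 for x in xs]

def scale_sub(xs, k):
    return [x * k - 1 for x in xs]

# A merged version with a selector parameter: the shared code is emitted
# once, and the differing operation is chosen by an extra argument.
def scale_merged(xs, k, is_add):
    return [x * k + (1 if is_add else -1) for x in xs]

# Thin wrappers (thunks) preserve the original interfaces for callers.
def scale_add2(xs, k):
    return scale_merged(xs, k, True)

def scale_sub2(xs, k):
    return scale_merged(xs, k, False)
```

A profitability check (using the TTI cost model, in the real pass) would then compare the size of the merged function plus thunks against the two originals and keep whichever is smaller.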
Matthias Braun via llvm-dev
2018-Aug-02 20:34 UTC
[llvm-dev] New and more general Function Merging optimization for code size
How does it compare to the new machine outliner pass in LLVM?

https://www.youtube.com/watch?v=yorld-WSOeU
http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html

- Matthias

> On Aug 2, 2018, at 10:30 AM, JF Bastien via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> On Aug 2, 2018, at 8:58 AM, Rodrigo Caetano Rocha via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> Hi Hal,
>>
>> Because my function merging strategy is able to merge any two functions, allowing for different CFGs, different parameters, etc.,
>> I am unable to use just a simple hash value to compare whether or not two functions are similar.
>
> Can you give us more detail on what criteria you use for “fuzzy” merging? Jason Koenig had uploaded a prototype of something similar in 2015, based on mergefuncs.
>
>> Therefore, the idea is to have an infrastructure which allows me to compare whether or not two functions are similar without having to traverse the two functions (basically performing a merge for all pairs).
>> I'm precomputing a fingerprint of all functions, which is then cached for later use (this might also be useful to enable this function merging with ThinLTO).
>> At the moment, this fingerprint is just a map of opcode -> number of occurrences in the function, which is just a ~64-int array.
>
> The difficulty with mergefuncs is keeping its comparator / hasher in sync with the IR. All it takes is one new IR property to break it, and I think the only way to fix this issue is to have comparison / hashing be part of the IR definition. How would you solve this issue?
>
>> Then, for each function being considered for a merge, I'm able to rank the candidates with a PriorityQueue.
>>
>> Hopefully, we are able to do that in a very lightweight manner.
>>
>> After that, the more expensive bit will be actually performing the merge and then checking for profitability, using the TTI for code size.
>>
>> I haven't given much thought to adapting this infrastructure for the SLP Vectorizer, but perhaps something similar could also work there.
>>
>> Cheers,
>>
>> Rodrigo Rocha
>>
>> On Thu, 2 Aug 2018 at 16:43 Hal Finkel <hfinkel at anl.gov> wrote:
>>
>> On 08/02/2018 10:25 AM, Rodrigo Caetano Rocha via llvm-dev wrote:
>>> Hi everyone,
>>>
>>> I'm currently working on a new function merging optimization that is more general than the current LLVM function merging optimization, which works only on identical functions.
>>>
>>> I would like to know if the community has any interest in having a more powerful function merging optimization.
>>
>> Yes, I think there is certainly interest in this space.
>
> Yes.
>
> I'd also be interested in hearing about how this combines with the MachineOutliner. I expect they find some redundant things, but mostly help each other.
>
>>> ---- More Details ----
>>>
>>> Up until now, I have been focusing on the quality of the code reduction.
>>>
>>> Some preliminary results on SPEC'06 in a full LTO fashion:
>>> The baseline has no function merging but the optimization pipeline is the same. I am comparing my function merging with LLVM's identical function merging, where everything else in the optimization pipeline is the same as the baseline.
>>> Average reduction in the final executable file over the baseline: 5.55%, compared to 0.49% for the identical merge.
>>> Average reduction in the total number of instructions over the baseline: 7.04%, compared to 0.47% for the identical merge.
>
> IIRC this roughly matches Jason’s results on Chrome / Firefox: a few percentage points of reduction. Can you try large applications like Chrome / Firefox / WebKit to get more real-world numbers?
> It’s interesting to compare what you get from regular builds as well as LTO builds (which will take forever, but expose much more duplication).
>
>>> The highest reduction in the executable file is about 20% (both 429.mcf and 447.dealII), and the highest reduction in the total number of instructions is about 37% (447.dealII).
>>>
>>> It has an average slowdown of about 1%, but shows no statistical difference from the baseline in most of the benchmarks in the SPEC'06 suite.
>
> Jason’s (uncommitted) work found *speedups* when compiling all of Chrome. The way he did this was with an early and fast mergefuncs which didn’t try to be fuzzy: it just tried to remove code duplication, which means the optimizer spent less time because there were fewer functions. He then had a later mergefuncs which did fuzzy matching and tried to pick up more things. Keeping it somewhat later is important because merging similar functions might make them less attractive for inlining (because the merged functions are now slightly more complex).
>
>>> Because this new function merging technique is able to merge any pair of functions, except for a few restrictions, the exploration strategy is critical for achieving an acceptable compilation time.
>>>
>>> At the moment I'm starting to focus more on simplifying the optimization and reducing the compilation-time overhead.
>>> My optimization has an exploration threshold which can be tuned to trade off compilation-time overhead for a more aggressive merge.
>>> It does *not* perform n^2 merge operations. Instead, I have a ranking strategy based on a similarity metric computed from the function's "fingerprint".
>>
>> Can you explain in more detail how this works?
>>
>> (I'm also, as a side note, keeping my eye out for things like this that might also help us do more-compile-time-efficient SLP vectorization.)
>>
>> -Hal
>>
>>> The threshold limits the exploration to focus on the top functions of the rank.
>>> The idea is to make the ranking mechanism as lightweight as possible.
>>>
>>> Cheers,
>>>
>>> Rodrigo Rocha
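For contrast with the MachineOutliner that Matthias asks about: outlining extracts repeated instruction *sequences* into a new function, whereas function merging unifies whole functions, which is why the two mostly find different redundancy. A toy sketch of the outlining idea follows; the instruction names, the fixed window size, and the greedy replacement are illustrative assumptions, not the real pass's suffix-tree-based candidate search.

```python
from collections import Counter

def outline(seq, window=3):
    """Find the most frequent length-`window` subsequence and replace each
    occurrence with a call to an outlined function (toy model of the
    MachineOutliner's find-repeats-then-replace structure)."""
    counts = Counter(tuple(seq[i:i + window])
                     for i in range(len(seq) - window + 1))
    best, n = counts.most_common(1)[0]
    if n < 2:                       # outlining only pays off for repeats
        return seq, None
    out, i = [], 0
    while i < len(seq):
        if tuple(seq[i:i + window]) == best:
            out.append("call OUTLINED_0")   # hypothetical outlined symbol
            i += window
        else:
            out.append(seq[i])
            i += 1
    return out, list(best)

# A hypothetical instruction stream with one repeated triple.
seq = ["mov", "add", "str", "mov", "add", "str", "ret"]
new_seq, body = outline(seq)
print(new_seq)  # the repeated triple becomes two calls
```

A real outliner would also weigh the call overhead against the bytes saved, much like the TTI-based profitability check described for function merging above.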