Daniel Berlin wrote:> On Sun, Jul 11, 2010 at 3:12 AM, Nick Lewycky<nicholas at mxc.ca>
wrote:
>> I'd like to turn on the function merging pass by default (patch
attached).
>> Function merging is already in the tree and I've been testing it
locally for
>> a while. It works great and doesn't do undue harm to my build
times.
>>
>> Functions are merged when they are identical under a comparison that
treats
>> all pointers as equivalent. So i8* and i32* are considered equivalent
types,
>> so long as you never load one which would produce an i8 as that's
different
>> from i32.
>>
>> First of all, function merging is good because it produces smaller
binaries.
>> Doing it in the IR (say, instead of the linker) also saves us the
compile
>> time of emitting duplicate functions. When we can't remove a
function
>> mergefuncs leaves a tail-call stub and updates all callers to a
canonical
>> choice, improving the instruction cache hit rate.
>>
>> The bad side is that the process is inherently O(n^2).
> Wait what?
> Why?
>
> <I took a look at merge functions>
>
> 1. It's not really N^2 unless everything hashes to the same value.
> 2. You can even do better, just do a cheesy O(N) value numbering pass
> on the function, and include the number of values generated. Then
> your compare trivially fails quickly if you do it in reverse and they
> aren't equal.
> 3. You can do even better than this, even without value numbering
> There is no need for true single pair wise comparison of every
> function against every other function in the bucket, you can compare
> them all x at a time, using a variant of the multi-way merge
> algorithm.
>
> Multi-way partition algorithm for the bucket.
>
> Start -
> For all functions in the bucket:
> put IR operation into an array that is (number of functions in
> bucket) elements.
> Sort array
>
> Everything that only occurs once in this array can be dropped
> completely, it is unmergeable.
> Split the array into multiple arrays where all elements in a given
> array are equal (O(number of functions time, since it requires a
> single array walk)).
>
> Replace all elements in each array with the next IR operation from
> each function.
> Repeat from start with each ir operation.
>
> When you are done you now have a set of arrays where all elements of a
> given array can be merged.
>
> There is even some splittable set data structure (basically opposite
> of union-find) that escapes me at the moment to make this all really
> fast.
>
> You can also, just like multi-way merge, be a little more complex and
> instead of sort array (N log N), do
> put IR operation into a heap for each function (the only property you
> need from ordering is that equal elements end up next to each other)
> extract min on heap repeatedly into a new heap, and switch to a new
> heap whenever the value is not the same as the last one, so you end up
> with a forest of heaps.
> Destroy all heaps with one element.
> Repeat just like above.
>
>
> Right now you are comparing functions without taking advantage of the
> knowledge you gained from the last functions. You are basically doing
> multi-way list partitioning by pairwise comparison, instead of
> all-ways comparison. :)
Replying to llvm-dev so that everyone can see your excellent critique. :)
Thanks. I recall why I didn't consider doing multi-way comparison, and
that's because I had rejected anything that would use up a little more
memory. Which is thoroughly stupid in retrospect.
I'd still like to turn it on as soon as I've implemented Chris'
review
comments, but I think it's clear that removing O(n^2) from it is the
next step. In the event that it hurts someone's compile time, we can
turn it right back off.
Nick