Duncan P. N. Exon Smith
2014-Jan-22 16:08 UTC
[LLVMdev] MergeFunctions: reduce complexity to O(log(N))
On 2014 Jan 22, at 07:35, Stepan Dyatkovskiy <stpworld at narod.ru> wrote:> Hi Raul and Duncan, > > Duncan, > Thank you for review. I hope to present fixed patch tomorrow. > > First, I would like to show few performance results: > > command: "time opt -mergefunc <test>" > > File: tramp3d-v4.ll, 12963 functions > Current : Patched > Time: 9.8 s : 2.0 secs ; >400%!! > Functions merged: 3503 : 3507 > > File: spirit.ll, 2885 functions > Current : Patched > Time: 2.2 s : 0.5 s > Functions merged: 1503 : 1505 > > File: k.ll, 2788 functions > Current : Patched > Time: 1.5 s : 0.7 s > Functions merged: 1626 : 1626 > > File: sqlite3.ll, 1057 functions > Current : Patched > Time: 0.4 s : 0.4 s > Functions merged: 7 : 7 > > All the tests were token from test-suite object directory. In average it shows >200% performance improvement.These results look promising! When you say 200% average, do you mean 200% average for all files in test-suite, or just these four? Are there any negative results?> You could also see that patched version detects a bit more functions for merging, since it has improved constant-equivalence detection.1. I’d prefer to see the improvement to constant-equivalence detection as a separate initial patch (to the old code). Is that possible? In particular, it would be reassuring to see your patch *not* change the number of functions merged. 2. I’m also still concerned about correctness. Previously, when the comparison result was unknown, returning “false” would conservatively consider the functions distinct, so we had no correctness issues. However, with your patch, *everything* must be accurately compared, right? Are there any cases where you can’t accurately order functions? (Is FunctionComparator::cmpEnumerate an example?) If these are problems, maybe you can use the previous ==-style comparison to confirm.> Raul, > > Yes. you right it is O(N*log(N)). And yes, I have some ideas about hashing. > Though in the end of hashing improvement I still see the tree. I'll explain. > > Once you calculated hash (even really good one), you still have probability of N>1 functions with the same hash. Of course it could be really low. But then, you have to spend some time just being creating such a complex hash (definitely, you have to scan whole function body). > > I think there should be sequence of short hash numbers. In another words we can try to represent function as a chain of hash codes: > F0: hashF0-0, hashF0-1, hashF0-2, ... > F1: hashF1-0, hashF1-1, ... > > In this case you can create kind of hash-tree. Consider we have F0, F1 and F2. > > Imagine, hashF0-0 == hashF1-0 == hashF2-0, hashF0-1 == hashF1-1: > > Then we can build the next tree (requires fixed-width fonts): > > [hashF0-0] > / \ > [hashF0-1] [hashF2-1] > / \ \ > [hashF0-2] [hashF1-2] [hashF2-2] > > In this case as you can see, whole process would be of complexity O(N*n), where: > > N - number of functions > n - function size. > > Note, hashF[i]-[j] could be generated on demand. It is not obvious to generate whole hash at once. > > I have also implemented this approach. Though it is looks more complex, and not so fast, as I hoped. Perhaps I can make it simplier. But it could not be simpler that just to replace "bool" with -1,0,1. Actually I really wanted to propose idea with hash-trees, since it combines advantages of hashing and tree-based lookup routines. But then I discovered idea I presented finally, and saw that it shows almost same improvement and looks much simpler. > > -StepanI think you get diminishing returns for each level of the tree. I’d be interested in seeing numbers on a simpler 2-stage hash. - Leave the first hash as it is now in the code. - Add a secondary hash that’s built from the quantities that the equals comparison (or your new less-than comparison) is based on. - Fall back on O(n^2) for the third stage when there are collisions.
Stepan Dyatkovskiy
2014-Jan-22 19:24 UTC
[LLVMdev] MergeFunctions: reduce complexity to O(log(N))
Hi Duncan,> These results look promising! When you say 200% average, do you mean > 200% average for all files in test-suite, or just these four? Are > there any negative results?OK, now its clear we need really full statistics. I have one, but for old revision. In general, this patch only updates result type, and it really don't do some extra calculations (except the constants, but below I'll explain, why its necessary). Few words about accuracy and correctness. When old algorithm says "these two parts differs", new one says only a bit more: which part is less. Two things outcomes from that approach: 1. Each time when we determine difference in old algorithm, we also determine difference in new one. If less comparison implemented properly, that also means that result would be 100% the same. So the correctness is provided by correctness of "less" comparison: all order-relation properties should be implemented and proven. 2. The changes are so small (only few "if" statements for each comparison method), so its almost impossible to get it working slower than old one. Even in worst case, when difference lies in the very end of functions, we spent almost the same cpu time. Just as example: - if (Ty1->getTypeID() != Ty2->getTypeID()) - return false; + if (int Res = cmpNumbers(Ty1->getTypeID(), Ty2->getTypeID())) + return Res; So, when there is nothing to merge, we got the same performance results, like for old approach. When there are some equivalent functions that could be merged, new approach gives performance improvement. Now about constants. I'll prepare separate patch. In comparison with old approach, it compares constant contents. It is replacement of bitcast possibility check. And yes, its the slowest place of new approach. Here we need to understand which constant is actually greater, and which one is less. Though it could be implemented like that: if (C1 != ConstantExpr::getBitCast(const_cast<Constant*>(C2), C1->getType())) return cmpConstantContents(C1, C2); // to to next checks The only thing to be reassured here: whether getBitCast is transitive. About hash. Thats reasonable I'll try it. Full statistics is coming soon.. -Stepan 22.01.2014, 20:09, "Duncan P. N. Exon Smith" <dexonsmith at apple.com>:> On 2014 Jan 22, at 07:35, Stepan Dyatkovskiy <stpworld at narod.ru> wrote: > >> Hi Raul and Duncan, >> >> Duncan, >> Thank you for review. I hope to present fixed patch tomorrow. >> >> First, I would like to show few performance results: >> >> command: "time opt -mergefunc <test>" >> >> File: tramp3d-v4.ll, 12963 functions >> Current : Patched >> Time: 9.8 s : 2.0 secs ; >400%!! >> Functions merged: 3503 : 3507 >> >> File: spirit.ll, 2885 functions >> Current : Patched >> Time: 2.2 s : 0.5 s >> Functions merged: 1503 : 1505 >> >> File: k.ll, 2788 functions >> Current : Patched >> Time: 1.5 s : 0.7 s >> Functions merged: 1626 : 1626 >> >> File: sqlite3.ll, 1057 functions >> Current : Patched >> Time: 0.4 s : 0.4 s >> Functions merged: 7 : 7 >> >> All the tests were token from test-suite object directory. In average it shows >200% performance improvement. > > These results look promising! When you say 200% average, do you mean > 200% average for all files in test-suite, or just these four? Are > there any negative results? > >> You could also see that patched version detects a bit more functions for merging, since it has improved constant-equivalence detection. > > 1. I’d prefer to see the improvement to constant-equivalence detection > as a separate initial patch (to the old code). Is that possible? > In particular, it would be reassuring to see your patch *not* change > the number of functions merged. > > 2. I’m also still concerned about correctness. Previously, when the > comparison result was unknown, returning “false” would conservatively > consider the functions distinct, so we had no correctness issues. > > However, with your patch, *everything* must be accurately compared, > right? Are there any cases where you can’t accurately order > functions? (Is FunctionComparator::cmpEnumerate an example?) > > If these are problems, maybe you can use the previous ==-style > comparison to confirm. > >> Raul, >> >> Yes. you right it is O(N*log(N)). And yes, I have some ideas about hashing. >> Though in the end of hashing improvement I still see the tree. I'll explain. >> >> Once you calculated hash (even really good one), you still have probability of N>1 functions with the same hash. Of course it could be really low. But then, you have to spend some time just being creating such a complex hash (definitely, you have to scan whole function body). >> >> I think there should be sequence of short hash numbers. In another words we can try to represent function as a chain of hash codes: >> F0: hashF0-0, hashF0-1, hashF0-2, ... >> F1: hashF1-0, hashF1-1, ... >> >> In this case you can create kind of hash-tree. Consider we have F0, F1 and F2. >> >> Imagine, hashF0-0 == hashF1-0 == hashF2-0, hashF0-1 == hashF1-1: >> >> Then we can build the next tree (requires fixed-width fonts): >> >> [hashF0-0] >> / \ >> [hashF0-1] [hashF2-1] >> / \ \ >> [hashF0-2] [hashF1-2] [hashF2-2] >> >> In this case as you can see, whole process would be of complexity O(N*n), where: >> >> N - number of functions >> n - function size. >> >> Note, hashF[i]-[j] could be generated on demand. It is not obvious to generate whole hash at once. >> >> I have also implemented this approach. Though it is looks more complex, and not so fast, as I hoped. Perhaps I can make it simplier. But it could not be simpler that just to replace "bool" with -1,0,1. Actually I really wanted to propose idea with hash-trees, since it combines advantages of hashing and tree-based lookup routines. But then I discovered idea I presented finally, and saw that it shows almost same improvement and looks much simpler. >> >> -Stepan > > I think you get diminishing returns for each level of the tree. I’d be > interested in seeing numbers on a simpler 2-stage hash. > > - Leave the first hash as it is now in the code. > - Add a secondary hash that’s built from the quantities that the equals > comparison (or your new less-than comparison) is based on. > - Fall back on O(n^2) for the third stage when there are collisions.
Stepan Dyatkovskiy
2014-Jan-23 18:01 UTC
[LLVMdev] MergeFunctions: reduce complexity to O(log(N))
Hi Duncan, Statistics. Is it OK I sent it in CSV format? Below few improvement numbers are presented. Next formulae was used: Improvement = (OldTime/NewTime - 1)*100%: 1. Overall (even with cases when there is nothing to merge): 21% 2. When >100 functions were merged: 95% (this is what I meant in previous post, when there is job for -mergefunc, new patch does it 2 times faster) 3. >500 functions (though there are only 3 modules): 127%. 4. Total number of functions: ~60000 (currently this number is absent in statistics, it just based on previous analysis) 5. Functions were merged with patch: 8342 6. Functions were merged without patch: 8345 7. So taking into account #5 and #6, we can conclude, that MergeFunctions catches about 10-14% of cases (though I need to recheck this number). Since it is not possible to establish order relation on cross-referenced functions, here we can see that new patch can't merge 3 functions. Though it is only 0.03%. All .ll files were token from test-suite object dir. I'm working on more detailed statistics. Once statistics script is written, it is easy to bring more details into this research. P.S.: New patches are coming also... -Stepan. Stepan Dyatkovskiy wrote:> Hi Duncan, >> These results look promising! When you say 200% average, do you mean >> 200% average for all files in test-suite, or just these four? Are >> there any negative results? > OK, now its clear we need really full statistics. I have one, but for old revision. > In general, this patch only updates result type, and it really don't do some extra calculations (except the constants, but below I'll explain, why its necessary). > > Few words about accuracy and correctness. > When old algorithm says "these two parts differs", new one says only a bit more: which part is less. Two things outcomes from that approach: > > 1. Each time when we determine difference in old algorithm, we also determine difference in new one. If less comparison implemented properly, that also means that result would be 100% the same. > So the correctness is provided by correctness of "less" comparison: all order-relation properties should be implemented and proven. > > 2. The changes are so small (only few "if" statements for each comparison method), so its almost impossible to get it working slower than old one. Even in worst case, when difference lies in the very end of functions, we spent almost the same cpu time. Just as example: > - if (Ty1->getTypeID() != Ty2->getTypeID()) > - return false; > + if (int Res = cmpNumbers(Ty1->getTypeID(), Ty2->getTypeID())) > + return Res; > > So, when there is nothing to merge, we got the same performance results, like for old approach. > When there are some equivalent functions that could be merged, new approach gives performance improvement. > > Now about constants. > I'll prepare separate patch. In comparison with old approach, it compares constant contents. It is replacement of bitcast possibility check. And yes, its the slowest place of new approach. Here we need to understand which constant is actually greater, and which one is less. Though it could be implemented like that: > > if (C1 != ConstantExpr::getBitCast(const_cast<Constant*>(C2), C1->getType())) > return cmpConstantContents(C1, C2); > // to to next checks > > The only thing to be reassured here: whether getBitCast is transitive. > > About hash. > Thats reasonable I'll try it. > > Full statistics is coming soon.. > > -Stepan > > 22.01.2014, 20:09, "Duncan P. N. Exon Smith" <dexonsmith at apple.com>: >> On 2014 Jan 22, at 07:35, Stepan Dyatkovskiy <stpworld at narod.ru> wrote: >> >>> Hi Raul and Duncan, >>> >>> Duncan, >>> Thank you for review. I hope to present fixed patch tomorrow. >>> >>> First, I would like to show few performance results: >>> >>> command: "time opt -mergefunc <test>" >>> >>> File: tramp3d-v4.ll, 12963 functions >>> Current : Patched >>> Time: 9.8 s : 2.0 secs ; >400%!! >>> Functions merged: 3503 : 3507 >>> >>> File: spirit.ll, 2885 functions >>> Current : Patched >>> Time: 2.2 s : 0.5 s >>> Functions merged: 1503 : 1505 >>> >>> File: k.ll, 2788 functions >>> Current : Patched >>> Time: 1.5 s : 0.7 s >>> Functions merged: 1626 : 1626 >>> >>> File: sqlite3.ll, 1057 functions >>> Current : Patched >>> Time: 0.4 s : 0.4 s >>> Functions merged: 7 : 7 >>> >>> All the tests were token from test-suite object directory. In average it shows >200% performance improvement. >> >> These results look promising! When you say 200% average, do you mean >> 200% average for all files in test-suite, or just these four? Are >> there any negative results? >> >>> You could also see that patched version detects a bit more functions for merging, since it has improved constant-equivalence detection. >> >> 1. I’d prefer to see the improvement to constant-equivalence detection >> as a separate initial patch (to the old code). Is that possible? >> In particular, it would be reassuring to see your patch *not* change >> the number of functions merged. >> >> 2. I’m also still concerned about correctness. Previously, when the >> comparison result was unknown, returning “false” would conservatively >> consider the functions distinct, so we had no correctness issues. >> >> However, with your patch, *everything* must be accurately compared, >> right? Are there any cases where you can’t accurately order >> functions? (Is FunctionComparator::cmpEnumerate an example?) >> >> If these are problems, maybe you can use the previous ==-style >> comparison to confirm. >> >>> Raul, >>> >>> Yes. you right it is O(N*log(N)). And yes, I have some ideas about hashing. >>> Though in the end of hashing improvement I still see the tree. I'll explain. >>> >>> Once you calculated hash (even really good one), you still have probability of N>1 functions with the same hash. Of course it could be really low. But then, you have to spend some time just being creating such a complex hash (definitely, you have to scan whole function body). >>> >>> I think there should be sequence of short hash numbers. In another words we can try to represent function as a chain of hash codes: >>> F0: hashF0-0, hashF0-1, hashF0-2, ... >>> F1: hashF1-0, hashF1-1, ... >>> >>> In this case you can create kind of hash-tree. Consider we have F0, F1 and F2. >>> >>> Imagine, hashF0-0 == hashF1-0 == hashF2-0, hashF0-1 == hashF1-1: >>> >>> Then we can build the next tree (requires fixed-width fonts): >>> >>> [hashF0-0] >>> / \ >>> [hashF0-1] [hashF2-1] >>> / \ \ >>> [hashF0-2] [hashF1-2] [hashF2-2] >>> >>> In this case as you can see, whole process would be of complexity O(N*n), where: >>> >>> N - number of functions >>> n - function size. >>> >>> Note, hashF[i]-[j] could be generated on demand. It is not obvious to generate whole hash at once. >>> >>> I have also implemented this approach. Though it is looks more complex, and not so fast, as I hoped. Perhaps I can make it simplier. But it could not be simpler that just to replace "bool" with -1,0,1. Actually I really wanted to propose idea with hash-trees, since it combines advantages of hashing and tree-based lookup routines. But then I discovered idea I presented finally, and saw that it shows almost same improvement and looks much simpler. >>> >>> -Stepan >> >> I think you get diminishing returns for each level of the tree. I’d be >> interested in seeing numbers on a simpler 2-stage hash. >> >> - Leave the first hash as it is now in the code. >> - Add a secondary hash that’s built from the quantities that the equals >> comparison (or your new less-than comparison) is based on. >> - Fall back on O(n^2) for the third stage when there are collisions. > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- A non-text attachment was scrubbed... Name: real-set-report.csv Type: text/csv Size: 48081 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140123/457a67b4/attachment.csv>