Duncan P. N. Exon Smith
2014-Jan-21 19:23 UTC
[LLVMdev] MergeFunctions: reduce complexity to O(log(N))
Hi Stepan,

This looks interesting! Some high-level comments:

- Please post the patch untarred next time. Also, I'm not sure if it's typical
  to provide supporting documents in .doc format; while many of us probably
  have access to Word, a more portable and email-friendly format (like text,
  markdown or rst) is probably better.

- Have you profiled this? What's the speedup? I second Raul's question about
  whether improving the hashing (or doing secondary hashing) could result in a
  larger benefit (maybe even with simpler code).

- At the beginning of FunctionComparator::cmpEnumerate, you have a comment
  that you can't introduce an order relation for cross-references, and the
  following code:

      // Unfortunately we can't introduce order relation for
      // cross reference case. It is possible for self-reference only.
      if (V1 == F1) {
        if (V2 == F2)
          return 0;
        return -1;
      }
      if (V2 == F2) {
        if (V1 == F1)
          return 0;
        return 1;
      }

  Can't this break asymmetry? Doesn't breaking the ordering relation cause a
  correctness problem? I.e., what am I missing here?

I also have a few low-level comments (and nitpicks):

- The comments at the top of the file don't match the new implementation!

- Lots of strange tabbing after your changes. E.g.:

      +  int cmpGEP(const GetElementPtrInst *GEP1,
                    const GetElementPtrInst *GEP2) {

- After the FunctionPtr class declaration, you have an extra blank line. There
  are a few of these scattered through (e.g., at the beginning of
  FunctionComparator::cmpConstants).

- Your new helper functions (such as cmpNumbers) that are local to the file
  should be declared 'static'.

- You use the following pattern throughout:

      int Res;
      // ...
      Res = cmpNumbers(I1->getNumOperands(), I2->getNumOperands());
      if (Res != 0)
        return Res;

  whereas, since you never use Res unless it's non-zero, I think the following
  is more clear and concise:

      if (int Res = cmpNumbers(I1->getNumOperands(), I2->getNumOperands()))
        return Res;

- I'm not sure if your MERGEFUNC_SANITY should be included the way it is:

      +//#define MERGEFUNC_SANITY 900
      +#ifdef MERGEFUNC_SANITY

- I'm also unsure about the following:

      +  // If we have some problematic functions, perhaps we would like
      +  // to keep declarations only, and those definitions fires the problem.
      +  // We could insert manualy last ones (notepad, nano, vi, whatever else :-) )
      +#if 0
      +  for (Module::iterator I = M.begin(), E = M.end(); I != E; ++I) {
      +    Function *F = I;
      +    F->deleteBody();
      +  }
      +#endif

Cheers,
Duncan

On Jan 21, 2014, at 9:58 AM, Stepan Dyatkovskiy <stpworld at narod.ru> wrote:

> ping
>
> Stepan Dyatkovskiy wrote:
>> Hi all,
>>
>> I propose simple improvement for MergeFunctions pass, that reduced its
>> complexity from O(N^2) to O(log(N)), where N is number of functions in
>> module.
>>
>> The idea, is to replace the result of comparison from "bool" to
>> "-1,0,1". In another words: define order relation on functions set.
>> To be sure, that functions could be comparable that way, we have to
>> prove that order relation is possible on all comparison stage.
>>
>> The last one is possible, if for each comparison stage we implement has
>> next properties:
>> * reflexivity (a <= a, a == a, a >= a),
>> * antisymmetry (if a <= b and b <= a then a == b),
>> * transitivity (a <= b and b <= c, then a <= c)
>> * asymmetry (if a < b, then a > b or a == b).

For asymmetry, I think it's: if a < b, then not a > b and not a == b.

>>
>> Once we have defined order relation we can store all the functions in
>> binary tree and perform lookup in O(log(N)) time.
>>
>> This post has two attachments:
>> 1. The patch, that has implementation of this idea.
>> 2. The MergeFunctions pass detailed description, with explanation how
>> order relation could be possible.
>>
>> Hope it helps to make things better!
>>
>> -Stepan.
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
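The -1/0/1 scheme under review can be sketched in isolation. The following is a toy model (a vector of ints standing in for a function body, and hypothetical names like `cmpBodies`), not the actual FunctionComparator code, but it shows what the order relation buys: a three-way comparator that satisfies a strict weak ordering can drive an ordered container directly.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy stand-in for a function body: a list of opcodes.
using Body = std::vector<int>;

// Three-way comparison, the same shape as the patch's cmp* helpers:
// compare lexicographically and bail out at the first difference.
static int cmpBodies(const Body &A, const Body &B) {
  if (A.size() != B.size())
    return A.size() < B.size() ? -1 : 1;
  for (std::size_t I = 0; I != A.size(); ++I)
    if (A[I] != B[I])
      return A[I] < B[I] ? -1 : 1;
  return 0;
}

// Adapter so the -1/0/1 result can drive an ordered container. Given a
// valid strict weak ordering, inserting N functions into a std::set costs
// O(N*log(N)) comparisons instead of the O(N^2) pairwise scans that a
// bool-equality scheme needs.
struct BodyLess {
  bool operator()(const Body &A, const Body &B) const {
    return cmpBodies(A, B) < 0;
  }
};
```

This is also why Duncan's asymmetry question matters: if the comparator ever reports both `a < b` and `b < a`, std::set's invariants break and lookups silently go wrong.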
Stepan Dyatkovskiy
2014-Jan-22 15:35 UTC
[LLVMdev] MergeFunctions: reduce complexity to O(log(N))
Hi Raul and Duncan,

Duncan,
Thank you for the review. I hope to present a fixed patch tomorrow.

First, I would like to show a few performance results:

command: "time opt -mergefunc <test>"

File: tramp3d-v4.ll, 12963 functions
                   Current : Patched
Time:                9.8 s : 2.0 s    ; >400%!!
Functions merged:     3503 : 3507

File: spirit.ll, 2885 functions
                   Current : Patched
Time:                2.2 s : 0.5 s
Functions merged:     1503 : 1505

File: k.ll, 2788 functions
                   Current : Patched
Time:                1.5 s : 0.7 s
Functions merged:     1626 : 1626

File: sqlite3.ll, 1057 functions
                   Current : Patched
Time:                0.4 s : 0.4 s
Functions merged:        7 : 7

All the tests were taken from the test-suite object directory. On average it
shows a >200% performance improvement. You can also see that the patched
version detects a few more functions for merging, since it has improved
constant-equivalence detection.

Raul,

Yes, you are right, it is O(N*log(N)). And yes, I have some ideas about
hashing. Though at the end of any hashing improvement I still see the tree.
I'll explain.

Once you have calculated a hash (even a really good one), you still have some
probability of N>1 functions with the same hash. Of course it could be really
low. But then, you have to spend time creating such a complex hash in the
first place (you definitely have to scan the whole function body).

I think instead there should be a sequence of short hash numbers. In other
words, we can try to represent a function as a chain of hash codes:

F0: hashF0-0, hashF0-1, hashF0-2, ...
F1: hashF1-0, hashF1-1, ...

In this case you can create a kind of hash-tree. Consider we have F0, F1 and
F2. Imagine hashF0-0 == hashF1-0 == hashF2-0, and hashF0-1 == hashF1-1.
Then we can build the next tree (requires fixed-width fonts):

             [hashF0-0]
             /        \
      [hashF0-1]    [hashF2-1]
      /        \            \
 [hashF0-2] [hashF1-2]  [hashF2-2]

In this case, as you can see, the whole process would be of complexity
O(N*n), where:

N - number of functions
n - function size.

Note, hashF[i]-[j] could be generated on demand; it is not necessary to
generate the whole hash chain at once.

I have also implemented this approach. Though it looks more complex, and is
not as fast as I hoped. Perhaps I can make it simpler. But it could not be
simpler than just replacing "bool" with -1,0,1. Actually I really wanted to
propose the hash-tree idea, since it combines the advantages of hashing and
tree-based lookup routines. But then I discovered the idea I finally
presented, and saw that it shows almost the same improvement and looks much
simpler.

-Stepan

Duncan P. N. Exon Smith wrote:
> [full quote of Duncan's review of Jan 21, snipped]
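The hash-chain grouping Stepan describes amounts to a trie keyed on successive per-level hashes. Here is a minimal sketch of that structure; the type names, the two-character "chunk" hashing, and the use of std::hash are all hypothetical illustrations, not code from the actual patch:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Each function yields a chain of hash codes (hashF-0, hashF-1, ...);
// functions are grouped by walking the chain level by level.
struct HashTrie {
  std::map<std::size_t, std::unique_ptr<HashTrie>> Children;
  std::vector<std::string> Functions; // functions whose whole chain matched

  void insert(const std::string &Fn, const std::vector<std::size_t> &Chain,
              std::size_t Level = 0) {
    if (Level == Chain.size()) {
      Functions.push_back(Fn);
      return;
    }
    auto &Child = Children[Chain[Level]];
    if (!Child)
      Child = std::make_unique<HashTrie>();
    Child->insert(Fn, Chain, Level + 1);
  }
};

// hashF[i]-[j]: here simply a hash of the j-th two-character chunk of a
// string "body". As the message notes, in a real pass each level could be
// computed lazily, on demand, instead of hashing the whole body up front.
static std::vector<std::size_t> hashChain(const std::string &Body) {
  std::vector<std::size_t> Chain;
  for (std::size_t L = 0; L * 2 < Body.size(); ++L)
    Chain.push_back(std::hash<std::string>()(Body.substr(L * 2, 2)));
  return Chain;
}
```

With three bodies "abcd", "abcd", "abxy", the trie reproduces the diagram above: one shared root child (all agree on hash 0), splitting at level 1 where the third function diverges.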
Duncan P. N. Exon Smith
2014-Jan-22 16:08 UTC
[LLVMdev] MergeFunctions: reduce complexity to O(log(N))
On 2014 Jan 22, at 07:35, Stepan Dyatkovskiy <stpworld at narod.ru> wrote:

> Hi Raul and Duncan,
>
> Duncan,
> Thank you for review. I hope to present fixed patch tomorrow.
>
> First, I would like to show few performance results:
>
> [performance tables snipped; see Stepan's message above]
>
> All the tests were taken from test-suite object directory. On average it
> shows >200% performance improvement.

These results look promising! When you say 200% average, do you mean 200%
average for all files in test-suite, or just these four? Are there any
negative results?

> You could also see that patched version detects a bit more functions for
> merging, since it has improved constant-equivalence detection.

1. I'd prefer to see the improvement to constant-equivalence detection as a
   separate initial patch (to the old code). Is that possible? In particular,
   it would be reassuring to see your patch *not* change the number of
   functions merged.

2. I'm also still concerned about correctness. Previously, when the
   comparison result was unknown, returning "false" would conservatively
   consider the functions distinct, so we had no correctness issues. However,
   with your patch, *everything* must be accurately compared, right? Are
   there any cases where you can't accurately order functions? (Is
   FunctionComparator::cmpEnumerate an example?) If these are problems, maybe
   you can use the previous ==-style comparison to confirm.

> [discussion of hash-chains and hash-trees snipped; see Stepan's message
> above]

I think you get diminishing returns for each level of the tree. I'd be
interested in seeing numbers on a simpler 2-stage hash:

- Leave the first hash as it is now in the code.
- Add a secondary hash that's built from the quantities that the equals
  comparison (or your new less-than comparison) is based on.
- Fall back on O(n^2) for the third stage when there are collisions.
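The 2-stage scheme suggested above can be sketched as follows. The types and both hash functions are hypothetical stand-ins (a string for a function, its length for the "cheap" hash), not the pass's real hashing:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Fn = std::string; // stand-in for llvm::Function*

// Stage 1: trivial to compute (e.g. instruction count in the real pass).
static std::size_t cheapHash(const Fn &F) { return F.size(); }

// Stage 2: more expensive, built from the quantities the comparison uses.
static std::size_t deepHash(const Fn &F) { return std::hash<Fn>()(F); }

// Bucket by (cheap, deep) hash pair; the O(n^2) pairwise comparison of the
// third stage then only runs within each surviving bucket.
static std::vector<std::vector<Fn>> bucketize(const std::vector<Fn> &Fns) {
  std::map<std::pair<std::size_t, std::size_t>, std::vector<Fn>> Buckets;
  for (const Fn &F : Fns)
    Buckets[{cheapHash(F), deepHash(F)}].push_back(F);
  std::vector<std::vector<Fn>> Out;
  for (auto &KV : Buckets)
    Out.push_back(KV.second);
  return Out;
}
```

The design point is that each stage only needs to be discriminating enough to shrink the buckets; correctness still rests entirely on the final full comparison.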
Raul Silvera
2014-Jan-22 20:43 UTC
[LLVMdev] MergeFunctions: reduce complexity to O(log(N))
Thank you Stepan for your comments.

While I agree that a better hash would be more expensive to compute, it won't
be a factor as long as it is linear in the size of the function, even if you
have to walk the whole routine. After all, the comparison may need to walk
over the whole routine as well (and do so n*log(n) times, instead of just n
times).

Where the comparison approach has an edge is that it can bail out at the
first difference, while the hash needs to be fully computed every time. This
may be a noticeable difference in practice. That is why a multiple-hash
approach may be interesting to explore: use a very fast hash first, and then
compute a more expensive hash on the second level as we go down into each
bucket. Even multiple levels, where we slowly look further down into the
function, may be appropriate -- e.g. look at the first block for the first
hash, then the next three blocks for the second hash, etc.

Also, some heuristics related to code size may be relevant here -- as the
size of the function increases, the chances of finding a match likely go down
very quickly.

On Wed, Jan 22, 2014 at 7:35 AM, Stepan Dyatkovskiy <stpworld at narod.ru> wrote:

> [full quote of Stepan's reply of Jan 22, including Duncan's review, snipped]

--
Raúl E. Silvera
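Raul's bail-out point is easy to make observable with counters. This toy sketch (hypothetical names, a vector of ints as the "routine") compares the work done by a first-difference comparison against a whole-body hash:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Three-way comparison that stops at the first differing element,
// counting how many elements it actually visits.
static int cmpCounting(const std::vector<int> &A, const std::vector<int> &B,
                       std::size_t &Visited) {
  std::size_t N = A.size() < B.size() ? A.size() : B.size();
  for (std::size_t I = 0; I != N; ++I) {
    ++Visited;
    if (A[I] != B[I])
      return A[I] < B[I] ? -1 : 1;
  }
  if (A.size() != B.size())
    return A.size() < B.size() ? -1 : 1;
  return 0;
}

// A whole-body hash, by contrast, must visit every element regardless of
// where (or whether) two bodies differ.
static std::size_t hashCounting(const std::vector<int> &A,
                                std::size_t &Visited) {
  std::size_t H = 0;
  for (int V : A) {
    ++Visited;
    H = H * 31 + static_cast<std::size_t>(V);
  }
  return H;
}
```

For two 1000-element bodies that differ at the first element, the comparison touches one element while the hash touches all 1000, which is the gap the multiple-level hashing above tries to narrow.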