Stepan Dyatkovskiy
2014-Mar-07 17:47 UTC
[LLVMdev] MergeFunctions: reduce complexity to O(log(N))
Ping
> ping
> Stepan Dyatkovskiy wrote:
>
>> Hi Nick,
>>
>> I tried to rework the changes as you requested. One of the patches (0004,
>> with extra assertions) has been removed.
>>
>>> + bool isEquivalentType(Type *Ty1, Type *Ty2) const {
>>> +   return cmpType(Ty1, Ty2) == 0;
>>> + }
>>>
>>> Why do we still need isEquivalentType? Can we nuke this?
>>
>> Yup. After applying all the patches, isEquivalentType will be totally
>> replaced with cmpType. All the isEqXXXX methods and friends will be removed
>> in 0011 (old 0012). No traces left.
>> The old function wasn't removed in 0001 just to keep the patches free of
>> extra noise like:
>>
>> - something that uses isEquivalentType
>> + something that uses cmpType
>>
>> The point is that the "something" that uses isEquivalentType will also be
>> replaced by one of the next patches in this series.
>>
>>> +static int cmpNumbers(uint64_t L, uint64_t R) {
>>> +  if (L < R) return -1;
>>> +  if (L > R) return 1;
>>> +  return 0;
>>> +}
>>>
>>> At a high level, you don't actually need a <=> operator to use a sort. A
>>> strict ordering (< operator) is sufficient. (Note that for mergefunc, a
>>> strict weak ordering is not sufficient, it must be a total ordering.)
>>
>> That could be done with int FunctionComparator::compare(). We could
>> replace it with bool FunctionComparator::less(). Though all the other
>> cmp methods need at least two bits of information as a result:
>> 1. Whether things are equal.
>> 2. Whether left is less than right.
>>
>> As for FunctionComparator::compare(), converting it into less() would
>> increase the time of the sanity check (patch #0010).
>> The sanity check is just a sweet bonus. It checks that the ordering is
>> implemented properly (it checks the order-relation properties).
>> Turning compare() into less() means that we would have to run the
>> comparison twice: L.less(R) and R.less(L). But maybe the sanity check is
>> not a thing to be published at all.
>>
>>> Consider hoisting this inside the FunctionComparator class?
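The tri-valued scheme discussed above can be sketched outside of LLVM. cmpNumbers below mirrors the helper quoted in the review; FuncTraits and cmpTraits are hypothetical stand-ins for illustration only, showing why one -1/0/1 result per property is enough to chain into a total order (with a bool less(), distinguishing "equal" from "greater" would take two calls):

```cpp
#include <cstdint>

// Three-way comparison: -1, 0, 1. One call yields both the "equal?"
// and the "which is smaller?" answers, so a chain of these builds a
// total order in a single pass over the compared properties.
static int cmpNumbers(uint64_t L, uint64_t R) {
  if (L < R) return -1;
  if (L > R) return 1;
  return 0;
}

// Hypothetical traits of a function, for illustration only.
struct FuncTraits {
  uint64_t NumArgs;
  uint64_t NumBBs;
};

// Chained comparison: the first differing trait decides the order.
// With a bool less() we would need two calls (L < R, then R < L) at
// each stage to recover the same three-way information.
static int cmpTraits(const FuncTraits &L, const FuncTraits &R) {
  if (int Res = cmpNumbers(L.NumArgs, R.NumArgs))
    return Res;
  return cmpNumbers(L.NumBBs, R.NumBBs);
}
```

The same shape chains through any number of traits; only the last comparison's result is returned when all earlier traits tie.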
>>> That class should have a bunch of implementations of comparisons between
>>> various different things, which can pass down to other methods in the
>>> same class.
>>
>> In the new patch series attached to this post, I have moved all static
>> methods into FunctionComparator.
>>
>>> + // Replacement for type::canLosslesslyBitCastTo, that
>>> + // establish order relation on this kind of properties.
>>> + int checkForLosslessBitcast(const Type *L, const Type *R);
>>>
>>> Type:: not type:: . Please make this comment more descriptive.
>>
>> Done.
>> [new comment]
>> Replacement for Type::canLosslesslyBitCastTo, which establishes an order
>> relation on this kind of property.
>> Returns 0 if the types L and R could be converted to each other without
>> reinterpretation of bits.
>> Otherwise the method returns -1 or 1, defining a total ordering between
>> types in the context of the lossless-bitcastability trait.
>> E.g.: if L is less than R (the result is -1), then every type that could
>> be losslessly bitcasted to L is less than R.
>> [/new comment]
>>
>>> + /// Replacement for C1 == getBitCast(C2, C1Ty)
>>> + /// Its more controllable, and supposed to be simpler and more
>>> predictionable.
>>> + /// As very important advantage: it allows to introduce order
>>> relation on
>>> + /// constants set, and thus use it as trait in refinement routines.
>>>
>>> "Its" --> "It's". "predictionable" --> "predictable". And how is it more
>>> predictable? I think this comment would be better if it described the
>>> function instead of making comparisons between it and other functions.
>>> Something like, "Compare constants under a system where pointer to X and
>>> pointer to Y are considered equal" or whatever is actually true here.
>>
>> Done.
>> [new comment]
>> Replacement for C1 == getBitCast(C2, C1Ty)
>> Parses the constants' contents, assuming that the types are losslessly
>> bitcasted between each other. So it actually ignores the types and only
>> compares the bits of L and R.
>> Returns 0 if L and R have equivalent content.
>> -1 or 1 if the values are different. Maintaining a total ordering requires
>> two values that indicate non-equivalence (L less than R, L greater than R).
>> [/new comment]
>>
>>> +enum ConstantType {
>>>
>>> I'm not sure that this buys you much. All the "isa" tests can be broken
>>> down into a switch on getValueID() with the one exception of
>>> isNullValue().
>>
>> Done.
>>
>>> + assert(
>>> +     C1->getType()->canLosslesslyBitCastTo(C2->getType()) &&
>>> +     "Pass is healthless. checkForLosslessBitcast should be twin of "
>>> +     "canLosslesslyBitCastTo method, except the case when the last one "
>>> +     "returns false, the first one should return -1 or 1");
>>
>> ...
>>
>>> I think we can skip the asserts here. They aren't detecting a specific
>>> bug, they're checking whether the new code does a certain task relative
>>> to the old code. Drop the old code, your new code is the new sheriff in
>>> town.
>>
>> The asserts have been removed.
>>
>>> + DenseMap<const Value*, int> sn_map1, sn_map2;
>>>
>>> What is sn short for ("seen")? Why are there two of these?
>>
>> Serial number :-)
>>
>>> + std::pair<DenseMap<const Value *, int>::iterator, bool>
>>> +   LeftSN = sn_map1.insert(std::make_pair(V1, sn_map1.size())),
>>> +   RightSN = sn_map2.insert(std::make_pair(V2, sn_map2.size()));
>>>
>>> So I think I get it, this is easy to reason about. You number each value
>>> going down both functions. But I think you can eliminate one of these
>>> maps because you know that if the left and right were ever different
>>> then we terminate the comparison immediately. You can at least share one
>>> map with both V1 and V2, but I think you can reduce the number of map
>>> insertions.
>>
>> Not sure.
Consider that on the left you have:
>>
>>   %A = alloca i32
>>   %B = alloca i32
>>   store i32 1, i32* %A
>>
>> And on the right:
>>
>>   %A = alloca i32
>>   %B = alloca i32
>>   store i32 1, i32* %B
>>
>> When we meet the 'store' instruction, we need to check in which order %A
>> was allocated on the left and in which order %B was allocated on the
>> right. So we have to assign a serial number to each value in a function.
>> Then, when comparing local values from different functions, we can just
>> check their serial numbers. Though maybe I missed something? Maybe there
>> is another way to determine "who was first".
>>
>>> High-level question, are attributes ordered? Your comparison is ordered.
>>> So if I have function F with string attributes "sse2", "gc=hemisphere"
>>> and another function G with string attributes "gc=hemisphere", "sse2"
>>> then they would be considered different. I don't think that's right.
>>
>> I didn't check it. But if it is not ordered, we could order it
>> lexicographically.
>>
>>> + int cmpOperation(const Instruction *I1, const Instruction *I2) const;
>>> +
>>>   bool isEquivalentOperation(const Instruction *I1,
>>> -                            const Instruction *I2) const;
>>> +                            const Instruction *I2) const {
>>> +   return cmpOperation(I1, I2) == 0;
>>> + }
>>>
>>> Hopefully this patch series ultimately eliminates calls of
>>> isEquivalentOperation?
>>
>> Of course. Just like isEquivalentType.
>>
>>> By the way, this is an interesting one. Should "add x, y" and "add nsw
>>> x, y" be equal or not?
>>
>> Yeah. Those are interesting questions. We could even treat 'mul a, 2'
>> and 'shl a, 1' as equal in some cases. I think it's a matter of time and
>> further optimization.
>>
>> Please find the first reworked patches in the attachment.
>>
>> -Stepan
>>
>> Nick Lewycky wrote:
>>
>>>   /// Compare two Types, treating all pointer types as equal.
>>> - bool isEquivalentType(Type *Ty1, Type *Ty2) const;
>>> + int cmpType(Type *Ty1, Type *Ty2) const;
>>> +
>>> + /// Compare two Types, treating all pointer types as equal.
>>> + bool isEquivalentType(Type *Ty1, Type *Ty2) const { >>> + return cmpType(Ty1, Ty2) == 0; >>> + } >>> >>> Why do we still need isEquivalentType? Can we nuke this? >>> >>> +static int cmpNumbers(uint64_t L, uint64_t R) { >>> + if (L < R) return -1; >>> + if (L > R) return 1; >>> + return 0; >>> +} >>> >>> At a high level, you don't actually need a <=> operator to use a sort. A >>> strict ordering ( < operator) is sufficient. (Note that for mergefunc, a >>> strict weak ordering is not sufficient, it must be a total ordering.) >>> >>> Consider hoisting this inside the FunctionComparator class? That class >>> should have a bunch of implementations of comparisons between various >>> different things, which can pass down to other methods in the same class. >>> >>> + // Replacement for type::canLosslesslyBitCastTo, that >>> + // establish order relation on this kind of properties. >>> + int checkForLosslessBitcast(const Type *L, const Type *R); >>> >>> Type:: not type:: . Please make this comment more descriptive. >>> >>> + /// Replacement for C1 == getBitCast(C2, C1Ty) >>> + /// Its more controllable, and supposed to be simpler and more >>> predictionable. >>> + /// As very important advantage: it allows to introduce order >>> relation on >>> + /// constants set, and thus use it as trait in refinement routines. >>> >>> "Its" --> "It's". "predictionable" --> "predictable". And how is it more >>> predictable? I think this comment would be better if it described the >>> function instead of making comparisons between it and other functions. >>> Something like, "Compare constants under a system where pointer to X and >>> pointer to Y are considered equal" or whatever is actually true here. >>> >>> +enum ConstantType { >>> >>> I'm not sure that this buys you much. All the "isa" tests can be broken >>> down into a switch on getValueID() with the one exception of >>> isNullValue(). >>> >>> + // Check whether C2 and C1 has equal bit patterns. 
>>> + if (checkForLosslessBitcast(C1->getType(), C2->getType()) != 0)
>>> +   return false;
>>> +
>>> + assert(
>>> +     C1->getType()->canLosslesslyBitCastTo(C2->getType()) &&
>>> +     "Pass is healthless. checkForLosslessBitcast should be twin of "
>>> +     "canLosslesslyBitCastTo method, except the case when the last one "
>>> +     "returns false, the first one should return -1 or 1");
>>> +
>>> + // Compare constant contents
>>> + if (cmpConstants(C1, C2) != 0)
>>> +   return false;
>>> +
>>> + assert(
>>> +     C1 == ConstantExpr::getBitCast(const_cast<Constant*>(C2),
>>> +                                    C1->getType()) &&
>>> +     "Pass is healthless. cmpConstants should detect *subset* of "
>>> +     "equivalent constants that could be detected by "
>>> +     "getBitCast method. So whereas cmpConstants returns 0 "
>>> +     "getBitCast also must return true. And whereas getBitCast "
>>> +     "returns false, cmpConstants also must return -1 or 1.");
>>>
>>> I think we can skip the asserts here. They aren't detecting a specific
>>> bug, they're checking whether the new code does a certain task relative
>>> to the old code. Drop the old code, your new code is the new sheriff in
>>> town.
>>>
>>> + DenseMap<const Value*, int> sn_map1, sn_map2;
>>>
>>> What is sn short for ("seen")? Why are there two of these?
>>>
>>> + std::pair<DenseMap<const Value *, int>::iterator, bool>
>>> +   LeftSN = sn_map1.insert(std::make_pair(V1, sn_map1.size())),
>>> +   RightSN = sn_map2.insert(std::make_pair(V2, sn_map2.size()));
>>>
>>> So I think I get it, this is easy to reason about. You number each value
>>> going down both functions. But I think you can eliminate one of these
>>> maps because you know that if the left and right were ever different
>>> then we terminate the comparison immediately. You can at least share one
>>> map with both V1 and V2, but I think you can reduce the number of map
>>> insertions.
>>>
>>> +// Mostly cloned from BitcodeWriter, but simplified a bit.
>>>
>>> Missing indent.
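The serial-numbering idea behind sn_map1/sn_map2 can be illustrated with a standalone sketch. ValueNumberer is a hypothetical stand-in (a std::map over value names rather than a DenseMap over Value pointers), showing how "who was first" gets decided:

```cpp
#include <map>
#include <string>

// Assign each value a serial number in order of first appearance.
// Two locally-defined values from different functions then compare
// equal iff they first appeared at the same position. One numberer
// per function plays the role of sn_map1 / sn_map2.
struct ValueNumberer {
  std::map<std::string, int> SN; // value name -> serial number
  int number(const std::string &V) {
    // insert() is a no-op if V is already numbered, so repeated
    // mentions of a value keep their original serial number.
    auto It = SN.insert({V, (int)SN.size()});
    return It.first->second;
  }
};
```

For the alloca example from the thread: on the left the store operand %A was value #0, while on the right the store operand %B is value #1, so the serial numbers differ and the functions compare as different.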
>>> >>> +static int cmpAttrs(const AttributeSet L, const AttributeSet R) { >>> >>> High-level question, are attributes ordered? Your comparison is ordered. >>> So if I have function F with string attributes "sse2", "gc=hemisphere" >>> and another function G with string attributes "gc=hemisphere", "sse2" >>> then they would be considered different. I don't think that's right. >>> >>> + int cmpOperation(const Instruction *I1, const Instruction *I2) const; >>> + >>> bool isEquivalentOperation(const Instruction *I1, >>> - const Instruction *I2) const; >>> + const Instruction *I2) const { >>> + return cmpOperation(I1, I2) == 0; >>> + } >>> >>> Hopefully this patch series ultimately eliminates calls of >>> isEquivalentOperation? >>> >>> + if (int Res = cmpNumbers(I1->getRawSubclassOptionalData(), >>> + I2->getRawSubclassOptionalData())) >>> >>> By the way, this is an interesting one. Should "add x, y" and "add nsw >>> x, y" be equal or not? Depends whether the backend uses it, and whether >>> we're going to do more optimization afterwards (ie., this is the compile >>> step of an LTO build). Nothing for you to change here, just want you to >>> be aware that this is a point where we might want to have a flag and >>> make mergefunc behave differently in different compile strategies. >>> >>> (I stopped before getting to 0008.) >>> >>> Nick >>> >>> On 3 February 2014 11:11, Stepan Dyatkovskiy <stpworld at narod.ru >>> <mailto:stpworld at narod.ru>> wrote: >>> >>> Hi all, >>> >>> Previous patch has been split onto series of small changes. >>> On each stage (after each patch) MergeFunctions pass is compilable >>> and stable. >>> >>> Please find patches in attachment for review. >>> >>> To apply all patches at once, use "apply-patches.sh" script as >>> follows: >>> 0. Place "apply-patches.sh" in same directory with patches. >>> 1. cd <llvm-sources-dir> >>> 2. 
>>> "bash <path-to-patches-dir>/apply-patches.sh"
>>>
>>> -Stepan
>>>
>>> Stepan Dyatkovskiy wrote:
>>>
>>>   Hi Nick,
>>>   About partition refinement.
>>>
>>>   It looks like the patch proposed here allows applying your idea in a
>>>   painless way (as a series of small changes, actually).
>>>
>>>   If I got it right, you would like to introduce the following partition
>>>   refinement.
>>>
>>>   The initial partition P consists of only one function class X0 (the
>>>   whole module, a single bucket). Suppose S is a subset of X0, so S
>>>   represents functions with some trait that is absent from the rest of
>>>   X0.
>>>
>>>   Then the refinement looks like below:
>>>
>>>   P` = refine(P, S):
>>>     X`[0] = intersection(X[0], S)
>>>     X`[1] = difference(X[0], S)  // X0 without S
>>>
>>>   so P` = {X`[0], X`[1]}
>>>
>>>   And actually X`[0] and X`[1] are classes of functions that differ in
>>>   the trait defined by S.
>>>
>>>   Then (suppose we need to introduce a new trait, S`):
>>>
>>>   P`` = refine(P`, S`), so each X`` could be defined like below:
>>>   for each X`[i] in P`:
>>>     X``[i*2]   = intersection(X`[i], S`)
>>>     X``[i*2+1] = difference(X`[i], S`)
>>>
>>>   and so on, until we run out of traits.
>>>
>>>   Finally we get the classes (the set of Xs) that consist of equivalent
>>>   functions. So we can merge functions within class bounds.
>>>
>>>   Am I right?
>>>
>>>   The patch that was proposed here could then be represented as such a
>>>   partition refinement:
>>>
>>>   At each refinement stage you need to introduce S somehow: the
>>>   algorithm that answers the question "what sort of difference do we
>>>   want to check in this bucket?"
>>>
>>>   In the existing implementation we check traits already, but pair-wise
>>>   (attributes, calling convention, number of arguments, number of basic
>>>   blocks, etc.). How to use the same traits in partition refinement?
>>>
>>>   Consider the next example. Suppose we want to split some bucket X:
>>>
>>>   0. Take some trait from the *pair-wise* comparison.
For example: number of arguments.
>>>   1. Take some function from bucket X, let it be F1. Let F1 have N1
>>>      arguments.
>>>   2. Then S is the set of functions with *fewer* than N1 arguments.
>>>   3. Do the refinement by S. As a result we get X`[0] and X`[1].
>>>   4. If both X`[0] and X`[1] are non-empty, repeat #1-#3 for them.
>>>   5. If one of X`[0] or X`[1] is the empty set, then all functions in X
>>>      are the same in the context of trait S. We need another trait: go
>>>      to #0, and repeat #0-#5 for all the X` buckets we have.
>>>
>>>   Finally we get the groups of equivalent functions, and actually these
>>>   groups end up organized in a binary tree.
>>>   Most of this is implemented in std::map. I showed it as #0-#5 just to
>>>   put it into the context of your idea. The only problem was to
>>>   introduce a way to define the order relation (or how to define S),
>>>   but this problem has been solved.
>>>   And on output we get groups of equal functions.
>>>
>>>   Correct me if I'm wrong. But it looks like every advantage given by
>>>   partition refinement is present here.
>>>
>>>   As I said in the beginning, this is a way to commit the things you
>>>   proposed with a minimum of changes. We will use the old comparison
>>>   routines; the only thing we need to change is the return type (from
>>>   bool to -1,0,1). In this way, partition refinement could be committed
>>>   as a series of small changes: one comparison routine per commit.
>>>
>>>   Every change in MergeFunctions from now on can be checked well by this
>>>   builder: we now have a special builder for MergeFunctions. It runs the
>>>   test-suite with a forced MergeFunctions pass:
>>>
>>>   http://lab.llvm.org:8011/builders/clang-mergefunc-x86_64-freeBSD9.2
>>>
>>>   Currently the MergeFunctions pass is stable, the builder is green, and
>>>   it passes all test-suite programs.
>>>
>>>   -Stepan.
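One refinement step, P` = refine(P, S) from the description above, can be sketched generically. Modeling the trait S as a plain predicate is an assumption for illustration, not the pass's actual mechanism:

```cpp
#include <utility>
#include <vector>

// One refinement step: split a bucket X into intersection(X, S) and
// difference(X, S), where the trait S is given as a predicate.
// Applied repeatedly with new traits, this drives every bucket down
// to a class of mutually equivalent elements.
template <typename T, typename Pred>
static std::pair<std::vector<T>, std::vector<T>>
refine(const std::vector<T> &X, Pred S) {
  std::pair<std::vector<T>, std::vector<T>> P;
  for (const T &F : X)
    (S(F) ? P.first : P.second).push_back(F);
  return P;
}
```

With functions abstracted to their argument counts, refining by "fewer than N1 arguments" splits a bucket exactly as in steps #2-#3 above.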
>>> Stepan Dyatkovskiy wrote:
>>>
>>>   Hi all,
>>>   Please find the updated patch in the attachment:
>>>   * Added some comments.
>>>   * Fixed some typos.
>>>
>>>   -Stepan
>>>
>>> Nick Lewycky wrote:
>>>
>>>   On 30 January 2014 01:27, Stepan Dyatkovskiy <stpworld at narod.ru>
>>>   wrote:
>>>
>>>     Hello Sean and Tobias,
>>>
>>>     Sean,
>>>     Thank you. Could you describe Nick's ideas in a few words or give me
>>>     links to your discussion, so I could adapt my ideas to them.
>>>
>>>   Sorry I haven't had time to write it up. The essential idea is that we
>>>   use partition refinement to determine the differences between a group
>>>   of functions instead of using pair-wise comparison.
>>>
>>>   Imagine you want to compare long sequences of numbers. You can grab a
>>>   pair of sequences and run down them until they differ or you reach the
>>>   end, and move on to another pair of sequences. With partition
>>>   refinement you start with the first number in all the sequences and
>>>   bucket them. Say, three 1's, two 2's, and an 8. You've just formed 3
>>>   buckets. For each bucket (with more than 1 member), do it again with
>>>   the next number in the sequence. This gets you an equivalence
>>>   comparison at less than the complexity of pairwise comparisons. (The
>>>   buckets are partitions, and you're refining them as you read more
>>>   data. Partition refinement is the dual of union-find, which we use
>>>   quite heavily in the compiler.)
>>>
>>>   I haven't figured out how this stacks up against Stepan's sorting with
>>>   <=> comparison. I can probably do that, I just haven't spent the time
>>>   thinking about it.
>>>
>>>   I also haven't thought through how to integrate it with "near miss"
>>>   comparison. My current leading idea is that you do the same thing, but
>>>   make a record of what makes two buckets similar.
(Note
>>>   that the wrong solution is to store a per-function-pair list of
>>>   similarities/differences.)
>>>
>>>   Nick
>>>
>>>     Tobias,
>>>     Your patch fails on several modules in my benchmark (73 of ~1800
>>>     tests). I have sent one as an attachment.
>>>
>>>     See the statistics files for more details; you can simply find all
>>>     the .ll files in the test-suite object directory (after "make
>>>     TEST=nightly report").
>>>     I have attached the script that launches the benchmark. Example of
>>>     usage (you need a Linux OS):
>>>     bash test-scipt.sh report.txt opt-no-patch opt-with-patch
>>>
>>>     Some numbers.
>>>
>>>     Total number of functions: 52715
>>>
>>>     Below, by default, I have excluded modules where your patch fails.
>>>
>>>     Functions merged
>>>       Original version:       1113
>>>       Order relation patch:   1112
>>>       Similar merging patch:   541
>>>
>>>     Functions merged (including failed tests)
>>>       Original version:       8398
>>>       Order relation patch:   8395
>>>
>>>     Summary files size
>>>       Initial:                                 163595634 bytes.
>>>       After merging with order relation patch: 163147731 bytes.
>>>       After similar merging patch:             162491905 bytes.
>>>
>>>     Summary files size (including failed tests)
>>>       Initial:          250242266 bytes.
>>>       Original version: 247168198 bytes.
>>>       Order relation:   247175650 bytes.
>>>
>>>     Time. I measured with the "time" utility and used the "real" (wall
>>>     clock) time used by the process.
>>>
>>>     Summary time spent
>>>       With existing version:     28.05 secs.
>>>       With order relation patch: 28.19 secs.
>>>       Similar merging patch:     28.61 secs.
>>>
>>>     Summary time spent (including failed tests)
>>>       With existing version:     41.74 secs.
>>>       With order relation patch: 36.35 secs.
>>>
>>>     -Stepan
>>>
>>>   Sean Silva wrote:
>>>
>>>     On Tue, Jan 28, 2014 at 2:47 PM, Tobias von Koch
>>>     <tobias.von.koch at gmail.com> wrote:
>>>
>>>       Hi Stepan,
>>>
>>>       Sorry for the delay. It's great that you are working on
>>>       MergeFunctions as well and I agree, we should definitely try to
>>>       combine our efforts to improve MergeFunctions.
>>>
>>>       Just to give you some context, the pass (with the similar function
>>>       merging patch) is already being used in a production setting. From
>>>       my point of view, it would be better if we focused on improving
>>>       its capability of merging functions at this stage rather than on
>>>       reducing compilation time.
>>>
>>>       I'll give a brief overview of how our similar function merging
>>>       algorithm works (you can also watch the presentation from the US
>>>       LLVM conference to hear me explain it using an animation).
>>>
>>>       1. Hash all functions into buckets.
>>>
>>>       In each bucket, separately:
>>>
>>>       2. Compare functions pair-wise and determine a similarity metric
>>>          for each pair (%age of equivalent instructions).
>>>
>>>       3. Merge identical functions (with similarity = 100%), and update
>>>          call sites for those functions.
>>>
>>>       4. If the updates of call sites have touched other functions, go
>>>          back to step 2 and re-compare *only* those functions to all
>>>          others in their buckets.
>>>
>>>       Finally,
>>>
>>>       5. Form groups of similar functions for merging:
>>>          a) Pick the most similar pair (A,B).
>>>          b) Find all functions C that are also similar to B but are not
>>>             more similar to some other function D.
>>>          c) Merge A with B and all the C's.
>>>          Repeat until there are no more functions to merge.
>>>
>>>       As you can see, we need to compute a similarity measure for each
>>>       pair of functions in a bucket.
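Step 2's similarity metric (percentage of equivalent instructions) might look roughly like the following sketch, where functions are modeled as plain opcode sequences. The position-wise comparison is an assumption for illustration, not the actual implementation from the patch:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Similarity of two "functions", modeled here as opcode sequences:
// the share of positions (over the longer sequence) whose opcodes
// match. 1.0 means identical (candidates for full merging); values
// near 1.0 mark "similar" merge candidates.
static double similarity(const std::vector<std::string> &F,
                         const std::vector<std::string> &G) {
  std::size_t N = std::max(F.size(), G.size());
  if (N == 0)
    return 1.0; // two empty functions are trivially identical
  std::size_t Same = 0;
  for (std::size_t I = 0, E = std::min(F.size(), G.size()); I != E; ++I)
    Same += (F[I] == G[I]);
  return double(Same) / double(N);
}
```

A real implementation would compare operands and control flow as well, but the shape of the metric, matched instructions over total, is the same.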
If I
>>>       understand correctly, your patch reduces compile time by avoiding
>>>       comparisons of functions that cannot be identical. That's exactly
>>>       what you want to do if you only want to merge identical functions.
>>>       However, because we also need to determine whether functions are
>>>       merely *similar*, I can't see how your idea could be applied in
>>>       that case.
>>>
>>>     Yeah, the existing pass only tries to merge identical functions
>>>     (identical in machine code). I'm wondering if the "similar" function
>>>     merging would be better as a separate pass (I'm genuinely wondering;
>>>     I have no clue). I would wait for Nick's feedback.
>>>
>>>     -- Sean Silva
>>>
>>>       Looking at your statistics, I see that you are considering a very
>>>       large number of functions. I don't know which benchmarks you are
>>>       using, but I doubt that many of these are worth merging. For
>>>       instance, you rarely gain a code size benefit from merging
>>>       functions with just a few instructions. Our patch takes care of
>>>       this using thresholds. It's worth looking at actual code size
>>>       reductions, rather than just the number of functions merged or the
>>>       compilation time spent comparing functions. It turns out that the
>>>       number of functions in each bucket is really quite small once you
>>>       apply some heuristics (and it could still be further reduced!).
>>>
>>>       Given your experience with MergeFunctions, it would be really
>>>       great if you could review our patch and also try it out on your
>>>       benchmarks.
>>>
>>>       Tobias
>>>
>>>       On 24/01/2014 19:11, Stepan Dyatkovskiy wrote:
>>>
>>>         Hi Tobias.
>>>
>>>         So, what do you think?
>>>
>>>         If it means too much extra work for your team, maybe I can help
>>>         you somehow? I really would like to.
>>>         -Stepan
>>>
>>>         Stepan Dyatkovskiy wrote:
>>>
>>>           Hi Tobias,
>>>
>>>             I can't really see a way to combine our approach with your
>>>             patch. What are your thoughts?
>>>
>>>           I think it is possible. Even more, we need to combine our
>>>           efforts in order to bring this pass into real life.
>>>           I've read your presentation file, but unfortunately have read
>>>           your patch only a little.
>>>           How exactly do you scan functions in the 2nd stage? Could you
>>>           explain the algorithm in a few words: how does your comparison
>>>           workflow look? Is it possible to scan a binary tree instead of
>>>           a hash table? ...OK.
>>>
>>>           That's how I see the modification. For now it's important to
>>>           explain the idea, so consider the simplest case: you need to
>>>           catch two functions that differ by a single instruction
>>>           somewhere.
>>>
>>>           1. Imagine that the IR *module* contents are represented as a
>>>           binary tree: each line (trace) from the root to a terminal
>>>           node is a function. Each node is a function's primitive (an
>>>           instruction opcode, for example). Down is the direction toward
>>>           the terminal nodes; up is the direction toward the root.
>>>
>>>           2. Now you are standing on some node. And you have two
>>>           directions: down-left and down-right.
>>>
>>>           3. If you are able to find two equal sub-traces down (at the
>>>           left and at the right), then the only difference lies in this
>>>           node. Then we catch that case.
>>>
>>>           4. Even more: if the two subtrees at the left and at the right
>>>           are equal, then you catch at once all the cases that are
>>>           represented by these subtrees.
>>>
>>>           I still haven't looked at your patch carefully, sorry. But I
>>>           hope that helps; I'll try to look at it in the nearest time,
>>>           and perhaps what I gave in this post is not the best solution.
>>>           -Stepan
>>>
>>>           22.01.2014, 20:53, "Tobias von Koch"
>>>           <tobias.von.koch at gmail.com>:
>>>
>>>             Hi Stepan,
>>>
>>>             As you've seen, we have recently implemented a significant
>>>             enhancement to the MergeFunctions pass that also allows
>>>             merging of functions that are only similar but not identical
>>>             (http://llvm-reviews.chandlerc.com/D2591).
>>>
>>>             Our patch also changes the way that functions are compared
>>>             quite significantly. This is necessary because we need to
>>>             compare all functions in a bucket in order to get a
>>>             similarity measure for each pair, so we can then decide
>>>             which 'groups' of functions to merge.
>>>
>>>             I can't really see a way to combine our approach with your
>>>             patch. What are your thoughts?
>>>
>>>             Another way to improve the performance of MergeFunctions
>>>             might be to make the hash function better. I've already
>>>             added the size of the first BB to it, and perhaps there are
>>>             other factors we could try... if we don't have to compare
>>>             functions in the first place (because they're in different
>>>             buckets) then that's obviously a much bigger win.
>>>             Thanks,
>>>             Tobias
>>>
>>>             On 17/01/2014 20:25, Stepan Dyatkovskiy wrote:
>>>
>>>               Hi all,
>>>
>>>               I propose a simple improvement for the MergeFunctions pass
>>>               that reduces its complexity from O(N^2) to O(log(N)),
>>>               where N is the number of functions in the module.
>>>
>>>               The idea is to replace the result of comparison from
>>>               "bool" with "-1,0,1"; in other words, to define an order
>>>               relation on the set of functions. To be sure that
>>>               functions can be compared that way, we have to prove that
>>>               an order relation is possible at every comparison stage.
>>>
>>>               The latter is possible if the comparison we implement at
>>>               each stage has the following properties:
>>>               * reflexivity (a <= a, a == a, a >= a),
>>>               * antisymmetry (if a <= b and b <= a, then a == b),
>>>               * transitivity (if a <= b and b <= c, then a <= c),
>>>               * asymmetry (if a < b, then neither a > b nor a == b).
>>>
>>>               Once we have defined the order relation, we can store all
>>>               the functions in a binary tree and perform lookup in
>>>               O(log(N)) time.
>>>
>>>               This post has two attachments:
>>>               1. The patch that implements this idea.
>>>               2. A detailed description of the MergeFunctions pass, with
>>>               an explanation of how the order relation is possible.
>>>
>>>               Hope it helps to make things better!
>>>
>>>               -Stepan.
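The binary-tree lookup enabled by the total order can be sketched with std::map standing in for the pass's tree. Func and Merger are hypothetical names, and lexicographic order over opcode sequences stands in for FunctionComparator's order relation:

```cpp
#include <map>
#include <string>
#include <vector>

// Stand-in "function": an opcode sequence. Lexicographic order on
// sequences is a total order, so an ordered map can bucket identical
// functions with O(log N) lookups per function instead of O(N^2)
// pair-wise comparisons across the module.
using Func = std::vector<std::string>;

struct Merger {
  std::map<Func, int> Canonical; // body -> id of its first occurrence
  int NextId = 0;

  // Returns the id of an equivalent function seen earlier, or a
  // fresh id if this body is new. An existing hit is where the real
  // pass would merge the new function into the canonical one.
  int findOrInsert(const Func &F) {
    auto It = Canonical.insert({F, NextId});
    if (It.second)
      ++NextId;
    return It.first->second;
  }
};
```

Each insertion or lookup costs O(log N) comparisons, which is where the overall complexity reduction claimed in the proposal comes from.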
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu
>>> http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu
> http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

--
Truly yours, Stepan Dyatkovskiy