Andrew Litteken via llvm-dev
2020-Sep-01 21:29 UTC
[llvm-dev] [RFC] Framework for Finding and Using Similarity at the IR Level
Hello,

I’m Andrew Litteken, and I am working on a framework for defining, detecting, and deduplicating similar code sections at the IR level.

Programmers can introduce sections of code that perform nearly the same operation. By copying and pasting code throughout a code base, or by using one piece of code as a reference for writing another with nearly the same structure, redundancies are inadvertently introduced throughout programs. Furthermore, compilers can introduce similarity through sets of instructions that require the same code generation strategies, or that optimize to the same patterns.

For example, these two pieces of code:

  int fn(const std::vector<int> &myVec) {
    for (auto it = myVec.begin(), et = myVec.end(); it != et; ++it) {
      if (*it & 1)
        return 0;
    }
    return 1;
  }

and

  int fn(const std::vector<int> &myVec) {
    for (const int &x : myVec) {
      if (x % 2)
        return 1;
    }
    return 0;
  }

have a conditional that checks whether the current element (*it and x, respectively) is odd, and they generate the same piece of IR code for the conditional:

  %11 = load i32, i32* %10, align 4
  %12 = and i32 %11, 1
  %13 = icmp eq i32 %12, 0
  %14 = getelementptr inbounds i32, i32* %10, i64 1
  br i1 %13, label %7, label %15

Such sections of code can be called similar. Two sections of code are considered similar when they perform the same operations on a given set of inputs. However, this does not require identity. In particular, this definition allows the same sequences of instructions to have different operand names. For example:

  %1 = add i32 %a, 10
  %2 = add i32 %a, %1
  %3 = icmp slt i32 %1, %2

and

  %1 = add i32 11, %a
  %2 = sub i32 %a, %1
  %3 = icmp sgt i32 %2, %1

are clearly not identical. There are different constants, the comparison operator is different, and there are flipped operands. But if the constant 10, the constant 11, and register %a are considered inputs into these sections, then the two regions could reduce to:

  %1 = add i32 %input1, %input2
  %2 = sub i32 %input2, %1
  %3 = icmp slt i32 %1, %2

Ultimately, these are sets of operations that perform the same action, even though the operands are different. This patch focuses on this sort of “structural” similarity.

However, this concept can be easily generalized. For example, two sections of code could be considered similar when they contain the same instructions, but not necessarily in the same order. Yet, because of the way the operands are used in this different ordering, the instructions still compute the same results for the given inputs.

If these similarities are recognized, they could offer improvements towards reducing code size, analyzing redundancies in a large project, or creating tools that help a programmer refactor code.

This new framework offers an interface to detect similarity at the IR level. It has an internal representation that can be accessed via an analysis, allowing for easy access by other transformation and analysis passes that want to leverage detected similarity within a program. It can also be easily serialized into a file format like JSON, or as OptRemarks. In the future, a more advanced notion of similarity could be used, and it would be relatively simple to swap a new detection algorithm into the similarity framework.

At present, there have been two different applications developed on top of this framework.
The first is a similarity visualizer tool called LLVM-Sim that can ingest a module, use the framework to find the similarity in that module, and output a report for a programmer in a serialized form, as mentioned above. The second is an IR Outliner that uses the analysis pass as the basis for its outlining candidates. I will first talk briefly about the framework before moving into the two applications.

Finding Similarity:

The framework for finding similarity is centered around an analysis pass that finds similar regions of code that do not cross basic blocks, and groups them into different similarity groups based on their structure. These are then made available to other passes, such as an outliner, or a tool, such as LLVM-Sim.

In more detail, finding similarity works by hashing instructions to integers and placing them in a long list. The list can be thought of as a string, with each unsigned integer acting as a character. This string can then be processed by a Suffix Tree to find repeated substrings in the program. Following this, the instances of each repeated substring are analyzed for structural similarity. In this case, if a one-to-one mapping for the operands of two different instances of a repeated substring can be found, so that each set of operands is operated on in the same way, then those two regions of instructions are considered to be structurally similar to one another. These tasks are handled by three new classes: the IRInstructionMapper, the IRSimilarityCandidate, and the IRSimilarityIdentifier.

Finding Similarity Implementation:

IRInstructionMapper:
Since matching instructions that are identical does not provide much information when looking for structural similarity, the value of the instruction and the values of the operands are not considered for most instructions. Instead, only the opcode, types, and parameters are considered. Ultimately, if the opcode and types match, then the instructions are considered to be performing the same operation. However, for some instructions, like shufflevector or getelementptr, it is also ensured that the parameters match, in keeping with the “isSameOperationAs” instruction method. Each instruction that is unique under this scheme is mapped to an unsigned integer. Creating this mapping for the entire program treats the program like a string with each integer as a character, and allows the use of data structures such as the Suffix Tree (recently refactored into libSupport in review D80586 <https://reviews.llvm.org/D80586>) to process the string for repeated sequences of instructions in the program.

This maps instructions such as

  %1 = add i32 %a, 1
  %2 = add i32 %b, %c

to the same value, as they have the same opcode and the same type, while instructions such as

  %1 = add i32 %a, 1
  %2 = add i64 %b, %c
  %3 = mul i64 %b, %c

are mapped to different values, since they have different opcodes and types.
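As a rough illustration of this idea, the mapping can be thought of as uniquing a signature built from the opcode and the types involved. The following is a minimal sketch using only the standard llvm::Instruction API; it is not the proposed IRInstructionMapper, which also tracks which instructions are legal to consider and handles the isSameOperationAs special cases mentioned above:

  #include "llvm/IR/Instruction.h"
  #include "llvm/IR/Type.h"
  #include <map>
  #include <utility>
  #include <vector>

  namespace {
  // Two instructions receive the same unsigned integer when they share an
  // opcode, a result type, and operand types; the operand *values* are
  // deliberately ignored.
  using InstSignature = std::pair<unsigned, std::vector<llvm::Type *>>;

  class ToyInstructionMapper {
    std::map<InstSignature, unsigned> SignatureToID;
    unsigned NextID = 0;

  public:
    unsigned mapInstruction(const llvm::Instruction &I) {
      InstSignature Sig{I.getOpcode(), {I.getType()}};
      for (const llvm::Value *Op : I.operand_values())
        Sig.second.push_back(Op->getType());
      auto Inserted = SignatureToID.try_emplace(Sig, NextID);
      if (Inserted.second)
        ++NextID;
      return Inserted.first->second;
    }
  };
  } // namespace

Running such a mapper over every instruction in a module yields the list of unsigned integers that the Suffix Tree consumes.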
IRSimilarityIdentifier:
The list of integers representing the program is then passed to the Suffix Tree, which can quickly find repeated substrings, and an IRSimilarityCandidate is created for each instance of these repeated substrings. The IRSimilarityCandidate provides an interface for comparing the operand use and instructions of one IRSimilarityCandidate to another to determine structural similarity.

Structural similarity is determined by attempting to find a one-to-one mapping from each register in one candidate to each register in another. If this mapping is found, then the two candidates are performing the same operations on a given set of inputs and are structurally similar. For example:

  %1 = add %2, %3
  %4 = add %1, %2

is equivalent to

  %a = add %b, %c
  %d = add %b, %a

and the IRSimilarityCandidates containing these sections of code would be considered structurally similar. But

  %1 = add %2, %3
  %4 = add %1, %2

is not equivalent to

  %a = add %b, %c
  %d = add %a, %e

since the use of %2 in the first section does not match the use of %b in the second. So the IRSimilarityCandidates containing these sections would not be considered structurally similar.

The IRSimilarityIdentifier takes the IRSimilarityCandidates created from the instances of the repeated substrings and uses the IRSimilarityCandidate’s interface, described above, to determine which instances of a given repeated substring are structurally similar to one another.

Once it has been determined which instances of a repeated substring are structurally similar, the sets of IRSimilarityCandidates that have been found to be compatible are grouped together into a Similarity Group. The IRSimilarityCandidates contained in each group are structurally similar to one another. In terms of the example above, the first two sections of code would be in the same Similarity Group, but the third is not structurally similar to either. Therefore, it would be in its own group, and does not have a matching structure with another IRSimilarityCandidate in the program.
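To make the structural comparison concrete, here is a minimal, self-contained sketch of this kind of one-to-one value-mapping check. The names ToyInst and isStructurallySimilar are purely illustrative and are not the proposed classes; note also that the first “equivalent” example above relies on add being commutative, which is handled by a follow-up patch and is ignored by this sketch, so its passing example uses the operands in matching order:

  #include <cassert>
  #include <string>
  #include <unordered_map>
  #include <vector>

  namespace {
  // Toy stand-in for one mapped instruction: the unsigned integer produced
  // by the mapper plus the names of the values it defines and uses.
  struct ToyInst {
    unsigned MappedID;
    std::vector<std::string> Values; // result first, then the operands
  };

  // Return true if the values of A and B can be put into a consistent
  // one-to-one correspondence, position by position.
  bool isStructurallySimilar(const std::vector<ToyInst> &A,
                             const std::vector<ToyInst> &B) {
    if (A.size() != B.size())
      return false;
    std::unordered_map<std::string, std::string> AToB, BToA;
    for (size_t I = 0; I < A.size(); ++I) {
      if (A[I].MappedID != B[I].MappedID ||
          A[I].Values.size() != B[I].Values.size())
        return false;
      for (size_t J = 0; J < A[I].Values.size(); ++J) {
        const std::string &VA = A[I].Values[J];
        const std::string &VB = B[I].Values[J];
        auto ItA = AToB.find(VA);
        auto ItB = BToA.find(VB);
        // Each value may correspond to exactly one value in the other region.
        if ((ItA != AToB.end() && ItA->second != VB) ||
            (ItB != BToA.end() && ItB->second != VA))
          return false;
        AToB.emplace(VA, VB);
        BToA.emplace(VB, VA);
      }
    }
    return true;
  }
  } // namespace

  int main() {
    unsigned Add = 0; // mapped integer for an "add" of matching types
    // %1 = add %2, %3            %a = add %b, %c
    // %4 = add %1, %2    versus  %d = add %a, %b   -> consistent mapping
    std::vector<ToyInst> X = {{Add, {"%1", "%2", "%3"}}, {Add, {"%4", "%1", "%2"}}};
    std::vector<ToyInst> Y = {{Add, {"%a", "%b", "%c"}}, {Add, {"%d", "%a", "%b"}}};
    assert(isStructurallySimilar(X, Y));
    // %d = add %a, %e cannot be matched: %2 already corresponds to %b.
    std::vector<ToyInst> Z = {{Add, {"%a", "%b", "%c"}}, {Add, {"%d", "%a", "%e"}}};
    assert(!isStructurallySimilar(X, Z));
    return 0;
  }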
Application: Visualizing Similarity

One potential application is simply exporting any similarities found for viewing by a user. LLVM-Sim is a tool that accepts a module, passes it to the IRSimilarityIdentifier, and retrieves the Similarity Groups for the module. These are then exported to a JSON file that has an entry for each Similarity Group, and a list of ranges for each region of similarity contained in that group. This tool could also be implemented with remarks rather than JSON for faster, more streamlined processing of program information.

If a module had a Similarity Group with two sections, instructions 5 to 10 and instructions 15 to 20, and a different Similarity Group with two sections, instructions 1 to 4 and instructions 21 to 24, the output JSON would be:

  {
    "1": [{"s": 5, "e": 10}, {"s": 15, "e": 20}],
    "2": [{"s": 1, "e": 4}, {"s": 21, "e": 24}]
  }

This information could be used to create interfaces to look at and compare the similarity present in the program, such as the example interface in https://reviews.llvm.org/D86974, which shows two instances of the program alongside its IR with two different similar sections highlighted.

Application: Deduplicating Similarity with Outlining:

Alternatively, once the similar sections of the program have been identified, the similar regions of code can be deduplicated. The IR Outliner uses the Code Extractor and the new similarity analysis pass to extract each of the similar ranges individually, and then deduplicates the individually extracted functions for each region, ensuring that the information needed for each extraction is not lost.

Outlining at the IR level can benefit any target architecture and reduce the overall size of a program before any target-specific decisions are made. The Machine Outliner in LLVM, on the other hand, runs after the register allocator and needs to be implemented for each target separately. The concept of structural code similarity is easier to implement in a general way at the IR level.

The IR Outliner is off by default. There is a flag -mllvm -ir-outliner that will use the defined cost model to attempt to outline sections of code that will lead to an overall decrease in the size of the program. There is also a debugging option -mllvm -ir-outliner-no-cost that will outline the Similarity Groups greedily based on how many instructions will be removed, but does not consider the number of instructions added.

There are more details about the implementation of the IR Outliner at the end of the RFC.

Results:
The LLVM Test Suite, compiled with the IR Outliner on top of -Oz, has seen code size reductions of up to 60% and an average of 1% code size reduction for both X86 and AArch64. In these tests, the IR Outliner is placed as the last stage before code generation for the given target. There are some regressions. These are due to the fact that our cost model cannot fully predict what happens during assembly code generation; it can be difficult to estimate at the IR level how many instructions will be added during code generation. For instance, if many arguments need to be created to handle constants, the decisions made when creating a call at the assembly level are not directly obvious, so our cost model may estimate that fewer instructions are added than actually need to be. Using target hooks, rather than a general model of calling conventions, may be a way to address this issue and get a better sense of what will happen during code generation.

CTMark Code Size (x86_64):

  Test                  | Original -Oz | With Outlining | Reduction
  ----------------------+--------------+----------------+----------
  consumer-typeset.test |       484566 |         458642 |     -5.3%
  SPASS.test            |       490445 |         470669 |     -4.0%
  sqlite3.test          |       525902 |         512813 |     -2.5%
  clamscan.test         |       569405 |         567437 |     -0.3%
  lencod.test           |       762084 |         759454 |     -0.3%
  bullet.test           |       779072 |         779072 |      0.0%
  kc.test               |       482848 |         482848 |      0.0%
  tramp3d-v4.test       |       708712 |         708712 |      0.0%
  pairlocalalign.test   |       485246 |         485462 |     +0.0%
  7zip-benchmark.test   |       982651 |         983563 |     +0.1%

  Geomean difference: -1.3%

Notable Decreases:
GCC-C-execute-20041011-1.test: 6600 before, 2343 after, -64.5%. This test uses a lot of macros, and the outliner picks up on the similarity of the same set of inserted instructions.
frame_layout: 113631 before, 98110 after, -13.7%. This test has a lot of repetitive and repeated instructions performed on different sets of arguments, making it a good candidate for outlining.

Related Work:
The previous RFC (RFC: PT.2 Add IR level interprocedural outliner for code size <http://lists.llvm.org/pipermail/llvm-dev/2017-September/117153.html>) and review (D53942: IR Outliner Pass <https://reviews.llvm.org/D53942>) discuss similar techniques for outlining identical instructions at the machine code level. This patch uses a more general concept of similarity and works at the IR level instead. Both share the Suffix Tree data structure (see Interprocedural MIR-level outlining pass <http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html>) to identify “outlinable” code sequences within a module.

The framework could also be extended to encompass the Machine Outliner, so that a MachineCodeSimilarityIdentifier feeding the MachineOutliner could be used rather than having all of the functionality inside the one MachineOutliner pass.

Similarity Identification Implementation:

IR Outliner Implementation:
- Cost Model:
Using information such as region size, inputs, and outputs across all the candidates, the arguments that need to be passed to the outlined function are found.
From this, estimates are made for:

- How many instructions will need to be added to make a call to the function.
- How many instructions will be removed by inserting the call.
- How many instructions will be added by creating the new function, due to argument handling.
- How many instructions will be added from stack instructions at the call site.
- How many instructions will be added from stack instructions inside the call.

The outlined function is then only created if the number of instructions added is less than the number of instructions removed.

Right now, this cost model is based on a general sense of what happens in most calling conventions, and is conservative. Target hooks would allow for a target-specific sense of how large each instruction is, giving a better sense of how many machine instructions are being outlined. They would also give insight into the particular calling convention in use. This would allow for a more accurate cost model and find more opportunities for outlining.

- Extraction and Consolidation:
If it is determined that outlining all of the IRSimilarityCandidates in a Similarity Group will add fewer instructions than it removes, each candidate's block is extracted using the Code Extractor. The contents of one of the extracted functions are placed in a new overall function that has inputs and outputs depending on the arguments needed for each region, as determined by the Code Extractor. When the consolidation is performed, anything that is not the same across each section, like constants, is extracted into arguments as well. If the arguments are large, they are currently extracted directly into the function arguments and are not passed by reference.

As an example of extraction and consolidation, if the following two blocks are extracted, which each have a different constant in the second operand of the first add,

  %1 = add i32 %a, 10
  %2 = add i32 %1, %a

  %1 = add i32 %a, 11
  %2 = add i32 %1, %a

the extracted function would be:

  define internal void @outlined_ir_func(i32 %a, i32 %b) {
  entry:
    %1 = add i32 %a, %b
    %2 = add i32 %1, %a
    ret void
  }

and the calls would be:

  call void @outlined_ir_func(i32 %a, i32 10)
  call void @outlined_ir_func(i32 %a, i32 11)

For values that are created inside the region but used outside of it, a pointer to a location is passed as an argument, and the data is stored to that location. However, between different similar sections, the values used for outputs are not necessarily the same. If this is the case, a block for each set of possible outputs is added, with a switch statement branching to the different sections, controlled by an extra argument.

For example, if the following two blocks were extracted, which each have a different value used outside of the similar section,

  %1 = add i32 %a, %b
  %2 = add i32 %1, %a
  %3 = mul i32 %1, %1

  %4 = add i32 %a, %b
  %5 = add i32 %4, %a
  %6 = sdiv i32 %5, %5

the extracted function would be:

  define internal void @outlined_ir_func(i32 %a, i32 %b, i32* %c, i32 %d) {
  entry:
    %1 = add i32 %a, %b
    %2 = add i32 %1, %a
    switch i32 %d, label %final [
      i32 0, label %block1
      i32 1, label %block2
    ]
  block1:
    store i32 %1, i32* %c
    br label %final
  block2:
    store i32 %2, i32* %c
    br label %final
  final:
    ret void
  }

and the calls would be:

  call void @outlined_ir_func(i32 %a, i32 %b, i32* %c, i32 0)
  %val = load i32, i32* %c
  %3 = mul i32 %val, %val

  call void @outlined_ir_func(i32 %a, i32 %b, i32* %c, i32 1)
  %val1 = load i32, i32* %c
  %6 = sdiv i32 %val1, %val1

Once there is a complete function, the call sites are replaced with calls to the new function. If a region does not require a certain argument, it is simply replaced with a null value. For instance, in the example above, if there were a third region that did not need any output value, the call would be:

  call void @outlined_ir_func(i32 %a, i32 %b, i32* null, i32 -1)

With negative one as the switch value, the program simply falls through to the ending block.
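As a rough restatement of the cost decision described in the Cost Model section above, the per-group choice amounts to something like the following sketch. The struct fields are illustrative names for the five estimates listed earlier; the real pass derives these values from the actual regions, arguments, and outputs rather than from a flat struct:

  // Per-Similarity-Group estimates corresponding to the bullets above.
  struct OutliningEstimates {
    unsigned NumRegions;            // structurally similar regions to be replaced
    unsigned InstructionsPerRegion; // instructions removed by replacing one region
    unsigned CallInstructions;      // instructions added to make one call
    unsigned CallSiteStack;         // stack-handling instructions added at one call site
    unsigned ArgumentHandling;      // instructions added in the new function for arguments
    unsigned CalleeStack;           // stack-handling instructions added inside the new function
  };

  // Outline only when the instructions added are outnumbered by those removed.
  bool shouldOutline(const OutliningEstimates &E) {
    unsigned Added = E.NumRegions * (E.CallInstructions + E.CallSiteStack) +
                     E.InstructionsPerRegion + // one copy of the body remains
                     E.ArgumentHandling + E.CalleeStack;
    unsigned Removed = E.NumRegions * E.InstructionsPerRegion;
    return Added < Removed;
  }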
Implementation in Many Patches:

Each of these patches includes tests. The minimum set of patches needed to see outlining is the Instruction Mapper, Candidate, Identifier, and Wrapper Pass, paired with the initial setup and extraction with the input consolidation. These are currently on Phabricator, and the links are included.

Similarity Identification Analysis Pass:

Instruction Mapper:
- IRInstructionMapper: Adding initial mapping of instructions to unsigned integers.
  - Link: https://reviews.llvm.org/D86968
  - Link ilist Parent: https://reviews.llvm.org/D86969

IRSimilarityCandidate:
- IRSimilarityCandidate: Adding initial candidate structural code, with no special cases for commutativity.
  - Link Basic: https://reviews.llvm.org/D86970
  - Link Structural Comparison: https://reviews.llvm.org/D86971

IRSimilarityIdentifier:
- IRSimilarityIdentifier: Add Suffix Tree processing and grouping candidates by structural similarity.
  - Link: https://reviews.llvm.org/D86972

Wrapper Pass:
- Wrapper Pass: Adding a wrapper and printer pass for the IRSimilarityIdentifier.
  - Link: https://reviews.llvm.org/D86973

Extra Features:
- Commutativity IRSimilarityCandidate: Adding flexibility for commutative instructions in structural comparison.
- Predicate Isomorphism IRInstructionMapper/IRSimilarityCandidate: Adding flexibility for isomorphic predicate instructions in structural comparison.

LLVM-Sim:
- LLVM-Sim Basic: Process multiple files using LLVM-Sim and give results of the identifier to a JSON file.
  - Link: https://reviews.llvm.org/D86974

IROutliner:

Extraction:
- Initial Setup and Extraction: Extract the different regions individually.
  - Link: https://reviews.llvm.org/D86975

Consolidation:
- IR Outliner: Adding checks to exclude certain functions that either cannot be outlined, or require extra work to outline.
  - Link: https://reviews.llvm.org/D86976
- IROutliner Only Inputs: Extract only regions that require inputs, and consolidate each group into one outlined function.
  - Link Extract: https://reviews.llvm.org/D86977
  - Link Consolidate: https://reviews.llvm.org/D86978
- IROutliner Constants: Elevate constants to arguments when they are not the same in each region.
- IROutliner Assumes: Handles the special case of llvm.assume intrinsics, which are removed from extracted regions.
- IROutliner Sunken Allocas: Handles the case of objects with their lifetimes entirely included in the region, and elevates them to arguments.
- IROutliner Outputs: Handles the case of multiple different output schemes in different regions.
- IROutliner Reduce Output Blocks: Removes duplicate output scheme blocks for output registers in outlined functions.

Miscellaneous Features:
- IROutliner Attribute Merging: Makes sure that function attributes are merged consistently.
- IROutliner Attribute Compatibility: Adds an option to turn on function attribute incompatibilities for items like memory sanitization for outlined functions.
- IROutliner DebugInfo: Makes sure that debug info is consistent for outlined functions.
- IROutliner Cost Model: Adds a cost model, so that reductions are prioritized rather than what comes first in terms of sheer code size outlined.
- IROutliner with OptRemarks: Adding OptRemarks to the outliner.
- IROutliner with SwiftError: Adding support for swifterror parameters in outlined functions.
- IROutliner with LinkOnceODR: Adding a flag to allow outlining from LinkOnceODR functions.

Any feedback, comments, and discussion are welcome and appreciated!

Thank You,
Andrew
Neil Nelson via llvm-dev
2020-Sep-02 02:24 UTC
[llvm-dev] [RFC] Framework for Finding and Using Similarity at the IR Level
This looks like a very worthwhile project. Often we are looking for faster execution, with code size being a secondary issue, though some uses will have the reverse priority. Hopefully, the project will provide execution times along with code-size reductions.

Neil Nelson
Renato Golin via llvm-dev
2020-Sep-02 10:04 UTC
[llvm-dev] [RFC] Framework for Finding and Using Similarity at the IR Level
Indeed, an awesome project and an excellent report! Code size doesn't really get much attention, so the level of detail and the strong roadmap are refreshing.

> Hopefully, the project will provide execution times along with code-size
> reductions.

I doubt it. Outlining will (almost?) always make for slower code due to a lot more calls being made. But that's ok for embedded targets, which don't have strong speculative machinery. Also, if the program fits versus doesn't, then it will run infinitely faster. :D

Kristof, do you know who at Arm would be good to review those patches?

cheers,
--renato