Daniel Dunbar
2009-Jul-23 03:54 UTC
[LLVMdev] [POTENTIAL API CHANGE] Improving llvm::Value getName()/setName()
Following up on my introduction of the StringRef type, I wanted to clean up the API for getting and setting the name of an llvm::Value. There are a number of problems with the current API: 1. There are too many ...Name methods. 2. getName() returns an std::string, which is expensive. 3. getName() is frequently used to derive new names, which are passed to setName(). Currently this uses std::string concatenation. 4. In a Release-Asserts build, we usually don't name values, but we still end up with the overhead of creating and destroying strings we either don't use, or don't care about. Problem #1 can be addressed solely with the StringRef type, but it requires either updating many clients (which frequently do something like (getName() + ".not")), or making StringRef transparently convert to an std::string, which hides inefficient code. Chris and I discussed this problem and came up with what I think is a nice and tidy solution, based on introducing a new type for handling the results of string concatenation. I tried to do an extensive evaluation of the results to make sure the approach was working & worth it, and in true salesman style, I'm going to give you the benefits first before the implementation! :) On 64-bit Darwin, using gcc-4.2 to compile LLVM; and using instcombine.bc (the .bc for InstructionCombining.bc) as a test case: Code size: -1% on opt (from 5315144 to 5249024) Memory allocation: -10% calls to malloc (from 944090 to 832970), for 'opt -f -o /dev/null -std-compile-opts instcombine.bc' Performance: -1% for the preceeding opt command. Compile Time: Compile time for InstructionCombining.cpp got slightly faster, although gcc's memory usage increased by ~15%. ---- The idea is to introduce a new class (Twine) which represents the result of string concatenation. The class is essentially a lightweight rope data structure, where the nodes are typed pointers to C++ temporaries. Thus a Twine doesn't require any heap allocation during construction, and we can defer the conversion of integers into a string until the result of the Twine is used -- in the case that the result is never needed, the conversion is avoided completely. Additionally, by deferring the generation of the concatenated string, we reduce the amount of code that is generated at the place where the concatenation is done and coalesce it into one place. Finally, we can provide a special Twine value which always concatenates to form an empty string -- llvm::Value can return this value when it has no name, and any clients which derive names from this value will automatically avoid doing unused string concatenation, uitostr conversion, etc. As with StringRef, the main price we end up paying is increased complexity and the potential for unsafe memory operations (the Twine is very tied to the ability to keep pointers to memory it doesn't own). The nice part of this API change is that it does not impact the majority of clients; the StringRef and Twine classes combine so that clients can write code like 'setName(getName() + ".neg")' just as before. To keep API/high-level discussion separate from implementation details, I will send the patches separately to llvm-commits for review. - Daniel
Reasonably Related Threads
- [LLVMdev] Fwd: Problem getting LoopInfo inside non-LoopPass
- [LLVMdev] Fwd: Problem getting LoopInfo inside non-LoopPass
- [LLVMdev] Fwd: Problem getting LoopInfo inside non-LoopPass
- [LLVMdev] Fwd: Problem getting LoopInfo inside non-LoopPass
- [LLVMdev] Fwd: Problem getting LoopInfo inside non-LoopPass