On Apr 16, 2:13 am, Dan Gohman <goh... at apple.com> wrote:

> Hi Gabor,
>
> Can you provide performance data for this? I'd
> like to know what effect these changes have on
> compile time.

Hi Dan,

unfortunately, no. I can feed you some speculation, though; see below.

The reason why I cannot do measurements (at the moment) is that:

- I have no experience with performance measurements of LLVM compile time;
- I have no good test cases that would provide the data interesting for you;
- I have no machine that can run Release-mode tests:
  * my PPC Mac mini at home runs Tiger, and Xcode 2.4.1 is broken for Release mode [1];
  * I have access to a beefy Mac Pro, but it also miscompiles Release for the same reason. Also, it is an image-processing machine in production, so I can only run "nice make ...";
  * I have access to a Solaris 10 workstation, but I really do not want to build llvm-gcc on that one. It also keeps my /home on NFS, which has speed and quota problems.

But: it is very simple to do the measurements yourself (and I'd love to hear the results!):

  cd llvm
  svn switch http://llvm.org/svn/llvm-project/llvm/branches/ggreif/use-diet .
  make

then rebuild llvm-gcc, etc., and retest.

And now here is my educated speculation. There are two things that became slower:

1) Use::getUser()
2) Use::get/set, due to tagging.

The former is seldom called:

  $ find lib -name "*.cpp" | xargs grep "getUser(" | wc -l
  41

We could audit those to make sure that no unnecessary calls are made. But the getUser() algorithm is not so inefficient anyway.

The second is counterbalanced by faster access to the Use object in most cases: with the exception of PHINode and SwitchInst, the getOperand() function (if called on a specialized "this" pointer) does a "this"-relative access instead of fetching the OperandList pointer first and going through that. This was the case before for fixed-arity instructions, but now it applies to variadic ones too, provided they cannot perform operand list resizing.
Some things got faster, like:

1) getOperand() access on (say) CallInst now fetches operands relative to "this". Also, there are no "new Use[n]" allocations (and delete[]) for these any more.
2) There is no need to maintain Use::U.
3) Data-cache related gains from smaller Use objects (not yet activated).

So, my idea is that these changes are performance neutral.

There are potentially more speedups to be realized:

- indexing operands in the decreasing direction eliminates the getNumOperands() call in getOperand() for variadic-arity instructions;
- shaving off (unneeded) operand-related administrative members from User.

I hope that this is interesting, but I'd like to ask anybody who is comfortable with performance testing to help provide some hard data :-)

Cheers,

  Gabor

[1] <http://lists.cs.uiuc.edu/pipermail/llvmdev/2008-March/013436.html>

> Thanks,
>
> Dan
>
> On Apr 15, 2008, at 3:32 PM, Gabor Greif wrote:
>
> > Hi All,
>
> > here comes the patch for the second wave of Use class size reduction.
>
> > I have included all the machinery that is needed, and it is
> > *active*. The User* inside of Use is even sometimes NULL,
> > but the algorithm is able to recover it.
> > If there is a non-null User* present, then I am
> > asserting that it equals the computed value.
>
> > I did not receive feedback on the algorithmic part yet,
> > so I assume you are comfortable with it.
>
> > Unfortunately I had to introduce a new GlobalVariable::Create
> > mechanism (I hoped to have nailed them all in wave 1, but life is cruel).
> > I will submit scripts for the easy conversion of external projects,
> > like last time.
>
> > I have split this review material into 3 files, corresponding to:
> > - essential changes,
> > - related changes in .cpp files,
> > - collateral nitty-gritty, mostly mechanical stuff.
>
> > Outlook: the actual removal of the Use::U member
> > will happen in wave 3, after this stuff is merged to
> > trunk. I do not expect any problems.
> > Btw., I have already performed a test merge
> > and it also passes all tests (deja, clang, test-suite).
>
> > Cheers,
>
> > Gabor
>
> > --------------------- STATS ------------------
> > ggreif$ ls -l wave2*
> > -rw-r--r-- 1 ggreif ggreif 79367 Apr 16 00:20 wave2-essentials.diff
> > -rw-r--r-- 1 ggreif ggreif 51795 Apr 16 00:24 wave2-impl.diff
> > -rw-r--r-- 1 ggreif ggreif 25300 Apr 16 00:25 wave2-nittygritty.diff
>
> > ggreif$ wc wave2*
> >  2189  9005  79367 wave2-essentials.diff
> >  1408  4793  51795 wave2-impl.diff
> >   521  1995  25300 wave2-nittygritty.diff
> >  4118 15793 156462 total
>
> > <wave2-essentials.diff><wave2-impl.diff><wave2-nittygritty.diff>
> But: It is very simple to do your measurements yourself (and I'd love
> to hear the results!):
>
>   cd llvm
>   svn switch http://llvm.org/svn/llvm-project/llvm/branches/ggreif/use-diet .
>   make
>
> rebuild llvm-gcc, etc.
> retest.

Hmm, do not forget to revert the switch after you are done:

  svn switch http://llvm.org/svn/llvm-project/llvm/trunk .

Also, you will need to apply these conversions to llvm-gcc:

  ##### tcsh script follows ######
  foreach CLASS (GlobalVariable)
    setenv SUBST1 "s/new $CLASS/${CLASS}::Create/g"
    setenv SUBST2 "s/new llvm::$CLASS/llvm::${CLASS}::Create/g"
    foreach i (`find . -name "*.cpp"`)
      sed -e "$SUBST1" -e "$SUBST2" < $i > $i.2
      rm $i
      mv $i.2 $i
    end
    foreach i (`find . -name "*.h"`)
      sed -e "$SUBST1" -e "$SUBST2" < $i > $i.2
      rm $i
      mv $i.2 $i
    end
  end

In one place you will have to undo the change (the compiler will tell you where). You can also experiment with removing Use::U (and the related asserts).

Cheers,

  Gabor
On Apr 16, 2008, at 2:50 AM, heisenbug wrote:

> And now here is my educated speculation:
> There are 2 things that became slower
> 1) Use::getUser()
> 2) Use::get/set due to tagging.
>
> The former is seldom called:
>
> $ find lib -name "*.cpp" | xargs grep "getUser(" | wc -l
> 41

The majority of those aren't actually Use::getUser, but on the other hand this grep misses all the users of value_use_iterator::operator*, which is much more popular. Unfortunately, overloaded operators are not the easiest to grep for ;-).

> The second is counterbalanced with a faster access to the Use object
> in most cases:
> With exception of PHINode and SwitchInst, the getOperand() function
> (if called on a specialized "this" pointer) does a "this"-relative
> access instead of getting OperandList pointer first and going thru
> that. This was the case before for fixed-arity instructions, but now
> it applies to variadic ones too that cannot perform operand list
> resizing.
>
> Some things got faster, like
> 1) getOperand access on (say) CallInst is now fetching operands from
> relative to "this". Also, there is no "new Use[n]" allocations (and
> delete []) for these any more.

This is fine, but technically it's an independent change that could be done without the getUser algorithm changes and tagging.

> 2) no need to maintain Use::U
> 3) data-cache related gains on smaller Use objects (not yet activated)
>
> So, my idea is that these changes are performance neutral.
> There are potentially more speedups to be realized:
> - indexing operands in the decreasing direction eliminates the
> getNumOperands() call in getOperand() for variadic arity instructions.

At a glance, the only getOperand() that I saw that uses getNumOperands() is ReturnInst's, and that actually looks like a bug.

> - shaving off (unneeded) operand related administrative members from
> User.

Which members?

> I hope that this is interesting, but I'd like to ask anybody who is
> comfortable with performance testing to help provide some hard
> data :-)

I agree that performance testing is not easy and requires resources, but I'm curious what's motivating this project here. My assumption has been that no one would likely go to the extreme of inventing a mini-language encoded in the least significant bits of pointer members in successive structs unless they were pretty desperate for performance or scalability.

Dan
On 2008-04-16, at 14:25, Dan Gohman wrote:

> On Apr 16, 2008, at 2:50 AM, heisenbug wrote:
>
> > And now here is my educated speculation:
> > There are 2 things that became slower
> > 1) Use::getUser()
> > 2) Use::get/set due to tagging.
>
> > The former is seldom called:
>
> > $ find lib -name "*.cpp" | xargs grep "getUser(" | wc -l
> > 41
>
> The majority of those aren't actually Use::getUser, but on the other
> hand this grep misses all the users of
> value_use_iterator::operator*, which is much more popular.
> Unfortunately, overloaded operators are not the easiest to grep
> for ;-).

Perhaps someone should write some sort of C++ refactoring tool for that!

— Gordon
On Apr 16, 2008, at 11:25 AM, Dan Gohman wrote:

> > So, my idea is that these changes are performance neutral.

I strongly agree with Dan that we need to measure performance to ensure there is no significant performance regression.

> > I hope that this is interesting, but I'd like to ask anybody who is
> > comfortable with performance testing to help provide some hard
> > data :-)
>
> I agree that performance testing is not easy and requires resources,
> but I'm curious what's motivating this project here. My assumption
> has been that no one would likely go to the extreme of inventing a
> mini-language encoded in the least significant bits of pointer
> members in successive structs unless they were pretty desperate for
> performance or scalability.

Ah, this question is easy: it shrinks the Use class by a word, which reduces the most popular class in the LLVM IR by 25%. This directly makes all binary operators 8 bytes smaller, for example, which is a pretty big memory footprint win, and memory footprint translates to dcache efficiency as well.

-Chris
On Apr 16, 8:25 pm, Dan Gohman <goh... at apple.com> wrote:

> On Apr 16, 2008, at 2:50 AM, heisenbug wrote:
>
> > And now here is my educated speculation:
> > There are 2 things that became slower
> > 1) Use::getUser()
> > 2) Use::get/set due to tagging.
>
> > The former is seldom called:
>
> > $ find lib -name "*.cpp" | xargs grep "getUser(" | wc -l
> > 41
>
> The majority of those aren't actually Use::getUser, but on the
> other hand this grep misses all the users of
> value_use_iterator::operator*, which is much more popular.
> Unfortunately, overloaded operators are not the easiest to
> grep for ;-).

Hi Dan,

thanks for this hint; it makes me look harder into reducing the cost of getUser. My current idea is to do a very cheap heuristic inline, which catches 100% of the cases for UnaryInstruction and 50% of the cases for Binary/CmpInstruction, without doing a function call at all.

> > The second is counterbalanced with a faster access to the Use object
> > in most cases:
> > With exception of PHINode and SwitchInst, the getOperand() function
> > (if called on a specialized "this" pointer) does a "this"-relative
> > access instead of getting OperandList pointer first and going thru
> > that. This was the case before for fixed-arity instructions, but now
> > it applies to variadic ones too that cannot perform operand list
> > resizing.
>
> > Some things got faster, like
> > 1) getOperand access on (say) CallInst is now fetching operands from
> > relative to "this". Also, there is no "new Use[n]" allocations (and
> > delete []) for these any more.
>
> This is fine, but technically it's an independent change that could
> be done without the getUser algorithm changes and tagging.
>
> > 2) no need to maintain Use::U
> > 3) data-cache related gains on smaller Use objects (not yet activated)
>
> > So, my idea is that these changes are performance neutral.
> > There are potentially more speedups to be realized:
> > - indexing operands in the decreasing direction eliminates the
> > getNumOperands() call in getOperand() for variadic arity instructions.
>
> At a glance, the only getOperand() that I saw that uses getNumOperands()
> is ReturnInst's, and that actually looks like a bug.
>
> > - shaving off (unneeded) operand related administrative members from
> > User.
>
> Which members?

I am thinking of OperandList and NumOperands. In most User subclasses these are trivially recomputable/constant. The only catch is that getting them via a User* is a bit more costly (a virtual call or another trick). The question is how often code calls getOperand() through an unspecialized User*.

> > I hope that this is interesting, but I'd like to ask anybody who is
> > comfortable with performance testing to help provide some hard
> > data :-)
>
> I agree that performance testing is not easy and requires resources,

I am on my way into this area. I now have a working Release setup and am already in possession of hard numbers, which are pretty encouraging. I see about 1% speed degradation on a cruel verifier test (2004-09-29-VerifierIsReallySlow.llx), but even there the user time is well within the cloud of measurements of the regular (non-tagged) build. On all other tests I have seen no degradation so far. And this was even before I moved the tag from the Val to the Prev member of Use.

> but I'm curious what's motivating this project here. My assumption

LLVM is intended to be a pretty much target-independent technology with a compact delivery format and potentially good startup behaviour. To actually be a viable system for a wide range of machines it must be as memory efficient as possible. The in-memory data structures are needed for the JIT and the optimizer, and thus they must coexist with the generated machine code for the platform.
<Use> seems to be the structure with the currently greatest bang-for-buck ratio, so I am thinking about a solution for it :-)

> has been that no one would likely go to the extreme of inventing a
> mini-language encoded in the least significant bits of pointer
> members in successive structs unless they were pretty desperate for
> performance or scalability.

Yep. There are desperate needs. I am in the embedded world. When, sometime, iPhone/iPod-like devices are supposed to run many multi-megainstruction programs routinely, everybody else will see the difference too.

Regarding the mini-language: I decided that the alternatives, like linear search (see the Kernighan-Ritchie mini-language of null-terminated strings and strlen :-) and map lookup, are far more costly, especially in a multi-threaded environment. For completeness, see my musings at <http://heisenbug.blogspot.com/2008/04/llvm-data-structures-and-putting-use-on.html>.

> Dan

Thanks for your feedback!

Cheers,

  Gabor

> _______________________________________________
> LLVM Developers mailing list
> LLVM... at cs.uiuc.edu  http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev