On Apr 16, 2:13 am, Dan Gohman <goh... at apple.com> wrote:

> Hi Gabor,
>
> Can you provide performance data for this? I'd
> like to know what effect these changes have on
> compile time.

Hi Dan,

unfortunately, no. I can feed you some speculation, though; see below.

The reason why I cannot do measurements (at the moment) is that:

- I have no experience with performance measurements of LLVM compile time;
- I have no good test cases that would provide the data interesting for you;
- I have no machine that can run Release-mode tests:
  * my PPC Mac mini at home runs Tiger, and Xcode 2.4.1 is broken for Release mode [1];
  * I have access to a beefy Mac Pro, but it also miscompiles Release for the same reason. Also, it is an image-processing machine in production, so I can only run "nice make ...";
  * I have access to a Solaris 10 workstation, but I really do not want to build llvm-gcc on that one. It also keeps my /home on NFS, which has speed and quota problems.

But: it is very simple to do the measurements yourself (and I'd love to hear the results!):

  cd llvm
  svn switch http://llvm.org/svn/llvm-project/llvm/branches/ggreif/use-diet .
  make

then rebuild llvm-gcc, etc., and retest.

And now here is my educated speculation. There are two things that became slower:

1) Use::getUser()
2) Use::get/set, due to tagging.

The former is seldom called:

  $ find lib -name "*.cpp" | xargs grep "getUser(" | wc -l
  41

We could audit those to make sure that no unnecessary calls are made. But the getUser() algorithm is not so inefficient anyway.

The second is counterbalanced by faster access to the Use object in most cases: with the exception of PHINode and SwitchInst, the getOperand() function (if called on a specialized "this" pointer) does a "this"-relative access instead of fetching the OperandList pointer first and going through that. This was the case before for fixed-arity instructions, but now it applies to variadic ones too, provided they cannot perform operand list resizing.
Some things got faster, like:

1) getOperand() access on (say) CallInst now fetches operands relative to "this". Also, there are no "new Use[n]" allocations (and delete[]) for these any more.
2) There is no need to maintain Use::U.
3) Data-cache related gains from smaller Use objects (not yet activated).

So, my idea is that these changes are performance neutral.

There are potentially more speedups to be realized:

- indexing operands in the decreasing direction eliminates the getNumOperands() call in getOperand() for variadic-arity instructions;
- shaving off (unneeded) operand-related administrative members from User.

I hope that this is interesting, but I'd like to ask anybody who is comfortable with performance testing to help provide some hard data :-)

Cheers,

  Gabor

[1] <http://lists.cs.uiuc.edu/pipermail/llvmdev/2008-March/013436.html>

> Thanks,
>
> Dan
>
> On Apr 15, 2008, at 3:32 PM, Gabor Greif wrote:
>
> > Hi All,
>
> > here comes the patch for the second wave of Use class size reduction.
>
> > I have included all the machinery that is needed, and it is
> > *active*. The User* inside of Use is even sometimes NULL,
> > but the algorithm is able to recover it.
> > If there is a non-null User* present, then I am
> > asserting that it equals the computed value.
>
> > I did not receive feedback on the algorithmic part yet,
> > so I assume you are comfortable with it.
>
> > Unfortunately I had to introduce a new GlobalVariable::Create
> > mechanism (I hoped to have nailed them all in wave 1, but life is cruel).
> > I will submit scripts for the easy conversion of external projects,
> > like last time.
>
> > I have split this review material into 3 files, corresponding to:
> > - essential changes,
> > - related changes in .cpp files,
> > - collateral nitty-gritty, mostly mechanical stuff.
>
> > Outlook: the actual removal of the Use::U member
> > will happen in wave 3, after this stuff is merged to
> > trunk. I do not expect any problems.
> > Btw., I have already performed a test merge
> > and it also passes all tests (deja, clang, test-suite).
>
> > Cheers,
>
> > Gabor
>
> > --------------------- STATS ------------------
> > ggreif$ ls -l wave2*
> > -rw-r--r-- 1 ggreif ggreif 79367 Apr 16 00:20 wave2-essentials.diff
> > -rw-r--r-- 1 ggreif ggreif 51795 Apr 16 00:24 wave2-impl.diff
> > -rw-r--r-- 1 ggreif ggreif 25300 Apr 16 00:25 wave2-nittygritty.diff
>
> > ggreif$ wc wave2*
> >  2189  9005  79367 wave2-essentials.diff
> >  1408  4793  51795 wave2-impl.diff
> >   521  1995  25300 wave2-nittygritty.diff
> >  4118 15793 156462 total
>
> > <wave2-essentials.diff><wave2-impl.diff><wave2-nittygritty.diff>
> But: It is very simple to do your measurements yourself (and I'd love
> to hear the results!):
>
>   cd llvm
>   svn switch http://llvm.org/svn/llvm-project/llvm/branches/ggreif/use-diet .
>   make
>
> rebuild llvm-gcc, etc.
> retest.

Hmm, do not forget to revert the switch after you are done:

  svn switch http://llvm.org/svn/llvm-project/llvm/trunk .

Also, you will need to apply these conversions to llvm-gcc:

  ##### tcsh script follows ######
  foreach CLASS (GlobalVariable)
    setenv SUBST1 "s/new $CLASS/${CLASS}::Create/g"
    setenv SUBST2 "s/new llvm::$CLASS/llvm::${CLASS}::Create/g"
    foreach i (`find . -name "*.cpp"`)
      sed -e "$SUBST1" -e "$SUBST2" < $i > $i.2
      rm $i
      mv $i.2 $i
    end
    foreach i (`find . -name "*.h"`)
      sed -e "$SUBST1" -e "$SUBST2" < $i > $i.2
      rm $i
      mv $i.2 $i
    end
  end

In one place you will have to undo the change (the compiler will tell you where). You can also experiment with removing Use::U (and the related asserts).

Cheers,

  Gabor
On Apr 16, 2008, at 2:50 AM, heisenbug wrote:

> And now here is my educated speculation:
> There are 2 things that became slower
> 1) Use::getUser()
> 2) Use::get/set due to tagging.
>
> The former is seldom called:
>
> $ find lib -name "*.cpp" | xargs grep "getUser(" | wc -l
> 41

The majority of those aren't actually Use::getUser, but on the other hand this grep misses all the users of value_use_iterator::operator*, which is much more popular. Unfortunately, overloaded operators are not the easiest to grep for ;-).

> The second is counterbalanced with a faster access to the Use object
> in most cases:
> With exception of PHINode and SwitchInst, the getOperand() function
> (if called on a specialized "this" pointer) does a "this"-relative
> access instead of getting OperandList pointer first and going thru
> that. This was the case before for fixed-arity instructions, but now
> it applies to variadic ones too that cannot perform operand list
> resizing.
>
> Some things got faster, like
> 1) getOperand access on (say) CallInst is now fetching operands from
> relative to "this". Also, there is no "new Use[n]" allocations (and
> delete []) for these any more.

This is fine, but technically it's an independent change that could be done without the getUser algorithm changes and tagging.

> 2) no need to maintain Use::U
> 3) data-cache related gains on smaller Use objects (not yet activated)
>
> So, my idea is that these changes are performance neutral.
> There are potentially more speedups to be realized:
> - indexing operands in the decreasing direction eliminates the
> getNumOperands() call in getOperand() for variadic arity instructions.

At a glance, the only getOperand() that I saw that uses getNumOperands() is ReturnInst's, and that actually looks like a bug.

> - shaving off (unneeded) operand related administrative members from
> User.

Which members?

> I hope that this is interesting, but I'd like to ask anybody who is
> comfortable with performance testing to help provide some hard
> data :-)

I agree that performance testing is not easy and requires resources, but I'm curious what's motivating this project here. My assumption has been that no one would likely go to the extreme of inventing a mini-language encoded in the least significant bits of pointer members in successive structs unless they were pretty desperate for performance or scalability.

Dan
On 2008-04-16, at 14:25, Dan Gohman wrote:

> On Apr 16, 2008, at 2:50 AM, heisenbug wrote:
>
> > And now here is my educated speculation:
> > There are 2 things that became slower
> > 1) Use::getUser()
> > 2) Use::get/set due to tagging.
>
> > The former is seldom called:
>
> > $ find lib -name "*.cpp" | xargs grep "getUser(" | wc -l
> > 41
>
> The majority of those aren't actually Use::getUser, but on the other
> hand this grep misses all the users of
> value_use_iterator::operator*, which is much more popular.
> Unfortunately, overloaded operators are not the easiest to grep
> for ;-).

Perhaps someone should write some sort of C++ refactoring tool for that!

— Gordon
On Apr 16, 2008, at 11:25 AM, Dan Gohman wrote:

> > So, my idea is that these changes are performance neutral.

I strongly agree with Dan that we need to measure performance to ensure there is no significant performance regression.

> > I hope that this is interesting, but I'd like to ask anybody who is
> > comfortable with performance testing to help provide some hard
> > data :-)
>
> I agree that performance testing is not easy and requires resources,
> but I'm curious what's motivating this project here. My assumption
> has been that no one would likely go to the extreme of inventing a
> mini-language encoded in the least significant bits of pointer
> members in successive structs unless they were pretty desperate for
> performance or scalability.

Ah, this question is easy: it shrinks the Use class by a word, which reduces the most popular class in the LLVM IR by 25%. This directly makes all binary operators 8 bytes smaller, for example, which is a pretty big memory footprint win, and memory footprint translates to dcache efficiency as well.

-Chris
On Apr 16, 8:25 pm, Dan Gohman <goh... at apple.com> wrote:

> On Apr 16, 2008, at 2:50 AM, heisenbug wrote:
>
> > And now here is my educated speculation:
> > There are 2 things that became slower
> > 1) Use::getUser()
> > 2) Use::get/set due to tagging.
>
> > The former is seldom called:
>
> > $ find lib -name "*.cpp" | xargs grep "getUser(" | wc -l
> > 41
>
> The majority of those aren't actually Use::getUser, but on the
> other hand this grep misses all the users of
> value_use_iterator::operator*, which is much more popular.
> Unfortunately, overloaded operators are not the easiest to
> grep for ;-).

Hi Dan,

thanks for this hint; it makes me look harder into reducing the cost of getUser. My current idea is to do a very cheap heuristic inline, which catches 100% of the cases for UnaryInstruction and 50% of the cases for Binary/CmpInstruction, without doing a function call at all.

> > The second is counterbalanced with a faster access to the Use object
> > in most cases:
> > With exception of PHINode and SwitchInst, the getOperand() function
> > (if called on a specialized "this" pointer) does a "this"-relative
> > access instead of getting OperandList pointer first and going thru
> > that. This was the case before for fixed-arity instructions, but now
> > it applies to variadic ones too that cannot perform operand list
> > resizing.
>
> > Some things got faster, like
> > 1) getOperand access on (say) CallInst is now fetching operands from
> > relative to "this". Also, there is no "new Use[n]" allocations (and
> > delete []) for these any more.
>
> This is fine, but technically it's an independent change that could
> be done without the getUser algorithm changes and tagging.
>
> > 2) no need to maintain Use::U
> > 3) data-cache related gains on smaller Use objects (not yet activated)
>
> > So, my idea is that these changes are performance neutral.
> > There are potentially more speedups to be realized:
> > - indexing operands in the decreasing direction eliminates the
> > getNumOperands() call in getOperand() for variadic arity instructions.
>
> At a glance, the only getOperand() that I saw that uses getNumOperands()
> is ReturnInst's, and that actually looks like a bug.
>
> > - shaving off (unneeded) operand related administrative members from
> > User.
>
> Which members?

I am thinking of OperandList and NumOperands. In most User subclasses these are trivially recomputable/constant. The only catch is that getting them via a User* is a bit more costly (a virtual call or another trick). The question is how often code calls getOperand() through an unspecialized User*.

> > I hope that this is interesting, but I'd like to ask anybody who is
> > comfortable with performance testing to help provide some hard
> > data :-)
>
> I agree that performance testing is not easy and requires resources,

I am on my way into this area. I now have a working Release setup and am already in possession of hard numbers, which are pretty encouraging. I see about 1% speed degradation on a cruel verifier test (2004-09-29-VerifierIsReallySlow.llx), but even there the user time is well within the cloud of measurements of the regular (non-tagged) build. On all other tests I have seen no degradation so far. And this was even before I moved the tag from the Val to the Prev member of Use.

> but I'm curious what's motivating this project here. My assumption

LLVM is intended to be a pretty much target-independent technology with a compact delivery format and potentially good startup behaviour. To actually be a viable system for a wide range of machines it must be as memory efficient as possible. The in-memory data structures are needed for the JIT and the optimizer, and thus they must coexist with the generated machine code for the platform.
<Use> seems to be the structure with the currently greatest bang-for-buck ratio, so I am thinking about a solution for it :-)

> has been that no one would likely go to the extreme of inventing a
> mini-language encoded in the least significant bits of pointer
> members in successive structs unless they were pretty desperate for
> performance or scalability.

Yep. There are desperate needs. I am in the embedded world. When, sometime, iPhone/iPod-like devices are supposed to run many multi-megainstruction programs routinely, everybody else will see the difference too.

Regarding the mini-language: I decided that the alternatives, like linear search (see the Kernighan-Ritchie mini-language of null-terminated strings and strlen :-) and map lookup, are far more costly, especially in a multi-threaded environment. For completeness, see my musings at <http://heisenbug.blogspot.com/2008/04/llvm-data-structures-and-putting-use-on.html>.

> Dan

Thanks for your feedback!

Cheers,

  Gabor

> _______________________________________________
> LLVM Developers mailing list
> LLVM... at cs.uiuc.edu  http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev