On Apr 17, 4:12 am, Chris Lattner <sa... at nondot.org> wrote:
> On Apr 16, 2008, at 11:25 AM, Dan Gohman wrote:
>
> >> So, my idea is that these changes are performance neutral.
>
> I strongly agree with Dan that we need to measure performance to
> ensure there is no significant performance regression.
Dan, Chris,
finally I am in possession of hard performance data on
a realistic testcase (kimwitu++).
I have 20 measurements of trunk LLVM and 5 with my changes
merged in:
###### TRUNK + DIET
pixxi:~ ggreif$ cat kimwituFastDiet50147.scatter|grep user|sort
user 1m25.404s
user 1m25.453s
user 1m25.454s
user 1m25.526s
user 1m25.973s
###### TRUNK
pixxi:~ ggreif$ cat kimwituRegular.scatter.backup|grep user|sort
user 1m25.127s
user 1m25.132s
user 1m25.147s
user 1m25.160s
user 1m25.169s
user 1m25.179s
user 1m25.179s
user 1m25.184s
user 1m25.189s
user 1m25.199s
user 1m25.204s
user 1m25.207s
user 1m25.212s
user 1m25.217s
user 1m25.219s
user 1m25.233s
user 1m25.243s
user 1m25.245s
user 1m25.259s
user 1m25.560s
ratio of the two best CPU times (scaled to integers for expr, so the
result is 10000 times the actual ratio):
pixxi:~ ggreif$ expr 854040000 / 85127
10032
ratio of the two second best CPU times:
pixxi:~ ggreif$ expr 854530000 / 85132
10037
It looks like we have a degradation of 0.3%.
The <system> and <real> times show no surprises at all.
There is one important change still missing from the
use-diet branch, viz. the capacity of the BitcodeReaderValueList
is computed very naively by the new algorithm at each push_back.
I left this in to see whether the algorithm scales.
Kimwitu++ bitcode reading puts more than 250,000 Use
objects into a contiguous array, and to get its capacity
my algorithm has to visit more than 18 pointers each time.
Tonight I will store the capacity in a member variable
and run comprehensive tests. I expect further speedups,
possibly even parity.
Barring any surprises, I plan to merge the use-diet branch to
trunk this weekend. Owen has promised to help me do more
performance evaluations until then, so we get a clearer
picture.
I have also downloaded CHUD, so even looking at
(and fixing) bottlenecks may be feasible in the next few days.
What do you think?
Cheers,
Gabor
PS: Yes, I will send out several mails to llvm-dev before
and after merging.
>
> >> I hope that this is interesting, but I'd like to ask anybody who is
> >> comfortable with performance testing to help provide some hard
> >> data :-)
>
> > I agree that performance testing is not easy and requires resources,
> > but I'm curious what's motivating this project here. My assumption
> > has been that no one would likely go to the extreme of inventing a
> > mini-language encoded in the least significant bits of pointer
> > members in successive structs unless they were pretty desperate for
> > performance or scalability.
>
> Ah, this question is easy: it shrinks the Use class by a word, which
> reduces the most popular class in the LLVM IR by 25%. This directly
> makes all binary operators 8 bytes smaller for example, which is a
> pretty big memory footprint win, and memory footprint translates to
> dcache efficiency as well.
>
> -Chris
> _______________________________________________
> LLVM Developers mailing list
> LLVM... at cs.uiuc.edu
> http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev