thr3ads.net - llvm dev - [llvm-dev] [External] Re: Performance benefits shown in [RFC: CSSPGO with Pseudo-Instrumentation] can't be reproduced. [Nov 2021]

Hongtao Yu via llvm-dev

2021-Oct-29 19:57 UTC

[llvm-dev] Performance benefits shown in [RFC: CSSPGO with Pseudo-Instrumentation] can't be reproduced.

Please also note that, to maximize the benefit from CSSPGO and its improved
inlining, LTO mode is recommended. I suggest trying -flto.
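
Applied to the final compile (Step 6 in the original message quoted below), the suggestion might look like the following sketch. It assumes -flto is added only to the profile-use build, and it uses the single-dash spelling -fprofile-sample-use= found in the Clang documentation. The run helper and the DRY_RUN variable are hypothetical conveniences for this sketch, not part of clang.

```shell
#!/bin/sh
# Hypothetical helper: print the command unless DRY_RUN=0, in which case run it.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Step 6 with LTO enabled (file names taken from the original message).
step6_lto() {
  run clang -O3 -g3 -flto -fpseudo-probe-for-profiling \
    -fprofile-sample-use=test.spgo.prof test.cpp -o cs_test
}

step6_lto
```

By default the script only prints the command line; setting DRY_RUN=0 in the environment executes it.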

Thanks,
Hongtao

From: Wenlei He <wenlei at fb.com>
Date: Friday, October 29, 2021 at 12:49 PM
To: 徐青青 <xuqingqing.0729 at bytedance.com>, via llvm-dev <llvm-dev at
lists.llvm.org>
Cc: Hongtao Yu <hoy at fb.com>, Lei Wang <wlei at fb.com>
Subject: Re: [llvm-dev] Performance benefits shown in [RFC: CSSPGO with
Pseudo-Instrumentation] can't be reproduced.
For SPEC CPU 2017, we’ve seen 1%+ CPU improvements on Broadwell hosts in the
past. We use SPEC only for bringing up new technologies, and we no longer track
SPEC results as we move towards production workloads. Also note that the
measurement was done on our internal fork, with some internal patches. We’re
still working on upstreaming some of them.

For the setup, -fdebug-info-for-profiling needs to be removed.
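
As a sketch of that correction, the profiling build (Step 1 in the original message quoted below) would keep its other flags and drop only -fdebug-info-for-profiling. The run helper and DRY_RUN variable are hypothetical conveniences for this sketch; the file names come from the original message.

```shell
#!/bin/sh
# Hypothetical helper: print the command unless DRY_RUN=0, in which case run it.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Step 1 without -fdebug-info-for-profiling; pseudo probes remain enabled.
step1_probe_only() {
  run clang -O3 -g3 -fno-omit-frame-pointer \
    -fpseudo-probe-for-profiling test.cpp -o test
}

step1_probe_only
```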

Thanks,
Wenlei

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of 徐青青 via
llvm-dev <llvm-dev at lists.llvm.org>
Date: Thursday, October 28, 2021 at 1:26 AM
To: via llvm-dev <llvm-dev at lists.llvm.org>
Subject: [llvm-dev] Performance benefits shown in [RFC: CSSPGO with
Pseudo-Instrumentation] can't be reproduced.
Hi All,

I am using CSSPGO with Pseudo-Instrumentation, but I found that the performance
benefits shown in [RFC: CSSPGO with Pseudo-Instrumentation]
<https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s/m/iJjcmUS7AwAJ>
can't be reproduced on SPEC CPU 2017 with llvm-12. In the RFC, the results show
that CSSPGO with Pseudo-Instrumentation achieves better performance than
AutoFDO.

Here, I have two questions:
1. Why choose SPEC CPU 2006 instead of SPEC CPU 2017? Do you have results on
   SPEC CPU 2017?
2. Please point out any error in my usage of CSSPGO. The steps are as follows.

Suppose that my program is test.cpp.
Step 1: clang -O3 -g3 -fno-omit-frame-pointer -fdebug-info-for-profiling -fpseudo-probe-for-profiling test.cpp -o test
Step 2: perf record -g --call-graph fp -e br_inst_retired.near_taken:uppp -c 16009 -b -o test.perf.data ./test
Step 3: perf script -F ip,brstack -i test.perf.data --show-mmap-event &> test.perf.script
Step 4: llvm_install/bin/llvm-profgen --perfscript=test.perf.script --binary=./test --output=test.spgo.profraw --format=text
Step 5: llvm_install/bin/llvm-profdata merge --text --sample -output=test.spgo.prof test.profraw ...
Step 6: clang -O3 -g3 -fpseudo-probe-for-profiling --fprofile-sample-use=test.spgo.prof test.cpp -o cs_test
Step 7: ./cs_test

Thanks,
Qingqing Xu


Lei Wang via llvm-dev

2021-Oct-29 20:53 UTC

[llvm-dev] Performance benefits shown in [RFC: CSSPGO with Pseudo-Instrumentation] can't be reproduced.

BTW, regarding the issue in
https://groups.google.com/g/llvm-dev/c/QJFIzk6bP1Y/m/8YlhrhXDAQAJ (sorry, I
overlooked the message):

We have a fix in https://reviews.llvm.org/D110081 that filters out the
negative LineOffset; you can try the latest llvm-profgen.

Thanks.
Lei

徐青青 via llvm-dev

2021-Nov-02 06:42 UTC

[llvm-dev] [External] Re: Performance benefits shown in [RFC: CSSPGO with Pseudo-Instrumentation] can't be reproduced.

As you suggested, I removed -fdebug-info-for-profiling from the first
compile and added -flto to the second compile for CSSPGO. -flto brings a
large improvement.

To be fair, I also added -flto to the second compile for AutoFDO. The
result shows that AutoFDO brings more performance benefit than CSSPGO
(about 20% on SPEC CPU 2017's 523.xalancbmk_r).
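
For reference, the two profile-use builds being compared could be sketched as below, differing only in the profile and in the pseudo-probe flag. The AutoFDO profile name test.autofdo.prof and the binary name afdo_test are hypothetical, since the thread does not show how the AutoFDO profile was produced; the run and final_build helpers are also inventions of this sketch, and the single-dash -fprofile-sample-use= spelling follows the Clang documentation.

```shell
#!/bin/sh
# Hypothetical helper: print the command unless DRY_RUN=0, in which case run it.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Second compile with -flto.
# $1 = sample profile, $2 = output binary, remaining arguments = extra flags.
final_build() {
  profile=$1
  out=$2
  shift 2
  run clang -O3 -g3 -flto "$@" "-fprofile-sample-use=$profile" test.cpp -o "$out"
}

# CSSPGO build: profile from Steps 4-5, pseudo probes enabled.
final_build test.spgo.prof cs_test -fpseudo-probe-for-profiling

# AutoFDO build: hypothetical profile name, no pseudo-probe flag.
final_build test.autofdo.prof afdo_test
```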

The version of LLVM I used is llvm-12, and your RFC is also based on
llvm-12, judging by the date of the RFC. Have I missed anything in my usage
of CSSPGO? Is there any CSSPGO option that I need to enable manually? Could
you please test the release/12.x branch and confirm the results, to help me
get the performance benefit over AutoFDO?

Thanks,
Qingqing
