Wenlei He via llvm-dev
2021-Oct-29 19:49 UTC
[llvm-dev] Performance benefits shown in [RFC: CSSPGO with Pseudo-Instrumentation] can't be reproduced.
For Spec2017, we’ve seen 1%+ CPU improvements on Broadwell hosts in the past. We use SPEC only for bringing up new technologies, and we no longer track SPEC results now that we are moving towards production workloads. Also note that the measurement was done on our internal fork, with some internal patches. We’re still working on upstreaming some of them.

For the setup, -fdebug-info-for-profiling needs to be removed.

Thanks,
Wenlei

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of 徐青青 via llvm-dev <llvm-dev at lists.llvm.org>
Date: Thursday, October 28, 2021 at 1:26 AM
To: via llvm-dev <llvm-dev at lists.llvm.org>
Subject: [llvm-dev] Performance benefits shown in [RFC: CSSPGO with Pseudo-Instrumentation] can't be reproduced.

Hi All,

I am using CSSPGO with Pseudo-Instrumentation, but I found that the performance benefits shown in [RFC: CSSPGO with Pseudo-Instrumentation]<https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s/m/iJjcmUS7AwAJ> can't be reproduced on Spec CPU 2017 based on llvm-12. In the RFC, the results show that CSSPGO with Pseudo-Instrumentation achieves better performance than AutoFDO. I have two questions:

1. Why choose Spec CPU 2006 instead of Spec CPU 2017? Do you have results on Spec CPU 2017?
2. Please point out if there is any error in my usage of CSSPGO. The steps are as follows.

Suppose that my program is test.cpp.

Step 1: clang -O3 -g3 -fno-omit-frame-pointer -fdebug-info-for-profiling -fpseudo-probe-for-profiling test.cpp -o test
Step 2: perf record -g --call-graph fp -e br_inst_retired.near_taken:uppp -c 16009 -b -o test.perf.data ./test
Step 3: perf script -F ip,brstack -i test.perf.data --show-mmap-event &> test.perf.script
Step 4: llvm_install/bin/llvm-profgen --perfscript=test.perf.script --binary=./test --output=test.spgo.profraw --format=text
Step 5: llvm_install/bin/llvm-profdata merge --text --sample -output=test.spgo.prof test.profraw ...
Step 6: clang -O3 -g3 -fpseudo-probe-for-profiling --fprofile-sample-use=test.spgo.prof test.cpp -o cs_test
Step 7: ./cs_test

Thanks,
Qingqing Xu
Hongtao Yu via llvm-dev
2021-Oct-29 19:57 UTC
[llvm-dev] Performance benefits shown in [RFC: CSSPGO with Pseudo-Instrumentation] can't be reproduced.
Please also note that, in order to maximize the benefit from CSSPGO and its improved inlining, LTO mode is recommended. I suggest trying -flto.

Thanks,
Hongtao
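For readers following along, the steps from the original question can be sketched with the two changes suggested in this thread applied: -fdebug-info-for-profiling removed and -flto added. This is only an illustration under assumptions, not a setup confirmed by the authors. The function and variable names are hypothetical, the tools are assumed to be on PATH rather than under llvm_install/bin, the same file name is assumed for the llvm-profgen output and the llvm-profdata input, and -fprofile-sample-use is written with the single-dash spelling that clang documents.

```shell
#!/bin/sh
# Sketch only: the thread's steps with the two suggested changes applied.
# Nothing runs until csspgo_pipeline is called explicitly.

# Shared compile flags. Compared with the original Step 1 and Step 6:
#   - -fdebug-info-for-profiling is removed (Wenlei's correction)
#   - -flto is added (Hongtao's recommendation; needs an LTO-capable linker)
CSSPGO_FLAGS="-O3 -g3 -flto -fpseudo-probe-for-profiling"

# Assumption: one file name is used for both the llvm-profgen output
# and the llvm-profdata input.
RAW_PROFILE="test.spgo.profraw"
FINAL_PROFILE="test.spgo.prof"

build_profiling_binary() {
    # Frame pointers are kept because perf unwinds with --call-graph fp.
    clang $CSSPGO_FLAGS -fno-omit-frame-pointer test.cpp -o test
}

collect_samples() {
    # Same event, period, and LBR option (-b) as in the original Step 2.
    perf record -g --call-graph fp -e br_inst_retired.near_taken:uppp \
        -c 16009 -b -o test.perf.data ./test
    # POSIX spelling of the original "&>" redirection.
    perf script -F ip,brstack -i test.perf.data --show-mmap-event \
        > test.perf.script 2>&1
}

generate_profile() {
    llvm-profgen --perfscript=test.perf.script --binary=./test \
        --output="$RAW_PROFILE" --format=text
    llvm-profdata merge --text --sample -output="$FINAL_PROFILE" "$RAW_PROFILE"
}

build_optimized_binary() {
    clang $CSSPGO_FLAGS -fprofile-sample-use="$FINAL_PROFILE" test.cpp -o cs_test
}

csspgo_pipeline() {
    build_profiling_binary && collect_samples && generate_profile \
        && build_optimized_binary
}
```

Sourcing the file and then calling csspgo_pipeline runs the four stages in order and stops at the first failure. Whether this reproduces the RFC numbers is a separate question, since the reported measurements came from an internal fork with patches that were not all upstream.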