Thomas Preud'homme via llvm-dev
2021-Mar-02 11:57 UTC
[llvm-dev] Spurious peeling in simple loop unrolling
Hi, We've noticed some codegen regressions due to spurious peeling in the simple loop unrolling pass. This happens when the incoming values of the phi nodes causing the peeling are equivalent, since GVN only runs after the loop peeling. Ironically, this prevents vectorization on our target due to the loss of alignment in the loop body caused by the peeling. The spurious peeling seems to affect AArch64 as well, so it's possible other targets may be affected. Attached is an example test case and the corresponding codegen as of 2632ba6a358a62c5cbaddc141de81b756b68698f (with and without 35b3989a30eefa66cd6edca4c6e1ec061c05ad96 reverted). Compiling with -O3 -S -target aarch64-linux-gnu gives the attached 3 logs. Prerotate is just before loop rotation. The postrotate logfile shows the IR after the rotate loop pass and a few more passes have run (the logs after the rotate loop pass only show the loop body). Loop rotate introduces two elidable phis: %33 = phi i8* [ %9, %22 ], [ %14, %32 ] %34 = phi %class.HomemadeVector.0* [ %8, %22 ], [ %13, %32 ] This leads to loop peeling inside simple loop unrolling to peel the first iteration of the loop. Prior to 35b3989a30eefa66cd6edca4c6e1ec061c05ad96 the phi would remain until GVN runs and would remove them since %8 and %13 are the same values. The same would happen to %9 and %14. Running GVN earlier would fix this issue, but I suspect it would have other regressions. Does anyone know how to address this? Best regards, Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210302/d01c82f3/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: spurious_loop_peeling.cpp Type: text/x-c++src Size: 486 bytes Desc: spurious_loop_peeling.cpp URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210302/d01c82f3/attachment.cpp> -------------- next part -------------- A non-text attachment was scrubbed... Name: spurious_loop_peeling-35b3989a30eefa66cd6edca4c6e1ec061c05ad96-reverted.s Type: application/octet-stream Size: 2967 bytes Desc: spurious_loop_peeling-35b3989a30eefa66cd6edca4c6e1ec061c05ad96-reverted.s URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210302/d01c82f3/attachment.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: spurious_loop_peeling-2632ba6a358a62c5cbaddc141de81b756b68698f.s Type: application/octet-stream Size: 3316 bytes Desc: spurious_loop_peeling-2632ba6a358a62c5cbaddc141de81b756b68698f.s URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210302/d01c82f3/attachment-0001.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: spurious_loop_peeling-2632ba6a358a62c5cbaddc141de81b756b68698f_postpeeling.log Type: text/x-log Size: 4104 bytes Desc: spurious_loop_peeling-2632ba6a358a62c5cbaddc141de81b756b68698f_postpeeling.log URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210302/d01c82f3/attachment.bin> -------------- next part -------------- A non-text attachment was scrubbed... Name: spurious_loop_peeling-2632ba6a358a62c5cbaddc141de81b756b68698f_postrotate.log Type: text/x-log Size: 3454 bytes Desc: spurious_loop_peeling-2632ba6a358a62c5cbaddc141de81b756b68698f_postrotate.log URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210302/d01c82f3/attachment-0001.bin> -------------- next part -------------- A non-text attachment was scrubbed... Name: spurious_loop_peeling-2632ba6a358a62c5cbaddc141de81b756b68698f_prerotate.log Type: text/x-log Size: 2866 bytes Desc: spurious_loop_peeling-2632ba6a358a62c5cbaddc141de81b756b68698f_prerotate.log URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210302/d01c82f3/attachment-0002.bin>
Florian Hahn via llvm-dev
2021-Mar-04 13:32 UTC
[llvm-dev] Spurious peeling in simple loop unrolling
Hi,> On Mar 2, 2021, at 11:57, Thomas Preud'homme <thomasp at graphcore.ai> wrote: > > Hi, > > We've noticed some codegen regressions due to spurious peeling in the simple loop unrolling pass. This happens when the incoming values of the phi nodes causing the peeling are equivalent, since GVN only runs after the loop peeling. Ironically, this prevents vectorization on our target due to the loss of alignment in the loop body caused by the peeling. The spurious peeling seems to affect AArch64 as well, so it's possible other targets may be affected. >Thanks for sharing the example. IIUC the problem here is that there is phi node which becomes invariant after the first iteration and which causes the loop to be peeled (small example here https://godbolt.org/z/Ej4Y19 <https://godbolt.org/z/Ej4Y19>). In your case, the PHI node is actually redundant because the incoming values are actually the same, just different bit casts of the same value, so peeling does not really add any benefits.> Attached is an example test case and the corresponding codegen as of 2632ba6a358a62c5cbaddc141de81b756b68698f (with and without 35b3989a30eefa66cd6edca4c6e1ec061c05ad96 reverted). Compiling with -O3 -S -target aarch64-linux-gnu gives the attached 3 logs. Prerotate is just before loop rotation. The post IIUrotate logfile shows the IR after the rotate loop pass and a few more passes have run (the logs after the rotate loop pass only show the loop body). Loop rotate introduces two elidable phis: > > %33 = phi i8* [ %9, %22 ], [ %14, %32 ] > %34 = phi %class.HomemadeVector.0* [ %8, %22 ], [ %13, %32 ] > > This leads to loop peeling inside simple loop unrolling to peel the first iteration of the loop. Prior to 35b3989a30eefa66cd6edca4c6e1ec061c05ad96 the phi would remain until GVN runs and would remove them since %8 and %13 are the same values. The same would happen to %9 and %14. > > Running GVN earlier would fix this issue, but I suspect it would have other regressions. Does anyone know how to address this?Adjusting the position of GVN to address the issue at hand is probably not a good idea. I think there are at least a few small improvements we could make to improve the current situation: 1. Only peel if the PHI that becomes invariant has any concrete users; if the only users of a phi that becomes invariant are things like llvm.assume, peeling should be very unlikely to be beneficial (note that currently peeling also seems to happily peel for completely unused phis) 2. Instcombine before peeling already simplifies the IR so that both incoming values are bit casts of the same value. It probably would be trivial to also have instcombine simplify pointer phis if the incoming values striped by pointer casts are equal. (There might be some other reason why we are not doing this at the moment though). 3. For targets very sensitive to the number of iterations, perhaps it would be worth adding a TTI hook to express that. 4. Perhaps peeling should also be a bit more careful when a known trip count isn’t a power-of-2 (or some similar constraint) any more, but was before peeling. Cheers, Florian -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210304/f94a6a62/attachment.html>