Arnaud Allard de Grandmaison
2011-Mar-13 21:01 UTC
[LLVMdev] IndVarSimplify too aggressive ?
Hi all, The IndVarSimplify pass seems to be too aggressive when it enlarge the induction variable type ; this can pessimize the generated code when the new induction variable size is not natively supported by the target. This is probably not an issue for x86_64, which supports natively all types, but it is a real one for several embedded targets, with very few native types. I attached a patch to address this issue; if TargetData is available, the patch attempts to keep the induction variable to a native type when going thru the induction variable users. Also attached my test-case in C, as well as the resulting assembly output, with and without the patch applied, for arm and x86_32 targets. You will note the loop instructions count can be reduced by 30% in several cases. The patch could probably be made smarter : I am welcoming all suggestions. Best Regards, -- Arnaud de Grandmaison -------------- next part -------------- A non-text attachment was scrubbed... Name: IndVarSimplify-nativeType.patch Type: application/octet-stream Size: 2091 bytes Desc: IndVarSimplify-nativeType.patch URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110313/71c8b9cc/attachment.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: test.c Type: text/x-c Size: 256 bytes Desc: test.c URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110313/71c8b9cc/attachment.bin> -------------- next part -------------- A non-text attachment was scrubbed... Name: test.s.patch.arm Type: application/octet-stream Size: 898 bytes Desc: test.s.patch.arm URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110313/71c8b9cc/attachment-0001.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: test.s.patch.x86_32 Type: application/octet-stream Size: 2075 bytes Desc: test.s.patch.x86_32 URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110313/71c8b9cc/attachment-0002.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: test.s.wo_patch.arm Type: application/octet-stream Size: 1102 bytes Desc: test.s.wo_patch.arm URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110313/71c8b9cc/attachment-0003.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: test.s.wo_patch.x86_32 Type: application/octet-stream Size: 2304 bytes Desc: test.s.wo_patch.x86_32 URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110313/71c8b9cc/attachment-0004.obj>
On Sun, Mar 13, 2011 at 5:01 PM, Arnaud Allard de Grandmaison <Arnaud.AllardDeGrandMaison at dibcom.com> wrote:> Hi all, > > The IndVarSimplify pass seems to be too aggressive when it enlarge the induction variable type ; this can pessimize the generated code when the new induction variable size is not natively supported by the target. This is probably not an issue for x86_64, which supports natively all types, but it is a real one for several embedded targets, with very few native types. > > I attached a patch to address this issue; if TargetData is available, the patch attempts to keep the induction variable to a native type when going thru the induction variable users. > > Also attached my test-case in C, as well as the resulting assembly output, with and without the patch applied, for arm and x86_32 targets. You will note the loop instructions count can be reduced by 30% in several cases. > > The patch could probably be made smarter : I am welcoming all suggestions.It's worth pointing out that LoopStrengthReduce is doing essentially the same transformation. The only reason the generated code is improved at all with your change is that ISel has a longstanding issue where it can't conclude that the upper half of zext i32 %x to i64 is zero if the zext is in a different block from the user of the zext. -Eli
Arnaud, I also noticed that IndVarSimplify increases variable size, and in some cases pessimize the program. I just wanted to add that I have seen cases where i64 types were converted to i65 types, for which there is no native support. In the case of i65 multiplication, for some platforms there is not even a library call to perform a 128bit multiplication. So, I welcome your change and I will test your patch locally. Nadav -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Arnaud Allard de Grandmaison Sent: Sunday, March 13, 2011 23:02 To: llvmdev at cs.uiuc.edu Subject: [LLVMdev] IndVarSimplify too aggressive ? Hi all, The IndVarSimplify pass seems to be too aggressive when it enlarge the induction variable type ; this can pessimize the generated code when the new induction variable size is not natively supported by the target. This is probably not an issue for x86_64, which supports natively all types, but it is a real one for several embedded targets, with very few native types. I attached a patch to address this issue; if TargetData is available, the patch attempts to keep the induction variable to a native type when going thru the induction variable users. Also attached my test-case in C, as well as the resulting assembly output, with and without the patch applied, for arm and x86_32 targets. You will note the loop instructions count can be reduced by 30% in several cases. The patch could probably be made smarter : I am welcoming all suggestions. Best Regards, -- Arnaud de Grandmaison --------------------------------------------------------------------- Intel Israel (74) Limited This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
Arnaud Allard de Grandmaison
2011-Mar-14 18:27 UTC
[LLVMdev] IndVarSimplify too aggressive ?
Thanks Eli, After digging thru mail archives & bugzilla, it seems fixing properly this issue would require a major change in the selectionDAG code --- to have it operate on a per function basis instead of per basic-block. This however, does not seem to be the only issue. The following C code does not produce an efficicient assembly sequence either. extern void f(unsigned long long v); void test2() { for (unsigned i=0; i<512; i++) f(i); } The resulting .ll out of clang looks reasonnable (with and without the patch), but the arm assembly output looks ugly, though marginally better with my patch : the induction variable should be counting up, and it could be zero extended before the call to f. This again points to Isel, but to a different area, as everything is taking place in the same BB. Is this some known issue ? I could not find a bug report matching this. -- Arnaud de Grandmaison -----Original Message----- From: Eli Friedman [mailto:eli.friedman at gmail.com] Sent: Sunday, March 13, 2011 11:08 PM To: Arnaud Allard de Grandmaison Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] IndVarSimplify too aggressive ? On Sun, Mar 13, 2011 at 5:01 PM, Arnaud Allard de Grandmaison <Arnaud.AllardDeGrandMaison at dibcom.com> wrote:> Hi all, > > The IndVarSimplify pass seems to be too aggressive when it enlarge the induction variable type ; this can pessimize the generated code when the new induction variable size is not natively supported by the target. This is probably not an issue for x86_64, which supports natively all types, but it is a real one for several embedded targets, with very few native types. > > I attached a patch to address this issue; if TargetData is available, the patch attempts to keep the induction variable to a native type when going thru the induction variable users. > > Also attached my test-case in C, as well as the resulting assembly output, with and without the patch applied, for arm and x86_32 targets. You will note the loop instructions count can be reduced by 30% in several cases. > > The patch could probably be made smarter : I am welcoming all suggestions.It's worth pointing out that LoopStrengthReduce is doing essentially the same transformation. The only reason the generated code is improved at all with your change is that ISel has a longstanding issue where it can't conclude that the upper half of zext i32 %x to i64 is zero if the zext is in a different block from the user of the zext. -Eli -------------- next part -------------- A non-text attachment was scrubbed... Name: test2.s.wo_patch.arm Type: application/octet-stream Size: 447 bytes Desc: test2.s.wo_patch.arm URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110314/3e93ccc4/attachment.obj> -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: test2.c URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110314/3e93ccc4/attachment.c> -------------- next part -------------- A non-text attachment was scrubbed... Name: test2.ll.w_patch.arm Type: application/octet-stream Size: 744 bytes Desc: test2.ll.w_patch.arm URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110314/3e93ccc4/attachment-0001.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: test2.ll.wo_patch.arm Type: application/octet-stream Size: 676 bytes Desc: test2.ll.wo_patch.arm URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110314/3e93ccc4/attachment-0002.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: test2.s.w_patch.arm Type: application/octet-stream Size: 440 bytes Desc: test2.s.w_patch.arm URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110314/3e93ccc4/attachment-0003.obj>
On Mar 13, 2011, at 2:01 PM, Arnaud Allard de Grandmaison wrote:> Hi all, > > The IndVarSimplify pass seems to be too aggressive when it enlarge the induction variable type ; this can pessimize the generated code when the new induction variable size is not natively supported by the target. This is probably not an issue for x86_64, which supports natively all types, but it is a real one for several embedded targets, with very few native types. > > I attached a patch to address this issue; if TargetData is available, the patch attempts to keep the induction variable to a native type when going thru the induction variable users. > > Also attached my test-case in C, as well as the resulting assembly output, with and without the patch applied, for arm and x86_32 targets. You will note the loop instructions count can be reduced by 30% in several cases. > > The patch could probably be made smarter : I am welcoming all suggestions. > > Best Regards, > -- > Arnaud de GrandmaisonArnaud, I've been investigating whether it's safe to apply your patch. I still need to understand why our generated code is slower in some cases. I noticed a particularly bad regression in MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow that I documented here: http://llvm.org/bugs/show_bug.cgi?id=9490 We would like to avoid generating canonical induction variables in IndVarSimplify. Once that work is complete, your patch should no longer be needed. Although in the meantime, it would be nice to understand why promoting IVs to wider types is sometimes required for codegen. -Andy
Arnaud Allard de Grandmaison
2011-Mar-16 17:23 UTC
[LLVMdev] IndVarSimplify too aggressive ?
Hi Andy, Thanks for looking into this. I have tried today to make a reduced testcase from the value function, but as I do not have any arm hardware available to measure the real cycle count, it can be quite errorprone, especially with all those loops. Maybe I should give a try at qemu. Best regards, -- Arnaud de Grandmaison -----Original Message----- From: Andrew Trick [mailto:atrick at apple.com] Sent: Wednesday, March 16, 2011 1:16 AM To: Arnaud Allard de Grandmaison Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] IndVarSimplify too aggressive ? On Mar 13, 2011, at 2:01 PM, Arnaud Allard de Grandmaison wrote:> Hi all, > > The IndVarSimplify pass seems to be too aggressive when it enlarge the induction variable type ; this can pessimize the generated code when the new induction variable size is not natively supported by the target. This is probably not an issue for x86_64, which supports natively all types, but it is a real one for several embedded targets, with very few native types. > > I attached a patch to address this issue; if TargetData is available, the patch attempts to keep the induction variable to a native type when going thru the induction variable users. > > Also attached my test-case in C, as well as the resulting assembly output, with and without the patch applied, for arm and x86_32 targets. You will note the loop instructions count can be reduced by 30% in several cases. > > The patch could probably be made smarter : I am welcoming all suggestions. > > Best Regards, > -- > Arnaud de GrandmaisonArnaud, I've been investigating whether it's safe to apply your patch. I still need to understand why our generated code is slower in some cases. I noticed a particularly bad regression in MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow that I documented here: http://llvm.org/bugs/show_bug.cgi?id=9490 We would like to avoid generating canonical induction variables in IndVarSimplify. Once that work is complete, your patch should no longer be needed. Although in the meantime, it would be nice to understand why promoting IVs to wider types is sometimes required for codegen. -Andy
On Mar 13, 2011, at 2:01 PM, Arnaud Allard de Grandmaison wrote:> Hi all, > > The IndVarSimplify pass seems to be too aggressive when it enlarge the induction variable type ; this can pessimize the generated code when the new induction variable size is not natively supported by the target. This is probably not an issue for x86_64, which supports natively all types, but it is a real one for several embedded targets, with very few native types. > > I attached a patch to address this issue; if TargetData is available, the patch attempts to keep the induction variable to a native type when going thru the induction variable users. > > Also attached my test-case in C, as well as the resulting assembly output, with and without the patch applied, for arm and x86_32 targets. You will note the loop instructions count can be reduced by 30% in several cases. > > The patch could probably be made smarter : I am welcoming all suggestions.Hi Arnaud, This should be fixed in r127884. In some cases, your patch could result in multiple IVs for the same recurrence, and LSR was not able to cleanup afterward. Dan Gohman proposed an alternative, which seems to work great. See http://llvm.org/bugs/show_bug.cgi?id=9490. -Andy
Arnaud Allard de Grandmaison
2011-Mar-21 16:21 UTC
[LLVMdev] IndVarSimplify too aggressive ?
Thanks Andy & Dan for looking into this. Tested it on my own backends, and it works great ! Best regards, -- Arnaud de Grandmaison -----Original Message----- From: Andrew Trick [mailto:atrick at apple.com] Sent: Friday, March 18, 2011 6:11 PM To: Arnaud Allard de Grandmaison Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] IndVarSimplify too aggressive ? On Mar 13, 2011, at 2:01 PM, Arnaud Allard de Grandmaison wrote:> Hi all, > > The IndVarSimplify pass seems to be too aggressive when it enlarge the induction variable type ; this can pessimize the generated code when the new induction variable size is not natively supported by the target. This is probably not an issue for x86_64, which supports natively all types, but it is a real one for several embedded targets, with very few native types. > > I attached a patch to address this issue; if TargetData is available, the patch attempts to keep the induction variable to a native type when going thru the induction variable users. > > Also attached my test-case in C, as well as the resulting assembly output, with and without the patch applied, for arm and x86_32 targets. You will note the loop instructions count can be reduced by 30% in several cases. > > The patch could probably be made smarter : I am welcoming all suggestions.Hi Arnaud, This should be fixed in r127884. In some cases, your patch could result in multiple IVs for the same recurrence, and LSR was not able to cleanup afterward. Dan Gohman proposed an alternative, which seems to work great. See http://llvm.org/bugs/show_bug.cgi?id=9490. -Andy
Possibly Parallel Threads
- [LLVMdev] IndVarSimplify too aggressive ?
- [LLVMdev] IndVarSimplify too aggressive ?
- [LLVMdev] IndVarSimplify too aggressive ?
- [LLVMdev] IndVarSimplify too aggressive ?
- [LLVMdev] IndVar widening in IndVarSimplify causing performance regression on GPU programs