On Wed, Sep 8, 2010 at 12:35 AM, Nicolas Capens <nicolas.capens at gmail.com> wrote:> Hi Chris, > > It's not broken, but the performance is crippled. > > I noticed that the code still contains some MMX instructions, but several > operations get expanded (apparently swizzling and such get expanded to a > large number of byte moves).I think some changes related to MMX landed before 2.8 branched which shouldn't have... please file a bug.> I could use intrinsics, but they wouldn't be optimized like other vector > operations. I could use SSE operations, but they would increase SSE register > pressure while MMX registers are left unused. > > > > So ideally I would like to inform LLVM that selecting MMX instructions is > fine. I'm inserting emms instructions in the right spots myself.I think the direction going forward we're going to prefer is that 64-bit vectors get widened to 128-bit vectors, which might not be quite ideal in some situations, but will avoid situations where MMX instructions are incorrectly generated. That said, the work isn't finished, so it shouldn't be in 2.8. -Eli
On Sep 8, 2010, at 7:24 AM, Eli Friedman wrote:> On Wed, Sep 8, 2010 at 12:35 AM, Nicolas Capens > <nicolas.capens at gmail.com> wrote: >> Hi Chris, >> >> It's not broken, but the performance is crippled. >> >> I noticed that the code still contains some MMX instructions, but several >> operations get expanded (apparently swizzling and such get expanded to a >> large number of byte moves). > > I think some changes related to MMX landed before 2.8 branched which > shouldn't have... please file a bug.Right. There should be no major change before 2.8, so if something bad happened, it needs to be fixed on the branch.>> I could use intrinsics, but they wouldn't be optimized like other vector >> operations. I could use SSE operations, but they would increase SSE register >> pressure while MMX registers are left unused. >> >> So ideally I would like to inform LLVM that selecting MMX instructions is >> fine. I'm inserting emms instructions in the right spots myself. > > I think the direction going forward we're going to prefer is that > 64-bit vectors get widened to 128-bit vectors, which might not be > quite ideal in some situations, but will avoid situations where MMX > instructions are incorrectly generated. That said, the work isn't > finished, so it shouldn't be in 2.8.In 2.9, the only way to get MMX will be to use mmx intrinsics, generic vectors will not map onto MMX, sorry Nicolas. One major problem is that the optimizer introduces generic vectors (e.g. see r112696 in the SRoA pass) which use mmx where it was not previously used. This means that your frontend introducing emms is not enough. -Chris
Hi all, Sorry for the late reply. I got sidetracked by other fun projects. ;-) I found that the performance regression is caused by revisions 112804, 112805 and 112806. Those changes were made 2 days prior to the 2.8 branching, so it may have not been the intention to include them there? Either way they make my vector-intensive code two times slower so it would be much appreciated to revert these changes for the 2.8 release. Thanks, Nicolas -----Original Message----- From: Chris Lattner [mailto:clattner at apple.com] Sent: Wednesday, September 08, 2010 18:59 To: Eli Friedman Cc: Nicolas Capens; llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] LLVM 2.8 and MMX On Sep 8, 2010, at 7:24 AM, Eli Friedman wrote:> On Wed, Sep 8, 2010 at 12:35 AM, Nicolas Capens > <nicolas.capens at gmail.com> wrote: >> Hi Chris, >> >> It's not broken, but the performance is crippled. >> >> I noticed that the code still contains some MMX instructions, but several >> operations get expanded (apparently swizzling and such get expanded to a >> large number of byte moves). > > I think some changes related to MMX landed before 2.8 branched which > shouldn't have... please file a bug.Right. There should be no major change before 2.8, so if something bad happened, it needs to be fixed on the branch.>> I could use intrinsics, but they wouldn't be optimized like other vector >> operations. I could use SSE operations, but they would increase SSEregister>> pressure while MMX registers are left unused. >> >> So ideally I would like to inform LLVM that selecting MMX instructions is >> fine. I'm inserting emms instructions in the right spots myself. > > I think the direction going forward we're going to prefer is that > 64-bit vectors get widened to 128-bit vectors, which might not be > quite ideal in some situations, but will avoid situations where MMX > instructions are incorrectly generated. That said, the work isn't > finished, so it shouldn't be in 2.8.In 2.9, the only way to get MMX will be to use mmx intrinsics, generic vectors will not map onto MMX, sorry Nicolas. One major problem is that the optimizer introduces generic vectors (e.g. see r112696 in the SRoA pass) which use mmx where it was not previously used. This means that your frontend introducing emms is not enough. -Chris
On Sep 21, 2010, at 10:23 AMPDT, Nicolas Capens wrote:> Hi all, > > Sorry for the late reply. I got sidetracked by other fun projects. ;-) > > I found that the performance regression is caused by revisions 112804, > 112805 and 112806. Those changes were made 2 days prior to the 2.8 > branching, so it may have not been the intention to include them there? > Either way they make my vector-intensive code two times slower so it would > be much appreciated to revert these changes for the 2.8 release. > > Thanks, > > NicolasInteresting. These are all Bruno's patches, and I'm pretty sure they weren't intended to affect MMX. I doubt reverting them is right since the effect on SSE is presumably positive. Unfortunately Bruno is not here any more.> -----Original Message----- > From: Chris Lattner [mailto:clattner at apple.com] >> >> I think some changes related to MMX landed before 2.8 branched which >> shouldn't have... please file a bug.So please file a bug, with example.
On Sep 21, 2010, at 10:23 AM, Nicolas Capens wrote:> Hi all, > > Sorry for the late reply. I got sidetracked by other fun projects. ;-) > > I found that the performance regression is caused by revisions 112804, > 112805 and 112806. Those changes were made 2 days prior to the 2.8 > branching, so it may have not been the intention to include them there? > Either way they make my vector-intensive code two times slower so it would > be much appreciated to revert these changes for the 2.8 release. >Hi Nicolas, Are you able to narrow it down to one of those patches? From the comments, 112804 and 112805 seem fairly innocuous: ------------------------------------------------------------------------ r112805 | bruno | 2010-09-01 21:20:26 -0700 (Wed, 01 Sep 2010) | 1 line Move condition out to prepare for more matching ------------------------------------------------------------------------ r112804 | bruno | 2010-09-01 20:57:58 -0700 (Wed, 01 Sep 2010) | 1 line Remove checking for isUNPCKL_v_undef_Mask, the specific node is already emitted for it ------------------------------------------------------------------------ This one maybe caused the MMX regression? ------------------------------------------------------------------------ r112806 | bruno | 2010-09-01 22:23:12 -0700 (Wed, 01 Sep 2010) | 1 line Replace unpckl_undef and unpckh_undef matching with target specific opcodes ------------------------------------------------------------------------ If you can narrow it down, file a bug report, and if it doesn't cause a regression from 2.7 in other areas of the compiler, then we can make a determination on whether to remove the offending commit from 2.8. Thanks! -bw
Hi Dale, I suspect that these patches were intended to improve 128-bit vector performance but caused certain 64-bit vector operations to no longer lower to MMX instructions. Anyway, now that I've narrowed it down to these patches I think I can narrow it down further to a specific case so I can file a bug... Will Bruno be back soon or is he no longer working on the project for good? Cheers, Nicolas -----Original Message----- From: Dale Johannesen [mailto:dalej at apple.com] Sent: Tuesday, September 21, 2010 20:12 To: Nicolas Capens Cc: Dale Johannesen; 'Chris Lattner'; 'Eli Friedman'; llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] LLVM 2.8 and MMX On Sep 21, 2010, at 10:23 AMPDT, Nicolas Capens wrote:> Hi all, > > Sorry for the late reply. I got sidetracked by other fun projects. ;-) > > I found that the performance regression is caused by revisions 112804, > 112805 and 112806. Those changes were made 2 days prior to the 2.8 > branching, so it may have not been the intention to include them there? > Either way they make my vector-intensive code two times slower so it would > be much appreciated to revert these changes for the 2.8 release. > > Thanks, > > NicolasInteresting. These are all Bruno's patches, and I'm pretty sure they weren't intended to affect MMX. I doubt reverting them is right since the effect on SSE is presumably positive. Unfortunately Bruno is not here any more.> -----Original Message----- > From: Chris Lattner [mailto:clattner at apple.com] >> >> I think some changes related to MMX landed before 2.8 branched which >> shouldn't have... please file a bug.So please file a bug, with example.
This thread confuses me. I thought Chris said that LLVM 2.8 will not lower generic vectors to MMX because it breaks x87 code, and I didn't see an answer to your question about a switch to tell the code generator otherwise. However, you're complaining that MMX performance is subpar, even though LLVM 2.8 isn't supposed to generate MMX instructions. Can someone clarify the situation for me? Thanks, Reid On Tue, Sep 21, 2010 at 6:13 PM, Nicolas Capens <nicolas.capens at gmail.com> wrote:> Hi Dale, > > I suspect that these patches were intended to improve 128-bit vector > performance but caused certain 64-bit vector operations to no longer lower > to MMX instructions. Anyway, now that I've narrowed it down to these patches > I think I can narrow it down further to a specific case so I can file a > bug... > > Will Bruno be back soon or is he no longer working on the project for good? > > Cheers, > > Nicolas > > > -----Original Message----- > From: Dale Johannesen [mailto:dalej at apple.com] > Sent: Tuesday, September 21, 2010 20:12 > To: Nicolas Capens > Cc: Dale Johannesen; 'Chris Lattner'; 'Eli Friedman'; llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] LLVM 2.8 and MMX > > > On Sep 21, 2010, at 10:23 AMPDT, Nicolas Capens wrote: > >> Hi all, >> >> Sorry for the late reply. I got sidetracked by other fun projects. ;-) >> >> I found that the performance regression is caused by revisions 112804, >> 112805 and 112806. Those changes were made 2 days prior to the 2.8 >> branching, so it may have not been the intention to include them there? >> Either way they make my vector-intensive code two times slower so it would >> be much appreciated to revert these changes for the 2.8 release. >> >> Thanks, >> >> Nicolas > > Interesting. These are all Bruno's patches, and I'm pretty sure they > weren't intended to affect MMX. I doubt reverting them is right since the > effect on SSE is presumably positive. Unfortunately Bruno is not here any > more. > >> -----Original Message----- >> From: Chris Lattner [mailto:clattner at apple.com] >>> >>> I think some changes related to MMX landed before 2.8 branched which >>> shouldn't have... please file a bug. > > So please file a bug, with example. > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
LLVM isn't going to stop generating MMX instructions all together. We can't do that. :-) If the user specifically wants MMX (by, say, using the builtins), we have to support that still. The plan to cease generating MMX for generic vectors is a work-in-progress right now. It's not in 2.8. -bw On Sep 21, 2010, at 4:24 PM, Reid Kleckner wrote:> This thread confuses me. I thought Chris said that LLVM 2.8 will not > lower generic vectors to MMX because it breaks x87 code, and I didn't > see an answer to your question about a switch to tell the code > generator otherwise. However, you're complaining that MMX performance > is subpar, even though LLVM 2.8 isn't supposed to generate MMX > instructions. > > Can someone clarify the situation for me? > > Thanks, > Reid > > On Tue, Sep 21, 2010 at 6:13 PM, Nicolas Capens > <nicolas.capens at gmail.com> wrote: >> Hi Dale, >> >> I suspect that these patches were intended to improve 128-bit vector >> performance but caused certain 64-bit vector operations to no longer lower >> to MMX instructions. Anyway, now that I've narrowed it down to these patches >> I think I can narrow it down further to a specific case so I can file a >> bug... >> >> Will Bruno be back soon or is he no longer working on the project for good? >> >> Cheers, >> >> Nicolas >> >> >> -----Original Message----- >> From: Dale Johannesen [mailto:dalej at apple.com] >> Sent: Tuesday, September 21, 2010 20:12 >> To: Nicolas Capens >> Cc: Dale Johannesen; 'Chris Lattner'; 'Eli Friedman'; llvmdev at cs.uiuc.edu >> Subject: Re: [LLVMdev] LLVM 2.8 and MMX >> >> >> On Sep 21, 2010, at 10:23 AMPDT, Nicolas Capens wrote: >> >>> Hi all, >>> >>> Sorry for the late reply. I got sidetracked by other fun projects. ;-) >>> >>> I found that the performance regression is caused by revisions 112804, >>> 112805 and 112806. Those changes were made 2 days prior to the 2.8 >>> branching, so it may have not been the intention to include them there? >>> Either way they make my vector-intensive code two times slower so it would >>> be much appreciated to revert these changes for the 2.8 release. >>> >>> Thanks, >>> >>> Nicolas >> >> Interesting. These are all Bruno's patches, and I'm pretty sure they >> weren't intended to affect MMX. I doubt reverting them is right since the >> effect on SSE is presumably positive. Unfortunately Bruno is not here any >> more. >> >>> -----Original Message----- >>> From: Chris Lattner [mailto:clattner at apple.com] >>>> >>>> I think some changes related to MMX landed before 2.8 branched which >>>> shouldn't have... please file a bug. >> >> So please file a bug, with example. >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Hi Bill, I'm currently focusing on 112804 and it definitely results in a performance regression. If I understand the comment correctly, it is intended to remove a case that should never be hit. However, when I check for isUNPCKL_v_undef_Mask(SVOp) below it my code does hit it. In particular it happens when I generate "shufflevector <8 x i8> %v1, <8 x i8> %v1, undef, {0, 8, 1, 9, 2, 10, 3, 11}". This used to lower to UNPCKLBW but with revision 112804 it becomes a bunch of byte moves. Also note that this change doesn't appear to improve SSE code generation. I'll look at whether the other patches could also be reverted to fix MMX performance without hurting SSE... Cheers, Nicolas -----Original Message----- From: Bill Wendling [mailto:wendling at apple.com] Sent: Tuesday, September 21, 2010 23:37 To: Nicolas Capens Cc: 'Chris Lattner'; 'Eli Friedman'; llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] LLVM 2.8 and MMX On Sep 21, 2010, at 10:23 AM, Nicolas Capens wrote:> Hi all, > > Sorry for the late reply. I got sidetracked by other fun projects. ;-) > > I found that the performance regression is caused by revisions 112804, > 112805 and 112806. Those changes were made 2 days prior to the 2.8 > branching, so it may have not been the intention to include them there? > Either way they make my vector-intensive code two times slower so it would > be much appreciated to revert these changes for the 2.8 release. >Hi Nicolas, Are you able to narrow it down to one of those patches? From the comments, 112804 and 112805 seem fairly innocuous: ------------------------------------------------------------------------ r112805 | bruno | 2010-09-01 21:20:26 -0700 (Wed, 01 Sep 2010) | 1 line Move condition out to prepare for more matching ------------------------------------------------------------------------ r112804 | bruno | 2010-09-01 20:57:58 -0700 (Wed, 01 Sep 2010) | 1 line Remove checking for isUNPCKL_v_undef_Mask, the specific node is already emitted for it ------------------------------------------------------------------------ This one maybe caused the MMX regression? ------------------------------------------------------------------------ r112806 | bruno | 2010-09-01 22:23:12 -0700 (Wed, 01 Sep 2010) | 1 line Replace unpckl_undef and unpckh_undef matching with target specific opcodes ------------------------------------------------------------------------ If you can narrow it down, file a bug report, and if it doesn't cause a regression from 2.7 in other areas of the compiler, then we can make a determination on whether to remove the offending commit from 2.8. Thanks! -bw