thr3ads.net - llvm dev - [LLVMdev] LLVM 2.8 and MMX [Sep 2010]

If this information is useful, please help other people find it:
Share via:

Eli Friedman

2010-Sep-08 14:24 UTC

[LLVMdev] LLVM 2.8 and MMX

On Wed, Sep 8, 2010 at 12:35 AM, Nicolas Capens
<nicolas.capens at gmail.com> wrote:> Hi Chris,
>
> It's not broken, but the performance is crippled.
>
> I noticed that the code still contains some MMX instructions, but several
> operations get expanded (apparently swizzling and such get expanded to a
> large number of byte moves).
I think some changes related to MMX landed before 2.8 branched which
shouldn't have... please file a bug.
> I could use intrinsics, but they wouldn't be optimized like other
vector
> operations. I could use SSE operations, but they would increase SSE
register
> pressure while MMX registers are left unused.
>
>
>
> So ideally I would like to inform LLVM that selecting MMX instructions is
> fine. I'm inserting emms instructions in the right spots myself.
I think the direction going forward we're going to prefer is that
64-bit vectors get widened to 128-bit vectors, which might not be
quite ideal in some situations, but will avoid situations where MMX
instructions are incorrectly generated.  That said, the work isn't
finished, so it shouldn't be in 2.8.

-Eli

Chris Lattner

2010-Sep-08 16:58 UTC

head link

[LLVMdev] LLVM 2.8 and MMX

On Sep 8, 2010, at 7:24 AM, Eli Friedman wrote:
> On Wed, Sep 8, 2010 at 12:35 AM, Nicolas Capens
> <nicolas.capens at gmail.com> wrote:
>> Hi Chris,
>> 
>> It's not broken, but the performance is crippled.
>> 
>> I noticed that the code still contains some MMX instructions, but
several
>> operations get expanded (apparently swizzling and such get expanded to
a
>> large number of byte moves).
> 
> I think some changes related to MMX landed before 2.8 branched which
> shouldn't have... please file a bug.
Right.  There should be no major change before 2.8, so if something bad
happened, it needs to be fixed on the branch.
>> I could use intrinsics, but they wouldn't be optimized like other
vector
>> operations. I could use SSE operations, but they would increase SSE
register
>> pressure while MMX registers are left unused.
>> 
>> So ideally I would like to inform LLVM that selecting MMX instructions
is
>> fine. I'm inserting emms instructions in the right spots myself.
> 
> I think the direction going forward we're going to prefer is that
> 64-bit vectors get widened to 128-bit vectors, which might not be
> quite ideal in some situations, but will avoid situations where MMX
> instructions are incorrectly generated.  That said, the work isn't
> finished, so it shouldn't be in 2.8.
In 2.9, the only way to get MMX will be to use mmx intrinsics, generic vectors
will not map onto MMX, sorry Nicolas.  One major problem is that the optimizer
introduces generic vectors (e.g. see r112696 in the SRoA pass) which use mmx
where it was not previously used.  This means that your frontend introducing
emms is not enough.

-Chris

Nicolas Capens

2010-Sep-21 17:23 UTC

head link

[LLVMdev] LLVM 2.8 and MMX

Hi all,

Sorry for the late reply. I got sidetracked by other fun projects. ;-)

I found that the performance regression is caused by revisions 112804,
112805 and 112806. Those changes were made 2 days prior to the 2.8
branching, so it may have not been the intention to include them there?
Either way they make my vector-intensive code two times slower so it would
be much appreciated to revert these changes for the 2.8 release.

Thanks,

Nicolas


-----Original Message-----
From: Chris Lattner [mailto:clattner at apple.com] 
Sent: Wednesday, September 08, 2010 18:59
To: Eli Friedman
Cc: Nicolas Capens; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] LLVM 2.8 and MMX


On Sep 8, 2010, at 7:24 AM, Eli Friedman wrote:
> On Wed, Sep 8, 2010 at 12:35 AM, Nicolas Capens
> <nicolas.capens at gmail.com> wrote:
>> Hi Chris,
>> 
>> It's not broken, but the performance is crippled.
>> 
>> I noticed that the code still contains some MMX instructions, but
several
>> operations get expanded (apparently swizzling and such get expanded to
a
>> large number of byte moves).
> 
> I think some changes related to MMX landed before 2.8 branched which
> shouldn't have... please file a bug.
Right.  There should be no major change before 2.8, so if something bad
happened, it needs to be fixed on the branch.
>> I could use intrinsics, but they wouldn't be optimized like other
vector
>> operations. I could use SSE operations, but they would increase SSE
register>> pressure while MMX registers are left unused.
>> 
>> So ideally I would like to inform LLVM that selecting MMX instructions
is
>> fine. I'm inserting emms instructions in the right spots myself.
> 
> I think the direction going forward we're going to prefer is that
> 64-bit vectors get widened to 128-bit vectors, which might not be
> quite ideal in some situations, but will avoid situations where MMX
> instructions are incorrectly generated.  That said, the work isn't
> finished, so it shouldn't be in 2.8.
In 2.9, the only way to get MMX will be to use mmx intrinsics, generic
vectors will not map onto MMX, sorry Nicolas.  One major problem is that the
optimizer introduces generic vectors (e.g. see r112696 in the SRoA pass)
which use mmx where it was not previously used.  This means that your
frontend introducing emms is not enough.

-Chris

Dale Johannesen

2010-Sep-21 18:12 UTC

head link

[LLVMdev] LLVM 2.8 and MMX

On Sep 21, 2010, at 10:23 AMPDT, Nicolas Capens wrote:
> Hi all,
> 
> Sorry for the late reply. I got sidetracked by other fun projects. ;-)
> 
> I found that the performance regression is caused by revisions 112804,
> 112805 and 112806. Those changes were made 2 days prior to the 2.8
> branching, so it may have not been the intention to include them there?
> Either way they make my vector-intensive code two times slower so it would
> be much appreciated to revert these changes for the 2.8 release.
> 
> Thanks,
> 
> Nicolas
Interesting.  These are all Bruno's patches, and I'm pretty sure they
weren't intended to affect MMX.   I doubt reverting them is right since the
effect on SSE is presumably positive.  Unfortunately Bruno is not here any more.
> -----Original Message-----
> From: Chris Lattner [mailto:clattner at apple.com] 
>> 
>> I think some changes related to MMX landed before 2.8 branched which
>> shouldn't have... please file a bug.
So please file a bug, with example.

Bill Wendling

2010-Sep-21 21:36 UTC

head link

[LLVMdev] LLVM 2.8 and MMX

On Sep 21, 2010, at 10:23 AM, Nicolas Capens wrote:
> Hi all,
> 
> Sorry for the late reply. I got sidetracked by other fun projects. ;-)
> 
> I found that the performance regression is caused by revisions 112804,
> 112805 and 112806. Those changes were made 2 days prior to the 2.8
> branching, so it may have not been the intention to include them there?
> Either way they make my vector-intensive code two times slower so it would
> be much appreciated to revert these changes for the 2.8 release.
> Hi Nicolas,

Are you able to narrow it down to one of those patches? From the comments,
112804 and 112805 seem fairly innocuous:

------------------------------------------------------------------------
r112805 | bruno | 2010-09-01 21:20:26 -0700 (Wed, 01 Sep 2010) | 1 line

Move condition out to prepare for more matching
------------------------------------------------------------------------
r112804 | bruno | 2010-09-01 20:57:58 -0700 (Wed, 01 Sep 2010) | 1 line

Remove checking for isUNPCKL_v_undef_Mask, the specific node is already emitted
for it
------------------------------------------------------------------------

This one maybe caused the MMX regression?

------------------------------------------------------------------------
r112806 | bruno | 2010-09-01 22:23:12 -0700 (Wed, 01 Sep 2010) | 1 line

Replace unpckl_undef and unpckh_undef matching with target specific opcodes
------------------------------------------------------------------------

If you can narrow it down, file a bug report, and if it doesn't cause a
regression from 2.7 in other areas of the compiler, then we can make a
determination on whether to remove the offending commit from 2.8.

Thanks!
-bw

Nicolas Capens

2010-Sep-21 22:13 UTC

head link

[LLVMdev] LLVM 2.8 and MMX

Hi Dale,

I suspect that these patches were intended to improve 128-bit vector
performance but caused certain 64-bit vector operations to no longer lower
to MMX instructions. Anyway, now that I've narrowed it down to these patches
I think I can narrow it down further to a specific case so I can file a
bug...

Will Bruno be back soon or is he no longer working on the project for good?

Cheers,

Nicolas


-----Original Message-----
From: Dale Johannesen [mailto:dalej at apple.com] 
Sent: Tuesday, September 21, 2010 20:12
To: Nicolas Capens
Cc: Dale Johannesen; 'Chris Lattner'; 'Eli Friedman'; llvmdev at
cs.uiuc.edu
Subject: Re: [LLVMdev] LLVM 2.8 and MMX


On Sep 21, 2010, at 10:23 AMPDT, Nicolas Capens wrote:
> Hi all,
> 
> Sorry for the late reply. I got sidetracked by other fun projects. ;-)
> 
> I found that the performance regression is caused by revisions 112804,
> 112805 and 112806. Those changes were made 2 days prior to the 2.8
> branching, so it may have not been the intention to include them there?
> Either way they make my vector-intensive code two times slower so it would
> be much appreciated to revert these changes for the 2.8 release.
> 
> Thanks,
> 
> Nicolas
Interesting.  These are all Bruno's patches, and I'm pretty sure they
weren't intended to affect MMX.   I doubt reverting them is right since the
effect on SSE is presumably positive.  Unfortunately Bruno is not here any
more.  
> -----Original Message-----
> From: Chris Lattner [mailto:clattner at apple.com] 
>> 
>> I think some changes related to MMX landed before 2.8 branched which
>> shouldn't have... please file a bug.
So please file a bug, with example.

Reid Kleckner

2010-Sep-21 23:24 UTC

head link

[LLVMdev] LLVM 2.8 and MMX

This thread confuses me.  I thought Chris said that LLVM 2.8 will not
lower generic vectors to MMX because it breaks x87 code, and I didn't
see an answer to your question about a switch to tell the code
generator otherwise.  However, you're complaining that MMX performance
is subpar, even though LLVM 2.8 isn't supposed to generate MMX
instructions.

Can someone clarify the situation for me?

Thanks,
Reid

On Tue, Sep 21, 2010 at 6:13 PM, Nicolas Capens
<nicolas.capens at gmail.com> wrote:> Hi Dale,
>
> I suspect that these patches were intended to improve 128-bit vector
> performance but caused certain 64-bit vector operations to no longer lower
> to MMX instructions. Anyway, now that I've narrowed it down to these
patches
> I think I can narrow it down further to a specific case so I can file a
> bug...
>
> Will Bruno be back soon or is he no longer working on the project for good?
>
> Cheers,
>
> Nicolas
>
>
> -----Original Message-----
> From: Dale Johannesen [mailto:dalej at apple.com]
> Sent: Tuesday, September 21, 2010 20:12
> To: Nicolas Capens
> Cc: Dale Johannesen; 'Chris Lattner'; 'Eli Friedman';
llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] LLVM 2.8 and MMX
>
>
> On Sep 21, 2010, at 10:23 AMPDT, Nicolas Capens wrote:
>
>> Hi all,
>>
>> Sorry for the late reply. I got sidetracked by other fun projects. ;-)
>>
>> I found that the performance regression is caused by revisions 112804,
>> 112805 and 112806. Those changes were made 2 days prior to the 2.8
>> branching, so it may have not been the intention to include them there?
>> Either way they make my vector-intensive code two times slower so it
would
>> be much appreciated to revert these changes for the 2.8 release.
>>
>> Thanks,
>>
>> Nicolas
>
> Interesting.  These are all Bruno's patches, and I'm pretty sure
they
> weren't intended to affect MMX.   I doubt reverting them is right since
the
> effect on SSE is presumably positive.  Unfortunately Bruno is not here any
> more.
>
>> -----Original Message-----
>> From: Chris Lattner [mailto:clattner at apple.com]
>>>
>>> I think some changes related to MMX landed before 2.8 branched
which
>>> shouldn't have... please file a bug.
>
> So please file a bug, with example.
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

Bill Wendling

2010-Sep-22 00:30 UTC

head link

[LLVMdev] LLVM 2.8 and MMX

LLVM isn't going to stop generating MMX instructions all together. We
can't do that. :-) If the user specifically wants MMX (by, say, using the
builtins), we have to support that still. The plan to cease generating MMX for
generic vectors is a work-in-progress right now. It's not in 2.8.

-bw

On Sep 21, 2010, at 4:24 PM, Reid Kleckner wrote:
> This thread confuses me.  I thought Chris said that LLVM 2.8 will not
> lower generic vectors to MMX because it breaks x87 code, and I didn't
> see an answer to your question about a switch to tell the code
> generator otherwise.  However, you're complaining that MMX performance
> is subpar, even though LLVM 2.8 isn't supposed to generate MMX
> instructions.
> 
> Can someone clarify the situation for me?
> 
> Thanks,
> Reid
> 
> On Tue, Sep 21, 2010 at 6:13 PM, Nicolas Capens
> <nicolas.capens at gmail.com> wrote:
>> Hi Dale,
>> 
>> I suspect that these patches were intended to improve 128-bit vector
>> performance but caused certain 64-bit vector operations to no longer
lower
>> to MMX instructions. Anyway, now that I've narrowed it down to
these patches
>> I think I can narrow it down further to a specific case so I can file a
>> bug...
>> 
>> Will Bruno be back soon or is he no longer working on the project for
good?
>> 
>> Cheers,
>> 
>> Nicolas
>> 
>> 
>> -----Original Message-----
>> From: Dale Johannesen [mailto:dalej at apple.com]
>> Sent: Tuesday, September 21, 2010 20:12
>> To: Nicolas Capens
>> Cc: Dale Johannesen; 'Chris Lattner'; 'Eli Friedman';
llvmdev at cs.uiuc.edu
>> Subject: Re: [LLVMdev] LLVM 2.8 and MMX
>> 
>> 
>> On Sep 21, 2010, at 10:23 AMPDT, Nicolas Capens wrote:
>> 
>>> Hi all,
>>> 
>>> Sorry for the late reply. I got sidetracked by other fun projects.
;-)
>>> 
>>> I found that the performance regression is caused by revisions
112804,
>>> 112805 and 112806. Those changes were made 2 days prior to the 2.8
>>> branching, so it may have not been the intention to include them
there?
>>> Either way they make my vector-intensive code two times slower so
it would
>>> be much appreciated to revert these changes for the 2.8 release.
>>> 
>>> Thanks,
>>> 
>>> Nicolas
>> 
>> Interesting.  These are all Bruno's patches, and I'm pretty
sure they
>> weren't intended to affect MMX.   I doubt reverting them is right
since the
>> effect on SSE is presumably positive.  Unfortunately Bruno is not here
any
>> more.
>> 
>>> -----Original Message-----
>>> From: Chris Lattner [mailto:clattner at apple.com]
>>>> 
>>>> I think some changes related to MMX landed before 2.8 branched
which
>>>> shouldn't have... please file a bug.
>> 
>> So please file a bug, with example.
>> 
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> 
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Nicolas Capens

2010-Sep-22 03:30 UTC

head link

[LLVMdev] LLVM 2.8 and MMX

Hi Bill,

I'm currently focusing on 112804 and it definitely results in a performance
regression.

If I understand the comment correctly, it is intended to remove a case that
should never be hit. However, when I check for isUNPCKL_v_undef_Mask(SVOp)
below it my code does hit it. In particular it happens when I generate
"shufflevector <8 x i8> %v1, <8 x i8> %v1, undef, {0, 8, 1, 9,
2, 10, 3,
11}". This used to lower to UNPCKLBW but with revision 112804 it becomes a
bunch of byte moves.

Also note that this change doesn't appear to improve SSE code generation.
I'll look at whether the other patches could also be reverted to fix MMX
performance without hurting SSE...

Cheers,

Nicolas

-----Original Message-----
From: Bill Wendling [mailto:wendling at apple.com] 
Sent: Tuesday, September 21, 2010 23:37
To: Nicolas Capens
Cc: 'Chris Lattner'; 'Eli Friedman'; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] LLVM 2.8 and MMX

On Sep 21, 2010, at 10:23 AM, Nicolas Capens wrote:
> Hi all,
> 
> Sorry for the late reply. I got sidetracked by other fun projects. ;-)
> 
> I found that the performance regression is caused by revisions 112804,
> 112805 and 112806. Those changes were made 2 days prior to the 2.8
> branching, so it may have not been the intention to include them there?
> Either way they make my vector-intensive code two times slower so it would
> be much appreciated to revert these changes for the 2.8 release.
> Hi Nicolas,

Are you able to narrow it down to one of those patches? From the comments,
112804 and 112805 seem fairly innocuous:

------------------------------------------------------------------------
r112805 | bruno | 2010-09-01 21:20:26 -0700 (Wed, 01 Sep 2010) | 1 line

Move condition out to prepare for more matching
------------------------------------------------------------------------
r112804 | bruno | 2010-09-01 20:57:58 -0700 (Wed, 01 Sep 2010) | 1 line

Remove checking for isUNPCKL_v_undef_Mask, the specific node is already
emitted for it
------------------------------------------------------------------------

This one maybe caused the MMX regression?

------------------------------------------------------------------------
r112806 | bruno | 2010-09-01 22:23:12 -0700 (Wed, 01 Sep 2010) | 1 line

Replace unpckl_undef and unpckh_undef matching with target specific opcodes
------------------------------------------------------------------------

If you can narrow it down, file a bug report, and if it doesn't cause a
regression from 2.7 in other areas of the compiler, then we can make a
determination on whether to remove the offending commit from 2.8.

Thanks!
-bw

Seemingly Similar Threads

Search for more seemingly similar threads

llvm dev - Sep 2010 - [LLVMdev] LLVM 2.8 and MMX

[LLVMdev] LLVM 2.8 and MMX

[LLVMdev] LLVM 2.8 and MMX

[LLVMdev] LLVM 2.8 and MMX

[LLVMdev] LLVM 2.8 and MMX

[LLVMdev] LLVM 2.8 and MMX

[LLVMdev] LLVM 2.8 and MMX

[LLVMdev] LLVM 2.8 and MMX

[LLVMdev] LLVM 2.8 and MMX

[LLVMdev] LLVM 2.8 and MMX

Seemingly Similar Threads