thr3ads.net - opus - [opus] [PATCH] fix alignment exceptions [Aug 2017]

If this information is useful, please help other people find it:
Share via:

Felicia Lim

2017-Aug-18 19:11 UTC

[opus] [PATCH] fix alignment exceptions

We see the MOVQ instruction but this patch deliberately uses it rather than
MOVQDA (load 128-bits aligned). We were seeing that with the trace below,
the final invocation is not 128-bit aligned but MOVQDA insists on it (the
calling function was pitch_sse4_1.c:90, in the 4-way N - i >= 4 loop).

07-31 11:00:13.469   210  2540 <(469)%20210-2540> D opus_sse1: RBE
celt_inner_prod_sse4_1: x 0xeff3deb0 y 0xeff3deb0 N 32
07-31 11:00:13.469   210  2540 <(469)%20210-2540> D opus_sse1: RBE
celt_inner_prod_sse4_1: x 0xeff3d7b0 y 0xeff3d7b0 N 32
07-31 11:00:13.469   210  2540 <(469)%20210-2540> D opus_sse1: RBE
celt_inner_prod_sse4_1: x 0xeff3df30 y 0xeff3df30 N 32
07-31 11:00:13.470   210  2540 <(470)%20210-2540> D opus_sse1: RBE
celt_inner_prod_sse4_1: x 0xeff3d850 y 0xeff3d850 N 48
07-31 11:00:13.470   210  2540 <(470)%20210-2540> D opus_sse1: RBE
celt_inner_prod_sse4_1: x 0xeff3dfd0 y 0xeff3dfd0 N 48
07-31 11:00:13.470   210  2540 <(470)%20210-2540> D opus_sse1: RBE
celt_inner_prod_sse4_1: x 0xeff3d8b0 y 0xeff3d8b0 N 64
07-31 11:00:13.470   210  2540 <(470)%20210-2540> D opus_sse1: RBE
celt_inner_prod_sse4_1: x 0xeff3e030 y 0xeff3e030 N 64
07-31 11:00:13.476   210  2540 D opus_sse1: RBE celt_inner_prod_sse4_1: x
0xeff3da38 y 0xeff3da38 N 36


On Fri, Aug 18, 2017 at 11:44 AM Jonathan Lennox <jonathan at vidyo.com>
wrote:
> This would revert a patch I submitted two years ago (
>
http://git.xiph.org/?p=opus.git;a=commitdiff;h=1d60b49e9d95672a17ebe5578319c59fa3963224
> ).
>
> At the time, clang produced an unnecessary MOVQ instruction when it
> compiled with the version of the code with the explicit _mm_loadl_epi64
> intrinsic.  Do you no longer see that?
>
> How does the code compile for you, and what is the issue you’re seeing?
>  (One issue might be that clang’s address sanitizer isn’t smart enough to
> know that PMOVSXWD only loads 8 bytes, despite _mm_cvtepi16_epi32’s
> argument being an __mm128i; I’ve seen it trigger incorrect out-of-bounds
> read errors.)
>
> On Aug 18, 2017, at 12:34 PM, Felicia Lim <flim at google.com> wrote:
>
> Hi,
>
> Please find attached a patch to fix alignment exceptions. Without this
> change, we were seeing occasional alignment faults when using this with
> clang.
>
> Thanks,
> Felicia
>
> <e5c277c.diff>_______________________________________________
> opus mailing list
> opus at xiph.org
> http://lists.xiph.org/mailman/listinfo/opus
>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.xiph.org/pipermail/opus/attachments/20170818/55dd7ab9/attachment.html>

Ray Essick

2017-Aug-18 21:26 UTC

head link

[opus] [PATCH] fix alignment exceptions

Jonathan,
Here's the code difference we see with the recent change -- what amounts to
reverting your change from a couple years back.
It doesn't look like we're getting superfluous instructions from clang
now.

the bad behavior for us was the alignment exception on the movdqa
instructions when the input data wasn't 128-bit aligned.
We had to change something because the code as-is was taking alignment
faults on the movdqa instructions.

For reference, the clang version  I used for this is:
    | Android clang version 5.0.300080  (based on LLVM 5.0.300080)
    | Target: x86_64-unknown-linux


If we think enough people use older versions of clang, a version of the
patch that looked at __clang_major__ and friends seems fair.

-- Ray

826% diff -c *old *new
*** pitch_sse4_1.s-old 2017-08-18 13:51:39.359084637 -0700
--- pitch_sse4_1.s-new 2017-08-18 13:51:54.595106450 -0700
***************
*** 73,80 ****
  cmpl $4, %eax
  jl .LBB0_8
  # BB#7:
! movdqa (%edx,%edi,2), %xmm2
! movdqa (%esi,%edi,2), %xmm1
  addl $4, %edi
  movdqa %xmm2, %xmm3
  pmullw %xmm1, %xmm2
--- 73,80 ----
  cmpl $4, %eax
  jl .LBB0_8
  # BB#7:
! movq (%edx,%edi,2), %xmm2    # xmm2 = mem[0],zero
! movq (%esi,%edi,2), %xmm1    # xmm1 = mem[0],zero
  addl $4, %edi
  movdqa %xmm2, %xmm3
  pmullw %xmm1, %xmm2


On Fri, Aug 18, 2017 at 12:11 PM, Felicia Lim <flim at google.com> wrote:
> We see the MOVQ instruction but this patch deliberately uses it rather
> than MOVQDA (load 128-bits aligned). We were seeing that with the trace
> below, the final invocation is not 128-bit aligned but MOVQDA insists on it
> (the calling function was pitch_sse4_1.c:90, in the 4-way N - i >= 4
> loop).
>
> 07-31 11:00:13.469   210  2540 <(469)%20210-2540> D opus_sse1: RBE
> celt_inner_prod_sse4_1: x 0xeff3deb0 y 0xeff3deb0 N 32
> 07-31 11:00:13.469   210  2540 <(469)%20210-2540> D opus_sse1: RBE
> celt_inner_prod_sse4_1: x 0xeff3d7b0 y 0xeff3d7b0 N 32
> 07-31 11:00:13.469   210  2540 <(469)%20210-2540> D opus_sse1: RBE
> celt_inner_prod_sse4_1: x 0xeff3df30 y 0xeff3df30 N 32
> 07-31 11:00:13.470   210  2540 <(470)%20210-2540> D opus_sse1: RBE
> celt_inner_prod_sse4_1: x 0xeff3d850 y 0xeff3d850 N 48
> 07-31 11:00:13.470   210  2540 <(470)%20210-2540> D opus_sse1: RBE
> celt_inner_prod_sse4_1: x 0xeff3dfd0 y 0xeff3dfd0 N 48
> 07-31 11:00:13.470   210  2540 <(470)%20210-2540> D opus_sse1: RBE
> celt_inner_prod_sse4_1: x 0xeff3d8b0 y 0xeff3d8b0 N 64
> 07-31 11:00:13.470   210  2540 <(470)%20210-2540> D opus_sse1: RBE
> celt_inner_prod_sse4_1: x 0xeff3e030 y 0xeff3e030 N 64
> 07-31 11:00:13.476   210  2540 D opus_sse1: RBE celt_inner_prod_sse4_1: x
> 0xeff3da38 y 0xeff3da38 N 36
>
>
> On Fri, Aug 18, 2017 at 11:44 AM Jonathan Lennox <jonathan at
vidyo.com>
> wrote:
>
>> This would revert a patch I submitted two years ago (
>> http://git.xiph.org/?p=opus.git;a=commitdiff;h>>
1d60b49e9d95672a17ebe5578319c59fa3963224).
>>
>> At the time, clang produced an unnecessary MOVQ instruction when it
>> compiled with the version of the code with the explicit _mm_loadl_epi64
>> intrinsic.  Do you no longer see that?
>>
>> How does the code compile for you, and what is the issue you’re seeing?
>>  (One issue might be that clang’s address sanitizer isn’t smart enough
to
>> know that PMOVSXWD only loads 8 bytes, despite _mm_cvtepi16_epi32’s
>> argument being an __mm128i; I’ve seen it trigger incorrect
out-of-bounds
>> read errors.)
>>
>> On Aug 18, 2017, at 12:34 PM, Felicia Lim <flim at google.com>
wrote:
>>
>> Hi,
>>
>> Please find attached a patch to fix alignment exceptions. Without this
>> change, we were seeing occasional alignment faults when using this with
>> clang.
>>
>> Thanks,
>> Felicia
>>
>> <e5c277c.diff>_______________________________________________
>> opus mailing list
>> opus at xiph.org
>> http://lists.xiph.org/mailman/listinfo/opus
>>
>>
>>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.xiph.org/pipermail/opus/attachments/20170818/f7384b44/attachment-0001.html>

Jonathan Lennox

2017-Aug-22 15:34 UTC

head link

[opus] [PATCH] fix alignment exceptions

Hm, yes — with the current Xcode clang (Xcode 8.3.3, Apple LLVM version 8.1.0
(clang-802.0.42)), I don’t see any difference in the generated code with -O2
between the two versions.  (Literally none for x86_64, and only what seem to be
some non-signficiant offset changes for i386).

So if this improves some other cases (notably, it should avoid problems with
-fsanitize=address) I think this is okay.

On Aug 18, 2017, at 5:25 PM, Ray Essick <essick at
google.com<mailto:essick at google.com>> wrote:

Jonathan,
Here's the code difference we see with the recent change -- what amounts to
reverting your change from a couple years back.
It doesn't look like we're getting superfluous instructions from clang
now.

the bad behavior for us was the alignment exception on the movdqa instructions
when the input data wasn't 128-bit aligned.
We had to change something because the code as-is was taking alignment faults on
the movdqa instructions.

For reference, the clang version  I used for this is:
    | Android clang version 5.0.300080  (based on LLVM 5.0.300080)
    | Target: x86_64-unknown-linux


If we think enough people use older versions of clang, a version of the patch
that looked at __clang_major__ and friends seems fair.

-- Ray

826% diff -c *old *new
*** pitch_sse4_1.s-old 2017-08-18 13:51:39.359084637 -0700
--- pitch_sse4_1.s-new 2017-08-18 13:51:54.595106450 -0700
***************
*** 73,80 ****
  cmpl $4, %eax
  jl .LBB0_8
  # BB#7:
! movdqa (%edx,%edi,2), %xmm2
! movdqa (%esi,%edi,2), %xmm1
  addl $4, %edi
  movdqa %xmm2, %xmm3
  pmullw %xmm1, %xmm2
--- 73,80 ----
  cmpl $4, %eax
  jl .LBB0_8
  # BB#7:
! movq (%edx,%edi,2), %xmm2    # xmm2 = mem[0],zero
! movq (%esi,%edi,2), %xmm1    # xmm1 = mem[0],zero
  addl $4, %edi
  movdqa %xmm2, %xmm3
  pmullw %xmm1, %xmm2


On Fri, Aug 18, 2017 at 12:11 PM, Felicia Lim <flim at
google.com<mailto:flim at google.com>> wrote:
We see the MOVQ instruction but this patch deliberately uses it rather than
MOVQDA (load 128-bits aligned). We were seeing that with the trace below, the
final invocation is not 128-bit aligned but MOVQDA insists on it (the calling
function was pitch_sse4_1.c:90, in the 4-way N - i >= 4 loop).

07-31 11:00:13.469   210  2540<tel:(469)%20210-2540> D opus_sse1: RBE
celt_inner_prod_sse4_1: x 0xeff3deb0 y 0xeff3deb0 N 32
07-31 11:00:13.469   210  2540<tel:(469)%20210-2540> D opus_sse1: RBE
celt_inner_prod_sse4_1: x 0xeff3d7b0 y 0xeff3d7b0 N 32
07-31 11:00:13.469   210  2540<tel:(469)%20210-2540> D opus_sse1: RBE
celt_inner_prod_sse4_1: x 0xeff3df30 y 0xeff3df30 N 32
07-31 11:00:13.470   210  2540<tel:(470)%20210-2540> D opus_sse1: RBE
celt_inner_prod_sse4_1: x 0xeff3d850 y 0xeff3d850 N 48
07-31 11:00:13.470   210  2540<tel:(470)%20210-2540> D opus_sse1: RBE
celt_inner_prod_sse4_1: x 0xeff3dfd0 y 0xeff3dfd0 N 48
07-31 11:00:13.470   210  2540<tel:(470)%20210-2540> D opus_sse1: RBE
celt_inner_prod_sse4_1: x 0xeff3d8b0 y 0xeff3d8b0 N 64
07-31 11:00:13.470   210  2540<tel:(470)%20210-2540> D opus_sse1: RBE
celt_inner_prod_sse4_1: x 0xeff3e030 y 0xeff3e030 N 64
07-31 11:00:13.476   210  2540 D opus_sse1: RBE celt_inner_prod_sse4_1: x
0xeff3da38 y 0xeff3da38 N 36


On Fri, Aug 18, 2017 at 11:44 AM Jonathan Lennox <jonathan at
vidyo.com<mailto:jonathan at vidyo.com>> wrote:
This would revert a patch I submitted two years ago
(http://git.xiph.org/?p=opus.git;a=commitdiff;h=1d60b49e9d95672a17ebe5578319c59fa3963224).

At the time, clang produced an unnecessary MOVQ instruction when it compiled
with the version of the code with the explicit _mm_loadl_epi64 intrinsic.  Do
you no longer see that?

How does the code compile for you, and what is the issue you’re seeing?  (One
issue might be that clang’s address sanitizer isn’t smart enough to know that
PMOVSXWD only loads 8 bytes, despite _mm_cvtepi16_epi32’s argument being an
__mm128i; I’ve seen it trigger incorrect out-of-bounds read errors.)

On Aug 18, 2017, at 12:34 PM, Felicia Lim <flim at google.com<mailto:flim
at google.com>> wrote:

Hi,

Please find attached a patch to fix alignment exceptions. Without this change,
we were seeing occasional alignment faults when using this with clang.

Thanks,
Felicia
<e5c277c.diff>_______________________________________________
opus mailing list
opus at xiph.org<mailto:opus at xiph.org>
http://lists.xiph.org/mailman/listinfo/opus



-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.xiph.org/pipermail/opus/attachments/20170822/98c22cb4/attachment.html>

Possibly Parallel Threads

Search for more reasonably related threads

opus - Aug 2017 - [PATCH] fix alignment exceptions

[opus] [PATCH] fix alignment exceptions

[opus] [PATCH] fix alignment exceptions

[opus] [PATCH] fix alignment exceptions

Possibly Parallel Threads