lvqcl
2014-Mar-21 17:28 UTC
[flac-dev] About "attempt to fix differences between x86 FPU and SSE calculations"
More specifically, about this patch:
http://git.xiph.org/?p=flac.git;a=commitdiff;h=70b078cfd5f9d4b0692c33f018cac3c652b14f90

I downloaded the latest code from git (flac-70b078c), disabled all SSE optimizations in the code and compiled it (GCC 4.8.2). This patch doesn't change FLAC output.

Either gcc is too smart and optimizes this new code back to the old, or this fix is MSVS-specific. Or both.
Olivier Tristan
2014-Mar-21 18:41 UTC
[flac-dev] About "attempt to fix differences between x86 FPU and SSE calculations"
Check with -mfpmath=387 to be sure that x87 FPU code is used and not some SSE optimization made by GCC.

--
Olivier TRISTAN
uvi.net
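One way to double-check which floating-point evaluation mode a given build actually uses is the standard C99 macro FLT_EVAL_METHOD from <float.h>. The sketch below is only an illustration: the helper names describe_eval_method and print_eval_method are hypothetical, and the expectation that a 32-bit GCC build with -mfpmath=387 reports 2 while an SSE build reports 0 is an assumption to verify on the toolchain in question.

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical helper: map a FLT_EVAL_METHOD value to a short description. */
static const char *describe_eval_method(int method)
{
    switch (method) {
    case 0:  return "each type evaluated in its own precision (typical of SSE math)";
    case 1:  return "float and double evaluated as double";
    case 2:  return "all operations evaluated as long double (typical of x87 math)";
    default: return "indeterminate or implementation-defined";
    }
}

/* Hypothetical helper: report the evaluation method this file was compiled with. */
static void print_eval_method(void)
{
    int method = (int)FLT_EVAL_METHOD;
    printf("FLT_EVAL_METHOD = %d: %s\n", method, describe_eval_method(method));
}
```

Calling print_eval_method() from a small test program built with the same CFLAGS as FLAC gives a quick hint, but inspecting the generated assembly remains the more direct check.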
lvqcl
2014-Mar-22 08:17 UTC
[flac-dev] About "attempt to fix differences between x86 FPU and SSE calculations"
Olivier Tristan <o.tristan at uvi.net> wrote on Fri, 21 Mar 2014 22:41:00 +0400:

> Check with -mfpmath=387 to be sure that x87 FPU code is used and not some
> SSE optim made by GCC

I added "XIPH_ADD_CFLAGS([-mfpmath=387])" to configure.ac. The result is still different from the SSE version.

---------------

MSVS adds two instructions to the generated code after the patch:

    fld     DWORD PTR [eax]
    inc     ecx
    fmul    ST(0), ST(1)
    add     eax, 4
    fstp    DWORD PTR tv2337[esp+20]   <- this (copy from FP stack to tmp)
    fld     DWORD PTR tv2337[esp+20]   <- and this (copy from tmp to FP stack)
    fadd    DWORD PTR [ebx+ecx*4-4]
    fstp    DWORD PTR [ebx+ecx*4-4]

However, GCC doesn't do this:

    lea     ecx, [eax+2]
    fld     DWORD PTR [edx+ecx*4]
    fmul    st, st(1)
    fadd    DWORD PTR [ebx+ecx*4]
    fstp    DWORD PTR [ebx+ecx*4]

Also, MSVS doesn't add these instructions if the Floating Point Model is set to Fast (/fp:fast).
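The fstp/fld pair in the MSVS listing rounds the product to a 32-bit float before the add, whereas the GCC listing keeps the product at the wider FPU precision. The following is a minimal sketch of why that matters, not the code from the patch: mac_rounded and mac_wide are hypothetical names, the volatile temporaries are just one way to force a store to a float-sized slot, and double stands in for the wider intermediate precision.

```c
/* Hypothetical helper: multiply-accumulate with the product rounded to
 * float before the add, mimicking the store/reload (fstp/fld) pair. */
static float mac_rounded(float acc, float a, float b)
{
    volatile float product = a * b;   /* volatile forces rounding to 32-bit float */
    volatile float sum = acc + product;
    return sum;
}

/* Hypothetical helper: multiply-accumulate with the product kept at
 * higher precision, rounded to float only once at the end. */
static float mac_wide(float acc, float a, float b)
{
    double wide = (double)a * (double)b + (double)acc;
    return (float)wide;
}
```

With a = b = 1 + 2^-12 and acc = 2^-24, the exact product is 1 + 2^-11 + 2^-24, which is not representable as a float. mac_rounded loses the low bit at the intermediate store, while mac_wide keeps it until the final rounding, so the two results differ by one unit in the last place.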