search for: mm_sum

Displaying 4 results from an estimated 4 matches for "mm_sum".

2016 May 02
1
[PATCH] workaround for a bug in MSVC 2015 U2
...ill find what's wrong and will create a bugreport... Well, here is the link: https://connect.microsoft.com/VisualStudio/feedback/details/2659191/incorrect-code-generation-for-x86-64
It seems that MSVC miscompiles
    abs_residual_partition_sums[partition] = (FLAC__uint32)_mm_cvtsi128_si32(mm_sum);
into
    movq QWORD PTR [rsi], xmm2
So it incorrectly copies 8 bytes from mm_sum to abs_residual_partition_sums[partition] (it should copy 4 lower bytes and zero out 4 upper bytes). It should be something like
    movd eax, xmm2
    movq QWORD PTR [rsi], rax
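The difference between the intended and the reported store can be sketched in portable C, without intrinsics. This is only an illustration: `low_lane` is a hypothetical stand-in for the low 64 bits of the SSE register, both function names are invented here, and the destination is assumed to be 64 bits wide, as the `QWORD PTR` store in the message suggests.

```c
#include <stdint.h>

/* Intended behaviour: keep the 4 lower bytes and zero the 4 upper bytes
 * before the 64-bit store. `low_lane` is a hypothetical stand-in for the
 * low 64 bits of xmm2. */
static uint64_t store_expected(uint64_t low_lane)
{
    uint32_t low32 = (uint32_t)low_lane; /* truncate to 32 bits */
    return (uint64_t)low32;              /* zero-extend to 64 bits */
}

/* Behaviour described in the report: all 8 bytes are copied unchanged,
 * so the upper half of the lane ends up in the stored value. */
static uint64_t store_miscompiled(uint64_t low_lane)
{
    return low_lane;
}
```

The two functions agree whenever the upper 32 bits of the lane are zero, so the results only diverge when the adjacent 32-bit element is non-zero.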
2016 May 02
2
[PATCH] workaround for a bug in MSVC 2015 U2
Erik de Castro Lopo wrote:
>> As I wrote earlier, MSVC 2015 U2 incorrectly compiles
>> stream_encoder_intrin_*.c files for x86-64 platform.
>> As a result, flac works, but compression ratio is close to 1.
>> This patch disables some compiler optimizations, and
>> compression ratio reverts back to normal values.
>
> Rather than having the same chunk of code in
2016 May 02
3
[PATCH] MSVC2015U2 workaround, version 2
...'s a new version of a patch that fixes a problem with MSVC2015 update2, but it doesn't disable any optimization, so the resulting encoding performance should be almost unaffected by this workaround. MSVC compiles
    abs_residual_partition_sums[partition] = (FLAC__uint32)_mm_cvtsi128_si32(mm_sum);
into this:
    movq QWORD PTR [rsi], xmm2
while it should be
    movd eax, xmm2
    mov QWORD PTR [rsi], rax
With this patch, MSVC emits
    movq QWORD PTR [rsi], xmm2
    mov DWORD PTR [rsi+4], r9d
so the price of this workaround is 1 extra write instruction per part...
2014 Jun 19
7
[PATCH] stream_encoder : Improve selection of residual accumulator width
...include "private/stream_encoder.h"
+#include "private/bitmath.h"
 #ifdef FLAC__SSE2_SUPPORTED
 #include <stdlib.h> /* for abs() */
@@ -58,7 +59,7 @@ void FLAC__precompute_partition_info_sums_intrin_sse2(const FLAC__int32 residual
 	unsigned e1, e3;
 	__m128i mm_res, mm_sum, mm_mask;
-	if(bps <= 16) {
+	if(FLAC__bitmath_ilog2(default_partition_samples) + bps + FLAC__MAX_EXTRA_RESIDUAL_BPS < 32) {
 		for(partition = residual_sample = 0; partition < partitions; partition++) {
 			end += default_partition_samples;
 			mm_sum = _mm_setzero_si128();
diff --...
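The new condition in the hunk above can be read as a bit-budget check: summing N magnitudes of at most b bits each needs roughly log2(N) + b bits, so a 32-bit accumulator is only safe while that total stays below 32. A minimal sketch of that check follows; `ilog2_u32` is a hypothetical stand-in for `FLAC__bitmath_ilog2`, and the value 4 for the extra-bits constant is an assumption made for illustration, not taken from the patch.

```c
#include <stdint.h>

/* Assumed value, for illustration only; the patch refers to the constant
 * FLAC__MAX_EXTRA_RESIDUAL_BPS without showing its definition. */
#define MAX_EXTRA_RESIDUAL_BPS 4u

/* Hypothetical stand-in for FLAC__bitmath_ilog2: floor(log2(v)), v > 0. */
static unsigned ilog2_u32(uint32_t v)
{
    unsigned n = 0;
    while (v >>= 1)
        n++;
    return n;
}

/* Returns 1 when a partition sum is expected to fit in 32 bits,
 * mirroring the condition introduced in the diff. */
static int fits_in_32bit_accumulator(uint32_t partition_samples, unsigned bps)
{
    return ilog2_u32(partition_samples) + bps + MAX_EXTRA_RESIDUAL_BPS < 32;
}
```

Under these assumptions, 16-bit input with 4096-sample partitions no longer takes the 32-bit path (12 + 16 + 4 = 32), whereas the old `bps <= 16` test would have accepted it; 24-bit input with 8-sample partitions does qualify (3 + 24 + 4 = 31).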