Displaying 4 results from an estimated 4 matches for "mm_sum".
2016 May 02
1
[PATCH] workaround for a bug in MSVC 2015 U2
...ill find what's wrong and will create a bugreport...
Well, here is the link: https://connect.microsoft.com/VisualStudio/feedback/details/2659191/incorrect-code-generation-for-x86-64
It seems that MSVC miscompiles
abs_residual_partition_sums[partition] = (FLAC__uint32)_mm_cvtsi128_si32(mm_sum);
into
movq QWORD PTR [rsi], xmm2
So it incorrectly copies 8 bytes from mm_sum to abs_residual_partition_sums[partition]
(it should copy 4 lower bytes and zero out 4 upper bytes).
It should be something like
movd eax, xmm2
mov QWORD PTR [rsi], rax
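The difference between the two stores can be reproduced in plain C. This is a minimal sketch, not code from FLAC: both helper names are hypothetical, and the element type is assumed to be 64 bits wide because the message describes an 8-byte destination.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store the low 32-bit lane of `sum` into a 64-bit slot.
 * The cast to uint32_t makes the widening to 64 bits a zero-extension, so the
 * upper four bytes of *slot must end up as zero. */
void store_low_lane(uint64_t *slot, __m128i sum)
{
    assert(slot != NULL);
    *slot = (uint32_t)_mm_cvtsi128_si32(sum);
}

/* Hypothetical helper: a raw 8-byte copy of the register's low half, which
 * mirrors what the reported `movq QWORD PTR [rsi], xmm2` does. */
void store_low_half_raw(uint64_t *slot, __m128i sum)
{
    assert(slot != NULL);
    _mm_storel_epi64((__m128i *)slot, sum);
}
```

If the second 32-bit lane of the register is nonzero, the two helpers store different values: the first keeps only the low lane, the second also copies the neighbouring lane into the upper half of the slot.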
2016 May 02
2
[PATCH] workaround for a bug in MSVC 2015 U2
Erik de Castro Lopo wrote:
>> As I wrote earlier, MSVC 2015 U2 incorrectly compiles
>> stream_encoder_intrin_*.c files for x86-64 platform.
>> As a result, flac works, but compression ratio is close to 1.
>> This patch disables some compiler optimizations, and
>> compression ratio reverts back to normal values.
>
> Rather than having the same chunk of code in
2016 May 02
3
[PATCH] MSVC2015U2 workaround, version 2
...'s a new version of a patch that fixes a problem with MSVC2015 update2,
but it doesn't disable any optimization, so the resulting encoding
performance should be almost unaffected by this workaround.
MSVC compiles
abs_residual_partition_sums[partition] = (FLAC__uint32)_mm_cvtsi128_si32(mm_sum);
into this:
movq QWORD PTR [rsi], xmm2
while it should be
movd eax, xmm2
mov QWORD PTR [rsi], rax
With this patch, MSVC emits
movq QWORD PTR [rsi], xmm2
mov DWORD PTR [rsi+4], r9d
so the price of this workaround is 1 extra write instruction per part...
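The excerpt does not show the patch body, so the following is only a sketch of a workaround of that general shape, assuming the fix is to clear the upper four bytes explicitly after the store. The function name is hypothetical, and a given compiler is not guaranteed to keep the redundant mask.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the store in the encoder; not the patch itself. */
void store_partition_sum(uint64_t *sums, size_t partition, __m128i mm_sum)
{
    assert(sums != NULL);
    sums[partition] = (uint32_t)_mm_cvtsi128_si32(mm_sum);
    /* Redundant on a correct compiler: clear the upper 32 bits again so that
     * a miscompiled 8-byte store cannot leave stale data there. */
    sums[partition] &= UINT64_C(0xFFFFFFFF);
}
```

The mask costs one extra operation per partition and leaves the optimization settings untouched, which is the trade-off the message describes.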
2014 Jun 19
7
[PATCH] stream_encoder : Improve selection of residual accumulator width
...include "private/stream_encoder.h"
+#include "private/bitmath.h"
#ifdef FLAC__SSE2_SUPPORTED
#include <stdlib.h> /* for abs() */
@@ -58,7 +59,7 @@ void FLAC__precompute_partition_info_sums_intrin_sse2(const FLAC__int32 residual
unsigned e1, e3;
__m128i mm_res, mm_sum, mm_mask;
- if(bps <= 16) {
+ if(FLAC__bitmath_ilog2(default_partition_samples) + bps + FLAC__MAX_EXTRA_RESIDUAL_BPS < 32) {
for(partition = residual_sample = 0; partition < partitions; partition++) {
end += default_partition_samples;
mm_sum = _mm_setzero_si128();
diff --...
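The new condition can be read as a bound on the partition sum. Assuming every residual magnitude is below 2^(bps + FLAC__MAX_EXTRA_RESIDUAL_BPS), and given that N samples satisfy N < 2^(ilog2(N) + 1), the sum stays below 2^32 whenever the condition holds. The sketch below illustrates that arithmetic only: the helper names are hypothetical, the loop stands in for FLAC__bitmath_ilog2, and the value 4 for the extra bits is an assumption, not taken from the excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for illustration; the real constant lives in the FLAC sources. */
#define MAX_EXTRA_RESIDUAL_BPS 4u

/* floor(log2(v)) for v > 0; stands in for FLAC__bitmath_ilog2. */
unsigned ilog2_u32(uint32_t v)
{
    unsigned l = 0;
    assert(v > 0);
    while (v >>= 1)
        l++;
    return l;
}

/* Nonzero when the patched condition selects the 32-bit accumulator. */
int sum_fits_in_32_bits(uint32_t partition_samples, unsigned bps)
{
    return ilog2_u32(partition_samples) + bps + MAX_EXTRA_RESIDUAL_BPS < 32;
}
```

Under these assumptions, 16-bit input with 4096-sample partitions gives 12 + 16 + 4 = 32, so the 32-bit path is rejected even though the old `bps <= 16` test accepted it. 24-bit input with 8-sample partitions gives 3 + 24 + 4 = 31 and is now accepted.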