thr3ads.net - similar to: "x86_64 SSE2/SSE41 optim not used"

Displaying 20 results from an estimated 1000 matches similar to: "x86_64 SSE2/SSE41 optim not used"

2014 Mar 12

x86_64 SSE2/SSE41 optim not used

Olivier Tristan wrote:
> In stream_decoder.c when assigning lpc restore function,
> only IA32 processor benefits from SSE2 and SSE4.1 optimization.
>
> Shouldn't it be the case for x86_64 processor as well ?

I tried, and it didn't make decoding faster. (And even SSE4.1 for IA-32 is... questionable) OTOH, flac decoding is really very fast. It's very hard to make it even
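As a rough illustration of the kind of dispatch being discussed, the sketch below selects a restore routine at run time. Every name in it (`restore_fn`, `restore_c`, `restore_simd_placeholder`, `select_restore`) is hypothetical and not taken from libFLAC, the "SIMD" variant only delegates to the C loop, and `__builtin_cpu_supports` is a GCC/Clang builtin that is assumed to be available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature: `data` holds `order` warm-up samples followed by
 * room for `n` restored samples. */
typedef void (*restore_fn)(const int32_t *residual, size_t n,
                           const int32_t *coeff, size_t order,
                           int shift, int32_t *data);

/* Portable reference loop: predict each sample from the previous `order`
 * samples, scale the prediction down by `shift`, then add the residual. */
void restore_c(const int32_t *residual, size_t n,
               const int32_t *coeff, size_t order,
               int shift, int32_t *data)
{
    assert(residual != NULL && coeff != NULL && data != NULL);
    for (size_t i = 0; i < n; i++) {
        int64_t sum = 0;
        for (size_t j = 0; j < order; j++)
            sum += (int64_t)coeff[j] * data[order + i - j - 1];
        /* Assumes right shift of a negative value is arithmetic. */
        data[order + i] = residual[i] + (int32_t)(sum >> shift);
    }
}

/* Placeholder for a vectorised variant; here it only delegates. */
void restore_simd_placeholder(const int32_t *residual, size_t n,
                              const int32_t *coeff, size_t order,
                              int shift, int32_t *data)
{
    restore_c(residual, n, coeff, order, shift, data);
}

/* Pick a routine once, based on what the running CPU reports. */
restore_fn select_restore(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.1"))
        return restore_simd_placeholder;
#endif
    return restore_c;
}
```

Nothing in such a dispatcher is specific to IA-32, so whether x86_64 should take the SIMD path is a question of measured speed, which is the point the reply makes.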

2017 Feb 18

[PATCH 5/5] SIMD: remove outdated SSE2 code

This patch removes FLAC__lpc_restore_signal_16_intrin_sse2(). It's faster than C code, but not faster than MMX-accelerated ASM functions. It's also slower than the new SSE4.1 functions that were added by the previous patch. So this function wasn't very useful before, and now it's even less useful. I don't see a reason to keep it.

2017 Feb 18

[PATCH 4/5] SIMD: accelerate decoding of 16-bit FLAC

This patch adds 2 new functions, FLAC__lpc_restore_signal_intrin_sse41() and FLAC__lpc_restore_signal_16_intrin_sse41(). The decoding speed of Subset-compatible 16-bit FLAC files is slightly increased on SSE4.1-compatible CPUs.
Attachment: 04_add_new_intrin_func.patch (application/octet-stream, 9851 bytes)

2017 Jan 25

Flac multi channel

Hi Guys, I know that the FLAC format is currently limited to 8 channels, but I was wondering if this is a hard limitation of the format or if it can be easily circumvented if the flac library is compiled with other settings and/or the software using it doesn't mind it. Thanks!
--
Olivier Tristan
Research & Development
www.uvi.net

2017 Jan 25

Flac multi channel

I see :( That's what I would call a good struct size optimisation. Please tell me there was another reason behind this being only 3 instead of 8 or 16 bits, right?

2017-01-25 18:30 GMT+01:00 Tor-Einar Jarnbjo <tor-einar at jarnbjo.name>:
> Hello Olivier,
>
> the limitation is in the file format itself, as the number of channels is
> encoded in a 3 bit field in the streaminfo

2020 May 18

[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

What do you base this on? Per https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html : "For the x86-32 compiler, you must use -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For the x86-64 compiler, these extensions are enabled by default." That reads to me like we're fine for SSE2. As stated in my comments, SSSE3 support must be

2020 May 18

[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

This drop-in patch increases the performance of the get_checksum1() function on x86-64. On the slow target CPU, performance of the function increased by nearly 50% in the x86-64 default SSE2 mode, and by nearly 100% if the compiler was told to enable SSSE3 support. The increase was over 200% on the fastest CPU tested in SSSE3 mode. Transfer time improvement with large files existing on both ends

2008 Nov 20

[LLVMdev] changing -mattr behavior with mmx and sse

Hi, When setting -mattr option on X86, I would like to treat MMX separately from SSE levels. This would allow a client who sets the attributes directly to set the SSE level independent of MMX, e.g., llc -march=x86 -mattr=sse41, one would get sse4.1 with mmx disabled while llc -march=x86 -mattr=mmx -mattr=sse42 will get mmx and sse42. If anyone objects to this change, please let me

2020 May 19

[PATCHv2] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

I've read up some more on the subject, and it seems the proper way to do this with GCC is g++ and target attributes. I've refactored the patch that way, and it indeed uses SSSE3 automatically on supporting CPUs, regardless of the build host, so this should be ideal both for home builders and distros. Getting the code to build right in c++ mode (checksum_sse2.cpp only) was a bit of an
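The idea of building one function for a newer instruction set and choosing it at run time can be sketched in plain C as follows. This is not the patch's code: the patch is described as using g++, whereas this sketch uses the GCC/Clang `target` function attribute with a manual `__builtin_cpu_supports` check, and the checksum is a simplified Adler-style sum. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define X86_TARGET_DISPATCH 1
#else
#define X86_TARGET_DISPATCH 0
#endif

/* Baseline build of the loop (simplified; not the rsync source). */
uint32_t checksum_baseline(const unsigned char *buf, size_t len)
{
    uint32_t s1 = 0, s2 = 0;
    for (size_t i = 0; i < len; i++) {
        s1 += buf[i];
        s2 += s1;
    }
    return (s1 & 0xffffu) + (s2 << 16);
}

#if X86_TARGET_DISPATCH
/* Same loop, but the compiler may use SSSE3 instructions in this function
 * only, regardless of the flags used for the rest of the file. */
__attribute__((target("ssse3")))
uint32_t checksum_ssse3(const unsigned char *buf, size_t len)
{
    uint32_t s1 = 0, s2 = 0;
    for (size_t i = 0; i < len; i++) {
        s1 += buf[i];
        s2 += s1;
    }
    return (s1 & 0xffffu) + (s2 << 16);
}
#endif

/* Choose the variant that the running CPU can execute. */
uint32_t checksum_dispatch(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
#if X86_TARGET_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("ssse3"))
        return checksum_ssse3(buf, len);
#endif
    return checksum_baseline(buf, len);
}
```

Because the selection happens on the machine that runs the binary, the result does not depend on the build host, which is the property the message highlights for home builders and distros.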

2016 Apr 12

X86 TRUNCATE cost for AVX & AVX2 mode

<Copied Cong> Thanks Elena. Mostly I was interested in why such a high cost (30) is kept for TRUNCATE v16i32 to v16i8 in SSE41. Looking at the code, it appears that TRUNCATE v16i32 to v16i8 in SSE41 is very expensive vs SSE2. I feel this number should be the same as, or close to, the cost given for the same operation in SSE2ConversionTbl. The patch below from Cong Hou reduces the cost for the same operation in SSE2

2008 Aug 01

[LLVMdev] Using intrinsics with memory operands

Hi all, I was wondering how to use variations of intrinsic functions that take a memory operand. Take for example the SSE4.1 pmovsxbd instruction. One variant takes two XMM registers, while another has a 32-bit memory location as source operand. The latter is quite interesting if you know you're reading from memory anyway, and if it's not 16-byte aligned. It looks like LLVM's
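At the C level (rather than in LLVM IR, which is what the thread is about), the memory-source case is usually written as an unaligned 32-bit load followed by the register form of the intrinsic, leaving it to the compiler to fold the load into the instruction. The sketch below assumes a GCC or Clang recent enough to accept `<smmintrin.h>` together with the `target` attribute; the function names are hypothetical, and a scalar loop is used when SSE4.1 is not available.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <smmintrin.h>
#define HAVE_SSE41_PATH 1

/* Sign-extend four int8 values to four int32 values with pmovsxbd. */
__attribute__((target("sse4.1")))
static void widen4_sse41(const int8_t *src, int32_t *out)
{
    int32_t packed;
    memcpy(&packed, src, sizeof packed);       /* unaligned 32-bit read */
    __m128i bytes = _mm_cvtsi32_si128(packed); /* into the low lane */
    __m128i words = _mm_cvtepi8_epi32(bytes);  /* pmovsxbd */
    _mm_storeu_si128((__m128i *)out, words);   /* unaligned 128-bit store */
}
#else
#define HAVE_SSE41_PATH 0
#endif

/* Hypothetical entry point: uses SSE4.1 when the CPU reports it. */
void widen4_i8_to_i32(const int8_t *src, int32_t *out)
{
    assert(src != NULL && out != NULL);
#if HAVE_SSE41_PATH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.1")) {
        widen4_sse41(src, out);
        return;
    }
#endif
    for (int i = 0; i < 4; i++)
        out[i] = src[i];
}
```

Whether the load is actually merged into the instruction's memory form is up to the compiler's instruction selection; the source only expresses the 4-byte read and the conversion.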

2017 May 08

LLVM and Xeon Skylake v5

getProcessTriple just determines the operating system and architecture. It doesn't deal with specific instruction set features. The CPU should be controlled by MCPU on the EngineBuilder, I think. The CPU autodetection code lives in getHostCPUName in lib/Support/Host.cpp, but I don't think the JIT calls into it. I think it's expected the user would call it or pass a specific CPU string to the MCPU

2017 Jan 28

Flac multi channel

Don't overlook the FLAC in Ogg container solution. That's established as a standard for some time now, as far as I know, and would probably be better than a new, proprietary multi mono bundle. I haven't used it myself, but people have been talking about it for a while, and I believe that some "FLAC" users are actually working with FLAC in Ogg container files. Brian

2014 Mar 20

Wrong warning in encoder for 24bits WAV

Hi Guys, I've just faced a wrong warning trying to encode a 24 bits WAV file

    if(wFormatTag == 1) {
        if(bps != 8 && bps != 16) {
            if(bps == 24 || bps == 32) {
                /* let these slide with a warning since they're unambiguous */
                flac__utils_printf(stderr, 1, "%s: WARNING: legacy WAVE file has

2015 Feb 06

about apodization functions

Hi Guys, Does having multiple apodization functions change anything in the decoding process, or does it only apply to the encoding process? I have been able to gain almost 10% by using several apodization functions (it takes way more time to encode, but this is not an issue in my use case), but I don't want to sacrifice any decoding speed as this is the bottleneck for me. Thanks,

2017 Feb 14

Flac build issue in debug win x32

Hi Guys, The following code in CPU.c (line 155) won't link if you don't have the NASM code built, even if FLAC__HAS_X86INTRIN is true: FLAC__cpu_info_asm_ia32 doesn't exist, and the else branch is still compiled if there is no dead code stripping.

    if (FLAC__HAS_X86INTRIN) {
        FLAC__cpu_info_x86(0, &flags_eax, &flags_ebx, &flags_ecx, &flags_edx);
        info->ia32.intel =
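One common way around this kind of link failure is to select the branch in the preprocessor, so the unavailable symbol is never referenced even in an unoptimised debug build. The sketch below only illustrates that pattern and is not a proposed patch; the `DEMO_` macros and `demo_` functions are hypothetical stand-ins for the real configuration and routines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build switches standing in for the real configuration. */
#define DEMO_HAS_X86INTRIN 1
#define DEMO_HAS_NASM 0

/* Stand-in for the intrinsics-based CPU query. */
static void demo_cpu_info_intrin(unsigned *flags)
{
    *flags = 1u;
}

#if DEMO_HAS_NASM
/* Declared and referenced only when the assembly object is built. */
void demo_cpu_info_asm(unsigned *flags);
#endif

void demo_cpu_info(unsigned *flags)
{
    assert(flags != NULL);
#if DEMO_HAS_X86INTRIN
    demo_cpu_info_intrin(flags);
#elif DEMO_HAS_NASM
    demo_cpu_info_asm(flags);
#else
    *flags = 0u;
#endif
}
```

With a run-time `if` on a constant, the call in the untaken branch still reaches the linker unless the optimiser removes it; with `#if`, it is gone before compilation.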

2014 Mar 19

building issue on OSX GCC 4.2 / Xcode

Hi Guys, The current trunk does not build with GCC 4.2 on OSX when compiling cpu.c: <cpuid.h> does not exist and __get_cpuid() is not defined. This version of GCC is required if you want to support pre-10.7 systems, which are still pretty common. It seems another project had the same issue: https://bugzilla.mozilla.org/show_bug.cgi?id=836824 I don't know what the right fix would be.

2008 Aug 01

[LLVMdev] Using intrinsics with memory operands

On Fri, Aug 1, 2008 at 12:10 AM, Nicolas Capens <nicolas at capens.net> wrote:
> I was wondering how to use variations of intrinsic functions that take a
> memory operand.

Often, for intrinsics where it matters, there's a variant of the intrinsic that takes a pointer operand that you can use, although it looks like there isn't one here.

> Take for example the SSE4.1

2008 Nov 20

[LLVMdev] changing -mattr behavior with mmx and sse

Might you instead consider just adding a -disable-mmx option?

Preston

On Thu, 2008-20-11 at 02:57 -0500, Mon Ping Wang wrote:
> Hi,
>
> When setting -mattr option on X86, I would like to treat MMX
> separately from SSE levels. This would allow a client who sets the
> attributes directly to set the SSE level independent of MMX, e.g., llc
> -march=x86 -mattr=sse41, one would get

2008 Nov 20

[LLVMdev] changing -mattr behavior with mmx and sse

On Nov 19, 2008, at 11:57 PM PST, Mon Ping Wang wrote:
> Hi,
>
> When setting -mattr option on X86, I would like to treat MMX
> separately from SSE levels. This would allow a client who sets the
> attributes directly to set the SSE level independent of MMX, e.g., llc
> -march=x86 -mattr=sse41, one would get sse4.1 with mmx disabled while
> llc -march=x86 -mattr=mmx
