Displaying 20 results from an estimated 1000 matches similar to: "x86_64 SSE2/SSE41 optim not used"
2014 Mar 12
0
x86_64 SSE2/SSE41 optim not used
Olivier Tristan wrote:
> In stream_decoder.c when assigning lpc restore function,
> only IA32 processors benefit from SSE2 and SSE4.1 optimization.
>
> Shouldn't it be the case for x86_64 processor as well ?
I tried, and it didn't make decoding faster. (And even SSE4.1 for IA-32 is... questionable)
OTOH, flac decoding is really very fast. It's very hard to make it even
2017 Feb 18
4
[PATCH 5/5] SIMD: remove outdated SSE2 code
This patch removes FLAC__lpc_restore_signal_16_intrin_sse2().
It's faster than C code, but not faster than MMX-accelerated
ASM functions. It's also slower than the new SSE4.1 functions
that were added by the previous patch.
So this function wasn't very useful before, and now it's
even less useful. I don't see a reason to keep it.
2017 Feb 18
2
[PATCH 4/5] SIMD: accelerate decoding of 16-bit FLAC
This patch adds 2 new functions,
FLAC__lpc_restore_signal_intrin_sse41() and
FLAC__lpc_restore_signal_16_intrin_sse41().
The decoding speed of Subset-compatible 16-bit FLAC files
is slightly increased on SSE4.1-compatible CPUs.
2017 Jan 25
2
Flac multi channel
Hi Guys,
I know that the FLAC format is currently limited to 8 channels, but I was
wondering if this is a hard limitation of the format
or if it can be easily circumvented if the flac library is compiled with
other settings and/or the software using it doesn't mind it
Thanks !
--
Olivier Tristan
Research & Development
www.uvi.net
2017 Jan 25
1
Flac multi channel
I see :(
That's what I would call a good struct size optimisation.
Please tell me there was another reason behind this being only 3 instead of
8 or 16 bits, right ?
2017-01-25 18:30 GMT+01:00 Tor-Einar Jarnbjo <tor-einar at jarnbjo.name>:
> Hello Olivier,
>
> the limitation is in the file format itself, as the number of channels is
> encoded in a 3 bit field in the streaminfo
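To make the 3-bit limit concrete, here is a minimal sketch in C of reading that field from a raw 34-byte STREAMINFO body. The helper name `streaminfo_channels` is hypothetical and is not part of libFLAC; the sketch assumes the published layout, in which the 20-bit sample rate is followed directly by a 3-bit field storing the channel count minus one. Three bits can hold 0 to 7, which is where the ceiling of 8 channels comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of a STREAMINFO metadata block body. */
#define STREAMINFO_LEN 34

/*
 * Hypothetical helper (not a libFLAC function): extract the channel count
 * from a raw STREAMINFO body.
 *
 * Byte 12 holds, from the most significant bit down:
 *   4 bits: low nibble of the 20-bit sample rate
 *   3 bits: (number of channels - 1)
 *   1 bit : top bit of (bits per sample - 1)
 * so the channel field is bits 3..1 of byte 12.
 */
static unsigned streaminfo_channels(const uint8_t *info, size_t len)
{
    assert(info != NULL && len >= STREAMINFO_LEN);
    return ((info[12] >> 1) & 0x07u) + 1u;
}
```

For a typical 44.1 kHz, 16-bit stereo file, bytes 10 to 13 are `0A C4 42 F0`, and the helper returns 2. Setting all three channel bits yields 8, the largest value the field can express.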
2020 May 18
3
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
What do you base this on?
Per https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html :
"For the x86-32 compiler, you must use -march=cpu-type, -msse or
-msse2 switches to enable SSE extensions and make this option
effective. For the x86-64 compiler, these extensions are enabled by
default."
That reads to me like we're fine for SSE2. As stated in my comments,
SSSE3 support must be
2020 May 18
6
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
This drop-in patch increases the performance of the get_checksum1()
function on x86-64.
On the target slow CPU performance of the function increased by nearly
50% in the x86-64 default SSE2 mode, and by nearly 100% if the
compiler was told to enable SSSE3 support. The increase was over 200%
on the fastest CPU tested in SSSE3 mode.
Transfer time improvement with large files existing on both ends
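The snippet does not show the function being optimized. As a point of reference only, the following C sketch assumes get_checksum1() is the usual two-sum rolling checksum (a running byte sum `s1` and a running sum of those sums `s2`). The name `rolling_checksum1` is hypothetical, and any per-byte character offset the real code may apply is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical scalar baseline, assuming a two-sum rolling checksum:
 *   s1 = sum of all bytes (treated as signed chars)
 *   s2 = sum of every intermediate value of s1
 * The result packs the low 16 bits of s1 under the low 16 bits of s2.
 */
static uint32_t rolling_checksum1(const signed char *buf, size_t len)
{
    uint32_t s1 = 0, s2 = 0;

    for (size_t i = 0; i < len; i++) {
        s1 += (uint32_t)(int32_t)buf[i]; /* sign-extend, then wrap mod 2^32 */
        s2 += s1;
    }
    return (s1 & 0xffffu) + (s2 << 16);
}
```

The loop carries a dependency from each byte to the next through `s1` and `s2`, which is why a SIMD version has to restructure the sums into weighted block sums rather than simply widening the loop.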
2008 Nov 20
4
[LLVMdev] changing -mattr behavior with mmx and sse
Hi,
When setting -mattr option on X86, I would like to treat MMX
separately from SSE levels. This would allow a client who sets the
attributes directly to set the SSE level independent of MMX, e.g., llc
-march=x86 -mattr=sse41, one would get sse4.1 with mmx disabled while
llc -march=x86 -mattr=mmx -mattr=sse42 will get mmx and sse42. If
anyone objects to this change, please let me
2020 May 19
5
[PATCHv2] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
I've read up some more on the subject, and it seems the proper way to
do this with GCC is g++ and target attributes. I've refactored the
patch that way, and it indeed uses SSSE3 automatically on supporting
CPUs, regardless of the build host, so this should be ideal both for
home builders and distros.
Getting the code to build right in c++ mode (checksum_sse2.cpp only)
was a bit of an
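The approach described in this message, compiling one function for SSSE3 and choosing it at run time, can be sketched in plain C as well. This is not the patch itself: it swaps g++ function multiversioning for an explicit `__builtin_cpu_supports()` check, and the function names (`sum_bytes`, `sum_bytes_ssse3`, `sum_bytes_scalar`) are hypothetical. It assumes GCC or Clang; on other compilers or non-x86 targets it falls back to the scalar path.

```c
#include <stddef.h>
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#endif

/* Portable baseline: sum of signed bytes. */
static int32_t sum_bytes_scalar(const int8_t *buf, size_t len)
{
    int32_t total = 0;
    for (size_t i = 0; i < len; i++)
        total += buf[i];
    return total;
}

#ifdef HAVE_X86_DISPATCH
/* The target attribute enables SSSE3 for this function only, so the
 * rest of the file still builds for the baseline instruction set. */
__attribute__((target("ssse3")))
static int32_t sum_bytes_ssse3(const int8_t *buf, size_t len)
{
    const __m128i ones8 = _mm_set1_epi8(1);
    const __m128i ones16 = _mm_set1_epi16(1);
    __m128i acc = _mm_setzero_si128();
    size_t i = 0;

    for (; i + 16 <= len; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
        /* SSSE3: multiply unsigned 1 by each signed byte, add adjacent pairs
         * into 16-bit lanes (pair sums fit without saturating). */
        __m128i s16 = _mm_maddubs_epi16(ones8, v);
        /* SSE2: widen adjacent 16-bit pairs into 32-bit sums. */
        acc = _mm_add_epi32(acc, _mm_madd_epi16(s16, ones16));
    }

    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    int32_t total = lanes[0] + lanes[1] + lanes[2] + lanes[3];

    /* Handle the tail that does not fill a 16-byte vector. */
    return total + sum_bytes_scalar(buf + i, len - i);
}
#endif

/* Public entry point: picks the SSSE3 path only when the running CPU has it. */
int32_t sum_bytes(const int8_t *buf, size_t len)
{
#ifdef HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("ssse3"))
        return sum_bytes_ssse3(buf, len);
#endif
    return sum_bytes_scalar(buf, len);
}
```

Because the feature check happens on the machine that runs the binary, the result does not depend on the build host's CPU, which is the property the message highlights for distro builds.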
2016 Apr 12
2
X86 TRUNCATE cost for AVX & AVX2 mode
<Copied Cong>
Thanks Elena.
Mostly I was interested in why such a high cost of 30 is kept for TRUNCATE v16i32 to v16i8 in SSE41.
Looking at the code, it appears that TRUNCATE v16i32 to v16i8 in SSE41 is very expensive
vs SSE2. I feel this number should be the same as, or close to, the cost mentioned for the same
operation in SSE2ConversionTbl.
The patch below from Cong Hou reduces the cost for the same operation in SSE2
2008 Aug 01
3
[LLVMdev] Using intrinsics with memory operands
Hi all,
I was wondering how to use variations of intrinsic functions that take a
memory operand.
Take for example the SSE4.1 pmovsxbd instruction. One variant takes two XMM
registers, while another has a 32-bit memory location as source operand. The
latter is quite interesting if you know you're reading from memory anyway,
and if it's not 16-byte aligned. It looks like LLVM's
2017 May 08
2
LLVM and Xeon Skylake v5
getProcessTriple just determines the operating system and architecture. It
doesn't deal with specific instruction set features. The CPU should be
controlled by MCPU on the EngineBuilder, I think. The CPU autodetection code
lives in getHostCPUName in lib/Support/Host.cpp, but I don't think the JIT
calls into it. I think it's expected the user would call it or pass a specific
CPU string to the MCPU
2017 Jan 28
2
Flac multi channel
Don't overlook the FLAC in Ogg container solution. That's established as a standard for some time now, as far as I know, and would probably be better than a new, proprietary multi mono bundle. I haven't used it myself, but people have been talking about it for a while, and I believe that some "FLAC" users are actually working with FLAC in Ogg container files.
Brian
2014 Mar 20
2
Wrong warning in encoder for 24bits WAV
Hi Guys,
I've just hit an incorrect warning while trying to encode a 24-bit WAV file:
if(wFormatTag == 1) {
    if(bps != 8 && bps != 16) {
        if(bps == 24 || bps == 32) {
            /* let these slide with a warning since they're unambiguous */
            flac__utils_printf(stderr, 1, "%s: WARNING: legacy WAVE file has
2015 Feb 06
2
about apodization functions
Hi Guys,
Does having multiple apodization functions change anything in the
decoding process,
or does it only apply to the encoding process?
I have been able to gain almost 10% by using several apodization
functions (it takes way more time to encode but this is quite a non
issue in my use case)
but I don't want to sacrifice any decoding speed as this is the
bottleneck for me.
Thanks,
2017 Feb 14
2
Flac build issue in debug win x32
Hi Guys,
The following code in CPU.c (line 155) won't link if the NASM
code isn't built, even if FLAC__HAS_X86INTRIN is true, because
FLAC__cpu_info_asm_ia32 doesn't exist and the else branch is still compiled when there
is no dead code stripping:
if (FLAC__HAS_X86INTRIN) {
    FLAC__cpu_info_x86(0, &flags_eax, &flags_ebx, &flags_ecx, &flags_edx);
    info->ia32.intel =
2014 Mar 19
3
building issue on OSX GCC 4.2 / Xcode
Hi Guys,
The current trunk does not build with GCC 4.2 on OSX when compiling cpu.c:
<cpuid.h> does not exist and __get_cpuid() is not defined.
This version of GCC is required if you want to support pre-10.7 systems,
which are still pretty common.
It seems another project had the same issue:
https://bugzilla.mozilla.org/show_bug.cgi?id=836824
I don't know what the right fix would be.
2008 Aug 01
0
[LLVMdev] Using intrinsics with memory operands
On Fri, Aug 1, 2008 at 12:10 AM, Nicolas Capens <nicolas at capens.net> wrote:
> I was wondering how to use variations of intrinsic functions that take a
> memory operand.
Often, for intrinsics where it matters, there's a variant of the
intrinsic that takes a pointer operand that you can use, although it
looks like there isn't one here.
> Take for example the SSE4.1
2008 Nov 20
0
[LLVMdev] changing -mattr behavior with mmx and sse
Might you instead consider just adding a -disable-mmx option?
Preston
On Thu, 2008-20-11 at 02:57 -0500, Mon Ping Wang wrote:
> Hi,
>
> When setting -mattr option on X86, I would like to treat MMX
> separately from SSE levels. This would allow a client who sets the
> attributes directly to set the SSE level independent of MMX, e.g., llc
> -march=x86 -mattr=sse41, one would get
2008 Nov 20
0
[LLVMdev] changing -mattr behavior with mmx and sse
On Nov 19, 2008, at 11:57 PM PST, Mon Ping Wang wrote:
> Hi,
>
> When setting -mattr option on X86, I would like to treat MMX
> separately from SSE levels. This would allow a client who sets the
> attributes directly to set the SSE level independent of MMX, e.g., llc
> -march=x86 -mattr=sse41, one would get sse4.1 with mmx disabled while
> llc -march=x86 -mattr=mmx