thr3ads.net - similar to: "[PATCH 3/5] SIMD: accelerate decoding of some 24-bit FLAC"

Displaying 20 results from an estimated 2000 matches similar to: "[PATCH 3/5] SIMD: accelerate decoding of some 24-bit FLAC"

[PATCH 4/5] SIMD: accelerate decoding of 16-bit FLAC

2017 Feb 18

This patch adds two new functions, FLAC__lpc_restore_signal_intrin_sse41() and FLAC__lpc_restore_signal_16_intrin_sse41(). The decoding speed of Subset-compatible 16-bit FLAC files is slightly increased on SSE4.1-compatible CPUs. [Attachment: 04_add_new_intrin_func.patch, application/octet-stream, 9851 bytes]
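For orientation, the operation these routines accelerate is LPC signal restoration: each output sample is the residual plus a shifted weighted sum of the previous samples. The following is a minimal scalar sketch of that recurrence, not the code from the patch. The function name restore_signal_scalar, the buffer layout (warm-up samples stored at the start of data), and the 32-bit accumulator are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference, not the libFLAC function.
 * data holds order + n samples: the first `order` entries are warm-up
 * samples that are already decoded; the remaining n are filled in here. */
void restore_signal_scalar(const int32_t *residual, size_t n,
                           const int32_t *coeff, size_t order,
                           int shift, int32_t *data)
{
    assert(order > 0 && shift >= 0 && shift < 32);
    for (size_t i = 0; i < n; i++) {
        /* 32-bit accumulator: assumes the products cannot overflow,
         * which is the situation a 16-bit-only fast path relies on. */
        int32_t sum = 0;
        for (size_t j = 0; j < order; j++) {
            /* coeff[0] weights the most recent sample */
            sum += coeff[j] * data[order + i - 1 - j];
        }
        /* Right-shifting a negative value is implementation-defined in C;
         * this sketch assumes the usual arithmetic shift. */
        data[order + i] = residual[i] + (sum >> shift);
    }
}
```

With coefficients {2, -1}, a shift of 0, and an all-zero residual, the predictor extrapolates a straight line, so warm-up samples 1, 2 continue as 3, 4, 5. A SIMD version would compute the inner sum for several taps at once.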

[PATCH 5/5] SIMD: remove outdated SSE2 code

2017 Feb 18

This patch removes FLAC__lpc_restore_signal_16_intrin_sse2(). It's faster than C code, but not faster than MMX-accelerated ASM functions. It's also slower than the new SSE4.1 functions that were added by the previous patch. So this function wasn't very useful before, and now it's even less useful. I don't see a reason to keep it.

[PATCH 4/4] lpc_intrin_sse41 routines

2014 Sep 20

This patch increases the speed of FLAC__lpc_restore_signal_wide_intrin_sse41 (decoding of 24-bit FLAC files on 32-bit platforms). [Attachment: lpc_sse4.zip, application/zip, 3310 bytes, http://lists.xiph.org/pipermail/flac-dev/attachments/20140920/a3d8efb4/attachment.zip]

"keep qlp coeff precision such that only 32-bit math is required"

2015 Apr 18

Erik de Castro Lopo wrote: > There should be some indication of why in the git history. http://git.xiph.org/?p=flac.git;a=commitdiff;h=27846708fe6271e5e3965a4bbad99baa1ca24c49 Now I remember a discussion about a bug in the -p switch: the old code subtracts lpc_order instead of FLAC__bitmath_ilog2(lpc_order), and this commit fixes this. It seems that the logic in process_subframe_() and in
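The difference between subtracting lpc_order and subtracting its base-2 logarithm can be made concrete with a small sketch. It assumes the cap has the form 32 - bps - ilog2(order), which is one reading of the message above and not a copy of the libFLAC source. The names ilog2_u32 and cap_qlp_precision are hypothetical; ilog2_u32 stands in for FLAC__bitmath_ilog2().

```c
#include <assert.h>
#include <stdint.h>

/* floor(log2(v)) for v > 0; hypothetical stand-in for FLAC__bitmath_ilog2() */
unsigned ilog2_u32(uint32_t v)
{
    assert(v > 0);
    unsigned r = 0;
    while (v >>= 1)
        r++;
    return r;
}

/* Cap the requested coefficient precision so that a sum of `order`
 * products of bps-bit samples and precision-bit coefficients is
 * assumed to stay within 32 bits. Assumed formula, for illustration. */
int cap_qlp_precision(int requested, int bps, uint32_t order)
{
    assert(bps > 0 && bps < 32);
    int limit = 32 - bps - (int)ilog2_u32(order);
    return requested < limit ? requested : limit;
}
```

For 16-bit samples and order 12, this cap is 32 - 16 - 3 = 13 bits. Subtracting the order itself would instead leave 32 - 16 - 12 = 4 bits, which shows how much precision the behavior described as a bug would give up.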

RFC: SIMD math-function library

2016 Jul 15

Is it possible to see the source code of the open-sourced SVML? The diff file does not include the library. I searched the Internet but could not find it. Regards, Naoki Shibata On 2016/07/15 13:55, Tian, Xinmin wrote: > Naoki, > > Intel is planning to open-source the SVML library (most of it, if not 100%); 6 functions of SVML are already open-sourced for GCC and LLVM. But, Intel SVML

New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16

2013 Aug 22

libFLAC has three SSE-accelerated functions, FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_N (N = 4, 8, 12). They require lpc_order less than N. The best compression preset (flac -8) uses lpc_order up to 12, which means that during encoding FLAC also uses the unaccelerated C function. I'm not very familiar with asm so I took FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_12, changed it and
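The "lpc_order less than N" requirement follows from the fact that an LPC analysis of order p needs autocorrelation values for lags 0 through p, that is, p + 1 values. A minimal scalar sketch of the computation is shown here. The name autocorrelation_scalar and the choice of float input with double output are assumptions for illustration, not the libFLAC signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference: autoc[l] = sum over i >= l of
 * data[i] * data[i - l], for l = 0 .. lag - 1. */
void autocorrelation_scalar(const float *data, size_t n, size_t lag,
                            double *autoc)
{
    assert(lag > 0 && lag <= n);
    for (size_t l = 0; l < lag; l++) {
        double sum = 0.0;
        for (size_t i = l; i < n; i++)
            sum += (double)data[i] * (double)data[i - l];
        autoc[l] = sum;
    }
}
```

An order-12 analysis calls this with lag = 13, which is why a routine limited to 12 lags cannot serve flac -8 and a lag_16 variant is useful.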

"keep qlp coeff precision such that only 32-bit math is required"

2015 Apr 18

Ok, I just did a comparison of 1.2.1 with 1.3.2, and the change you're suggesting was already there before. So, now the question becomes: why was the code changed in the first place? Was there a bug that was fixed by changing 17 to 16, or did someone just get overzealous in a code review and thought that 17 was a bad choice? Perhaps 32 bits isn't actually large enough to handle the

Some additions to CELT_RESET_STATE for 0.7.1

2010 Jan 27

Hi Jean-Marc, As the self-appointed keeper of CELT_RESET_STATE (since I may be the only one actually using it) I've been kind of lax lately and have only now tried to reconcile it with the state structures in the 0.7.1 drop. I think (and feel free to contradict me here) that the following lines should be added to CELT_RESET_STATE in celt_encoder_ctl: st->fold_decision = 1;

Patch cleaning up Opus x86 intrinsics configury

2015 Mar 04

Viswanath, My patch should be against the tip, but it's the very recent tip, including some changes this past Friday (27 Feb). I mentioned in the IRC room a problem I discovered in creating my patch, and then later improved the fix Tim had made for the problem. Where do you get conflicts merging it to tip? In terms of merging, you posted your patch before I posted mine, so probably I should be

ASM runtime detection and optimizations

2013 May 23

I wrote a proof of concept for runtime detection of CPU capabilities and selection of the optimized function. I followed the design that was discussed on IRC. I also noticed one small drawback: we must propagate the arch index through functions which don't have the codec state as an argument. However, if it looks good, I will continue to implement it. Best regards, -- Aurélien Zanelli
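The design described here can be sketched as a table of function pointers indexed by an arch value that is detected once at startup. This is an illustrative sketch only, not the code from the proof of concept. The enum values, the dot-product kernel, and the names dot_c and dot_dispatch are all hypothetical, and the table slots for the SIMD levels are filled with the C fallback as placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arch indices; a real build would derive one from CPUID. */
enum arch { ARCH_C = 0, ARCH_SSE2 = 1, ARCH_SSE41 = 2, ARCH_COUNT = 3 };

typedef int32_t (*dot_fn)(const int16_t *a, const int16_t *b, size_t n);

/* Portable C fallback kernel */
static int32_t dot_c(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (int32_t)a[i] * (int32_t)b[i];
    return sum;
}

/* One slot per arch level. The SIMD slots reuse dot_c as placeholders;
 * optimized variants would be substituted where available. */
static const dot_fn dot_table[ARCH_COUNT] = { dot_c, dot_c, dot_c };

/* The arch index is an explicit parameter, so callers without access
 * to the codec state must receive it from their own caller. */
int32_t dot_dispatch(int arch, const int16_t *a, const int16_t *b, size_t n)
{
    assert(arch >= 0 && arch < ARCH_COUNT);
    return dot_table[arch](a, b, n);
}
```

The explicit arch parameter on dot_dispatch illustrates the drawback mentioned in the message: every helper on the call path needs the index, even when it has no codec state to read it from.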

Performance tests of the current version (git-b1b6caf)

2014 May 13

Current sources (git-b1b6caf) were compiled with GCC 4.8.2 and GCC 4.9.0 with various -msseN options (the default is -msse2). Then I took two WAV files (one is 16-bit and the other is 24-bit) and compressed them using best compression mode. The results are in the table below. (please remember that the resulting value is an encoding time, not encoding speed) CPU: Intel Core i7 950 (up to SSE4.2)

x86_64 SSE2/SSE41 optim not used

2014 Mar 11

Hi Guys, In stream_decoder.c, when assigning the lpc restore function, only IA32 processors benefit from the SSE2 and SSE4.1 optimizations. Shouldn't that be the case for x86_64 processors as well? Thanks, -- Olivier TRISTAN uvi.net

Test failed!!

2015 Nov 26

Hi Jesus, Thanks for the report. As far as I can tell, what's happening is that when intrinsics are enabled, we compile all tests with -msse4.1, even when it's only run-time detected. In most cases, that doesn't cause any issue, but sometimes the compiler will take the C code and generate an SSEx instruction on its own. I think this is

Patch cleaning up Opus x86 intrinsics configury

2015 Mar 04

On Mar 3, 2015, at 11:08 PM, Viswanath Puttagunta <viswanath.puttagunta at linaro.org> wrote: On 3 March 2015 at 21:59, Jonathan Lennox <jonathan at vidyo.com> wrote: Viswanath, My patch should be against the tip, but it's the very recent tip, including some changes this past Friday (27 Feb). I

[LLVMdev] changing -mattr behavior with mmx and sse

2008 Nov 20

Hi, When setting the -mattr option on X86, I would like to treat MMX separately from the SSE levels. This would allow a client who sets the attributes directly to set the SSE level independently of MMX: e.g., with llc -march=x86 -mattr=sse41 one would get SSE4.1 with MMX disabled, while llc -march=x86 -mattr=mmx -mattr=sse42 would get MMX and SSE4.2. If anyone objects to this change, please let me

RFC: SIMD math-function library

2016 Jul 15

Hi all, Okay, the point is whether Intel will publish the source code for their SVML. If Intel makes SVML open-source, there would not be much advantage in incorporating SLEEF into LLVM, since it would also be fairly easy to port SVML to other architectures. If Intel does not open-source SVML, then there could be an advantage in using SLEEF for x86 by inlining the functions. Is it possible

[RFC PATCHv2] Intrinsics/RTCD related fixes. Mostly x86.

2015 Mar 12

From: Jonathan Lennox <jonathan at vidyo.com> * Makes --enable-intrinsics work with clang and other non-GCC compilers * Enables RTCD for the floating-point-mode SSE code in Celt. * Disables use of RTCD in cases where the compiler targets an instruction set by default. * Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in floating-point mode, not

[RFC PATCH v3] Intrinsics/RTCD related fixes. Mostly x86.

2015 Mar 13

Patch cleaning up Opus x86 intrinsics configury

2015 Mar 07

Hello Jonathan, Just FYI, I started reviewing your patch and will get back to you in a few days. After the review, I would like to rebase your patch (as necessary) myself, do some testing, and re-submit. Regards, Vish On 4 March 2015 at 09:00, Viswanath Puttagunta <viswanath.puttagunta at linaro.org> wrote: > > On 3 March 2015 at 22:17, Jonathan Lennox <jonathan at

LLVM and Xeon Skylake v5

2017 May 08

getProcessTriple just determines the operating system and architecture. It doesn't deal with specific instruction set features. The CPU should be controlled by MCPU on the EngineBuilder, I think. The CPU autodetection code lives in getHostCPUName in lib/Support/Host.cpp, but I don't think the JIT calls into it. I think it's expected the user would call it or pass a specific CPU string to the MCPU
