similar to: [PATCH 3/5] SIMD: accelerate decoding of some 24-bit FLAC

Displaying 20 results from an estimated 2000 matches similar to: "[PATCH 3/5] SIMD: accelerate decoding of some 24-bit FLAC"

2017 Feb 18
2
[PATCH 4/5] SIMD: accelerate decoding of 16-bit FLAC
This patch adds 2 new functions, FLAC__lpc_restore_signal_intrin_sse41() and FLAC__lpc_restore_signal_16_intrin_sse41(). The decoding speed of Subset-compatible 16-bit FLAC files is slightly increased on SSE4.1-compatible CPUs.
2017 Feb 18
4
[PATCH 5/5] SIMD: remove outdated SSE2 code
This patch removes FLAC__lpc_restore_signal_16_intrin_sse2(). It's faster than C code, but not faster than MMX-accelerated ASM functions. It's also slower than the new SSE4.1 functions that were added by the previous patch. So this function wasn't very useful before, and now it's even less useful. I don't see a reason to keep it.
2014 Sep 20
2
[PATCH 4/4] lpc_intrin_sse41 routines
This patch increases the speed of FLAC__lpc_restore_signal_wide_intrin_sse41 (decoding of 24-bit FLAC files on 32-bit platforms). -------------- next part -------------- A non-text attachment was scrubbed... Name: lpc_sse4.zip Type: application/zip Size: 3310 bytes Desc: not available Url : http://lists.xiph.org/pipermail/flac-dev/attachments/20140920/a3d8efb4/attachment.zip
2015 Apr 18
2
"keep qlp coeff precision such that only 32-bit math is required"
Erik de Castro Lopo wrote: > There should be some indication of why in the git history. http://git.xiph.org/?p=flac.git;a=commitdiff;h=27846708fe6271e5e3965a4bbad99baa1ca24c49 Now I remember a discussion about a bug in the -p switch: the old code subtracts lpc_order instead of FLAC__bitmath_ilog2(lpc_order), and this commit fixes this. It seems that the logic in process_subframe_() and in
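The difference between the two expressions named in this message can be sketched in C as follows. This is an illustration only, not the libFLAC source: `ilog2_u32`, `max_precision_old`, and `max_precision_fixed` are hypothetical names, and the 32-bit budget and the `subframe_bps` parameter are assumptions taken from the thread title ("only 32-bit math is required").

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for FLAC__bitmath_ilog2(): floor(log2(v)) for v > 0. */
static unsigned ilog2_u32(uint32_t v)
{
    unsigned l = 0;
    assert(v > 0);
    while (v >>= 1)
        l++;
    return l;
}

/* Assumed "old" behaviour: the whole LPC order is subtracted from the
 * 32-bit budget, which leaves very little coefficient precision at
 * high orders. */
static int max_precision_old(unsigned subframe_bps, unsigned lpc_order)
{
    return 32 - (int)subframe_bps - (int)lpc_order;
}

/* Assumed "fixed" behaviour: only ilog2(lpc_order) bits are reserved,
 * on the reasoning that a sum of lpc_order products grows by roughly
 * log2(lpc_order) bits, not by lpc_order bits. */
static int max_precision_fixed(unsigned subframe_bps, unsigned lpc_order)
{
    return 32 - (int)subframe_bps - (int)ilog2_u32(lpc_order);
}
```

Under these assumptions, a 16-bit subframe with order 12 would be limited to 4 bits of coefficient precision by the first expression and to 13 bits by the second.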
2016 Jul 15
3
RFC: SIMD math-function library
Is it possible to see the source code of the open-sourced SVML? The diff file does not include the library. I searched the Internet but could not find it. Regards, Naoki Shibata On 2016/07/15 13:55, Tian, Xinmin wrote: > Naoki, > > Intel is planning to open-source the SVML library (most of it, if not 100%); 6 functions of SVML are open-sourced for GCC and LLVM already. But, Intel SVML
2013 Aug 22
2
New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16
libFLAC has three SSE-accelerated functions, FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_N (N = 4, 8, 12). They require lpc_order to be less than N. The best compression preset (flac -8) uses lpc_order up to 12, which means that during encoding FLAC also uses the unaccelerated C function. I'm not very familiar with asm, so I took FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_12, changed it and
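The constraint described in this message (each lag_N routine handles only lpc_order less than N, otherwise plain C is used) can be sketched as below. This is a minimal illustration, not libFLAC code: `autocorrelation_ref` and `pick_sse_lag` are hypothetical names, `float` stands in for the library's sample type, and the value 16 is included only because the thread title proposes a lag_16 routine.

```c
#include <assert.h>
#include <stddef.h>

/* Plain C autocorrelation: autoc[l] = sum over i >= l of data[i] * data[i - l],
 * for l = 0 .. lag - 1. This plays the role of the unaccelerated fallback. */
static void autocorrelation_ref(const float *data, size_t len,
                                unsigned lag, float *autoc)
{
    assert(lag > 0 && lag <= len);
    for (unsigned l = 0; l < lag; l++) {
        float sum = 0.0f;
        for (size_t i = l; i < len; i++)
            sum += data[i] * data[i - l];
        autoc[l] = sum;
    }
}

/* Returns the smallest N in {4, 8, 12, 16} with lpc_order < N,
 * or 0 when no accelerated variant applies and plain C must be used. */
static unsigned pick_sse_lag(unsigned lpc_order)
{
    static const unsigned supported[] = { 4, 8, 12, 16 };
    for (size_t k = 0; k < sizeof supported / sizeof supported[0]; k++)
        if (lpc_order < supported[k])
            return supported[k];
    return 0;
}
```

With only N = 4, 8, 12 available, an order of 12 would match no accelerated variant; in this sketch the added 16 entry is what covers it.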
2015 Apr 18
2
"keep qlp coeff precision such that only 32-bit math is required"
Ok, I just did a comparison of 1.2.1 with 1.3.2, and the change you're suggesting was already there before. So, now the question becomes: why was the code changed in the first place? Was there a bug that was fixed by changing 17 to 16, or did someone just get overzealous in a code review and thought that 17 was a bad choice? Perhaps 32 bits isn't actually large enough to handle the
2010 Jan 27
1
Some additions to CELT_RESET_STATE for 0.7.1
Hi Jean-Marc, As the self-appointed keeper of CELT_RESET_STATE (since I may be the only one actually using it) I've been kind of lax lately and have only now tried to reconcile it with the state structures in the 0.7.1 drop. I think (and feel free to contradict me here) that the following lines should be added to CELT_RESET_STATE in celt_encoder_ctl: st->fold_decision = 1;
2015 Mar 04
2
Patch cleaning up Opus x86 intrinsics configury
Viswanath, My patch should be against the tip, but it's the very recent tip, including some changes this past Friday (27 Feb). I mentioned in the IRC room a problem I discovered in creating my patch, and then later improved the fix Tim had made for the problem. Where do you get conflicts merging it to tip? In terms of merging, you posted your patch before I posted mine, so probably I should be
2013 May 23
2
ASM runtime detection and optimizations
I wrote a proof of concept for runtime detection of CPU capabilities and selection of optimized functions. I followed the design that had been discussed on IRC. I also noticed a small drawback: we must propagate the arch index through functions which don't have the codec state as an argument. However, if it looks good, I will continue to implement it. Best regards, -- Aurélien Zanelli
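The design described in this message, an arch index chosen at run time and used to select an optimized function, might look roughly like the sketch below. All names here (`dot_fn`, `DOT_IMPL`, `select_arch`, `dot`) are hypothetical and not taken from the patch; real CPU detection is replaced by a boolean parameter, and the "SIMD" variant is an unrolled C stub standing in for an optimized routine.

```c
#include <assert.h>

typedef int (*dot_fn)(const short *a, const short *b, int n);

/* Generic C implementation. */
static int dot_c(const short *a, const short *b, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Stand-in for an optimized variant: unrolled by two, same result. */
static int dot_simd_stub(const short *a, const short *b, int n)
{
    int sum = 0;
    int i = 0;
    for (; i + 1 < n; i += 2)
        sum += a[i] * b[i] + a[i + 1] * b[i + 1];
    if (i < n)
        sum += a[i] * b[i];
    return sum;
}

enum { ARCH_C = 0, ARCH_SIMD = 1, ARCH_COUNT };

/* One table per accelerated function, indexed by the arch index. */
static const dot_fn DOT_IMPL[ARCH_COUNT] = { dot_c, dot_simd_stub };

/* Run once at init; the result would be stored in the codec state. */
static int select_arch(int cpu_has_simd)
{
    return cpu_has_simd ? ARCH_SIMD : ARCH_C;
}

/* The arch index is an explicit parameter here because this function
 * has no codec state argument to read it from. */
static int dot(const short *a, const short *b, int n, int arch)
{
    assert(arch >= 0 && arch < ARCH_COUNT);
    return DOT_IMPL[arch](a, b, n);
}
```

The extra `arch` parameter on `dot` illustrates the drawback mentioned in the message: every helper without access to the codec state needs the index passed down to it.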
2014 May 13
1
Performance tests of the current version (git-b1b6caf)
Current sources (git-b1b6caf) were compiled with GCC 4.8.2 and GCC 4.9.0 with various -msseN options (the default is -msse2). Then I took two WAV files (one is 16-bit and the other is 24-bit) and compressed them using best compression mode. The results are in the table below. (please remember that the resulting value is an encoding time, not encoding speed) CPU: Intel Core i7 950 (up to SSE4.2)
2014 Mar 11
2
x86_64 SSE2/SSE41 optim not used
Hi Guys, In stream_decoder.c, when assigning the LPC restore function, only IA32 processors benefit from the SSE2 and SSE4.1 optimizations. Shouldn't this be the case for x86_64 processors as well? Thanks, -- Olivier TRISTAN uvi.net
2015 Nov 26
2
Test failed!!
Hi Jesus, Thanks for the report. As far as I can tell, what's happening is that when intrinsics are enabled, we compile all tests with -msse4.1, even when it's only run-time detected. In most cases, that doesn't cause any issue, but sometimes the compiler will take the C code and generate an SSEx instruction on its own. I think this is
2015 Mar 04
2
Patch cleaning up Opus x86 intrinsics configury
On Mar 3, 2015, at 11:08 PM, Viswanath Puttagunta <viswanath.puttagunta at linaro.org> wrote: On 3 March 2015 at 21:59, Jonathan Lennox <jonathan at vidyo.com> wrote: Viswanath, My patch should be against the tip, but it's the very recent tip, including some changes this past Friday (27 Feb). I
2008 Nov 20
4
[LLVMdev] changing -mattr behavior with mmx and sse
Hi, When setting the -mattr option on X86, I would like to treat MMX separately from SSE levels. This would allow a client who sets the attributes directly to set the SSE level independently of MMX, e.g., with llc -march=x86 -mattr=sse41 one would get sse4.1 with mmx disabled, while llc -march=x86 -mattr=mmx -mattr=sse42 will get mmx and sse42. If anyone objects to this change, please let me
2016 Jul 15
3
RFC: SIMD math-function library
Hi all, Okay, the point is whether Intel will publish the source code for their SVML. If Intel makes SVML open-source, there would not be much advantage in incorporating SLEEF into LLVM, since it would also be fairly easy to port SVML to other architectures. If Intel does not open-source SVML, then there could be an advantage in using SLEEF for x86 by inlining the functions. Is it possible
2015 Mar 12
1
[RFC PATCHv2] Intrinsics/RTCD related fixes. Mostly x86.
From: Jonathan Lennox <jonathan at vidyo.com> * Makes --enable-intrinsics work with clang and other non-GCC compilers * Enables RTCD for the floating-point-mode SSE code in Celt. * Disables use of RTCD in cases where the compiler targets an instruction set by default. * Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in floating-point mode, not
2015 Mar 13
1
[RFC PATCH v3] Intrinsics/RTCD related fixes. Mostly x86.
From: Jonathan Lennox <jonathan at vidyo.com> * Makes --enable-intrinsics work with clang and other non-GCC compilers * Enables RTCD for the floating-point-mode SSE code in Celt. * Disables use of RTCD in cases where the compiler targets an instruction set by default. * Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in floating-point mode, not
2015 Mar 07
1
Patch cleaning up Opus x86 intrinsics configury
Hello Jonathan, Just FYI, I have started reviewing your patch and will get back to you in a few days. After the review, I would like to rebase your patch (as necessary) myself, do some testing, and re-submit. Regards, Vish On 4 March 2015 at 09:00, Viswanath Puttagunta <viswanath.puttagunta at linaro.org> wrote: > > On 3 March 2015 at 22:17, Jonathan Lennox <jonathan at
2017 May 08
2
LLVM and Xeon Skylake v5
getProcessTriple just determines the operating system and architecture. It doesn't deal with specific instruction set features. The CPU should be controlled by MCPU on the EngineBuilder, I think. The CPU autodetection code lives in getHostCPUName in lib/Support/Host.cpp, but I don't think the JIT calls into it. I think it's expected the user would call it or pass a specific CPU string to the MCPU