similar to: PATCH for lpc_intrin_sse41.c: faster shifts

2014 Jan 30
0
PATCH for lpc_intrin_sse41.c: faster shifts
lvqcl wrote:
> It turns out that int64 shift is quite slow...
>
> This patch changes the code from:
>     (FLAC__int32)(xmm.m128i_i64[0] >> lp_quantization)
> into:
>     _mm_cvtsi128_si32(_mm_srli_epi64(xmm, lp_quantization));
>
> Encoding of 24-bit .wav files with 32-bit FLAC became noticeably faster.
>
> The new code works only if quantization <= 32,
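The restriction in the last quoted line can be illustrated without any SSE code. `_mm_srli_epi64` is a logical (zero-filling) shift, while `>>` on a signed 64-bit value is normally an arithmetic (sign-filling) shift; the two agree in their low 32 bits only while the shift count is at most 32. The portable sketch below models both variants. The helper names are hypothetical and are not part of libFLAC, and the code assumes the compiler implements `>>` on negative values as an arithmetic shift (implementation-defined in C, but the usual behaviour).

```c
#include <stdint.h>

/* Models (FLAC__int32)(x >> n): signed 64-bit shift, then keep the low 32 bits.
 * Assumption: >> on a negative int64_t sign-extends (arithmetic shift). */
uint32_t low32_arith(int64_t x, unsigned n)
{
    return (uint32_t)(x >> n);
}

/* Models _mm_cvtsi128_si32(_mm_srli_epi64(x, n)): zero-filling 64-bit shift,
 * then keep the low 32 bits. */
uint32_t low32_logical(int64_t x, unsigned n)
{
    return (uint32_t)((uint64_t)x >> n);
}
```

For n from 0 to 32 both helpers return the same value for any x, because the low 32 bits of the result are taken entirely from bits of x. From n = 33 upward the top bits of the result come from the fill bits, so the two differ for negative x.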
2014 Jan 14
1
PATCH for lpc_asm.nasm
1) Two comments, ";ASSERT(lp_quantization <= 31)", were added to the new functions ..._wide_asm_ia32(), just to mention this constraint. (The maximum possible value of lp_quantization is 15, so it's not a problem.)
2) "mov cl, ..." was replaced with "mov ecx, ..." (again following Agner Fog's optimizing_assembly.pdf). In summary: a write to a partial register may result in false dependencies.
2004 Sep 10
3
const issue in FLAC__lpc_compute_residual_from_qlp_coefficients (libFLAC/lpc.c:233)
Hello, I just tried to compile libFLAC (using Borland C++ Builder 6 on Windows). The compiler yells at me on line 233 of libFLAC/lpc.c:
    *(residual++) = *(data++) - (sum >> lp_quantization);
    --> data is const and cannot be modified
Funny thing is, if data is declared as
    const FLAC__int32 *data
instead of
    const FLAC__int32 data[]
everything is OK. Is this a bug in my compiler, or
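On the language question raised here: in standard C, a parameter declared as `const T data[]` is adjusted to `const T *data`, so the pointer itself may be incremented and only the elements it points to are const. A minimal sketch follows, using `int32_t` as a stand-in for `FLAC__int32`. The function names and the simplified per-sample prediction array are hypothetical and are not taken from lpc.c.

```c
#include <stdint.h>

/* Array-style parameters: the compiler adjusts each one to a pointer,
 * so data++ modifies the local pointer, not the const elements. */
void residual_array_form(const int32_t data[], const int32_t pred[],
                         unsigned n, int32_t residual[])
{
    while (n--)
        *(residual++) = *(data++) - *(pred++);
}

/* Pointer-style parameters: the same function type after adjustment. */
void residual_pointer_form(const int32_t *data, const int32_t *pred,
                           unsigned n, int32_t *residual)
{
    while (n--)
        *(residual++) = *(data++) - *(pred++);
}
```

A conforming C compiler should accept both definitions and treat them identically.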
2004 Sep 10
2
const issue in FLAC__lpc_compute_residual_from_qlp_coefficients (libFLAC/lpc.c:233)
On Tue, Jan 13, 2004 at 02:04:48PM -0800, Josh Coalson wrote:
> --- Denis Chatelain <listes@octopodus.com> wrote:
> > Hello,
> >
> > I just tried to compile libFLAC (using Borland C++ Builder 6 on Windows).
> >
> > The compiler yells at me on line 233 of libFLAC/lpc.c
> >
> > *(residual++) = *(data++) - (sum >>
2014 Sep 20
2
[PATCH 4/4] lpc_intrin_sse41 routines
This patch speeds up FLAC__lpc_restore_signal_wide_intrin_sse41 (decoding of 24-bit FLAC files on 32-bit platforms).
Attachment: lpc_sse4.zip (application/zip, 3310 bytes)
URL: http://lists.xiph.org/pipermail/flac-dev/attachments/20140920/a3d8efb4/attachment.zip
2005 Feb 02
0
two small-ish optimizations (death by a thousand cuts)
This lpc_restore_order was partially inspired by Miroslav's affd, though my (not very great) ARM asm version resembled this as well. The other two reduce CPU array indexing overhead in loops a little. Additionally, a request for help: my not very optimized lpc_restore_signal is at the URL below; I couldn't get the ldm* instructions to work as advertised, even though I've talked
2004 Oct 01
1
[PATCH] fix compile errors with asm disabled
The #endifs are mismatched, and my builds were failing because lpc_restore_signal* weren't getting declared. I've also commented the #endifs to make them easier to match. Also, is there any reason the #ifdefs for FLAC__HAS_NASM and FLAC__CPU_IA32 are separate and nested the way they are, and not combined like this?
    #if defined(FLAC__CPU_IA32) && defined(FLAC__HAS_NASM)
I'm not
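To make the question concrete, here is a small sketch of the two preprocessor forms side by side, with each #endif commented as the patch describes. The two configuration macros are defined locally only for the demonstration (in a real build they would come from the build system), and the USE_ASM_* macros and function names are hypothetical.

```c
/* Defined here only for the demonstration. */
#define FLAC__CPU_IA32
#define FLAC__HAS_NASM

/* Nested form: two separate tests. */
#ifdef FLAC__CPU_IA32
#ifdef FLAC__HAS_NASM
#define USE_ASM_NESTED 1
#endif /* FLAC__HAS_NASM */
#endif /* FLAC__CPU_IA32 */

/* Combined form: one test with both conditions. */
#if defined(FLAC__CPU_IA32) && defined(FLAC__HAS_NASM)
#define USE_ASM_COMBINED 1
#endif /* FLAC__CPU_IA32 && FLAC__HAS_NASM */

/* Fall back to 0 when a form did not enable the asm path. */
#ifndef USE_ASM_NESTED
#define USE_ASM_NESTED 0
#endif
#ifndef USE_ASM_COMBINED
#define USE_ASM_COMBINED 0
#endif

int asm_enabled_nested(void)   { return USE_ASM_NESTED; }
int asm_enabled_combined(void) { return USE_ASM_COMBINED; }
```

The two forms select the same code when nothing else sits between the outer and inner tests. Nesting is still useful when some declarations depend only on the outer macro, which may be one reason to keep the tests separate.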
2004 Oct 06
3
flac-1.1.1 completely broken on linux/ppc and on macosx if built with the standard toolchain (not xcode)
Sadly, the latest optimization completely broke everything. The asm code isn't gas-compliant, the libFLAC linker script has a typo, and disabling the asm optimization and/or AltiVec doesn't produce a correct build anyway. Instant fixes for the asm stuff: run sed -i -e"s:;:\#:" on lpc_asm.s; to load an address, instead of addis+ori you could use lis and la; and PLEASE use the @l(register)
2007 Aug 31
2
1.2.0: Test suite failures on LP64 archs?
Running the basic (--disable-thorough-tests) test suite, I get these failures:
    round-trip test (rt-1-24-111.raw) encode... Segmentation fault (core dumped)
    ERROR
    FAIL: ./test_flac.sh
    fsd24-01 (--channels=1 --bps=24 -0 -l 16 --lax -m -e -p): encode...ERROR during encode of fsd24-01
    FAIL: ./test_streams.sh
on alpha and amd64. By contrast, i386 is fine. (All OpenBSD/4.2.) Could be a generic LP64
2004 Sep 10
1
altivec lpc_restore_signal
I've had this a long time but haven't submitted it yet. I've tried to mirror the ia32 setup, so there should be a new subdirectory src/libFLAC/ppc . The first two attachments go there. The third is a context diff for src/libFLAC/Makefile.am . I have some more modified files, which I figured I'd submit after the above are checked in and working for somebody other than me. If you
2004 Sep 10
3
Altivec, automake
I think I've gotten FLAC__lpc_restore_signal() about as good as I'm going to get it. Here's what I have:
- a new file, lpc_asm.s, which has the assembly routines
- changes to cpu.h, cpu.c, and stream_decoder.c to enable them
- changes to configure.in to support the new cpu stuff
- a preliminary Makefile.am
- maybe something else I'm forgetting
Now automake complains that configure.in
2004 Sep 10
2
Altivec, automake
Here's what I listed in that email. Merging doesn't appear to be necessary. If you have any build problems, let me know. Note that my detection code is Darwin-specific. It's a BSD call (sysctl()), so a change to the platform-detection macros should enable it to work on other BSDs. However, I don't know what that would be, and I couldn't determine any safe way to do the check
2005 Jan 29
4
A couple of points about flac 1.1.1 on ppc/linux/altivec
On Thu, 27 Jan 2005, John Steele Scott wrote:
> That looks fine to me as well. However, the best solution is something which
> Luca suggested a few months ago, which is to use the functions defined in
> altivec.h. These are C functions which map directly to Altivec machine
> instructions. I am willing to help out, but I don't find the current lpc_asm.s
> very easy to follow, and
2007 Sep 01
2
Re: 1.2.0: Test suite failures on LP64 archs?
Christian Weisgerber <naddy@mips.inka.de> wrote:
> #0 0x0000000040d18810 in FLAC__lpc_compute_residual_from_qlp_coefficients_wide
>    (data=0x49e4c014, data_len=110, qlp_coeff=0x7f7ffffece70, order=1,
>    lp_quantization=14, residual=0x4fced000) at lpc.c:745
> 745 residual[i] =
>    data[i] - (FLAC__int32)((qlp_coeff[0] *
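For readers unfamiliar with the routine named in the backtrace, the following is a rough sketch of a "wide" LPC residual computation: the prediction sum is accumulated in 64 bits, shifted right by lp_quantization, and subtracted from the current sample. It is not the libFLAC source and does not attempt to complete the truncated line above. The function name and the buffer layout (warm-up samples stored at the front of one array) are assumptions made for illustration, and `int32_t`/`int64_t` stand in for `FLAC__int32`/`FLAC__int64`.

```c
#include <stdint.h>

/* Hypothetical layout: samples[0..order-1] are warm-up history, and
 * residual[i] is produced for samples[order + i], for i in [0, n). */
void lpc_residual_wide_sketch(const int32_t *samples, unsigned n,
                              const int32_t *qlp_coeff, unsigned order,
                              int lp_quantization, int32_t *residual)
{
    for (unsigned i = 0; i < n; i++) {
        int64_t sum = 0;

        /* Weighted sum of the previous 'order' samples, kept in 64 bits
         * so that 24-bit audio with large coefficients cannot overflow. */
        for (unsigned j = 0; j < order; j++)
            sum += (int64_t)qlp_coeff[j] * samples[order + i - j - 1];

        /* Scale the prediction back down and subtract it from the sample. */
        residual[i] = samples[order + i] - (int32_t)(sum >> lp_quantization);
    }
}
```

With order = 1, lp_quantization = 14 and a coefficient of 1 << 14, the predictor is simply the previous sample, so each residual is the difference between neighbouring samples.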
2013 Oct 04
2
Again about encoding speed of different compiles
I downloaded the current version of the FLAC sources and compiled it with:
* GCC 4.8.1 (MSYS from http://xhmikosr.1f0.de/tools/)
* Intel C++ Composer XE 2013 update 5
* MSVS 2010 SP1
* MSVS 2012 update 3
(SSSE3 and SSE4.1 code was disabled for all compilers.)
A stereo 24-bit WAV file was encoded with the -8 preset. Encoding time, in seconds:
    GCC 32-bit:  209
    ICC 32-bit:  130
    VS10 32-bit: 116
    VS12 32-bit: 114
2005 Jan 01
2
libFLAC bitbuffer optimizations
Josh Coalson <xflac@yahoo.com> wrote:
> thanks for the patch.
No prob :)
> also, if you have miroslav's patch against a more updated version
> of bitbuffer.c that would be great. I have been meaning to get
> around to applying it for a long time.
This is Miroslav's patch, from the mailing list post I dug up in the archives:
--- orig/src/libFLAC/bitbuffer.c
+++
2005 Oct 25
2
Re: Reg. FLAC decoding
Sorry for the delay in getting back to you. I was working on something else and just now got FLAC to work. OK, FLAC files are playing now :) Cheers. There is a slight noise happening in the background, which I'm figuring out. I hope that it'll be solved soon. However, I wanted to know if there are any ARM-specific optimizations that can be done. The processor is a 166 MHz processor. Do
2005 Oct 25
0
Re: Reg. FLAC decoding
--- Joe Steeve <joesteeve@zodiactorp.com> wrote:
> Sorry for the delay in getting back to you. I was working on something
> else and just now got FLAC to work.
>
> OK, FLAC files are playing now :) Cheers. There is a slight noise
> happening in the background, which I'm figuring out. I hope that it'll
> be solved soon. However, I wanted to know if
2004 Dec 28
2
libFLAC bitbuffer optimizations
Pulled from my Arch archive, the following patch seems to have made quite a difference in getting my ARM7TDMI chip to play FLAC (compression levels 0-2) on my iPod. I don't have benchmarks with hard numbers, but playing with skips vs. playing without skips is a fairly noticeable difference. memcpy and memset in uClibc are optimized in asm for the ARM7TDMI. Other hardware/libc
2014 Jun 28
0
[PATCH 14] preprocessor macros in lpc_intrin_sseN.c
Currently both lpc_intrin_sse2.c and lpc_intrin_sse41.c define the macros RESIDUAL_RESULT and DATA_RESULT. This patch renames them so that the names no longer clash. Reason: FLAC build systems don't apply specific options (such as -msse4.1) to specific files, so it makes little sense to have separate *_intrin_sseA.c and *_intrin_sseB.c files. IMHO it's not unreasonable to merge