It's not possible to use ia32/*.nasm code in 64-bit compiles. There's still no 64-bit asm code in FLAC. I'm not familiar with asm either, so I wrote SSE-accelerated code using intrinsics.

This code uses two new preprocessor macros: FLAC__CPU_X86_64 (analogous to FLAC__CPU_IA32) and FLAC__HAS_X86INTRIN (analogous to FLAC__HAS_NASM).

The patch for cpu.c/cpu.h adds CPU feature detection (sse3, ssse3) for the x86-64 architecture.

Another patch adds SSE-accelerated functions:

FLAC__lpc_compute_autocorrelation_intrin_sse_lag_4()
FLAC__lpc_compute_autocorrelation_intrin_sse_lag_8()
FLAC__lpc_compute_autocorrelation_intrin_sse_lag_12()
FLAC__lpc_compute_autocorrelation_intrin_sse_lag_16()
FLAC__lpc_compute_residual_from_qlp_coefficients_16_intrin_sse2()

Note that the new code works only if both the FLAC__CPU_X86_64 and FLAC__HAS_X86INTRIN macros are defined somewhere in the config files. Appropriate changes to the *.vcproj, makefile, and configure.ac files are necessary. Unfortunately, MSVS 2005 Express Edition doesn't support building 64-bit programs.

Attachments:
cpu.patch (application/octet-stream, 4483 bytes): http://lists.xiph.org/pipermail/flac-dev/attachments/20130908/bcf0ee91/attachment-0002.obj
intrin.patch (application/octet-stream, 24537 bytes): http://lists.xiph.org/pipermail/flac-dev/attachments/20130908/bcf0ee91/attachment-0003.obj
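For readers who have not seen this style of code, here is a minimal sketch of how a lag-4 autocorrelation can be written with SSE intrinsics. It is not the code from intrin.patch. The names example_autocorr_scalar, example_autocorr_lag4 and EXAMPLE_USE_SSE are hypothetical, and the guard relies on the compiler's own __SSE__ / _M_X64 macros instead of FLAC__HAS_X86INTRIN so that the snippet stands alone.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: a compiler that defines __SSE__ (GCC/Clang with SSE enabled)
 * or targets x64 under MSVC (_M_X64) ships <xmmintrin.h>. */
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define EXAMPLE_USE_SSE 1
#else
#define EXAMPLE_USE_SSE 0
#endif

/* Plain C reference: autoc[l] = sum over i >= l of data[i] * data[i - l]. */
static void example_autocorr_scalar(const float *data, unsigned n,
                                    unsigned lag, float *autoc)
{
    assert(data != NULL && autoc != NULL);
    for (unsigned l = 0; l < lag; l++) {
        float sum = 0.0f;
        for (unsigned i = l; i < n; i++)
            sum += data[i] * data[i - l];
        autoc[l] = sum;
    }
}

/* Lag-4 variant: one SSE register holds the four running sums. */
static void example_autocorr_lag4(const float *data, unsigned n, float autoc[4])
{
    assert(data != NULL && autoc != NULL);
#if EXAMPLE_USE_SSE
    __m128 sums = _mm_setzero_ps();
    /* Lanes 0..3 hold data[i], data[i-1], data[i-2], data[i-3];
     * samples before the start of the buffer are treated as zero. */
    __m128 hist = _mm_setzero_ps();
    for (unsigned i = 0; i < n; i++) {
        __m128 d = _mm_set1_ps(data[i]);
        /* Move every lane up by one position, then put data[i] in lane 0. */
        hist = _mm_shuffle_ps(hist, hist, _MM_SHUFFLE(2, 1, 0, 3));
        hist = _mm_move_ss(hist, d);
        /* Lane l accumulates data[i] * data[i - l]. */
        sums = _mm_add_ps(sums, _mm_mul_ps(d, hist));
    }
    _mm_storeu_ps(autoc, sums);
#else
    /* No SSE available at compile time: fall back to the reference loop. */
    example_autocorr_scalar(data, n, 4, autoc);
#endif
}
```

Calling both functions on the same buffer and comparing the four results is a quick way to check the vector path against the reference. For lags of 8, 12 or 16 the same idea would need two, three or four registers of running sums.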
I'm trying to clean up my code and have a few questions:

1. GCC related: MinGW contains the cpuid.h header file, which defines the __get_cpuid() function. Is it standard? Or is it better to use inline asm to get cpuid info?

2. Is it a good idea to do manual loop unrolling? Unrolling FLAC__lpc_compute_autocorrelation_intrin_sse_lag_NN gives a +2% speed increase for flac -8 and +5% for flac -5, but that's true only for MSVS compiles. With the Intel C compiler the difference is very small: 0.1% for 'flac -8' and 0.5% for 'flac -5'.

3. How do I properly compile functions that use different SSE versions? It seems that the proper way is to create several translation units and put all functions that use SSE1 in one .c file, all functions that use SSE2 in another .c file, and so on. Is that correct?
Erik de Castro Lopo
2013-Sep-14 23:33 UTC
[flac-dev] PATCH: x86-64 support and SSE intrinsics code
lvqcl wrote:

> It's not possible to use ia32/*.nasm code in 64-bit compiles.
> There's still no 64-bit asm code in FLAC. I'm not familiar with asm either,
> so I wrote SSE-accelerated code using intrinsics.
>
> This code uses two new preprocessor macros:
> FLAC__CPU_X86_64 (analogous to FLAC__CPU_IA32)

Ok, I have defined FLAC__CPU_X86_64 in configure.ac.

> and FLAC__HAS_X86INTRIN (analogous to FLAC__HAS_NASM)

When should FLAC__HAS_X86INTRIN be defined? What header file should I be checking for?

Erik
-- 
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/
Erik de Castro Lopo
2013-Sep-14 23:34 UTC
[flac-dev] x86-64 support and SSE intrinsics code
lvqcl wrote:

> I'm trying to clean up my code and have a few questions:
>
> 1. GCC related: MinGW contains the cpuid.h header file, which defines the
> __get_cpuid() function. Is it standard? Or is it better to use inline asm
> to get cpuid info?

cpuid.h should be good for all GCC related compilers.

> 2. Is it a good idea to do manual loop unrolling?
> Unrolling FLAC__lpc_compute_autocorrelation_intrin_sse_lag_NN gives a +2% speed
> increase for flac -8 and +5% for flac -5, but that's true only for MSVS compiles.
> With the Intel C compiler the difference is very small: 0.1% for 'flac -8'
> and 0.5% for 'flac -5'.

Would be interested to see what it does for GCC.

> 3. How do I properly compile functions that use different SSE versions? It seems
> that the proper way is to create several translation units and put all functions
> that use SSE1 in one .c file, all functions that use SSE2 in another .c file,
> and so on. Is that correct?

Yes, that makes sense.

Erik
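As an illustration of the cpuid.h route discussed above, here is a minimal sketch that queries CPUID leaf 1 through __get_cpuid() and extracts the SSE-family bits. It is not the code from cpu.patch. The struct example_cpu_features, the function example_query_cpu and the macro EXAMPLE_HAVE_CPUID_H are hypothetical names, and the bit positions are written out as literals instead of relying on the header's own bit_* macros. On compilers or targets without cpuid.h the function simply reports nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: <cpuid.h> is only available with GCC-compatible compilers
 * targeting x86 or x86-64. */
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define EXAMPLE_HAVE_CPUID_H 1
#endif

/* Hypothetical feature record; each field is 0 or 1. */
typedef struct {
    int sse, sse2, sse3, ssse3, sse41;
} example_cpu_features;

/* Fills *f and returns 1 if CPUID leaf 1 could be read, 0 otherwise. */
static int example_query_cpu(example_cpu_features *f)
{
    assert(f != NULL);
    memset(f, 0, sizeof *f);
#ifdef EXAMPLE_HAVE_CPUID_H
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    /* __get_cpuid() returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    f->sse   = (edx >> 25) & 1;  /* EDX bit 25: SSE    */
    f->sse2  = (edx >> 26) & 1;  /* EDX bit 26: SSE2   */
    f->sse3  = (ecx >>  0) & 1;  /* ECX bit 0:  SSE3   */
    f->ssse3 = (ecx >>  9) & 1;  /* ECX bit 9:  SSSE3  */
    f->sse41 = (ecx >> 19) & 1;  /* ECX bit 19: SSE4.1 */
    return 1;
#else
    return 0;
#endif
}
```

A caller would typically run this once at startup and pick the scalar or the intrinsics version of each routine based on the result. MSVC has no cpuid.h, so a build for that compiler would need a different query mechanism.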
Erik de Castro Lopo
2013-Sep-14 23:57 UTC
[flac-dev] PATCH: x86-64 support and SSE intrinsics code
Erik de Castro Lopo wrote:

> When should FLAC__HAS_X86INTRIN be defined? What header file should I be
> checking for?

Ah, should be checking for <x86intrin.h>. The rest seems to be coming together. Testing this now.

Erik
Erik de Castro Lopo
2013-Sep-15 10:21 UTC
[flac-dev] PATCH: x86-64 support and SSE intrinsics code
lvqcl wrote:

> It's not possible to use ia32/*.nasm code in 64-bit compiles.
> There's still no 64-bit asm code in FLAC. I'm not familiar with asm either,
> so I wrote SSE-accelerated code using intrinsics.

Thanks for your work on this.

I've applied these patches, updated the configure script to detect the required features, and then tweaked things slightly.

The biggest of these tweaks was to disable the intrinsics version for FLAC__CPU_IA32 because I couldn't get it to compile on i386-linux (and we have the nasm versions). I'm still open to re-enabling it if someone can get it to work.

Cheers,
Erik
Erik de Castro Lopo
2013-Sep-15 10:35 UTC
[flac-dev] PATCH: x86-64 support and SSE intrinsics code
Erik de Castro Lopo wrote:

> lvqcl wrote:
>
> > It's not possible to use ia32/*.nasm code in 64-bit compiles.
> > There's still no 64-bit asm code in FLAC. I'm not familiar with asm either,
> > so I wrote SSE-accelerated code using intrinsics.
>
> Thanks for your work on this.
>
> I've applied these patches, updated the configure script to detect
> the required features, and then tweaked things slightly.
>
> The biggest of these tweaks was to disable the intrinsics version
> for FLAC__CPU_IA32 because I couldn't get it to compile on
> i386-linux (and we have the nasm versions). I'm still open to re-enabling
> it if someone can get it to work.

BTW, tested this on:

x86-linux
x86-64-linux
x86-mingw (cross-compiled from linux)
x86_64-mingw (cross-compiled from linux)

Erik
Erik de Castro Lopo <mle+la at mega-nerd.com> wrote:

> The biggest of these tweaks was to disable the intrinsics version
> for FLAC__CPU_IA32 because I couldn't get it to compile on
> i386-linux (and we have the nasm versions). I'm still open to re-enabling
> it if someone can get it to work.

I know you're a skilled programmer, but... maybe you forgot to add the -msse compiler option?

-msse for SSE code, -msse2 for SSE2 code, -msse4.1 for SSE4.1 code
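To make the effect of these options concrete: GCC and Clang define __SSE__, __SSE2__ and __SSE4_1__ only when the matching instruction set is enabled for the translation unit, which on a plain i386 target means passing -msse, -msse2 or -msse4.1. The sketch below guards an SSE2 code path on __SSE2__ so that the file compiles either way. The helper example_add4_i32 and the macro EXAMPLE_SSE2_PATH are hypothetical, and this is an illustration, not a diagnosis of the actual i386-linux build failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC and Clang define __SSE2__ only when SSE2 code generation is enabled,
 * for example through -msse2 (it is on by default for x86-64 targets).
 * Assumption: MSVC does not define this macro, so it takes the fallback. */
#if defined(__SSE2__)
#include <emmintrin.h>
#define EXAMPLE_SSE2_PATH 1
#else
#define EXAMPLE_SSE2_PATH 0
#endif

/* Adds four 32-bit integers element-wise: out[i] = a[i] + b[i]. */
static void example_add4_i32(const int32_t *a, const int32_t *b, int32_t *out)
{
    assert(a != NULL && b != NULL && out != NULL);
#if EXAMPLE_SSE2_PATH
    /* Unaligned loads and stores, so the caller's arrays need no alignment. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
#else
    /* Built without SSE2 enabled: plain C fallback. */
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
#endif
}
```

Keeping the SSE1, SSE2 and SSE4.1 routines in separate .c files, as suggested earlier in the thread, would let each file be compiled with only the option it needs.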