similar to: use xmm intrinsics for lrintf() with mingw-w64


2020 Jun 14
0
Prefer SSE and ASM implementation of float2int before lrintf for MSVC patch
This commit https://github.com/xiph/opus/commit/94b68f341cadd5433a10d346c1c248a641d8be57 enabled HAVE_LRINTF in CMake builds. Because later versions of Visual Studio provide lrintf, it became the default, as it takes precedence over SSE in MSVC. Using lrintf is a lot slower, which can easily be seen in the tests. From test results on Windows X64 (similar results on X86): LRINTF 4/4 Test #4:
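For context, a minimal sketch of the kind of SSE float-to-int conversion this thread prefers over lrintf. The function name float2int_sse is hypothetical and not taken from the patch. It assumes an x86 or x86-64 target with SSE available and the default MXCSR rounding mode (round to nearest, ties to even).

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_set_ss, _mm_cvtss_si32 */

/* Hypothetical helper: convert a float to int with a single SSE
 * instruction (cvtss2si). The result is rounded according to the
 * current MXCSR rounding mode, which is round-to-nearest-even
 * unless the program has changed it. */
static inline int float2int_sse(float x)
{
    return _mm_cvtss_si32(_mm_set_ss(x));
}
```

Under the default rounding mode this should give the same results as lrintf for values that fit in an int, without a library call.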
2010 May 07
1
[LLVMdev] Missuse of xmm register on X86-64
All, I've been working on a new scheduler and have somehow affected register selection. My problem is that an xmm register is being used as an index expression. Specifically, addss (%xmm1,%rax,4), %xmm0 I like the idea of a floating-point index, but, like the assembler, I don't know what that means. Any suggestions on where I should look for a solution to my problem?
2010 Jun 07
1
[LLVMdev] XMM in X86 Backend
Hi all, I am observing an excessive use of xmm registers in the output assembly produced by x86 backend. Basically, for a code like this double test(double a, double b) { double c; c = 1.0 + sin (a + b*b); return c; } llc produced something like.... movsd 16(%ebp), %xmm0 mulsd %xmm0, %xmm0 addsd 8(%ebp), %xmm0 movsd %xmm0, (%esp) ....... fstpl
2013 Sep 20
0
[LLVMdev] Passing a 256 bit integer vector with XMM registers
I am implementing a new calling convention for X86 which requires to pass a 256 bit integer vector with two XMM registers rather than one YMM register. For example define <8 x i32> @add(<8 x i32> %a, <8 x i32> %b) { %add = add <8 x i32> %a, %b ret <8 x i32> %add } With march=X86-64 and mcpu=corei7-avx, llc with the default calling convention generates the
2019 Aug 09
0
[RFC PATCH v6 79/92] kvm: x86: emulate movsd xmm, m64
From: Mihai Donțu <mdontu at bitdefender.com> This is needed in order to be able to support guest code that uses movsd to write into pages that are marked for write tracking. Signed-off-by: Mihai Donțu <mdontu at bitdefender.com> Signed-off-by: Adalbert Lazăr <alazar at bitdefender.com> --- arch/x86/kvm/emulate.c | 32 +++++++++++++++++++++++++++----- 1 file changed, 27
2014 Aug 13
1
[PATCH] simpler xmm -> int64 code
This patch simplifies XMM -> int64 conversion in fixed_intrin_sse2.c and fixed_intrin_ssse3.c -------------- next part -------------- A non-text attachment was scrubbed... Name: fixed_sse.zip Type: application/zip Size: 778 bytes Desc: not available Url : http://lists.xiph.org/pipermail/flac-dev/attachments/20140813/49f18196/attachment.zip
2012 Sep 25
0
[libopusfile PATCH] build: implement autotools build system for libopusfile. (v3)
This time it's complete with assertions on make debug, proper ./configure switches for the optional features, visibility and warning flags, and summary at the end of the configuration. Signed-off-by: Diego Elio Pettenò <flameeyes at flameeyes.eu> --- .gitignore | 29 +++++ Makefile.am | 24 +++++ autogen.sh | 3 + configure.ac | 66 ++++++++++++
2012 Sep 25
3
[libopusfile PATCH] build: implement autotools build system for libopusfile. (v2)
This time it's complete with assertions on make debug, proper ./configure switches for the optional features, visibility and warning flags, and summary at the end of the configuration. Signed-off-by: Diego Elio Pettenò <flameeyes at flameeyes.eu> --- .gitignore | 29 +++++ Makefile.am | 24 +++++ configure.ac | 66 ++++++++++++ m4/attributes.m4 | 321
2019 Jun 04
2
variadic functions on X86_64 should (conditionally) save XMM regs even if -no-implicit-float
Thanks for reviving this topic! Interestingly we have essentially the same fix you mention below ( https://reviews.llvm.org/D62639) as a local change in our Wind River version of LLVM. The reason we didn't try to push it upstream (and in fact have considered removing it) is due to an unfortunate side-effect which is either "expected" or a "bug" depending on your
2017 Nov 28
2
variadic functions on X86_64 should (conditionally) save XMM regs even if -no-implicit-float
Specifying -no-implicit-float prevents LLVM from using non-GPR registers for purely integer operations. This is useful for operating systems (such as Wind River's VxWorks) that support tasks that do not save all registers on context switch. This presents an interesting problem for variadic functions that may optionally take non-integer arguments (e.g. printf style functions). Should non-GPR
2010 Sep 29
0
[LLVMdev] spilling & xmm register usage
On Sep 29, 2010, at 8:35 AM PDT, Ralf Karrenberg wrote: > Hello everybody, > > I have stumbled upon a test case (the attached module is a slightly > reduced version) that shows extremely reduced performance on linux > compared to windows when executed using LLVM's JIT. > > We narrowed the problem down to the actual code being generated, the > source IR on both systems
2012 Sep 29
2
[libopusfile PATCH] build: implement autotools build system for libopusfile. (v4)
Includes - A make debug target that disables optimizations and enables assertions, - Proper ./configure switches for the optional features, - A configuration summary, - libtool versioning information, - Visibility and warning flags, - API documentation, and - Support for out-of-tree builds. Signed-off-by: Diego Elio Pettenò <flameeyes at flameeyes.eu> --- .gitignore | 29 +++++
2010 Sep 29
3
[LLVMdev] spilling & xmm register usage
Hello everybody, I have stumbled upon a test case (the attached module is a slightly reduced version) that shows extremely reduced performance on linux compared to windows when executed using LLVM's JIT. We narrowed the problem down to the actual code being generated, the source IR on both systems is the same. Try compiling the attached module: llc -O3 -filetype=asm -o BAD.s BAD.ll Under
2014 Sep 04
0
Opus decoding performance on ARM devices
Hi Dan, I suggest you try the code in git master, which has further ARM optimizations compared to 1.1. Cheers, Jean-Marc On 04/09/14 08:00 AM, Dan Nilsson wrote: > Hi everyone, > > I have lately been evaluating the performance of various audio decoders, > particularly for ARM devices (Cortex A8 / A9). The context is audio > playback in a game engine, and thus decoding
2010 Jun 07
0
No subject
eds to drag in the source and header files from the libcelt directory into the project and define HAVE_CONFIG_H in the project's pre-processor definition. The tricky part is to build a config.h file. To get it to work on VS, some of the important settings include: #define CELT_BUILD #define USE_ALLOCA #undef VAR_ARRAYS #undef restrict #undef HAVE_STDINT_H #undef inline #define
2014 Sep 04
2
Opus decoding performance on ARM devices
Hi everyone, I have lately been evaluating the performance of various audio decoders, particularly for ARM devices (Cortex A8 / A9). The context is audio playback in a game engine, and thus decoding performance is of particular interest. Looking at Opus versus Vorbis on a Cortex A9 smartphone, the numbers look approximately like this: Vorbis (tremolo decoder) 9.3 Mb PCM/s Opus (libopus 1.1)
2014 Sep 05
2
Opus decoding performance on ARM devices
Hi, Thank you for your response. I pulled yesterday to commit da97db1ca1f92592af3534c9a2596da0e9a009ca, added a bunch more defines to my compile options, and assembled & linked in armopts.s, celt_pitch_xcorr_arm.s. Performance jumped up from about 4.8 Mb/s to 5.3 Mb/s on the same device, so it is an improvement. Not sure what other tweaks there would be to try, but if it could match the
2019 Apr 14
1
Opus cmake build
Hi Marcus, Thanks for the fixes. I did some more cmake build testing and encountered a few issues: The option -DFORTIFY_SOURCE=2 should be -D_FORTIFY_SOURCE=2, as the macro has a leading underscore. In the autotools build it defines this if it is not already defined (m4/ax_add_fortify_source.m4). When custom modes are not enabled, the cmake build is nevertheless installing the include file