thr3ads.net - similar to: "use xmm intrinsics for lrintf() with mingw-w64"

Displaying 18 results from an estimated 18 matches similar to: "use xmm intrinsics for lrintf() with mingw-w64"

Prefer SSE and ASM implementation of float2int before lrintf for MSVC patch

2020 Jun 14

This commit https://github.com/xiph/opus/commit/94b68f341cadd5433a10d346c1c248a641d8be57 enabled HAVE_LRINTF in CMake builds. As later versions of Visual Studio have lrintf, it got enabled by default because it takes precedence over SSE in MSVC. The use of lrintf is a lot slower, which can easily be seen in the tests. From test result Windows X64 (similar results on X86): LRINTF 4/4 Test #4:
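For context, the trade-off described in this snippet can be sketched as below. This is a minimal illustration, not the code from the Opus patch: the function name `float2int` and the macro `FLOAT2INT_USE_SSE` are assumptions made for the example. When SSE is available it converts with the `_mm_cvtss_si32` intrinsic, and otherwise it falls back to the C99 `lrintf()`. Both paths round according to the current rounding mode, which defaults to round-to-nearest-even.

```c
#include <math.h>

/* Detect SSE support on GCC/Clang (__SSE__) and MSVC (_M_X64, _M_IX86_FP). */
#if defined(__SSE__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define FLOAT2INT_USE_SSE 1 /* hypothetical macro, for this sketch only */
#endif

/* Hypothetical helper: round a float to the nearest int. */
static int float2int(float x)
{
#ifdef FLOAT2INT_USE_SSE
    /* cvtss2si: a single instruction, uses the MXCSR rounding mode. */
    return _mm_cvtss_si32(_mm_set_ss(x));
#else
    /* Portable fallback: a library call unless the compiler inlines it. */
    return (int)lrintf(x);
#endif
}
```

With the default rounding mode, ties go to the even neighbour, so `float2int(2.5f)` yields 2 and `float2int(3.5f)` yields 4 on either path.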

[LLVMdev] Missuse of xmm register on X86-64

2010 May 07

All, I've been working on a new scheduler and have somehow affected register selection. My problem is that an xmm register is being used as an index expression. Specifically, addss (%xmm1,%rax,4), %xmm0 I like the idea of a floating-point index, but, like the assembler, I don't know what that means. Any suggestions on where I should look for a solution to my problem?

[LLVMdev] XMM in X86 Backend

2010 Jun 07

Hi all, I am observing an excessive use of xmm registers in the output assembly produced by x86 backend. Basically, for a code like this double test(double a, double b) { double c; c = 1.0 + sin (a + b*b); return c; } llc produced something like.... movsd 16(%ebp), %xmm0 mulsd %xmm0, %xmm0 addsd 8(%ebp), %xmm0 movsd %xmm0, (%esp) ....... fstpl

[LLVMdev] Passing a 256 bit integer vector with XMM registers

2013 Sep 20

I am implementing a new calling convention for X86 which requires to pass a 256 bit integer vector with two XMM registers rather than one YMM register. For example define <8 x i32> @add(<8 x i32> %a, <8 x i32> %b) { %add = add <8 x i32> %a, %b ret <8 x i32> %add } With march=X86-64 and mcpu=corei7-avx, llc with the default calling convention generates the

[RFC PATCH v6 79/92] kvm: x86: emulate movsd xmm, m64

2019 Aug 09

From: Mihai Donțu <mdontu at bitdefender.com> This is needed in order to be able to support guest code that uses movsd to write into pages that are marked for write tracking. Signed-off-by: Mihai Donțu <mdontu at bitdefender.com> Signed-off-by: Adalbert Lazăr <alazar at bitdefender.com> --- arch/x86/kvm/emulate.c | 32 +++++++++++++++++++++++++++----- 1 file changed, 27

[PATCH] simpler xmm -> int64 code

2014 Aug 13

This patch simplifies XMM -> int64 conversion in fixed_intrin_sse2.c and fixed_intrin_ssse3.c -------------- next part -------------- A non-text attachment was scrubbed... Name: fixed_sse.zip Type: application/zip Size: 778 bytes Desc: not available Url : http://lists.xiph.org/pipermail/flac-dev/attachments/20140813/49f18196/attachment.zip

[libopusfile PATCH] build: implement autotools build system for libopusfile. (v3)

2012 Sep 25

This time it's complete with assertions on make debug, proper ./configure switches for the optional features, visibility and warning flags, and summary at the end of the configuration. Signed-off-by: Diego Elio Pettenò <flameeyes at flameeyes.eu> --- .gitignore | 29 +++++ Makefile.am | 24 +++++ autogen.sh | 3 + configure.ac | 66 ++++++++++++

[libopusfile PATCH] build: implement autotools build system for libopusfile. (v2)

2012 Sep 25

variadic functions on X86_64 should (conditionally) save XMM regs even if -no-implicit-float

2019 Jun 04

Thanks for reviving this topic! Interestingly we have essentially the same fix you mention below ( https://reviews.llvm.org/D62639) as a local change in our Wind River version of LLVM. The reason we didn't try to push it upstream (and in fact have considered removing it) is due to an unfortunate side-effect which is either "expected" or a "bug" depending on your

variadic functions on X86_64 should (conditionally) save XMM regs even if -no-implicit-float

2017 Nov 28

Specifying -no-implicit-float prevents LLVM from using non-GPR registers for purely integer operations. This is useful for operating systems (such as Wind River's VxWorks) that support tasks that do not save all registers on context switch. This presents an interesting problem for variadic functions that may optionally take non-integer arguments (e.g. printf style functions). Should non-GPR
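To make the problem in this snippet concrete, here is a minimal sketch of a variadic function that reads floating-point arguments. The name `sum_doubles` is hypothetical and does not come from the thread. Under the System V x86-64 calling convention, `double` arguments arrive in XMM registers, so the function prologue has to be able to spill those registers for `va_arg` to find the values. That spill is the non-GPR register use the post is asking about.

```c
#include <stdarg.h>

/* Hypothetical helper: sums `count` arguments, each passed as a double. */
static double sum_doubles(int count, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        /* On x86-64 these values were passed in XMM registers. */
        total += va_arg(ap, double);
    }
    va_end(ap);
    return total;
}
```

A call such as `sum_doubles(3, 1.0, 2.0, 3.5)` returns 6.5. A caller that passes only integer arguments to a printf-style function never needs the XMM spill, which is why the post describes the save as conditional.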

[LLVMdev] spilling & xmm register usage

2010 Sep 29

On Sep 29, 2010, at 8:35 AM PDT, Ralf Karrenberg wrote: > Hello everybody, > > I have stumbled upon a test case (the attached module is a slightly > reduced version) that shows extremely reduced performance on linux > compared to windows when executed using LLVM's JIT. > > We narrowed the problem down to the actual code being generated, the > source IR on both systems

[libopusfile PATCH] build: implement autotools build system for libopusfile. (v4)

2012 Sep 29

Includes - A make debug target that disables optimizations and enables assertions, - Proper ./configure switches for the optional features, - A configuration summary, - libtool versioning information, - Visibility and warning flags, - API documentation, and - Support for out-of-tree builds. Signed-off-by: Diego Elio Pettenò <flameeyes at flameeyes.eu> --- .gitignore | 29 +++++

[LLVMdev] spilling & xmm register usage

2010 Sep 29

Hello everybody, I have stumbled upon a test case (the attached module is a slightly reduced version) that shows extremely reduced performance on linux compared to windows when executed using LLVM's JIT. We narrowed the problem down to the actual code being generated, the source IR on both systems is the same. Try compiling the attached module: llc -O3 -filetype=asm -o BAD.s BAD.ll Under

Opus decoding performance on ARM devices

2014 Sep 04

Hi Dan, I suggest you try the code in git master, which has further ARM optimizations compared to 1.1. Cheers, Jean-Marc On 04/09/14 08:00 AM, Dan Nilsson wrote: > Hi everyone, > > I have lately been evaluating the performance of various audio decoders, > particularly for ARM devices (Cortex A8 / A9). The context is audio > playback in a game engine, and thus decoding

No subject

2010 Jun 07

eds to drag in the source and header files from the libcelt directory into the project and define HAVE_CONFIG_H in the project's pre-processor definition. The tricky part is to build a config.h file. To get it to work on VS some of the important settings include. #define CELT_BUILD #define USE_ALLOCA #undef VAR_ARRAYS #undef restrict #undef HAVE_STDINT_H #undef inline #define

Opus decoding performance on ARM devices

2014 Sep 04

Hi everyone, I have lately been evaluating the performance of various audio decoders, particularly for ARM devices (Cortex A8 / A9). The context is audio playback in a game engine, and thus decoding performance is of particular interest. Looking at Opus versus Vorbis on a Cortex A9 smartphone, the numbers look approximately like this: Vorbis (tremolo decoder) 9.3 Mb PCM/s Opus (libopus 1.1)

Opus decoding performance on ARM devices

2014 Sep 05

Hi, Thank you for your response. I pulled yesterday to commit da97db1ca1f92592af3534c9a2596da0e9a009ca, added a bunch more defines to my compile options, and assembled & linked in armopts.s, celt_pitch_xcorr_arm.s. Performance jumped up from about 4.8 Mb/s to 5.3 Mb/s on the same device, so it is an improvement. Not sure what other tweaks there would be to try, but if it could match the

Opus cmake build

2019 Apr 14

Hi Marcus, Thanks for the fixes. I did some more cmake build testing and encountered a few issues: The option -DFORTIFY_SOURCE=2 should be -D_FORTIFY_SOURCE=2, as the macro has a leading underscore. In the autotools build it defines this if it is not already defined (m4/ax_add_fortify_source.m4). When custom modes are not enabled, the cmake build is nevertheless installing the include file
