thr3ads.net - similar to: "[PATCH] simpler xmm -> int64 code"

Displaying 20 results from an estimated 4000 matches similar to: "[PATCH] simpler xmm -> int64 code"

variadic functions on X86_64 should (conditionally) save XMM regs even if -no-implicit-float

2019 Jun 04

Thanks for reviving this topic! Interestingly we have essentially the same fix you mention below ( https://reviews.llvm.org/D62639) as a local change in our Wind River version of LLVM. The reason we didn't try to push it upstream (and in fact have considered removing it) is due to an unfortunate side-effect which is either "expected" or a "bug" depending on your

variadic functions on X86_64 should (conditionally) save XMM regs even if -no-implicit-float

2017 Nov 28

Specifying -no-implicit-float prevents LLVM from using non-GPR registers for purely integer operations. This is useful for operating systems (such as Wind River's VxWorks) that support tasks that do not save all registers on context switch. This presents an interesting problem for variadic functions that may optionally take non-integer arguments (e.g. printf style functions). Should non-GPR

[LLVMdev] XMM in X86 Backend

2010 Jun 07

Hi all, I am observing an excessive use of xmm registers in the output assembly produced by x86 backend. Basically, for a code like this double test(double a, double b) { double c; c = 1.0 + sin (a + b*b); return c; } llc produced something like.... movsd 16(%ebp), %xmm0 mulsd %xmm0, %xmm0 addsd 8(%ebp), %xmm0 movsd %xmm0, (%esp) ....... fstpl

[LLVMdev] Missuse of xmm register on X86-64

2010 May 07

All, I've been working on a new scheduler and have somehow affected register selection. My problem is that an xmm register is being used as an index expression. Specifically, addss (%xmm1,%rax,4), %xmm0 I like the idea of a floating-point index, but, like the assembler, I don't know what that means. Any suggestions on where I should look for a solution to my problem?

[RFC PATCH v6 79/92] kvm: x86: emulate movsd xmm, m64

2019 Aug 09

From: Mihai Donțu <mdontu at bitdefender.com> This is needed in order to be able to support guest code that uses movsd to write into pages that are marked for write tracking. Signed-off-by: Mihai Donțu <mdontu at bitdefender.com> Signed-off-by: Adalbert Lazăr <alazar at bitdefender.com> --- arch/x86/kvm/emulate.c | 32 +++++++++++++++++++++++++++----- 1 file changed, 27

[LLVMdev] Simpler subreg ops in machine code IR

2010 Jun 16

> 1. copyRegToReg() won't be able to use register classes to pick a copy opcode. For instance, an XMM register will no longer be copied by MOVSS or MOVSD. Given just the physical register, MOVAPS will be used. Is that a problem? I haven't had time to really look into it, but have been playing around with the idea that instead of two register classes copyRegToReg and some of the load

PATCH for lpc_intrin_sse41.c: faster shifts

2014 Jan 24

It turns out that int64 shift is quite slow... This patch changes the code from: (FLAC__int32)(xmm.m128i_i64[0] >> lp_quantization) into: _mm_cvtsi128_si32(_mm_srli_epi64(xmm, lp_quantization)); Encoding of 24-bit .wav files with 32-bit FLAC became noticeably faster. The new code works only if quantization <= 32, but its max value is 15 so the code always works. (max_shiftlimit == (1
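The reasoning in the snippet above can be checked with a small C sketch. It is an illustration only, not the FLAC patch itself: the function names `shift_scalar`, `shift_logical`, and `shift_sse2` are hypothetical, and the code assumes a compiler where `>>` on a negative `int64_t` is an arithmetic shift and where narrowing to `int32_t` wraps (true for GCC, Clang, and MSVC, but implementation-defined in ISO C). An arithmetic and a logical 64-bit right shift differ only in the top `q` bits of the result, so for `q <= 32` the low 32 bits agree.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Reference form: arithmetic 64-bit shift, then truncate to 32 bits.
 * Assumes >> on a negative int64_t is arithmetic (implementation-defined). */
int32_t shift_scalar(int64_t sum, int q)
{
    assert(q >= 0 && q <= 32);
    return (int32_t)(sum >> q);
}

/* Logical 64-bit shift, keeping only the low 32 bits. The two forms can
 * differ only in bits 64-q..63, which lie above bit 31 when q <= 32. */
int32_t shift_logical(int64_t sum, int q)
{
    assert(q >= 0 && q <= 32);
    return (int32_t)(uint32_t)((uint64_t)sum >> q);
}

#if defined(__SSE2__)
/* Same idea with the two intrinsics named in the snippet; the value is
 * placed in the low 64-bit lane of the vector. */
int32_t shift_sse2(int64_t sum, int q)
{
    assert(q >= 0 && q <= 32);
    __m128i v = _mm_set_epi64x(0, sum);
    return _mm_cvtsi128_si32(_mm_srli_epi64(v, q));
}
#endif
```

With the snippet's stated maximum of 15 for the quantization shift, all three functions should return the same value for any 64-bit input; the SSE2 variant is compiled only where `__SSE2__` is defined.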

[LLVMdev] Simpler subreg ops in machine code IR

2010 Jun 15

I am considering adding a new target independent codegen-only COPY instruction to our MachineInstr representation. It would be used to replace INSERT_SUBREG, EXTRACT_SUBREG, and virtual register copies after instruction selection. Selection DAG still needs {INSERT,EXTRACT}_SUBREG, but they would not appear as MachineInstrs any longer. The COPY instruction handles subreg operations with less

expand gridded matrix to higher resolution

2017 Jul 05

You probably ought to be using the raster package. See the CRAN Spatial Task View. -- Sent from my phone. Please excuse my brevity. On July 5, 2017 12:20:28 AM PDT, "Anthoni, Peter (IMK)" <peter.anthoni at kit.edu> wrote: >Hi all, >(if my email goes out as HTML, then my email client doesn't do as told, >and I apologize already.) > >We need to downscale climate

expand gridded matrix to higher resolution

2017 Jul 05

Hi all, (if my email goes out as HTML, then my email client doesn't do as told, and I apologize already.) We need to downscale climate data and therefore first need to expand the climate from 0.5deg to the higher resolution 10min, before we can add high resolution deviations. We basically need to have the original data at each gridcell replicated into 3x3 gridcells. A simple for loop can do

expand gridded matrix to higher resolution

2017 Jul 05

Hi Peter, apply(t(apply(mm,1,rep,each=3)),2,rep,each=3) Jim On Wed, Jul 5, 2017 at 5:20 PM, Anthoni, Peter (IMK) <peter.anthoni at kit.edu> wrote: > Hi all, > (if my email goes out as HTML, then my email client doesn't do as told, and I apologize already.) > > We need to downscale climate data and therefore first need to expand the climate from 0.5deg to the higher
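The replication asked about in this thread (each 0.5deg cell copied into a 3x3 block of 10min cells) can be sketched in C as follows. This is a hypothetical illustration, not code from the thread: the function name `expand_grid`, its parameters, and the row-major layout are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Replicate every cell of a rows x cols row-major grid into a
 * factor x factor block. dst must have room for
 * (rows * factor) * (cols * factor) doubles. */
void expand_grid(const double *src, size_t rows, size_t cols,
                 size_t factor, double *dst)
{
    assert(factor > 0);
    size_t out_cols = cols * factor;
    for (size_t r = 0; r < rows * factor; ++r) {
        for (size_t c = 0; c < out_cols; ++c) {
            /* Integer division maps each fine cell to its coarse parent. */
            dst[r * out_cols + c] = src[(r / factor) * cols + (c / factor)];
        }
    }
}
```

Calling `expand_grid(src, 2, 2, 3, dst)` on a 2x2 grid fills a 6x6 grid in which every source value appears as one 3x3 block, which is what the question describes for going from 0.5deg to 10min.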

use xmm intrinsics for lrintf() with mingw-w64

2015 Mar 21

The following tiny patches make opus and opusfile to use xmm intrinsics for lrintf() with mingw-w64 builds when targetting x64 instead of their default x87 asm. Regards. -- O.S. diff --git a/celt/float_cast.h b/celt/float_cast.h index ed5a39b..b9b8484 100644 --- a/celt/float_cast.h +++ b/celt/float_cast.h @@ -61,7 +61,14 @@ ** the config.h file. */ -#if (HAVE_LRINTF) +#if

[LLVMdev] Passing a 256 bit integer vector with XMM registers

2013 Sep 20

I am implementing a new calling convention for X86 which requires to pass a 256 bit integer vector with two XMM registers rather than one YMM register. For example define <8 x i32> @add(<8 x i32> %a, <8 x i32> %b) { %add = add <8 x i32> %a, %b ret <8 x i32> %add } With march=X86-64 and mcpu=corei7-avx, llc with the default calling convention generates the

[LLVMdev] spilling & xmm register usage

2010 Sep 29

On Sep 29, 2010, at 8:35 AM PDT, Ralf Karrenberg wrote: > Hello everybody, > > I have stumbled upon a test case (the attached module is a slightly > reduced version) that shows extremely reduced performance on linux > compared to windows when executed using LLVM's JIT. > > We narrowed the problem down to the actual code being generated, the > source IR on both systems

[PATCH] Make SSE Run Time option.

2004 Aug 06

Hi Jean Marc, I think there is just a confusion over terminology going on here- I agree that support for 3dnow base version may not necessarily be relevant; However, even though 3dNow extended is a bastardized version of SSE, it still supports the same instructions, and that is what is important- I don't think we intend to add any AMD specific code. The real issue is cross CPU SSE support,

[LLVMdev] Bug #16941

2013 Oct 25

Nadav, The problem appears only for vectors longer than the available hardware register (in doubleword elements, i.e. more than 4 on SSE4 and more than 8 on AVX). Select does a weird thing. The <8 x i1> mask comes as two XMM registers, select converts them to a single XMM register (i.e. 8 x 16 bit), immediately after it converts back to two XMM registers and does blend. Conversion forth and back has

[PATCH] Make SSE Run Time option. Add Win32 SSE code

2004 Aug 06

Jean-Marc, >I'm still not sure I get it. On an Athlon XP, I can do something like >"mulps xmm0, xmm1", which means that the xmm registers are indeed >supported. Besides, without the xmm registers, you can't use much of >SSE. In the Athlon XP 2400+ that we have in our QA lab (Win2000 ) if you run that code it generates an Illegal Instruction Error. In addition,

[LLVMdev] Calling conventions for YMM registers on AVX

2012 Jan 09

On Jan 8, 2012, at 11:18 PM, Demikhovsky, Elena wrote: > I'll explain what we see in the code. > 1. The caller saves XMM registers across the call if needed (according to DEFS definition). > YMMs are not in the set, so caller does not take care. This is not how the register allocator works. It saves the registers holding values, it doesn't care which alias is clobbered. Are you

[LLVMdev] Bug #16941

2013 Oct 26

Hi Dmitry, Yes, this is a known problem with legalizing vector masks. The type <8 x i1> is legalized to 8 x i16, on SSE, but your operands are legalized to <4 x i32>. Type-legalization is performed per-node and we don’t have a good way to support instructions that mix the mask and operand type. Why does ISPC generate illegal vector types ? Does ISPC rely on the LLVM codegen to

[LLVMdev] llvm register reload/spilling around calls

2010 Oct 20

On 20.10.2010 05:00, Jakob Stoklund Olesen wrote: > On Oct 19, 2010, at 6:37 PM, Roland Scheidegger wrote: > >> Thanks for giving it a look! >> >> On 19.10.2010 23:21, Jakob Stoklund Olesen wrote: >>> On Oct 19, 2010, at 11:40 AM, Roland Scheidegger wrote: >>> >>>> So I saw that the code is doing lots of register >>>>