search for: xmm

Displaying 20 results from an estimated 289 matches for "xmm".

2017 Jul 05
4
expand gridded matrix to higher resolution
...than can do what we need already? I tried matrix with rep, but I am missing some magic there, since it doesn't do what we need. replicate might be promising, but then still need to rearrange the output into the column and row format we need. A simple example: mm=matrix(1:15,nrow=3,byrow = T) xmm=matrix(NA,nrow=nrow(mm)*3,ncol=ncol(mm)*3) for(icol in 1:ncol(mm)) { for(irow in 1:nrow(mm)) { xicol=(icol-1)*3 +c(1:3) xirow=(irow-1)*3 +c(1:3) xmm[xirow,xicol]=mm[irow,icol] } } mm > > mm > [,1] [,2] [,3] [,4] [,5] > [1,] 1 2 3 4 5 > [2,] 6...
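In R itself, kronecker(mm, matrix(1, 3, 3)) or indexing with mm[rep(1:nrow(mm), each = 3), rep(1:ncol(mm), each = 3)] produces the same expansion without explicit loops. For comparison, a minimal C sketch of the block replication the loop above performs (sizes and names are illustrative, not from the thread):

#include <stdio.h>

#define R 3
#define C 5
#define K 3   /* expansion factor */

/* Every element of the R x C grid becomes a K x K block in the output,
 * matching the double loop in the R example. */
void expand(const int in[R][C], int out[R * K][C * K])
{
    for (int i = 0; i < R * K; i++)
        for (int j = 0; j < C * K; j++)
            out[i][j] = in[i / K][j / K];
}

int main(void)
{
    int mm[R][C], xmm[R * K][C * K];

    for (int i = 0; i < R; i++)
        for (int j = 0; j < C; j++)
            mm[i][j] = i * C + j + 1;   /* 1..15, row-major like byrow = T */

    expand(mm, xmm);
    printf("xmm[0][0]=%d xmm[2][2]=%d xmm[8][14]=%d\n",
           xmm[0][0], xmm[2][2], xmm[8][14]);   /* 1, 1, 15 */
    return 0;
}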
2011 Nov 30
0
[PATCH 2/4] x86/emulator: add emulation of SIMD FP moves
...: Jan Beulich <jbeulich@suse.com> --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -629,6 +629,60 @@ int main(int argc, char **argv) else printf("skipped\n"); + printf("%-40s", "Testing movsd %xmm5,(%ecx)..."); + memset(res, 0x77, 64); + memset(res + 10, 0x66, 8); + if ( stack_exec && cpu_has_sse2 ) + { + extern const unsigned char movsd_to_mem[]; + + asm volatile ( "movlpd %0, %%xmm5\n\t" + "movhpd %0, %%xmm5\n"...
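The patch's test relies on movsd storing exactly 8 bytes. A standalone host-side sketch of that property (not the Xen test harness; assumes a GCC/Clang x86-64 toolchain):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    double src = 3.5;
    /* Fill the qword after the store target so we can see it survive. */
    struct { double lo; uint64_t hi; } buf = { 0.0, 0x7777777777777777ull };

    asm volatile ("movsd %1, %%xmm5\n\t"   /* load low qword of xmm5 */
                  "movsd %%xmm5, %0"       /* 8-byte store, not 16 */
                  : "=m" (buf.lo)
                  : "m" (src)
                  : "xmm5");

    /* hi must still read 7777777777777777 */
    printf("lo=%f hi=%016llx\n", buf.lo, (unsigned long long)buf.hi);
    return 0;
}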
2017 Jul 05
0
expand gridded matrix to higher resolution
...? >I tried matrix with rep, but I am missing some magic there, since it >doesn't do what we need. >replicate might be promising, but then still need to rearrange the >output into the column and row format we need. > >A simple example: >mm=matrix(1:15,nrow=3,byrow = T) >xmm=matrix(NA,nrow=nrow(mm)*3,ncol=ncol(mm)*3) >for(icol in 1:ncol(mm)) { > for(irow in 1:nrow(mm)) { > xicol=(icol-1)*3 +c(1:3) > xirow=(irow-1)*3 +c(1:3) > xmm[xirow,xicol]=mm[irow,icol] > } >} >mm >> > mm >> [,1] [,2] [,3] [,4] [,5] >> [1,...
2019 Jun 04
2
variadic functions on X86_64 should (conditionally) save XMM regs even if -no-implicit-float
...n't try to push it upstream (and in fact have considered removing it) is due to an unfortunate side-effect which is either "expected" or a "bug" depending on your perspective. The problem is that now, when optimization is disabled, the compiler will *unconditionally* access XMM registers in the prolog of varargs functions. This is *not* the usual code to spill floating point varargs arguments (which is correctly guarded by testing %al). Instead the compiler: Unconditionally: - spills the XMM argument registers Conditionally: - reloads those values - stores them in the v...
2017 Jul 05
0
expand gridded matrix to higher resolution
...ready? > I tried matrix with rep, but I am missing some magic there, since it doesn't do what we need. > replicate might be promising, but then still need to rearrange the output into the column and row format we need. > > A simple example: > mm=matrix(1:15,nrow=3,byrow = T) > xmm=matrix(NA,nrow=nrow(mm)*3,ncol=ncol(mm)*3) > for(icol in 1:ncol(mm)) { > for(irow in 1:nrow(mm)) { > xicol=(icol-1)*3 +c(1:3) > xirow=(irow-1)*3 +c(1:3) > xmm[xirow,xicol]=mm[irow,icol] > } > } > mm >> > mm >> [,1] [,2] [,3] [,4] [,5] >...
2017 Nov 28
2
variadic functions on X86_64 should (conditionally) save XMM regs even if -no-implicit-float
...deally we would do so only if non-GPR register arguments are actually passed by the caller. This would require a runtime check, and whether or not such a check is feasible depends on the target and ABI. However for X86_64, the standard va_start code already has such a check - the number of vector (XMM) register arguments is stored in %al, and the code normally generated for variadic functions (in the absence of -no-implicit-float) includes a guard around the XMM spill code that checks for %al != 0. Therefore I believe it would be "in the spirit" of -no-implicit-float to remove the NoI...
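A minimal sketch of that mechanism, assuming the standard SysV x86-64 ABI: the caller places the number of vector-register arguments in %al, and the usual va_start prologue tests %al before spilling xmm0-xmm7 to the register save area.

#include <stdarg.h>
#include <stdio.h>

/* The compiled prologue of this function normally guards its xmm spill
 * code with a test of %al, which the caller sets to the number of
 * vector registers used for the call. */
double sum(int n, ...)
{
    va_list ap;
    double s = 0;

    va_start(ap, n);
    for (int i = 0; i < n; i++)
        s += va_arg(ap, double);
    va_end(ap);
    return s;
}

int main(void)
{
    printf("%f\n", sum(3, 1.0, 2.0, 3.0));   /* caller sets %al = 3 here */
    return 0;
}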
2019 Aug 09
0
[RFC PATCH v6 79/92] kvm: x86: emulate movsd xmm, m64
...+++ b/arch/x86/kvm/emulate.c @@ -1177,6 +1177,27 @@ static int em_fnstsw(struct x86_emulate_ctxt *ctxt) return X86EMUL_CONTINUE; } +static u8 simd_prefix_to_bytes(const struct x86_emulate_ctxt *ctxt, + int simd_prefix) +{ + u8 bytes; + + switch (ctxt->b) { + case 0x11: + /* movsd xmm, m64 */ + /* movups xmm, m128 */ + if (simd_prefix == 0xf2) { + bytes = 8; + break; + } + /* fallthrough */ + default: + bytes = 16; + break; + } + return bytes; +} + static void decode_register_operand(struct x86_emulate_ctxt *ctxt, struct operand *op) { @@ -1187,7 +1208,7 @@...
2004 Aug 06
2
[PATCH] Make SSE Run Time option.
...y be relevant; however, even though 3DNow extended is a bastardized version of SSE, it still supports the same instructions, and that is what is important - I don't think we intend to add any AMD-specific code. The real issue is cross-CPU SSE support, and whether in addition there is access to XMM registers or not - whether the OS actually supports XMM as well. We have a fair amount of other stuff we do in assembler, much of which requires SSE instruction sets but *not* XMM registers, and some of which is just MMX only. In speex, I can see how you would always want to use the widest reg...
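A sketch of the run-time dispatch shape under discussion. __builtin_cpu_supports (GCC >= 4.8 / Clang) postdates this 2004 thread - at the time this meant hand-written CPUID plus a check that the OS saves XMM state - but the structure is the same:

#include <stdio.h>

int main(void)
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse"))
        puts("SSE path: xmm registers usable");
    else
        puts("fallback path: scalar or MMX only");
    return 0;
}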
2010 Jun 07
1
[LLVMdev] XMM in X86 Backend
Hi all, I am observing an excessive use of xmm registers in the output assembly produced by the x86 backend. Basically, for code like this double test(double a, double b) { double c; c = 1.0 + sin (a + b*b); return c; } llc produced something like... movsd 16(%ebp), %xmm0 mulsd %xmm0, %xmm0 addsd 8(%ebp), %xmm0...
2013 Oct 25
2
[LLVMdev] Bug #16941
Nadav, The problem appears only for vectors longer than the available hardware register (in doubleword elements, i.e. more than 4 on SSE4 and more than 8 on AVX). Select does a weird thing. The <8 x i1> mask comes in as two XMM registers; select converts them to a single XMM register (i.e. 8 x 16 bit), then immediately converts back to two XMM registers and does the blend. The conversion back and forth has huge overhead. I'm attaching 3 files with vectors of length 4, 8 and 16. Try 4 on SSE4 and you'll see that both...
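A C sketch (GCC/Clang vector extensions, not the attached test files) of the pattern involved: an 8 x 32-bit lane-wise select, where the 256-bit type must be legalized into two XMM halves on SSE4:

/* 8 x 32-bit integer vector, 32 bytes wide. */
typedef int v8si __attribute__((vector_size(32)));

v8si blend(v8si mask, v8si a, v8si b)
{
    v8si zero = { 0 };
    v8si m = mask != zero;       /* each lane becomes all-ones or zero */
    return (a & m) | (b & ~m);   /* branchless per-lane select */
}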
2009 Mar 04
1
[LLVMdev] Bug in x86-64/Win64 Calling Convention
Hello, I think I've found a bug in the calling convention support for X86-64/Win64. It doesn't correctly save and restore the XMM registers in the function prolog/epilog. (The problem only exists on Win64, since Linux and Mac OS use a calling convention in which these registers are volatile and not callee-saved.) X86RegisterInfo::getCalleeSavedRegs() when called for a Win64 target does return an array of registers whi...
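A minimal illustration of the ABI difference, assuming GCC/Clang inline asm: on Win64, xmm6-xmm15 are callee-saved, so a function clobbering xmm6 must spill and restore it in its prolog/epilog - exactly the code the report says was missing. On SysV, all xmm registers are volatile and no spill is needed.

void clobber_xmm6(void)
{
    /* Forces the compiler to treat xmm6 as clobbered. */
    asm volatile ("xorps %%xmm6, %%xmm6" ::: "xmm6");
}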
2010 May 07
1
[LLVMdev] Missuse of xmm register on X86-64
All, I've been working on a new scheduler and have somehow affected register selection. My problem is that an xmm register is being used as an index expression. Specifically, addss (%xmm1,%rax,4), %xmm0 I like the idea of a floating-point index, but, like the assembler, I don't know what that means. Any suggestions on where I should look for a solution to my problem? Aran -------------- next part...
2014 Aug 13
1
[PATCH] simpler xmm -> int64 code
This patch simplifies XMM -> int64 conversion in fixed_intrin_sse2.c and fixed_intrin_ssse3.c -------------- next part -------------- A non-text attachment was scrubbed... Name: fixed_sse.zip Type: application/zip Size: 778 bytes Desc: not available Url : http://lists.xiph.org/pipermail/flac-dev/attachments/20140813/49f1...
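The patch itself is the zip attachment; a sketch of the operation it concerns, using SSE2 intrinsics (x86-64 only):

#include <emmintrin.h>
#include <stdint.h>

/* Extract the low 64 bits of an XMM register as an int64. */
static inline int64_t xmm_low_qword(__m128i v)
{
    return _mm_cvtsi128_si64(v);   /* compiles to a single movq */
}

int main(void)
{
    __m128i v = _mm_set_epi64x(0, 42);   /* low qword = 42 */
    return (int)xmm_low_qword(v);        /* returns 42 */
}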
2004 Aug 06
5
SIMD interest
Greetings, my apologies for putting this trash in the mailing list, but the topic about the SSE run-time option interested me pretty much. Looks like some people are really experienced on the topic. I would really appreciate it if somebody could point me to good resources about SSE and Altivec (not necessarily on the net, I'm ready to invest some money if necessary). I already have intel
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
Jean-Marc, There is a big difference between SSE and SSEFP. SSEFP means that the CPU supports the xmm registers. All Intel chips with SSE support do; however, no current 32-bit AMD chips support the XMM registers. They will support the SSE instructions but not those registers. You are right about the SSE2 not being used. The AMD Opterons are the first AMD CPUs which support xmm registers. T...
2012 Jan 09
2
[LLVMdev] Calling conventions for YMM registers on AVX
I'll explain what we see in the code. 1. The caller saves XMM registers across the call if needed (according to DEFS definition). YMMs are not in the set, so the caller does not take care of them. 2. The callee preserves XMMs but works with YMMs and clobbers them. 3. So after the call, the upper part of YMM is gone. - Elena -----Original Message----- From: Bruno Card...
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
Jean-Marc, >I'm still not sure I get it. On an Athlon XP, I can do something like >"mulps xmm0, xmm1", which means that the xmm registers are indeed >supported. Besides, without the xmm registers, you can't use much of >SSE. On the Athlon XP 2400+ that we have in our QA lab (Win2000), if you run that code it generates an Illegal Instruction Error. In addition, an AMD Du...
2012 Jan 09
3
[LLVMdev] Calling conventions for YMM registers on AVX
On Jan 9, 2012, at 10:00 AM, Jakob Stoklund Olesen wrote: > > On Jan 8, 2012, at 11:18 PM, Demikhovsky, Elena wrote: > >> I'll explain what we see in the code. >> 1. The caller saves XMM registers across the call if needed (according to DEFS definition). >> YMMs are not in the set, so caller does not take care. > > This is not how the register allocator works. It saves the registers holding values, it doesn't care which alias is clobbered. > > Are you saying...
2012 Jan 09
0
[LLVMdev] Calling conventions for YMM registers on AVX
On Jan 8, 2012, at 11:18 PM, Demikhovsky, Elena wrote: > I'll explain what we see in the code. > 1. The caller saves XMM registers across the call if needed (according to DEFS definition). > YMMs are not in the set, so caller does not take care. This is not how the register allocator works. It saves the registers holding values, it doesn't care which alias is clobbered. Are you saying that only the xmm part...
2013 Sep 20
0
[LLVMdev] Passing a 256 bit integer vector with XMM registers
I am implementing a new calling convention for X86 which requires passing a 256-bit integer vector in two XMM registers rather than one YMM register. For example: define <8 x i32> @add(<8 x i32> %a, <8 x i32> %b) { %add = add <8 x i32> %a, %b ret <8 x i32> %add } With march=X86-64 and mcpu=corei7-avx, llc with the default calling convention generates the following code...