search for: xmm

Displaying 20 results from an estimated 289 matches for "xmm".

2017 Jul 05
4
expand gridded matrix to higher resolution
...than can do what we need already? I tried matrix with rep, but I am missing some magic there, since it doesn't do what we need. replicate might be promising, but then still need to rearrange the output into the column and row format we need. A simple example: mm=matrix(1:15,nrow=3,byrow = T) xmm=matrix(NA,nrow=nrow(mm)*3,ncol=ncol(mm)*3) for(icol in 1:ncol(mm)) { for(irow in 1:nrow(mm)) { xicol=(icol-1)*3 +c(1:3) xirow=(irow-1)*3 +c(1:3) xmm[xirow,xicol]=mm[irow,icol] } } mm > > mm > [,1] [,2] [,3] [,4] [,5] > [1,] 1 2 3 4 5 > [2,] 6...
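In R itself, kronecker(mm, matrix(1, 3, 3)) or indexing with mm[rep(1:nrow(mm), each = 3), rep(1:ncol(mm), each = 3)] produces the same expansion without explicit loops. For comparison, a minimal C sketch of the block replication the loop above performs (sizes and names are illustrative, not from the thread):

#include <stdio.h>

#define R 3
#define C 5
#define K 3   /* expansion factor */

/* Every element of the R x C grid becomes a K x K block in the output,
 * matching the double loop in the R example. */
void expand(const int in[R][C], int out[R * K][C * K])
{
    for (int i = 0; i < R * K; i++)
        for (int j = 0; j < C * K; j++)
            out[i][j] = in[i / K][j / K];
}

int main(void)
{
    int mm[R][C], xmm[R * K][C * K];

    for (int i = 0; i < R; i++)
        for (int j = 0; j < C; j++)
            mm[i][j] = i * C + j + 1;   /* 1..15, row-major like byrow = T */

    expand(mm, xmm);
    printf("xmm[0][0]=%d xmm[2][2]=%d xmm[8][14]=%d\n",
           xmm[0][0], xmm[2][2], xmm[8][14]);   /* 1, 1, 15 */
    return 0;
}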
2011 Nov 30
0
[PATCH 2/4] x86/emulator: add emulation of SIMD FP moves
...: Jan Beulich <jbeulich@suse.com> --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -629,6 +629,60 @@ int main(int argc, char **argv) else printf("skipped\n"); + printf("%-40s", "Testing movsd %xmm5,(%ecx)..."); + memset(res, 0x77, 64); + memset(res + 10, 0x66, 8); + if ( stack_exec && cpu_has_sse2 ) + { + extern const unsigned char movsd_to_mem[]; + + asm volatile ( "movlpd %0, %%xmm5\n\t" + "movhpd %0, %%xmm5\n"...
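The patch's test relies on movsd storing exactly 8 bytes. A standalone host-side sketch of that property (not the Xen test harness; assumes a GCC/Clang x86-64 toolchain):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    double src = 3.5;
    /* Fill the qword after the store target so we can see it survive. */
    struct { double lo; uint64_t hi; } buf = { 0.0, 0x7777777777777777ull };

    asm volatile ("movsd %1, %%xmm5\n\t"   /* load low qword of xmm5 */
                  "movsd %%xmm5, %0"       /* 8-byte store, not 16 */
                  : "=m" (buf.lo)
                  : "m" (src)
                  : "xmm5");

    /* hi must still read 7777777777777777 */
    printf("lo=%f hi=%016llx\n", buf.lo, (unsigned long long)buf.hi);
    return 0;
}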
2017 Jul 05
0
expand gridded matrix to higher resolution
...? >I tried matrix with rep, but I am missing some magic there, since it >doesn't do what we need. >replicate might be promising, but then still need to rearrange the >output into the column and row format we need. > >A simple example: >mm=matrix(1:15,nrow=3,byrow = T) >xmm=matrix(NA,nrow=nrow(mm)*3,ncol=ncol(mm)*3) >for(icol in 1:ncol(mm)) { > for(irow in 1:nrow(mm)) { > xicol=(icol-1)*3 +c(1:3) > xirow=(irow-1)*3 +c(1:3) > xmm[xirow,xicol]=mm[irow,icol] > } >} >mm >> > mm >> [,1] [,2] [,3] [,4] [,5] >> [1,...
2019 Jun 04
2
variadic functions on X86_64 should (conditionally) save XMM regs even if -no-implicit-float
...n't try to push it upstream (and in fact have considered removing it) is due to an unfortunate side-effect which is either "expected" or a "bug" depending on your perspective. The problem is that now, when optimization is disabled, the compiler will *unconditionally* access XMM registers in the prolog of varargs functions. This is *not* the usual code to spill floating point varargs arguments (which is correctly guarded by testing %al). Instead the compiler: Unconditionally: - spills the XMM argument registers Conditionally: - reloads those values - stores them in the v...
2017 Jul 05
0
expand gridded matrix to higher resolution
...ready? > I tried matrix with rep, but I am missing some magic there, since it doesn't do what we need. > replicate might be promising, but then still need to rearrange the output into the column and row format we need. > > A simple example: > mm=matrix(1:15,nrow=3,byrow = T) > xmm=matrix(NA,nrow=nrow(mm)*3,ncol=ncol(mm)*3) > for(icol in 1:ncol(mm)) { > for(irow in 1:nrow(mm)) { > xicol=(icol-1)*3 +c(1:3) > xirow=(irow-1)*3 +c(1:3) > xmm[xirow,xicol]=mm[irow,icol] > } > } > mm >> > mm >> [,1] [,2] [,3] [,4] [,5] >...
2017 Nov 28
2
variadic functions on X86_64 should (conditionally) save XMM regs even if -no-implicit-float
...deally we would do so only if non-GPR register arguments are actually passed by the caller. This would require a runtime check, and whether or not such a check is feasible depends on the target and ABI. However for X86_64, the standard va_start code already has such a check - the number of vector (XMM) register arguments is stored in %al, and the code normally generated for variadic functions (in the absence of -no-implicit-float) includes a guard around the XMM spill code that checks for %al != 0. Therefore I believe it would be "in the spirit" of -no-implicit-float to remove the NoI...
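A minimal sketch of that mechanism, assuming the standard SysV x86-64 ABI: the caller places the number of vector-register arguments in %al, and the usual va_start prologue tests %al before spilling xmm0-xmm7 to the register save area.

#include <stdarg.h>
#include <stdio.h>

/* The compiled prologue of this function normally guards its xmm spill
 * code with a test of %al, which the caller sets to the number of
 * vector registers used for the call. */
double sum(int n, ...)
{
    va_list ap;
    double s = 0;

    va_start(ap, n);
    for (int i = 0; i < n; i++)
        s += va_arg(ap, double);
    va_end(ap);
    return s;
}

int main(void)
{
    printf("%f\n", sum(3, 1.0, 2.0, 3.0));   /* caller sets %al = 3 here */
    return 0;
}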
2019 Aug 09
0
[RFC PATCH v6 79/92] kvm: x86: emulate movsd xmm, m64
...+++ b/arch/x86/kvm/emulate.c @@ -1177,6 +1177,27 @@ static int em_fnstsw(struct x86_emulate_ctxt *ctxt) return X86EMUL_CONTINUE; } +static u8 simd_prefix_to_bytes(const struct x86_emulate_ctxt *ctxt, + int simd_prefix) +{ + u8 bytes; + + switch (ctxt->b) { + case 0x11: + /* movsd xmm, m64 */ + /* movups xmm, m128 */ + if (simd_prefix == 0xf2) { + bytes = 8; + break; + } + /* fallthrough */ + default: + bytes = 16; + break; + } + return bytes; +} + static void decode_register_operand(struct x86_emulate_ctxt *ctxt, struct operand *op) { @@ -1187,7 +1208,7 @@...
2004 Aug 06
2
[PATCH] Make SSE Run Time option.
...y be relevant; however, even though 3DNow extended is a bastardized version of SSE, it still supports the same instructions, and that is what is important - I don't think we intend to add any AMD-specific code. The real issue is cross-CPU SSE support, and whether in addition there is access to XMM registers or not - whether the OS actually supports XMM as well. We have a fair amount of other stuff we do in assembler, much of which requires SSE instruction sets but *not* XMM registers, and some of which is just MMX only. In speex, I can see how you would always want to use the widest reg...
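A sketch of the run-time dispatch shape under discussion. __builtin_cpu_supports (GCC >= 4.8 / Clang) postdates this 2004 thread - at the time this meant hand-written CPUID plus a check that the OS saves XMM state - but the structure is the same:

#include <stdio.h>

int main(void)
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse"))
        puts("SSE path: xmm registers usable");
    else
        puts("fallback path: scalar or MMX only");
    return 0;
}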
2010 Jun 07
1
[LLVMdev] XMM in X86 Backend
Hi all, I am observing an excessive use of xmm registers in the output assembly produced by the x86 backend. Basically, for code like this double test(double a, double b) { double c; c = 1.0 + sin (a + b*b); return c; } llc produced something like... movsd 16(%ebp), %xmm0 mulsd %xmm0, %xmm0 addsd 8(%ebp), %xmm0...
2013 Oct 25
2
[LLVMdev] Bug #16941
Nadav, The problem appears only for vectors longer than the available hardware register (in doubleword elements, i.e. more than 4 on SSE4 and more than 8 on AVX). Select does a weird thing. The <8 x i1> mask comes in as two XMM registers; select converts them to a single XMM register (i.e. 8 x 16 bit), then immediately converts back to two XMM registers and does the blend. The conversion back and forth has huge overhead. I'm attaching 3 files with vectors of length 4, 8 and 16. Try 4 on SSE4 and you'll see that both...
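A C sketch (GCC/Clang vector extensions, not the attached test files) of the pattern involved: an 8 x 32-bit lane-wise select, where the 256-bit type must be legalized into two XMM halves on SSE4:

/* 8 x 32-bit integer vector, 32 bytes wide. */
typedef int v8si __attribute__((vector_size(32)));

v8si blend(v8si mask, v8si a, v8si b)
{
    v8si zero = { 0 };
    v8si m = mask != zero;       /* each lane becomes all-ones or zero */
    return (a & m) | (b & ~m);   /* branchless per-lane select */
}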
2009 Mar 04
1
[LLVMdev] Bug in x86-64/Win64 Calling Convention
Hello, I think I've found a bug in the calling convention support for X86-64/Win64. It doesn't correctly save and restore the XMM registers in the function prolog/epilog. (The problem only exists on Win64, since Linux and Mac OS use a calling convention in which these registers are volatile and not callee-saved.) X86RegisterInfo::getCalleeSavedRegs() when called for a Win64 target does return an array of registers whi...
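A minimal illustration of the ABI difference, assuming GCC/Clang inline asm: on Win64, xmm6-xmm15 are callee-saved, so a function clobbering xmm6 must spill and restore it in its prolog/epilog - exactly the code the report says was missing. On SysV, all xmm registers are volatile and no spill is needed.

void clobber_xmm6(void)
{
    /* Forces the compiler to treat xmm6 as clobbered. */
    asm volatile ("xorps %%xmm6, %%xmm6" ::: "xmm6");
}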
2010 May 07
1
[LLVMdev] Missuse of xmm register on X86-64
All, I've been working on a new scheduler and have somehow affected register selection. My problem is that an xmm register is being used as an index expression. Specifically, addss (%xmm1,%rax,4), %xmm0 I like the idea of a floating-point index, but, like the assembler, I don't know what that means. Any suggestions on where I should look for a solution to my problem? Aran -------------- next part...
2014 Aug 13
1
[PATCH] simpler xmm -> int64 code
This patch simplifies XMM -> int64 conversion in fixed_intrin_sse2.c and fixed_intrin_ssse3.c -------------- next part -------------- A non-text attachment was scrubbed... Name: fixed_sse.zip Type: application/zip Size: 778 bytes Desc: not available Url : http://lists.xiph.org/pipermail/flac-dev/attachments/20140813/49f1...
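The patch itself is the zip attachment; a sketch of the operation it concerns, using SSE2 intrinsics (x86-64 only):

#include <emmintrin.h>
#include <stdint.h>

/* Extract the low 64 bits of an XMM register as an int64. */
static inline int64_t xmm_low_qword(__m128i v)
{
    return _mm_cvtsi128_si64(v);   /* compiles to a single movq */
}

int main(void)
{
    __m128i v = _mm_set_epi64x(0, 42);   /* low qword = 42 */
    return (int)xmm_low_qword(v);        /* returns 42 */
}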
2004 Aug 06
5
SIMD interest
Greetings, my apologies for putting this trash in the mailing list, but the topic about the SSE run-time option interested me pretty much. Looks like some people are really experienced on the topic. I would really appreciate it if somebody could point me to good resources about SSE and Altivec (not necessarily on the net, I'm ready to invest some money if necessary). I already have intel
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
Jean-Marc, There is a big difference between SSE and SSEFP. SSEFP means that the CPU supports the xmm registers. All Intel chips with SSE support do; however, no current 32-bit AMD chips support the XMM registers. They will support the SSE instructions but not those registers. You are right about the SSE2 not being used. The AMD Opterons are the first AMD CPUs which support xmm registers. T...
2012 Jan 09
2
[LLVMdev] Calling conventions for YMM registers on AVX
I'll explain what we see in the code. 1. The caller saves XMM registers across the call if needed (according to DEFS definition). YMMs are not in the set, so the caller does not take care of them. 2. The callee preserves XMMs but works with YMMs and clobbers them. 3. So after the call, the upper part of YMM is gone. - Elena -----Original Message----- From: Bruno Card...
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
Jean-Marc, >I'm still not sure I get it. On an Athlon XP, I can do something like >"mulps xmm0, xmm1", which means that the xmm registers are indeed >supported. Besides, without the xmm registers, you can't use much of >SSE. On the Athlon XP 2400+ that we have in our QA lab (Win2000), if you run that code it generates an Illegal Instruction Error. In addition, an AMD Du...
2012 Jan 09
3
[LLVMdev] Calling conventions for YMM registers on AVX
On Jan 9, 2012, at 10:00 AM, Jakob Stoklund Olesen wrote: > > On Jan 8, 2012, at 11:18 PM, Demikhovsky, Elena wrote: > >> I'll explain what we see in the code. >> 1. The caller saves XMM registers across the call if needed (according to DEFS definition). >> YMMs are not in the set, so caller does not take care. > > This is not how the register allocator works. It saves the registers holding values, it doesn't care which alias is clobbered. > > Are you saying...
2012 Jan 09
0
[LLVMdev] Calling conventions for YMM registers on AVX
On Jan 8, 2012, at 11:18 PM, Demikhovsky, Elena wrote: > I'll explain what we see in the code. > 1. The caller saves XMM registers across the call if needed (according to DEFS definition). > YMMs are not in the set, so caller does not take care. This is not how the register allocator works. It saves the registers holding values, it doesn't care which alias is clobbered. Are you saying that only the xmm part...
2013 Sep 20
0
[LLVMdev] Passing a 256 bit integer vector with XMM registers
I am implementing a new calling convention for X86 which requires passing a 256-bit integer vector in two XMM registers rather than one YMM register. For example: define <8 x i32> @add(<8 x i32> %a, <8 x i32> %b) { %add = add <8 x i32> %a, %b ret <8 x i32> %add } With march=X86-64 and mcpu=corei7-avx, llc with the default calling convention generates the following code...