similar to: Notes on 1.1.4 Windows. Testing of SSE Intrinics Code and others

Displaying 20 results from an estimated 3000 matches similar to: "Notes on 1.1.4 Windows. Testing of SSE Intrinics Code and others"

2013 Jul 19
0
[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX
(Changing subject line as diagnosis has changed) I'm attaching the compiled code that I've been getting, both with CodeGenOpt::Default and CodeGenOpt::None . The crash isn't occurring with CodeGenOpt::None, but that seems to be because ECX isn't being used - it still gets set to 0x7fffffff by one of the calls to 76719BA1 I notice that X86::SQRTPD[m|r] appear in
2008 Jul 12
0
[LLVMdev] Shuffle regression
I have fixed a related bug: 52740. Can you check if that fixes this problem? Evan On Jul 11, 2008, at 6:43 PM, Nicolas Capens wrote: > Hi all, > > I think I found a regression in the shuffle instruction. I’ve > attached a replacement of fibonacci.cpp to reproduce the issue. It > runs fine on release 2.3 but revision 52648 fails, and I suspect > that the issue is still
2008 Jul 12
2
[LLVMdev] Shuffle regression
Hi all, I think I found a regression in the shuffle instruction. I've attached a replacement of fibonacci.cpp to reproduce the issue. It runs fine on release 2.3 but revision 52648 fails, and I suspect that the issue is still present. 2.3 generates the following x86 code: 03A10010 push ebp 03A10011 mov ebp,esp 03A10013 and esp,0FFFFFFF0h 03A10019
2013 Aug 22
2
New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16
libFLAC have three SSE-accelerated functions FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_N (N = 4, 8, 12). They require lpc_order less than N. The best compression preset (flac -8) uses lpc_order up to 12; it means that during encoding FLAC also uses unaccelerated C function. I'm not very familiar with asm so I took FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_12, changed it and
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
All, Attached is a patch that does two things. First it makes the use of the current SSE code a run time option through the use of speex_decoder_ctl() and speex_encoder_ctl It does this twofold. First there is a modification to the configure.in script which introduces a check based upon platform. It will compile in the sse assembly if you are on an i?86 based platform by making a
2008 Jul 10
3
[LLVMdev] InstructionCombining forgets alignment of globals
Hi all, The InstructionCombining pass causes alignment of globals to be ignored. I've attached a replacement of Fibonacci.cpp which reproduces this (I used 2.3 release). Here's the x86 code it produces: 03C20019 movaps xmm0,xmmword ptr ds:[164E799h] 03C20020 mulps xmm0,xmmword ptr ds:[164E79Ah] 03C20027 movaps xmmword ptr ds:[164E799h],xmm0 03C2002E
2008 Jul 10
0
[LLVMdev] InstructionCombining forgets alignment of globals
I think I found it. In InstCombiner::ComputeMaskedBits we have the following lines: if (GlobalValue *GV = dyn_cast<GlobalValue>(V)) { unsigned Align = GV->getAlignment(); if (Align == 0 && TD && GV->getType()->getElementType()->isSized()) Align = TD->getPrefTypeAlignment(GV->getType()->getElementType()); It assumes that global
2007 Jul 20
0
[LLVMdev] Seg faulting on vector ops
Hi Chuck! On Jul 20, 2007, at 11:36 AM, Chuck Rose III wrote: > Hola LLVMers, > > > > I’m looking to make use of the vectorization primitives in the > Intel chip with the code we generate from LLVM and so I’ve started > experimenting with it. What is the state of the machine code > generated for vectors? In my tinkering, I seem to be getting some > wonky
2007 Jul 26
0
[LLVMdev] Seg faulting on vector ops
I am fairly certain this is right. Chuck, can you do a quick experiment for me? Go back to your original code but make sure the alloca instruction specify 16-byte alignment. The code should work. If not, please file a bug. Thanks, Evan On Jul 24, 2007, at 1:58 PM, Evan Cheng wrote: > Hrm. This problem shouldn't be target specific. I am pretty sure > prologue / epilogue inserter
2007 Jul 21
0
[LLVMdev] Seg faulting on vector ops
On Fri, 20 Jul 2007, Chuck Rose III wrote: > I'm looking to make use of the vectorization primitives in the Intel > chip with the code we generate from LLVM and so I've started > experimenting with it. What is the state of the machine code generated > for vectors? In my tinkering, I seem to be getting some wonky machine > instructions, but I'm most likely just doing
2007 Oct 18
3
[LLVMdev] movaps being generated despite alignment 1 being specified
Hello LLVMers, High order bit: Presence of a called function is causing a store on an unrelated vector to generate an aligned store rather an unaligned one despite unaligned store being indicated in the associated StoreInst. Details: I pulled down the latest source, so this is something I'm finding with the current LLVM. I'm hoping you'll have an idea what's
2007 Jul 24
2
[LLVMdev] Seg faulting on vector ops
Hrm. This problem shouldn't be target specific. I am pretty sure prologue / epilogue inserter aligns stack correctly if there are stack objects with greater than default stack alignment requirement. Seems to be the initial alloca() instruction should specify 16 byte alignment? Evan On Jul 21, 2007, at 2:51 PM, Chris Lattner wrote: > On Fri, 20 Jul 2007, Chuck Rose III wrote:
2007 Oct 19
0
[LLVMdev] movaps being generated despite alignment 1 being specified
On Oct 18, 2007, at 1:52 PM, Chuck Rose III wrote: > > Here are the instructions for evaluateDependents. The JITter > hasn’t compiled foo yet. What’s confusing to me is why did my > movups suddenly become a movaps? All the stores and loads have > align 1 on them. Hi Chuck, I believe this is a bug but am unable to reproduce it with the test case you've provided. I
2004 Aug 06
0
Notes on 1.1.4 Windows. Testing of SSE Intrinics Code and others
> 1. Compile Error with regular mode (FIXED_POINT undefined) at lsp.c line 104 > static inline spx_word16_t spx_cos(spx_word16_t x) . VS6 does not like > the inline keyword here. Removing it allows compiling. > > same with cb_search_sse.h line 34. It seems like your compiler simply doesn't like "inline". I suggest doing a -Dinline= which is what autoconf
2016 Apr 01
2
RFC: A proposal for vectorizing loops with calls to math functions using SVML
RFC: A proposal for vectorizing loops with calls to math functions using SVML (short vector math library). ========= Overview ========= Very simply, SVML (Intel short vector math library) functions are vector variants of scalar math functions that take vector arguments, apply an operation to each element, and store the result in a vector register. These vector variants can be generated by the
2007 Jul 20
5
[LLVMdev] Seg faulting on vector ops
Hola LLVMers, I'm looking to make use of the vectorization primitives in the Intel chip with the code we generate from LLVM and so I've started experimenting with it. What is the state of the machine code generated for vectors? In my tinkering, I seem to be getting some wonky machine instructions, but I'm most likely just doing something wrong and I'm hoping you can set me in
2016 Apr 04
2
RFC: A proposal for vectorizing loops with calls to math functions using SVML
Hi Sanjay, For sincos calls, I’m currently just going through isTriviallyVectorizable(), which was good enough to get things working so that I could test the translation. I don’t see why this cannot be changed to use addVectorizableFunctionsFromVecLib(). The other functions that I’m working with are already vectorized using the loop pragma. Those include sin, cos, exp, log, and pow. From: Sanjay
2010 May 11
0
[LLVMdev] How does SSEDomainFix work?
On May 10, 2010, at 9:07 PM, NAKAMURA Takumi wrote: > Hello. This is my 1st post. ようこそ! > I have tried SSE execution domain fixup pass. > But I am not able to see any improvements. Did you actually measure runtime, or did you look at assembly? > I expect for the example below to use MOVDQA, PAND &c. > (On nehalem, ANDPS is extremely slower than PAND) Are you sure? The
2020 Aug 30
5
BUG: complete misunterstanding of the MS-ABI
Objects compiled for the MS-ABI don't conform to it! Data types beyond 64 bit MUST BE returned by the callee via the hidden first argument allocated by the caller, NOT in XMM0! Demo/proof: from this source --- llvm-bug.c --- #ifndef __clang__ typedef struct { unsigned __int64 low; unsigned __int64 high; } __uint128_t; #else __attribute__((ms_abi)) #endif __uint128_t
2012 Jul 06
0
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
On Sat, Jul 7, 2012 at 12:25 AM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote: > On Fri, Jul 6, 2012 at 6:39 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote: >> On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote: >>> [...] >>> movaps 32(%rdi), %xmm3 >>> movaps 48(%rdi), %xmm2 >>>