thr3ads.net - similar to: "[LLVMdev] Passing a 256 bit integer vector with XMM registers"

Displaying 20 results from an estimated 1000 matches similar to: "[LLVMdev] Passing a 256 bit integer vector with XMM registers"

[PATCH] x86: AVX instruction emulation fixes

2013 Aug 28

[PATCH] x86: AVX instruction emulation fixes

- we used the C4/C5 (first prefix) byte instead of the apparent ModR/M one as the second prefix byte - early decoding normalized vex.reg, thus corrupting it for the main consumer (copy_REX_VEX()), resulting in #UD on the two-operand instructions we emulate Also add respective test cases to the testing utility plus - fix get_fpu() (the fall-through order was inverted) - add cpu_has_avx2,

[LLVMdev] inefficient code generation for 128-bit->256-bit typecast intrinsics

2013 Apr 09

[LLVMdev] inefficient code generation for 128-bit->256-bit typecast intrinsics

Hello, LLVM generates two additional instructions for 128->256 bit typecasts (e.g. _mm256_castsi128_si256()) to clear out the upper 128 bits of YMM register corresponding to source XMM register. vxorps xmm2,xmm2,xmm2 vinsertf128 ymm0,ymm2,xmm0,0x0 Most of the industry-standard C/C++ compilers (GCC, Intel's compiler, Visual Studio compiler) don't generate any extra moves

[LLVMdev] Calling conventions for YMM registers on AVX

2012 Jan 10

[LLVMdev] Calling conventions for YMM registers on AVX

This is the wrong code: declare <16 x float> @foo(<16 x float>) define <16 x float> @test(<16 x float> %x, <16 x float> %y) nounwind { entry: %x1 = fadd <16 x float> %x, %y %call = call <16 x float> @foo(<16 x float> %x1) nounwind %y1 = fsub <16 x float> %call, %y ret <16 x float> %y1 } ./llc -mattr=+avx

[RFC][VECLIB] how should we legalize VECLIB calls?

2018 Jun 29

[RFC][VECLIB] how should we legalize VECLIB calls?

Illustrative Example: clang -fveclib=SVML -O3 svml.c -mavx #include <math.h> void foo(double *a, int N){ int i; #pragma clang loop vectorize_width(8) for (i=0;i<N;i++){ a[i] = sin(i); } } Currently, this results in a call to <8 x double> __svml_sin8(<8 x double>) after the vectorizer. This is 8-element SVML sin() called with 8-element argument. On the surface,

Unnecessary spill/fill issue

2016 May 06

Unnecessary spill/fill issue

Hi, I am using mcjit in llvm 3.6 to jit kernels to x86 avx2. I've noticed some inefficient use of the stack around constant vectors. In one example, I have code that computes a series of constant vectors at compile time. Each vector has a single use. In the final asm, I see a series of spills at the top of the function of all the constant vectors immediately to stack, then each use references

[RFC][VECLIB] how should we legalize VECLIB calls?

2018 Jun 29

[RFC][VECLIB] how should we legalize VECLIB calls?

Ashutosh, Thanks for the repy. Related earlier topic on this appears in the review of the SVML patch (@mmasten). Adding few names from there. https://reviews.llvm.org/D19544 There, I see Hal's review comment "let's start only with the directly-legal calls". Apparently, what we have right now in the trunk is "not legal enough". I'll work on the patch to stop

[LLVMdev] ABI incompatability when passing vector parameters on 32-bit x86

2014 Dec 15

[LLVMdev] ABI incompatability when passing vector parameters on 32-bit x86

Hi all, Recently, Reid Kleckner found an ABI incompatibility between clang and GCC in the way vector parameters are passed on 32-bit x86. (This is documented in PR21510.) Specifically, GCC uses XMM0-XMM2 to pass the first 3 __m128 parameters, and the rest are passed on the stack. Clang passes an additional parameter by register, using XMM0-XMM3. The same applies to __m256 with YMM0-2 vs. YMM0-3.

[RFC][VECLIB] how should we legalize VECLIB calls?

2018 Jul 02

[RFC][VECLIB] how should we legalize VECLIB calls?

Adding to Ashutosh's comments, We are also interested in making LLVM generate vector math library calls that are available with glibc (version > 2.22). reference: https://sourceware.org/glibc/wiki/libmvec Using the example case given in the reference, we found there are 2 vector versions for "sin" (4 X double) with same VF namely _ZGVcN4v_sin (avx) version and _ZGVdN4v_sin

[LLVMdev] Selecting Vector Shuffle of Different Types

2009 Dec 02

[LLVMdev] Selecting Vector Shuffle of Different Types

The AVX saga continues. I am attempting to write a pattern for VEXTRACTF128 but am having some problems. My attempt looks something like this: defm EXTRACTF128 : avx_fp_extract_vector_osta_node_mri_256<0x19, MRMDestReg, MRMDestMem, "extractf128", undef, X86f32, X86i32i8, // rr [(set VR128:$dst,

[LLVMdev] Selecting Vector Shuffle of Different Types

2009 Dec 03

[LLVMdev] Selecting Vector Shuffle of Different Types

On Wed, Dec 2, 2009 at 3:46 PM, David Greene <dag at cray.com> wrote: > The AVX saga continues. > > I am attempting to write a pattern for VEXTRACTF128 but am having some > problems. My attempt looks something like this: > > defm EXTRACTF128 : avx_fp_extract_vector_osta_node_mri_256<0x19, MRMDestReg, > MRMDestMem, "extractf128", undef,

[RFC][VECLIB] how should we legalize VECLIB calls?

2018 Jul 02

[RFC][VECLIB] how should we legalize VECLIB calls?

It may not be a full solution for the problems you're trying to solve, but I don't know why adding to include/llvm/CodeGen/RuntimeLibcalls.def is a problem in itself. Certainly, it's a mess that could be organized, especially so we're not repeating everything for each data type as we do right now. So yes, I think that would allow us to remove the VecLib mappings because we are

[LLVMdev] AVX calling convention?

2013 Sep 05

[LLVMdev] AVX calling convention?

I am tracking down an x86-64 code generation problem that has to do with AVX instructions. The symptom is: a function is called, and the upper half of the function argument (which is short16) is zero. This happens only when I compile code with pocl, but not when I use clang and/or llc manually. I tracked this down to the following. The call site looks like vmovdqa 24064(%rsp), %ymm0 vmovdqa

[PATCH] Make SSE Run Time option. Add Win32 SSE code

2004 Aug 06

[PATCH] Make SSE Run Time option. Add Win32 SSE code

All, Attached is a patch that does two things. First it makes the use of the current SSE code a run time option through the use of speex_decoder_ctl() and speex_encoder_ctl It does this twofold. First there is a modification to the configure.in script which introduces a check based upon platform. It will compile in the sse assembly if you are on an i?86 based platform by making a

[LLVMdev] Execution domain for VEXTRACTF128/VINSERTF128

2012 Jan 05

[LLVMdev] Execution domain for VEXTRACTF128/VINSERTF128

I think that it should not belong to any domain. And I see a problem with this table. If you run in AVX mode and call lookup with VEXTRACTF128rr you fail with assertion. - Elena From: Craig Topper [mailto:craig.topper at gmail.com] Sent: Wednesday, January 04, 2012 19:32 To: Demikhovsky, Elena Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Execution domain for VEXTRACTF128/VINSERTF128 What

[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX

2013 Jul 19

[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX

(Changing subject line as diagnosis has changed) I'm attaching the compiled code that I've been getting, both with CodeGenOpt::Default and CodeGenOpt::None . The crash isn't occurring with CodeGenOpt::None, but that seems to be because ECX isn't being used - it still gets set to 0x7fffffff by one of the calls to 76719BA1 I notice that X86::SQRTPD[m|r] appear in

[LLVMdev] More AVX Advice Needed

2009 Dec 02

[LLVMdev] More AVX Advice Needed

I'm working on some of the AVX insert/extract instructions. They're stupid. They do not operate on ymm registers, meaning we have to use VINSERTF128/VEXTRACTF128 and then do the real operation. Anyway, I'm looking at how INSERTPS and friends work and noticed that there are special SelectionDAG nodes for them and corresponding TableGen dag operators (X86insrtps, for example).

New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16

2013 Aug 22

New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16

libFLAC have three SSE-accelerated functions FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_N (N = 4, 8, 12). They require lpc_order less than N. The best compression preset (flac -8) uses lpc_order up to 12; it means that during encoding FLAC also uses unaccelerated C function. I'm not very familiar with asm so I took FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_12, changed it and

KNL Assembly Code for Matrix Multiplication

2017 Jul 01

KNL Assembly Code for Matrix Multiplication

Thank You, It means vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15] zmm22 will contain 64 bit constant values which are indexes here zmm22=8, 9, 10, 11, 12,13,14,15. not the values loaded from these locations. and zmm2 contains constant 4000. so, vpmuludq zmm14, zmm10, zmm2 ; will multiply the indexes values with 4000, as for array b the stride is 4000. zmm14=

[LLVMdev] More AVX Advice Needed

2009 Dec 02

[LLVMdev] More AVX Advice Needed

On Wednesday 02 December 2009 16:51, Eli Friedman wrote: > On Wed, Dec 2, 2009 at 2:44 PM, David Greene <dag at cray.com> wrote: > > I'm working on some of the AVX insert/extract instructions. They're > > stupid. They do not operate on ymm registers, meaning we have to > > use VINSERTF128/VEXTRACTF128 and then do the real operation. > > > > Anyway,

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

2015 Jan 29

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

On Wed, Jan 28, 2015 at 4:47 PM, Chandler Carruth <chandlerc at gmail.com> wrote: > > On Wed, Jan 28, 2015 at 4:05 PM, Ahmed Bougacha <ahmed.bougacha at gmail.com> > wrote: > >> Hi Chandler, >> >> I've been looking at the regressions Quentin mentioned, and filed a PR >> for the most egregious one: http://llvm.org/bugs/show_bug.cgi?id=22377

similar to: [LLVMdev] Passing a 256 bit integer vector with XMM registers