Displaying 20 results from an estimated 1000 matches similar to: "[LLVMdev] Passing a 256 bit integer vector with XMM registers"
2013 Aug 28
3
[PATCH] x86: AVX instruction emulation fixes
- we used the C4/C5 (first prefix) byte instead of the apparent ModR/M
one as the second prefix byte
- early decoding normalized vex.reg, thus corrupting it for the main
consumer (copy_REX_VEX()), resulting in #UD on the two-operand
instructions we emulate
Also add respective test cases to the testing utility plus
- fix get_fpu() (the fall-through order was inverted)
- add cpu_has_avx2,
2013 Apr 09
1
[LLVMdev] inefficient code generation for 128-bit->256-bit typecast intrinsics
Hello,
LLVM generates two additional instructions for 128->256 bit typecasts
(e.g. _mm256_castsi128_si256()) to clear out the upper 128 bits of YMM register corresponding to source XMM register.
vxorps xmm2,xmm2,xmm2
vinsertf128 ymm0,ymm2,xmm0,0x0
Most of the industry-standard C/C++ compilers (GCC, Intel's compiler, Visual Studio compiler) don't
generate any extra moves
2012 Jan 10
0
[LLVMdev] Calling conventions for YMM registers on AVX
This is the wrong code:
declare <16 x float> @foo(<16 x float>)
define <16 x float> @test(<16 x float> %x, <16 x float> %y) nounwind {
entry:
%x1 = fadd <16 x float> %x, %y
%call = call <16 x float> @foo(<16 x float> %x1) nounwind
%y1 = fsub <16 x float> %call, %y
ret <16 x float> %y1
}
./llc -mattr=+avx
2018 Jun 29
2
[RFC][VECLIB] how should we legalize VECLIB calls?
Illustrative Example:
clang -fveclib=SVML -O3 svml.c -mavx
#include <math.h>
void foo(double *a, int N){
int i;
#pragma clang loop vectorize_width(8)
for (i=0;i<N;i++){
a[i] = sin(i);
}
}
Currently, this results in a call to <8 x double> __svml_sin8(<8 x double>) after the vectorizer.
This is 8-element SVML sin() called with 8-element argument. On the surface,
2016 May 06
3
Unnecessary spill/fill issue
Hi, I am using mcjit in llvm 3.6 to jit kernels to x86 avx2. I've noticed
some inefficient use of the stack around constant vectors. In one example,
I have code that computes a series of constant vectors at compile time.
Each vector has a single use. In the final asm, I see a series of spills at
the top of the function of all the constant vectors immediately to stack,
then each use references
2018 Jun 29
2
[RFC][VECLIB] how should we legalize VECLIB calls?
Ashutosh,
Thanks for the repy.
Related earlier topic on this appears in the review of the SVML patch (@mmasten). Adding few names from there.
https://reviews.llvm.org/D19544
There, I see Hal's review comment "let's start only with the directly-legal calls". Apparently, what we have right now
in the trunk is "not legal enough". I'll work on the patch to stop
2014 Dec 15
2
[LLVMdev] ABI incompatability when passing vector parameters on 32-bit x86
Hi all,
Recently, Reid Kleckner found an ABI incompatibility between clang and GCC in the way vector parameters are passed on 32-bit x86.
(This is documented in PR21510.)
Specifically, GCC uses XMM0-XMM2 to pass the first 3 __m128 parameters, and the rest are passed on the stack. Clang passes an additional parameter by register, using XMM0-XMM3. The same applies to __m256 with YMM0-2 vs. YMM0-3.
2018 Jul 02
2
[RFC][VECLIB] how should we legalize VECLIB calls?
Adding to Ashutosh's comments, We are also interested in making LLVM
generate vector math library calls that are available with glibc (version >
2.22).
reference: https://sourceware.org/glibc/wiki/libmvec
Using the example case given in the reference, we found there are 2 vector
versions for "sin" (4 X double) with same VF namely _ZGVcN4v_sin (avx)
version and _ZGVdN4v_sin
2009 Dec 02
5
[LLVMdev] Selecting Vector Shuffle of Different Types
The AVX saga continues.
I am attempting to write a pattern for VEXTRACTF128 but am having some
problems. My attempt looks something like this:
defm EXTRACTF128 : avx_fp_extract_vector_osta_node_mri_256<0x19, MRMDestReg,
MRMDestMem, "extractf128", undef, X86f32, X86i32i8,
// rr
[(set VR128:$dst,
2009 Dec 03
0
[LLVMdev] Selecting Vector Shuffle of Different Types
On Wed, Dec 2, 2009 at 3:46 PM, David Greene <dag at cray.com> wrote:
> The AVX saga continues.
>
> I am attempting to write a pattern for VEXTRACTF128 but am having some
> problems. My attempt looks something like this:
>
> defm EXTRACTF128 : avx_fp_extract_vector_osta_node_mri_256<0x19, MRMDestReg,
> MRMDestMem, "extractf128", undef,
2018 Jul 02
2
[RFC][VECLIB] how should we legalize VECLIB calls?
It may not be a full solution for the problems you're trying to solve, but
I don't know why adding to include/llvm/CodeGen/RuntimeLibcalls.def is a
problem in itself. Certainly, it's a mess that could be organized,
especially so we're not repeating everything for each data type as we do
right now.
So yes, I think that would allow us to remove the VecLib mappings because
we are
2013 Sep 05
1
[LLVMdev] AVX calling convention?
I am tracking down an x86-64 code generation problem that has to do with AVX instructions. The symptom is: a function is called, and the upper half of the function argument (which is short16) is zero. This happens only when I compile code with pocl, but not when I use clang and/or llc manually.
I tracked this down to the following. The call site looks like
vmovdqa 24064(%rsp), %ymm0
vmovdqa
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
All,
Attached is a patch that does two things. First it makes the use
of the current SSE code a run time option through the use
of speex_decoder_ctl() and speex_encoder_ctl
It does this twofold. First there is a modification to the configure.in
script which introduces a check based upon platform. It will compile in the
sse assembly if you are on an i?86 based platform by making a
2012 Jan 05
1
[LLVMdev] Execution domain for VEXTRACTF128/VINSERTF128
I think that it should not belong to any domain.
And I see a problem with this table. If you run in AVX mode and call lookup with VEXTRACTF128rr you fail with assertion.
- Elena
From: Craig Topper [mailto:craig.topper at gmail.com]
Sent: Wednesday, January 04, 2012 19:32
To: Demikhovsky, Elena
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Execution domain for VEXTRACTF128/VINSERTF128
What
2013 Jul 19
0
[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX
(Changing subject line as diagnosis has changed)
I'm attaching the compiled code that I've been getting, both with
CodeGenOpt::Default and CodeGenOpt::None . The crash isn't occurring
with CodeGenOpt::None, but that seems to be because ECX isn't being used
- it still gets set to 0x7fffffff by one of the calls to 76719BA1
I notice that X86::SQRTPD[m|r] appear in
2009 Dec 02
2
[LLVMdev] More AVX Advice Needed
I'm working on some of the AVX insert/extract instructions. They're
stupid. They do not operate on ymm registers, meaning we have to
use VINSERTF128/VEXTRACTF128 and then do the real operation.
Anyway, I'm looking at how INSERTPS and friends work and noticed that
there are special SelectionDAG nodes for them and corresponding TableGen
dag operators (X86insrtps, for example).
2013 Aug 22
2
New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16
libFLAC have three SSE-accelerated functions FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_N (N = 4, 8, 12). They require lpc_order less than N.
The best compression preset (flac -8) uses lpc_order up to 12; it means that during encoding FLAC also uses unaccelerated C function.
I'm not very familiar with asm so I took FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_12, changed it and
2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
Thank You,
It means vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 =
[8,9,10,11,12,13,14,15] zmm22 will contain 64 bit constant values which are
indexes here zmm22=8, 9, 10, 11, 12,13,14,15. not the values loaded from
these locations. and zmm2 contains constant 4000. so,
vpmuludq zmm14, zmm10, zmm2 ; will multiply the indexes values with 4000,
as for array b the stride is 4000.
zmm14=
2009 Dec 02
2
[LLVMdev] More AVX Advice Needed
On Wednesday 02 December 2009 16:51, Eli Friedman wrote:
> On Wed, Dec 2, 2009 at 2:44 PM, David Greene <dag at cray.com> wrote:
> > I'm working on some of the AVX insert/extract instructions. They're
> > stupid. They do not operate on ymm registers, meaning we have to
> > use VINSERTF128/VEXTRACTF128 and then do the real operation.
> >
> > Anyway,
2015 Jan 29
0
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
On Wed, Jan 28, 2015 at 4:47 PM, Chandler Carruth <chandlerc at gmail.com>
wrote:
>
> On Wed, Jan 28, 2015 at 4:05 PM, Ahmed Bougacha <ahmed.bougacha at gmail.com>
> wrote:
>
>> Hi Chandler,
>>
>> I've been looking at the regressions Quentin mentioned, and filed a PR
>> for the most egregious one: http://llvm.org/bugs/show_bug.cgi?id=22377