Displaying 20 results from an estimated 5000 matches similar to: "[LLVMdev] Memory alignment model on AVX, AVX2 and AVX-512 targets"
2014 Dec 15
2
[LLVMdev] Memory alignment model on AVX, AVX2 and AVX-512 targets
AFAIK, there is no additional penalty for AMD processors.
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Chandler Carruth
Sent: Monday, December 15, 2014 3:57 AM
To: Demikhovsky, Elena
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Memory alignment model on AVX, AVX2 and AVX-512 targets
FWIW, this makes sense to me. I'd be interested to hear from
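A minimal sketch (illustrative IR, not taken from the thread) of where the alignment model shows up: a 256-bit load that is only known to be 16-byte aligned cannot use vmovaps, so the per-microarchitecture penalty for unaligned accesses is what matters here.

define <8 x float> @load_underaligned(<8 x float>* %p) {
  ; Only 16-byte alignment is known for a 32-byte vector, so with -mattr=+avx
  ; the backend must emit an unaligned access such as vmovups.
  %v = load <8 x float>, <8 x float>* %p, align 16
  ret <8 x float> %v
}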
2016 Nov 24
3
RFC: code size reduction in X86 by replacing EVEX with VEX encoding
> I would like a command line option to disable this optimization. That way tests can still verify that EVEX instructions came out of isel by using -show-mc-encoding.
I think that keeping test compatibility is not a reason for an additional “llc” flag. We check encodings in the test/MC/X86 dir.
Is there any option to report-out from llc in non-debug mode? It should be an option to control
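A hedged sketch (triple and function names are illustrative, not from the RFC) of the kind of lit test the quoted reply refers to: with -show-mc-encoding the emitted encoding bytes are printed, so FileCheck can tell a VEX-encoded instruction (0xc4/0xc5 prefix byte) from an EVEX-encoded one (0x62 prefix byte).

; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vl -show-mc-encoding | FileCheck %s
define <4 x float> @add128(<4 x float> %a, <4 x float> %b) {
  ; CHECK: vaddps {{.*}} encoding:
  %r = fadd <4 x float> %a, %b
  ret <4 x float> %r
}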
2016 Apr 11
2
X86 TRUNCATE cost for AVX & AVX2 mode
Hi,
I was going through X86TTIImpl::getCastInstrCost and have a question about the cost
calculation for the TRUNCATE instruction in AVX mode.
In the AVX2ConversionTbl and AVXConversionTbl tables there is no cost defined for
TRUNCATE v16i32 to v16i8, so as a fallback the lookup goes to the SSE41ConversionTbl table, where
it finds a cost of 30 for this operation. A cost of 30 for this operation looks very high.
Wondering why
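A small reproducer (illustrative, not from the thread) for the cost in question; running the cost-model analysis over it, e.g. opt -cost-model -analyze -mattr=+avx2 on releases of that era, prints the number the vectorizers would see for this trunc.

define <16 x i8> @trunc_v16i32_to_v16i8(<16 x i32> %x) {
  ; The cost of this single instruction is what the conversion tables return.
  %t = trunc <16 x i32> %x to <16 x i8>
  ret <16 x i8> %t
}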
2016 Apr 12
2
X86 TRUNCATE cost for AVX & AVX2 mode
<Copied Cong>
Thanks Elena.
Mostly I was interested in why such a high cost of 30 is kept for TRUNCATE v16i32 to v16i8 in SSE41.
Looking at the code, it appears that TRUNCATE v16i32 to v16i8 in SSE41 is very expensive
compared to SSE2. I feel this number should be the same as, or close to, the cost listed for the same
operation in SSE2ConversionTbl.
The patch below from Cong Hou reduces the cost of the same operation in SSE2
2013 May 20
2
[LLVMdev] VCOMISS instruction in X86
Hi,
I'm looking at scalar and packed instructions in X86.
The instruction VCOMISS is scalar. May I remove SSEPackedSingle/SSEPackedDouble domain from it?
defm VUCOMISS : sse12_ord_cmp<0x2E, FR32, X86cmp, f32, f32mem, loadf32,
"ucomiss", SSEPackedSingle>, TB, VEX, VEX_LIG;
defm VUCOMISD : sse12_ord_cmp<0x2E, FR64, X86cmp, f64,
2016 Nov 23
2
RFC: code size reduction in X86 by replacing EVEX with VEX encoding
I would like a command line option to disable this optimization. That way
tests can still verify that EVEX instructions came out of isel by using
-show-mc-encoding.
On Wed, Nov 23, 2016 at 5:01 AM Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> ------------------------------
>
> *From: *"Gadi via llvm-dev Haber" <llvm-dev at lists.llvm.org>
>
2009 Apr 30
2
[LLVMdev] RFC: AVX Feature Specification
I've been working on adding AVX to LLVM and have run across a number of
questions. Here's the first one.
In some ways AVX is "just another" SSE level. Having AVX implies you have
SSE1-SSE4.2. However, AVX is very different from SSE, and there are a number
of sub-features that may or may not be available on various implementations.
So right now I've done this:
def
2012 Mar 01
3
[LLVMdev] Stack alignment on X86 AVX seems incorrect
Even if you explicitly specify -stack-alignment=16, the aligned movs are still generated.
It is not an issue related to ABI.
See my original mail:
./llc -mattr=+avx -stack-alignment=16 < basic.ll | grep movaps | grep ymm | grep rbp
vmovaps -176(%rbp), %ymm14
vmovaps -144(%rbp), %ymm11
vmovaps -240(%rbp), %ymm13
- Elena
From: Cameron McInally
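A minimal sketch (an assumed reproducer, not the basic.ll from the thread) of the pattern under discussion: a YMM value live across a call has to be spilled, and the spill/reload must either use an unaligned vmovups or get a 32-byte-aligned stack slot.

declare void @clobber_vectors()

define <8 x float> @spill_ymm(<8 x float> %a, <8 x float> %b) {
  %sum = fadd <8 x float> %a, %b
  ; %sum and %b stay live across the call; all YMM registers are caller-saved
  ; in the SysV x86-64 convention, so they are spilled to the stack here.
  call void @clobber_vectors()
  %r = fadd <8 x float> %sum, %b
  ret <8 x float> %r
}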
2012 Mar 01
2
[LLVMdev] Stack alignment on X86 AVX seems incorrect
On Thu, Mar 01, 2012 at 06:16:46PM +0000, Demikhovsky, Elena wrote:
> vmovaps should not access stack if it is not aligned to 32
I'm not completely sure I understand your problem. Are you saying that
the generated code assumes 256-bit alignment, your default stack
alignment is 128-bit, and LLVM doesn't adjust it automatically?
Joerg
2012 Mar 01
3
[LLVMdev] Stack alignment on X86 AVX seems incorrect
Hi Elena,
You're correct. LLVM does not align the stack to 32 bytes for AVX, so
unaligned moves should be used for YMM spills.
I wrote some code to align the stack to 32 bytes when AVX spills are
present; it does break the x86-64 ABI though. If upstream would be
interested in this code, I can arrange with my employer to send a patch to
the mailing list.
-Cameron
On Mar 1, 2012, at 4:09 PM,
2012 Mar 02
0
[LLVMdev] Stack alignment on X86 AVX seems incorrect
Hi Elena,
On Thu, Mar 1, 2012 at 8:28 PM, Demikhovsky, Elena
<elena.demikhovsky at intel.com> wrote:
> Even if you explicitly specify -stack-alignment=16, the aligned movs are
> still generated.
>
> It is not an issue related to ABI.
This looks like PR10841, explanation and the way to solve it:
http://llvm.org/bugs/show_bug.cgi?id=10841
Cheers,
--
Bruno Cardoso Lopes
2012 Mar 01
0
[LLVMdev] Stack alignment on X86 AVX seems incorrect
When the stack is unaligned, LLVM should generate vmovups instead of vmovaps.
- Elena
-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Joerg Sonnenberger
Sent: Thursday, March 01, 2012 20:31
To: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Stack alignment on X86 AVX seems incorrect
On Thu, Mar 01, 2012 at 06:16:46PM +0000,
2012 Jan 09
3
[LLVMdev] Calling conventions for YMM registers on AVX
On Jan 9, 2012, at 10:00 AM, Jakob Stoklund Olesen wrote:
>
> On Jan 8, 2012, at 11:18 PM, Demikhovsky, Elena wrote:
>
>> I'll explain what we see in the code.
>> 1. The caller saves XMM registers across the call if needed (according to DEFS definition).
>> YMMs are not in the set, so caller does not take care.
>
> This is not how the register allocator
2012 Jan 09
2
[LLVMdev] Calling conventions for YMM registers on AVX
I'll explain what we see in the code.
1. The caller saves XMM registers across the call if needed (according to DEFS definition).
YMMs are not in the set, so the caller does not save them.
2. The callee preserves XMMs but works with YMMs and clobbers them.
3. So after the call, the upper part of YMM is gone.
- Elena
-----Original Message-----
From: Bruno Cardoso Lopes [mailto:bruno.cardoso at
2016 Feb 25
2
how to force llvm generate gather intrinsic
It seems that http://reviews.llvm.org/D15690 only implemented
gather/scatter for AVX-512, but not for AVX/AVX2. Is there any plan to
enable gather for AVX/AVX2? Thanks.
Best,
Zhi
On Thu, Feb 25, 2016 at 8:28 AM, Sanjay Patel <spatel at rotateright.com>
wrote:
> I don't think gather has been enabled for AVX2 as of r261875.
> Masked load/store were enabled for AVX with:
>
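One direct way to get a gather (a sketch; the exact intrinsic name mangling differs across LLVM versions) is to call llvm.masked.gather yourself rather than waiting for the vectorizer to form it; whether it then lowers to vpgatherdd or to a scalar sequence depends on the subtarget.

declare <4 x i32> @llvm.masked.gather.v4i32(<4 x i32*>, i32, <4 x i1>, <4 x i32>)

define <4 x i32> @gather4(<4 x i32*> %ptrs, <4 x i1> %mask, <4 x i32> %passthru) {
  ; Arguments: pointers, alignment, mask, pass-through value for masked-off lanes.
  %g = call <4 x i32> @llvm.masked.gather.v4i32(<4 x i32*> %ptrs, i32 4, <4 x i1> %mask, <4 x i32> %passthru)
  ret <4 x i32> %g
}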
2016 Feb 26
2
how to force llvm generate gather intrinsic
If I'm understanding correctly, you're saying that vgather* is slow on all
of Excavator, Haswell, Broadwell, and Skylake (client). Therefore, we will
not generate it for any of those machines.
Even if that's true, we should not define "gatherIsSlow()" as "hasAVX2() &&
!hasAVX512()". It could break for some hypothetical future processor that
manages to
2017 Jun 25
2
AVX Scheduling and Parallelism
Hi Ahmed,
From what can be seen in the code snippet you provided, the reuse of XMM0 and XMM1 across loop-unroll instances does not inhibit instruction-level parallelism.
Modern X86 processors use register renaming that can eliminate the dependencies in the instruction stream. In the example you provided, the processor should be able to identify the 2-vloads + vadd + vstore sequences as
2017 Jun 25
0
AVX Scheduling and Parallelism
Hi, Zvi,
I agree. In the context of targeting the KNL, however, I'm a bit
concerned about the addressing, and specifically, the size of the
resulting encoding:
> vmovdqu32 zmm0, zmmword ptr [rax + c+401280]    ; load b[401280] into zmm0
>
> vpaddd zmm1, zmm1, zmmword ptr [rax + b+401344] ; zmm1 <- zmm1 + b[401344]
The KNL can only
2016 Feb 26
0
how to force llvm generate gather intrinsic
That makes great sense. It would be great if we had a profitability mode to
see whether it is worthwhile to use gathers. It would also be good if there were a
compiler option that lets users make LLVM generate gather
instructions regardless of whether they are faster or slower.
Best,
Zhi
On Fri, Feb 26, 2016 at 12:49 PM, Sanjay Patel <spatel at rotateright.com>
wrote:
> If I'm understanding
2016 Feb 25
2
how to force llvm generate gather intrinsic
Yes, masked load/store/gather/scatter are completed.
- Elena
From: zhi chen [mailto:zchenhn at gmail.com]
Sent: Thursday, February 25, 2016 01:20
To: Demikhovsky, Elena <elena.demikhovsky at intel.com>
Cc: Sanjay Patel <spatel at rotateright.com>; Nema, Ashutosh <Ashutosh.Nema at amd.com>; llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] how to
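For completeness, a hedged sketch of the masked-load counterpart mentioned above (again, the exact intrinsic mangling varies between LLVM versions): only lanes whose mask bit is set are loaded, and the remaining lanes take the pass-through value.

declare <8 x float> @llvm.masked.load.v8f32(<8 x float>*, i32, <8 x i1>, <8 x float>)

define <8 x float> @masked_load(<8 x float>* %p, <8 x i1> %m, <8 x float> %pt) {
  ; Arguments: pointer, alignment, mask, pass-through vector.
  %v = call <8 x float> @llvm.masked.load.v8f32(<8 x float>* %p, i32 4, <8 x i1> %m, <8 x float> %pt)
  ret <8 x float> %v
}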