Displaying 20 results from an estimated 7000 matches similar to: "[LLVMdev] [3.6 Release] Bugfixes in Masked Load/Store"
2014 Oct 27
4
[LLVMdev] Adding masked vector load and store intrinsics
we just follow a common recommendation to start with intrinsics:
http://llvm.org/docs/ExtendingLLVM.html
- Elena
From: Owen Anderson [mailto:resistor at mac.com]
Sent: Sunday, October 26, 2014 23:57
To: Demikhovsky, Elena
Cc: llvmdev at cs.uiuc.edu; dag at cray.com
Subject: Re: [LLVMdev] Adding masked vector load and store intrinsics
What is the motivation for using intrinsics
2014 Oct 28
2
[LLVMdev] Adding masked vector load and store intrinsics
Many overloaded intrinsics may be replaced with instructions, e.g. fabs or fma or sqrt.
Chandler will probably explain the criteria. What's the difference between fma and fadd? Or fptrunc and fabs?
A new instruction like
%a = loadm <4 x i32>* %addr, <4 x i32> %passthru, i32 4, <4 x i1>%mask
is possible, but may not be very useful for most targets.
So we start from intrinsics.
-
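For reference, the masked load intrinsic that later landed in LLVM (llvm.masked.load) has roughly the shape below; the exact name mangling and alignment handling have changed across releases, so treat this as a sketch rather than the definitive signature:

  ; arguments: pointer, alignment, mask, passthru
  declare <4 x i32> @llvm.masked.load.v4i32(<4 x i32>*, i32, <4 x i1>, <4 x i32>)

  ; intrinsic counterpart of the loadm instruction sketched above
  %a = call <4 x i32> @llvm.masked.load.v4i32(<4 x i32>* %addr, i32 4, <4 x i1> %mask, <4 x i32> %passthru)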
2014 Oct 24
20
[LLVMdev] Adding masked vector load and store intrinsics
Hi,
We would like to add support for masked vector loads and stores by introducing new target-independent intrinsics. The loop vectorizer will then be enhanced to optimize loops containing conditional memory accesses by generating these intrinsics for existing targets such as AVX2 and AVX-512. The vectorizer will first ask the target about availability of masked vector loads and stores. The SLP
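As a rough illustration of the kind of conditional loop this proposal targets, a guarded access such as "if (trigger[i]) out[i] = a[i] + 1;" could be vectorized with a lane mask plus masked load/store. The variable names and exact intrinsic mangling below are illustrative only:

  %t    = load <4 x i32>, <4 x i32>* %trigger.vec, align 4
  %mask = icmp ne <4 x i32> %t, zeroinitializer
  ; read a[i] only in lanes where the guard is true; other lanes get the passthru value (undef here)
  %a    = call <4 x i32> @llvm.masked.load.v4i32(<4 x i32>* %a.vec, i32 4, <4 x i1> %mask, <4 x i32> undef)
  %sum  = add <4 x i32> %a, <i32 1, i32 1, i32 1, i32 1>
  ; write back only the lanes where the guard is true
  call void @llvm.masked.store.v4i32(<4 x i32> %sum, <4 x i32>* %out.vec, i32 4, <4 x i1> %mask)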
2014 Oct 24
2
[LLVMdev] Adding masked vector load and store intrinsics
> Why can't we represent the loads as select(mask, load(addr), passthru)?
This suggests that masked-off lanes are free to speculatively load from memory, whereas the proposed semantics is that:
> The addressed memory will not be touched for masked-off lanes. In
> particular, if all lanes are masked off no address will be accessed.
Ayal.
-----Original Message-----
From: llvmdev-bounces at
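A brief IR sketch of the distinction drawn above: in the select form the load itself still executes for every lane, so masked-off lanes can fault, whereas the masked intrinsic leaves their memory untouched. The intrinsic name and mangling are illustrative:

  ; select form: memory is touched for all lanes, then unwanted lanes are discarded
  %v = load <4 x i32>, <4 x i32>* %addr, align 4
  %r = select <4 x i1> %mask, <4 x i32> %v, <4 x i32> %passthru

  ; masked form: addresses of masked-off lanes are never accessed
  %r2 = call <4 x i32> @llvm.masked.load.v4i32(<4 x i32>* %addr, i32 4, <4 x i1> %mask, <4 x i32> %passthru)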
2016 Feb 25
2
how to force llvm generate gather intrinsic
It seems that http://reviews.llvm.org/D15690 only implemented
gather/scatter for AVX-512, but not for AVX/AVX2. Is there any plan to
enable gather for AVX/AVX2? Thanks.
Best,
Zhi
On Thu, Feb 25, 2016 at 8:28 AM, Sanjay Patel <spatel at rotateright.com>
wrote:
> I don't think gather has been enabled for AVX2 as of r261875.
> Masked load/store were enabled for AVX with:
>
2016 Feb 26
2
how to force llvm generate gather intrinsic
If I'm understanding correctly, you're saying that vgather* is slow on all
of Excavator, Haswell, Broadwell, and Skylake (client). Therefore, we will
not generate it for any of those machines.
Even if that's true, we should not define "gatherIsSlow()" as "hasAVX2() &&
!hasAVX512()". It could break for some hypothetical future processor that
manages to
2014 Oct 24
3
[LLVMdev] Adding masked vector load and store intrinsics
> For the loads, I'm much less sure. Why can't we represent the loads as select(mask, load(addr), passthru)? It is true that the load might get separated from the select so that isel might not see it (because isel is basic-block local), but we can add some code in CodeGenPrep to fix that for targets on which it is useful to do so (which is a more general solution than the intrinsic
2016 Feb 25
2
how to force llvm generate gather intrinsic
Yes, masked load/store/gather/scatter are completed.
- Elena
From: zhi chen [mailto:zchenhn at gmail.com]
Sent: Thursday, February 25, 2016 01:20
To: Demikhovsky, Elena <elena.demikhovsky at intel.com>
Cc: Sanjay Patel <spatel at rotateright.com>; Nema, Ashutosh <Ashutosh.Nema at amd.com>; llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] how to
2016 Feb 26
0
how to force llvm generate gather intrinsic
That makes great sense. It would be great if we had a profitability model to decide whether gathers are worth using. It would also be good to have a compiler option that lets users make LLVM generate the gather instructions regardless of whether they are fast or slow.
Best,
Zhi
On Fri, Feb 26, 2016 at 12:49 PM, Sanjay Patel <spatel at rotateright.com>
wrote:
> If I'm understanding
2016 Feb 26
0
how to force llvm generate gather intrinsic
No. The gather operation is slow on AVX2 processors.
- Elena
From: zhi chen [mailto:zchenhn at gmail.com]
Sent: Thursday, February 25, 2016 20:48
To: Sanjay Patel <spatel at rotateright.com>
Cc: Demikhovsky, Elena <elena.demikhovsky at intel.com>; Nema, Ashutosh <Ashutosh.Nema at amd.com>; llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] how to force
2016 Feb 25
0
how to force llvm generate gather intrinsic
I don't think gather has been enabled for AVX2 as of r261875.
Masked load/store were enabled for AVX with:
http://reviews.llvm.org/D16528 / http://reviews.llvm.org/rL258675
On Wed, Feb 24, 2016 at 11:39 PM, Demikhovsky, Elena <
elena.demikhovsky at intel.com> wrote:
> Yes, masked load/store/gather/scatter are completed.
>
> - Elena
2016 Feb 24
0
how to force llvm generate gather intrinsic
Hi Elena,
Are the masked_load and gather working now?
Best,
Zhi
On Sat, Jan 23, 2016 at 12:06 PM, Demikhovsky, Elena <
elena.demikhovsky at intel.com> wrote:
> Can we legalize the same set of masked load/store operations for AVX1
> as AVX2?
>
> Yes, of course.
>
> - Elena
>
> From: Sanjay Patel [mailto:spatel at
2016 Jan 23
3
how to force llvm generate gather intrinsic
Thanks for your response, Sanjay. I know there are intrinsics available in
C/C++. But the problem is that I want to instrument my code at the IR level
and generate those instructions. I don't want to touch the source code.
Best,
Zhi
On Fri, Jan 22, 2016 at 4:54 PM, Sanjay Patel <spatel at rotateright.com>
wrote:
> I was just looking at the related masked load/store operations, and
2016 Jan 23
2
how to force llvm generate gather intrinsic
> Can we legalize the same set of masked load/store operations for AVX1 as AVX2?
Yes, of course.
- Elena
From: Sanjay Patel [mailto:spatel at rotateright.com]
Sent: Saturday, January 23, 2016 18:42
To: Nema, Ashutosh <Ashutosh.Nema at amd.com>
Cc: Demikhovsky, Elena <elena.demikhovsky at intel.com>; zhi chen <zchenhn at gmail.com>; llvm-dev <llvm-dev at
2014 Oct 24
6
[LLVMdev] Adding masked vector load and store intrinsics
> On Oct 24, 2014, at 10:57 AM, Adam Nemet <anemet at apple.com> wrote:
>
> On Oct 24, 2014, at 4:24 AM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:
>
>> Hi,
>>
>> We would like to add support for masked vector loads and stores by introducing new target-independent intrinsics. The loop
2015 Mar 15
2
[LLVMdev] Indexed Load and Store Intrinsics - proposal
Hi Hao,
I started to upstream it, and the second patch is now stalled under review.
- Elena
-----Original Message-----
From: Hao Liu [mailto:haoliuts at gmail.com]
Sent: Friday, March 13, 2015 05:56
To: Demikhovsky, Elena
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Indexed Load and Store Intrinsics - proposal
Hi Elena,
I think such intrinsics are very useful.
Do you have any plan to
2016 Jan 23
2
how to force llvm generate gather intrinsic
Thanks Sanjay for highlighting this. A few days back I also faced a similar problem
while generating a masked store in AVX1 mode, and found it is only supported under
AVX2; otherwise we scalarize it.
> 1) I did not switch-on masked_load/store to AVX1, I can do this.
Yes Elena, this should be supported for FP types in AVX1 mode (for INT types, I doubt X86 has masked load/store instructions in AVX1 mode).
2014 Dec 18
8
[LLVMdev] Indexed Load and Store Intrinsics - proposal
Hi,
Recent Intel architectures, AVX2 and AVX-512, provide vector gather and/or scatter instructions.
Gather/scatter instructions allow read/write access to multiple memory addresses. The addresses are specified using a base address and a vector of indices.
We'd like Vectorizers to tap this functionality, and propose to do so by introducing new intrinsics:
VectorValue = @llvm.sindex.load
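The signature of the proposed @llvm.sindex.load is cut off above. As a sketch of the same base-plus-vector-of-indices access pattern, later LLVM releases express it with a vector GEP feeding a masked gather; the exact mangling and alignment argument vary by version, so this is illustrative only:

  ; per-lane addresses: base + indices[lane]
  %ptrs = getelementptr inbounds i32, i32* %base, <4 x i64> %indices
  ; load the active lanes; masked-off lanes receive the passthru value (undef here)
  %v = call <4 x i32> @llvm.masked.gather.v4i32(<4 x i32*> %ptrs, i32 4, <4 x i1> %mask, <4 x i32> undef)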
2016 Apr 12
2
X86 TRUNCATE cost for AVX & AVX2 mode
<Copied Cong>
Thanks Elena.
Mostly I was interested in why such a high cost (30) is kept for TRUNCATE v16i32 to v16i8 in SSE41.
Looking at the code, it appears that TRUNCATE v16i32 to v16i8 in SSE41 is very expensive
compared to SSE2. I feel this number should be the same as, or close to, the cost mentioned
for the same operation in the SSE2ConversionTbl.
The patch below from Cong Hou reduces the cost for the same operation in SSE2
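For reference, the IR operation whose cost is under discussion is simply the vector truncate below; the question is what the X86 cost tables should charge for it under SSE4.1 versus SSE2:

  ; narrow sixteen 32-bit lanes to sixteen 8-bit lanes
  %t = trunc <16 x i32> %v to <16 x i8>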
2014 Dec 15
2
[LLVMdev] Memory alignment model on AVX, AVX2 and AVX-512 targets
AFAIK, there is no additional penalty for AMD processors.
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Chandler Carruth
Sent: Monday, December 15, 2014 3:57 AM
To: Demikhovsky, Elena
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Memory alignment model on AVX, AVX2 and AVX-512 targets
FWIW, this makes sense to me. I'd be interested to hear from