Displaying 20 results from an estimated 1000 matches similar to: "Fwd: [PATCH] D17497: Support arbitrary address space for intrinsics"

2016 Mar 04
2
Fwd: [PATCH] D17497: Support arbitrary address space for intrinsics
Per my previous email, I have just signed off on Artur's original patch. Philip On 03/02/2016 11:21 AM, Philip Reames via llvm-dev wrote: > Elena, > > I'd like to propose that we move forward with Artur's original patch > <http://reviews.llvm.org/D17270> and separate the discussion of how we > might change our intrinsic naming scheme. Artur's patch is
2016 Feb 24
0
Fwd: [PATCH] D17497: Support arbitrary address space for intrinsics
My gut feeling is that it’s not worth it. When we move from typed to untyped pointers, we’re going to change the mangling from something like p200i8 to just p200, which is already quite a bit cleaner, and actually looks cleaner to me than the version proposed in this patch. David > On 24 Feb 2016, at 17:28, Philip Reames via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > This
2016 Aug 16
2
enabling interleaved access loop vectorization
Hi Ayal, Elena, I'd really like to enable this by default. As I wrote above, I didn't see any regressions in internal benchmarks, and there doesn't seem to be anything in SPEC2006 either. I do see a performance improvement in an internal benchmark (that is, a real workload). Would you be able to provide an example that gets pessimized? I have no doubt you've seen regressions
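
For context, the transform under discussion targets loops whose memory accesses are strided, touching every k-th element. A minimal C sketch of such a loop (illustrative only, not taken from the thread):

    #include <stddef.h>

    /* Stride-3 (interleaved) accesses: each iteration touches the R, G and
     * B bytes of one packed pixel. Profitable vectorization needs wide
     * loads/stores plus shuffles to de-interleave and re-interleave the
     * lanes, which is what the interleaved-access analysis models. */
    void halve_brightness(unsigned char *rgb, size_t npixels) {
        for (size_t i = 0; i < npixels; ++i) {
            rgb[3 * i + 0] >>= 1; /* R */
            rgb[3 * i + 1] >>= 1; /* G */
            rgb[3 * i + 2] >>= 1; /* B */
        }
    }
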
2016 Sep 01
2
enabling interleaved access loop vectorization
So it turns out it is a full reproducer after all (choosing to vectorize on AVX), good. > The details are in PR29025. Interesting. (So we should carefully insert unconditional branches inside shuffle sequences, eh? ;-) > But if we modify the program by adding "*out++ = 0" right after "*out++ = q;" (thus eliminating the pesky <12 x i8>), we get: Indeed such
2016 Aug 07
2
enabling interleaved access loop vectorization
We checked the gathered data again. All regressions that we see are in 32-bit mode. The 64-bit mode looks good overall. - Elena
2016 Aug 17
2
enabling interleaved access loop vectorization
Thanks Ayal! On Wed, Aug 17, 2016 at 2:14 PM, Zaks, Ayal <ayal.zaks at intel.com> wrote: > Hi Michael, > > Don’t quite have a full reproducer for you yet. You’re welcome to try and > see what’s happening in 32 bit mode when enabling interleaving for the > following, based on “https://en.wikipedia.org/wiki/YIQ#From_RGB_to_YIQ”: > > void rgb2yik
2016 Aug 09
2
enabling interleaved access loop vectorization
Thanks Ayal! I'll take a look at DENBench. As another data point - I tried enabling this on our internal benchmarks. I'm seeing one regression, and it seems to be a regression of the "good" kind - without interleaving we don't vectorize the innermost loop, and with interleaving we do. The vectorized loop is actually significantly faster when benchmarked in isolation, but in
2014 Oct 24
2
[LLVMdev] Adding masked vector load and store intrinsics
> Why can't we represent the loads as select(mask, load(addr), passthru)? This suggests masked-off lanes are free to speculatively load from memory, whereas the proposed semantics is that: > The addressed memory will not be touched for masked-off lanes. In > particular, if all lanes are masked off no address will be accessed. Ayal.
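
Ayal's distinction can be made concrete with a scalar C model (function names invented for exposition):

    #include <stddef.h>

    /* Select-based form: the load executes even for a masked-off lane, so
     * it can fault if a[i] lies on an unmapped page. */
    int select_form(const int *a, _Bool m, int passthru, size_t i) {
        int loaded = a[i];          /* unconditional, speculative access */
        return m ? loaded : passthru;
    }

    /* Proposed intrinsic semantics: memory for masked-off lanes is never
     * touched, so the access is guarded by the mask. */
    int masked_form(const int *a, _Bool m, int passthru, size_t i) {
        return m ? a[i] : passthru; /* no access when m is false */
    }
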
2014 Oct 24
20
[LLVMdev] Adding masked vector load and store intrinsics
Hi, We would like to add support for masked vector loads and stores by introducing new target-independent intrinsics. The loop vectorizer will then be enhanced to optimize loops containing conditional memory accesses by generating these intrinsics for existing targets such as AVX2 and AVX-512. The vectorizer will first ask the target about availability of masked vector loads and stores. The SLP
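
As a hedged sketch of the per-lane semantics being proposed (helper names are hypothetical; the real intrinsics of course operate on whole vectors):

    #include <stdint.h>

    /* Masked store: lanes with a zero mask bit leave memory untouched. */
    void masked_store_v4i32(const int32_t val[4], int32_t *addr,
                            const _Bool mask[4]) {
        for (int i = 0; i < 4; ++i)
            if (mask[i])
                addr[i] = val[i];
    }

    /* Masked load: masked-off lanes take their value from passthru and
     * perform no memory access at all. */
    void masked_load_v4i32(int32_t out[4], const int32_t *addr,
                           const _Bool mask[4], const int32_t passthru[4]) {
        for (int i = 0; i < 4; ++i)
            out[i] = mask[i] ? addr[i] : passthru[i];
    }
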
2016 Sep 25
5
RFC: New intrinsics masked.expandload and masked.compressstore
Hi Elena, Technically speaking, this seems straightforward. I wonder, however, how target-independent this is in a practical sense; will there be an efficient lowering when targeting any other ISA? I don't want to get into the territory where, because the vectorizer is supposed to be architecture independent, we need to add target-independent intrinsics for all
2014 Oct 27
4
[LLVMdev] Adding masked vector load and store intrinsics
We just follow a common recommendation to start with intrinsics: http://llvm.org/docs/ExtendingLLVM.html - Elena On Sunday, October 26, 2014, Owen Anderson <resistor at mac.com> wrote: What is the motivation for using intrinsics
2014 Dec 24
2
[LLVMdev] Indexed Load and Store Intrinsics - proposal
----- Original Message ----- > From: "Ayal Zaks" <ayal.zaks at intel.com> > To: "Philip Reames" <listmail at philipreames.com>, dag at cray.com, "Elena Demikhovsky" <elena.demikhovsky at intel.com> > Cc: "Robert Khasanov" <robert.khasanov at intel.com>, llvmdev at cs.uiuc.edu > Sent: Monday, December 22, 2014 8:05:43 AM
2014 Dec 24
2
[LLVMdev] Indexed Load and Store Intrinsics - proposal
----- Original Message ----- > From: "Xinmin Tian" <xinmin.tian at intel.com> > To: "Hal Finkel" <hfinkel at anl.gov>, "Ayal Zaks" <ayal.zaks at intel.com> > Cc: dag at cray.com, "Robert Khasanov" <robert.khasanov at intel.com>, llvmdev at cs.uiuc.edu > Sent: Tuesday, December 23, 2014 7:36:44 PM > Subject: RE:
2014 Oct 24
3
[LLVMdev] Adding masked vector load and store intrinsics
> For the loads, I'm much less sure. Why can't we represent the loads as select(mask, load(addr), passthru)? It is true that the load might get separated from the select so that isel might not see it (because isel is basic-block local), but we can add some code in CodeGenPrep to fix that for targets on which it is useful to do so (which is a more-general solution than the intrinsic
2016 Sep 26
2
RFC: New intrinsics masked.expandload and masked.compressstore
How would this work in this case? The result would need to affect the legality and cost of the memory instruction. From your poster, it looks like we're talking about loops with constructs like this:

    for (i = 0; i < N; i++) {
      if (topVal > b[i]) {
        *dst = a[i];
        dst++;
      }
    }

is this loop vectorizable at all without these constructs? Good
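
What any lowering, on any ISA, has to accomplish can be modeled in scalar C: a compressing store packs the enabled lanes and writes exactly popcount(mask) consecutive elements (names illustrative):

    #include <stdbool.h>

    /* Model of a compressing store for 8 lanes of int: enabled lanes are
     * packed in lane order into consecutive memory; everything else is
     * left untouched. */
    void compressstore_v8(const int val[8], int *mem, const bool mask[8]) {
        int k = 0;
        for (int lane = 0; lane < 8; ++lane)
            if (mask[lane])
                mem[k++] = val[lane];
    }
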
2014 Oct 28
2
[LLVMdev] Adding masked vector load and store intrinsics
Many overloaded intrinsics may be replaced with instructions - fabs or fma or sqrt. Chandler will probably explain the criteria. What's the difference between fma and fadd? Or fptrunc and fabs? A new instruction like %a = loadm <4 x i32>* %addr, <4 x i32> %passthru, i32 4, <4 x i1> %mask is possible, but may not be very useful for most targets. So we start from intrinsics. -
2014 Dec 21
3
[LLVMdev] Indexed Load and Store Intrinsics - proposal
On 12/18/2014 11:56 AM, dag at cray.com wrote:
> "Demikhovsky, Elena" <elena.demikhovsky at intel.com> writes:
>> Semantics:
>> For i=0,1,…,N-1: if (Mask[i]) { *(BaseAddr + VectorOfIndices[i]*Scale) = VectorValue[i]; }
>> VectorValue: any float or integer vector type.
>> BaseAddr: a pointer; may be zero if full address is placed in the
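
The quoted formula transcribes directly into C; the element type is fixed to float here purely for illustration:

    /* Masked scatter (indexed store): each enabled lane stores to its own
     * computed address, a direct transcription of the semantics above. */
    void indexed_store(char *BaseAddr, const long long VectorOfIndices[],
                       long long Scale, const float VectorValue[],
                       const _Bool Mask[], int N) {
        for (int i = 0; i < N; ++i)
            if (Mask[i])
                *(float *)(BaseAddr + VectorOfIndices[i] * Scale) =
                    VectorValue[i];
    }
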
2016 Feb 15
5
Masked intrinsics and non-default address spaces
Masked load/store are overloaded intrinsics; the only generic type is the type of the value being loaded/stored. The signature of the intrinsic is generated based on this type. The type of the pointer argument is generated as a pointer to the return type with the default addrspace. E.g.:

    declare <8 x i32> @llvm.masked.load.v8i32(<8 x i32>*, i32, <8 x i1>, <8 x i32>)

The
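
The practical consequence: source like the following C sketch (using Clang's address_space extension; illustrative only) yields conditional loads through a non-default-addrspace pointer, which the fixed signature above cannot describe:

    /* Requires Clang's address_space extension. Vectorizing the guarded
     * load would need a masked load whose pointer operand points into
     * addrspace(1), not the default addrspace(0). */
    typedef int __attribute__((address_space(1))) as1_int;

    void gated_copy(const as1_int *src, int *dst, const int *cond, int n) {
        for (int i = 0; i < n; ++i)
            if (cond[i])
                dst[i] = src[i];
    }
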
2016 Sep 19
2
RFC: New intrinsics masked.expandload and masked.compressstore
Hi all, AVX-512 ISA introduces new vector instructions VCOMPRESS and VEXPAND in order to allow vectorization of the following loops with two specific types of cross-iteration dependencies:

    Compress:
      for (int i=0; i<N; ++i)
        if (t[i])
          *A++ = expr;

    Expand:
      for (i=0; i<N; ++i)
        if (t[i])
          X[i] = *A++;
        else
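
At vector granularity the corresponding intrinsics behave as in this scalar C model (names illustrative): one expanding load consumes exactly popcount(mask) consecutive source elements and distributes them, in order, to the enabled lanes.

    #include <stdbool.h>

    /* Model of an expanding load for 8 lanes of int: reads popcount(mask)
     * consecutive ints from mem; disabled lanes take their value from
     * passthru and cause no memory access. */
    void expandload_v8(int out[8], const int *mem, const bool mask[8],
                       const int passthru[8]) {
        int k = 0;
        for (int lane = 0; lane < 8; ++lane)
            out[lane] = mask[lane] ? mem[k++] : passthru[lane];
    }
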
2016 Aug 05
3
enabling interleaved access loop vectorization
On 6 August 2016 at 00:18, Michael Kuperstein <mkuper at google.com> wrote: > I agree that we can get *more* improvement with better cost modeling, but > I'd expect to be able to get *some* improvement the way things are right > now. Elena said she saw "some" improvements. :) > That's why I'm curious about where we saw regressions - I'm wondering