search for: avx1

Displaying 20 results from an estimated 25 matches for "avx1".

2016 Jan 23
2
how to force llvm generate gather intrinsic
> Can we legalize the same set of masked load/store operations for AVX1 as AVX2? Yes, of course. - Elena From: Sanjay Patel [mailto:spatel at rotateright.com] Sent: Saturday, January 23, 2016 18:42 To: Nema, Ashutosh <Ashutosh.Nema at amd.com> Cc: Demikhovsky, Elena <elena.demikhovsky at intel.com>; zhi chen <zchenhn at gmail.com>; llvm-de...
2016 Jan 23
2
how to force llvm generate gather intrinsic
Thanks Sanjay for highlighting this. A few days back I also faced a similar problem while generating masked stores in AVX1 mode, and found they are only supported under AVX2; otherwise we scalarize them. > 1) I did not switch-on masked_load/store to AVX1, I can do this. Yes Elena, this should be supported for FP types in AVX1 mode (for INT types, I doubt X86 has a masked_load/store instruction in AVX1 mode). Thanks, Ashutosh From...
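To make the FP-versus-integer distinction concrete, here is a scalar C model of what a masked load and a masked store do per lane. It is a sketch under stated assumptions: it mirrors the documented semantics of the AVX1 vmaskmovps family (a lane is active when the sign bit of its mask element is set, inactive load lanes read as zero, inactive store lanes are not written), and it is roughly the per-lane, branchy code that a scalarized fallback amounts to. The function names are hypothetical; this is not LLVM code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of an AVX1-style masked load (vmaskmovps semantics):
   lane i is read from src only if the sign bit of mask[i] is set;
   masked-off lanes are set to zero and their memory is never touched. */
void masked_load_f32(float *dst, const float *src, const int32_t *mask, size_t n) {
    assert(dst != NULL && src != NULL && mask != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = (mask[i] < 0) ? src[i] : 0.0f; /* sign bit set <=> negative int32 */
}

/* Matching masked store: masked-off lanes of dst are left untouched. */
void masked_store_f32(float *dst, const float *src, const int32_t *mask, size_t n) {
    assert(dst != NULL && src != NULL && mask != NULL);
    for (size_t i = 0; i < n; i++)
        if (mask[i] < 0)
            dst[i] = src[i];
}
```

With AVX1 the 8 x float case corresponds to the _mm256_maskload_ps / _mm256_maskstore_ps intrinsics, while the integer forms (_mm256_maskload_epi32 and friends) require AVX2, which is consistent with the point about INT types in the message.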
2016 Feb 25
2
how to force llvm generate gather intrinsic
...ather intrinsic Hi Elena, Are the masked_load and gather working now? Best, Zhi On Sat, Jan 23, 2016 at 12:06 PM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote: > Can we legalize the same set of masked load/store operations for AVX1 as AVX2? Yes, of course. - Elena From: Sanjay Patel [mailto:spatel at rotateright.com] Sent: Saturday, January 23, 2016 18:42 To: Nema, Ashutosh <Ashutosh.Nema at amd.com> Cc: Demikhovsky, Elena <ele...
2016 Feb 24
0
how to force llvm generate gather intrinsic
Hi Elena, Are the masked_load and gather working now? Best, Zhi On Sat, Jan 23, 2016 at 12:06 PM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote: > > Can we legalize the same set of masked load/store operations for AVX1 as AVX2? > Yes, of course. > - Elena > From: Sanjay Patel [mailto:spatel at rotateright.com] > Sent: Saturday, January 23, 2016 18:42 > To: Nema, Ashutosh <Ashutosh.Nema at amd.com> > Cc: Demikhovsky, Elena <elen...
2016 Feb 25
0
how to force llvm generate gather intrinsic
... > Are the masked_load and gather working now? > Best, > Zhi > On Sat, Jan 23, 2016 at 12:06 PM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote: > > Can we legalize the same set of masked load/store operations for AVX1 as AVX2? > Yes, of course. > - Elena > From: Sanjay Patel [mailto:spatel at rotateright.com] > Sent: Saturday, January 23, 2016 18:42 > To: Nema, Ashutosh <Ashutosh.Nema at amd.com> > Cc: Demikhovsky, Elena <elen...
2016 Feb 25
2
how to force llvm generate gather intrinsic
...now? >> Best, >> Zhi >> On Sat, Jan 23, 2016 at 12:06 PM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote: >> > Can we legalize the same set of masked load/store operations for AVX1 as AVX2? >> Yes, of course. >> - Elena >> From: Sanjay Patel [mailto:spatel at rotateright.com] >> Sent: Saturday, January 23, 2016 18:42 >> To: Nema, Ashutosh <Ashutosh.Nema a...
2015 May 04
3
[LLVMdev] AVX2 Cost Table in X86TargetTransformInfo
Thanks Nadav for the info. It clears up my query :) Yes, it's an integer ADD, and since AVX2 supports 256-bit integer arithmetic, its cost is lower than with AVX1. One query though: shouldn't the cost of integer ADD/SUB/MUL (which would be 1) then be explicitly specified in the AVX2 cost table? Right now this entry is missing, and the cost of these operations is taken from BaseTTI (which is generic). IMO, that would make things clearer. Your thoughts...
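A toy model of the lookup-with-fallback behaviour discussed in this thread: a per-subtarget cost table is searched first and, when no entry matches, a generic base cost is used instead. The opcode/type tags, function name, and every cost value other than the 1 mentioned in the message are hypothetical placeholders, not the contents of LLVM's X86TargetTransformInfo tables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode and type tags for this sketch only. */
typedef enum { OP_ADD, OP_SUB, OP_MUL, OP_SHL } Opcode;
typedef enum { TY_V8I32 } VecType;

typedef struct { Opcode op; VecType ty; int cost; } CostEntry;

/* Placeholder AVX1 entry: a 256-bit integer add must be split on AVX1. */
static const CostEntry kAvx1Table[] = {
    { OP_ADD, TY_V8I32, 4 },
};

/* Placeholder AVX2 table with no ADD/SUB/MUL entries, like the gap described. */
static const CostEntry kAvx2Table[] = {
    { OP_SHL, TY_V8I32, 1 },
};

/* Return the table cost if an entry matches, otherwise the generic base cost. */
int lookup_cost(const CostEntry *table, size_t n, Opcode op, VecType ty, int base_cost) {
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (table[i].op == op && table[i].ty == ty)
            return table[i].cost;
    return base_cost; /* fall back to the generic (BaseTTI-style) cost */
}
```

In this shape, adding an explicit { OP_ADD, TY_V8I32, 1 } entry to the AVX2 table would not change the answer when the fallback already yields 1; it would only make the intent visible in the table, which is the argument being made.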
2016 Feb 26
2
how to force llvm generate gather intrinsic
... > Are the masked_load and gather working now? > Best, > Zhi > On Sat, Jan 23, 2016 at 12:06 PM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote: > > Can we legalize the same set of masked load/store operations for AVX1 as AVX2? > Yes, of course. > - Elena > From: Sanjay Patel [mailto:spatel at rotateright.com] > Sent: Saturday, January 23, 2016 18:42 > To: Nema, Ashutosh <Ashutosh.Nema at amd.com> > Cc: Demikhovsky, Elena <elen...
2016 Feb 26
0
how to force llvm generate gather intrinsic
...ather intrinsic Hi Elena, Are the masked_load and gather working now? Best, Zhi On Sat, Jan 23, 2016 at 12:06 PM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote: > Can we legalize the same set of masked load/store operations for AVX1 as AVX2? Yes, of course. - Elena From: Sanjay Patel [mailto:spatel at rotateright.com] Sent: Saturday, January 23, 2016 18:42 To: Nema, Ashutosh <Ashutosh.Nema at amd.com> Cc: Demikhovsky, Elena <ele...
2016 Feb 26
0
how to force llvm generate gather intrinsic
...now? >> Best, >> Zhi >> On Sat, Jan 23, 2016 at 12:06 PM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote: >> > Can we legalize the same set of masked load/store operations for AVX1 as AVX2? >> Yes, of course. >> - Elena >> From: Sanjay Patel [mailto:spatel at rotateright.com] >> Sent: Saturday, January 23, 2016 18:42 >> To: Nema, Ashutosh <Ashutosh.Nema a...
2016 Jan 23
3
how to force llvm generate gather intrinsic
...st, Zhi On Fri, Jan 22, 2016 at 4:54 PM, Sanjay Patel <spatel at rotateright.com> wrote: > I was just looking at the related masked load/store operations, and I > think there are at least 2 bugs: > > 1. X86TTIImpl::isLegalMaskedLoad/Store() should be legal for FP types with > AVX1 (not just AVX2). > 2. X86TTIImpl::isLegalMaskedGather/Scatter() should be legal for 128/256 > bit vectors with AVX2 (not just AVX512). > > I looked at this for the first time today, so I may be missing something... > > So for the moment, the answer to your question is 'no'...
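For reference, the operation that isLegalMaskedGather() is being asked about can be modeled per lane as below. This is a hedged sketch: it uses an index array rather than the vector of pointers taken by LLVM's llvm.masked.gather intrinsic, and the function name is hypothetical. Inactive lanes take a pass-through value, which matches both the intrinsic's passthru operand and AVX2's vpgatherdd, where masked-off destination lanes keep their old contents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a masked gather: lanes whose mask sign bit is set load
   base[idx[i]]; all other lanes keep passthru[i] and touch no memory. */
void masked_gather_i32(int32_t *dst, const int32_t *base, const int32_t *idx,
                       const int32_t *mask, const int32_t *passthru, size_t n) {
    assert(dst != NULL && base != NULL && idx != NULL);
    assert(mask != NULL && passthru != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = (mask[i] < 0) ? base[idx[i]] : passthru[i];
}
```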
2015 May 04
2
[LLVMdev] AVX2 Cost Table in X86TargetTransformInfo
Hi all, I have a query regarding the Cost Table for AVX2 in TargetTransformInfo. The table consists of entries for shift and div operations only; there are no entries for ADD, SUB and MUL in the AVX2 cost table, although those entries are present in the Cost Table for AVX. The reason for the query is: when my sub-target feature is AVX2, in SLP Vectorization, while calculating the scalar cost of ADD, it doesn't see
2010 Jun 29
1
Performance enhancement for ave
library(plyr)
n <- 100000
grp1 <- sample(1:750, n, replace = TRUE)
grp2 <- sample(1:750, n, replace = TRUE)
d <- data.frame(x = rnorm(n), y = rnorm(n), grp1 = grp1, grp2 = grp2)
system.time({
  d$avx1 <- ave(d$x, list(d$grp1, d$grp2))
  d$avy1 <- ave(d$y, list(d$grp1, d$grp2))
})
#   user  system elapsed
# 39.300   0.279  40.809
system.time({
  d$avx2 <- ave(d$x, interaction(d$grp1, d$grp2, drop = TRUE))
  d$avy2 <- ave(d$y, interaction(d$grp1, d$grp2, drop = TRUE))
})
#   user  system elap...
2017 May 08
2
LLVM and Xeon Skylake v5
...getHostCPUName in LLVM 3.5 doesn't recognize Kabylake or Skylake. The "Cannot select:" error means that an intrinsic was used but no pattern could be found in lib/Target/X86/X86GenDAGISel.inc that applies to the enabled feature set. We have separate patterns for that intrinsic for at least SSE4.1 and AVX1 in 3.5, so that implies that the EngineBuilder thinks your CPU doesn't support SSE4.1 or AVX1 either. But I'm not sure why you would be getting different behavior on Kabylake. Can you try setting EngineBuilder's MCPU to "core-avx2"? ~Craig On Mon, May 8, 2017 at 10:06 AM, A...
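As a rough mental model of the "Cannot select" explanation: each selection pattern is guarded by subtarget feature predicates, and selection fails when no pattern's requirements are met by the enabled feature set. The sketch below is a toy with hypothetical feature bits and pattern names; the real X86GenDAGISel.inc is a generated matcher table, not a loop like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits for this sketch. */
enum { FEAT_SSE41 = 1u << 0, FEAT_AVX = 1u << 1, FEAT_AVX2 = 1u << 2 };

typedef struct {
    uint32_t required;  /* every bit here must be enabled */
    const char *name;   /* hypothetical pattern label */
} Pattern;

/* Return the first pattern whose required features are all enabled,
   or NULL, which plays the role of a "Cannot select" failure. */
const char *select_pattern(const Pattern *pats, size_t n, uint32_t enabled) {
    assert(pats != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if ((pats[i].required & enabled) == pats[i].required)
            return pats[i].name;
    return NULL;
}
```

Under this model, if the execution engine believes neither SSE4.1 nor AVX is available, an intrinsic that only has SSE4.1 and AVX1 patterns has nothing to match, which is the situation Craig describes.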
2017 Jan 20
3
getScalarizationOverhead()
On 2017-01-20 14:31, Hal Finkel wrote: > On 01/20/2017 06:11 AM, Jonas Paulsson via llvm-dev wrote: >> Hi, >> I wonder why getScalarizationOverhead() does not take into account the number of operands of the instruction? This should influence the number of extracts needed, so instead of >> Scalarization cost = NumEls * (insert +
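One way to make the overhead depend on operand count, as Jonas suggests, is sketched below under simple assumptions: the scalarized result is rebuilt with one insert per element, and every vector operand is taken apart with one extract per element. The function name and unit costs are hypothetical, and the formula is not claimed to match LLVM's getScalarizationOverhead().

```c
#include <assert.h>

/* Operand-aware scalarization overhead (illustrative only):
   num_elts inserts to rebuild the result vector, plus
   num_elts extracts for each vector operand. */
int scalarization_overhead(int num_elts, int num_vec_operands,
                           int insert_cost, int extract_cost) {
    assert(num_elts >= 0 && num_vec_operands >= 0);
    return num_elts * insert_cost
         + num_vec_operands * num_elts * extract_cost;
}
```

For a 4-element binary operation with unit costs this gives 4 inserts plus 2 x 4 extracts, i.e. 12, whereas a unary operation of the same width would be charged 4 + 4 = 8.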
2016 Jan 23
2
how to force llvm generate gather intrinsic
Hi, I used clang -O3 -c -emit-llvm on the following code to generate a bitcode file, say a.bc. I read the .ll file and didn't see any gather intrinsic. I also ran opt -O3 with -mcpu=core-avx2 and with -mcpu=skx, but there is still no gather intrinsic generated.
int foo(int A[800], int B[800], int C[800]) {
  for (int i = 0; i < 800; i++) {
    A[B[i]] = i + 5;
  }
  for (int i = 0; i < 800;
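It may help to separate the two indexed-access shapes. The first loop in the snippet stores through an index (A[B[i]] = ...), which is a scatter; a gather is a load through an index, as in A[i] = C[B[i]]. The sketch below shows both with hypothetical function names (the second loop of the original is cut off, so it is not reproduced). When checking the emitted IR, the calls to look for are @llvm.masked.gather and @llvm.masked.scatter; whether they appear depends on the isLegalMaskedGather/Scatter target hooks discussed in this thread.

```c
#include <assert.h>

#define TRIP_COUNT 800

/* Gather shape: the load goes through an index array. */
void gather_loop(int *A, const int *B, const int *C) {
    for (int i = 0; i < TRIP_COUNT; i++)
        A[i] = C[B[i]];
}

/* Scatter shape: the store goes through an index array,
   like the first loop in foo(). */
void scatter_loop(int *A, const int *B) {
    for (int i = 0; i < TRIP_COUNT; i++)
        A[B[i]] = i + 5;
}
```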
2011 Dec 14
1
[LLVMdev] [LLVM, llc] TypeLegalization, DAGCombining, vectors loading
...enerates this vector code) should be aware of the target instruction set and decide on the vectorization factor accordingly. When our vectorizer[1] decides on the vectorization factor, it takes into account the available instruction set, as well as the operations used in the program. For example, AVX1 focuses on floating-point operations, and vectorizing integer code to VF=8 would generate suboptimal code, because it would require the op legalizer to unpack/pack operations on each 'hole' in the instruction set. Thanks, Nadav [1] Intel's OpenCL SDK Vectorizer -----Original Me...
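The idea of letting the instruction set cap the vectorization factor can be sketched as follows. The ISA enum and the rule are simplified assumptions (128-bit registers for SSE, 256-bit floating-point only on AVX1, 256-bit integer as well on AVX2); the sketch does not describe Intel's OpenCL vectorizer or LLVM's cost model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ISA levels for this sketch. */
typedef enum { ISA_SSE, ISA_AVX1, ISA_AVX2 } Isa;

/* Pick VF = (widest register usable for this element kind) / element width. */
int choose_vf(Isa isa, bool is_float, int elem_bits) {
    assert(elem_bits > 0);
    int reg_bits = 128;              /* SSE: 128-bit for both integer and FP */
    if (isa == ISA_AVX2)
        reg_bits = 256;              /* AVX2 adds 256-bit integer arithmetic */
    else if (isa == ISA_AVX1 && is_float)
        reg_bits = 256;              /* AVX1: 256-bit operations are FP-only */
    return reg_bits / elem_bits;
}
```

So on AVX1 a loop over 32-bit floats would get VF=8 but a loop over 32-bit integers only VF=4, avoiding the pack/unpack overhead described in the message.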
2013 Sep 12
3
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
...can't reproduce the problem here at the moment: both debug and > release builds give identical assembly for Host.cpp. OK, I know why you cannot reproduce it: before posting the patch I decided to check for AVX before checking AVX2, so as not to query CPUID for AVX2 when we don't have AVX1 anyway. So the problem exists with the following predicate: (1) bool HasAVX2 = !GetX86CpuIDAndInfo(0x7, &EAX, &EBX, &ECX, &EDX) && (EBX & 0x20); However, it works absolutely fine if I add "volatile": (2) volatile bool HasAVX2 = !GetX86Cpu...
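For context, the bit tests involved can be separated from the CPUID call itself. The sketch below decodes already-obtained register values: CPUID leaf 1 ECX bits 27 (OSXSAVE) and 28 (AVX), XCR0 bits 1 and 2 (OS saves XMM/YMM state), and CPUID leaf 7 sub-leaf 0 EBX bit 5 (AVX2, the 0x20 in the predicate). Function names are hypothetical. Obtaining the registers is left to the caller; on GCC and Clang, <cpuid.h> provides __get_cpuid and __get_cpuid_count for this. Leaf 7 is sub-leaf indexed, so ECX must be 0 when it is queried; a wrapper that leaves ECX unset is one thing worth ruling out when results look inconsistent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_OSXSAVE (1u << 27)
#define CPUID1_ECX_AVX     (1u << 28)
#define CPUID7_EBX_AVX2    (1u << 5)   /* the 0x20 tested in the patch */
#define XCR0_SSE_YMM       0x6u        /* bit 1: XMM state, bit 2: YMM state */

/* AVX is usable only if the CPU reports it AND the OS saves YMM state. */
bool has_avx(uint32_t leaf1_ecx, uint64_t xcr0) {
    const uint32_t need = CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX;
    if ((leaf1_ecx & need) != need)
        return false;
    return (xcr0 & XCR0_SSE_YMM) == XCR0_SSE_YMM;
}

/* Check AVX first (as the patch does), then leaf 7 only if it exists. */
bool has_avx2(uint32_t leaf1_ecx, uint64_t xcr0,
              uint32_t max_basic_leaf, uint32_t leaf7_ebx) {
    if (!has_avx(leaf1_ecx, xcr0) || max_basic_leaf < 7)
        return false;
    return (leaf7_ebx & CPUID7_EBX_AVX2) != 0;
}
```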
2017 May 08
2
LLVM and Xeon Skylake v5
Thank you. I'm letting it auto-detect by setting the target using getProcessTarget. I disabled avx512 support by passing -avx512f (and the other variants) to setMAttrs on EngineBuilder. I can see refs to avx512 in X86.td. It's the exact same executable running on Kabylake. What does "Cannot select:" specifically mean? Is there some table that doesn't have a definition for a key in
2013 Sep 12
0
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
Hi Adam, > * I have marked HasAVX2 as "volatile", since otherwise it gets > magically zeroed (by optimizer?) when compiling clang with latest > clang build from trunk That's far more worrying to me than not being able to detect Haswell. I can't reproduce the problem here at the moment: both debug and release builds give identical assembly for Host.cpp. I don't