thr3ads.net - search: "avx2"

Displaying 20 results from an estimated 426 matches for "avx2".

[LLVMdev] AVX2 Cost Table in X86TargetTransformInfo

2015 May 04

Thanks, Nadav, for the info. It clears up my query :) Yes, it's an integer ADD, and since AVX2 supports 256-bit integer arithmetic, its cost is less than with AVX1. One query though: shouldn't the cost of integer ADD/SUB/MUL (which would be 1) then be explicitly specified in the AVX2 cost table? Right now this entry is missing and the cost of these operations is taken from BaseTTI (whi...

[LLVMdev] AVX2 Cost Table in X86TargetTransformInfo

2015 May 04

Hi all, I have a query regarding the cost table for AVX2 in TargetTransformInfo. The table consists of entries for shift and div operations only. There are no entries for ADD, SUB and MUL in the AVX2 cost table. Those entries are present in the cost table for AVX. The reason for the query is: when my subtarget feature is AVX2, in SLP vectorization, while calcul...

[LLVMdev] AVX2 in 3.2

2013 Jan 07

The 3.2 release notes mention "Small codegen optimizations, especially for AVX2." Can someone provide a little more information about that? What kinds of things were improved for AVX2? Thanks! -David

error of using GATHER intrinsic

2016 Jan 20

> On Jan 20, 2016, at 12:59 PM, Tim Northover via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > Hi Zhi, > > On 18 January 2016 at 11:28, zhi chen via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >> Any idea about this error? Or could anyone give me an example how to use the >> gather intrinsic if there is something wrong with the way I am using it?

AVX2 / 3DNow.

2014 Sep 30

It is relatively easy to convert some SSE2/3/4 code into AVX2: just use AVX2 intrinsics instead of SSE and the logic of the functions. Unfortunately my CPU doesn't have AVX2. But today I managed to briefly test the AVX2 code on an i5 Haswell CPU. Unfortunately I wasn't able to run the full test suite on Haswell, but it seems that the new code works correctly. Th...

how to force llvm generate gather intrinsic

2016 Jan 23

..., Jan 22, 2016 at 4:54 PM, Sanjay Patel <spatel at rotateright.com> wrote: > I was just looking at the related masked load/store operations, and I > think there are at least 2 bugs: > > 1. X86TTIImpl::isLegalMaskedLoad/Store() should be legal for FP types with > AVX1 (not just AVX2). > 2. X86TTIImpl::isLegalMaskedGather/Scatter() should be legal for 128/256 > bit vectors with AVX2 (not just AVX512). > > I looked at this for the first time today, so I may be missing something... > > So for the moment, the answer to your question is 'no'; there's n...

[LLVMdev] AVX2 in 3.2

2013 Jan 07

Hi David, There were many changes. For example, efficient lowering of vector casts (trunc, zext, sext), simplified vblend patterns, shuffle patterns, AVX2 gathers, just to name a few. Nadav On Jan 7, 2013, at 12:13 PM, dag at cray.com wrote: > The 3.2 release notes mention "Small codegen optimizations, especially > for AVX2." Can someone provide a little more information about that? > What kinds of things were improved for A...

how to force llvm generate gather intrinsic

2016 Feb 25

It seems that http://reviews.llvm.org/D15690 only implemented gather/scatter for AVX-512, but not for AVX/AVX2. Is there any plan to enable gather for AVX/2? Thanks. Best, Zhi On Thu, Feb 25, 2016 at 8:28 AM, Sanjay Patel <spatel at rotateright.com> wrote: > I don't think gather has been enabled for AVX2 as of r261875. > Masked load/store were enabled for AVX with: > http://reviews.llv...

error of using GATHER intrinsic

2016 Jan 20

Hi Tim, Thanks for your response. The attached is the .bc file after my pass. I could generate the assembly with -mcpu=skx but not with -mcpu=core-avx2. Could you please take a look? BTW, I am using LLVM-3.7. Best, Zhi On Wed, Jan 20, 2016 at 1:21 PM, Tim Northover <t.p.northover at gmail.com> wrote: > > Only typo that caught my eye is ‘llvm.masked.gather.v8f64’ which should > have v2 instead of v8 to match the <2 x double>...

how to force llvm generate gather intrinsic

2016 Feb 26

If I'm understanding correctly, you're saying that vgather* is slow on all of Excavator, Haswell, Broadwell, and Skylake (client). Therefore, we will not generate it for any of those machines. Even if that's true, we should not define "gatherIsSlow()" as "hasAVX2() && !hasAVX512()". It could break for some hypothetical future processor that manages to implement it properly. The AVX2 spec includes gather; whether it's slow or fast is an implementation detail. We need a feature bit / cost model entry somewhere to signify this, so we're no...
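The objection in this post can be illustrated with a minimal C++ sketch. Everything here is hypothetical: the `Subtarget` struct, its fields, and both predicate functions are invented for illustration and are not LLVM's actual subtarget API. The sketch contrasts a predicate derived from ISA feature bits with an explicit per-processor tuning flag.

```cpp
// Hypothetical subtarget description; not LLVM's real X86Subtarget.
struct Subtarget {
  bool hasAVX2 = false;
  bool hasAVX512 = false;
  // Assumed tuning flag, set per processor model rather than per ISA level.
  bool slowGather = false;
};

// The derived definition criticized in the post: it ties performance
// to the ISA level, so every AVX2-only processor is treated as slow.
bool gatherIsSlowDerived(const Subtarget& st) {
  return st.hasAVX2 && !st.hasAVX512;
}

// The alternative argued for: an explicit flag that a processor
// definition can set or leave clear independently of its ISA level.
bool gatherIsSlowFlag(const Subtarget& st) {
  return st.slowGather;
}
```

With an explicit flag, a hypothetical future AVX2-only processor with a fast gather needs no change to the predicate; only its processor definition leaves the flag clear.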

how to force llvm generate gather intrinsic

2016 Jan 23

Thanks Sanjay for highlighting this. A few days back I also faced a similar problem while generating a masked store in avx1 mode, and found it's only supported under avx2; otherwise we scalarize it. > 1) I did not switch-on masked_load/store to AVX1, I can do this. Yes Elena, This should be supported for FP type in avx1 mode (for INT type, I doubt X86 has masked_load/store instruction in avx1 mode). Thanks, Ashutosh From: llvm-dev [mailto:llvm-dev-bounces at lis...

[PATCH 5/5]

2014 Oct 03

This patch adds two AVX2 files and adds AVX2 support code into init_stream_internal_() in stream_encoder.c. -------------- next part -------------- A non-text attachment was scrubbed... Name: 05_avx2.zip Type: application/zip Size: 7279 bytes Desc: not available Url : http://lists.xiph.org/pipermail/flac-dev/attachments/20...

X86 TRUNCATE cost for AVX & AVX2 mode

2016 Apr 11

Hi, I was going through X86TTIImpl::getCastInstrCost and have a doubt about the cost calculation for the TRUNCATE instruction in AVX mode. The AVX2ConversionTbl and AVXConversionTbl tables define no cost for TRUNCATE v16i32 to v16i8, so as a fallback the lookup goes to the SSE41ConversionTbl table, where it finds a cost of 30 for this operation. A cost of 30 for this operation looks very high. Wondering why such a high cost is kept for this, any pointer...
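The fallback described in this post can be sketched as a chain of table lookups. This is a simplified illustration under stated assumptions, not the code in X86TTIImpl::getCastInstrCost: the `ConvEntry` type, the string-typed fields, and the function names are hypothetical, and the tables are reduced to the single case from the thread (no matching entry in the AVX2 and AVX tables, cost 30 in the SSE4.1 table).

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a conversion cost table entry.
struct ConvEntry {
  std::string op;   // e.g. "TRUNCATE"
  std::string dst;  // destination vector type
  std::string src;  // source vector type
  unsigned cost;
};
using ConvTable = std::vector<ConvEntry>;

// Return the cost of the first matching entry in one table, if any.
std::optional<unsigned> findCost(const ConvTable& table, const std::string& op,
                                 const std::string& dst, const std::string& src) {
  for (const ConvEntry& e : table)
    if (e.op == op && e.dst == dst && e.src == src)
      return e.cost;
  return std::nullopt;
}

// Try each table in order, most specific first; the first hit wins.
std::optional<unsigned> castCost(const std::vector<const ConvTable*>& chain,
                                 const std::string& op, const std::string& dst,
                                 const std::string& src) {
  for (const ConvTable* table : chain)
    if (std::optional<unsigned> cost = findCost(*table, op, dst, src))
      return cost;
  return std::nullopt;
}

// Only the case from the thread is modeled; the real tables hold other
// entries that are omitted here.
std::optional<unsigned> truncateCostExample() {
  const ConvTable avx2Tbl = {};
  const ConvTable avxTbl = {};
  const ConvTable sse41Tbl = {{"TRUNCATE", "v16i8", "v16i32", 30}};
  return castCost({&avx2Tbl, &avxTbl, &sse41Tbl}, "TRUNCATE", "v16i8", "v16i32");
}
```

Because the first hit wins, a missing entry in a newer table silently inherits whatever an older table says, which is how a cost tuned for SSE4.1 can end up applied in AVX2 mode.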

X86 TRUNCATE cost for AVX & AVX2 mode

2016 Apr 12

...Demikhovsky, Elena [mailto:elena.demikhovsky at intel.com] Sent: Monday, April 11, 2016 9:05 PM To: Nema, Ashutosh <Ashutosh.Nema at amd.com> Cc: llvm-dev <llvm-dev at lists.llvm.org>; Zuckerman, Michael <michael.zuckerman at intel.com> Subject: RE: X86 TRUNCATE cost for AVX & AVX2 mode Hi, One day I worked hard and refactored the cost calculation for all X86 targets. http://reviews.llvm.org/D15604 But this revision was not accepted. I fixed conversions, but assume that truncation suffers from the same problem. I used "SplitFactor" in order to process wide types....

how to force llvm generate gather intrinsic

2016 Feb 26

No. Gather operation is slow on AVX2 processors. - Elena From: zhi chen [mailto:zchenhn at gmail.com] Sent: Thursday, February 25, 2016 20:48 To: Sanjay Patel <spatel at rotateright.com> Cc: Demikhovsky, Elena <elena.demikhovsky at intel.com>; Nema, Ashutosh <Ashutosh.Nema at amd.com>; llvm-dev <llvm-d...

how to force llvm generate gather intrinsic

2016 Feb 25

...trinsic Hi Elena, Are the masked_load and gather working now? Best, Zhi On Sat, Jan 23, 2016 at 12:06 PM, Demikhovsky, Elena <elena.demikhovsky at intel.com<mailto:elena.demikhovsky at intel.com>> wrote: > Can we legalize the same set of masked load/store operations for AVX1 as AVX2? Yes, of course. - Elena From: Sanjay Patel [mailto:spatel at rotateright.com<mailto:spatel at rotateright.com>] Sent: Saturday, January 23, 2016 18:42 To: Nema, Ashutosh <Ashutosh.Nema at amd.com<mailto:Ashutosh.Nema at amd.com>> Cc: Demikhovsky, Elena <elena.demik...

New x86-64 micro-architecture levels

2020 Jul 21

...3DNow! (essentially the shared x86-64/EMT64 baseline), but I find this a bit confusing. > 2. 100/101 not very intuitive Any suggestions? The advantage is that these numbers show a strong preference ordering. They do not make false suggestions about feature sets: if we named Level C "x86-avx2", it would still be wrong for glibc to load libraries found in that directory just because a system has AVX2 support, because the libraries might also need FMA, based on the Level C definition. On the GCC side, it avoids a confusion between -mavx2 and -march=x86-avx2. If numbers are out, wh...
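The point about feature sets can be made concrete with a small C++ sketch. It assumes, for illustration only, that Level C requires at least AVX2 and FMA, as the post implies; the bit assignments, the `kLevelCRequired` constant, and the `supportsLevel` function are hypothetical and do not reflect the actual level definition or any glibc interface.

```cpp
#include <cstdint>

// Hypothetical feature bits for illustration.
enum Feature : std::uint32_t {
  kAVX2 = 1u << 0,
  kFMA = 1u << 1,
};

// Assumed, incomplete requirement set for "Level C".
constexpr std::uint32_t kLevelCRequired = kAVX2 | kFMA;

// A level is usable only if every required feature is present.
constexpr bool supportsLevel(std::uint32_t cpuFeatures, std::uint32_t required) {
  return (cpuFeatures & required) == required;
}

// AVX2 alone is not enough under this assumption.
static_assert(!supportsLevel(kAVX2, kLevelCRequired), "AVX2 without FMA");
static_assert(supportsLevel(kAVX2 | kFMA, kLevelCRequired), "AVX2 with FMA");
```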

[LLVMdev] [3.6 Release] Bugfixes in Masked Load/Store

2015 Feb 17

Hi Hans, I fixed 2 bugs in the trunk branch related to Masked Load / Store. Since these intrinsics are generated by the Loop Vectorizer on AVX2, wrong code may be generated. One of the bugs was detected during gcc benchmark testing. http://llvm.org/bugs/show_bug.cgi?id=22225 I think that the bugs should be fixed in 3.6. I have 2 options (1) promote changes of revisions 226791 and 226808 from trunk to 3.6 (2) disable the Masked Load / Sto...

how to force llvm generate gather intrinsic

2016 Jan 23

Hi, I used clang -O3 -c -emit-llvm on the following code to generate a bitcode file, say a.bc. I read the .ll file and didn't see any gather intrinsic. I also used opt -O3 with -mcpu=core-avx2 and -mcpu=skx, but there is still no gather intrinsic generated. int foo(int A[800], int B[800], int C[800]) { for (int i = 0; i < 800; i++) { A[B[i]] = i + 5; } for (int i = 0; i < 800; i++) { A[B[i]]++; } for (int i = 0; i < 800; i++) { A[i] = B...
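For readers unfamiliar with the operation being asked about, the indexed read `A[B[i]]` in these loops is the access pattern a vector gather performs across several lanes at once. A scalar reference version of a masked gather might look like the sketch below. The `maskedGather` helper is hypothetical and is not an LLVM or intrinsic API; it only models the per-lane semantics, where lanes with a false mask bit take the pass-through value.

```cpp
#include <array>
#include <cstddef>

// Scalar model of a masked gather over N lanes (hypothetical helper).
// For each lane: load base[idx[lane]] if the mask bit is set,
// otherwise keep the corresponding pass-through value.
template <std::size_t N>
std::array<int, N> maskedGather(const int* base,
                                const std::array<int, N>& idx,
                                const std::array<bool, N>& mask,
                                const std::array<int, N>& passthru) {
  std::array<int, N> out{};
  for (std::size_t lane = 0; lane < N; ++lane)
    out[lane] = mask[lane] ? base[idx[lane]] : passthru[lane];
  return out;
}
```

A hardware gather does the same work for all lanes in one instruction, which is why its profitability depends on how the target processor implements it.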

how to force llvm generate gather intrinsic

2016 Feb 26

...'m understanding correctly, you're saying that vgather* is slow on all > of Excavator, Haswell, Broadwell, and Skylake (client). Therefore, we will > not generate it for any of those machines. > > Even if that's true, we should not define "gatherIsSlow()" as "hasAVX2() > && !hasAVX512()". It could break for some hypothetical future processor > that manages to implement it properly. The AVX2 spec includes gather; > whether it's slow or fast is an implementation detail. We need a feature > bit / cost model entry somewhere to signify t...
