thr3ads.net - search: "x86ttiimpl"

Displaying 20 results from an estimated 23 matches for "x86ttiimpl".

KNL Vectorization with larger vector width

2018 Jul 24

Thank you. Right now, to see the effect, I made the following changes: unsigned X86TTIImpl::getRegisterBitWidth(bool Vector) { if (Vector) { if (ST->hasAVX512()) return 65536; Here I changed 512 to 65536. Then in LoopVectorize.cpp I did the following: assert(MaxVectorSize <= 2048 && "Did not expect to pack so many elements"...

KNL Vectorization with larger vector width

2018 Jul 24

...32> vector instructions. How to do this? What adjustments are needed? Please help. I'm trying this but am unable to solve it. Thank you. On Tue, Jul 24, 2018 at 4:44 PM, hameeza ahmed <hahmed2305 at gmail.com> wrote: > Hello, > Do I need to change the following function: > > unsigned X86TTIImpl::getNumberOfRegisters(bool Vector) { > if (Vector && !ST->hasSSE1()) > return 0; > > if (ST->is64Bit()) { > if (Vector && ST->hasAVX512()) > return 32; > return 16; > } > return 8; > } > > to > > if (ST->...

how to force llvm generate gather intrinsic

2016 Jan 23

...d generate those instructions. I don't want to touch the source code. Best, Zhi On Fri, Jan 22, 2016 at 4:54 PM, Sanjay Patel <spatel at rotateright.com> wrote: > I was just looking at the related masked load/store operations, and I > think there are at least 2 bugs: > > 1. X86TTIImpl::isLegalMaskedLoad/Store() should be legal for FP types with > AVX1 (not just AVX2). > 2. X86TTIImpl::isLegalMaskedGather/Scatter() should be legal for 128/256 > bit vectors with AVX2 (not just AVX512). > > I looked at this for the first time today, so I may be missing something...

how to force llvm generate gather intrinsic

2016 Jan 23

Hi, I used clang -O3 -c -emit-llvm on the following code to generate a bitcode file, say a.bc. I read the .ll file and didn't see any gather intrinsic. I also used opt -O3 -mcpu=core-avx2/-mcpu=skx, but there is still no gather intrinsic generated. int foo(int A[800], int B[800], int C[800]) { for (int i = 0; i < 800; i++) { A[B[i]] = i + 5; } for (int i = 0; i < 800;

KNL Vectorization with larger vector width

2018 Jul 23

Thank you. I got it; it was a version issue. TTI.getRegisterBitWidth(true) How do I put my target machine info into TTI? Please help. On Mon, Jul 23, 2018 at 11:33 PM, Friedman, Eli <efriedma at codeaurora.org> wrote: > On 7/23/2018 10:49 AM, hameeza ahmed via llvm-dev wrote: > > Thank You. > > But I cannot find your mentioned function

generate vectorized code

2016 Mar 17

...mini at apple.com> wrote: > Hi Rail, > > Two hints to begin with: > > 1) Make sure your example is vectorized on X86, for example > 2) Is your target correctly overriding the TTI (declaring the vector > register size for example) so that the vectorizer can kick in (see > X86TTIImpl::getRegisterBitWidth for instance). Alternatively you can test > the SLP vectorizer by passing to clang: -mllvm -slp-max-reg-size -mllvm 512 > (I don't see an equivalent option for the loop vectorizer though). > > Well, it sort of worked. I added a getRegisterBitWidth(...) but then...

X86 TRUNCATE cost for AVX & AVX2 mode

2016 Apr 11

Hi, I was going through X86TTIImpl::getCastInstrCost and have a question about the cost calculation for the TRUNCATE instruction in AVX mode. In the AVX2ConversionTbl and AVXConversionTbl tables there is no cost defined for TRUNCATE v16i32 to v16i8; as a fallback it goes to the SSE41ConversionTbl table, where it finds a cost of 30 for this operation....
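
The fallback described in this snippet can be pictured with a small, self-contained sketch. It is not LLVM's code: `CastKey`, `CostTable`, and `firstTableCost` are hypothetical names, the tables are plain maps rather than LLVM's cost-table entries, and the only data point taken from the thread is the cost of 30 for v16i32 to v16i8 in the SSE4.1 table.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical key: (source vector type, destination vector type).
using CastKey = std::pair<std::string, std::string>;
// Hypothetical stand-in for one per-ISA conversion cost table.
using CostTable = std::map<CastKey, unsigned>;

// Walk the tables in priority order (newest ISA first) and return the
// first cost found. Returns -1 when no table has an entry for the key.
int firstTableCost(const std::vector<const CostTable *> &Tables,
                   const CastKey &Key) {
  for (const CostTable *Table : Tables) {
    CostTable::const_iterator It = Table->find(Key);
    if (It != Table->end())
      return static_cast<int>(It->second);
  }
  return -1;
}
```

With empty AVX2 and AVX tables and an SSE4.1 table holding the v16i32 to v16i8 entry, the lookup falls through the first two tables and returns the SSE4.1 cost, which is the behavior the poster is asking about.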

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

2017 Nov 01

...8 as SubtargetFeatures in X86.td not mapped to any CPU. -Add mprefer-avx256 and mprefer-avx128 and the corresponding -mno-prefer-avx128/256 options to clang's driver Options.td file. I believe this will allow clang to pass these straight through to the -target-feature attribute in IR. -Modify X86TTIImpl::getRegisterBitWidth to only return 512 if AVX512 is enabled and prefer-avx256 and prefer-avx128 is not set. Similarly return 256 if AVX is enabled and prefer-avx128 is not set. There may be some other backend changes needed, but I plan to address those as we find them. At a later point, consi...
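
The width selection proposed in this RFC snippet can be sketched as below. This is only an illustration of the stated rule, not the actual X86TTIImpl code: `SubtargetSketch` and `vectorRegisterBitWidth` are hypothetical names, and the final 128-bit fallback is an assumption, since the snippet does not say what is returned when neither condition holds.

```cpp
// Hypothetical stand-in for the subtarget features named in the RFC.
struct SubtargetSketch {
  bool HasAVX;
  bool HasAVX512;    // on real hardware AVX512 implies AVX
  bool PreferAVX256; // proposed prefer-avx256 feature
  bool PreferAVX128; // proposed prefer-avx128 feature
  SubtargetSketch()
      : HasAVX(false), HasAVX512(false), PreferAVX256(false),
        PreferAVX128(false) {}
};

// Vector register width in bits that would be reported to the vectorizers.
unsigned vectorRegisterBitWidth(const SubtargetSketch &ST) {
  // 512 only if AVX512 is enabled and neither preference is set.
  if (ST.HasAVX512 && !ST.PreferAVX256 && !ST.PreferAVX128)
    return 512;
  // 256 if AVX is enabled and prefer-avx128 is not set.
  if (ST.HasAVX && !ST.PreferAVX128)
    return 256;
  // Assumption: fall back to the 128-bit SSE width otherwise.
  return 128;
}
```

Under this sketch, an AVX512 subtarget with prefer-avx256 set reports 256 bits, and the same subtarget with prefer-avx128 set reports 128 bits.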

generate vectorized code

2016 Mar 17

...t;> Hi Rail, >> >> Two hints to begin with: >> >> 1) Makes sure you example is vectorized on X86 for example >> 2) Is your target correctly overriding the TTI (declaring the vector >> register size for example) so that the vectorizer can kicks-in (see >> X86TTIImpl::getRegisterBitWidth for instance). Alternatively you can test >> the SLP vectorizer by passing to clang: -mllvm -slp-max-reg-size -mllvm 512 >> (I don't see an equivalent option for the loop vectorizer though). >> >> Well, it sort of worked. I added a getRegisterBitWid...

generate vectorized code

2016 Mar 16

My question is: How do I make clang generate assembly with vector instructions for my target? The back story is: I've added a few vector instructions to my target and confirmed that they are used by running my code on the test below with the following command: opt i.esencia.ll -S -march=esencia -mcpu=esencia -loop-vectorize | llc -mcpu=esencia -o i.esencia.s target datalayout =

X86 TRUNCATE cost for AVX & AVX2 mode

2016 Apr 12

...vsky at intel.com>>; Zuckerman, Michael <michael.zuckerman at intel.com> Cc: llvm-dev <llvm-dev at lists.llvm.org> Subject: X86 TRUNCATE cost for AVX & AVX2 mode Hi, I was going through X86TTIImpl::getCastInstrCost and have a question about the cost calculation for the TRUNCATE instruction in AVX mode. In the AVX2ConversionTbl and AVXConversionTbl tables there is no cost defined for TRUNCATE v16i32 to v16i8; as a fallback it goes to the SSE41ConversionTbl table, where it finds a cost of 30 for this operation....

generate vectorized code

2016 Mar 18

...nts to begin with: >>>> >>>> 1) Make sure your example is vectorized on X86, for example >>>> 2) Is your target correctly overriding the TTI (declaring the vector >>>> register size for example) so that the vectorizer can kick in (see >>>> X86TTIImpl::getRegisterBitWidth for instance). Alternatively you can test >>>> the SLP vectorizer by passing to clang: -mllvm -slp-max-reg-size -mllvm 512 >>>> (I don't see an equivalent option for the loop vectorizer though). >>>> >>>> Well, it sort of wor...

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

2017 Nov 03

...> >> -Add mprefer-avx256 and mprefer-avx128 and the corresponding >> -mno-prefer-avx128/256 options to clang's driver Options.td file. I believe >> this will allow clang to pass these straight through to the -target-feature >> attribute in IR. >> >> -Modify X86TTIImpl::getRegisterBitWidth to only return 512 if AVX512 is >> enabled and prefer-avx256 and prefer-avx128 is not set. Similarly return >> 256 if AVX is enabled and prefer-avx128 is not set. >> > > Instead of multiple flags that have difficult to understand intersecting > behavi...

how to force llvm generate gather intrinsic

2016 Jan 23

...tructions. I don't want to touch the source code. Best, Zhi On Fri, Jan 22, 2016 at 4:54 PM, Sanjay Patel <spatel at rotateright.com> wrote: I was just looking at the related masked load/store operations, and I think there are at least 2 bugs: 1. X86TTIImpl::isLegalMaskedLoad/Store() should be legal for FP types with AVX1 (not just AVX2). 2. X86TTIImpl::isLegalMaskedGather/Scatter() should be legal for 128/256 bit vectors with AVX2 (not just AVX512). I looked at this for the first time today, so I may be missing something... So for the moment, the an...

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

2017 Nov 07

...avx128 and the corresponding > >>> -mno-prefer-avx128/256 options to clang's driver Options.td file. I believe > >>> this will allow clang to pass these straight through to the -target-feature > >>> attribute in IR. > >>> > >>> -Modify X86TTIImpl::getRegisterBitWidth to only return 512 if AVX512 is > >>> enabled and prefer-avx256 and prefer-avx128 is not set. Similarly return > >>> 256 if AVX is enabled and prefer-avx128 is not set. > >>> > >> > >> Instead of multiple flags that have dif...

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

2017 Nov 09

...efer-avx128/256 options to clang's driver Options.td file. I >> believe >> > >>> this will allow clang to pass these straight through to the >> -target-feature >> > >>> attribute in IR. >> > >>> >> > >>> -Modify X86TTIImpl::getRegisterBitWidth to only return 512 if >> AVX512 is >> > >>> enabled and prefer-avx256 and prefer-avx128 is not set. Similarly >> return >> > >>> 256 if AVX is enabled and prefer-avx128 is not set. >> > >>> >> > >>...

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

2017 Nov 11

...tions.td file. >>>> I believe >>>> > >>> this will allow clang to pass these straight through to the >>>> -target-feature >>>> > >>> attribute in IR. >>>> > >>> >>>> > >>> -Modify X86TTIImpl::getRegisterBitWidth to only return 512 if >>>> AVX512 is >>>> > >>> enabled and prefer-avx256 and prefer-avx128 is not set. Similarly >>>> return >>>> > >>> 256 if AVX is enabled and prefer-avx128 is not set. >>>>...

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

2017 Nov 12

...e >>>>>> > >>> this will allow clang to pass these straight through to the >>>>>> -target-feature >>>>>> > >>> attribute in IR. >>>>>> > >>> >>>>>> > >>> -Modify X86TTIImpl::getRegisterBitWidth to only return 512 if >>>>>> AVX512 is >>>>>> > >>> enabled and prefer-avx256 and prefer-avx128 is not set. >>>>>> Similarly return >>>>>> > >>> 256 if AVX is enabled and prefer-avx1...

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

2017 Nov 13

...>>> > >>> this will allow clang to pass these straight through to the >>>>>>> -target-feature >>>>>>> > >>> attribute in IR. >>>>>>> > >>> >>>>>>> > >>> -Modify X86TTIImpl::getRegisterBitWidth to only return 512 if >>>>>>> AVX512 is >>>>>>> > >>> enabled and prefer-avx256 and prefer-avx128 is not set. >>>>>>> Similarly return >>>>>>> > >>> 256 if AVX is enabled...

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

2017 Nov 13

...>> -target-feature >> > >>> attribute in IR. >> > >>> >> > >>> -Modify >> X86TTIImpl::getRegisterBitWidth to >> only return 512 if AVX512 is >> > >>> enabled and prefer-avx256 and >> prefer-avx128 is not set. Similarly >>...
