thr3ads.net - similar to: "[LLVMdev] splat instruction"

Displaying 20 results from an estimated 7000 matches similar to: "[LLVMdev] splat instruction"

2010 May 14

[LLVMdev] vector optimization

Hi! Is there a pass that optimizes vector operations? For example, if I have a sequence of shufflevector instructions, is there a pass that optimizes them? (In OpenCL notation, e.g. a.xyzw.wzyx.xxxx -> a.wwww) -Jochen

[LLVMdev] InstCombine "pessimizes" trunc i8 to i1?

2011 Dec 28

>> Hi! >> >> before InstCombine (llvm::createInstructionCombiningPass()) I have >> a trunc from i8 to i1 and then a select: >> >> %45 = load i8* @myGlobal, align 1 >> %tobool = trunc i8 %45 to i1 >> %cond = select i1 %tobool, float 1.000000e+00, float -1.000000e+00 >> >> after instCombine I have: >> >> %29 = load i8*

[LLVMdev] new vector resize instruction could be useful

2011 Mar 18

Hi! If I build a vector of some length (e.g. 4) from a vector of another length (e.g. 3), then I get tons of extractelement and insertelement instructions. Since vectors of length 3 and 4 both map to an SSE register, it could be useful to introduce an instruction that changes the length of a vector, either truncating or extending by zero or undef values (whichever makes more sense). for lengths 3
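The resize proposed in this message can be pictured as a per-lane selection: lanes that exist in the source are copied, and any extra lanes are left as "don't care". The sketch below is only an illustration of that idea in standalone C++. The name `resizeMask` and the use of -1 for an undef lane are assumptions made here, not part of the proposal or of any LLVM API.

```cpp
#include <vector>

// Lane selectors for resizing a vector from `fromLen` to `toLen` lanes.
// Entry i names the source lane copied into result lane i.
// -1 is a hypothetical marker for an undef ("don't care") lane.
std::vector<int> resizeMask(unsigned fromLen, unsigned toLen) {
  std::vector<int> mask(toLen, -1);
  for (unsigned i = 0; i < toLen && i < fromLen; ++i)
    mask[i] = static_cast<int>(i);  // lane exists in the source: copy it
  return mask;
}
```

Under these assumptions, widening 3 lanes to 4 yields the selectors 0, 1, 2, -1, and narrowing 4 lanes to 3 yields 0, 1, 2, so each resize is a single selection rather than a chain of extract and insert steps.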

[LLVMdev] SIMD for sdiv <2 x i64>

2015 Jul 24

This snippet of IR is interesting: %sub.ptr.div.iS37_D = sdiv <2 x i64> %sub.ptr.sub.iS36_D, <i64 24, i64 24> %cmp10S38_D = icmp ugt <2 x i64> %sub.ptr.div.iS37_D, %splatInsMapS1_D.splat %zextS39_D = sext <2 x i1> %cmp10S38_D to <2 x i64> %BCS39_D = bitcast <2 x i64> %zextS39_D to i128 %mskS39_D = icmp ne i128 %BCS39_D, 0 br i1 %mskS39_D,

[LLVMdev] vector optimization

2010 May 14

Instcombine does some of this, late codegen also does some of it. -Chris On May 14, 2010, at 5:58 AM, Jochen Wilhelmy <j.wilhelmy at arcor.de> wrote: > Hi! > > Is there a pass that optimizes vector operations? > If I have for example a sequence of shufflevector instructions > that optimizes them? > (in opencl notation e.g. a.xyzw.wzyx.xxxx -> a.wwww) > > -Jochen
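The swizzle chain in the quoted question folds to one shuffle because single-input shuffle masks compose: if result[i] = input[mask[i]], then applying one mask after another is itself a shuffle. The following is a minimal standalone sketch of that arithmetic, not the code InstCombine uses. `Mask4` and `composeMasks` are names invented for illustration.

```cpp
#include <array>
#include <cstddef>

// A 4-lane shuffle mask: result[i] = input[mask[i]].
using Mask4 = std::array<int, 4>;

// Applying `first` and then `second` to a vector is equivalent to one
// shuffle with the returned mask, because
//   second(first(a))[i] = first(a)[second[i]] = a[first[second[i]]].
Mask4 composeMasks(const Mask4 &first, const Mask4 &second) {
  Mask4 combined{};
  for (std::size_t i = 0; i < combined.size(); ++i)
    combined[i] = first[second[i]];
  return combined;
}
```

With x, y, z, w numbered 0 to 3, the masks are xyzw = {0, 1, 2, 3}, wzyx = {3, 2, 1, 0} and xxxx = {0, 0, 0, 0}. Composing them in order gives {3, 3, 3, 3}, which is a.wwww, matching the example in the question.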

Vector trunc code generation difference between llvm-3.9 and 4.0

2017 Feb 17

Correction in the C snippet: typedef signed short v8i16_t __attribute__((ext_vector_type(8))); v8i16_t foo (v8i16_t a, int n) { return a >> n; } Best regards Saurabh On 17 February 2017 at 16:21, Saurabh Verma <saurabh.verma at movidius.com> wrote: > Hello, > > We are investigating a difference in code generation for vector splat > instructions between llvm-3.9

[LLVMdev] InstCombine "pessimizes" trunc i8 to i1?

2011 Dec 29

I think Chris is saying that the "and" is necessary because with your i1 trunc you're ignoring all of the high bits. The "and" implements that. If you don't want this behavior, don't generate the trunc in the first place and just compare the full width to zero. Reid On Wed, Dec 28, 2011 at 6:45 AM, Jochen Wilhelmy <j.wilhelmy at arcor.de> wrote: > > >> Hi! >
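The difference Reid describes can be checked with ordinary integer arithmetic. The sketch below models the two options on an 8-bit value in plain C++. The function names are made up for illustration and do not correspond to LLVM APIs.

```cpp
#include <cstdint>

// Models `trunc i8 %x to i1`: only bit 0 survives, the high bits are
// discarded, which is exactly what masking with 1 computes.
bool truncToI1(std::uint8_t x) { return (x & 1u) != 0; }

// Models comparing the full width to zero: any set bit gives true.
bool nonZero(std::uint8_t x) { return x != 0; }
```

The two agree for the values 0 and 1 but differ for a value such as 2, where the truncation yields false and the full-width comparison yields true. That is why the mask cannot be dropped unless the stored value is known to be 0 or 1.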

[LLVMdev] SIMD for sdiv <2 x i64>

2015 Jul 24

------------------------------------ IR ------------------------------------------------------------------ if.then.i.i.i.i.i.i: ; preds = %if.then4 %S25_D = zext <2 x i32> %splatLDS17_D.splat to <2 x i64> %umul_with_overflow.i.iS26_D = shl <2 x i64> %S25_D, <i64 3, i64 3> %extumul_with_overflow.i.iS26_D = extractelement <2 x i64>

[RFC][SVE] Supporting SIMD instruction sets with variable vector lengths

2018 Jul 30

Hi, Are there any objections to going ahead with this? If not, we'll try to get the patches reviewed and committed after the 7.0 branch occurs. -Graham > On 2 Jul 2018, at 10:53, Graham Hunter <Graham.Hunter at arm.com> wrote: > > Hi, > > I've updated the RFC slightly based on the discussion within the thread, reposted below. Let me know if I've missed

[LLVMdev] Extracting a splat value from vector instruction.

2015 Jul 09

Hi, We have a function in IRBuilder.h Value *CreateVectorSplat(unsigned NumElts, Value *V, const Twine &Name = "") { .. } This function creates 2 instructions - "insertelement" and "shuffle" with all-zero mask. Now I want to add Value *getSplatValue(Value *Val). This function will try to recognize the pattern - insertelement+shuffle and return the splat value
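The pattern match described in this message can be sketched on a toy value graph. Everything below is a hypothetical stand-in written for illustration: `ToyValue` and `getToySplatValue` are not LLVM classes or functions, and the real proposal would operate on LLVM's own instruction types.

```cpp
#include <algorithm>
#include <vector>

// Toy stand-in for an IR value (NOT an LLVM class).
struct ToyValue {
  enum class Kind { Scalar, InsertElement, Shuffle };
  Kind kind;
  const ToyValue *operand;  // InsertElement: inserted scalar; Shuffle: source vector
  unsigned lane;            // InsertElement: lane that was written
  std::vector<int> mask;    // Shuffle: lane selectors
};

// Returns the splatted scalar if `v` is a shuffle with an all-zero mask
// whose source is an insert into lane 0; otherwise returns nullptr.
const ToyValue *getToySplatValue(const ToyValue &v) {
  if (v.kind != ToyValue::Kind::Shuffle || v.operand == nullptr || v.mask.empty())
    return nullptr;
  const bool allZero = std::all_of(v.mask.begin(), v.mask.end(),
                                   [](int m) { return m == 0; });
  const ToyValue &src = *v.operand;
  if (!allZero || src.kind != ToyValue::Kind::InsertElement || src.lane != 0)
    return nullptr;
  return src.operand;
}
```

In this model a splat is built as an insert of the scalar into lane 0 followed by a shuffle whose mask is all zeros, and the recognizer simply walks those two nodes backwards. A shuffle with any other mask, or a bare insert, is reported as "not a splat" by returning nullptr.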

Vector trunc code generation difference between llvm-3.9 and 4.0

2017 Feb 18

Thanks Sanjay. Interestingly for me, disable-llvm-optmzns did not make a difference in the way the shift was handled. Does the initial IR generated for you show this difference when the option is passed? Best regards Saurabh On 17 February 2017 at 19:03, Sanjay Patel <spatel at rotateright.com> wrote: > I think this is caused by a front-end change (cc'ing clang-dev) because >

Vector trunc code generation difference between llvm-3.9 and 4.0

2017 Mar 08

The regression for the reported case should be avoided after: https://reviews.llvm.org/rL297232 https://reviews.llvm.org/rL297242 https://reviews.llvm.org/rL297280 It would still be good to understand if the clang change was intentional or if that was a side effect that can be limited. On Sat, Feb 18, 2017 at 9:11 AM, Sanjay Patel <spatel at rotateright.com> wrote: > Yes, there is an

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

2013 Nov 06

The instcombine pass cleans up a lot. Any idea why there are still shufflevector, insertelement, *and* bitcast (!!) etc. instructions left? The original loop is so clean, a textbook example I'd say. There is no need to shuffle anything. At least I don't see it. Frank vector.ph: ; preds = %L5 %broadcast.splatinsert1 = insertelement <4 x

[RFC][SVE] Supporting SIMD instruction sets with variable vector lengths

2018 Jun 05

Hi, Now that Sander has committed enough MC support for SVE, here's an updated RFC for variable length vector support with a set of 14 patches (listed at the end) to demonstrate code generation for SVE using the extensions proposed in the RFC. I have some ideas about how to support RISC-V's upcoming extension alongside SVE; I'll send an email with some additional comments on

[RFC][SVE] Supporting SIMD instruction sets with variable vector lengths

2019 May 24

In the RISC-V V extension, there is no upper limit to the size vector registers can be in a future CPU. (Formally, the upper limit is at least 2^31 bytes) Generic code can enquire the size, dynamically allocate space, and transparently save and restore the contents of a vector register or registers. On Fri, May 24, 2019 at 11:28 AM JinGu Kang via llvm-dev <llvm-dev at lists.llvm.org>

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

2013 Nov 06

The following IR implements the following nested loop: for (int i = start ; i < end ; ++i ) for (int p = 0 ; p < 4 ; ++p ) a[i*4+p] = b[i*4+p] + c[i*4+p]; define void @main(i64 %arg0, i64 %arg1, i1 %arg2, i64 %arg3, float* noalias %arg4, float* noalias %arg5, float* noalias %arg6) { entrypoint: br i1 %arg2, label %L0, label %L1 L0:

[LLVMdev] Thoughts about the llvm architecture -

2010 Sep 04

Jochen Wilhelmy schrieb: >>>>> Hi! >>>>> >>>>> The following thoughts about the llvm architecture I'd like to share >>>>> with you >>>>> (from the perspective of a user): >>>>> >>>>> - If a backend has no vector support, then I wonder why there is no >>>>> de-vectorization

[EXT] Re: [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths

2019 May 24

JinGu: I’m not Graham, but you might find the following link a good starting point. https://community.arm.com/developer/tools-software/hpc/b/hpc-blog/posts/technology-update-the-scalable-vector-extension-sve-for-the-armv8-a-architecture The question you ask doesn’t have a short answer. The compiler and the instruction set design work together to allow programs to be compiled without knowing

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

2013 Nov 06

The loop vectorizer relies on cleanup passes to be run after it: from Transforms/IPO/PassManagerBuilder.cpp: // Add the various vectorization passes and relevant cleanup passes for // them since we are no longer in the middle of the main scalar pipeline. MPM.add(createLoopVectorizePass(DisableUnrollLoops)); MPM.add(createInstructionCombiningPass());

[RFC][SVE] Supporting SIMD instruction sets with variable vector lengths

2018 Jul 30

On 07/30/2018 05:34 AM, Chandler Carruth wrote: > I strongly suspect that there remains widespread concern with the > direction of this, I know I have them. > > I don't think that many of the people who have that concern have had > time to come back to this RFC and make progress on it, likely because > of other commitments or simply the amount of churn around SVE related >
