similar to: [LLVMdev] splat instruction

Displaying 20 results from an estimated 7000 matches similar to: "[LLVMdev] splat instruction"

2010 May 14
2
[LLVMdev] vector optimization
Hi! Is there a pass that optimizes vector operations? If I have for examle a sequence of shufflevector instructions that optimizes them? (in opencl notation e.g. a.xyzw.wzyx.xxxx -> a.wwww) -Jochen
2011 Dec 28
3
[LLVMdev] InstCombine "pessimizes" trunc i8 to i1?
>> Hi! >> >> before InstCombine (llvm::createInstructionCombiningPass()) I have >> a trunc from i8 to i1 and then a select: >> >> %45 = load i8* @myGlobal, align 1 >> %tobool = trunc i8 %45 to i1 >> %cond = select i1 %tobool, float 1.000000e+00, float -1.000000e+00 >> >> after instCombine I have: >> >> %29 = load i8*
2011 Mar 18
2
[LLVMdev] new vector resize instruction could be useful
Hi! If I build a vector of some length (e.g. 4) from a vector of another length (e.g. 3) then I get tons of extractelement and insertelement instructions. since vectors of length 3 and 4 both map to an sse register it could be useful to introduce an instruction that changes the length of a vector, either truncating or extending by zero or undef values (whichever makes more sense). for lengths 3
2015 Jul 24
1
[LLVMdev] SIMD for sdiv <2 x i64>
This snippet of IR is interesting: %sub.ptr.div.iS37_D = sdiv <2 x i64> %sub.ptr.sub.iS36_D, <i64 24, i64 24> %cmp10S38_D = icmp ugt <2 x i64> %sub.ptr.div.iS37_D, %splatInsMapS1_D.splat %zextS39_D = sext <2 x i1> %cmp10S38_D to <2 x i64> %BCS39_D = bitcast <2 x i64> %zextS39_D to i128 %mskS39_D = icmp ne i128 %BCS39_D, 0 br i1 %mskS39_D,
2010 May 14
0
[LLVMdev] vector optimization
Instcombine does of this, late codegen also does some of it. -Chris On May 14, 2010, at 5:58 AM, Jochen Wilhelmy <j.wilhelmy at arcor.de> wrote: > Hi! > > Is there a pass that optimizes vector operations? > If I have for examle a sequence of shufflevector instructions > that optimizes them? > (in opencl notation e.g. a.xyzw.wzyx.xxxx -> a.wwww) > > -Jochen
2017 Feb 17
2
Vector trunc code generation difference between llvm-3.9 and 4.0
Correction in the C snippet: typedef signed short v8i16_t __attribute__((ext_vector_type(8))); v8i16_t foo (v8i16_t a, int n) { return a >> n; } Best regards Saurabh On 17 February 2017 at 16:21, Saurabh Verma <saurabh.verma at movidius.com> wrote: > Hello, > > We are investigating a difference in code generation for vector splat > instructions between llvm-3.9
2011 Dec 29
0
[LLVMdev] InstCombine "pessimizes" trunc i8 to i1?
I think Chris is saying that the and is necessary because with your i1 trunc you're ignoring all of the high bits. The and implements that. If you don't want this behavior, don't generate the trunc in the first place and just compare the full width to zero. Reid On Wed, Dec 28, 2011 at 6:45 AM, Jochen Wilhelmy <j.wilhelmy at arcor.de>wrote: > > >> Hi! >
2015 Jul 24
0
[LLVMdev] SIMD for sdiv <2 x i64>
------------------------------------ IR ------------------------------------------------------------------ if.then.i.i.i.i.i.i: ; preds = %if.then4 %S25_D = zext <2 x i32> %splatLDS17_D.splat to <2 x i64> %umul_with_overflow.i.iS26_D = shl <2 x i64> %S25_D, <i64 3, i64 3> %extumul_with_overflow.i.iS26_D = extractelement <2 x i64>
2018 Jul 30
5
[RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
Hi, Are there any objections to going ahead with this? If not, we'll try to get the patches reviewed and committed after the 7.0 branch occurs. -Graham > On 2 Jul 2018, at 10:53, Graham Hunter <Graham.Hunter at arm.com> wrote: > > Hi, > > I've updated the RFC slightly based on the discussion within the thread, reposted below. Let me know if I've missed
2015 Jul 09
2
[LLVMdev] Extracting a splat value from vector instruction.
Hi, We have a function in IRBuilder.h Value *CreateVectorSplat(unsigned NumElts, Value *V, const Twine &Name = "") { .. } This function creates 2 instructions - "insertelement" and "shuffle" with all-zero mask. Now I want to add Value *getSplatValue(Value *Val). This function will try to recognize the pattern - insertelement+shuffle and return the splat value
2017 Feb 18
2
Vector trunc code generation difference between llvm-3.9 and 4.0
Thanks Sanjay. Interestingly for me, disable-llvm-optmzns did not make a difference in the way the shift was handled. Does the initial IR generated for you show this difference when the option is passed? Best regards Saurabh On 17 February 2017 at 19:03, Sanjay Patel <spatel at rotateright.com> wrote: > I think this is caused by a front-end change (cc'ing clang-dev) because >
2017 Mar 08
2
Vector trunc code generation difference between llvm-3.9 and 4.0
The regression for the reported case should be avoided after: https://reviews.llvm.org/rL297232 https://reviews.llvm.org/rL297242 https://reviews.llvm.org/rL297280 It would still be good to understand if the clang change was intentional or if that was a side effect that can be limited. On Sat, Feb 18, 2017 at 9:11 AM, Sanjay Patel <spatel at rotateright.com> wrote: > Yes, there is an
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The instcombine pass cleans up a lot. Any idea why there are still shufflevector, insertelement, *and* bitcast (!!) etc. instructions left? The original loop is so clean, a textbook example I'd say. There is no need to shuffle anything.At least I don't see it. Frank vector.ph: ; preds = %L5 %broadcast.splatinsert1 = insertelement <4 x
2018 Jun 05
14
[RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
Hi, Now that Sander has committed enough MC support for SVE, here's an updated RFC for variable length vector support with a set of 14 patches (listed at the end) to demonstrate code generation for SVE using the extensions proposed in the RFC. I have some ideas about how to support RISC-V's upcoming extension alongside SVE; I'll send an email with some additional comments on
2019 May 24
2
[RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
In the RISC-V V extension, there is no upper limit to the size vector registers can be in a future CPU. (Formally, the upper limit is at least 2^31 bytes) Generic code can enquire the size, dynamically allocate space, and transparently save and restore the contents of a vector register or registers. On Fri, May 24, 2019 at 11:28 AM JinGu Kang via llvm-dev <llvm-dev at lists.llvm.org>
2010 Sep 04
1
[LLVMdev] Thoughts about the llvm architecture -
Jochen Wilhelmy schrieb: >>>>> Hi! >>>>> >>>>> The following thoughts about the llvm architecture I'd like to share >>>>> with you >>>>> (from the perspective of a user): >>>>> >>>>> - If a backend has no vector support, then I wonder why there is no >>>>> de-vectorization
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The following IR implements the following nested loop: for (int i = start ; i < end ; ++i ) for (int p = 0 ; p < 4 ; ++p ) a[i*4+p] = b[i*4+p] + c[i*4+p]; define void @main(i64 %arg0, i64 %arg1, i1 %arg2, i64 %arg3, float* noalias %arg4, float* noalias %arg5, float* noalias %arg6) { entrypoint: br i1 %arg2, label %L0, label %L1 L0:
2019 May 24
2
[EXT] Re: [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
JinGu: I’m not Graham, but you might find the following link a good starting point. https://community.arm.com/developer/tools-software/hpc/b/hpc-blog/posts/technology-update-the-scalable-vector-extension-sve-for-the-armv8-a-architecture The question you ask doesn’t have a short answer. The compiler and the instruction set design work together to allow programs to be compiled without knowing
2013 Nov 06
0
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The loop vectorizer relies on cleanup passes to be run after it: from Transforms/IPO/PassManagerBuilder.cpp: // Add the various vectorization passes and relevant cleanup passes for // them since we are no longer in the middle of the main scalar pipeline. MPM.add(createLoopVectorizePass(DisableUnrollLoops)); MPM.add(createInstructionCombiningPass());
2010 Sep 03
0
[LLVMdev] Thoughts about the llvm architecture -
>>>> Hi! >>>> >>>> The following thoughts about the llvm architecture I'd like to share >>>> with you >>>> (from the perspective of a user): >>>> >>>> - If a backend has no vector support, then I wonder why there is no >>>> de-vectorization >>>> pass that operates on indermediate