thr3ads.net - search: "shuf"

Displaying 20 results from an estimated 23 matches for "shuf".

InstructionSimplify: adding a hook for shufflevector instructions

2017 Mar 30

As Sanjay noted in D31426 <https://reviews.llvm.org/D31426#712701>, InstructionSimplify is missing the following simplification: This function: define <4 x i32> @splat_operand(<4 x i32> %x) { %splat = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> zeroinitializer %shuf = shufflevector <4 x i32> %splat, <4 x i32> undef, <4 x i32> <i32 0, i32 3, i32 2, i32 1> ret <4 x i32> %shuf } can be simplified to: define <4 x i32> @splat_operand(...
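The snippet cuts off before the simplified function, but the underlying identity is that any permutation of a splat is that same splat, so the second shuffle contributes nothing. A minimal C sketch, assuming the elided result keeps only the splat shuffle; shuffle_i32, splat_then_permute and splat_only are hypothetical helpers that model shufflevector on plain arrays, not LLVM APIs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of LLVM IR shufflevector: lane i of `out` is element
 * mask[i] of the concatenation a ++ b (both operands have n lanes).
 * `b` may be NULL when no mask index reaches into it, standing in for
 * the undef second operand in the example. */
static void shuffle_i32(const int32_t *a, const int32_t *b, size_t n,
                        const unsigned *mask, size_t out_len, int32_t *out) {
    for (size_t i = 0; i < out_len; i++) {
        assert(mask[i] < 2 * n);
        if (mask[i] < n) {
            out[i] = a[mask[i]];
        } else {
            assert(b != NULL);
            out[i] = b[mask[i] - n];
        }
    }
}

/* The original @splat_operand: splat, then permute with <0, 3, 2, 1>. */
static void splat_then_permute(const int32_t x[4], int32_t out[4]) {
    static const unsigned zero_mask[4] = {0, 0, 0, 0}; /* zeroinitializer */
    static const unsigned perm_mask[4] = {0, 3, 2, 1};
    int32_t splat[4];
    shuffle_i32(x, NULL, 4, zero_mask, 4, splat);
    shuffle_i32(splat, NULL, 4, perm_mask, 4, out);
}

/* Assumed simplified form: only the splat shuffle remains. */
static void splat_only(const int32_t x[4], int32_t out[4]) {
    static const unsigned zero_mask[4] = {0, 0, 0, 0};
    shuffle_i32(x, NULL, 4, zero_mask, 4, out);
}
```

Passing NULL for the unused second operand mirrors the undef operand in the IR; the assert fires if a mask index ever reaches into it.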

IR canonicalization: shufflevector or vector trunc?

2017 Jan 12

It's time for another round of "What is the canonical IR?" Credit for this episode to Zvi and PR31551. :) https://llvm.org/bugs/show_bug.cgi?id=31551 define <4 x i16> @shuffle(<16 x i16> %x) { %shuf = shufflevector <16 x i16> %x, <16 x i16> undef, <4 x i32> <i32 0, i32 4, i32 8, i32 12> ret <4 x i16> %shuf } define <4 x i16> @trunc(<16 x i16> %x) { %bc = bitcast <16 x i16> %x to <4 x i64> %tr = trunc...
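On a little-endian target the two functions compute the same lanes: after the bitcast, lane 4*i of the <16 x i16> vector is the low 16 bits of i64 element i, and trunc keeps exactly those bits. A small C model of both forms (the function names are illustrative; the bitcast is written out with an explicit little-endian layout so the result does not depend on the host):

```c
#include <assert.h>
#include <stdint.h>

/* Form 1: shufflevector with mask <0, 4, 8, 12>. */
static void shuffle_form(const uint16_t x[16], uint16_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = x[4 * i];
}

/* Form 2: bitcast <16 x i16> to <4 x i64>, then trunc to <4 x i16>.
 * The bitcast is modeled little-endian: lane 4*i + j supplies bits
 * [16*j, 16*j + 15] of wide element i. */
static void bitcast_trunc_form(const uint16_t x[16], uint16_t out[4]) {
    for (int i = 0; i < 4; i++) {
        uint64_t wide = 0;
        for (int j = 0; j < 4; j++)
            wide |= (uint64_t)x[4 * i + j] << (16 * j);
        out[i] = (uint16_t)wide; /* trunc keeps the low 16 bits */
    }
}
```

Under a big-endian layout the lowest 16 bits of each i64 would come from lanes 3, 7, 11 and 15 instead, so the trunc form matches the <0, 4, 8, 12> mask only on little-endian targets.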

IR canonicalization: vector select or shufflevector?

2016 Aug 28

A vector select with a constant vector condition operand: define <4 x i32> @foo(<4 x i32> %a, <4 x i32> %b) { %sel = select <4 x i1> <i1 true, i1 false, i1 false, i1 true>, <4 x i32> %a, <4 x i32> %b ret <4 x i32> %sel } ...is equivalent to a shufflevector: define <4 x i32> @goo(<4 x i32> %a, <4 x i32> %b) { %shuf = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 5, i32 6, i32 3> ret <4 x i32> %shuf } For the goal of canonicalization in IR, which of these should we prefer...
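The equivalence is mechanical: a true condition bit in lane i selects index i (from %a) and a false bit selects index i + 4 (from %b), which turns <true, false, false, true> into the mask <0, 5, 6, 3>. A C sketch of that mapping, modeling both instructions on 4-lane arrays (all helper names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define N 4

/* select <4 x i1> cond, <4 x i32> a, <4 x i32> b, lane by lane. */
static void select_form(const bool cond[N], const int32_t a[N],
                        const int32_t b[N], int32_t out[N]) {
    for (int i = 0; i < N; i++)
        out[i] = cond[i] ? a[i] : b[i];
}

/* Equivalent shufflevector mask: lane i reads a[i] (index i) when the
 * condition is true and b[i] (index i + N) otherwise. */
static void select_to_shuffle_mask(const bool cond[N], unsigned mask[N]) {
    for (int i = 0; i < N; i++)
        mask[i] = cond[i] ? (unsigned)i : (unsigned)(i + N);
}

/* shufflevector a, b, mask over the concatenation a ++ b. */
static void shuffle_form(const int32_t a[N], const int32_t b[N],
                         const unsigned mask[N], int32_t out[N]) {
    for (int i = 0; i < N; i++) {
        assert(mask[i] < (unsigned)(2 * N));
        out[i] = mask[i] < (unsigned)N ? a[mask[i]] : b[mask[i] - N];
    }
}
```

The mapping only works because a constant-condition select never moves a value across lanes; a shuffle that does (for example <1, 0, 3, 2>) has no select equivalent.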

IR canonicalization: shufflevector or vector trunc?

2017 Jan 13

Right - I think that case looks like this for little endian: define <2 x i32> @zextshuffle(<2 x i16> %x) { %zext_shuffle = shufflevector <2 x i16> %x, <2 x i16> zeroinitializer, <4 x i32> <i32 0, i32 2, i32 1, i32 2> %bc = bitcast <4 x i16> %zext_shuffle to <2 x i32> ret <2 x i32> %bc } define <2 x i32> @zextvec(<2 x i16...
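Here the zero operand supplies the high half of each 32-bit lane: the mask <0, 2, 1, 2> interleaves %x with zeros, and a little-endian bitcast of that <4 x i16> to <2 x i32> yields the zero-extended values. A C model, assuming @zextvec (whose body is cut off above) is a plain zext <2 x i16> %x to <2 x i32>; the function names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* shufflevector <2 x i16> %x, <2 x i16> zeroinitializer,
 *               <4 x i32> <i32 0, i32 2, i32 1, i32 2>
 * Indices 0-1 pick from x; indices 2-3 pick from the zero vector. */
static void zext_shuffle(const uint16_t x[2], uint16_t out[4]) {
    static const unsigned mask[4] = {0, 2, 1, 2};
    const uint16_t zero[2] = {0, 0};
    for (int i = 0; i < 4; i++)
        out[i] = mask[i] < 2 ? x[mask[i]] : zero[mask[i] - 2];
}

/* bitcast <4 x i16> to <2 x i32>, modeled little-endian: the lower-indexed
 * lane of each pair supplies the low 16 bits. */
static void bitcast_le(const uint16_t v[4], uint32_t out[2]) {
    for (int i = 0; i < 2; i++)
        out[i] = (uint32_t)v[2 * i] | ((uint32_t)v[2 * i + 1] << 16);
}

/* Assumed body of @zextvec: zext <2 x i16> to <2 x i32>. */
static void zext_vec(const uint16_t x[2], uint32_t out[2]) {
    for (int i = 0; i < 2; i++)
        out[i] = x[i];
}
```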

IR canonicalization: shufflevector or vector trunc?

2017 Jan 12

On 1/12/2017 9:04 AM, Sanjay Patel via llvm-dev wrote: > It's time for another round of "What is the canonical IR?" > > Credit for this episode to Zvi and PR31551. :) > https://llvm.org/bugs/show_bug.cgi?id=31551 > define <4 x i16> @shuffle(<16 x i16> %x) { > %shuf = shufflevector <16 x i16> %x, <16 x i16> undef, <4 x i32> <i32 0, i32 4, i32 8, i32 12> > ret <4 x i16> %shuf > } > > define <4 x i16> @trunc(<16 x i16> %x) { > %bc = bitcast <16 x i16> %x...

IR canonicalization: shufflevector or vector trunc?

2017 Jan 17

...2i128 should be treated differently than v16i16. Is this a valid argument to not canonicalize the IR? On Mon, Jan 16, 2017 at 10:16 AM, Rackover, Zvi <zvi.rackover at intel.com> wrote: > Suppose we prefer the ‘trunc’ form, then what about cases such as: > > define <2 x i16> @shuffle(<16 x i16> %x) { > > %shuf = shufflevector <16 x i16> %x, <16 x i16> undef, <2 x i32> <i32 0, > i32 8> > > ret <2 x i16> %shuf > > } > > > > Will the ‘shufflevector’ be canonicalized to a ‘trunc’ of a vector of i128? > ...

IR canonicalization: shufflevector or vector trunc?

2017 Jan 12

....org> wrote: > On 1/12/2017 9:04 AM, Sanjay Patel via llvm-dev wrote: > > It's time for another round of "What is the canonical IR?" > > Credit for this episode to Zvi and PR31551. :) > https://llvm.org/bugs/show_bug.cgi?id=31551 > > define <4 x i16> @shuffle(<16 x i16> %x) { > %shuf = shufflevector <16 x i16> %x, <16 x i16> undef, <4 x i32> <i32 0, i32 4, i32 8, i32 12> > ret <4 x i16> %shuf > } > > define <4 x i16> @trunc(<16 x i16> %x) { > %bc = bitcast <16 x i16> %x to...

IR canonicalization: vector select or shufflevector?

2016 Aug 29

I have a slight preference towards shufflevector, because it makes sequences of shuffles, where only some of the shuffles can be converted into selects (because the input and output vector sizes of the others don't match) simpler to reason about. I'm not sure this is a particularly good reason, though. On Mon, Aug 29, 2016 at 8...

IR canonicalization: shufflevector or vector trunc?

2017 Jan 21

On Thu, Jan 19, 2017 at 9:17 AM, Rackover, Zvi <zvi.rackover at intel.com> wrote: > Hi Sanjay, > > > > I agree we should also discuss **if** this canonicalization is beneficial. > > For starters, do we have a concrete case where we would benefit from > canonicalizing shuffles <-> truncates in LLVM IR? > > IMO, we should not count benefits for codegen because that alone does not > justify transforming the IR ; we could always do this on the SelectionDAG. > > Agreed. If we're just talking about IR benefits, then it's easy to demonstrate a...

[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW

2012 Jul 06

...CODE ///////////////////////////////////////// #include <xmmintrin.h> #define __INLINE static inline __attribute__((always_inline)) #define LOAD _mm_load_ps #define STORE _mm_store_ps #define ADD _mm_add_ps #define SUB _mm_sub_ps #define MULT _mm_mul_ps #define STREAM _mm_stream_ps #define SHUF _mm_shuffle_ps #define VLIT4(a,b,c,d) _mm_set_ps(a,b,c,d) #define SWAP(d) SHUF(d,d,_MM_SHUFFLE(2,3,0,1)) #define UNPACK2LO(a,b) SHUF(a,b,_MM_SHUFFLE(1,0,1,0)) #define UNPACK2HI(a,b) SHUF(a,b,_MM_SHUFFLE(3,2,3,2)) #define HALFBLEND(a,b) SHUF(a,b,_MM_SHUFFLE(3,2,1,0)) __INLINE void TX2(__m128 *a, __...
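The macros are thin wrappers around _mm_shuffle_ps, whose imm8 picks the two low result lanes from the first operand and the two high lanes from the second, two bits per lane. A portable scalar model of that selection makes the effect of each macro visible; the vec4 type, shuffle_ps and the lowercase helper names are stand-ins for this sketch, not part of the original code or of xmmintrin.h:

```c
#include <assert.h>

/* Same bit packing as _MM_SHUFFLE in <xmmintrin.h>, renamed so the
 * sketch compiles without that header. */
#define MM_SHUFFLE(z, y, x, w) (((z) << 6) | ((y) << 4) | ((x) << 2) | (w))

typedef struct { float f[4]; } vec4; /* stand-in for __m128 */

/* Scalar model of _mm_shuffle_ps(a, b, imm): lanes 0-1 come from a,
 * lanes 2-3 from b, each chosen by a 2-bit field of imm. */
static vec4 shuffle_ps(vec4 a, vec4 b, unsigned imm) {
    vec4 r;
    r.f[0] = a.f[imm & 3];
    r.f[1] = a.f[(imm >> 2) & 3];
    r.f[2] = b.f[(imm >> 4) & 3];
    r.f[3] = b.f[(imm >> 6) & 3];
    return r;
}

/* SWAP(d): d1 d0 d3 d2 */
static vec4 swap_pairs(vec4 d) { return shuffle_ps(d, d, MM_SHUFFLE(2, 3, 0, 1)); }
/* UNPACK2LO(a,b): a0 a1 b0 b1 */
static vec4 unpack2lo(vec4 a, vec4 b) { return shuffle_ps(a, b, MM_SHUFFLE(1, 0, 1, 0)); }
/* UNPACK2HI(a,b): a2 a3 b2 b3 */
static vec4 unpack2hi(vec4 a, vec4 b) { return shuffle_ps(a, b, MM_SHUFFLE(3, 2, 3, 2)); }
/* HALFBLEND(a,b): a0 a1 b2 b3 */
static vec4 halfblend(vec4 a, vec4 b) { return shuffle_ps(a, b, MM_SHUFFLE(3, 2, 1, 0)); }
```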

IR canonicalization: vector select or shufflevector?

2016 Aug 29

x86 has also put a lot of effort into shuffle lowering...so much so that it is its own life-form and brings most online codeviewer apps to their knees when you try to open X86ISelLowering.cpp. :) Given that: 1. There are at least 2 targets that lean towards shuffle (Martin's comment + x86 uses lowerVSELECTtoVectorShuffle() for all case...

[PATCH] D70246: [InstCombine] remove identity shuffle simplification for mask with undefs

2019 Dec 09

Sanjay, I'm looking at some missed optimizations caused by D70246. Here's a test case: define <4 x float> @f(i32 %t32, <4 x float>* %t24) { .entry: %t43 = insertelement <3 x i32> undef, i32 %t32, i32 2 %t44 = bitcast <3 x i32> %t43 to <3 x float> %t45 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef> %t46 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> <i32 undef, i32 1, i32 undef, i32 undef> %t47 = shufflevector <3 x float> %t...

Windows SMB2 client doing excessive, inefficient SMB2 Find (and other) requests

2017 Aug 31

...indings, I have two simple questions: > > 1) I plan to use a new, reproducible test scenario with 2000 random small > files with a file length between 1 and 2048 bytes, created along the lines > of the following: > > for i in $(seq -f "%04g" 1 2000) ; do > length=`shuf -i 1-2048 -n 1` > head -c $length < /dev/urandom > file${i}.rnd > done This is overly complicated for this I guess, a simple touch file$i should do it. > in order to make test data non-confidential (unfortunately, my previous test > files/packet traces were confidential). Do y...

[LLVMdev] [Vectorization] Mis match in code generated

2014 Sep 18

...ex = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ] %vec.phi = phi <4 x i32> [ zeroinitializer, %vector.ph ], [ %14, %vector.body ] %broadcast.splatinsert = insertelement <4 x i32> undef, i32 %index, i32 0 %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer %induction = add <4 x i32> %broadcast.splat, <i32 0, i32 1, i32 2, i32 3> %0 = extractelement <4 x i32> %induction, i32 0 %1 = getelementptr inbounds i32* %a, i32 %0 %2 = inse...
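The %broadcast.splat and %induction lines build a vector of per-lane loop indices in three steps: insert the scalar %index into lane 0, splat it with an all-zero mask, then add the constant <0, 1, 2, 3>. A scalar C sketch of those steps (induction_vector is a hypothetical name; the undef lanes are zero-filled only to keep the model deterministic):

```c
#include <assert.h>
#include <stdint.h>

static void induction_vector(int32_t index, int32_t out[4]) {
    /* insertelement <4 x i32> undef, i32 %index, i32 0
     * (lanes 1-3 stand in for undef) */
    int32_t ins[4] = {0, 0, 0, 0};
    ins[0] = index;

    /* shufflevector with a zeroinitializer mask: every lane reads lane 0 */
    int32_t splat[4];
    for (int i = 0; i < 4; i++)
        splat[i] = ins[0];

    /* add <i32 0, i32 1, i32 2, i32 3>: one consecutive index per lane */
    static const int32_t step[4] = {0, 1, 2, 3};
    for (int i = 0; i < 4; i++)
        out[i] = splat[i] + step[i];
}
```

Lane 0 of the result equals %index again, which is what the following extractelement of lane 0 reads back before the getelementptr.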

Friendly Reminder: Would you please comment on my findings?

2017 Aug 18

Ah, ok, "directory handle leases"... Ouch, I see... :-( In this case, I will first repeat my test scenario with a Windows SMB2 server and report back here. Based on the results of this exercise, you can then advise whether you still want to move this to smb-technical and raise this with Microsoft folks (who still might have a simple workaround "fix" to improve their SMB2

[LLVMdev] [Vectorization] Mis match in code generated

2014 Sep 18

...%vector.ph ], [ %index.next, %vector.body ] %vec.phi = > phi <4 x i32> [ zeroinitializer, %vector.ph ], [ %14, > %vector.body ] %broadcast.splatinsert = insertelement <4 x i32> undef, i32 > %index, i32 0 %broadcast.splat = shufflevector <4 x i32> > %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer > %induction = add <4 x i32> %broadcast.splat, <i32 0, i32 1, i32 2, i32 3> > %0 = extractelement <4 x i32> %induction, i32 0 %1 = getelementptr > inbounds i32* %...

Windows SMB2 client doing excessive, inefficient SMB2 Find (and other) requests

2017 Aug 31

...questions: >> >> 1) I plan to use a new, reproducible test scenario with 2000 random small >> files with a file length between 1 and 2048 bytes, created along the lines >> of the following: >> >> for i in $(seq -f "%04g" 1 2000) ; do >> length=`shuf -i 1-2048 -n 1` >> head -c $length < /dev/urandom > file${i}.rnd >> done > This is overly complicated for this I guess, a simple touch file$i should do it. I see. While I fear that some logic might detect that we are about to send empty files and possibly use some short-cut...

Effectiveness of CentOS vm.swappiness

2015 Jun 04

Hi all, This might not be CentOS related at all. Sorry about that. I have lots of C6 & C7 machines in use and all of them have the default swappiness of 60. The problem now is that a lot of those machines do swap although there is no memory pressure. I'm now thinking about lowering swappiness to 1. But I'd still like to find out why this happens. The only common thing between all

Windows SMB2 client doing excessive, inefficient SMB2 Find (and other) requests

2017 Aug 24

...ption of the scenario and the findings, I have two simple questions: 1) I plan to use a new, reproducible test scenario with 2000 random small files with a file length between 1 and 2048 bytes, created along the lines of the following: for i in $(seq -f "%04g" 1 2000) ; do length=`shuf -i 1-2048 -n 1` head -c $length < /dev/urandom > file${i}.rnd done in order to make test data non-confidential (unfortunately, my previous test files/packet traces were confidential). Do you agree that the above procedure is fine to create the test scenario? 2) What about confidential...

[LLVMdev] [Vectorization] Mis match in code generated

2014 Sep 19

... ], [ %index.next, %vector.body ] %vec.phi = >> phi <4 x i32> [ zeroinitializer, %vector.ph ], [ %14, >> %vector.body ] %broadcast.splatinsert = insertelement <4 x i32> undef, i32 >> %index, i32 0 %broadcast.splat = shufflevector <4 x i32> >> %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer >> %induction = add <4 x i32> %broadcast.splat, <i32 0, i32 1, i32 2, i32 3> >> %0 = extractelement <4 x i32> %induction, i32 0 %1 = getelementptr >>...