thr3ads.net - search: "rackover"

Displaying 15 results from an estimated 15 matches for "rackover".

IR canonicalization: shufflevector or vector trunc?

2017 Jan 21

On Thu, Jan 19, 2017 at 9:17 AM, Rackover, Zvi <zvi.rackover at intel.com> wrote:
> Hi Sanjay,
>
> I agree we should also discuss **if** this canonicalization is beneficial.
>
> For starters, do we have a concrete case where we would benefit from
> canonicalizing shuffles <-> truncates in LLVM IR? ...

IR canonicalization: shufflevector or vector trunc?

2017 Jan 17

...your example. DataLayout doesn't appear to specify what configurations of a 256-bit vector are legal, so I don't think we can currently use that to say v2i128 should be treated differently than v16i16. Is this a valid argument to not canonicalize the IR?

On Mon, Jan 16, 2017 at 10:16 AM, Rackover, Zvi <zvi.rackover at intel.com> wrote:
> Suppose we prefer the ‘trunc’ form, then what about cases such as:
>
> define <2 x i16> @shuffle(<16 x i16> %x) {
>
> %shuf = shufflevector <16 x i16> %x, <16 x i16> undef, <2 x i32> <i32 0,
> i32...

IR canonicalization: shufflevector or vector trunc?

2017 Jan 13

...the shuffles makes the trunc/zext forms the better choice. That way, we limit the endian dependency to one place in InstCombine, and other transforms don't have to worry about it. We also have lots of existing folds for trunc/zext and hardly any for shuffles.

On Thu, Jan 12, 2017 at 1:14 PM, Rackover, Zvi <zvi.rackover at intel.com> wrote:
> Just to add, there is also the ‘zext’ – ‘shuffle with zero’ duality which
> can broaden the discussion.
>
> --Zvi
>
> *From:* Sanjay Patel [mailto:spatel at rotateright.com]
> *Sent:* Thursday, January 12, 201...
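
The endian dependency mentioned in this snippet can be checked with a small model. This is a minimal Python sketch, not LLVM code, and it rests on two assumptions: that the trunc form is a bitcast of <16 x i16> to <4 x i64> followed by a trunc to <4 x i16>, and that the shuffle selects every fourth lane (the real PR31551 mask is truncated in the snippets, so <0, 4, 8, 12> is illustrative). The helper names are hypothetical; the struct module stands in for the bitcast.

```python
import struct

def shufflevector(a, b, mask):
    """Model of shufflevector: index i < len(a) picks a[i], otherwise b[i - len(a)].
    A mask entry of None stands for an undef lane and yields None."""
    both = list(a) + list(b)
    return [None if m is None else both[m] for m in mask]

def bitcast_trunc_v16i16_to_v4i16(x, little_endian=True):
    """Reinterpret 16 x u16 as 4 x u64 in the given byte order,
    then keep the low 16 bits of each 64-bit lane (the trunc)."""
    order = "<" if little_endian else ">"
    raw = struct.pack(order + "16H", *x)
    wide = struct.unpack(order + "4Q", raw)
    return [lane & 0xFFFF for lane in wide]

x = [0x1000 + i for i in range(16)]  # distinct, recognizable lane values
unused = [0] * 16                    # second operand; the masks never select it

shuf_low = shufflevector(x, unused, [0, 4, 8, 12])    # little-endian equivalent
shuf_high = shufflevector(x, unused, [3, 7, 11, 15])  # big-endian equivalent

print(bitcast_trunc_v16i16_to_v4i16(x, little_endian=True) == shuf_low)    # True
print(bitcast_trunc_v16i16_to_v4i16(x, little_endian=False) == shuf_high)  # True
```

Using explicit '<' and '>' byte-order prefixes in struct keeps the model independent of the host machine's own endianness, so both target layouts can be compared side by side.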

InstructionSimplify: adding a hook for shufflevector instructions

2017 Mar 30

... ret <4 x i32> %tmp7
}

If the function is required to return a splat value, then I believe the answer is no, because the undef indices allow returning a value that is not a splat.

Thanks,
Zvi

From: Sanjay Patel [mailto:spatel at rotateright.com]
Sent: Thursday, March 30, 2017 18:31
To: Rackover, Zvi <zvi.rackover at intel.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: InstructionSimplify: adding a hook for shufflevector instructions

My grasp of LLVM history isn't great, but I think these are missing because there wasn't much need for vector optimization i...
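
The point about undef indices can be illustrated with a toy model of shufflevector. This is a sketch under stated assumptions, not the InstructionSimplify logic: undef is modeled as None, the helper names are hypothetical, and the second mask is made up because the %tmp7 function in the snippet is truncated. Shuffling a splat with a mask that only selects lanes of the splat reproduces the splat; once a mask lane is undef, or selects from the undef operand, that result lane is undef and the result is no longer guaranteed to be a splat.

```python
def shufflevector(a, b, mask):
    """Model of shufflevector. None in a mask or operand stands for undef."""
    both = list(a) + list(b)
    return [None if m is None else both[m] for m in mask]

def is_guaranteed_splat(vec):
    """True only if every lane is defined and all lanes hold the same value."""
    return None not in vec and len(set(vec)) == 1

x = [7, 8, 9, 10]
undef = [None] * 4

# %splat = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> zeroinitializer
splat = shufflevector(x, undef, [0, 0, 0, 0])

# Any mask that only selects lanes 0..3 of the splat reproduces the splat.
shuf_defined = shufflevector(splat, undef, [3, 1, 2, 0])

# Hypothetical mask with one undef lane and one lane taken from the undef operand.
shuf_with_undef = shufflevector(splat, undef, [0, None, 5, 2])

print(shuf_defined == splat)                 # True
print(is_guaranteed_splat(shuf_defined))     # True
print(is_guaranteed_splat(shuf_with_undef))  # False
```

Modeling undef as None is a simplification: in LLVM an undef lane may take any value, including the splat value, so this model only answers whether a splat is guaranteed, which is the question raised in the snippet.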

AVX Scheduling and Parallelism

2017 Jun 25

...+ vstore sequences as independent and pipeline their execution.

Thanks,
Zvi

From: Hal Finkel [mailto:hfinkel at anl.gov]
Sent: Saturday, June 24, 2017 05:17
To: hameeza ahmed <hahmed2305 at gmail.com>; llvm-dev at lists.llvm.org
Cc: Demikhovsky, Elena <elena.demikhovsky at intel.com>; Rackover, Zvi <zvi.rackover at intel.com>; Breger, Igor <igor.breger at intel.com>; craig.topper at gmail.com
Subject: Re: [llvm-dev] AVX Scheduling and Parallelism

It is possible that the issue with scheduling is constrained due to pointer-aliasing assumptions. Could you share the source for...

AVX Scheduling and Parallelism

2017 Jun 25

...limited (it might even be decoder limited for this loop). We might want to be less aggressive in generating complex addressing modes for the KNL. It seems like it would be better to materialize the base array addresses into a register to make the encodings shorter.

-Hal

On 06/25/2017 07:14 AM, Rackover, Zvi wrote:
>
> Hi Ahmed,
>
> From what can be seen in the code snippet you provided, the reuse of
> XMM0 and XMM1 across loop-unroll instances does not inhibit
> instruction-level parallelism.
>
> Modern X86 processors use register renaming that can eliminate the
> de...
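
The register-renaming point in this snippet can be illustrated with a toy model. This is a sketch under simplifying assumptions, not a model of any real x86 core: the four-instruction loop body, the unbounded pool of physical registers, and all helper names are hypothetical. Before renaming, reusing xmm0 and xmm1 creates write-after-read (WAR) and write-after-write (WAW) edges between the two unrolled iterations; once every write gets a fresh physical register, only the true read-after-write (RAW) edges inside each iteration remain.

```python
from collections import Counter

# Hypothetical two-iteration unrolled loop body that reuses xmm0/xmm1.
# Each instruction is (destination register or None, [source registers]).
# Address registers are ignored to keep the model small.
LOOP = [
    ("xmm0", []),                # load A[i]
    ("xmm1", []),                # load B[i]
    ("xmm0", ["xmm0", "xmm1"]),  # add
    (None, ["xmm0"]),            # store C[i]
] * 2
ITERATION_SIZE = 4

def dependencies(instrs):
    """Return a set of (producer, consumer, kind) edges, kind in {RAW, WAR, WAW}."""
    last_writer, readers, edges = {}, {}, set()
    for k, (dest, srcs) in enumerate(instrs):
        for r in srcs:
            if r in last_writer:
                edges.add((last_writer[r], k, "RAW"))
            readers.setdefault(r, []).append(k)
        if dest is not None:
            for j in readers.get(dest, []):
                if j != k:  # an instruction does not conflict with its own read
                    edges.add((j, k, "WAR"))
            if dest in last_writer:
                edges.add((last_writer[dest], k, "WAW"))
            last_writer[dest] = k
            readers[dest] = []
    return edges

def rename(instrs):
    """Toy renaming: sources are read through the map first, then each write
    gets a fresh physical register named after the instruction index."""
    table, renamed = {}, []
    for n, (dest, srcs) in enumerate(instrs):
        new_srcs = [table.get(r, r) for r in srcs]
        new_dest = None
        if dest is not None:
            new_dest = f"p{n}"
            table[dest] = new_dest
        renamed.append((new_dest, new_srcs))
    return renamed

before = Counter(kind for _, _, kind in dependencies(LOOP))
after_edges = dependencies(rename(LOOP))
after = Counter(kind for _, _, kind in after_edges)
cross_iteration = [(j, k) for j, k, _ in after_edges
                   if j < ITERATION_SIZE <= k]

print(before["RAW"], before["WAR"], before["WAW"])  # 6 2 4
print(after["RAW"], after["WAR"], after["WAW"])     # 6 0 0
print(cross_iteration)                              # []
```

The model ignores the finite size of the physical register file and front-end limits such as decode bandwidth, which the snippet suggests may matter for this loop; it only shows why the architectural register reuse by itself does not serialize the iterations.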

IR canonicalization: shufflevector or vector trunc?

2017 Jan 12

On Thu, Jan 12, 2017 at 11:06 AM, Friedman, Eli <efriedma at codeaurora.org> wrote:
> On 1/12/2017 9:04 AM, Sanjay Patel via llvm-dev wrote:
>
> It's time for another round of "What is the canonical IR?"
>
> Credit for this episode to Zvi and PR31551. :)
> https://llvm.org/bugs/show_bug.cgi?id=31551
>
> define <4 x i16> @shuffle(<16 x i16>

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

2016 Nov 24

...to control internals of llc process..

- Elena

From: Haber, Gadi
Sent: Thursday, November 24, 2016 09:28
To: Craig Topper <craig.topper at gmail.com>; Hal Finkel <hfinkel at anl.gov>
Cc: llvm-dev at lists.llvm.org; Demikhovsky, Elena <elena.demikhovsky at intel.com>; Rackover, Zvi <zvi.rackover at intel.com>
Subject: RE: [llvm-dev] RFC: code size reduction in X86 by replacing EVEX with VEX encoding

Thanx. This makes sense. Note that there are many tests, mostly under test/CodeGen/X86, that are affected by this optimization and I had to modify them as they include...

InstructionSimplify: adding a hook for shufflevector instructions

2017 Mar 30

As Sanjay noted in D31426 <https://reviews.llvm.org/D31426#712701>, InstructionSimplify is missing the following simplification. This function:

define <4 x i32> @splat_operand(<4 x i32> %x) {
  %splat = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> zeroinitializer
  %shuf = shufflevector <4 x i32> %splat, <4 x i32> undef, <4 x i32>

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

2016 Nov 28

Hal, that’s a good point. There are more manually-maintained tables in the X86 backend that should probably be tablegened: the memory-folding tables and ReplaceableInstrs, to name a couple. If you have ideas on how to get these auto-generated, please let us know.

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Hal Finkel via llvm-dev
Sent: Wednesday, November 23, 2016

IR canonicalization: shufflevector or vector trunc?

2017 Jan 12

On 1/12/2017 9:04 AM, Sanjay Patel via llvm-dev wrote:
> It's time for another round of "What is the canonical IR?"
>
> Credit for this episode to Zvi and PR31551. :)
> https://llvm.org/bugs/show_bug.cgi?id=31551
> define <4 x i16> @shuffle(<16 x i16> %x) {
> %shuf = shufflevector <16 x i16> %x, <16 x i16> undef, <4 x i32> <i32 0,

[LLVMdev] LLVM x86 backend for Intel MIC : trying it out and questions

2013 Jul 15

Hello Elena,

> There is no 32-bit KNC.

Are you sure about this? From "System V Application Binary Interface K1OM Architecture Processor Supplement Version 1.0", p. 124:

| A.1 Execution of 32-bit Programs
|
| The K1OM processors are able to execute 64-bit K1OM and also 32-bit ia32 programs.

I'm really really looking for this opportunity, because we want to extend our kernel

Vectorizers code ownership

2016 Nov 09

On 9 Nov 2016 06:04, "Chandler Carruth via llvm-dev" <llvm-dev at lists.llvm.org> wrote:
> Just my two cents, but if Craig is up for it, I think this would be a pretty great fit.

+1

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

2016 Nov 23

I would like a command line option to disable this optimization. That way tests can still verify that EVEX instructions came out of isel by using -show-mc-encoding.

On Wed, Nov 23, 2016 at 5:01 AM Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> ------------------------------
>
> *From: *"Gadi via llvm-dev Haber" <llvm-dev at lists.llvm.org>
>

AVX Scheduling and Parallelism

2017 Jun 24

Hello,

After generating AVX code for a large number of iterations, I came to realize that it still uses only 2 registers, zmm0 and zmm1, when the loop unroll factor = 1024. I wonder if this register allocation allows operations in parallel? Also, I know all the elements within a single vector instruction are computed in parallel, but are the elements of multiple instructions computed in parallel? like are
