thr3ads.net - similar to: "[RFC] Introducing a vector reduction add instruction."

Displaying 20 results from an estimated 500 matches similar to: "[RFC] Introducing a vector reduction add instruction."

[RFC] Introducing a vector reduction add instruction.

2015 Nov 25

[RFC] Introducing a vector reduction add instruction.

On Wed, Nov 25, 2015 at 2:32 PM, Hal Finkel <hfinkel at anl.gov> wrote: > Hi Cong, > > After reading the original RFC and this update, I'm still not entirely sure I understand the semantics of the flag you're proposing to add. Does it having something to do with the ordering of the reduction operations? The flag is only useful for vectorized reduction for now. I'll give

[RFC] Introducing a vector reduction add instruction.

2015 Nov 19

[RFC] Introducing a vector reduction add instruction.

After some attempt to implement reduce-add in LLVM, I found out a easier way to detect reduce-add without introducing new IR operations. The basic idea is annotating phi node instead of add (so that it is easier to handle other reduction operations). In PHINode class, we can add a flag indicating if the phi node is a reduction one (the flag can be set in loop vectorizer for vectorized phi nodes).

[RFC] Introducing a vector reduction add instruction.

2015 Nov 25

[RFC] Introducing a vector reduction add instruction.

----- Original Message ----- > From: "Xinliang David Li" <davidxl at google.com> > To: "Cong Hou" <congh at google.com> > Cc: "Hal Finkel" <hfinkel at anl.gov>, "llvm-dev" <llvm-dev at lists.llvm.org> > Sent: Wednesday, November 25, 2015 5:17:58 PM > Subject: Re: [llvm-dev] [RFC] Introducing a vector reduction add

[LLVMdev] supporting SAD in loop vectorizer

2014 Nov 11

[LLVMdev] supporting SAD in loop vectorizer

----- Original Message ----- > From: "Dibyendu Das" <Dibyendu.Das at amd.com> > To: "Hal Finkel" <hfinkel at anl.gov>, "Renato Golin" <renato.golin at linaro.org> > Cc: llvmdev at cs.uiuc.edu > Sent: Tuesday, November 4, 2014 12:15:12 PM > Subject: RE: [LLVMdev] supporting SAD in loop vectorizer > > Here's the simple SAD

[LLVMdev] supporting SAD in loop vectorizer

2014 Nov 04

[LLVMdev] supporting SAD in loop vectorizer

----- Original Message ----- > From: "Renato Golin" <renato.golin at linaro.org> > To: "Dibyendu Das" <Dibyendu.Das at amd.com> > Cc: llvmdev at cs.uiuc.edu > Sent: Tuesday, November 4, 2014 5:23:30 AM > Subject: Re: [LLVMdev] supporting SAD in loop vectorizer > > On 4 November 2014 11:06, Das, Dibyendu <Dibyendu.Das at amd.com> wrote:

[LLVMdev] supporting SAD in loop vectorizer

2014 Nov 11

[LLVMdev] supporting SAD in loop vectorizer

----- Original Message ----- > From: "James Molloy" <james at jamesmolloy.co.uk> > To: "Hal Finkel" <hfinkel at anl.gov> > Cc: "Dibyendu Das" <Dibyendu.Das at amd.com>, llvmdev at cs.uiuc.edu > Sent: Tuesday, November 11, 2014 8:21:37 AM > Subject: Re: [LLVMdev] supporting SAD in loop vectorizer > > > If you'd like to

[LLVMdev] some llvm/clang missed optimizations

2010 Jan 27

[LLVMdev] some llvm/clang missed optimizations

> Umm, can you find one that isn't a popcount implementation? Ok. MMX psadbw instruction: http://embed.cs.utah.edu/embarrassing/jan_10/harvest/source/CE/CE3DA132.shtml Position of first set bit: http://embed.cs.utah.edu/embarrassing/jan_10/harvest/source/1F/1F4003C7.shtml Log2 floor: http://embed.cs.utah.edu/embarrassing/jan_10/harvest/source/83/837A80E9.shtml Pixel format

Proposal for replacing asm code with intrinsics

2009 Oct 13

Proposal for replacing asm code with intrinsics

Hi, I'm new to Theora and would like to propose several performance optimization using advanced instructions in x86 CPUs (SSE2-SSE4.2). There are several source files in \x86 and \x86_vc which developed using inline assembler. However this cause several maintenance problems: 1) Need to sync gcc & msvc versions 2) Only 32bit environment is supported 3) No support for newer than MMX

[LLVMdev] supporting SAD in loop vectorizer

2014 Nov 04

[LLVMdev] supporting SAD in loop vectorizer

Nadav and other vectorizer folks- Is there any plan to support special idioms in the loop vectorizer like sum of absolute difference (SAD) ? We see some useful cases where llvm is losing performance at -O3 due to SADs not being vectorized (hence PSADBWs not being generated). Also, since the abs() call is already lowered to a sequence of 'icmp; neg; select' by simplifylibcalls (in -O3),

SCEV and LoopStrengthReduction Formulae

2018 Apr 07

SCEV and LoopStrengthReduction Formulae

> > I realize this is a micro-op saving a single cycle. But this reduces the instruction count, one less > instr to decode in a potentially hot path. If this all makes sense, and seems like a reasonable addition > to llvm, would it make sense to implement this as a supplemental LSR formula, or as a separate pass? This seems reasonable to me so long as rbx has no other uses that

[LoopVectorizer] Improving the performance of dot product reduction loop

2018 Jul 23

[LoopVectorizer] Improving the performance of dot product reduction loop

~Craig On Mon, Jul 23, 2018 at 4:24 PM Hal Finkel <hfinkel at anl.gov> wrote: > > On 07/23/2018 05:22 PM, Craig Topper wrote: > > Hello all, > > This code https://godbolt.org/g/tTyxpf is a dot product reduction loop > multipying sign extended 16-bit values to produce a 32-bit accumulated > result. The x86 backend is currently not able to optimize it as well as gcc

[LoopVectorizer] Improving the performance of dot product reduction loop

2018 Jul 24

[LoopVectorizer] Improving the performance of dot product reduction loop

On Tue, Jul 24, 2018 at 6:10 AM Hal Finkel <hfinkel at anl.gov> wrote: > > On 07/23/2018 06:37 PM, Craig Topper wrote: > > > ~Craig > > > On Mon, Jul 23, 2018 at 4:24 PM Hal Finkel <hfinkel at anl.gov> wrote: > >> >> On 07/23/2018 05:22 PM, Craig Topper wrote: >> >> Hello all, >> >> This code https://godbolt.org/g/tTyxpf

MCRegisterClass mandatory vs preferred alignment?

2015 Aug 31

MCRegisterClass mandatory vs preferred alignment?

On 08/31/2015 03:59 PM, Matthias Braun wrote: > Looks to me like the alignment is specified in tablegen. From Target.td: > > class RegisterClass<string namespace, list<ValueType> regTypes, int alignment, > dag regList, RegAltNameIndex idx = NoRegAltName> > > X86RegisterInfo.td: > > def VR256 : RegisterClass<"X86", [v32i8,

MCRegisterClass mandatory vs preferred alignment?

2015 Aug 31

MCRegisterClass mandatory vs preferred alignment?

Looking around today, it appears that TargetRegisterClass and MCRegisterClass only includes a single alignment. This is documented as being the minimum legal alignment, but it appears to often be greater than this in practice. For instance, on x86 the alignment of %ymm0 is listed as 32, not 1. Does anyone know why this is? Additionally, where are these alignments actually defined? I

[LLVMdev] AVX spill alignment

2011 Aug 25

[LLVMdev] AVX spill alignment

Hey guys, Are spills/reloads of AVX registers using aligned stores/loads? I can't seem to find the code that aligns the stack slots to 32-bytes. Could someone point me in the right direction? Thanks, Cameron -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110825/b5724dec/attachment.html>

SCEV and LoopStrengthReduction Formulae

2018 Apr 03

SCEV and LoopStrengthReduction Formulae

I am attempting to implement a minor loop strength reduction optimization for targets that support compare and jump fusion, specifically TTI::canMacroFuseCmp(). My approach might be wrong; however, I am soliciting the idea for feedback, so that I can implement this correctly. My plan is to add a Supplemental LSR formula to LoopStrengthReduce.cpp that optimizes the following case, but perhaps

[LLVMdev] AVX spill alignment

2011 Sep 01

[LLVMdev] AVX spill alignment

On Aug 25, 2011, at 4:17 PM, Cameron McInally wrote: > Hey guys, > > Are spills/reloads of AVX registers using aligned stores/loads? Yes. > I can't > seem to find the code that aligns the stack slots to 32-bytes. Could > someone point me in the right direction? The register class has 256-bit spill alignment: def VR256 : RegisterClass<"X86", [v32i8, v16i16,

[LLVMdev] RFC: ErLLVM - Implemented HiPE Calling Convention

2012 Apr 24

[LLVMdev] RFC: ErLLVM - Implemented HiPE Calling Convention

This patch (and the others that will follow) are rebased on svn r155440: "AVX2: The BLENDPW instruction selects between vectors of v16i16 using an i8 immediate. We can't use it here because the shuffle code does not check that the lower part of the word is identical to the upper part" Patch 1/3: The attached commits add a new calling convention to support the LLVM backend for

sum elements in the vector

2016 May 28

sum elements in the vector

Hi Rail, Below 2 revisions might be of your interest which Detect SAD patterns and emit psadbw instructions on X86.: http://reviews.llvm.org/D14840 http://reviews.llvm.org/D14897 Intrinsics related to absdiff revisons : http://reviews.llvm.org/D10867 http://reviews.llvm.org/D11678 Hope this helps. Regards, Suyog On Sat, May 28, 2016 at 4:20 AM, Rail Shafigulin via llvm-dev < llvm-dev at

[LLVMdev] RFC: ErLLVM - Implemented HiPE Calling Convention

2012 May 02

[LLVMdev] RFC: ErLLVM - Implemented HiPE Calling Convention

On 04/24/12 17:10, Yiannis Tsiouris wrote: > This patch (and the others that will follow) are rebased on svn r155440: > > "AVX2: The BLENDPW instruction selects between vectors of v16i16 using an i8 > immediate. We can't use it here because the shuffle code does not check that > the lower part of the word is identical to the upper part" > > Patch 1/3: >

similar to: [RFC] Introducing a vector reduction add instruction.