thr3ads.net - search: "congh"

Displaying 9 results from an estimated 9 matches for "congh".

[RFC] Introducing a vector reduction add instruction.

2015 Nov 25

----- Original Message -----
> From: "Xinliang David Li" <davidxl at google.com>
> To: "Cong Hou" <congh at google.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Wednesday, November 25, 2015 5:17:58 PM
> Subject: Re: [llvm-dev] [RFC] Introducing a vector reduction add instruction.
>
>
> Hal is probabl...

[RFC] Introducing a vector reduction add instruction.

2015 Nov 19

...refining the cost model to let bigger VFs have less cost. For the example above the best result is from VF >=16. The draft of the patch is here: http://reviews.llvm.org/D14840 I will refine the patch later and submit it for review. thanks, Cong On Wed, Nov 18, 2015 at 2:45 PM, Cong Hou <congh at google.com> wrote: > On Mon, Nov 16, 2015 at 9:31 PM, Shahid, Asghar-ahmad > <Asghar-ahmad.Shahid at amd.com> wrote: >> Hi Cong, >> >>> -----Original Message----- >>> From: Cong Hou [mailto:congh at google.com] >>> Sent: Tuesday, November 17,...

[RFC] Introducing a vector reduction add instruction.

2015 Nov 25

...lt is from VF >=16. >> >> The draft of the patch is here: http://reviews.llvm.org/D14840 >> >> I will refine the patch later and submit it for review. >> >> >> thanks, >> Cong >> >> >> On Wed, Nov 18, 2015 at 2:45 PM, Cong Hou <congh at google.com> wrote: >> > On Mon, Nov 16, 2015 at 9:31 PM, Shahid, Asghar-ahmad >> > <Asghar-ahmad.Shahid at amd.com> wrote: >> >> Hi Cong, >> >> >> >>> -----Original Message----- >> >>> From: Cong Hou [mailto:congh a...

X86 TRUNCATE cost for AVX & AVX2 mode

2016 Apr 12

<Copied Cong> Thanks Elena. Mostly I was interested in why such a high cost of 30 is kept for TRUNCATE v16i32 to v16i8 in SSE41. Looking at the code, it appears that TRUNCATE v16i32 to v16i8 in SSE41 is very expensive compared with SSE2. I feel this number should be the same as, or close to, the cost listed for the same operation in SSE2ConversionTbl. The patch below from Cong Hou reduces the cost of the same operation in SSE2

[RFC] Introducing a vector reduction add instruction.

2015 Nov 13

Hi, when a reduction instruction is vectorized in a loop, it will be turned into an instruction with vector operands of the same operation type. This new instruction has a special property that can give us more flexibility during instruction selection later: this operation is valid as long as the reduction of all elements of the result vector is identical to the reduction of all elements of its
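The property described in this snippet can be illustrated with a small Python model that treats vectors as lists. This is a sketch under stated assumptions, not code from the thread or from LLVM: `VF`, `lanewise_accumulate`, `pairwise_accumulate`, and `reduce_add` are hypothetical names. The two accumulators leave different values in the individual lanes, yet the single horizontal add after the loop yields the same total, which is why any vector whose element-wise reduction matches would be acceptable.

```python
from typing import List

VF = 4  # assumed vectorization factor for this illustration


def lanewise_accumulate(data: List[int], vf: int = VF) -> List[int]:
    """Ordinary vectorized reduction: lane j sums the elements i with i % vf == j."""
    acc = [0] * vf
    for i, x in enumerate(data):
        acc[i % vf] += x
    return acc


def pairwise_accumulate(data: List[int], vf: int = VF) -> List[int]:
    """Hypothetical alternative: each step folds adjacent pairs of a 2*vf chunk into vf lanes.

    The lane values differ from lanewise_accumulate, but their total is the same.
    """
    acc = [0] * vf
    step = 2 * vf
    for start in range(0, len(data), step):
        chunk = data[start:start + step]
        chunk += [0] * (step - len(chunk))  # pad the tail with 0, the identity of addition
        for j in range(vf):
            acc[j] += chunk[2 * j] + chunk[2 * j + 1]
    return acc


def reduce_add(vec: List[int]) -> int:
    """Horizontal add of all lanes, performed once after the loop."""
    return sum(vec)


data = list(range(1, 17))
print(lanewise_accumulate(data), pairwise_accumulate(data))
print(reduce_add(lanewise_accumulate(data)), reduce_add(pairwise_accumulate(data)))
```

With `data = list(range(1, 17))`, the lane-wise accumulator yields `[28, 32, 36, 40]` and the pairwise one yields `[22, 30, 38, 46]`; both reduce to 136.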

[RFC] Allow loop vectorizer to choose vector widths that generate illegal types

2016 Jun 16

...molloy at arm.com>; Matthew Simpson <mssimpso at codeaurora.org>; Sanjay Patel <spatel at rotateright.com>; Chandler Carruth <chandlerc at google.com>; David Li <davidxl at google.com>; Wei Mi <wmi at google.com>; Dehao Chen <dehao at google.com>; Cong Hou <congh at google.com>; Llvm Dev <llvm-dev at lists.llvm.org> Subject: Re: [RFC] Allow loop vectorizer to choose vector widths that generate illegal types Hi Nadav, Thanks a lot for the feedback! Of course we need to explore this with numbers. Not just in terms of the performance vs. compile-tim...

[RFC] Allow loop vectorizer to choose vector widths that generate illegal types

2016 Jun 15

Hello, Currently the loop vectorizer will, by default, not consider vectorization factors that would make it generate types that do not fit into the target platform's vector registers. That is, if the widest scalar type in the scalar loop is i64, and the platform's largest vector register is 256-bit wide, we will not consider a VF above 4. We have a command line option (-mllvm
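The default limit described in this snippet amounts to dividing the widest vector register by the widest scalar type in the loop. A minimal Python sketch, assuming candidate VFs are powers of two; the function name `max_legal_vf` is hypothetical and not an LLVM API:

```python
def max_legal_vf(widest_scalar_bits: int, max_vector_reg_bits: int) -> int:
    """Largest power-of-two VF whose widest vector type fits in one register.

    Hypothetical helper modelling the default behaviour described in the RFC.
    """
    if widest_scalar_bits <= 0:
        raise ValueError("scalar width must be positive")
    lanes = max_vector_reg_bits // widest_scalar_bits
    if lanes < 1:
        return 1  # the scalar type alone is wider than a vector register
    # Round down to a power of two, since candidate VFs are powers of two.
    return 1 << (lanes.bit_length() - 1)


# The example from the RFC: widest scalar i64, 256-bit vector registers.
print(max_legal_vf(64, 256))
```

For i64 with 256-bit registers this returns 4, matching the RFC; a loop whose widest type is i8 would be allowed up to VF 32 under the same rule.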

[RFC] Allow loop vectorizer to choose vector widths that generate illegal types

2016 Jun 16

Hi Michael, Thank you for working on this. The loop vectorizer tries a bunch of different vectorization factors and stops at the widest word size mostly because of compile-time concerns. For every vectorization factor that we check, we have to scan all of the instructions in the loop and make multiple calls into TTI. If you decide to increase the VF enumeration space then you will linearly
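The compile-time concern can be made concrete with a rough Python sketch: for each candidate VF, every instruction in the loop is costed once, so the number of cost queries is the number of candidate VFs times the number of instructions. `candidate_vfs`, `pick_vf`, and the `cost_of` callback (standing in for TTI queries) are hypothetical; the real loop vectorizer's cost model is considerably more involved.

```python
from typing import Callable, List, Tuple


def candidate_vfs(max_vf: int) -> List[int]:
    """Power-of-two VFs 1, 2, 4, ..., up to and including max_vf."""
    vfs = []
    vf = 1
    while vf <= max_vf:
        vfs.append(vf)
        vf *= 2
    return vfs


def pick_vf(instructions: List[str], max_vf: int,
            cost_of: Callable[[str, int], float]) -> Tuple[int, int]:
    """Return (best VF by cost per scalar iteration, number of cost queries made)."""
    best_vf, best_cost = 1, float("inf")
    queries = 0
    for vf in candidate_vfs(max_vf):
        total = 0.0
        for inst in instructions:
            total += cost_of(inst, vf)  # one hypothetical TTI-like query per instruction per VF
            queries += 1
        per_lane = total / vf  # cost normalized to one scalar iteration
        if per_lane < best_cost:
            best_vf, best_cost = vf, per_lane
    return best_vf, queries


loop_body = ["load", "add", "store"]
print(pick_vf(loop_body, 16, lambda inst, vf: 1.0))
print(pick_vf(loop_body, 32, lambda inst, vf: 1.0))
```

With a constant unit cost, raising the maximum VF from 16 to 32 adds one more candidate VF and therefore three more queries (15 to 18) for this three-instruction loop; the growth is proportional to the number of VFs enumerated.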

[3.8 Release] Release status

2016 Feb 19

According to the schedule (e.g. on the right on llvm.org), we should have tagged the release by now, but we haven't, so we're officially behind schedule. I'm still optimistic that we can wrap this up pretty soon, though. This is what's blocking us: - PR26509: Crash in InnerLoopVectorizer::vectorizeLoop() I'm waiting to hear what Cong comes up with, otherwise we can revert
