search for: congh

Displaying 9 results from an estimated 9 matches for "congh".

2015 Nov 25
2
[RFC] Introducing a vector reduction add instruction.
----- Original Message ----- > From: "Xinliang David Li" <davidxl at google.com> > To: "Cong Hou" <congh at google.com> > Cc: "Hal Finkel" <hfinkel at anl.gov>, "llvm-dev" <llvm-dev at lists.llvm.org> > Sent: Wednesday, November 25, 2015 5:17:58 PM > Subject: Re: [llvm-dev] [RFC] Introducing a vector reduction add instruction. > > > Hal is probabl...
2015 Nov 19
5
[RFC] Introducing a vector reduction add instruction.
...refining the cost model to let bigger VFs have less cost. For the example above the best result is from VF >=16. The draft of the patch is here: http://reviews.llvm.org/D14840 I will refine the patch later and submit it for review. thanks, Cong On Wed, Nov 18, 2015 at 2:45 PM, Cong Hou <congh at google.com> wrote: > On Mon, Nov 16, 2015 at 9:31 PM, Shahid, Asghar-ahmad > <Asghar-ahmad.Shahid at amd.com> wrote: >> Hi Cong, >> >>> -----Original Message----- >>> From: Cong Hou [mailto:congh at google.com] >>> Sent: Tuesday, November 17,...
2015 Nov 25
2
[RFC] Introducing a vector reduction add instruction.
...lt is from VF >=16. >> >> The draft of the patch is here: http://reviews.llvm.org/D14840 >> >> I will refine the patch later and submit it for review. >> >> >> thanks, >> Cong >> >> >> On Wed, Nov 18, 2015 at 2:45 PM, Cong Hou <congh at google.com> wrote: >> > On Mon, Nov 16, 2015 at 9:31 PM, Shahid, Asghar-ahmad >> > <Asghar-ahmad.Shahid at amd.com> wrote: >> >> Hi Cong, >> >> >> >>> -----Original Message----- >> >>> From: Cong Hou [mailto:congh a...
2016 Apr 12
2
X86 TRUNCATE cost for AVX & AVX2 mode
<Copied Cong> Thanks Elena. Mostly I was interested in why such a high cost of 30 is kept for TRUNCATE v16i32 to v16i8 in SSE41. Looking at the code, it appears that TRUNCATE v16i32 to v16i8 in SSE41 is very expensive compared with SSE2. I feel this number should be the same as or close to the cost given for the same operation in SSE2ConversionTbl. The patch below from Cong Hou reduces the cost for the same operation in SSE2
2015 Nov 13
2
[RFC] Introducing a vector reduction add instruction.
Hi, When a reduction instruction is vectorized in a loop, it will be turned into an instruction with vector operands of the same operation type. This new instruction has a special property that can give us more flexibility during instruction selection later: this operation is valid as long as the reduction of all elements of the result vector is identical to the reduction of all elements of its
2016 Jun 16
2
[RFC] Allow loop vectorizer to choose vector widths that generate illegal types
...molloy at arm.com>; Matthew Simpson <mssimpso at codeaurora.org>; Sanjay Patel <spatel at rotateright.com>; Chandler Carruth <chandlerc at google.com>; David Li <davidxl at google.com>; Wei Mi <wmi at google.com>; Dehao Chen <dehao at google.com>; Cong Hou <congh at google.com>; Llvm Dev <llvm-dev at lists.llvm.org> Subject: Re: [RFC] Allow loop vectorizer to choose vector widths that generate illegal types Hi Nadav, Thanks a lot for the feedback! Of course we need to explore this with numbers. Not just in terms of the performance vs. compile-tim...
2016 Jun 15
8
[RFC] Allow loop vectorizer to choose vector widths that generate illegal types
Hello, Currently the loop vectorizer will, by default, not consider vectorization factors that would make it generate types that do not fit into the target platform's vector registers. That is, if the widest scalar type in the scalar loop is i64, and the platform's largest vector register is 256 bits wide, we will not consider a VF above 4. We have a command line option (-mllvm
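The limit described in the snippet above comes down to one division: the widest register width divided by the widest scalar type in the loop. The sketch below only illustrates that arithmetic. The function name max_legal_vf is hypothetical and is not part of LLVM's API, and the real vectorizer's cost model considers much more than this.

```python
def max_legal_vf(register_bits: int, widest_scalar_bits: int) -> int:
    """Largest vectorization factor whose vector type fits in one register.

    Hypothetical helper for illustration only; not LLVM's implementation.
    """
    if register_bits <= 0 or widest_scalar_bits <= 0:
        raise ValueError("bit widths must be positive")
    # A scalar wider than the register still allows VF = 1 (no vectorization).
    return max(1, register_bits // widest_scalar_bits)


# The example from the snippet: i64 elements in a 256-bit register.
print(max_legal_vf(256, 64))  # prints 4
```

With these inputs the result matches the "VF above 4" bound quoted in the message; any wider VF would produce a vector type that does not fit in a single register.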
2016 Jun 16
2
[RFC] Allow loop vectorizer to choose vector widths that generate illegal types
Hi Michael, Thank you for working on this. The loop vectorizer tries a bunch of different vectorization factors and stops at the widest word size mostly because of compile time concerns. For every vectorization factor that we check we have to scan all of the instructions in the loop and make multiple calls into TTI. If you decide to increase the VF enumeration space then you will linearly
2016 Feb 19
12
[3.8 Release] Release status
According to the schedule (e.g. on the right on llvm.org), we should have tagged the release by now, but we haven't, so we're officially behind schedule. I'm still optimistic that we can wrap this up pretty soon, though. This is what's blocking us: - PR26509: Crash in InnerLoopVectorizer::vectorizeLoop() I'm waiting to hear what Cong comes up with, otherwise we can revert