Displaying 9 results from an estimated 9 matches for "congh".
2015 Nov 25
2
[RFC] Introducing a vector reduction add instruction.
----- Original Message -----
> From: "Xinliang David Li" <davidxl at google.com>
> To: "Cong Hou" <congh at google.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Wednesday, November 25, 2015 5:17:58 PM
> Subject: Re: [llvm-dev] [RFC] Introducing a vector reduction add instruction.
>
>
> Hal is probabl...
2015 Nov 19
5
[RFC] Introducing a vector reduction add instruction.
...refining the
cost model to let bigger VFs have less cost. For the example above the
best result is from VF >=16.
The draft of the patch is here: http://reviews.llvm.org/D14840
I will refine the patch later and submit it for review.
thanks,
Cong
On Wed, Nov 18, 2015 at 2:45 PM, Cong Hou <congh at google.com> wrote:
> On Mon, Nov 16, 2015 at 9:31 PM, Shahid, Asghar-ahmad
> <Asghar-ahmad.Shahid at amd.com> wrote:
>> Hi Cong,
>>
>>> -----Original Message-----
>>> From: Cong Hou [mailto:congh at google.com]
>>> Sent: Tuesday, November 17,...
2015 Nov 25
2
[RFC] Introducing a vector reduction add instruction.
...lt is from VF >=16.
>>
>> The draft of the patch is here: http://reviews.llvm.org/D14840
>>
>> I will refine the patch later and submit it for review.
>>
>>
>> thanks,
>> Cong
>>
>>
>> On Wed, Nov 18, 2015 at 2:45 PM, Cong Hou <congh at google.com> wrote:
>> > On Mon, Nov 16, 2015 at 9:31 PM, Shahid, Asghar-ahmad
>> > <Asghar-ahmad.Shahid at amd.com> wrote:
>> >> Hi Cong,
>> >>
>> >>> -----Original Message-----
>> >>> From: Cong Hou [mailto:congh a...
2016 Apr 12
2
X86 TRUNCATE cost for AVX & AVX2 mode
<Copied Cong>
Thanks Elena.
Mostly I was interested in why such a high cost of 30 is kept for TRUNCATE v16i32 to v16i8 in SSE41.
Looking at the code, it appears that TRUNCATE v16i32 to v16i8 is very expensive in SSE41
vs. SSE2. I feel this number should be the same as, or close to, the cost given for the same
operation in SSE2ConversionTbl.
The patch below from Cong Hou reduces the cost for the same operation in SSE2
2015 Nov 13
2
[RFC] Introducing a vector reduction add instruction.
Hi
When a reduction instruction is vectorized in a loop, it will be
turned into an instruction with vector operands of the same operation
type. This new instruction has a special property that can give us
more flexibility during instruction selection later: this operation is
valid as long as the reduction of all elements of the result vector is
identical to the reduction of all elements of its
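The sentence above is cut off in this listing. Assuming it goes on to compare against the elements of the instruction's operands, the property can be illustrated with a small sketch. Everything here (Vec4, reduceAdd, laneWiseAdd, pairwiseAdd) is a hypothetical name chosen for illustration and is not part of the RFC or of LLVM.

```cpp
#include <array>
#include <cassert>
#include <numeric>

// Hypothetical 4-lane integer vector, used only for illustration.
using Vec4 = std::array<int, 4>;

// Horizontal reduction: the sum of all lanes.
int reduceAdd(const Vec4 &v) {
    return std::accumulate(v.begin(), v.end(), 0);
}

// The ordinary vector add: lane i of the result is a[i] + b[i].
Vec4 laneWiseAdd(const Vec4 &a, const Vec4 &b) {
    return {a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]};
}

// A different result vector whose lanes do not match the lane-wise add,
// but whose reduction is the same: each lane holds the sum of an adjacent
// pair taken from one operand.
Vec4 pairwiseAdd(const Vec4 &a, const Vec4 &b) {
    return {a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]};
}
```

For any inputs a and b (ignoring overflow), reduceAdd(laneWiseAdd(a, b)) and reduceAdd(pairwiseAdd(a, b)) agree, even though the two result vectors generally differ lane by lane. Under the assumption stated above, that freedom is what gives instruction selection more flexibility.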
2016 Jun 16
2
[RFC] Allow loop vectorizer to choose vector widths that generate illegal types
...molloy at arm.com>; Matthew Simpson <mssimpso at codeaurora.org>; Sanjay Patel <spatel at rotateright.com>; Chandler Carruth <chandlerc at google.com>; David Li <davidxl at google.com>; Wei Mi <wmi at google.com>; Dehao Chen <dehao at google.com>; Cong Hou <congh at google.com>; Llvm Dev <llvm-dev at lists.llvm.org>
Subject: Re: [RFC] Allow loop vectorizer to choose vector widths that generate illegal types
Hi Nadav,
Thanks a lot for the feedback!
Of course we need to explore this with numbers. Not just in terms of the performance vs. compile-tim...
2016 Jun 15
8
[RFC] Allow loop vectorizer to choose vector widths that generate illegal types
Hello,
Currently the loop vectorizer will, by default, not consider vectorization
factors that would make it generate types that do not fit into the target
platform's vector registers. That is, if the widest scalar type in the
scalar loop is i64, and the platform's largest vector register is 256-bit
wide, we will not consider a VF above 4.
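The limit in the example above is just the register width divided by the widest scalar width. A minimal sketch of that arithmetic follows; maxLegalVF is a hypothetical helper written for this illustration and is not an LLVM API.

```cpp
#include <cassert>

// Hypothetical helper, not an LLVM API: the largest vectorization factor (VF)
// whose vector type still fits in one vector register of the given width.
// Both widths are in bits; widestScalarBits must be nonzero.
constexpr unsigned maxLegalVF(unsigned vectorRegisterBits,
                              unsigned widestScalarBits) {
    return vectorRegisterBits / widestScalarBits;
}

// The case from the message: i64 elements and a 256-bit vector register.
static_assert(maxLegalVF(256, 64) == 4,
              "a 256-bit register holds four i64 lanes");
```

With the same 256-bit register and i8 as the widest type, the same division gives 32, so the cap depends entirely on the widest scalar type in the loop.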
We have a command line option (-mllvm
2016 Jun 16
2
[RFC] Allow loop vectorizer to choose vector widths that generate illegal types
Hi Michael,
Thank you for working on this. The loop vectorizer tries a bunch of different vectorization factors and stops at the widest word size, mostly because of compile-time concerns. For every vectorization factor that we check, we have to scan all of the instructions in the loop and make multiple calls into TTI. If you decide to increase the VF enumeration space then you will linearly
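The per-VF work described above (one full scan of the loop's instructions for each candidate VF) can be modeled roughly as follows. This is a sketch under stated assumptions, not LLVM code: countCostQueries is a hypothetical function, candidate VFs are assumed to be the powers of two from 1 up to maxVF, and each instruction is assumed to cost exactly one query per VF, whereas the message says there are multiple TTI calls.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model, not LLVM code: count the per-instruction cost queries
// made when every candidate VF requires a full scan of the loop body.
std::size_t countCostQueries(std::size_t numLoopInstructions, unsigned maxVF) {
    std::size_t queries = 0;
    // Assumption: candidate VFs are the powers of two 1, 2, 4, ..., maxVF.
    // A 64-bit counter avoids wrap-around when doubling near the unsigned limit.
    for (unsigned long long vf = 1; vf <= maxVF; vf *= 2) {
        queries += numLoopInstructions; // one query per instruction per VF
    }
    return queries;
}
```

In this model a loop of 100 instructions needs 300 queries with maxVF = 4 (VFs 1, 2, 4) and 500 with maxVF = 16 (VFs 1, 2, 4, 8, 16): every extra candidate VF adds one more full scan.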
2016 Feb 19
12
[3.8 Release] Release status
According to the schedule (e.g. on the right on llvm.org), we should
have tagged the release by now, but we haven't, so we're officially
behind schedule. I'm still optimistic that we can wrap this up pretty
soon, though.
This is what's blocking us:
- PR26509: Crash in InnerLoopVectorizer::vectorizeLoop()
I'm waiting to hear what Cong comes up with, otherwise we can revert