Displaying 20 results from an estimated 10000 matches similar to: "[Proposal][RFC] Epilog loop vectorization"
2017 Feb 23
2
[Proposal][RFC] Epilog loop vectorization
On 02/22/2017 11:52 AM, Adam Nemet via llvm-dev wrote:
> Hi Ashutosh,
>
>> On Feb 22, 2017, at 1:57 AM, Nema, Ashutosh via llvm-dev
>> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>>
>> Hi,
>> This is a proposal about epilog loop vectorization.
>> Currently Loop Vectorizer inserts an epilogue loop for handling loops
2017 Feb 27
4
[Proposal][RFC] Epilog loop vectorization
Thanks for looking into this.
1) Issues with re-running the vectorizer:
The vectorizer might generate redundant alias checks while vectorizing the epilog loop.
Redundant alias checks are expensive; we would like to reuse the results of already computed alias checks.
With metadata we can limit the width of the epilog loop, but I am not sure about reusing the alias check result.
Any thoughts on rerunning vectorizer with reusing
2017 Feb 27
2
[Proposal][RFC] Epilog loop vectorization
> On Feb 27, 2017, at 7:27 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>
>
> On 02/27/2017 06:29 AM, Nema, Ashutosh wrote:
>> Thanks for looking into this.
>>
>> 1) Issues with re-running the vectorizer:
>> The vectorizer might generate redundant alias checks while vectorizing the epilog loop.
>> Redundant alias checks are expensive; we would like to reuse the
2017 Mar 14
2
[Proposal][RFC] Epilog loop vectorization
On 03/14/2017 11:58 AM, Michael Kuperstein wrote:
> I'm still not sure about this, for a few reasons:
>
> 1) I'd like to try to treat epilogue loops the same way regardless of
> whether the main loop was vectorized by hand or automatically. So if
> someone hand-wrote an avx-512 16-wide loop, with alias checks, and we
> decide it's profitable to vectorize the
2017 Mar 14
4
[Proposal][RFC] Epilog loop vectorization
On 03/14/2017 12:11 PM, Adam Nemet wrote:
>
>> On Mar 14, 2017, at 9:49 AM, Hal Finkel <hfinkel at anl.gov
>> <mailto:hfinkel at anl.gov>> wrote:
>>
>>
>> On 03/14/2017 11:21 AM, Adam Nemet wrote:
>>>
>>>> On Mar 14, 2017, at 6:00 AM, Nema, Ashutosh <Ashutosh.Nema at amd.com
>>>> <mailto:Ashutosh.Nema at
2017 Mar 15
4
[Proposal][RFC] Epilog loop vectorization
On 03/14/2017 07:50 PM, Adam Nemet wrote:
>
>> On Mar 14, 2017, at 11:33 AM, Hal Finkel <hfinkel at anl.gov
>> <mailto:hfinkel at anl.gov>> wrote:
>>
>>
>>
>> On 03/14/2017 12:11 PM, Adam Nemet wrote:
>>>
>>>> On Mar 14, 2017, at 9:49 AM, Hal Finkel <hfinkel at anl.gov
>>>> <mailto:hfinkel at
2017 Mar 14
2
[Proposal][RFC] Epilog loop vectorization
On 03/14/2017 11:21 AM, Adam Nemet wrote:
>
>> On Mar 14, 2017, at 6:00 AM, Nema, Ashutosh <Ashutosh.Nema at amd.com
>> <mailto:Ashutosh.Nema at amd.com>> wrote:
>>
>> Summarizing the discussion on the implementation approaches.
>> We discussed two approaches: the first runs ‘InnerLoopVectorizer’
>> again on the epilog loop immediately after
2017 Mar 14
10
[Proposal][RFC] Epilog loop vectorization
Summarizing the discussion on the implementation approaches.
We discussed two approaches: the first runs ‘InnerLoopVectorizer’ again on the epilog loop immediately after vectorizing the original loop, within the same vectorization pass; the second re-runs the vectorization pass and limits the vectorization factor of the epilog loop via metadata.
<Approach-2>
Challenges with
2018 Aug 03
2
Vectorizing remainder loop
>it cannot afford large size masks for large vectors
So, even a standard way of vectorizing remainder in masked or unmasked fashion wouldn’t work, I suppose. Ouch.
I suppose VPlan should be able to model this kind of gigantic remainder vector code (when the time comes). Not pretty at all, though.
Now, be fully aware that Direction #2 is really a poor (or rather extremely poor) person’s
2018 Aug 02
2
Vectorizing remainder loop
Hi Hameeza,
Aside from Ashutosh's patch.....
When the vector width is that large, we can't keep vectorizing the remainder as shown below. If nothing else, the code size would be huge; hitting ITLB misses because of this is very bad, for example.
VF=2048 // main vector loop
VF=1024 // vectorized remainder 1
VF=512 // vectorized remainder 2
...
Vectorize remainder until trip count is
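To make the code-size concern concrete, here is a minimal sketch in C of such a cascade, assuming the main VF is a power of two. The names `run_stage` and `add_with_cascade` are hypothetical, and the inner scalar loop only stands in for one vector operation; this is not what the vectorizer emits.

```c
#include <stddef.h>

/* Hypothetical stand-in for one vectorized loop of a given width.
   The inner loop models a single vector add of `width` lanes. */
static size_t run_stage(char *a, const char *b, const char *c,
                        size_t i, size_t n, size_t width) {
    while (n - i >= width) {            /* i never exceeds n */
        for (size_t k = 0; k < width; ++k)
            a[i + k] = (char)(b[i + k] + c[i + k]);
        i += width;
    }
    return i;
}

/* Runs the main loop at main_vf, then remainder loops at main_vf/2,
   main_vf/4, ... down to 1 (the scalar loop).
   Assumes main_vf is a power of two and at least 1.
   Returns the number of distinct loop bodies the cascade needs. */
size_t add_with_cascade(char *a, const char *b, const char *c,
                        size_t n, size_t main_vf) {
    size_t i = 0, stages = 0;
    for (size_t vf = main_vf; vf >= 1; vf /= 2) {
        i = run_stage(a, b, c, i, n, vf);
        ++stages;
    }
    return stages;
}
```

Under these assumptions the number of loop bodies grows as log2(VF) + 1, so VF=2048 needs twelve copies of the loop body even when most of them execute at most once.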
2017 Feb 28
3
[Proposal][RFC] Epilog loop vectorization
I have tried running both gvn and newgvn, but it did not help in hoisting the alias checks:
Please check, maybe I have missed something.
<TestCase>
void foo (char *A, char *B, char *C, int len) {
  int i = 0;
  for (i = 0; i < len; i++)
    A[i] = B[i] + C[i];
}
<Command>
$ opt -O3 -gvn test.ll -o test.opt.ll
$ opt -O3 -newgvn test.ll -o test.opt.ll
“test.ll” is attached, it
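For readers following the thread, the sketch below shows at source level what "reusing the alias check" could look like for the test case above: one runtime overlap check guards both a wide main loop and a narrower epilog loop. The widths (16 and 4), the helper names, and the `int` return value are all assumptions for illustration. The chunked loops are only stand-ins for vector code, and the actual checks emitted by the vectorizer may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if [p, p+len) and [q, q+len) share any byte. */
static int ranges_overlap(const char *p, const char *q, size_t len) {
    uintptr_t ps = (uintptr_t)p, qs = (uintptr_t)q;
    return ps < qs + len && qs < ps + len;
}

/* Hypothetical stand-in for a vector loop of the given width. */
static size_t add_chunks(char *A, const char *B, const char *C,
                         size_t i, size_t n, size_t width) {
    for (; n - i >= width; i += width)
        for (size_t k = 0; k < width; ++k)
            A[i + k] = (char)(B[i + k] + C[i + k]);
    return i;
}

/* Same computation as foo(), but versioned on a single alias check.
   Returns 1 if the "vector" path was taken, 0 otherwise. */
int foo_versioned(char *A, char *B, char *C, int len) {
    size_t n = len > 0 ? (size_t)len : 0;
    size_t i = 0;
    /* Computed once; both the main and the epilog loop rely on it. */
    int no_alias = !ranges_overlap(A, B, n) && !ranges_overlap(A, C, n);
    if (no_alias) {
        i = add_chunks(A, B, C, i, n, 16);  /* main loop, assumed VF=16 */
        i = add_chunks(A, B, C, i, n, 4);   /* epilog loop, assumed VF=4 */
    }
    for (; i < n; ++i)                      /* scalar remainder or fallback */
        A[i] = (char)(B[i] + C[i]);
    return no_alias;
}
```

If the epilog loop were vectorized by a separate run that knows nothing about the first, the same two overlap tests would be emitted again in front of it, which is the redundancy the thread is concerned about.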
2017 Feb 27
5
[Proposal][RFC] Epilog loop vectorization
> On Feb 27, 2017, at 9:39 AM, Daniel Berlin <dberlin at dberlin.org> wrote:
>
>
>
> On Mon, Feb 27, 2017 at 9:29 AM, Adam Nemet <anemet at apple.com <mailto:anemet at apple.com>> wrote:
>
>> On Feb 27, 2017, at 7:27 AM, Hal Finkel <hfinkel at anl.gov <mailto:hfinkel at anl.gov>> wrote:
>>
>>
>> On 02/27/2017 06:29 AM,
2017 Feb 27
2
[Proposal][RFC] Epilog loop vectorization
There's another issue with re-running the vectorizer (which I support, btw
- I'm just saying there are more problems to solve on the way :-) )
Historically, we haven't even tried to evaluate the cost of the "constant"
(not per-iteration) vectorization overhead - things like alias checks.
Instead, we have hard bounds - we won't perform alias checks that are "too
2017 Feb 27
4
[Proposal][RFC] Epilog loop vectorization
On 02/27/2017 01:47 PM, Daniel Berlin wrote:
>
>
> On Mon, Feb 27, 2017 at 11:29 AM, Adam Nemet <anemet at apple.com
> <mailto:anemet at apple.com>> wrote:
>
>
>> On Feb 27, 2017, at 10:11 AM, Hal Finkel <hfinkel at anl.gov
>> <mailto:hfinkel at anl.gov>> wrote:
>>
>>
>> On 02/27/2017 11:47 AM, Adam Nemet wrote:
2017 Feb 27
2
[Proposal][RFC] Epilog loop vectorization
On 02/27/2017 12:41 PM, Michael Kuperstein wrote:
There's another issue with re-running the vectorizer (which I support, btw - I'm just saying there are more problems to solve on the way :-) )
Historically, we haven't even tried to evaluate the cost of the "constant" (not per-iteration) vectorization overhead - things like alias checks. Instead, we have hard bounds - we
2017 Mar 14
1
[Proposal][RFC] Epilog loop vectorization
On 03/14/2017 02:37 PM, Michael Kuperstein wrote:
>
>
> On Tue, Mar 14, 2017 at 11:40 AM, Hal Finkel <hfinkel at anl.gov
> <mailto:hfinkel at anl.gov>> wrote:
>
>
>
> On 03/14/2017 11:58 AM, Michael Kuperstein wrote:
>> I'm still not sure about this, for a few reasons:
>>
>> 1) I'd like to try to treat epilogue loops the
2017 Feb 27
2
[Proposal][RFC] Epilog loop vectorization
> On Feb 27, 2017, at 10:11 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>
>
> On 02/27/2017 11:47 AM, Adam Nemet wrote:
>>
>>> On Feb 27, 2017, at 9:39 AM, Daniel Berlin <dberlin at dberlin.org <mailto:dberlin at dberlin.org>> wrote:
>>>
>>>
>>>
>>> On Mon, Feb 27, 2017 at 9:29 AM, Adam Nemet <anemet at
2016 Jun 15
3
[Proposal][RFC] Strided Memory Access Vectorization
Sorry for the spam. Copy-paste didn't capture the Subject properly. Resending with the correct Subject so that the thread is captured properly.
-----Original Message-----
From: Saito, Hideki
Sent: Wednesday, June 15, 2016 1:39 PM
To: 'llvm-dev at lists.llvm.org' <llvm-dev at lists.llvm.org>
Subject: RE: [llvm-dev] [Proposal][RFC] Strided Memory Access
Ashutosh,
First,
2016 Jun 18
2
[Proposal][RFC] Strided Memory Access Vectorization
>Vectorizer's output should be as clean as vector code can be so that analyses and optimizers downstream can
>do a great job optimizing.
Guess I should clarify this philosophical position of mine. In terms of vector code optimization that complicates
the output of vectorizer:
If vectorizer is the best place to perform the optimization, it should do so.
This includes the cases like
2016 Jun 30
0
[Proposal][RFC] Strided Memory Access Vectorization
One common concern raised for cases where the Loop Vectorizer generates
bigger types than the target supports:
Based on VF, we currently check the cost and generate the expected set of
instruction[s] for the bigger type. This has two challenges: for bigger types
the cost is not always correct, and code generation may not produce efficient
instruction[s].
We can probably depend on the support provided by the RFC below by