thr3ads.net - llvm dev - [llvm-dev] RFC: [LV] any objections in moving isLegalMasked* check from Legal to CostModel? (Cleaning up LoopVectorizationLegality) [Jan 2018]

If this information is useful, please help other people find it:
Share via:

Saito, Hideki via llvm-dev

2018-Jan-06 00:28 UTC

[llvm-dev] RFC: [LV] any objections in moving isLegalMasked* check from Legal to CostModel? (Cleaning up LoopVectorizationLegality)

Amara,
>I support this direction
Thanks for the support.
>but are there actually any real world workloads where gather/scatter
scalarisation would be worth it, on any micro-architecture? If we don’t have
examples and the compile time cost is non-negligible then I think we’d still
like to keep the early >bailouts in some form.’
It's not like I have specific application code in mind. This all depends on
what your code has and how often the emulated code has to
kick-in. I'm more than happy to implement HWLegal class, and have early
bailout available under the switch. I'm also fine having early
bail out as the default if that helps sweeten the deal. :)

My intention is to make LV components more modular, reusable, and easier to
maintain. Moving things where they actually belong
is part of that picture. That makes it a lot easier for us to place those things
within VPlan infrastructure and explain why that's
the best place.

Thanks,
Hideki

-----Original Message-----
From: aemerson at apple.com [mailto:aemerson at apple.com] 
Sent: Friday, January 05, 2018 3:38 PM
To: Saito, Hideki <hideki.saito at intel.com>
Cc: llvm-dev at lists.llvm.org; Hal Finkel <hfinkel at anl.gov>;
Demikhovsky, Elena <elena.demikhovsky at intel.com>; Amara Emerson
<amara.emerson at arm.com>; Stotzer, Eric <estotzer at ti.com>;
Nemanja Ivanovic <nemanja.i.ibm at gmail.com>; Kreitzer, David L
<david.l.kreitzer at intel.com>; Nuzman, Dorit <dorit.nuzman at
intel.com>; Adam Nemet <anemet at apple.com>; James Molloy
<James.Molloy at arm.com>; Sander De Smalen <Sander.DeSmalen at
arm.com>; Zaks, Ayal <ayal.zaks at intel.com>; Graham Hunter
<Graham.Hunter at arm.com>; Michael Kuperstein <mkuper at
google.com>; Caballero, Diego <diego.caballero at intel.com>; Sanjay
Patel <spatel at rotateright.com>; Simon Pilgrim <llvm-dev at
redking.me.uk>; Tian, Xinmin <xinmin.tian at intel.com>; Nema, Ashutosh
<Ashutosh.Nema at amd.com>
Subject: Re: [llvm-dev] RFC: [LV] any objections in moving isLegalMasked* check
from Legal to CostModel? (Cleaning up LoopVectorizationLegality)
> On 5 Jan 2018, at 21:01, Saito, Hideki via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> 
> All,
> 
> I'm trying to refactor LoopVectorize such that it has better 
> conformance to VPlan vision going forward 
> (http://www.llvm.org/docs/Proposals/VectorizationPlan.html). All 
> VP*Recipe class definitions are now moved to VPlan.h, and I have a patch
under review to move LoopVectorizationPlanner class out of LoopVectorize.cpp
(https://reviews.llvm.org/D41420).
> 
> Next thing I'm working on is LoopVectorizationLegality, and I noticed 
> that it has a component of CostModel and optimization, which doesn't
seem right from vectorizer's architectural perspective.
> It appears that we are currently abusing Legal as the attic to throw a 
> lot of things in order to avoid passing many pointers around. From 
> vectorizer's architectural point of view, we should distinguish Legal 
> from "Vectorization Context Information" (I'd call it 
> LoopVectorizationAnalysisInfo), some of which (such as induction,
reduction, etc.) are populated during the Legal step. InterleaveInfo
shouldn't even be a member of Legal. Nothing to do with Legality. It would
be a good member of LoopVectorizationAnalysisInfo.
> Eventually, I'd like to see these under Analysis subtree (instead of
Transform), since they are indeed Analysis.
> 
> As a first step of this LoopVectorizationLegality cleanup, I propose 
> to move the following checks (and member functions) to
LoopVectorizationCostModel.
> 	isLegalMaskedStore
> 	isLegalMaskedLoad
> 	isLegalMaskedScatter
> 	isLegalMaskedGather
> My assumption is that all SIMD architectures should support 
> serialization of those operations at some cost  (e.g., lowering in CG 
> prepare) and thus failing to vectorize due to "false" return
values of
> those calls is incorrect behavior. I'll make sure to use a very high
initial cost such that this cleanup is NFC for all practical purposes ---- and
leave the tuning work as TODO.
> 
> The down side I can think of is that this will end up running more 
> parts of vectorizer for those kind of loops ---- can expose 
> pre-existing bugs and compile time would be a bit longer since we are 
> bailing out later. Upside is that we can tune the cost model ---- if 
> other parts of the loop has enough speedup, we don't have to give up
entire vectorization simply because masked load/store/gather/scatter aren't
supported on the target.
> 
> If anyone still thinks "early bailout" is valuable, splitting
into a
> separate HWLegal class would be a cleaner approach than what we have today.
We should be able to disable/enable it under an option.
> 
> Let me know what you think.
> 
> Thanks,
> Hideki Saito
> 
> I support this direction, and agree in principal that these are strictly cost
model queries, but are there actually any real world workloads where
gather/scatter scalarisation would be worth it, on any micro-architecture? If we
don’t have examples and the compile time cost is non-negligible then I think
we’d still like to keep the early bailouts in some form.’

Amara

Hal Finkel via llvm-dev

2018-Jan-07 05:20 UTC

head link

[llvm-dev] RFC: [LV] any objections in moving isLegalMasked* check from Legal to CostModel? (Cleaning up LoopVectorizationLegality)

On 01/05/2018 06:28 PM, Saito, Hideki wrote:> Amara,
>
>> I support this direction
> Thanks for the support.
>
>> but are there actually any real world workloads where gather/scatter
scalarisation would be worth it, on any micro-architecture? If we don’t have
examples and the compile time cost is non-negligible then I think we’d still
like to keep the early >bailouts in some form.’
> It's not like I have specific application code in mind. This all
depends on what your code has and how often the emulated code has to
> kick-in. I'm more than happy to implement HWLegal class, and have early
bailout available under the switch. I'm also fine having early
> bail out as the default if that helps sweeten the deal. :)
>
> My intention is to make LV components more modular, reusable, and easier to
maintain. Moving things where they actually belong
> is part of that picture. That makes it a lot easier for us to place those
things within VPlan infrastructure and explain why that's
> the best place.
I think that we should proceed on this basis. It is not clear to me that 
the extra compile time will be significant, and we shouldn't prematurely 
optimize (although we should check for any significant impact).

The intrinsics should be scalarized on all targets without support (by 
lib/CodeGen/ScalarizeMaskedMemIntrin.cpp which is scheduled by default 
in TargetPassConfig::addIRPasses).

  -Hal
>
> Thanks,
> Hideki
>
> -----Original Message-----
> From: aemerson at apple.com [mailto:aemerson at apple.com]
> Sent: Friday, January 05, 2018 3:38 PM
> To: Saito, Hideki <hideki.saito at intel.com>
> Cc: llvm-dev at lists.llvm.org; Hal Finkel <hfinkel at anl.gov>;
Demikhovsky, Elena <elena.demikhovsky at intel.com>; Amara Emerson
<amara.emerson at arm.com>; Stotzer, Eric <estotzer at ti.com>;
Nemanja Ivanovic <nemanja.i.ibm at gmail.com>; Kreitzer, David L
<david.l.kreitzer at intel.com>; Nuzman, Dorit <dorit.nuzman at
intel.com>; Adam Nemet <anemet at apple.com>; James Molloy
<James.Molloy at arm.com>; Sander De Smalen <Sander.DeSmalen at
arm.com>; Zaks, Ayal <ayal.zaks at intel.com>; Graham Hunter
<Graham.Hunter at arm.com>; Michael Kuperstein <mkuper at
google.com>; Caballero, Diego <diego.caballero at intel.com>; Sanjay
Patel <spatel at rotateright.com>; Simon Pilgrim <llvm-dev at
redking.me.uk>; Tian, Xinmin <xinmin.tian at intel.com>; Nema, Ashutosh
<Ashutosh.Nema at amd.com>
> Subject: Re: [llvm-dev] RFC: [LV] any objections in moving isLegalMasked*
check from Legal to CostModel? (Cleaning up LoopVectorizationLegality)
>
>> On 5 Jan 2018, at 21:01, Saito, Hideki via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
>>
>>
>> All,
>>
>> I'm trying to refactor LoopVectorize such that it has better
>> conformance to VPlan vision going forward
>> (http://www.llvm.org/docs/Proposals/VectorizationPlan.html). All
>> VP*Recipe class definitions are now moved to VPlan.h, and I have a
patch under review to move LoopVectorizationPlanner class out of
LoopVectorize.cpp (https://reviews.llvm.org/D41420).
>>
>> Next thing I'm working on is LoopVectorizationLegality, and I
noticed
>> that it has a component of CostModel and optimization, which
doesn't seem right from vectorizer's architectural perspective.
>> It appears that we are currently abusing Legal as the attic to throw a
>> lot of things in order to avoid passing many pointers around. From
>> vectorizer's architectural point of view, we should distinguish
Legal
>> from "Vectorization Context Information" (I'd call it
>> LoopVectorizationAnalysisInfo), some of which (such as induction,
reduction, etc.) are populated during the Legal step. InterleaveInfo
shouldn't even be a member of Legal. Nothing to do with Legality. It would
be a good member of LoopVectorizationAnalysisInfo.
>> Eventually, I'd like to see these under Analysis subtree (instead
of Transform), since they are indeed Analysis.
>>
>> As a first step of this LoopVectorizationLegality cleanup, I propose
>> to move the following checks (and member functions) to
LoopVectorizationCostModel.
>> 	isLegalMaskedStore
>> 	isLegalMaskedLoad
>> 	isLegalMaskedScatter
>> 	isLegalMaskedGather
>> My assumption is that all SIMD architectures should support
>> serialization of those operations at some cost  (e.g., lowering in CG
>> prepare) and thus failing to vectorize due to "false" return
values of
>> those calls is incorrect behavior. I'll make sure to use a very
high initial cost such that this cleanup is NFC for all practical purposes ----
and leave the tuning work as TODO.
>>
>> The down side I can think of is that this will end up running more
>> parts of vectorizer for those kind of loops ---- can expose
>> pre-existing bugs and compile time would be a bit longer since we are
>> bailing out later. Upside is that we can tune the cost model ---- if
>> other parts of the loop has enough speedup, we don't have to give
up entire vectorization simply because masked load/store/gather/scatter
aren't supported on the target.
>>
>> If anyone still thinks "early bailout" is valuable, splitting
into a
>> separate HWLegal class would be a cleaner approach than what we have
today. We should be able to disable/enable it under an option.
>>
>> Let me know what you think.
>>
>> Thanks,
>> Hideki Saito
>>
>>
> I support this direction, and agree in principal that these are strictly
cost model queries, but are there actually any real world workloads where
gather/scatter scalarisation would be worth it, on any micro-architecture? If we
don’t have examples and the compile time cost is non-negligible then I think
we’d still like to keep the early bailouts in some form.’
>
> Amara
>
>
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Saito, Hideki via llvm-dev

2018-Jan-09 00:44 UTC

head link

[llvm-dev] RFC: [LV] any objections in moving isLegalMasked* check from Legal to CostModel? (Cleaning up LoopVectorizationLegality)

Thanks, Hal.

I plan to post a patch w/o HW Legality early bailout first. That should enable
further discussion on where the initial very high cost for "illegal masked
load/store/gather/scatter" should be coming from --- like should
LoopVectorize provide it? Or should it be provided by TTI? I prefer the latter
(TTI) but the first revision of the patch will intentionally do the former
(LoopVectorize hacked cost) such that I'll be sure to have enough
"explicit" voice behind me to help me push for the TTI approach if
others feel the same way (you can also reply to this message saying "go
with TTI" without waiting for the code). If we are to go with TTI, we
certainly can do this in two steps (i.e., LoopVectorize hacked cost first and
then transition to TTI), and that may be actually desirable.

While the cost discussion is going on, I can think a bit more about how best to
do HW Legality early bailout. If we are doing it, I don't think it's
appropriate to limit it to masked load/store/gather/scatter. Would need some
extensibility, and I think that should be discussed as a separate review.

Stay Tuned.
Hideki

-----Original Message-----
From: Hal Finkel [mailto:hfinkel at anl.gov] 
Sent: Saturday, January 06, 2018 9:20 PM
To: Saito, Hideki <hideki.saito at intel.com>; aemerson at apple.com
Cc: llvm-dev at lists.llvm.org; Demikhovsky, Elena <elena.demikhovsky at
intel.com>; Amara Emerson <amara.emerson at arm.com>; Stotzer, Eric
<estotzer at ti.com>; Nemanja Ivanovic <nemanja.i.ibm at gmail.com>;
Kreitzer, David L <david.l.kreitzer at intel.com>; Nuzman, Dorit
<dorit.nuzman at intel.com>; Adam Nemet <anemet at apple.com>; James
Molloy <James.Molloy at arm.com>; Sander De Smalen <Sander.DeSmalen at
arm.com>; Zaks, Ayal <ayal.zaks at intel.com>; Graham Hunter
<Graham.Hunter at arm.com>; Michael Kuperstein <mkuper at
google.com>; Caballero, Diego <diego.caballero at intel.com>; Sanjay
Patel <spatel at rotateright.com>; Simon Pilgrim <llvm-dev at
redking.me.uk>; Tian, Xinmin <xinmin.tian at intel.com>; Nema, Ashutosh
<Ashutosh.Nema at amd.com>
Subject: Re: [llvm-dev] RFC: [LV] any objections in moving isLegalMasked* check
from Legal to CostModel? (Cleaning up LoopVectorizationLegality)


On 01/05/2018 06:28 PM, Saito, Hideki wrote:> Amara,
>
>> I support this direction
> Thanks for the support.
>
>> but are there actually any real world workloads where gather/scatter
scalarisation would be worth it, on any micro-architecture? If we don’t have
examples and the compile time cost is non-negligible then I think we’d still
like to keep the early >bailouts in some form.’
> It's not like I have specific application code in mind. This all 
> depends on what your code has and how often the emulated code has to 
> kick-in. I'm more than happy to implement HWLegal class, and have 
> early bailout available under the switch. I'm also fine having early 
> bail out as the default if that helps sweeten the deal. :)
>
> My intention is to make LV components more modular, reusable, and 
> easier to maintain. Moving things where they actually belong is part 
> of that picture. That makes it a lot easier for us to place those things
within VPlan infrastructure and explain why that's the best place.
I think that we should proceed on this basis. It is not clear to me that the
extra compile time will be significant, and we shouldn't prematurely
optimize (although we should check for any significant impact).

The intrinsics should be scalarized on all targets without support (by
lib/CodeGen/ScalarizeMaskedMemIntrin.cpp which is scheduled by default in
TargetPassConfig::addIRPasses).

  -Hal
>
> Thanks,
> Hideki
>
> -----Original Message-----
> From: aemerson at apple.com [mailto:aemerson at apple.com]
> Sent: Friday, January 05, 2018 3:38 PM
> To: Saito, Hideki <hideki.saito at intel.com>
> Cc: llvm-dev at lists.llvm.org; Hal Finkel <hfinkel at anl.gov>; 
> Demikhovsky, Elena <elena.demikhovsky at intel.com>; Amara Emerson 
> <amara.emerson at arm.com>; Stotzer, Eric <estotzer at ti.com>;
Nemanja
> Ivanovic <nemanja.i.ibm at gmail.com>; Kreitzer, David L 
> <david.l.kreitzer at intel.com>; Nuzman, Dorit <dorit.nuzman at
intel.com>;
> Adam Nemet <anemet at apple.com>; James Molloy <James.Molloy at
arm.com>;
> Sander De Smalen <Sander.DeSmalen at arm.com>; Zaks, Ayal 
> <ayal.zaks at intel.com>; Graham Hunter <Graham.Hunter at
arm.com>; Michael
> Kuperstein <mkuper at google.com>; Caballero, Diego 
> <diego.caballero at intel.com>; Sanjay Patel <spatel at
rotateright.com>;
> Simon Pilgrim <llvm-dev at redking.me.uk>; Tian, Xinmin 
> <xinmin.tian at intel.com>; Nema, Ashutosh <Ashutosh.Nema at
amd.com>
> Subject: Re: [llvm-dev] RFC: [LV] any objections in moving 
> isLegalMasked* check from Legal to CostModel? (Cleaning up 
> LoopVectorizationLegality)
>
>> On 5 Jan 2018, at 21:01, Saito, Hideki via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
>>
>>
>> All,
>>
>> I'm trying to refactor LoopVectorize such that it has better 
>> conformance to VPlan vision going forward 
>> (http://www.llvm.org/docs/Proposals/VectorizationPlan.html). All 
>> VP*Recipe class definitions are now moved to VPlan.h, and I have a
patch under review to move LoopVectorizationPlanner class out of
LoopVectorize.cpp (https://reviews.llvm.org/D41420).
>>
>> Next thing I'm working on is LoopVectorizationLegality, and I
noticed
>> that it has a component of CostModel and optimization, which
doesn't seem right from vectorizer's architectural perspective.
>> It appears that we are currently abusing Legal as the attic to throw 
>> a lot of things in order to avoid passing many pointers around. From 
>> vectorizer's architectural point of view, we should distinguish
Legal
>> from "Vectorization Context Information" (I'd call it 
>> LoopVectorizationAnalysisInfo), some of which (such as induction,
reduction, etc.) are populated during the Legal step. InterleaveInfo
shouldn't even be a member of Legal. Nothing to do with Legality. It would
be a good member of LoopVectorizationAnalysisInfo.
>> Eventually, I'd like to see these under Analysis subtree (instead
of Transform), since they are indeed Analysis.
>>
>> As a first step of this LoopVectorizationLegality cleanup, I propose 
>> to move the following checks (and member functions) to
LoopVectorizationCostModel.
>> 	isLegalMaskedStore
>> 	isLegalMaskedLoad
>> 	isLegalMaskedScatter
>> 	isLegalMaskedGather
>> My assumption is that all SIMD architectures should support 
>> serialization of those operations at some cost  (e.g., lowering in CG
>> prepare) and thus failing to vectorize due to "false" return
values
>> of those calls is incorrect behavior. I'll make sure to use a very
high initial cost such that this cleanup is NFC for all practical purposes ----
and leave the tuning work as TODO.
>>
>> The down side I can think of is that this will end up running more 
>> parts of vectorizer for those kind of loops ---- can expose 
>> pre-existing bugs and compile time would be a bit longer since we are 
>> bailing out later. Upside is that we can tune the cost model ---- if 
>> other parts of the loop has enough speedup, we don't have to give
up entire vectorization simply because masked load/store/gather/scatter
aren't supported on the target.
>>
>> If anyone still thinks "early bailout" is valuable, splitting
into a
>> separate HWLegal class would be a cleaner approach than what we have
today. We should be able to disable/enable it under an option.
>>
>> Let me know what you think.
>>
>> Thanks,
>> Hideki Saito
>>
>>
> I support this direction, and agree in principal that these are strictly
cost model queries, but are there actually any real world workloads where
gather/scatter scalarisation would be worth it, on any micro-architecture? If we
don’t have examples and the compile time cost is non-negligible then I think
we’d still like to keep the early bailouts in some form.’
>
> Amara
>
>
--
Hal Finkel
Lead, Compiler Technology and Programming Languages Leadership Computing
Facility Argonne National Laboratory

Maybe Matching Threads

Search for more reasonably related threads

llvm dev - Jan 2018 - RFC: [LV] any objections in moving isLegalMasked* check from Legal to CostModel? (Cleaning up LoopVectorizationLegality)

[llvm-dev] RFC: [LV] any objections in moving isLegalMasked* check from Legal to CostModel? (Cleaning up LoopVectorizationLegality)

[llvm-dev] RFC: [LV] any objections in moving isLegalMasked* check from Legal to CostModel? (Cleaning up LoopVectorizationLegality)

[llvm-dev] RFC: [LV] any objections in moving isLegalMasked* check from Legal to CostModel? (Cleaning up LoopVectorizationLegality)

Maybe Matching Threads