thr3ads.net - llvm dev - [llvm-dev] sum elements in the vector [May 2016]

Hal Finkel via llvm-dev

2016-May-23 19:43 UTC

[llvm-dev] sum elements in the vector

Hi Chandler,

Regardless of the canonical form we choose, we need code to match non-canonical
associated shuffle sequences and convert them into the canonical form. We also
need code to match the pattern where we extractelement on all elements and sum
them into this canonical form. This code needs to exist somewhere, so we need to
decide whether it exists in the frontend or the backend.
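
As a rough illustration of the two source patterns mentioned above, here is a plain C++ model (not LLVM code; the names Vec4, shuffle, sumByExtract and sumByShuffle are invented for this sketch). It shows the extract-every-element form next to a shuffle-based form. Both compute the same wrapping sum, which is why a matcher would want to map both onto a single canonical form.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical 4-lane vector of 32-bit integers (unsigned, so adds wrap).
using Vec4 = std::array<std::uint32_t, 4>;

// Form A: extract every element and add the scalars in a chain.
std::uint32_t sumByExtract(const Vec4 &v) {
  std::uint32_t sum = 0;
  for (std::size_t i = 0; i < v.size(); ++i)
    sum += v[i]; // models one extractelement followed by one add
  return sum;
}

// Models a single-input shuffle: lane i of the result is v[mask[i]].
Vec4 shuffle(const Vec4 &v, const std::array<std::size_t, 4> &mask) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = v[mask[i]];
  return r;
}

// Lane-wise vector add.
Vec4 add(const Vec4 &a, const Vec4 &b) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = a[i] + b[i];
  return r;
}

// Form B: log2(4) = 2 shuffle+add steps, then read lane 0.
// Only lane 0 of the final vector is meaningful; the other lanes are
// don't-care values in this model.
std::uint32_t sumByShuffle(const Vec4 &v) {
  Vec4 s1 = add(v, shuffle(v, {2, 3, 2, 3}));   // lanes 0,1: v0+v2, v1+v3
  Vec4 s2 = add(s1, shuffle(s1, {1, 1, 1, 1})); // lane 0: v0+v1+v2+v3
  return s2[0];
}
```

A real matcher would also have to accept other shuffle masks that reach the same result, which is the "non-canonical" part of the problem.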

Having an intrinsic is obviously smaller, in terms of IR memory overhead, than
these instructions. However, I'm not sure how many passes we'll need to
teach about the new intrinsic. Obviously there are many passes that understand
integer addition, but likely many fewer would really learn anything useful by
looking through the reduction. We would need to add code to InstCombine in order
to pull apart the reduction intrinsic when we learn that the vector has only one
contributing element.
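
To make the "only one contributing element" case concrete, here is a minimal sketch in plain C++ (not InstCombine code; KnownZero, onlyContributingLane and simplifiedReduceAdd are hypothetical names). It assumes an integer add reduction and some analysis that can prove individual lanes are zero. When exactly one lane can be non-zero, the reduction collapses to a read of that lane.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

constexpr std::size_t kLanes = 4;
using Vec = std::array<std::uint32_t, kLanes>;

// Hypothetical analysis result: knownZero[i] is true when lane i is
// proven to be zero.
using KnownZero = std::array<bool, kLanes>;

// Returns the single lane that may be non-zero, or nothing if zero lanes
// or more than one lane may contribute.
std::optional<std::size_t> onlyContributingLane(const KnownZero &knownZero) {
  std::optional<std::size_t> lane;
  for (std::size_t i = 0; i < kLanes; ++i) {
    if (knownZero[i])
      continue;
    if (lane)
      return std::nullopt; // a second possibly non-zero lane
    lane = i;
  }
  return lane;
}

// The full reduction: wrapping sum of all lanes.
std::uint32_t reduceAdd(const Vec &v) {
  std::uint32_t sum = 0;
  for (std::uint32_t x : v)
    sum += x;
  return sum;
}

// "Pulled apart" form: a single lane read when the analysis allows it,
// otherwise the full reduction.
std::uint32_t simplifiedReduceAdd(const Vec &v, const KnownZero &knownZero) {
  if (auto lane = onlyContributingLane(knownZero))
    return v[*lane];
  return reduceAdd(v);
}
```

The shortcut relies on zero being the identity for integer addition; other reduction operations would need their own identity value.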

In short, I don't have a strong opinion on this, because we need the
matching code somewhere regardless. Using the intrinsic means not matching
multiple times, but it means adding extra code to handle the intrinsic.
Regarding issues such as idiom recognition, use by the SLP vectorizer, etc.
these seem independent of whether the canonical form is an intrinsic or a
composite, and I don't think it makes the vectorizer cost model easier one
way or the other.

In any case, we now have relevant pattern-matching code in SDAGBuilder (although
it is currently somewhat specific to reductions after loops), so we already have
a better infrastructure to help backends with the shuffle-matching problem.
There is a corresponding 'VectorReduction' SDNode flag.

 -Hal

----- Original Message -----
> From: "Chandler Carruth" <chandlerc at gmail.com>
> To: "Asghar-ahmad Shahid" <Asghar-ahmad.Shahid at amd.com>, "Rail Shafigulin" <rail at esenciatech.com>,
> "llvm-dev" <llvm-dev at lists.llvm.org>, "Hal Finkel" <hfinkel at anl.gov>
> Sent: Sunday, May 15, 2016 8:15:37 PM
> Subject: Re: [llvm-dev] sum elements in the vector
> 
> 
> I'm starting to think we should directly implement horizontal
> operations on vector types.
> 
> 
> 
> My suspicion is that coming up with a nice model for this would help
> us a lot with things like:
> - Idiom recognition of reduction patterns that use horizontal
> arithmetic
> - Ability to use horizontal operations in SLPVectorizer
> - Significantly easier cost modeling of vectorizing loops with
> reductions in LoopVectorize
> - Other things I've not thought of?
> 
> Curious what others think?
> 
> 
> -Chandler
> 
> 
> On Wed, May 11, 2016 at 10:07 PM Shahid, Asghar-ahmad via llvm-dev <
> llvm-dev at lists.llvm.org > wrote:
> 
> 
> 
> 
> 
> 
> > why in order to add this particular instruction (sum elements in a
> > vector) I need to add an intrinsic?
> 
> 
> 
> Adding an intrinsic is not the only way; it is one of the ways, and the
> user will NOT be required to invoke it specifically.
> 
> 
> 
> Currently LLVM does not have any instruction that directly represents
> “sum of elements in a vector” and generates your particular instruction.
> However, you can do it without an intrinsic by pattern matching the
> LLVM IR representing “sum of elements in a vector” to your particular
> instruction in DAGCombiner.
> 
> 
> 
> Regards,
> 
> Shahid
> 
> 
> 
> 
> 
> 
> 
> 
> From: Rail Shafigulin [mailto: rail at esenciatech.com ]
> Sent: Monday, May 09, 2016 11:59 PM
> To: Shahid, Asghar-ahmad; llvm-dev
> Cc: Das, Dibyendu
> 
> 
> 
> 
> 
> 
> 
> Subject: Re: [llvm-dev] sum elements in the vector
> 
> 
> 
> 
> 
> 
> 
> I'm a little confused. Here is why.
> 
> 
> 
> 
> 
> I was able to add a vector add instruction to my target without using
> any intrinsics and without adding any new instructions to LLVM. So
> here is my question: how come I managed to add a new vector
> instruction without adding an intrinsic and why in order to add this
> particular instruction (sum elements in a vector) I need to add an
> intrinsic?
> 
> 
> 
> 
> 
> Another question that I have is whether the compiler will be able to
> target this new instruction (sum elements in a vector) if it is
> implemented as an intrinsic or the user will have to specifically
> invoke an intrinsic.
> 
> 
> 
> 
> 
> Pardon if questions seem dumb, I'm still learning things.
> 
> 
> 
> 
> 
> Any help is appreciated.
> 
> 
> 
> 
> 
> On Fri, May 6, 2016 at 1:51 PM, Rail Shafigulin <
> rail at esenciatech.com > wrote:
> 
> 
> Thanks for the reply. These steps will add an instruction as an
> intrinsic. Is it possible to add an actual new instruction so that a
> compiler could target it during an optimization? How hard is it to
> do it? Is that a realistic objective?
> 
> 
> 
> 
> 
> Rail
> 
> 
> 
> 
> 
> 
> 
> On Mon, Apr 4, 2016 at 9:02 PM, Shahid, Asghar-ahmad <
> Asghar-ahmad.Shahid at amd.com > wrote:
> 
> 
> 
> Hi Rail,
> 
> 
> 
> We did this to generate the X86 PSAD (sum of absolute differences)
> instruction through an LLVM intrinsic. Doing this requires the following:
> 
> 1. Define an intrinsic, xyz(), for the required instruction and
> a corresponding SDNode
> 
> 2. Generate the “call xyz()” IR based on the matched pattern
> 
> 3. Map the “call xyz()” IR to the corresponding SDNode in
> SelectionDAGBuilder.cpp
> 
> 4. Provide a default expansion of the xyz() intrinsic
> 
> 5. Legalize the type and/or operation
> 
> 6. Provide lowering of the intrinsic/SDNode to generate your target
> instruction
> 
> 
> 
> You can visit http://llvm.org/docs/ExtendingLLVM.html for details.
> 
> 
> 
> Regards,
> 
> Shahid
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> From: llvm-dev [mailto: llvm-dev-bounces at lists.llvm.org ] On Behalf
> Of Rail Shafigulin via llvm-dev
> Sent: Monday, April 04, 2016 11:00 PM
> To: Das, Dibyendu
> Cc: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] sum elements in the vector
> 
> 
> 
> 
> 
> 
> Thanks for the pointers. I looked at hadd instructions. They seem to
> do something very similar to what I need. Unfortunately as I said before my
> LLVM experience is limited. My understanding is that when I create a
> new type of SDNode I need to specify a pattern for it, so that when
> LLVM is analyzing the code and is seeing a given pattern it would
> create this particular node. I'm really struggling to understand how
> it is done. So here are the problems that I'm having.
> 
> 
> 
> 
> 
> 1. How do I identify that pattern that should be used?
> 
> 
> 2. How do I specify a given pattern?
> 
> 
> 
> 
> 
> Do you (or someone else) mind helping me out?
> 
> 
> 
> 
> 
> Any help is appreciated.
> 
> 
> 
> 
> 
> On Mon, Apr 4, 2016 at 9:59 AM, Das, Dibyendu < Dibyendu.Das at amd.com
> > wrote:
> 
> 
> 
> This is roughly along the lines of x86 hadd* instructions though the
> semantics of hadd* may not exactly match what you are looking for.
> This is probably more in line with x86/ARM SAD-like instructions but
> I don’t think llvm generates SAD without intrinsics.
> 
> 
> 
> From: llvm-dev [mailto: llvm-dev-bounces at lists.llvm.org ] On Behalf
> Of Rail Shafigulin via llvm-dev
> Sent: Monday, April 04, 2016 9:34 AM
> To: llvm-dev < llvm-dev at lists.llvm.org >
> Subject: [llvm-dev] sum elements in the vector
> 
> 
> 
> 
> My target has an instruction that adds up all elements in the vector
> and stores the result in a register. I'm trying to implement it in
> my compiler but I'm not sure even where to start.
> 
> 
> 
> 
> 
> 
> 
> I did look at other targets, but they don't seem to have anything
> like it ( I could be wrong. My experience with LLVM is limited, so
> if I missed it, I'd appreciate if someone could point it out ).
> 
> 
> 
> 
> 
> My understanding is that if SDNode for such an instruction doesn't
> exist I have to define one. Unfortunately, I don't know how to do
> it. I don't even know where to start looking. Would someone care to
> point me in the right direction?
> 
> 
> 
> 
> 
> Any help is appreciated.
> 
> 
> 
> 
> 
> --
> 
> 
> 
> 
> 
> 
> Rail Shafigulin
> 
> Software Engineer
> Esencia Technologies
> 
> 
> 
> 
> 
> 
> 
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Chandler Carruth via llvm-dev

2016-May-23 19:48 UTC

[llvm-dev] sum elements in the vector

On Mon, May 23, 2016 at 12:43 PM Hal Finkel via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi Chandler,
>
> Regardless of the canonical form we choose, we need code to match
> non-canonical associated shuffle sequences and convert them into the
> canonical form. We also need code to match the pattern where we
> extractelement on all elements and sum them into this canonical form. This
> code needs to exist somewhere, so we need to decide whether it exists in
> the frontend or the backend.
>
Agreed. However, we also need to choose where it lives within the "backend"
or LLVM more generally.

I think putting it late will end up with less powerful matching than having
it fairly early and run often in instcombine.

Consider -- how should the inliner or the loop unroller evaluate code
sequences containing 16 vector shuffles that all amount to expanding a
horizontal operation?

That's why I think we should model these patterns as first class citizens in
the IR. Then backends can lower them however is necessary.

>
> Having an intrinsic is obviously smaller, in terms of IR memory overhead,
> than these instructions. However, I'm not sure how many passes we'll need
> to teach about the new intrinsic.

An intrinsic already moves this into the IR instead of the code generator,
which I like.

I think the important distinction is between a *target specific* intrinsic
and a generic one that we expect every target to support. The latter I
think will be much more useful.

Then we can debate the relative merits of using a generic intrinsic versus
an instruction which I think are fairly mundane. I suspect this is a place
where we should use instructions, but that's a much more mechanical
discussion IMO.

(And in case it isn't clear, I'm not arguing we should avoid doing the
vector reduction matching and other patterns at all. I'm just trying to
start the discussion about the larger set of issues here.)

-Chandler

Hal Finkel via llvm-dev

2016-May-23 20:06 UTC

[llvm-dev] sum elements in the vector

----- Original Message -----
> From: "Chandler Carruth" <chandlerc at gmail.com>
> To: "Hal Finkel" <hfinkel at anl.gov>, "Chandler Carruth" <chandlerc at gmail.com>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Monday, May 23, 2016 2:48:13 PM
> Subject: Re: [llvm-dev] sum elements in the vector
> On Mon, May 23, 2016 at 12:43 PM Hal Finkel via llvm-dev <
> llvm-dev at lists.llvm.org > wrote:
> > Hi Chandler,
> 
> > Regardless of the canonical form we choose, we need code to match
> > non-canonical associated shuffle sequences and convert them into
> > the
> > canonical form. We also need code to match the pattern where we
> > extractelement on all elements and sum them into this canonical
> > form. This code needs to exist somewhere, so we need to decide
> > whether it exists in the frontend or the backend.
> 
> Agreed. However, we also need to choose where it lives within the
> "backend" or LLVM more generally.
> I think putting it late will end up with less powerful matching than
> having it fairly early and run often in instcombine.
> Consider -- how should the inliner or the loop unroller evaluate code
> sequences containing 16 vector shuffles that all amount to expanding
> a horizontal operation?

I agree this is an issue. This is why I said that if we choose a composite
canonical form, we'll end up matching the pattern multiple times. Regarding
this kind of cost modeling, however, we already have this problem, and it's
sometimes quite bad, and has nothing to do with reductions. Targets, in general,
need to be better about providing better "user costs" for composite
sequences that will end up being cheap. Last I looked at this in the context of
loop unrolling, addressing modes were a common issue here as well.

> That's why I think we should model these patterns as first class
> citizens in the IR. Then backends can lower them however is
> necessary.
> > Having an intrinsic is obviously smaller, in terms of IR memory
> > overhead, than these instructions. However, I'm not sure how many
> > passes we'll need to teach about the new intrinsic.
> 
> An intrinsic already moves this into the IR instead of the code
> generator, which I like.
> I think the important distinction is between a *target specific*
> intrinsic and a generic one that we expect every target to support.
> The latter I think will be much more useful.
> Then we can debate the relative merits of using a generic intrinsic
> versus an instruction which I think are fairly mundane. I suspect
> this is a place where we should use instructions, but that's a much
> more mechanical discussion IMO.

Sure. If we're going to go with a dedicated representation, I'm fine
with either, and I agree an instruction here might be mechanically easier.

> (And in case it isn't clear, I'm not arguing we should avoid doing
> the vector reduction matching and other patterns at all. I'm just
> trying to start the discussion about the larger set of issues here.)

Understood.

In some mechanical sense, however, we're discussing: 

    if (isVectorReduction(I) == Instruction::Add)

vs.

    if (auto *VRI = dyn_cast<VectorReductionInst>(I))
      if (VRI->getReductionOp() == Instruction::Add)

(bikeshedding aside), and so it might be better to start with the
utility-function implementation, which is essentially what we should have now
given the current implementation, and then see if we want to do more based on
compile-time impacts or other issues.
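
The two spellings contrasted above can be mocked up on a toy IR. This is a self-contained C++ sketch, not LLVM code: the Inst struct, the Opcode enum and both helper functions are assumptions made for illustration. The utility function re-derives the reduction from a composite tree of adds over extracted lanes each time it is called, while the dedicated-node check is a simple field comparison.

```cpp
#include <optional>
#include <vector>

// Toy opcodes; VectorReduction models the dedicated instruction/intrinsic.
enum class Opcode { Add, Mul, ExtractElement, VectorReduction, Other };

// Toy instruction. Fields that do not apply to an opcode stay at defaults.
struct Inst {
  Opcode op = Opcode::Other;
  std::vector<const Inst *> operands;
  unsigned lane = 0;                  // ExtractElement: lane read
  unsigned numLanes = 0;              // vector values: lane count
  Opcode reductionOp = Opcode::Other; // VectorReduction: combining op
};

// Gathers the leaves of a tree of binary operations that all use `op`.
static void collectLeaves(const Inst *I, Opcode op,
                          std::vector<const Inst *> &leaves) {
  if (I->op == op && I->operands.size() == 2) {
    collectLeaves(I->operands[0], op, leaves);
    collectLeaves(I->operands[1], op, leaves);
  } else {
    leaves.push_back(I);
  }
}

// Utility-function approach: returns the combining opcode when `I` is the
// root of an add/mul tree whose leaves extract every lane of one vector
// exactly once. Other users of intermediate values are ignored here.
std::optional<Opcode> isVectorReduction(const Inst &I) {
  if (I.op != Opcode::Add && I.op != Opcode::Mul)
    return std::nullopt;
  std::vector<const Inst *> leaves;
  collectLeaves(&I, I.op, leaves);
  const Inst *vec = nullptr;
  std::vector<bool> seen;
  for (const Inst *leaf : leaves) {
    if (leaf->op != Opcode::ExtractElement || leaf->operands.size() != 1)
      return std::nullopt;
    const Inst *src = leaf->operands[0];
    if (!vec) {
      vec = src;
      seen.assign(src->numLanes, false);
    }
    if (src != vec || leaf->lane >= seen.size() || seen[leaf->lane])
      return std::nullopt;
    seen[leaf->lane] = true;
  }
  for (bool s : seen)
    if (!s)
      return std::nullopt; // some lane never contributes
  return I.op;
}

// Dedicated-representation approach: no matching, just inspect the node.
bool isAddReductionNode(const Inst &I) {
  return I.op == Opcode::VectorReduction && I.reductionOp == Opcode::Add;
}
```

In this model every caller of isVectorReduction pays for the tree walk again, which is the "matching multiple times" cost discussed earlier in the thread; the dedicated node avoids that at the price of a new construct that other code must learn about.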

Thanks again, 
Hal 
> -Chandler

-- 
Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 
