thr3ads.net - search: "scalarised"

Displaying 20 results from an estimated 35 matches for "scalarised".

2013 Nov 15

[LLVMdev] [PATCH] Add a Scalarize pass

...decision about what to scalarise and what not to scalarise, without any help from llvmpipe. The problem I'm trying to solve is that codegen is too late to get the benefit of other IR optimisations. So in my case I do not want to _change_ the decision about which vectors get scalarised and how. I just want to do it earlier. It would be a shame if that meant that llvmpipe had to duplicate exactly the decisions that codegen makes wrt scalarisation, since codegen can easily make those decisions available through TargetTransformInfo. That's why I thought using T...

2013 Nov 15

[LLVMdev] [PATCH] Add a Scalarize pass

...t to scalarise and what not to scalarise, without > any help from llvmpipe. The problem I'm trying to solve is that > codegen is too late to get the benefit of other IR optimisations. > > So in my case I do not want to _change_ the decision about which > vectors get scalarised and how. I just want to do it earlier. > It would be a shame if that meant that llvmpipe had to duplicate > exactly the decisions that codegen makes wrt scalarisation, > since codegen can easily make those decisions available through > TargetTransformInfo. > > That...

2013 Nov 14

[LLVMdev] [PATCH] Add a Scalarize pass

Richard Sandiford <rsandifo at linux.vnet.ibm.com> writes: > Are you worried that adding it to PMB will increase compile time? > The pass exits very early for any target that doesn't opt-in to doing > scalarisation at the IR level, without even looking at the function. As an alternative, adding Scalarizer and InstCombine passes to SystemZPassConfig::addIRPasses() would probably

2013 Nov 14

[LLVMdev] [PATCH] Add a Scalarize pass

On Nov 14, 2013, at 2:32 PM, Richard Sandiford <rsandifo at linux.vnet.ibm.com> wrote: > Richard Sandiford <rsandifo at linux.vnet.ibm.com> writes: >> Are you worried that adding it to PMB will increase compile time? >> The pass exits very early for any target that doesn't opt-in to doing >> scalarisation at the IR level, without even looking at the function.

2016 Feb 09

Vectorization with fast-math on irregular ISA sub-sets

...enablement work, so be it. There might be a slight issue with legacy IR bitcode, but if that's going to be a problem in practice, we can design some scheme to let auto-upgrade do the right thing. > > If the scalarisation is in IR, then any NEON intrinsic in C code will > get wrongly scalarised. Builtins can be lowered in either IR > operations or builtins, and the back-end has no way of knowing the > origin. > > If the scalarization is lower down, then we risk also changing inline > ASM snippets, which is even worse. Yes, but we don't do that, so that's not a pra...

2016 Feb 09

Vectorization with fast-math on irregular ISA sub-sets

----- Original Message ----- > From: "James Molloy" <James.Molloy at arm.com> > To: "Renato Golin" <renato.golin at linaro.org> > Cc: "Nadav Rotem" <nrotem at apple.com>, "Arnold Schwaighofer" <aschwaighofer at apple.com>, "Hal Finkel" > <hfinkel at anl.gov>, "LLVM Dev" <llvm-dev at

2013 Nov 14

[LLVMdev] [PATCH] Add a Scalarize pass

Hi Richard, Thanks for working on this. Comments below. > I don't understand the basis for the last statement though. Do you mean > that you think most cases produce better code if scalarised at the SD stage > rather than at the IR level? Could you give an example? You presented an example that shows that scalarizing vectors allows further optimizations. But I don’t think that this example represents the kind of problems that we run into in general C++ code. We currently consider...

2013 Oct 25

[LLVMdev] Is there pass to break down <4 x float> to scalars

...posed. > If I got it right, this may not be necessary, or it may even be harmful. Say you decide that <4 x i32> vectors should be left alone, so that your pass only scalarises the others. But when the vectorizer passes again (to try and use CPU vector instructions), it might not match the scalarised version with the vector, and you end up with data movement between scalar and vector pipelines, which normally slows down CPUs (at least in ARM's case). Also, problematic cases like <5 x i32> could be better split into 3+2 pairs, rather than 4+1. If you scalarise everything, then the vec...

2013 Nov 14

[LLVMdev] [PATCH] Add a Scalarize pass

Nadav Rotem <nrotem at apple.com> writes: >> I don't understand the basis for the last statement though. Do you mean >> that you think most cases produce better code if scalarised at the SD stage >> rather than at the IR level? Could you give an example? > > You presented an example that shows that scalarizing vectors allows > further optimizations. But I don’t think that this example represents > the kind of problems that we run into in general C++ code....

2013 Oct 25

[LLVMdev] Is there pass to break down <4 x float> to scalars

...but I found in the llvmpipe case that this made things worse with TBAA, because DAGCombiner::GatherAllAliases has some fairly strict limits. So I disabled that by default; use -decompose-vector-load-store to reenable. The main motivation for z was instead to get InstCombine to rewrite things like scalarised selects. I haven't submitted it yet because it's less of a win than the TBAA DAGCombiner patch I posted, so I didn't want to distract from that. It would also need some TargetTransformInfo hooks to decide which vectors should be decomposed. Thanks, Richard -------------- next part --...

2013 Oct 25

[LLVMdev] Is there pass to break down <4 x float> to scalars

Hi, LLVM community, I wrote some code by hand in LLVM IR. For simplicity, I wrote it using <4 x float>. Now I have found that some of the element stores are useless. For example, if I store {0.0, 1.0, 2.0, 3.0} to a <4 x float> %a, maybe only %a.xy is live in my program. Our target doesn't feature SIMD instructions, which means we have to lower vectors to many scalar instructions. I found

2016 Feb 11

Vectorization with fast-math on irregular ISA sub-sets

..., 2016 8:30:50 AM > Subject: Re: Vectorization with fast-math on irregular ISA sub-sets > > On 9 February 2016 at 20:29, Hal Finkel <hfinkel at anl.gov> wrote: > >> If the scalarisation is in IR, then any NEON intrinsic in C code > >> will > >> get wrongly scalarised. Builtins can be lowered in either IR > >> operations or builtins, and the back-end has no way of knowing the > >> origin. > >> > >> If the scalarization is lower down, then we risk also changing > >> inline > >> ASM snippets, which is even wors...

2020 Nov 05

[Proposal] Introducing the concept of invalid costs to the IR cost model

Hi, I'd like to propose a change to our cost interfaces so that instead of returning an unsigned value from functions like getInstructionCost, getUserCost, etc., we instead return a wrapper class that encodes an integer cost along with extra state. The extra state can be used to express: 1. A cost as infinitely expensive in order to prevent certain optimisations taking place. For example,

2013 Oct 25

[LLVMdev] Is there pass to break down <4 x float> to scalars

...ot it right, this may not be necessary, or it may even be harmful. > > Say you decide that <4 x i32> vectors should be left alone, so that your > pass only scalarises the others. But when the vectorizer passes again (to > try and use CPU vector instructions), it might not match the scalarised > version with the vector, and you end up with data movement between scalar > and vector pipelines, which normally slows down CPUs (at least in ARM's > case). Also, problematic cases like <5 x i32> could be better split into > 3+2 pairs, rather than 4+1. > > If you scala...

2013 Nov 13

[LLVMdev] [PATCH] Add a Scalarize pass

Hi Richard, Thanks for working on this. We should probably move this discussion to llvm-dev because it is not strictly related to the patch review anymore. The code below is not representative of general C/C++ code. Usually only domain-specific languages (such as OpenCL) contain vector instructions. The LLVM pass manager configuration (pass manager builder) is designed for C/C++ compilers, not

2013 Nov 13

[LLVMdev] [PATCH] Add a Scalarize pass

...need to come up with an > optimization pipe and works for most programs that we care about. I > still think that scalarizing in SD is a reasonable solution for c/c++. I don't understand the basis for the last statement though. Do you mean that you think most cases produce better code if scalarised at the SD stage rather than at the IR level? Could you give an example? If the idea is to have a clean separation of concerns between the front end and LLVM, then it seems like there are two obvious approaches: (a) make it the front end's responsibility to only generate vector widths tha...

2018 Jan 06

RFC: [LV] any objections in moving isLegalMasked* check from Legal to CostModel? (Cleaning up LoopVectorizationLegality)

Amara, >I support this direction Thanks for the support. >but are there actually any real world workloads where gather/scatter scalarisation would be worth it, on any micro-architecture? If we don’t have examples and the compile time cost is non-negligible then I think we’d still like to keep the early >bailouts in some form.’ It's not like I have specific application code in

2014 Oct 24

[LLVMdev] Adding masked vector load and store intrinsics

...ust cast to it from whatever the deal pointer type is. -Hal > > > Also, given that the types of the vectors matter, it seems like we’re > going to need TTI anyway whenever we want to generate one of these, > or else we’ll end up generating an illegal version which has to be > scalarised in the backend. > > > Thanks, > Pete > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -- Hal Finkel Assistant Computat...

2018 Jan 05

RFC: [LV] any objections in moving isLegalMasked* check from Legal to CostModel? (Cleaning up LoopVectorizationLegality)

> On 5 Jan 2018, at 21:01, Saito, Hideki via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > > All, > > I'm trying to refactor LoopVectorize such that it has better conformance to VPlan vision going forward > (http://www.llvm.org/docs/Proposals/VectorizationPlan.html). All VP*Recipe class definitions are now > moved to VPlan.h, and I have a patch under review

2018 Jan 07

RFC: [LV] any objections in moving isLegalMasked* check from Legal to CostModel? (Cleaning up LoopVectorizationLegality)

On 01/05/2018 06:28 PM, Saito, Hideki wrote: > Amara, > >> I support this direction > Thanks for the support. > >> but are there actually any real world workloads where gather/scatter scalarisation would be worth it, on any micro-architecture? If we don’t have examples and the compile time cost is non-negligible then I think we’d still like to keep the early >bailouts in
