thr3ads.net - search: "scalarise"

Displaying 20 results from an estimated 35 matches for "scalarise".

2013 Nov 15

[LLVMdev] [PATCH] Add a Scalarize pass

...uldn't the same be true in the other direction, for targets without vector support? (b) The situation you describe isn't the one that applies to llvmpipe. In llvmpipe the vectors are nice, known widths that are under the driver's own control. We certainly don't want to scalarise and revectorise llvmpipe IR on x86_64, or on powerpc with Altivec/VSX. The original code is already well vectorised for those targets. (And also for ARM NEON I expect.) In the llvmpipe case, codegen's type legaliser already makes a good decision about what to scalarise and...

2013 Nov 15

[LLVMdev] [PATCH] Add a Scalarize pass

...true in the > other direction, for targets without vector support? > > (b) The situation you describe isn't the one that applies to llvmpipe. > In llvmpipe the vectors are nice, known widths that are under the > driver's own control. We certainly don't want to scalarise and > revectorise llvmpipe IR on x86_64, or on powerpc with Altivec/VSX. > The original code is already well vectorised for those targets. > (And also for ARM NEON I expect.) > > In the llvmpipe case, codegen's type legaliser already makes a good > decision abo...

2013 Nov 14

[LLVMdev] [PATCH] Add a Scalarize pass

Richard Sandiford <rsandifo at linux.vnet.ibm.com> writes: > Are you worried that adding it to PMB will increase compile time? > The pass exits very early for any target that doesn't opt-in to doing > scalarisation at the IR level, without even looking at the function. As an alternative, adding Scalarizer and InstCombine passes to SystemZPassConfig::addIRPasses() would probably

2013 Nov 14

[LLVMdev] [PATCH] Add a Scalarize pass

On Nov 14, 2013, at 2:32 PM, Richard Sandiford <rsandifo at linux.vnet.ibm.com> wrote: > Richard Sandiford <rsandifo at linux.vnet.ibm.com> writes: >> Are you worried that adding it to PMB will increase compile time? >> The pass exits very early for any target that doesn't opt-in to doing >> scalarisation at the IR level, without even looking at the function.

2016 Feb 09

Vectorization with fast-math on irregular ISA sub-sets

...rgets where that's necessary) > > 2. Update the TTI cost model interfaces to take fast-math flags so > > that all vectorizers can make appropriate decisions > > I think this is exactly the opposite of what James is saying, and I > have to agree with him, since this would scalarise everything. No, it just means that the intrinsics need to set the appropriate fast-math flags on the instructions generated. This might require some frontend enablement work, so be it. There might be a slight issue with legacy IR bitcode, but if that's going to be a problem in practice, we ca...

2016 Feb 09

Vectorization with fast-math on irregular ISA sub-sets

----- Original Message ----- > From: "James Molloy" <James.Molloy at arm.com> > To: "Renato Golin" <renato.golin at linaro.org> > Cc: "Nadav Rotem" <nrotem at apple.com>, "Arnold Schwaighofer" <aschwaighofer at apple.com>, "Hal Finkel" > <hfinkel at anl.gov>, "LLVM Dev" <llvm-dev at

2013 Nov 14

[LLVMdev] [PATCH] Add a Scalarize pass

Hi Richard, Thanks for working on this. Comments below. > I don't understand the basis for the last statement though. Do you mean > that you think most cases produce better code if scalarised at the SD stage > rather than at the IR level? Could you give an example? You presented an example that shows that scalarizing vectors allows further optimizations. But I don’t think that this example represents the kind of problems that we run into in general C++ code. We currently consider...

2013 Oct 25

[LLVMdev] Is there pass to break down <4 x float> to scalars

...y from OpenCL to NEON code. It would also need some TargetTransformInfo hooks to decide which > vectors should be decomposed. > If I got it right, this may not be necessary, or it may even be harmful. Say you decide that <4 x i32> vectors should be left alone, so that your pass only scalarises the others. But when the vectorizer passes again (to try and use CPU vector instructions), it might not match the scalarised version with the vector, and you end up with data movement between scalar and vector pipelines, which normally slows down CPUs (at least in ARM's case). Also, problematic...

2013 Nov 14

[LLVMdev] [PATCH] Add a Scalarize pass

Nadav Rotem <nrotem at apple.com> writes: >> I don't understand the basis for the last statement though. Do you mean >> that you think most cases produce better code if scalarised at the SD stage >> rather than at the IR level? Could you give an example? > > You presented an example that shows that scalarizing vectors allows > further optimizations. But I don’t think that this example represents > the kind of problems that we run into in general C++ code....

2013 Oct 25

[LLVMdev] Is there pass to break down <4 x float> to scalars

...but I found in the llvmpipe case that this made things worse with TBAA, because DAGCombiner::GatherAllAliases has some fairly strict limits. So I disabled that by default; use -decompose-vector-load-store to reenable. The main motivation for z was instead to get InstCombine to rewrite things like scalarised selects. I haven't submitted it yet because it's less of a win than the TBAA DAGCombiner patch I posted, so I didn't want to distract from that. It would also need some TargetTransformInfo hooks to decide which vectors should be decomposed. Thanks, Richard...

2013 Oct 25

[LLVMdev] Is there pass to break down <4 x float> to scalars

Hi, LLVM community, I wrote some code by hand in LLVM IR. For simplicity, I wrote it using <4 x float>. Now I have found that some stores of elements are useless. For example, if I store {0.0, 1.0, 2.0, 3.0} to a <4 x float> %a, maybe only %a.xy is live in my program. Our target doesn't feature SIMD instructions, which means we have to lower each vector operation to many scalar instructions. I found
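The situation described in this snippet can be modelled with a small toy, shown here only as a sketch: once a 4-wide store is split into one scalar store per lane, the stores to lanes that are never read can be dropped individually. The names `ScalarStore` and `scalariseLiveLanes` are hypothetical and are not part of LLVM; the liveness information is simply passed in rather than computed.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical model of one scalar store: which lane it writes and the value.
struct ScalarStore {
  std::size_t lane;
  float value;
};

// Split a 4-wide vector store into per-lane scalar stores and keep only the
// lanes marked live. This is a toy illustration, not an LLVM pass: a real
// compiler would derive liveness from the uses of the stored value.
std::vector<ScalarStore> scalariseLiveLanes(const std::array<float, 4> &vec,
                                            const std::array<bool, 4> &live) {
  std::vector<ScalarStore> stores;
  for (std::size_t lane = 0; lane < vec.size(); ++lane) {
    if (live[lane]) {
      stores.push_back({lane, vec[lane]});
    }
  }
  return stores;
}
```

With the values from the post, {0.0, 1.0, 2.0, 3.0}, and only the first two lanes live (the `.xy` case), the sketch keeps two scalar stores and discards the other two.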

2016 Feb 11

Vectorization with fast-math on irregular ISA sub-sets

..., 2016 8:30:50 AM > Subject: Re: Vectorization with fast-math on irregular ISA sub-sets > > On 9 February 2016 at 20:29, Hal Finkel <hfinkel at anl.gov> wrote: > >> If the scalarisation is in IR, then any NEON intrinsic in C code > >> will > >> get wrongly scalarised. Builtins can be lowered in either IR > >> operations or builtins, and the back-end has no way of knowing the > >> origin. > >> > >> If the scalarization is lower down, then we risk also changing > >> inline > >> ASM snippets, which is even wor...

2020 Nov 05

[Proposal] Introducing the concept of invalid costs to the IR cost model

Hi, I'd like to propose a change to our cost interfaces so that instead of returning an unsigned value from functions like getInstructionCost, getUserCost, etc., we instead return a wrapper class that encodes an integer cost along with extra state. The extra state can be used to express: 1. A cost as infinitely expensive in order to prevent certain optimisations taking place. For example,
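The idea in this proposal, a wrapper that carries an integer cost plus extra state, might look roughly like the sketch below. The class name `SketchCost` and all of its members are assumptions made for illustration; they are not the interface from the proposal or from LLVM. The sketch assumes two design choices that the snippet does not spell out: an invalid cost stays invalid under addition, and it compares as more expensive than any valid cost.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical cost wrapper: an integer value plus a validity flag.
class SketchCost {
public:
  // Construct a valid cost with the given value.
  explicit SketchCost(std::int64_t value) : value_(value), valid_(true) {}

  // Construct an invalid cost, standing in for "infinitely expensive".
  static SketchCost invalid() { return SketchCost(); }

  bool isValid() const { return valid_; }

  // The numeric cost, or std::nullopt when the cost is invalid.
  std::optional<std::int64_t> value() const {
    return valid_ ? std::optional<std::int64_t>(value_) : std::nullopt;
  }

  // Assumed behaviour: adding anything to an invalid cost stays invalid.
  friend SketchCost operator+(SketchCost a, SketchCost b) {
    if (!a.valid_ || !b.valid_) {
      return invalid();
    }
    return SketchCost(a.value_ + b.value_);
  }

  // Assumed ordering: every valid cost is cheaper than an invalid one.
  friend bool operator<(SketchCost a, SketchCost b) {
    if (a.valid_ != b.valid_) {
      return a.valid_;
    }
    return a.valid_ && a.value_ < b.value_;
  }

private:
  SketchCost() : value_(0), valid_(false) {}

  std::int64_t value_;
  bool valid_;
};
```

A caller that previously compared raw unsigned costs could compare `SketchCost` values instead, so a transformation whose cost is invalid is never picked over one with a valid cost.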

2013 Oct 25

[LLVMdev] Is there pass to break down <4 x float> to scalars

...>> It would also need some TargetTransformInfo hooks to decide which >> vectors should be decomposed. > > If I got it right, this may not be necessary, or it may even be harmful. > > Say you decide that <4 x i32> vectors should be left alone, so that your > pass only scalarises the others. But when the vectorizer passes again (to > try and use CPU vector instructions), it might not match the scalarised > version with the vector, and you end up with data movement between scalar > and vector pipelines, which normally slows down CPUs (at least in ARM's > case...

2013 Nov 13

[LLVMdev] [PATCH] Add a Scalarize pass

Hi Richard, Thanks for working on this. We should probably move this discussion to llvm-dev because it is not strictly related to the patch review anymore. The code below is not representative of general C/C++ code. Usually only domain-specific languages (such as OpenCL) contain vector instructions. The LLVM pass manager configuration (pass manager builder) is designed for C/C++ compilers, not

2013 Nov 13

[LLVMdev] [PATCH] Add a Scalarize pass

...need to come up with an > optimization pipe and works for most programs that we care about. I > still think that scalarizing in SD is a reasonable solution for c/c++. I don't understand the basis for the last statement though. Do you mean that you think most cases produce better code if scalarised at the SD stage rather than at the IR level? Could you give an example? If the idea is to have a clean separation of concerns between the front end and LLVM, then it seems like there are two obvious approaches: (a) make it the front end's responsibility to only generate vector widths th...

2018 Jan 06

RFC: [LV] any objections in moving isLegalMasked* check from Legal to CostModel? (Cleaning up LoopVectorizationLegality)

Amara, >I support this direction Thanks for the support. >but are there actually any real world workloads where gather/scatter scalarisation would be worth it, on any micro-architecture? If we don’t have examples and the compile time cost is non-negligible then I think we’d still like to keep the early bailouts in some form. It's not like I have specific application code in

2014 Oct 24

[LLVMdev] Adding masked vector load and store intrinsics

...ust cast to it from whatever the deal pointer type is. -Hal > > > Also, given that the types of the vectors matter, it seems like we’re > going to need TTI anyway whenever we want to generate one of these, > or else we’ll end up generating an illegal version which has to be > scalarised in the backend. > > > Thanks, > Pete > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -- Hal Finkel Assistant Computa...

2018 Jan 05

RFC: [LV] any objections in moving isLegalMasked* check from Legal to CostModel? (Cleaning up LoopVectorizationLegality)

> On 5 Jan 2018, at 21:01, Saito, Hideki via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > > All, > > I'm trying to refactor LoopVectorize such that it has better conformance to VPlan vision going forward > (http://www.llvm.org/docs/Proposals/VectorizationPlan.html). All VP*Recipe class definitions are now > moved to VPlan.h, and I have a patch under review

2018 Jan 07

RFC: [LV] any objections in moving isLegalMasked* check from Legal to CostModel? (Cleaning up LoopVectorizationLegality)

On 01/05/2018 06:28 PM, Saito, Hideki wrote: > Amara, > >> I support this direction > Thanks for the support. > >> but are there actually any real world workloads where gather/scatter scalarisation would be worth it, on any micro-architecture? If we don’t have examples and the compile time cost is non-negligible then I think we’d still like to keep the early >bailouts in