thr3ads.net - similar to: "[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations"

Displaying 20 results from an estimated 5000 matches similar to: "[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations"

[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations

2015 May 06

[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations

For void test0(unsigned short a, unsigned short * in, unsigned short * out) { for (unsigned short w = 1; w < a - 1; w++) //this will never overflow out[w] = in[w+7] * 2; } I think it will be sufficient to add a couple of new cases to ScalarEvolution::HowManyLessThans -- zext(A) ult zext(B) == A ult B sext(A) slt sext(B) == A slt B Currently it bails out if it sees a non-add

[cfe-dev] [libunwind] __ELF__ macro for arm-none-eabi

2016 Apr 18

[cfe-dev] [libunwind] __ELF__ macro for arm-none-eabi

On 18 April 2016 at 16:33, Silviu Baranga <Silviu.Baranga at arm.com> wrote: > Doing a grep "eabi" * -R | grep darwin in llvm I found the test divmod-eabi.ll > which uses the triple armv7-apple-darwin-eabi. What format does that have? Certainly not ELF. :) But I didn't mean "has eabi on triple", but "is in none-eabi mode", which may have to check a

[cfe-dev] [libunwind] __ELF__ macro for arm-none-eabi

2016 Apr 18

[cfe-dev] [libunwind] __ELF__ macro for arm-none-eabi

On 18 April 2016 at 16:18, Silviu Baranga <Silviu.Baranga at arm.com> wrote: > This doesn't look like something ACLE specific (I can't find it in the ACLE doc). Sorry, I didn't mean it was ACLE, only that you guys were fiddling with macros. :) > This seems to be a generic macro. I think it would make sense to define it > if we know we're emitting ELF. Since the

[LLVMdev] [RFC][Float2Int] Converting (fcmp Pred, x * F, y) to (ICmp ...)

2015 Apr 29

[LLVMdev] [RFC][Float2Int] Converting (fcmp Pred, x * F, y) to (ICmp ...)

Hi, I'm trying expand the Float2Int pass in order to make it able to optimize expressions like f * x > y, where x and y are integers (we'll assume unsigned for simplicity) and f is a floating point constant. The optimization would convert the expression to something like: (a * x)/b > y where a and b are integers guessed by the compiler (currently using continued

[LLVMdev] [RFC][Float2Int] Converting (fcmp Pred, x * F, y) to (ICmp ...)

2015 Apr 29

[LLVMdev] [RFC][Float2Int] Converting (fcmp Pred, x * F, y) to (ICmp ...)

> On Apr 29, 2015, at 2:33 PM, Matt Arsenault <arsenm2 at gmail.com> wrote: > >> On Apr 29, 2015, at 10:06 AM, Silviu Baranga <Silviu.Baranga at arm.com <mailto:Silviu.Baranga at arm.com>> wrote: >> >> Note that dividing by an integer constant should be a cheap operation >> compared to FP multiplication and comparison as this would get lowered to a

[cfe-dev] [libunwind] __ELF__ macro for arm-none-eabi

2016 Apr 16

[cfe-dev] [libunwind] __ELF__ macro for arm-none-eabi

On 16 April 2016 at 01:44, Zhao, Weiming via cfe-dev <cfe-dev at lists.llvm.org> wrote: > I'm building libunwind for ARM baremetal using clang. > I notice that __ELF__ is used in libunwind and the macro is only defined for > Linux target on ARM. > Should we also predefine that for arm-none-eabi target? Do you mean in Clang's ARMTargetInfo::getTargetDefines() ? I think

[LLVMdev] About the partial update clearence / dependency breaking mechanism

2013 Mar 25

[LLVMdev] About the partial update clearence / dependency breaking mechanism

Hello, I am currently looking into the advantages of using the partial update clearance / dependency breaking mechanism for some ARM cores. It seems that the ARM specific code for this will always return a clearance of 0 for VLD1LNd32 because of the following code in getPartialRegUpdateClearance: > if (UseOp != -1 && MI->getOperand(UseOp).readsReg()) > return 0; so

[LLVMdev] About the partial update clearence / dependency breaking mechanism

2013 Mar 25

[LLVMdev] About the partial update clearence / dependency breaking mechanism

On Mar 25, 2013, at 5:02 AM, Silviu Baranga <silbar01 at arm.com> wrote: > Hello, > > I am currently looking into the advantages of using the > partial update clearance / dependency breaking mechanism > for some ARM cores. > > It seems that the ARM specific code for this will always > return a clearance of 0 for VLD1LNd32 because of the following > code in

[LLVMdev] Making a CopyToReg/CopyFromReg into a zext/sext?

2015 Jan 27

[LLVMdev] Making a CopyToReg/CopyFromReg into a zext/sext?

I have a CopyToReg that is copying from different size types, what's the best way to change that to a zext or sext node based on signed or unsigned? I'm fairly unfamiliar with SelectionDAG process (outside of the docs on llvm website). It seems like I should be able to insert a custom hook using the register class to identify the type, potentially in ISelDAGToDag.cpp or is there a better

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

2012 Dec 18

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

On Tue, Dec 18, 2012 at 9:56 AM, Matthew Curtis <mcurtis at codeaurora.org> wrote: > > Here's how I'm evaluating the expression (in my head): > > 00: Add(ZeroExtend(Truncate(Minus(AddRec(Start=0,Step=3)[n],3), i8), i32),3) > | > 01: Add(ZeroExtend(Truncate(Minus(AddRec(Start=0,Step=3)[0],3), i8), i32),3) >

[LLVMdev] Making a CopyToReg/CopyFromReg into a zext/sext?

2015 Jan 27

[LLVMdev] Making a CopyToReg/CopyFromReg into a zext/sext?

Thanks for getting back to me. So those nodes record if the type has already been expanded from a narrower type. Can you elaborate how I could use these to help? Again, I'm pretty unfamiliar with the SDNodes. Thanks. On Tue, Jan 27, 2015 at 3:22 PM, Matt Arsenault <Matthew.Arsenault at amd.com> wrote: > On 01/27/2015 12:16 PM, Ryan Taylor wrote: > > I have a CopyToReg that

Lowering SEXT (and ZEXT) efficiently on Z80

2018 Jul 18

Lowering SEXT (and ZEXT) efficiently on Z80

I'm working on a Z80 backend and am trying to efficiently lower SEXT, specifically 8 to 16 bit, in LowerOperation() according to the following rules: The Z80 has 8 bit registers and 16 bit registers, which are aliased versions of two 8 bit registers. 8 bit registers are named A, H, L, D, E and some more. 16 bit registers are HL (composed of H + L), DE (D + E) - and some more - with L and

How to clean-up SCEVs from sext/zext/trunc ?

2020 Sep 18

How to clean-up SCEVs from sext/zext/trunc ?

Hi everyone, I've been using SCEV Delinearization but it fails when other-than-pointer-size values are used in the subscripts. To make that clear, say that we have a target in which pointers are 64-bit for (int32_t i ...) for (int32_t j ...) a[i][j] = ... doesn't work while this: for (int64_t i ...) for (int64_t j ...) does work. I assume that the former does not work because

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

2012 Dec 18

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

Dan, Thanks for the response ... On 12/17/2012 1:53 PM, Dan Gohman wrote: > On Mon, Dec 10, 2012 at 2:13 PM, Matthew Curtis <mcurtis at codeaurora.org> wrote: >> Hello all, >> >> I wanted to get some feedback on this patch for ScalarEvolution. >> >> It addresses a performance problem I am seeing for simple benchmark. >> >> Starting with this C

[LLVMdev] Making a CopyToReg/CopyFromReg into a zext/sext?

2015 Jan 27

[LLVMdev] Making a CopyToReg/CopyFromReg into a zext/sext?

I have a CopyToReg that is moving a 16bit reg to a 32bit reg, it's currently being mapped out as a simple mov (not an ext), I would like to change that to an ext. It seemed that the SelDAG was the easiest and cleanest way to do this. I can change the mov to an extension MI in the .td file; however, I can't tell at that point whether it's a sext or a zext, so it seemed the SelDAG was

[SCEV] Inconsistent SCEV formation for zext

2018 Feb 10

[SCEV] Inconsistent SCEV formation for zext

Hi, +CC Max, Serguei This looks like a textbook case of why caching is hard. We first call getZeroExtendExpr on %dec, and this call does end up returning an add rec. However, in the process of simplifying the zext, it also calls into isLoopBackedgeGuardedByCond which in turn calls getZeroExtendExpr(%dec) again. However, this second (recursive) time, we don't simplify the zext and cache a

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

2012 Dec 20

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

Ok, so I think I've mis-represented what's really happening. Ignore my previous statements concerning %add :) Again, given: 05: for.body: ; preds = %entry, %for.body 06: %j.04 = phi i32 [ 0, %entry ], [ %inc, %for.body ] 07: %result.03 = phi i32 [ 0, %entry ], [ %add, %for.body ] 08: %conv2 = and i32 %result.03, 255 09: %add = add nsw

[SCEV] Inconsistent SCEV formation for zext

2018 Feb 08

[SCEV] Inconsistent SCEV formation for zext

Hi Sanjoy, SCEV is behaving inconsistently when forming SCEV for this zext instruction in the attached test case- %conv5 = zext i32 %dec to i64 If we request a SCEV for the instruction, it returns- (zext i32 {{-1,+,1}<nw><%for.body>,+,-1}<nw><%for.body7> to i64) This can be seen by invoking- $ opt -analyze -scalar-evolution inconsistent-scev-zext.ll But when computing

How to clean-up SCEVs from sext/zext/trunc ?

2020 Sep 22

How to clean-up SCEVs from sext/zext/trunc ?

Hi Michael, Thanks for the reply. I've seen but have not used it. FWIW, the problem is not how to generate the runtime checks (although it'd be good if we can get it for free), but how to clean up the SCEVs. Does PSE do that ? Cheers, Stefanos Στις Δευ, 21 Σεπ 2020 στις 11:59 π.μ., ο/η Michael Kruse < llvmdev at meinersbur.de> έγραψε: > Have you looked into

[IndVarSimplify] Narrow IV's are not eliminated resulting in inefficient code

2016 Apr 23

[IndVarSimplify] Narrow IV's are not eliminated resulting in inefficient code

Hi Sanjoy, Thank you for looking into this! Yes, your patch does fix my larger test case too. My algorithm gets double performance improvement with the patch, as the loop now has a smaller instruction set and succeeds to unroll w/o any extra #pragma's. I also ran the LLVM tests against the patch. There are 6 new failures: Analysis/LoopAccessAnalysis/number-of-memchecks.ll

similar to: [LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations