similar to: [LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations

Displaying 20 results from an estimated 5000 matches similar to: "[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations"

2015 May 06
2
[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations
For void test0(unsigned short a, unsigned short * in, unsigned short * out) { for (unsigned short w = 1; w < a - 1; w++) //this will never overflow out[w] = in[w+7] * 2; } I think it will be sufficient to add a couple of new cases to ScalarEvolution::HowManyLessThans -- zext(A) ult zext(B) == A ult B sext(A) slt sext(B) == A slt B Currently it bails out if it sees a non-add
2016 Apr 18
2
[cfe-dev] [libunwind] __ELF__ macro for arm-none-eabi
On 18 April 2016 at 16:33, Silviu Baranga <Silviu.Baranga at arm.com> wrote: > Doing a grep "eabi" * -R | grep darwin in llvm I found the test divmod-eabi.ll > which uses the triple armv7-apple-darwin-eabi. What format does that have? Certainly not ELF. :) But I didn't mean "has eabi on triple", but "is in none-eabi mode", which may have to check a
2016 Apr 18
2
[cfe-dev] [libunwind] __ELF__ macro for arm-none-eabi
On 18 April 2016 at 16:18, Silviu Baranga <Silviu.Baranga at arm.com> wrote: > This doesn't look like something ACLE specific (I can't find it in the ACLE doc). Sorry, I didn't mean it was ACLE, only that you guys were fiddling with macros. :) > This seems to be a generic macro. I think it would make sense to define it > if we know we're emitting ELF. Since the
2015 Apr 29
2
[LLVMdev] [RFC][Float2Int] Converting (fcmp Pred, x * F, y) to (ICmp ...)
Hi, I'm trying expand the Float2Int pass in order to make it able to optimize expressions like f * x > y, where x and y are integers (we'll assume unsigned for simplicity) and f is a floating point constant. The optimization would convert the expression to something like: (a * x)/b > y where a and b are integers guessed by the compiler (currently using continued
2015 Apr 29
2
[LLVMdev] [RFC][Float2Int] Converting (fcmp Pred, x * F, y) to (ICmp ...)
> On Apr 29, 2015, at 2:33 PM, Matt Arsenault <arsenm2 at gmail.com> wrote: > >> On Apr 29, 2015, at 10:06 AM, Silviu Baranga <Silviu.Baranga at arm.com <mailto:Silviu.Baranga at arm.com>> wrote: >> >> Note that dividing by an integer constant should be a cheap operation >> compared to FP multiplication and comparison as this would get lowered to a
2016 Apr 16
2
[cfe-dev] [libunwind] __ELF__ macro for arm-none-eabi
On 16 April 2016 at 01:44, Zhao, Weiming via cfe-dev <cfe-dev at lists.llvm.org> wrote: > I'm building libunwind for ARM baremetal using clang. > I notice that __ELF__ is used in libunwind and the macro is only defined for > Linux target on ARM. > Should we also predefine that for arm-none-eabi target? Do you mean in Clang's ARMTargetInfo::getTargetDefines() ? I think
2013 Mar 25
2
[LLVMdev] About the partial update clearence / dependency breaking mechanism
Hello, I am currently looking into the advantages of using the partial update clearance / dependency breaking mechanism for some ARM cores. It seems that the ARM specific code for this will always return a clearance of 0 for VLD1LNd32 because of the following code in getPartialRegUpdateClearance: > if (UseOp != -1 && MI->getOperand(UseOp).readsReg()) > return 0; so
2013 Mar 25
0
[LLVMdev] About the partial update clearence / dependency breaking mechanism
On Mar 25, 2013, at 5:02 AM, Silviu Baranga <silbar01 at arm.com> wrote: > Hello, > > I am currently looking into the advantages of using the > partial update clearance / dependency breaking mechanism > for some ARM cores. > > It seems that the ARM specific code for this will always > return a clearance of 0 for VLD1LNd32 because of the following > code in
2015 Jan 27
2
[LLVMdev] Making a CopyToReg/CopyFromReg into a zext/sext?
I have a CopyToReg that is copying from different size types, what's the best way to change that to a zext or sext node based on signed or unsigned? I'm fairly unfamiliar with SelectionDAG process (outside of the docs on llvm website). It seems like I should be able to insert a custom hook using the register class to identify the type, potentially in ISelDAGToDag.cpp or is there a better
2012 Dec 18
0
[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)
On Tue, Dec 18, 2012 at 9:56 AM, Matthew Curtis <mcurtis at codeaurora.org> wrote: > > Here's how I'm evaluating the expression (in my head): > > 00: Add(ZeroExtend(Truncate(Minus(AddRec(Start=0,Step=3)[n],3), i8), i32),3) > | > 01: Add(ZeroExtend(Truncate(Minus(AddRec(Start=0,Step=3)[0],3), i8), i32),3) >
2015 Jan 27
2
[LLVMdev] Making a CopyToReg/CopyFromReg into a zext/sext?
Thanks for getting back to me. So those nodes record if the type has already been expanded from a narrower type. Can you elaborate how I could use these to help? Again, I'm pretty unfamiliar with the SDNodes. Thanks. On Tue, Jan 27, 2015 at 3:22 PM, Matt Arsenault <Matthew.Arsenault at amd.com> wrote: > On 01/27/2015 12:16 PM, Ryan Taylor wrote: > > I have a CopyToReg that
2018 Jul 18
2
Lowering SEXT (and ZEXT) efficiently on Z80
I'm working on a Z80 backend and am trying to efficiently lower SEXT, specifically 8 to 16 bit, in LowerOperation() according to the following rules: The Z80 has 8 bit registers and 16 bit registers, which are aliased versions of two 8 bit registers. 8 bit registers are named A, H, L, D, E and some more. 16 bit registers are HL (composed of H + L), DE (D + E) - and some more - with L and
2020 Sep 18
2
How to clean-up SCEVs from sext/zext/trunc ?
Hi everyone, I've been using SCEV Delinearization but it fails when other-than-pointer-size values are used in the subscripts. To make that clear, say that we have a target in which pointers are 64-bit for (int32_t i ...) for (int32_t j ...) a[i][j] = ... doesn't work while this: for (int64_t i ...) for (int64_t j ...) does work. I assume that the former does not work because
2012 Dec 18
2
[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)
Dan, Thanks for the response ... On 12/17/2012 1:53 PM, Dan Gohman wrote: > On Mon, Dec 10, 2012 at 2:13 PM, Matthew Curtis <mcurtis at codeaurora.org> wrote: >> Hello all, >> >> I wanted to get some feedback on this patch for ScalarEvolution. >> >> It addresses a performance problem I am seeing for simple benchmark. >> >> Starting with this C
2015 Jan 27
4
[LLVMdev] Making a CopyToReg/CopyFromReg into a zext/sext?
I have a CopyToReg that is moving a 16bit reg to a 32bit reg, it's currently being mapped out as a simple mov (not an ext), I would like to change that to an ext. It seemed that the SelDAG was the easiest and cleanest way to do this. I can change the mov to an extension MI in the .td file; however, I can't tell at that point whether it's a sext or a zext, so it seemed the SelDAG was
2018 Feb 10
0
[SCEV] Inconsistent SCEV formation for zext
Hi, +CC Max, Serguei This looks like a textbook case of why caching is hard. We first call getZeroExtendExpr on %dec, and this call does end up returning an add rec. However, in the process of simplifying the zext, it also calls into isLoopBackedgeGuardedByCond which in turn calls getZeroExtendExpr(%dec) again. However, this second (recursive) time, we don't simplify the zext and cache a
2012 Dec 20
2
[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)
Ok, so I think I've mis-represented what's really happening. Ignore my previous statements concerning %add :) Again, given: 05: for.body: ; preds = %entry, %for.body 06: %j.04 = phi i32 [ 0, %entry ], [ %inc, %for.body ] 07: %result.03 = phi i32 [ 0, %entry ], [ %add, %for.body ] 08: %conv2 = and i32 %result.03, 255 09: %add = add nsw
2018 Feb 08
2
[SCEV] Inconsistent SCEV formation for zext
Hi Sanjoy, SCEV is behaving inconsistently when forming SCEV for this zext instruction in the attached test case- %conv5 = zext i32 %dec to i64 If we request a SCEV for the instruction, it returns- (zext i32 {{-1,+,1}<nw><%for.body>,+,-1}<nw><%for.body7> to i64) This can be seen by invoking- $ opt -analyze -scalar-evolution inconsistent-scev-zext.ll But when computing
2020 Sep 22
2
How to clean-up SCEVs from sext/zext/trunc ?
Hi Michael, Thanks for the reply. I've seen but have not used it. FWIW, the problem is not how to generate the runtime checks (although it'd be good if we can get it for free), but how to clean up the SCEVs. Does PSE do that ? Cheers, Stefanos Στις Δευ, 21 Σεπ 2020 στις 11:59 π.μ., ο/η Michael Kruse < llvmdev at meinersbur.de> έγραψε: > Have you looked into
2016 Apr 23
2
[IndVarSimplify] Narrow IV's are not eliminated resulting in inefficient code
Hi Sanjoy, Thank you for looking into this! Yes, your patch does fix my larger test case too. My algorithm gets double performance improvement with the patch, as the loop now has a smaller instruction set and succeeds to unroll w/o any extra #pragma's. I also ran the LLVM tests against the patch. There are 6 new failures: Analysis/LoopAccessAnalysis/number-of-memchecks.ll