similar to: Meaning of `sub nsw`


2012 Aug 24
2
[LLVMdev] Stop opt from producing 832 bit integer?
I'm translating llvm's intermediate representation, after optimization, to the intermediate representation of another optimizer. One of the problems I've run into is that llvm sometimes (although rarely) produces strangely sized integers after an opt pass with -O3 (in this example, 832 bits). I need to use 8, 16, or 32 bit integers for the other intermediate language. In short,
2018 Jan 17
3
always allow canonicalizing to 8- and 16-bit ops?
Example: define i8 @narrow_add(i8 %x, i8 %y) { %x32 = zext i8 %x to i32 %y32 = zext i8 %y to i32 %add = add nsw i32 %x32, %y32 %tr = trunc i32 %add to i8 ret i8 %tr } With no data-layout or with an x86 target where 8-bit integer is in the data-layout, we reduce to: $ ./opt -instcombine narrowadd.ll -S define i8 @narrow_add(i8 %x, i8 %y) { %add = add i8 %x, %y ret i8 %add } But on
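The narrowing in this example can be checked with plain arithmetic: truncating the 32-bit sum of two zero-extended 8-bit values yields the same bits as an 8-bit wrapping add. The Python sketch below verifies this exhaustively. The function names are hypothetical, and the code models only the integer arithmetic, not instcombine itself.

```python
def widened_add(x: int, y: int) -> int:
    """Model zext i8 -> i32, add i32, then trunc i32 -> i8."""
    x32 = x & 0xFF                      # zext: upper 24 bits are zero
    y32 = y & 0xFF
    add32 = (x32 + y32) & 0xFFFFFFFF    # 32-bit add (at most 510, so no wrap)
    return add32 & 0xFF                 # trunc keeps the low 8 bits


def narrow_add(x: int, y: int) -> int:
    """Model a plain `add i8`, which wraps modulo 2**8."""
    return (x + y) & 0xFF


def all_inputs_agree() -> bool:
    """Compare both forms on every pair of 8-bit inputs."""
    return all(
        widened_add(x, y) == narrow_add(x, y)
        for x in range(256)
        for y in range(256)
    )
```

Note that the `nsw` flag holds trivially on the wide add, since the sum of two zero-extended 8-bit values is at most 510. The narrowed `add i8` in the opt output carries no such flag, because an 8-bit add of the same inputs can overflow.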
2013 Aug 15
4
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
Hi all, I have investigated the 6X extra compile-time overhead when Polly compiles the simple nestedloop benchmark in LLVM-testsuite (http://188.40.87.11:8000/db_default/v4/nts/31?compare_to=28&baseline=28). Preliminary results show that this compile-time overhead is caused by the complicated polly-dependence analysis. However, the key seems to be the polly-prepare pass, which introduces
2018 Aug 15
2
[SCEV] Why is backedge-taken count <nsw> instead of <nuw>?
Hello, If I run clang on the following code: void func(unsigned n) { > for (unsigned long x = 1; x < n; ++x) > dummy(x); > } I get the following llvm ir: define void @func(i32 %n) { > entry: > %conv = zext i32 %n to i64 > %cmp5 = icmp ugt i32 %n, 1 > br i1 %cmp5, label %for.body, label %for.cond.cleanup > for.cond.cleanup:
2013 Oct 27
2
[LLVMdev] Missed optimization opportunity with piecewise load shift-or'd together?
The following piece of IR is a fixed point for opt -std-compile-opts/-O3: --- target datalayout = "e-p:64:64:64-S128-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f16:16:16-f32:32:32-f64:64:64-f128:128:128-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64" target triple = "x86_64-unknown-linux-gnu" ; Function Attrs: nounwind readonly define i32 @get32Bits(i8*
2018 Jan 22
2
always allow canonicalizing to 8- and 16-bit ops?
Thanks for the perf testing. I assume that DAG legalization is equipped to handle these cases fairly well, or someone would've complained by now... FWIW (and at least some of this can be blamed on me), instcombine already does the narrowing transforms without checking shouldChangeType() for binops like and/or/xor/udiv. The justification was that narrower ops are always better for
2018 Aug 15
2
[SCEV] Why is backedge-taken count <nsw> instead of <nuw>?
Is that why we do not deduce +<nsw> from "add nsw" either? Is that an intrinsic limitation of creating context-invariant expressions from a Value* or is that a limitation of our implementation (our unification not considering the nsw flags)? On Wed, Aug 15, 2018 at 12:39 PM Friedman, Eli <efriedma at codeaurora.org> wrote: > On 8/15/2018 12:21 PM, Alexandre Isoard via
2018 Jan 22
0
always allow canonicalizing to 8- and 16-bit ops?
Hello, Thanks for looking into this. I can't be very confident about what the knock-on result of a change like that would be, especially on architectures that are not Arm. What I can do, though, is run some benchmarks and look at the results. Using this patch: --- a/lib/Transforms/InstCombine/InstructionCombining.cpp +++ b/lib/Transforms/InstCombine/InstructionCombining.cpp @@ -150,6 +150,9 @@
2014 Dec 26
3
[LLVMdev] Correct usage of `llvm.assume` for loop vectorization alignment?
Using LLVM ToT and Hal's helpful slide deck [1], I've been trying to use `llvm.assume` to communicate pointer alignment guarantees to vector load and store instructions. For example, in [2] %5 and %9 are guaranteed to be 32-byte aligned. However, if I run this IR through `opt -O3 -datalayout -S`, the vectorized loads and stores are still 1-byte aligned [3]. What's going wrong? Do I
2013 Aug 16
2
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
Hi Sebpop, Thanks for your explanation. I noticed that Polly would finally run the SROA pass to transform these load/store instructions into scalar operations. Is it possible to run such a pass before polly-dependence analysis? Star Tan At 2013-08-15 21:12:53,"Sebastian Pop" <sebpop at gmail.com> wrote: >Codeprepare and independent blocks are introducing these loads and
2018 Aug 15
2
[SCEV] Why is backedge-taken count <nsw> instead of <nuw>?
I'm not sure I understand the poison/undef/UB distinctions. But on this example: define i32 @func(i1 zeroext %b, i32 %x, i32 %y) { > entry: > %adds = add nsw i32 %x, %y > %addu = add nuw i32 %x, %y > %cond = select i1 %b, i32 %adds, i32 %addu > ret i32 %cond > } It is important to not propagate the nsw/nuw between the two SCEV expressions (which unification would
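To illustrate why the two flags in this example cannot be shared: `add nsw` and `add nuw` produce the same bits, but each yields poison on a different set of inputs (signed overflow versus unsigned overflow). The Python sketch below, with hypothetical helper names, models only the two overflow conditions for 32-bit operands. It says nothing about how SCEV actually represents or unifies expressions.

```python
BITS = 32
MASK = (1 << BITS) - 1


def to_signed(v: int) -> int:
    """Reinterpret a 32-bit pattern as a two's-complement signed value."""
    v &= MASK
    return v - (1 << BITS) if v >> (BITS - 1) else v


def add_overflows_signed(x: int, y: int) -> bool:
    """True when `add nsw i32 x, y` would be poison."""
    s = to_signed(x) + to_signed(y)
    return not (-(1 << (BITS - 1)) <= s < (1 << (BITS - 1)))


def add_overflows_unsigned(x: int, y: int) -> bool:
    """True when `add nuw i32 x, y` would be poison."""
    return (x & MASK) + (y & MASK) > MASK
```

For example, `0x7FFFFFFF + 1` overflows as a signed add but not as an unsigned one, while `0xFFFFFFFF + 1` (that is, -1 + 1) overflows as an unsigned add but not as a signed one. An expression that carried both flags would therefore claim more than either instruction guarantees on its own.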
2013 Aug 15
0
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
Codeprepare and independent blocks are introducing these loads and stores. These are prepasses that polly runs prior to building the dependence graph to transform scalar dependences into data dependences. Ether was working on eliminating the rewrite of scalar dependences. On Thu, Aug 15, 2013 at 5:32 AM, Star Tan <tanmx_star at yeah.net> wrote: > Hi all, > > I have investigated the
2012 Dec 20
2
[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)
Ok, so I think I've mis-represented what's really happening. Ignore my previous statements concerning %add :) Again, given: 05: for.body: ; preds = %entry, %for.body 06: %j.04 = phi i32 [ 0, %entry ], [ %inc, %for.body ] 07: %result.03 = phi i32 [ 0, %entry ], [ %add, %for.body ] 08: %conv2 = and i32 %result.03, 255 09: %add = add nsw
2014 Nov 11
3
[LLVMdev] supporting SAD in loop vectorizer
----- Original Message ----- > From: "Dibyendu Das" <Dibyendu.Das at amd.com> > To: "Hal Finkel" <hfinkel at anl.gov>, "Renato Golin" <renato.golin at linaro.org> > Cc: llvmdev at cs.uiuc.edu > Sent: Tuesday, November 4, 2014 12:15:12 PM > Subject: RE: [LLVMdev] supporting SAD in loop vectorizer > > Here's the simple SAD
2013 Aug 16
0
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
I do not think that running SROA before polly is a good idea: it would defeat the purpose of the code preparation passes that polly intentionally schedules for the data dependence analysis. If you remove the data references before polly runs, you would miss them in the dependence graph: that could lead to incorrect transforms. On Thu, Aug 15, 2013 at 7:28 PM, Star Tan <tanmx_star at
2018 Aug 16
3
[SCEV] Why is backedge-taken count <nsw> instead of <nuw>?
Ok. To go back to the original issue, would it be meaningful to add a SCEVUMax(0, BTC) on the final BTC computed by SCEV? So that it does not use "negative values"? On Wed, Aug 15, 2018 at 2:40 PM Friedman, Eli <efriedma at codeaurora.org> wrote: > On 8/15/2018 2:27 PM, Alexandre Isoard wrote: > > I'm not sure I understand the poison/undef/UB distinctions. >
2014 Nov 11
4
[LLVMdev] supporting SAD in loop vectorizer
----- Original Message ----- > From: "James Molloy" <james at jamesmolloy.co.uk> > To: "Hal Finkel" <hfinkel at anl.gov> > Cc: "Dibyendu Das" <Dibyendu.Das at amd.com>, llvmdev at cs.uiuc.edu > Sent: Tuesday, November 11, 2014 8:21:37 AM > Subject: Re: [LLVMdev] supporting SAD in loop vectorizer > > > If you'd like to
2015 Jan 28
2
[LLVMdev] RFC: generation of PSAD instruction
Hello, I was looking at the following test case which is very relevant in imaging applications. int sad(unsigned char *pix1, unsigned char *pix2) { int sum = 0; for( int x = 0; x < 16; x++ ) { sum += abs( pix1[x] - pix2[x] ); } return sum; } The llvm IR generated after all the IR
2012 Jan 26
3
[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
On Thu, 2012-01-26 at 15:12 -0600, Sebastian Pop wrote: > On Thu, Jan 26, 2012 at 2:49 PM, Hal Finkel <hfinkel at anl.gov> wrote: > > Thanks! Did you compile with any non-default flags other than -mllvm > > -vectorize? > > I used -O3 and -vectorize, no other non-default flags. If I run clang -O3 -mllvm -vectorize -S -emit-llvm -o test.ll test.c then I get no
2012 Jan 26
0
[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
On Thu, Jan 26, 2012 at 3:41 PM, Hal Finkel <hfinkel at anl.gov> wrote: > On Thu, 2012-01-26 at 15:36 -0600, Sebastian Pop wrote: >> arm-none-linux-gnueabi > > Indeed, adding -ccc-host-triple arm-none-linux-gnueabi I also get Minor remark: please use -target instead of -ccc-host-triple that is now deprecated. Thanks for looking at this testcase. Sebastian -- Qualcomm