thr3ads.net - similar to: "InstCombine wrongful (?) optimization on BinOp with SameOperands"

Displaying 20 results from an estimated 400 matches similar to: "InstCombine wrongful (?) optimization on BinOp with SameOperands"

Speedups with Ra and jit

2008 May 02

Speedups with Ra and jit

The topic of Ra and jit has come up on this list recently (see http://www.milbo.users.sonic.net/ra/index.html) so I thought people might be interested in this little demo. For it I used my machine, a 3-year old laptop with 2Gb memory running Windows XP, and the good old convolution example, the same one as used on the web page, (though the code on the web page has a slight glitch in it). This

Lowering ISD::TRUNCATE

2018 Aug 06

Lowering ISD::TRUNCATE

I'm working on defining the instructions and implementing the lowering code for a Z80 backend. For now, the backend supports only the native CPU-supported datatypes, which are 8 and 16 bits wide (i.e. no 32 bit long, float, ... yet). So far, a lot of the simple stuff like immediate loads and return values is very straightforward, but now I got stuck with ISD::TRUNCATE, as in:

[LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)

2011 Sep 08

[LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)

On Thu, Sep 08, 2011 at 11:15:06AM -0500, Villmow, Micah wrote: > Peter, > Is there a way to make this flag globally available? Metadata can be fairly expensive to handle at each node when in many cases it is a global flag and not a per operation flag. There are two main reasons why I think we shouldn't go for global flags: 1) It becomes difficult if not impossible to correctly link

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

2012 Dec 10

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

Hello all, I wanted to get some feedback on this patch for ScalarEvolution. It addresses a performance problem I am seeing for simple benchmark. Starting with this C code: 01: signed char foo(void) 02: { 03: const int count = 8000; 04: signed char result = 0; 05: int j; 06: 07: for (j = 0; j < count; ++j) { 08: result += (result_t)(3); 09: } 10: 11: return result; 12: } I

[LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)

2011 Sep 08

[LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)

Peter, Is there a way to make this flag globally available? Metadata can be fairly expensive to handle at each node when in many cases it is a global flag and not a per operation flag. > -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Robert Quill > Sent: Thursday, September 08, 2011 3:24 AM > To: Peter

Remove zext-unfolding from InstCombine

2016 Jul 27

Remove zext-unfolding from InstCombine

Hi Sanjay, thank you a lot for your answer. I understand that in your examples it is desirable that `foo` and `goo` are canonicalized to the same IR, i.e., something like `@goo`. However, I still have a few open questions, but please correct me in case I'm thinking in the wrong direction. > Am 21.07.2016 um 18:51 schrieb Sanjay Patel <spatel at rotateright.com>: > > I've

[LLVMdev] Folding an insertelt chain

2012 Feb 17

[LLVMdev] Folding an insertelt chain

On Feb 17, 2012, at 12:50 AM, Ivan Llopard wrote: > Hello, > > I've added a little combining operation in DAGCombiner to fold a chain of insertelt nodes if that chain is proved to fully overwrite the very first source vector. In which case, I supposed a build_vector is better. It seems to be safe but I don't know if it is correctly implemented or if it is already done somewhere

[LLVMdev] Multiply i8 operands promotes to i32

2012 Oct 08

[LLVMdev] Multiply i8 operands promotes to i32

Hi, I am trying to complete the hardware multiplier option for MSP430 backend. As the hardware multiplier in most of the MSP430 devices is for i8 and i16 operands, with i16 and i32 result, I am lowering MUL_i8 and MUL_I16. However, the front-end promotes the i8 argument to i32, executes 32-bit multiplier and truncates to 16-bit, so I never lower MUL_I8 nor MUL_I16 but MUL_I32, wchich is lowered

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

2012 Dec 17

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

On Mon, Dec 10, 2012 at 2:13 PM, Matthew Curtis <mcurtis at codeaurora.org> wrote: > Hello all, > > I wanted to get some feedback on this patch for ScalarEvolution. > > It addresses a performance problem I am seeing for simple benchmark. > > Starting with this C code: > > 01: signed char foo(void) > 02: { > 03: const int count = 8000; > 04: signed char

[RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors

2020 Jan 11

[RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors

Thanks so much for your feedback Simon. I am not sure that what I am proposing here is at odds with what you're referring to (here and in the PR you linked). The key difference AFAICT is that the pattern I am referring to is probably more aptly described as "reducing scalarization" than as "vectorization". The reason I say that is that the inputs are vectors and the output

[LLVMdev] Folding an insertelt chain

2012 Feb 17

[LLVMdev] Folding an insertelt chain

Hello, I've added a little combining operation in DAGCombiner to fold a chain of insertelt nodes if that chain is proved to fully overwrite the very first source vector. In which case, I supposed a build_vector is better. It seems to be safe but I don't know if it is correctly implemented or if it is already done somewhere else. Please find attached the patch. Regards, Ivan

[LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)

2011 Sep 08

[LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)

Hi Peter, This sounds like I really good idea. One thing that did occur to me though from an OpenCL point of view is that ULP accuracy requirements can differ for embedded and full profile so that may need to be handled somehow. Thanks, Rob On Wed, 2011-09-07 at 21:55 +0100, Peter Collingbourne wrote: > Hi, > > This is my proposal to add floating point accuracy support to LLVM. >

[LLVMdev] global type legalization?

2010 Sep 14

[LLVMdev] global type legalization?

Returning to an old discussion here.... On Aug 18, 2010, at 10:42 AM, Chris Lattner wrote: > On Aug 18, 2010, at 10:27 AM, Bob Wilson wrote: >>> I tend to think that it isn't worth the compile time to try to microoptimize out every compare, but I could be convinced otherwise if there are important use cases we're failing to handle. I also do think that whole-function

[RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors

2020 Jan 11

[RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors

Absolutely. We do it for scalars, so it would likely be a matter of just extending it. But that is one example. The issue of extracting elements, performing an operation on each element individually and then rebuilding the vector is likely more prevalent than that. At least I think that is the case, but I'll do some analysis to see if it is so or not. On Sat, Jan 11, 2020 at 6:15 PM Craig

Remove zext-unfolding from InstCombine

2016 Jul 21

Remove zext-unfolding from InstCombine

Hi all, I have a question regarding a transformation that is carried out in InstCombine, which has been introduced by r48715. It unfolds expressions of the form `zext(or(icmp, (icmp)))` to `or(zext(icmp), zext(icmp)))` to expose pairs of `zext(icmp)`. In a subsequent iteration these `zext(icmp)` pairs could then (possibly) be optimized by another optimization (which has already been there before

[RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors

2020 Jan 10

[RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors

I have added a few PPC-specific DAG combines in the past that follow this pattern on specific operations. Now that it appears that this would be useful to do on yet another operation, I'm wondering what people think about doing this in the target-independent DAG Combiner for any legal/custom operation on the target. TL; DR; The generic pattern would look like this: (build_vector (op

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

2012 Dec 18

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

Dan, Thanks for the response ... On 12/17/2012 1:53 PM, Dan Gohman wrote: > On Mon, Dec 10, 2012 at 2:13 PM, Matthew Curtis <mcurtis at codeaurora.org> wrote: >> Hello all, >> >> I wanted to get some feedback on this patch for ScalarEvolution. >> >> It addresses a performance problem I am seeing for simple benchmark. >> >> Starting with this C

Signed Division and InstCombine

2016 May 31

Signed Division and InstCombine

I was looking through the InstCombine pass, and I was wondering why signed division is not considered a valid operation to combine in the canEvaluateTruncated function. This means, given the following code: %conv = sext i16 %0 to i32 %conv1 = sext i16 %1 to i32 %div = sdiv i32 %conv, %conv1 %conv2 = trunc i32 %div to i16 * Assume %0 and %1 are registers created from simple 16-bit loads. We

[Bug 1142] New: invalid binop operation 6nft

2017 Apr 02

[Bug 1142] New: invalid binop operation 6nft

https://bugzilla.netfilter.org/show_bug.cgi?id=1142 Bug ID: 1142 Summary: invalid binop operation 6nft Product: nftables Version: unspecified Hardware: x86_64 OS: other Status: NEW Severity: major Priority: P5 Component: nft Assignee: pablo at netfilter.org Reporter:

[LLVMdev] TargetLowering vs. TargetTransform

2013 Jan 25

[LLVMdev] TargetLowering vs. TargetTransform

Hi Renato, I think that we need to improve ::isTruncateFree, ::isZextFree, etc to include all of the free conversions. Vector and Scalar. Non-free conversions are marked with setOperationAction so the generic parts of TTI should be able to give a reasonable cost estimation. The cost tables should contain cases that are not handled by TTI. So, if we have a clever DAGCombine optimization (that

similar to: InstCombine wrongful (?) optimization on BinOp with SameOperands