similar to: Implementing a proposed InstCombine optimization

Displaying 20 results from an estimated 8000 matches similar to: "Implementing a proposed InstCombine optimization"

2016 Apr 07
7
Implementing a proposed InstCombine optimization
I am not entirely sure this is safe. Transforming this to an fsub could change the value stored on platforms that implement negates using arithmetic instead of with bitmath (such as ours) and either canonicalize NaNs or don’t support denormals. This is actually important because this kind of bitmath on floats is very commonly used as part of algorithms for complex math functions that need to get
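As background for the concern above, a minimal C sketch of the two negation strategies being contrasted (the function names are illustrative, not code from the message):

    #include <stdint.h>
    #include <string.h>

    /* Pure bitmath negate: flip the IEEE-754 sign bit without touching the FP unit.
       NaN payloads and denormal values pass through unchanged. */
    static float negate_bitmath(float x) {
        uint32_t bits;
        memcpy(&bits, &x, sizeof bits);
        bits ^= 0x80000000u;
        memcpy(&x, &bits, sizeof bits);
        return x;
    }

    /* Arithmetic negate, modelling an fsub(-0.0, x) lowering. On hardware that
       canonicalizes NaNs or flushes denormals, the stored result bits can differ
       from negate_bitmath(x) for the same input. */
    static float negate_arith(float x) {
        return -0.0f - x;
    }
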
2016 Apr 09
2
Implementing a proposed InstCombine optimization
It’s definitely one that would need some target hooks, and is probably not actually worth doing without analysing the producers and consumers of the value. If the source and destination values need to be in floating point registers, the cost of FPR<->GPR moves is likely to be a lot higher than the cost of the subtract, even if the xor is free. If the results are going to end up in integer
2016 Apr 11
4
Implementing a proposed InstCombine optimization
> I am not entirely sure this is safe. Transforming this to an fsub could change the value stored on platforms that implement negates using arithmetic instead of with bitmath (such as ours)

I think it's probably safe for IEEE754-2008 conformant platforms because negation was clarified to be a non-arithmetic bit flip that cannot cause exceptions in that specification. However, I'm sure
2016 Apr 12
2
Implementing a proposed InstCombine optimization
Good point. The same argument seems to apply to copy() too, so I suppose it depends how strict we want to be about it.
2018 Sep 11
2
[FPEnv] FNEG instruction
Which exactly was the plan? Add a new, regular instruction? Add a new constrained math intrinsic? Both? Andrew Kaylor made a good point here: * As I said, all LLVM IR FP instructions are //assumed// to have no side effects. I'm not sure we want an instruction that goes beyond this to be //defined// as having no side effects. It adds a complication to the language and introduces
2014 Aug 08
2
[LLVMdev] Signed NaNs in APFloat arithmetic
FYI, I was looking at the SSE/AVX codegen here: http://llvm.org/bugs/show_bug.cgi?id=20578 If LLVM starts caring about FP exceptions, even this won't be possible. Is there a way of doing an IEEE-754 fneg in C/C++? Ie, there's no fneg() in libm, so any C method we choose could cause an exception, and that's not allowed by the IEEE definition of fneg. On Fri, Aug 8, 2014 at 12:29 PM,
2018 Sep 11
2
[FPEnv] FNEG instruction
+1 for an explicit FNEG instruction, since as previously discussed, it has stricter requirements for what value may be returned by the operation. And strengthening the requirement on FSUB is not feasible when the values are variables rather than literals. That is:
FSUB(-0.0, NaN)  = either NaN *or* -NaN
FSUB(-0.0, -NaN) = either NaN *or* -NaN
FNEG(NaN)  = -NaN
FNEG(-NaN) = NaN
On Tue, Sep 11,
2013 Sep 04
4
PATCH: bugfixes for bitmath.h
A more or less detailed explanation of this patch: 1. The first parameter of _BitScanReverse() and _BitScanReverse64() is a pointer to unsigned long (a 4-byte integer). However, _BitScanReverse64() is called with a pointer to FLAC__uint64 (an 8-byte integer). IMHO this is a bug, and this patch changes the type of the idx variable inside FLAC__bitmath_ilog2_wide() from FLAC__uint64 to unsigned long. The type of idx
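To make the type mismatch concrete, here is a rough sketch of the call in question (assuming the MSVC intrinsic; this mirrors the shape of FLAC__bitmath_ilog2_wide(), not its actual code). The prototype is unsigned char _BitScanReverse64(unsigned long *Index, unsigned __int64 Mask), so the output variable has to be an unsigned long:

    #include <intrin.h>

    /* Sketch only; assumes v != 0, as ilog2 requires. */
    static unsigned ilog2_wide_sketch(unsigned long long v)
    {
        unsigned long idx;            /* must be unsigned long, not FLAC__uint64 */
        _BitScanReverse64(&idx, v);   /* writes the index of the highest set bit */
        return (unsigned)idx;
    }
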
2011 Jul 26
2
[LLVMdev] XOR Optimization
Hi Daniel,
> Hi folks,
>
> I couldn't find a specific XOR (OR and AND) optimization in LLVM, and
> therefore I am about to implement it.
> But first I would like to check with you guys that it really does not exist.
>
> For a simple loop like this:
>
> nbits = 128;
> bit_addr = 0;
> while (nbits--)
> {
>   bindex = bit_addr >> 5; /* Index is
2019 Feb 25
3
Why is there still ineffective code after -o3 optimization?
Hi, I have an IR module from random generation (mostly ineffective instructions). It has a function with a void return and two function arguments, where one is a reference. Therefore, I expect every instruction that does not alter the value at the 2nd argument's address to be ineffective. Here is the function definition (see below for full ll): define void @_Z27entityMainDataInputCallbackdRd(double
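The shape being described (void return, one by-value argument, one by-reference argument) can be pictured with a small hand-written C analogue; the body here is hypothetical and only illustrates why values that never reach a store through the reference are dead:

    /* Hypothetical analogue of void f(double, double&): only writes through
       'out' (or other side effects) are observable, so any computation whose
       result never reaches such a write is dead code. */
    void entityMainDataInputCallback(double in, double *out) {
        double unused = in * 2.0;   /* dead: never stored through 'out' */
        (void)unused;
        *out = in;                  /* the only observable effect */
    }
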
2017 May 11
2
FENV_ACCESS and floating point LibFunc calls
Sounds like the select lowering issue is definitely separate from the FENV work. Is there a bug report with a C or IR example? You want to generate compare and branch instead of a cmov for something like this?

int foo(float x) {
  if (x < 42.0f) return x;
  return 12;
}

define i32 @foo(float %x) {
  %cmp = fcmp olt float %x, 4.200000e+01
  %conv = fptosi float %x to i32
  %ret = select
2017 May 11
3
FENV_ACCESS and floating point LibFunc calls
Thanks, Andy. I'm not sure how to solve that or my case given the DAG's basic-block limit. Probably CodeGenPrepare or SelectionDAGBuilder...or we wait until after isel and try to split it up in a machine instruction pass. I filed my example here: https://bugs.llvm.org/show_bug.cgi?id=33013 Feel free to comment there and/or open a new bug for the FP_TO_UINT case. On Thu, May 11, 2017 at
2015 Sep 30
2
InstCombine wrongful (?) optimization on BinOp with SameOperands
Hi all, I have been looking at the way LLVM optimizes code before forwarding it to the backend I develop for my company, and while building

define i32 @test_extract_subreg_func(i32 %x, i32 %y) #0 {
entry:
  %conv = zext i32 %x to i64
  %conv1 = zext i32 %y to i64
  %mul = mul nuw i64 %conv1, %conv
  %shr = lshr i64 %mul, 32
  %xor = xor i64 %shr, %mul
  %conv2 = trunc i64 %xor to i32
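Translated back into C (an approximate equivalent of the IR above, assuming the function simply returns %conv2), the pattern is the high word of a widening multiply XORed into its low word:

    #include <stdint.h>

    uint32_t test_extract_subreg_func(uint32_t x, uint32_t y) {
        uint64_t mul  = (uint64_t)y * (uint64_t)x;   /* mul nuw i64           */
        uint64_t xorv = (mul >> 32) ^ mul;           /* lshr i64 ... 32 ; xor */
        return (uint32_t)xorv;                       /* trunc i64 ... to i32  */
    }
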
2012 Jun 27
1
Failed win32 build
Hello,
While building from git, I'm getting the following failure (help please):

CC ogg_mapping.lo
CCLD libFLAC.la
Creating library file: .libs/libFLAC.dll.a
.libs/bitreader.o: In function `FLAC__clz_soft_uint32':
./include/private/bitmath.h:46: multiple definition of `_FLAC__clz_soft_uint32'
.libs/bitmath.o:./include/private/bitmath.h:46: first defined here
.libs/fixed.o:
2017 Apr 21
2
[cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long
I think it’s generally true that whenever branches can reliably be predicted, branching is faster than a cmov that involves speculative execution, and I would guess that your assessment regarding looping on input values is probably correct. I believe the code that actually creates most of the transformation you’re interested in here is in SelectionDAGLegalize::ExpandNode() in LegalizeDAG.cpp. The
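For context, the expansion in question has roughly the following shape (a simplified C sketch, not the actual LegalizeDAG output). When the two arms feed a select/cmov instead of a branch, the subtraction on the not-taken path is still evaluated and can raise FE_INEXACT even though the conversion that is actually used is exact:

    #include <stdint.h>

    /* Rough shape of float -> uint64_t on a target with only a signed convert. */
    static uint64_t fptoui64_sketch(float x) {
        const float two63 = 9223372036854775808.0f;        /* 2^63 */
        uint64_t small = (uint64_t)(int64_t)x;              /* intended for x <  2^63 */
        uint64_t big   = (uint64_t)(int64_t)(x - two63)     /* intended for x >= 2^63 */
                       + 0x8000000000000000ULL;
        /* With a cmov/select both 'small' and 'big' are computed; x - two63 can be
           inexact for small x, spuriously setting FE_INEXACT. A compare-and-branch
           version would evaluate only the needed arm. */
        return (x < two63) ? small : big;
    }
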
2016 Aug 04
2
Remove zext-unfolding from InstCombine
Hi Sanjay,
> On 02.08.2016 at 21:39, Sanjay Patel <spatel at rotateright.com> wrote:
>
> Hi Matthias -
>
> Sorry for the delayed reply. I think you're on the right path with D22864.
No problem, thank you for your answer!
> If I'm understanding it correctly, my foo() example and zext_or_icmp_icmp() will be equivalent after your patch is added to InstCombine.
2016 Jul 20
2
Hitting assertion failure related to vectorization + instcombine
Thanks for notifying me. Yes, this was a recent change. Taking a look now. On Wed, Jul 20, 2016 at 1:49 PM, Michael Kuperstein <mkuper at google.com> wrote: > +Sanjay, who touched this last. :-) > > On Wed, Jul 20, 2016 at 12:44 PM, Ismail Badawi (ibadawi) via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> Hi folks, >> >> I'm hitting the
2016 Jul 22
2
Hitting assertion failure related to vectorization + instcombine
Sanjay: let me know if this is something that will apply to 3.9. Thanks, Hans On Wed, Jul 20, 2016 at 5:59 PM, Sanjay Patel via llvm-dev <llvm-dev at lists.llvm.org> wrote: > Quick update - the bug existed before I refactored that chunk in > InstSimplify with: > https://reviews.llvm.org/rL275911 > > In fact, as discussed in https://reviews.llvm.org/D22537 - because we have a
2011 Jul 26
2
[LLVMdev] XOR Optimization
Hi- > I haven't seen a machine in which OR is faster than ADD nor more energy-efficient. They're all done by the same ALU circuitry, which delays the pipeline by its worst-case path timing. So, for modern processor hardware purposes, OR is exactly equal to ADD. Transforming ADD to OR isn't strength reduction at all. Maybe this is beneficial only if you have a backend generating circuitry
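For reference, the transform being debated is only valid when the two operands have no set bits in common, so no carries can occur and addition equals OR; a small illustration with hypothetical field-packing code:

    #include <stdint.h>

    /* When (a & b) == 0 there are no carries, so a + b == (a | b). */
    uint32_t pack_fields(uint32_t x, uint32_t y) {
        uint32_t hi = x << 4;    /* occupies bits 4 and up */
        uint32_t lo = y & 0xFu;  /* occupies bits 0..3     */
        return hi + lo;          /* an optimizer may rewrite this as hi | lo */
    }
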
2016 Jul 25
2
Hitting assertion failure related to vectorization + instcombine
Sure. David, what do you think about merging this to 3.9? Sanjay: are you saying I'd just apply that diff to InstructionSimplify.cpp, not InstCombineSelect.cpp? On Fri, Jul 22, 2016 at 7:08 AM, Sanjay Patel <spatel at rotateright.com> wrote: > Hi Hans - > > Yes, I think this is a good patch for 3.9 (cc'ing David Majnemer as code > owner). The functional change was