similar to: Implementing a proposed InstCombine optimization

Displaying 20 results from an estimated 8000 matches similar to: "Implementing a proposed InstCombine optimization"

2016 Apr 07
7
Implementing a proposed InstCombine optimization
I am not entirely sure this is safe. Transforming this to an fsub could change the value stored on platforms that implement negates using arithmetic instead of with bitmath (such as ours) and either canonicalize NaNs or don’t support denormals. This is actually important because this kind of bitmath on floats is very commonly used as part of algorithms for complex math functions that need to get
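As background for the concern above, a minimal C sketch of the two negation strategies being contrasted (the function names are illustrative, not code from the message):

    #include <stdint.h>
    #include <string.h>

    /* Pure bitmath negate: flip the IEEE-754 sign bit without touching the FP unit.
       NaN payloads and denormal values pass through unchanged. */
    static float negate_bitmath(float x) {
        uint32_t bits;
        memcpy(&bits, &x, sizeof bits);
        bits ^= 0x80000000u;
        memcpy(&x, &bits, sizeof bits);
        return x;
    }

    /* Arithmetic negate, modelling an fsub(-0.0, x) lowering. On hardware that
       canonicalizes NaNs or flushes denormals, the stored result bits can differ
       from negate_bitmath(x) for the same input. */
    static float negate_arith(float x) {
        return -0.0f - x;
    }
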
2016 Apr 09
2
Implementing a proposed InstCombine optimization
It’s definitely one that would need some target hooks, and is probably not actually worth doing without analysing the producers and consumers of the value. If the source and destination values need to be in floating point registers, the cost of FPR<->GPR moves is likely to be a lot higher than the cost of the subtract, even if the xor is free. If the results are going to end up in integer
2016 Apr 11
4
Implementing a proposed InstCombine optimization
> I am not entirely sure this is safe. Transforming this to an fsub could change the value stored on platforms that implement negates using arithmetic instead of with bitmath (such as ours)

I think it's probably safe for IEEE754-2008 conformant platforms because negation was clarified to be a non-arithmetic bit flip that cannot cause exceptions in that specification. However, I'm sure
2016 Apr 12
2
Implementing a proposed InstCombine optimization
Good point. The same argument seems to apply to copy() too, so I suppose it depends how strict we want to be about it.
2018 Sep 11
2
[FPEnv] FNEG instruction
Which exactly was the plan? Add a new, regular instruction? Add a new constrained math intrinsic? Both? Andrew Kaylor made a good point here: * As I said, all LLVM IR FP instructions are //assumed// to have no side effects. I'm not sure we want an instruction that goes beyond this to be //defined// as having no side effects. It adds a complication to the language and introduces
2014 Aug 08
2
[LLVMdev] Signed NaNs in APFloat arithmetic
FYI, I was looking at the SSE/AVX codegen here: http://llvm.org/bugs/show_bug.cgi?id=20578 If LLVM starts caring about FP exceptions, even this won't be possible. Is there a way of doing an IEEE-754 fneg in C/C++? Ie, there's no fneg() in libm, so any C method we choose could cause an exception, and that's not allowed by the IEEE definition of fneg. On Fri, Aug 8, 2014 at 12:29 PM,
2018 Sep 11
2
[FPEnv] FNEG instruction
+1 for an explicit FNEG instruction, since as previously discussed, it has stricter requirements for what value may be returned by the operation. And strengthening the requirement on FSUB is not feasible when the values are variables rather than literals. That is:
FSUB(-0.0, NaN)  = either NaN *or* -NaN
FSUB(-0.0, -NaN) = either NaN *or* -NaN
FNEG(NaN)  = -NaN
FNEG(-NaN) = NaN
On Tue, Sep 11,
2013 Sep 04
4
PATCH: bugfixes for bitmath.h
A more or less detailed explanation of this patch: 1. The first parameter of _BitScanReverse() and _BitScanReverse64() is a pointer to unsigned long (a 4-byte integer). However, _BitScanReverse64() is called with a pointer to FLAC__uint64 (an 8-byte integer). IMHO this is a bug, and this patch changes the type of the idx variable inside FLAC__bitmath_ilog2_wide() from FLAC__uint64 to unsigned long. The type of idx
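To make the type mismatch concrete, here is a rough sketch of the call in question (assuming the MSVC intrinsic; this mirrors the shape of FLAC__bitmath_ilog2_wide(), not its actual code). The prototype is unsigned char _BitScanReverse64(unsigned long *Index, unsigned __int64 Mask), so the output variable has to be an unsigned long:

    #include <intrin.h>

    /* Sketch only; assumes v != 0, as ilog2 requires. */
    static unsigned ilog2_wide_sketch(unsigned long long v)
    {
        unsigned long idx;            /* must be unsigned long, not FLAC__uint64 */
        _BitScanReverse64(&idx, v);   /* writes the index of the highest set bit */
        return (unsigned)idx;
    }
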
2011 Jul 26
2
[LLVMdev] XOR Optimization
Hi Daniel,
> Hi folks,
>
> I couldn't find a specific XOR (OR and AND) optimization in LLVM, and
> therefore I am about to implement it.
> But first I would like to check with you guys that it really does not exist.
>
> For a simple loop like this:
>
> nbits = 128;
> bit_addr = 0;
> while (nbits--)
> {
>   bindex = bit_addr >> 5; /* Index is
2019 Feb 25
3
Why is there still ineffective code after -o3 optimization?
Hi, I have an IR module from random generation (mostly ineffective instructions). It has a function with a void return and two function arguments, where one is a reference. Therefore, I expect every instruction that does not alter the value at the 2nd argument's address to be ineffective. Here is the function definition (see below for full ll): define void @_Z27entityMainDataInputCallbackdRd(double
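The shape being described (void return, one by-value argument, one by-reference argument) can be pictured with a small hand-written C analogue; the body here is hypothetical and only illustrates why values that never reach a store through the reference are dead:

    /* Hypothetical analogue of void f(double, double&): only writes through
       'out' (or other side effects) are observable, so any computation whose
       result never reaches such a write is dead code. */
    void entityMainDataInputCallback(double in, double *out) {
        double unused = in * 2.0;   /* dead: never stored through 'out' */
        (void)unused;
        *out = in;                  /* the only observable effect */
    }
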
2017 May 11
2
FENV_ACCESS and floating point LibFunc calls
Sounds like the select lowering issue is definitely separate from the FENV work. Is there a bug report with a C or IR example? You want to generate compare and branch instead of a cmov for something like this?

int foo(float x) {
  if (x < 42.0f) return x;
  return 12;
}

define i32 @foo(float %x) {
  %cmp = fcmp olt float %x, 4.200000e+01
  %conv = fptosi float %x to i32
  %ret = select
2017 May 11
3
FENV_ACCESS and floating point LibFunc calls
Thanks, Andy. I'm not sure how to solve that or my case given the DAG's basic-block limit. Probably CodeGenPrepare or SelectionDAGBuilder...or we wait until after isel and try to split it up in a machine instruction pass. I filed my example here: https://bugs.llvm.org/show_bug.cgi?id=33013 Feel free to comment there and/or open a new bug for the FP_TO_UINT case. On Thu, May 11, 2017 at
2015 Sep 30
2
InstCombine wrongful (?) optimization on BinOp with SameOperands
Hi all, I have been looking at the way LLVM optimizes code before forwarding it to the backend I develop for my company, and while building

define i32 @test_extract_subreg_func(i32 %x, i32 %y) #0 {
entry:
  %conv = zext i32 %x to i64
  %conv1 = zext i32 %y to i64
  %mul = mul nuw i64 %conv1, %conv
  %shr = lshr i64 %mul, 32
  %xor = xor i64 %shr, %mul
  %conv2 = trunc i64 %xor to i32
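Translated back into C (an approximate equivalent of the IR above, assuming the function simply returns %conv2), the pattern is the high word of a widening multiply XORed into its low word:

    #include <stdint.h>

    uint32_t test_extract_subreg_func(uint32_t x, uint32_t y) {
        uint64_t mul  = (uint64_t)y * (uint64_t)x;   /* mul nuw i64           */
        uint64_t xorv = (mul >> 32) ^ mul;           /* lshr i64 ... 32 ; xor */
        return (uint32_t)xorv;                       /* trunc i64 ... to i32  */
    }
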
2012 Jun 27
1
Failed win32 build
Hello,
While building from git, I'm getting the following failure (help please):

CC ogg_mapping.lo
CCLD libFLAC.la
Creating library file: .libs/libFLAC.dll.a
.libs/bitreader.o: In function `FLAC__clz_soft_uint32':
./include/private/bitmath.h:46: multiple definition of `_FLAC__clz_soft_uint32'
.libs/bitmath.o:./include/private/bitmath.h:46: first defined here
.libs/fixed.o:
2017 Apr 21
2
[cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long
I think it’s generally true that whenever branches can reliably be predicted, branching is faster than a cmov that involves speculative execution, and I would guess that your assessment regarding looping on input values is probably correct. I believe the code that actually creates most of the transformation you’re interested in here is in SelectionDAGLegalize::ExpandNode() in LegalizeDAG.cpp. The
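For context, the expansion in question has roughly the following shape (a simplified C sketch, not the actual LegalizeDAG output). When the two arms feed a select/cmov instead of a branch, the subtraction on the not-taken path is still evaluated and can raise FE_INEXACT even though the conversion that is actually used is exact:

    #include <stdint.h>

    /* Rough shape of float -> uint64_t on a target with only a signed convert. */
    static uint64_t fptoui64_sketch(float x) {
        const float two63 = 9223372036854775808.0f;        /* 2^63 */
        uint64_t small = (uint64_t)(int64_t)x;              /* intended for x <  2^63 */
        uint64_t big   = (uint64_t)(int64_t)(x - two63)     /* intended for x >= 2^63 */
                       + 0x8000000000000000ULL;
        /* With a cmov/select both 'small' and 'big' are computed; x - two63 can be
           inexact for small x, spuriously setting FE_INEXACT. A compare-and-branch
           version would evaluate only the needed arm. */
        return (x < two63) ? small : big;
    }
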
2016 Aug 04
2
Remove zext-unfolding from InstCombine
Hi Sanjay,
> On 02.08.2016 at 21:39, Sanjay Patel <spatel at rotateright.com> wrote:
>
> Hi Matthias -
>
> Sorry for the delayed reply. I think you're on the right path with D22864.
No problem, thank you for your answer!
> If I'm understanding it correctly, my foo() example and zext_or_icmp_icmp() will be equivalent after your patch is added to InstCombine.
2016 Jul 20
2
Hitting assertion failure related to vectorization + instcombine
Thanks for notifying me. Yes, this was a recent change. Taking a look now. On Wed, Jul 20, 2016 at 1:49 PM, Michael Kuperstein <mkuper at google.com> wrote: > +Sanjay, who touched this last. :-) > > On Wed, Jul 20, 2016 at 12:44 PM, Ismail Badawi (ibadawi) via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> Hi folks, >> >> I'm hitting the
2016 Jul 22
2
Hitting assertion failure related to vectorization + instcombine
Sanjay: let me know if this is something that will apply to 3.9. Thanks, Hans On Wed, Jul 20, 2016 at 5:59 PM, Sanjay Patel via llvm-dev <llvm-dev at lists.llvm.org> wrote: > Quick update - the bug existed before I refactored that chunk in > InstSimplify with: > https://reviews.llvm.org/rL275911 > > In fact, as discussed in https://reviews.llvm.org/D22537 - because we have a
2011 Jul 26
2
[LLVMdev] XOR Optimization
Hi- > I haven't seen a machine in which OR is faster than ADD nor more energy-efficient. They're all done by the same ALU circuitry, which delays the pipeline by its worst-case path timing. So, for modern processor hardware purposes, OR is exactly equal to ADD. Transforming ADD to OR isn't strength reduction at all. Maybe this is beneficial only if you have a backend generating circuitry
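For reference, the transform being debated is only valid when the two operands have no set bits in common, so no carries can occur and addition equals OR; a small illustration with hypothetical field-packing code:

    #include <stdint.h>

    /* When (a & b) == 0 there are no carries, so a + b == (a | b). */
    uint32_t pack_fields(uint32_t x, uint32_t y) {
        uint32_t hi = x << 4;    /* occupies bits 4 and up */
        uint32_t lo = y & 0xFu;  /* occupies bits 0..3     */
        return hi + lo;          /* an optimizer may rewrite this as hi | lo */
    }
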
2016 Jul 25
2
Hitting assertion failure related to vectorization + instcombine
Sure. David, what do you think about merging this to 3.9? Sanjay: are you saying I'd just apply that diff to InstructionSimplify.cpp, not InstCombineSelect.cpp? On Fri, Jul 22, 2016 at 7:08 AM, Sanjay Patel <spatel at rotateright.com> wrote: > Hi Hans - > > Yes, I think this is a good patch for 3.9 (cc'ing David Majnemer as code > owner). The functional change was