similar to: [LLVMdev] SimplifyDemandedUseBits vs (and (xor %V, -1), 4096)

Displaying 20 results from an estimated 30000 matches similar to: "[LLVMdev] SimplifyDemandedUseBits vs (and (xor %V, -1), 4096)"

2011 Jul 27
0
[LLVMdev] XOR Optimization
2011/7/26 Daniel Nicácio <dnicacios at gmail.com>: > > I also would like to see why the "XOR  A,  -1" is not turned into a NOT, any > Probably because NOT (like NEG) doesn't exist :) <http://llvm.org/docs/LangRef.html#instref> I assume the decision was made that it wasn't worth adding the extra unary instructions when they can easily be handled in codegen
2011 Jul 27
2
[LLVMdev] XOR Optimization
After a few more tests, I found out that if we set -unroll-threshold to a value large enough, and run "opt -std-compile-opts" or "opt -O3" 3 times, the unroll will be able to unroll the original loop 32 times, and when you have it unrolled for at least 32 times a optimization is triggered, folding it to a single "%xor.3.3.1 = xor i32 %tmp6, -1" (dont know why it does
2004 Sep 10
3
patch
So here is quick patch solving the problem, now it should be PIC. -- Miroslav Lichvar lichvarm@phoenix.inf.upol.cz -------------- next part -------------- --- lpc_asm.nasm.orig Wed Jul 18 02:23:40 2001 +++ lpc_asm.nasm Sat Nov 17 21:09:46 2001 @@ -59,10 +59,10 @@ ; ALIGN 16 cident FLAC__lpc_compute_autocorrelation_asm_ia32 - ;[esp + 24] == autoc[] - ;[esp + 20] == lag - ;[esp + 16] ==
2008 Aug 08
0
[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR
On Aug 7, 2008, at 12:13 PM, David Greene wrote: > On Tuesday 05 August 2008 13:27, David Greene wrote: > >> Neither solution eliminates the need for instcombine to be careful >> and >> consult masks from time to time. >> >> Perhaps I'm totally missing something. Concrete examples would be >> helpful. > > Ok, so I took my own advice and
2004 Sep 10
2
An assembly optimization and fix
I have optimized FLAC__fixed_compute_best_predictor_asm_ia32_mmx_cmov function and fixed bug when data_len == 0. Now the function is about 50% faster and flac -5 is about 5% faster on my box. I have tested it thoroughly, I think it can go to flac 1.0.4. -- Miroslav Lichvar -------------- next part -------------- --- src/libFLAC/ia32/fixed_asm.nasm.orig 2002-01-26 19:05:12.000000000 +0100 +++
2016 Sep 13
2
undef * 0
Thanks for your answers. Another example of unsound transformation on Boolean algebra. According to the LLVM documentation (http://llvm.org/docs/LangRef.html#undefined-values) it is unsafe to consider ' a & undef = undef ' and ' a | undef = undef ' but 'undef xor undef = undef' is safe. Now, given an expression ((a & (~b)) | ((~a) & b)) where a and b are
2011 Jul 26
2
[LLVMdev] XOR optimization
Hi folks, I couldn't find a specific XOR (OR and AND) optimization on llvm, and therefore I am about to implement it. But first I would like to check with you guys that it really does not exist. For a simple loop like this: nbits = 128; bit_addr = 0; while(nbits--) { bindex=bit_addr>>5; /* Index is number /32 */ bitnumb=bit_addr % 32; /* Bit number in longword */
2011 Jul 26
0
[LLVMdev] XOR Optimization
Hi, On Tue, Jul 26, 2011 at 11:32 AM, Matt Johnson <johnso87 at crhc.illinois.edu>wrote: > Hi Daniel, > > > Hi folks, > > > > I couldn't find a specific XOR (OR and AND) optimization on llvm, and > > therefore I am about to implement it. > > But first I would like to check with you guys that it really does not > exist. > > > > For a
2011 Jul 26
0
[LLVMdev] XOR Optimization
Hi Duncan, when I run "opt -std-compile-opts" on the original source code it has the same output of O3. when I run "opt -std-compile-opts" on the -O3 optimized code, things get even more weird, it outputs the following code: while.body: ; preds = %while.body, %entry %indvar = phi i32 [ 0, %entry ], [ %indvar.next.3, %while.body ] %tmp
2012 Oct 22
2
bitwise XOR of Matrix
Hi, I would like to xor (bitwise) two matrices filled with binary values (0,1). The result of such XOR is expected to be 0,1. But apparently neither of xor nor bitXor is working in this case. I got ": binary operation on non-conformable arrays" error message when I used xor (M1,M2) . The problem with bitXor(M1,M2) is that it just truncates the result into a vector rather than a
2011 Jul 26
2
[LLVMdev] XOR Optimization
Hi Daniel, > Hi folks, > > I couldn't find a specific XOR (OR and AND) optimization on llvm, and > therefore I am about to implement it. > But first I would like to check with you guys that it really does not exist. > > For a simple loop like this: > > nbits = 128; > bit_addr = 0; > while(nbits--) > { > bindex=bit_addr>>5; /* Index is
2010 Jul 06
2
[LLVMdev] ConstantFold 'undef xor undef'
Hi, At line 2292, lib/VMCore/ConstantFold.cpp (llvm2.7 release) Constant *llvm::ConstantFoldBinaryInstruction(unsigned Opcode, Constant *C1, Constant *C2) { ... // Handle UndefValue up front. if (isa<UndefValue>(C1) || isa<UndefValue>(C2)) { switch (Opcode) { case Instruction::Xor: if (isa<UndefValue>(C1)
2008 Aug 07
6
[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR
On Tuesday 05 August 2008 13:27, David Greene wrote: > Neither solution eliminates the need for instcombine to be careful and > consult masks from time to time. > > Perhaps I'm totally missing something. Concrete examples would be helpful. Ok, so I took my own advice and thought about CSE and instcombine a bit. I wrote the code by hand in a sort of pseudo-llvm language, so
2012 Sep 26
0
[LLVMdev] Folding nodes with more than one use during ISel
I'm working on a backend for the Freescale CPU12 family as a hobby project and I'm having difficulty getting the instruction selection pass to handle the indirect indexed addressing modes. I'd really appreciate advice on how best to handle them. The following llvm instructions: %arrayidx = getelementptr inbounds i8** %p, i16 3 %0 = load i8** %arrayidx, align 2, !tbaa !0 %1 =
2011 Jul 28
1
[LLVMdev] XOR Optimization
Hey guys, I still think there is no optimization doing what I want. When the loop is unrolled 32 times, llvm is able to identify that the loop is working on a whole word, it finds some constants and propagate them, resulting in the folded XOR instruction. However, when the loop operates on some bits of the word, llvm is still not able to fold those XOR, even when the operated bits does not
2011 Jul 26
0
[LLVMdev] XOR Optimization
"The fact that the loop is unrolled explains why the XORs, SHLs, and ORs are not folded into 1." I dont see why the unrolling explains it. "I think he is trying to say this expression generated by unrolling by a factor of 4 can indeed be folded into a single XOR, SHL and OR. " Precisely. The code generated by unrolling can be folded into a single XOR and SHL. And even if it
2019 Mar 04
2
Where's the optimiser gone (part 11): use the proper instruction for sign extension
Compile with -O3 -m32 (see <https://godbolt.org/z/yCpBpM>): long lsign(long x) { return (x > 0) - (x < 0); } long long llsign(long long x) { return (x > 0) - (x < 0); } While the code generated for the "long" version of this function is quite OK, the code for the "long long" version misses an obvious optimisation: lsign: # @lsign mov
2019 Oct 03
2
[cfe-dev] CFG simplification question, and preservation of branching in the original code
Hi all, > On 2 Oct 2019, at 14:34, Sanjay Patel <spatel at rotateright.com> wrote > Providing target options/overrides to code that is supposed to be target-independent sounds self-defeating to me. I doubt that proposal would gain much support. > Of course, if you're customizing LLVM for your own out-of-trunk backend, you can do anything you'd like if you're willing to
2016 Jul 28
1
Hitting assertion failure related to vectorization + instcombine
LGTM On Wednesday, July 27, 2016, Hans Wennborg <hans at chromium.org> wrote: > David, Sanjay: ping? > > On Mon, Jul 25, 2016 at 11:07 AM, Hans Wennborg <hans at chromium.org > <javascript:;>> wrote: > > Sure. David, what do you think about merging this to 3.9? > > > > Sanjay: are you saying I'd just apply that diff to > >
2016 Aug 04
2
Remove zext-unfolding from InstCombine
Hi Sanjay, > Am 02.08.2016 um 21:39 schrieb Sanjay Patel <spatel at rotateright.com>: > > Hi Matthias - > > Sorry for the delayed reply. I think you're on the right path with D22864. No problem, thank you for your answer! > If I'm understanding it correctly, my foo() example and zext_or_icmp_icmp() will be equivalent after your patch is added to InstCombine.