thr3ads.net - similar to: "[LLVMdev] SimplifyDemandedUseBits vs (and (xor %V, -1), 4096)"

Displaying 20 results from an estimated 30000 matches similar to: "[LLVMdev] SimplifyDemandedUseBits vs (and (xor %V, -1), 4096)"

[LLVMdev] XOR Optimization

2011 Jul 27

2011/7/26 Daniel Nicácio <dnicacios at gmail.com>:
>
> I also would like to see why the "XOR A, -1" is not turned into a NOT, any
>

Probably because NOT (like NEG) doesn't exist :) <http://llvm.org/docs/LangRef.html#instref>

I assume the decision was made that it wasn't worth adding the extra unary instructions when they can easily be handled in codegen
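To illustrate the reply above: LLVM IR has no dedicated integer NOT or NEG instruction, so a bitwise NOT is written as an XOR with all-ones and a negation as a subtraction from zero. The Python sketch below models i32 values as unsigned 32-bit integers. The MASK32 constant and both function names are assumptions made for illustration, not LLVM APIs.

```python
MASK32 = 0xFFFFFFFF  # assumption: i32 values modelled as unsigned 32-bit ints


def not_via_xor(x: int) -> int:
    """Bitwise NOT expressed as 'xor x, -1' (-1 is all-ones in two's complement)."""
    return (x ^ MASK32) & MASK32


def neg_via_sub(x: int) -> int:
    """Integer negation expressed as 'sub 0, x', wrapping modulo 2**32."""
    return (0 - x) & MASK32


if __name__ == "__main__":
    # The XOR form agrees with Python's own ~ operator once truncated to 32 bits.
    for v in (0, 1, 4096, 0xDEADBEEF):
        assert not_via_xor(v) == (~v) & MASK32
    print(hex(not_via_xor(4096)))
```

Because both operations reduce to an ordinary binary instruction with a constant operand, a backend can still select a native NOT or NEG instruction during code generation where the target has one.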

[LLVMdev] XOR Optimization

2011 Jul 27

After a few more tests, I found out that if we set -unroll-threshold to a large enough value and run "opt -std-compile-opts" or "opt -O3" 3 times, the unroller is able to unroll the original loop 32 times. Once the loop is unrolled at least 32 times, an optimization is triggered that folds it to a single "%xor.3.3.1 = xor i32 %tmp6, -1" (don't know why it does
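The fold reported here is consistent with a simple identity: flipping each of the 32 bits of a word one at a time gives the same result as a single XOR with -1. The sketch below checks that in Python. The loop body is an assumption (the original loop is truncated in this listing), and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit words modelled as unsigned ints


def flip_bits_one_by_one(word: int, nbits: int = 32) -> int:
    """Toggle bits 0..nbits-1 individually, as a fully unrolled loop would."""
    for bit in range(nbits):
        word ^= 1 << bit
    return word & MASK32


def flip_bits_folded(word: int, nbits: int = 32) -> int:
    """Same effect with a single XOR against a mask of nbits ones."""
    return (word ^ ((1 << nbits) - 1)) & MASK32


if __name__ == "__main__":
    w = 0x12345678
    assert flip_bits_one_by_one(w) == flip_bits_folded(w)
    print(hex(flip_bits_folded(w)))
```

With nbits smaller than 32, the folded mask covers only the low bits of the word, so the single XOR uses a partial mask instead of -1.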

patch

2004 Sep 10

So here is a quick patch solving the problem, now it should be PIC.

--
Miroslav Lichvar
lichvarm@phoenix.inf.upol.cz
-------------- next part --------------
--- lpc_asm.nasm.orig Wed Jul 18 02:23:40 2001
+++ lpc_asm.nasm Sat Nov 17 21:09:46 2001
@@ -59,10 +59,10 @@
 ;
 ALIGN 16
 cident FLAC__lpc_compute_autocorrelation_asm_ia32
- ;[esp + 24] == autoc[]
- ;[esp + 20] == lag
- ;[esp + 16] ==

[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

2008 Aug 08

On Aug 7, 2008, at 12:13 PM, David Greene wrote:
> On Tuesday 05 August 2008 13:27, David Greene wrote:
>
>> Neither solution eliminates the need for instcombine to be careful and
>> consult masks from time to time.
>>
>> Perhaps I'm totally missing something. Concrete examples would be helpful.
>
> Ok, so I took my own advice and

An assembly optimization and fix

2004 Sep 10

I have optimized the FLAC__fixed_compute_best_predictor_asm_ia32_mmx_cmov function and fixed a bug when data_len == 0. Now the function is about 50% faster and flac -5 is about 5% faster on my box. I have tested it thoroughly, and I think it can go into flac 1.0.4.

--
Miroslav Lichvar
-------------- next part --------------
--- src/libFLAC/ia32/fixed_asm.nasm.orig 2002-01-26 19:05:12.000000000 +0100
+++

undef * 0

2016 Sep 13

Thanks for your answers. Here is another example of an unsound transformation in Boolean algebra. According to the LLVM documentation (http://llvm.org/docs/LangRef.html#undefined-values), it is unsafe to consider 'a & undef = undef' and 'a | undef = undef', but 'undef xor undef = undef' is safe. Now, given an expression ((a & (~b)) | ((~a) & b)) where a and b are
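For fully defined inputs, the expression in this message is the standard AND/OR expansion of XOR. The sketch below checks that exhaustively on small bit widths. It deliberately does not model undef, which is the point under discussion in the thread, and the function names and default width are assumptions.

```python
def xor_expanded(a: int, b: int, width: int = 4) -> int:
    """Evaluate (a & ~b) | (~a & b), truncated to `width` bits."""
    mask = (1 << width) - 1
    return ((a & ~b) | (~a & b)) & mask


def expansion_matches_xor(width: int = 4) -> bool:
    """Compare the expansion with a ^ b for every pair of `width`-bit values."""
    limit = 1 << width
    return all(
        xor_expanded(a, b, width) == a ^ b
        for a in range(limit)
        for b in range(limit)
    )


if __name__ == "__main__":
    print(expansion_matches_xor())
```

The check only covers concrete values. Whether the rewrite stays sound when an operand is undef depends on the LangRef rules quoted in the message, which this sketch does not attempt to encode.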

[LLVMdev] XOR optimization

2011 Jul 26

Hi folks,

I couldn't find a specific XOR (OR and AND) optimization on llvm, and therefore I am about to implement it. But first I would like to check with you guys that it really does not exist.

For a simple loop like this:

nbits = 128;
bit_addr = 0;
while(nbits--)
{
    bindex=bit_addr>>5; /* Index is number /32 */
    bitnumb=bit_addr % 32; /* Bit number in longword */

[LLVMdev] XOR Optimization

2011 Jul 26

Hi,

On Tue, Jul 26, 2011 at 11:32 AM, Matt Johnson <johnso87 at crhc.illinois.edu> wrote:
> Hi Daniel,
>
> > Hi folks,
> >
> > I couldn't find a specific XOR (OR and AND) optimization on llvm, and
> > therefore I am about to implement it.
> > But first I would like to check with you guys that it really does not exist.
> >
> > For a

[LLVMdev] XOR Optimization

2011 Jul 26

Hi Duncan,

When I run "opt -std-compile-opts" on the original source code, it produces the same output as -O3. When I run "opt -std-compile-opts" on the -O3 optimized code, things get even weirder; it outputs the following code:

while.body: ; preds = %while.body, %entry
  %indvar = phi i32 [ 0, %entry ], [ %indvar.next.3, %while.body ]
  %tmp

[merged mm-nonmm-stable] crypto-arm-xor-add-missing-module_description-macro.patch removed from -mm tree

2024 Sep 02

The quilt patch titled
     Subject: crypto: arm/xor - add missing MODULE_DESCRIPTION() macro
has been removed from the -mm tree. Its filename was
     crypto-arm-xor-add-missing-module_description-macro.patch

This patch was dropped because it was merged into the mm-nonmm-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

+ crypto-arm-xor-add-missing-module_description-macro.patch added to mm-nonmm-unstable branch

2024 Jul 30

The patch titled
     Subject: crypto: arm/xor - add missing MODULE_DESCRIPTION() macro
has been added to the -mm mm-nonmm-unstable branch. Its filename is
     crypto-arm-xor-add-missing-module_description-macro.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/crypto-arm-xor-add-missing-module_description-macro.patch

This

bitwise XOR of Matrix

2012 Oct 22

Hi, I would like to XOR (bitwise) two matrices filled with binary values (0,1). The result of such an XOR is expected to be 0 or 1, but apparently neither xor nor bitXor works in this case. I got a "binary operation on non-conformable arrays" error message when I used xor(M1,M2). The problem with bitXor(M1,M2) is that it just truncates the result into a vector rather than a

[LLVMdev] XOR Optimization

2011 Jul 26

Hi Daniel,

> Hi folks,
>
> I couldn't find a specific XOR (OR and AND) optimization on llvm, and
> therefore I am about to implement it.
> But first I would like to check with you guys that it really does not exist.
>
> For a simple loop like this:
>
> nbits = 128;
> bit_addr = 0;
> while(nbits--)
> {
> bindex=bit_addr>>5; /* Index is

[LLVMdev] ConstantFold 'undef xor undef'

2010 Jul 06

Hi,

At line 2292, lib/VMCore/ConstantFold.cpp (llvm2.7 release)

Constant *llvm::ConstantFoldBinaryInstruction(unsigned Opcode,
                                              Constant *C1, Constant *C2) {
  ...
  // Handle UndefValue up front.
  if (isa<UndefValue>(C1) || isa<UndefValue>(C2)) {
    switch (Opcode) {
    case Instruction::Xor:
      if (isa<UndefValue>(C1)

[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

2008 Aug 07

On Tuesday 05 August 2008 13:27, David Greene wrote:
> Neither solution eliminates the need for instcombine to be careful and
> consult masks from time to time.
>
> Perhaps I'm totally missing something. Concrete examples would be helpful.

Ok, so I took my own advice and thought about CSE and instcombine a bit. I wrote the code by hand in a sort of pseudo-llvm language, so

[LLVMdev] Folding nodes with more than one use during ISel

2012 Sep 26

I'm working on a backend for the Freescale CPU12 family as a hobby project and I'm having difficulty getting the instruction selection pass to handle the indirect indexed addressing modes. I'd really appreciate advice on how best to handle them.

The following llvm instructions:

  %arrayidx = getelementptr inbounds i8** %p, i16 3
  %0 = load i8** %arrayidx, align 2, !tbaa !0
  %1 =

[LLVMdev] XOR Optimization

2011 Jul 28

Hey guys, I still think there is no optimization doing what I want. When the loop is unrolled 32 times, llvm is able to identify that the loop is working on a whole word; it finds some constants and propagates them, resulting in the folded XOR instruction. However, when the loop operates on only some bits of the word, llvm is still not able to fold those XORs, even when the operated bits do not

[LLVMdev] XOR Optimization

2011 Jul 26

"The fact that the loop is unrolled explains why the XORs, SHLs, and ORs are not folded into 1."

I don't see why the unrolling explains it.

"I think he is trying to say this expression generated by unrolling by a factor of 4 can indeed be folded into a single XOR, SHL and OR."

Precisely. The code generated by unrolling can be folded into a single XOR and SHL. And even if it

[cfe-dev] CFG simplification question, and preservation of branching in the original code

2019 Oct 03

Hi all,

> On 2 Oct 2019, at 14:34, Sanjay Patel <spatel at rotateright.com> wrote
> Providing target options/overrides to code that is supposed to be target-independent sounds self-defeating to me. I doubt that proposal would gain much support.
> Of course, if you're customizing LLVM for your own out-of-trunk backend, you can do anything you'd like if you're willing to

Where's the optimiser gone (part 11): use the proper instruction for sign extension

2019 Mar 04

Compile with -O3 -m32 (see <https://godbolt.org/z/yCpBpM>):

long lsign(long x) {
    return (x > 0) - (x < 0);
}

long long llsign(long long x) {
    return (x > 0) - (x < 0);
}

While the code generated for the "long" version of this function is quite OK, the code for the "long long" version misses an obvious optimisation:

lsign: # @lsign
    mov
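The C expression (x > 0) - (x < 0) used in both functions is a branch-free sign computation: each comparison yields 0 or 1, so the difference is -1, 0, or 1. A minimal Python sketch of the same arithmetic follows. The function name is an assumption, and the sketch says nothing about the code a C compiler generates for it.

```python
def sign(x: int) -> int:
    """Return -1, 0, or 1 using the same comparison arithmetic as the C code."""
    # Each comparison is converted to 0 or 1 before subtracting.
    return int(x > 0) - int(x < 0)


if __name__ == "__main__":
    for value in (-7, 0, 5):
        print(value, sign(value))
```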
