thr3ads.net - search: "xors"

Displaying 20 results from an estimated 1052 matches for "xors".

2011 Jul 26

[LLVMdev] XOR Optimization

Hi Duncan, when I run "opt -std-compile-opts" on the original source code, it produces the same output as -O3. When I run "opt -std-compile-opts" on the -O3 optimized code, things get even weirder. It outputs the following code: while.body: ; preds = %while.body, %entry %indvar = phi i32 [ 0, %entry ], [ %indvar.next.3, %while.body ] %tmp

2011 Jul 26

[LLVMdev] XOR Optimization

Hi Daniel, > Precisely. The code generated by unrolling can be folded into a single XOR and > SHL. And even if it was not inside a loop, it can still be optimized. What I > want to know is: is there any optimization supposed to optimize this code, but > for some reason it thinks it is not possible, or there is no optimization for > that situation at all? it could be a phase

2011 Jul 27

[LLVMdev] XOR Optimization

After a few more tests, I found out that if we set -unroll-threshold to a large enough value and run "opt -std-compile-opts" or "opt -O3" 3 times, the unroller is able to unroll the original loop 32 times, and once it is unrolled at least 32 times an optimization is triggered, folding it to a single "%xor.3.3.1 = xor i32 %tmp6, -1" (don't know why it does

2011 Jul 26

[LLVMdev] XOR Optimization

...> %inc.3 = add i32 %0, 4 > %exitcond.3 = icmp eq i32 %inc.3, 128 > br i1 %exitcond.3, label %while.end, label %while.body > > while.end: ; preds = %while.body > ret void > > > > It is clear that we are able to fold all XORs into a single XOR, and the > same happens to all SHLs and ORs. > I am using -O3, but the code is not optimized, so I am assuming there is no > optimization for this case. Am I correct? The loop is being unrolled by a factor of 4. This breaks the artificial dependence between loop iterati...

2010 Sep 04

[LLVMdev] Possible missed optimization?

Hello, while testing trivial functions in my backend I noticed a suboptimal way of assigning registers with the following pattern. Consider the following function: typedef unsigned short t; t foo(t a, t b) { t a4 = b^a^18; return a4; } Argument "a" is passed in R15:R14 and argument "b" is passed in R13:R12; the return value is stored in R15:R14. Producing the

2020 May 22

[PATCH] Optimized assembler version of md5_process() for x86-64

This patch introduces an optimized assembler version of md5_process(), the inner loop of MD5 checksumming. It affects the performance of all MD5 operations in rsync - including block matching and whole-file checksums. Performance gain is 5-10% depending on the specific CPU. Originally created by Marc Bevand and placed in the public domain, later integrated into OpenSSL. This is the original

2012 Oct 22

bitwise XOR of Matrix

Hi, I would like to XOR (bitwise) two matrices filled with binary values (0,1). The result of such an XOR is expected to be 0,1. But apparently neither xor nor bitXor works in this case. I got a ": binary operation on non-conformable arrays" error message when I used xor(M1,M2). The problem with bitXor(M1,M2) is that it just truncates the result into a vector rather than a

2011 Jul 26

[LLVMdev] XOR optimization

...shl.3 store i32 %xor.3, i32* %arrayidx, align 4 %inc.3 = add i32 %0, 4 %exitcond.3 = icmp eq i32 %inc.3, 128 br i1 %exitcond.3, label %while.end, label %while.body while.end: ; preds = %while.body ret void It is clear that we are able to fold all XORs into a single XOR, and the same happens to all SHLs and ORs. I am using -O3, but the code is not optimized, so I am assuming there is no optimization for this case. Am I correct? If yes, I have a few other questions: - Do you know of any other similar optimization that could do something here bu...
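The folding the post asks about rests on XOR being associative and commutative: a chain of XORs against constants collapses into one XOR against the combined constant. Below is a minimal Python sketch of that identity. The function names and the per-bit masks are illustrative assumptions and are not the loop from the thread, which is truncated in this listing.

```python
from functools import reduce
from operator import xor

MASK32 = 0xFFFFFFFF  # keep values in the i32 range used by the IR


def apply_chain(x, constants):
    """Apply the XORs one at a time, as an unrolled loop body would."""
    for c in constants:
        x = (x ^ c) & MASK32
    return x


def fold_chain(x, constants):
    """Combine all constants first, then apply a single XOR."""
    combined = reduce(xor, constants, 0) & MASK32
    return (x ^ combined) & MASK32


# Hypothetical case: one single-bit mask per iteration, 32 iterations.
per_bit_masks = [1 << i for i in range(32)]
```

With all 32 single-bit masks the combined constant is the all-ones word, which would match the single "xor ..., -1" reported elsewhere in the thread, assuming the loop really flips one bit per iteration.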

2011 Jul 26

[LLVMdev] XOR Optimization

...exitcond.3 = icmp eq i32 %inc.3, 128 > > br i1 %exitcond.3, label %while.end, label %while.body > > > > while.end: ; preds = %while.body > > ret void > > > > > > > > It is clear that we are able to fold all XORs into a single XOR, and the > > same happens to all SHLs and ORs. > > I am using -O3, but the code is not optimized, so I am assuming there is > no > > optimization for this case. Am I correct? > > The loop is being unrolled by a factor of 4. This breaks the artificial ...

2015 Sep 30

InstCombine wrongful (?) optimization on BinOp with SameOperands

Hi all, I have been looking at the way LLVM optimizes code before forwarding it to the backend I develop for my company and while building define i32 @test_extract_subreg_func(i32 %x, i32 %y) #0 { entry: %conv = zext i32 %x to i64 %conv1 = zext i32 %y to i64 %mul = mul nuw i64 %conv1, %conv %shr = lshr i64 %mul, 32 %xor = xor i64 %shr, %mul %conv2 = trunc i64 %xor to i32

2008 Jul 03

Plotting Prediction Surface with persp()

Hi all I have a question about correct usage of persp(). I have a simple neural net-based XOR example, as follows: library(nnet) xor.data <- data.frame(cbind(expand.grid(c(0,1),c(0,1)), c(0,1,1,0))) names(xor.data) <- c("x","y","o") xor.nn <- nnet(o ~ x + y, data=xor.data, linout=FALSE, size=1) # Create an (x.y) surface and predict over all points d <-

2018 Nov 06

Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)

Hi @ll, while clang/LLVM recognizes common bit-twiddling idioms/expressions like unsigned int rotate(unsigned int x, unsigned int n) { return (x << n) | (x >> (32 - n)); } and typically generates "rotate" machine instructions for this expression, it fails to recognize other also common bit-twiddling idioms/expressions. The standard IEEE CRC-32 for "big
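For reference, the rotate idiom quoted above can be modelled on 32-bit values as follows. This is a minimal Python sketch, not code from the thread: `rotl32` and `MASK32` are hypothetical names, and the explicit masking stands in for C's fixed-width unsigned arithmetic. The shift counts are reduced modulo 32 so that n = 0 is handled, a case where the C expression as written would shift by the full width, which C leaves undefined.

```python
MASK32 = 0xFFFFFFFF  # emulate a 32-bit unsigned int


def rotl32(x, n):
    """Rotate the low 32 bits of x left by n positions (n taken modulo 32)."""
    x &= MASK32
    n &= 31
    # (32 - n) & 31 turns a right shift by 32 into a shift by 0 when n == 0.
    return ((x << n) | (x >> ((32 - n) & 31))) & MASK32
```

For example, `rotl32(0x80000001, 1)` moves the top bit around to bit 0 and yields 3.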

2017 Jul 21

Issue with DAG legalization of brcond, setcc, xor

But isn't it kind of silly that we transform to xor and then transform it back? What is the advantage in doing so? Also, since we use that method, I now have to introduce setcc patterns for i1 values, instead of being able to just use logical pattern operators like not. -Dilan On Fri, Jul 21, 2017 at 11:00 AM Dilan Manatunga <manatunga at gmail.com> wrote: > For some reason I

2024 Jul 30

+ crypto-arm-xor-add-missing-module_description-macro.patch added to mm-nonmm-unstable branch

The patch titled Subject: crypto: arm/xor - add missing MODULE_DESCRIPTION() macro has been added to the -mm mm-nonmm-unstable branch. Its filename is crypto-arm-xor-add-missing-module_description-macro.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/crypto-arm-xor-add-missing-module_description-macro.patch This

2018 Nov 27

Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)

"Sanjay Patel" <spatel at rotateright.com> wrote: > IIUC, you want to use x86-specific bit-hacks (sbb masking) in cases like > this: > unsigned int foo(unsigned int crc) { > if (crc & 0x80000000) > crc <<= 1, crc ^= 0xEDB88320; > else > crc <<= 1; > return crc; > } To document this for x86 too: rewrite the function
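The rewrite proposed in the message is cut off in this listing, so the following is only a sketch of the general idea behind mask-based bit-hacks, not the author's version. It models the quoted `foo` on 32-bit values in Python and compares it with a branch-free variant that derives an all-ones or all-zeros mask from the top bit. The names `step_branch`, `step_mask` and `MASK32` are hypothetical; the constant 0xEDB88320 is taken from the quoted snippet.

```python
MASK32 = 0xFFFFFFFF  # emulate a 32-bit unsigned int
POLY = 0xEDB88320    # constant from the quoted C function


def step_branch(crc):
    """Direct model of the quoted function: test the top bit, then shift."""
    if crc & 0x80000000:
        return ((crc << 1) & MASK32) ^ POLY
    return (crc << 1) & MASK32


def step_mask(crc):
    """Branch-free variant: turn the top bit into a 0 / all-ones mask."""
    top = (crc >> 31) & 1
    mask = (-top) & MASK32  # 0x00000000 or 0xFFFFFFFF
    return ((crc << 1) & MASK32) ^ (POLY & mask)
```

Both functions agree on every 32-bit input, since the mask selects the polynomial exactly when the top bit is set.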

2024 Sep 02

[merged mm-nonmm-stable] crypto-arm-xor-add-missing-module_description-macro.patch removed from -mm tree

The quilt patch titled Subject: crypto: arm/xor - add missing MODULE_DESCRIPTION() macro has been removed from the -mm tree. Its filename was crypto-arm-xor-add-missing-module_description-macro.patch This patch was dropped because it was merged into the mm-nonmm-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

2011 Jul 28

[LLVMdev] XOR Optimization

Hey guys, I still think there is no optimization doing what I want. When the loop is unrolled 32 times, LLVM is able to identify that the loop is working on a whole word; it finds some constants and propagates them, resulting in the folded XOR instruction. However, when the loop operates on only some bits of the word, LLVM is still not able to fold those XORs, even when the operated bits do not

2009 Jun 25

[LLVMdev] bitwise AND selector node not commutative?

Using the Thumb-2 target we see that ORN (a | ^b) and BIC (a & ^b) have similar patterns, as we would expect: defm t2BIC : T2I_bin_irs<"bic", BinOpFrag<(and node:$LHS, (not node:$RHS))>>; defm t2ORN : T2I_bin_irs<"orn", BinOpFrag<(or node:$LHS, (not node:$RHS))>>; Compiling the following three works as expected: %tmp1 = xor i32

2015 Feb 13

[LLVMdev] trunk's optimizer generates slower code than 3.5

I submitted the problem report to clang's bugzilla but no one seems to care, so I have to send it to the mailing list. clang 3.7 svn (trunk 229055 at the time I reported this problem) generates slower code than 3.5 (Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)) for the following code. It is an "8 queens puzzle" solver written as an educational example. As

2011 Jul 27

[LLVMdev] XOR Optimization

2011/7/26 Daniel Nicácio <dnicacios at gmail.com>: > > I also would like to see why the "XOR A, -1" is not turned into a NOT, any > Probably because NOT (like NEG) doesn't exist :) <http://llvm.org/docs/LangRef.html#instref> I assume the decision was made that it wasn't worth adding the extra unary instructions when they can easily be handled in codegen
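The reply's point, that a separate NOT is unnecessary because XOR with -1 already expresses it, can be checked with a small Python sketch. The helper names are hypothetical, and results are masked to 32 bits to mirror i32.

```python
MASK32 = 0xFFFFFFFF  # mirror the i32 width used in the IR


def not32(x):
    """Bitwise complement, truncated to 32 bits."""
    return ~x & MASK32


def xor_all_ones(x):
    """XOR with -1 (all bits set), truncated to 32 bits."""
    return (x ^ -1) & MASK32
```

The two helpers return the same value for any integer input, because XOR with an all-ones word flips every bit.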
