search for: xors

Displaying 20 results from an estimated 1041 matches for "xors".

Did you mean: xorl
2011 Jul 26
0
[LLVMdev] XOR Optimization
Hi Duncan, when I run "opt -std-compile-opts" on the original source code it has the same output of O3. when I run "opt -std-compile-opts" on the -O3 optimized code, things get even more weird, it outputs the following code: while.body: ; preds = %while.body, %entry %indvar = phi i32 [ 0, %entry ], [ %indvar.next.3, %while.body ] %tmp
2011 Jul 26
2
[LLVMdev] XOR Optimization
Hi Daniel, > Precisely. The code generated by unrolling can be folded into a single XOR and > SHL. And even if it was not inside a loop, it can still be optimized. What I > want to know is: is there any optimization supposed to optimize this code, but > for some reason it thinks it is not possible, or there is no optimization for > that situation at all? it could be a phase
2011 Jul 27
2
[LLVMdev] XOR Optimization
After a few more tests, I found out that if we set -unroll-threshold to a value large enough, and run "opt -std-compile-opts" or "opt -O3" 3 times, the unroll will be able to unroll the original loop 32 times, and when you have it unrolled for at least 32 times a optimization is triggered, folding it to a single "%xor.3.3.1 = xor i32 %tmp6, -1" (dont know why it does
2011 Jul 26
2
[LLVMdev] XOR Optimization
...> %inc.3 = add i32 %0, 4 > %exitcond.3 = icmp eq i32 %inc.3, 128 > br i1 %exitcond.3, label %while.end, label %while.body > > while.end: ; preds = %while.body > ret void > > > > It is clear that we are able to fold all XORs into a single XOR, and the > same happens to all SHLs and ORs. > I am using -O3, but the code is not optimized, so I am assuming there is no > optimization for this case. Am I correct? The loop is being unrolled by a factor of 4. This breaks the artificial dependence between loop iterati...
2010 Sep 04
6
[LLVMdev] Possible missed optimization?
Hello, while testing trivial functions in my backend i noticed a suboptimal way of assigning regs that had the following pattern, consider the following function: typedef unsigned short t; t foo(t a, t b) { t a4 = b^a^18; return a4; } Argument "a" is passed in R15:R14 and argument "b" is passed in R13:R12, the return value is stored in R15:R14. Producing the
2020 May 22
2
[PATCH] Optimized assembler version of md5_process() for x86-64
This patch introduces an optimized assembler version of md5_process(), the inner loop of MD5 checksumming. It affects the performance of all MD5 operations in rsync - including block matching and whole-file checksums. Performance gain is 5-10% depending on the specific CPU. Originally created by Marc Bevand and placed in the public domain, later integrated into OpenSSL. This is the original
2012 Oct 22
2
bitwise XOR of Matrix
Hi, I would like to xor (bitwise) two matrices filled with binary values (0,1). The result of such XOR is expected to be 0,1. But apparently neither of xor nor bitXor is working in this case. I got ": binary operation on non-conformable arrays" error message when I used xor (M1,M2) . The problem with bitXor(M1,M2) is that it just truncates the result into a vector rather than a
2011 Jul 26
2
[LLVMdev] XOR optimization
...shl.3 store i32 %xor.3, i32* %arrayidx, align 4 %inc.3 = add i32 %0, 4 %exitcond.3 = icmp eq i32 %inc.3, 128 br i1 %exitcond.3, label %while.end, label %while.body while.end: ; preds = %while.body ret void It is clear that we are able to fold all XORs into a single XOR, and the same happens to all SHLs and ORs. I am using -O3, but the code is not optimized, so I am assuming there is no optimization for this case. Am I correct? If yes, I have a few other questions: - Do you know of any other similar optimization that could do something here bu...
2011 Jul 26
0
[LLVMdev] XOR Optimization
...exitcond.3 = icmp eq i32 %inc.3, 128 > > br i1 %exitcond.3, label %while.end, label %while.body > > > > while.end: ; preds = %while.body > > ret void > > > > > > > > It is clear that we are able to fold all XORs into a single XOR, and the > > same happens to all SHLs and ORs. > > I am using -O3, but the code is not optimized, so I am assuming there is > no > > optimization for this case. Am I correct? > > The loop is being unrolled by a factor of 4. This breaks the artificial &g...
2015 Sep 30
2
InstCombine wrongful (?) optimization on BinOp with SameOperands
Hi all, I have been looking at the way LLVM optimizes code before forwarding it to the backend I develop for my company and while building define i32 @test_extract_subreg_func(i32 %x, i32 %y) #0 { entry: %conv = zext i32 %x to i64 %conv1 = zext i32 %y to i64 %mul = mul nuw i64 %conv1, %conv %shr = lshr i64 %mul, 32 %xor = xor i64 %shr, %mul %conv2 = trunc i64 %xor to i32
2008 Jul 03
2
Plotting Prediction Surface with persp()
Hi all I have a question about correct usage of persp(). I have a simple neural net-based XOR example, as follows: library(nnet) xor.data <- data.frame(cbind(expand.grid(c(0,1),c(0,1)), c(0,1,1,0))) names(xor.data) <- c("x","y","o") xor.nn <- nnet(o ~ x + y, data=xor.data, linout=FALSE, size=1) # Create an (x.y) surface and predict over all points d <-
2018 Nov 06
4
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
Hi @ll, while clang/LLVM recognizes common bit-twiddling idioms/expressions like unsigned int rotate(unsigned int x, unsigned int n) { return (x << n) | (x >> (32 - n)); } and typically generates "rotate" machine instructions for this expression, it fails to recognize other also common bit-twiddling idioms/expressions. The standard IEEE CRC-32 for "big
2017 Jul 21
4
Issue with DAG legalization of brcond, setcc, xor
But isn't kinda silly that we transform to xor and then we transform it back. What is the advantage in doing so? Also, since we do that method, I now have to introduce setcc patterns for i1 values, instead of being able to just use logical pattern operators like not. -Dilan On Fri, Jul 21, 2017 at 11:00 AM Dilan Manatunga <manatunga at gmail.com> wrote: > For some reason I
2018 Nov 27
2
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
"Sanjay Patel" <spatel at rotateright.com> wrote: > IIUC, you want to use x86-specific bit-hacks (sbb masking) in cases like > this: > unsigned int foo(unsigned int crc) { > if (crc & 0x80000000) > crc <<= 1, crc ^= 0xEDB88320; > else > crc <<= 1; > return crc; > } To document this for x86 too: rewrite the function
2011 Jul 28
1
[LLVMdev] XOR Optimization
Hey guys, I still think there is no optimization doing what I want. When the loop is unrolled 32 times, llvm is able to identify that the loop is working on a whole word, it finds some constants and propagate them, resulting in the folded XOR instruction. However, when the loop operates on some bits of the word, llvm is still not able to fold those XOR, even when the operated bits does not
2009 Jun 25
2
[LLVMdev] bitwise AND selector node not commutative?
Using the Thumb-2 target we see that ORN ( a | ^b) and BIC (a & ^b) have similar patterns, as we would expect: defm t2BIC : T2I_bin_irs<"bic", BinOpFrag<(and node:$LHS, (not node: $RHS))>>; defm t2ORN : T2I_bin_irs<"orn", BinOpFrag<(or node:$LHS, (not node: $RHS))>>; Compiling the following three works as expected: %tmp1 = xor i32
2015 Feb 13
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
I submitted the problem report to clang's bugzilla but no one seems to care so I have to send it to the mailing list. clang 3.7 svn (trunk 229055 as the time I was to report this problem) generates slower code than 3.5 (Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)) for the following code. It is a "8 queens puzzle" solver written as an educational example. As
2011 Jul 27
0
[LLVMdev] XOR Optimization
2011/7/26 Daniel Nicácio <dnicacios at gmail.com>: > > I also would like to see why the "XOR  A,  -1" is not turned into a NOT, any > Probably because NOT (like NEG) doesn't exist :) <http://llvm.org/docs/LangRef.html#instref> I assume the decision was made that it wasn't worth adding the extra unary instructions when they can easily be handled in codegen
2017 Jul 20
3
Issue with DAG legalization of brcond, setcc, xor
Hi, I am having some issues with how some of the instructions are being legalized. So this is my intial basic block. The area of concern is the last three instructions. I will pick and choose debug output to keep this small. SelectionDAG has 36 nodes: t0: ch = EntryToken t6: i32,ch = CopyFromReg t0, Register:i32 %vreg507 t2: i32,ch = CopyFromReg t0, Register:i32 %vreg17
2010 Sep 30
4
[LLVMdev] Illegal optimization in LLVM 2.8 during SelectionDAG? (Re: comparison pattern trouble - might be a bug in LLVM 2.8?)
Bill Wendling wrote: > On Sep 29, 2010, at 12:36 AM, Heikki Kultala wrote: > >> On 29 Sep 2010, at 06:25, Heikki Kultala wrote: >> >>> Our architecture has 1-bit boolean predicate registers. >>> >>> I've defined comparison >>> >>> def NErrb : InstTCE<(outs I1Regs:$op3), (ins I32Regs:$op1,I32Regs:$op2), "", [(set