Displaying 20 results from an estimated 1052 matches for "xors".
2011 Jul 26
0
[LLVMdev] XOR Optimization
Hi Duncan,
When I run "opt -std-compile-opts" on the original source code, it produces the
same output as -O3.
When I run "opt -std-compile-opts" on the -O3 optimized code, things get
even weirder; it outputs the following code:
while.body: ; preds = %while.body, %entry
%indvar = phi i32 [ 0, %entry ], [ %indvar.next.3, %while.body ]
%tmp
2011 Jul 26
2
[LLVMdev] XOR Optimization
Hi Daniel,
> Precisely. The code generated by unrolling can be folded into a single XOR and
> SHL. And even if it was not inside a loop, it can still be optimized. What I
> want to know is: is there any optimization supposed to optimize this code, but
> for some reason it thinks it is not possible, or there is no optimization for
> that situation at all?
it could be a phase
2011 Jul 27
2
[LLVMdev] XOR Optimization
After a few more tests, I found out that if we set -unroll-threshold to a
value large enough, and run "opt -std-compile-opts" or "opt -O3" 3 times,
the unroll will be able to unroll the original loop 32 times, and when you
have it unrolled at least 32 times an optimization is triggered, folding
it to a single "%xor.3.3.1 = xor i32 %tmp6, -1" (don't know why it does
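As a rough illustration of the fold described in this snippet, the sketch below assumes a loop that flips one bit of a 32-bit word per iteration. The original loop is not shown in the excerpt, so `flip_bits_loop` and `flip_bits_folded` are hypothetical stand-ins rather than the poster's code. Once all 32 iterations are visible, the single-bit masks combine into one XOR with -1.

```c
#include <stdint.h>

/* Hypothetical stand-in for the loop in the thread (an assumption):
 * flip one bit of the word per iteration, 32 iterations in total. */
uint32_t flip_bits_loop(uint32_t x) {
    for (unsigned i = 0; i < 32; ++i) {
        x ^= (uint32_t)1 << i;   /* XOR with a single-bit mask */
    }
    return x;
}

/* Folded form: the 32 single-bit masks combine into 0xFFFFFFFF,
 * which is -1 when viewed as a signed i32. */
uint32_t flip_bits_folded(uint32_t x) {
    return x ^ UINT32_C(0xFFFFFFFF);
}
```

Both functions return the same value for every input, which is the kind of equivalence an optimizer can only exploit once the loop has been fully unrolled.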
2011 Jul 26
2
[LLVMdev] XOR Optimization
...> %inc.3 = add i32 %0, 4
> %exitcond.3 = icmp eq i32 %inc.3, 128
> br i1 %exitcond.3, label %while.end, label %while.body
>
> while.end: ; preds = %while.body
> ret void
>
>
>
> It is clear that we are able to fold all XORs into a single XOR, and the
> same happens to all SHLs and ORs.
> I am using -O3, but the code is not optimized, so I am assuming there is no
> optimization for this case. Am I correct?
The loop is being unrolled by a factor of 4. This breaks the artificial
dependence between loop iterati...
2010 Sep 04
6
[LLVMdev] Possible missed optimization?
Hello, while testing trivial functions in my backend I noticed a suboptimal
way of assigning regs that had the following pattern, consider the following
function:
typedef unsigned short t;
t foo(t a, t b)
{
t a4 = b^a^18;
return a4;
}
Argument "a" is passed in R15:R14 and argument "b" is passed in R13:R12, the
return value is stored in R15:R14.
Producing the
2020 May 22
2
[PATCH] Optimized assembler version of md5_process() for x86-64
This patch introduces an optimized assembler version of md5_process(),
the inner loop of MD5 checksumming. It affects the performance of all
MD5 operations in rsync - including block matching and whole-file
checksums.
Performance gain is 5-10% depending on the specific CPU.
Originally created by Marc Bevand and placed in the public domain,
later integrated into OpenSSL. This is the original
2012 Oct 22
2
bitwise XOR of Matrix
Hi,
I would like to xor (bitwise) two matrices filled with binary values
(0,1). The result of such XOR is expected to be 0,1.
But apparently neither xor nor bitXor works in this case.
I got a ": binary operation on non-conformable arrays" error message
when I used xor(M1, M2).
The problem with bitXor(M1,M2) is that it just truncates the result
into a vector rather than a
2011 Jul 26
2
[LLVMdev] XOR optimization
...shl.3
store i32 %xor.3, i32* %arrayidx, align 4
%inc.3 = add i32 %0, 4
%exitcond.3 = icmp eq i32 %inc.3, 128
br i1 %exitcond.3, label %while.end, label %while.body
while.end: ; preds = %while.body
ret void
It is clear that we are able to fold all XORs into a single XOR, and the
same happens to all SHLs and ORs.
I am using -O3, but the code is not optimized, so I am assuming there is no
optimization for this case. Am I correct?
If yes, I have a few other questions:
- Do you know of any other similar optimization that could do something
here bu...
2011 Jul 26
0
[LLVMdev] XOR Optimization
...exitcond.3 = icmp eq i32 %inc.3, 128
> > br i1 %exitcond.3, label %while.end, label %while.body
> >
> > while.end: ; preds = %while.body
> > ret void
> >
> >
> >
> > It is clear that we are able to fold all XORs into a single XOR, and the
> > same happens to all SHLs and ORs.
> > I am using -O3, but the code is not optimized, so I am assuming there is no
> > optimization for this case. Am I correct?
>
> The loop is being unrolled by a factor of 4. This breaks the artificial
> ...
2015 Sep 30
2
InstCombine wrongful (?) optimization on BinOp with SameOperands
Hi all,
I have been looking at the way LLVM optimizes code before
forwarding it to the backend I develop for my company and while building
define i32 @test_extract_subreg_func(i32 %x, i32 %y) #0 {
entry:
%conv = zext i32 %x to i64
%conv1 = zext i32 %y to i64
%mul = mul nuw i64 %conv1, %conv
%shr = lshr i64 %mul, 32
%xor = xor i64 %shr, %mul
%conv2 = trunc i64 %xor to i32
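For readers less used to IR, the instructions quoted above can be read as the following C sketch. The excerpt is cut off after the `trunc`, so returning the truncated value is an assumption, and the function name `hi_xor_lo` is hypothetical.

```c
#include <stdint.h>

/* Rough C reading of the IR excerpt; the name is hypothetical and the
 * return of the truncated value is an assumption (the snippet is cut off). */
uint32_t hi_xor_lo(uint32_t x, uint32_t y) {
    uint64_t mul = (uint64_t)y * (uint64_t)x; /* zext both operands, then mul */
    uint64_t shr = mul >> 32;                 /* lshr i64 %mul, 32 */
    uint64_t xr  = shr ^ mul;                 /* xor i64 %shr, %mul */
    return (uint32_t)xr;                      /* trunc i64 %xor to i32 */
}
```

After truncation, the result is the high 32 bits of the product XORed with its low 32 bits.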
2008 Jul 03
2
Plotting Prediction Surface with persp()
Hi all
I have a question about correct usage of persp(). I have a simple neural
net-based XOR example, as follows:
library(nnet)
xor.data <- data.frame(cbind(expand.grid(c(0,1),c(0,1)), c(0,1,1,0)))
names(xor.data) <- c("x","y","o")
xor.nn <- nnet(o ~ x + y, data=xor.data, linout=FALSE, size=1)
# Create an (x.y) surface and predict over all points
d <-
2018 Nov 06
4
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
Hi @ll,
while clang/LLVM recognizes common bit-twiddling idioms/expressions
like
unsigned int rotate(unsigned int x, unsigned int n)
{
return (x << n) | (x >> (32 - n));
}
and typically generates "rotate" machine instructions for this
expression, it fails to recognize other also common bit-twiddling
idioms/expressions.
The standard IEEE CRC-32 for "big
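One caveat about the rotate idiom quoted in this snippet: when `n` is 0, the expression `x >> (32 - n)` shifts a 32-bit value by 32, which is undefined behavior in C. A commonly used variant masks both shift counts. The sketch below is illustrative only, and the name `rotl32` is an assumption, not something from the thread.

```c
#include <stdint.h>

/* Rotate left with both shift counts reduced modulo 32, so that
 * n == 0 (or n == 32) never produces a shift by the full width. */
uint32_t rotl32(uint32_t x, unsigned n) {
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}
```

For `n` in 1..31 this computes the same value as the original expression, so it remains a candidate for the same rotate-instruction pattern matching.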
2017 Jul 21
4
Issue with DAG legalization of brcond, setcc, xor
But isn't it kinda silly that we transform to xor and then we transform it
back. What is the advantage in doing so? Also, since we do that method, I
now have to introduce setcc patterns for i1 values, instead of being able
to just use logical pattern operators like not.
-Dilan
On Fri, Jul 21, 2017 at 11:00 AM Dilan Manatunga <manatunga at gmail.com>
wrote:
> For some reason I
2024 Jul 30
0
+ crypto-arm-xor-add-missing-module_description-macro.patch added to mm-nonmm-unstable branch
The patch titled
Subject: crypto: arm/xor - add missing MODULE_DESCRIPTION() macro
has been added to the -mm mm-nonmm-unstable branch. Its filename is
crypto-arm-xor-add-missing-module_description-macro.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/crypto-arm-xor-add-missing-module_description-macro.patch
This
2018 Nov 27
2
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
"Sanjay Patel" <spatel at rotateright.com> wrote:
> IIUC, you want to use x86-specific bit-hacks (sbb masking) in cases like
> this:
> unsigned int foo(unsigned int crc) {
> if (crc & 0x80000000)
> crc <<= 1, crc ^= 0xEDB88320;
> else
> crc <<= 1;
> return crc;
> }
To document this for x86 too: rewrite the function
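The rewrite proposed in the message is cut off in this excerpt. As a point of comparison only, here is one common branch-free formulation of the quoted CRC step, written next to the branchy version. The function names are hypothetical, and this is not necessarily the rewrite the poster had in mind.

```c
#include <stdint.h>

/* The quoted function, restated with fixed-width types. */
uint32_t crc_step_branchy(uint32_t crc) {
    if (crc & UINT32_C(0x80000000)) {
        crc <<= 1;
        crc ^= UINT32_C(0xEDB88320);
    } else {
        crc <<= 1;
    }
    return crc;
}

/* Branch-free variant: build an all-ones or all-zeros mask from the
 * top bit, then use it to apply the polynomial conditionally. */
uint32_t crc_step_masked(uint32_t crc) {
    uint32_t mask = UINT32_C(0) - (crc >> 31); /* 0xFFFFFFFF if top bit set, else 0 */
    return (crc << 1) ^ (mask & UINT32_C(0xEDB88320));
}
```

The mask construction is the portable counterpart of the sbb-style masking mentioned in the quote: the top bit is smeared across the word instead of being tested with a branch.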
2024 Sep 02
0
[merged mm-nonmm-stable] crypto-arm-xor-add-missing-module_description-macro.patch removed from -mm tree
The quilt patch titled
Subject: crypto: arm/xor - add missing MODULE_DESCRIPTION() macro
has been removed from the -mm tree. Its filename was
crypto-arm-xor-add-missing-module_description-macro.patch
This patch was dropped because it was merged into the mm-nonmm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
2011 Jul 28
1
[LLVMdev] XOR Optimization
Hey guys,
I still think there is no optimization doing what I want. When the loop is
unrolled 32 times, llvm is able to identify that the loop is working on a
whole word, it finds some constants and propagates them, resulting in the
folded XOR instruction. However, when the loop operates on only some bits of the
word, llvm is still not able to fold those XORs, even when the operated bits
do not
2009 Jun 25
2
[LLVMdev] bitwise AND selector node not commutative?
Using the Thumb-2 target we see that ORN ( a | ^b) and BIC (a & ^b)
have similar patterns, as we would expect:
defm t2BIC : T2I_bin_irs<"bic", BinOpFrag<(and node:$LHS, (not node:$RHS))>>;
defm t2ORN : T2I_bin_irs<"orn", BinOpFrag<(or node:$LHS, (not node:$RHS))>>;
Compiling the following three works as expected:
%tmp1 = xor i32
2015 Feb 13
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
I submitted the problem report to clang's bugzilla but no one seems to
care so I have to send it to the mailing list.
clang 3.7 svn (trunk 229055 at the time I reported this problem)
generates slower code than 3.5 (Apple LLVM version 6.0
(clang-600.0.56) (based on LLVM 3.5svn)) for the following code.
It is a "8 queens puzzle" solver written as an educational example. As
2011 Jul 27
0
[LLVMdev] XOR Optimization
2011/7/26 Daniel Nicácio <dnicacios at gmail.com>:
>
> I also would like to see why the "XOR A, -1" is not turned into a NOT, any
>
Probably because NOT (like NEG) doesn't exist :)
<http://llvm.org/docs/LangRef.html#instref>
I assume the decision was made that it wasn't worth adding the extra
unary instructions when they can easily be handled in codegen
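To make the point in this reply concrete: LLVM IR has no dedicated integer NOT or NEG instruction, so both are written with binary operators, `xor x, -1` and `sub 0, x`. The C sketch below mirrors those two encodings; the function names are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Bitwise NOT written as an XOR with all ones, mirroring "xor i32 %a, -1". */
uint32_t not_via_xor(uint32_t a) {
    return a ^ UINT32_C(0xFFFFFFFF);
}

/* Negation written as a subtraction from zero, mirroring "sub i32 0, %a". */
uint32_t neg_via_sub(uint32_t a) {
    return UINT32_C(0) - a;
}
```

A backend is then free to select a native NOT or NEG instruction for these patterns where the target has one.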