Displaying 20 results from an estimated 30000 matches similar to: "[LLVMdev] SimplifyDemandedUseBits vs (and (xor %V, -1), 4096)"
2011 Jul 27
0
[LLVMdev] XOR Optimization
2011/7/26 Daniel Nicácio <dnicacios at gmail.com>:
>
> I also would like to see why the "XOR A, -1" is not turned into a NOT, any
>
Probably because NOT (like NEG) doesn't exist :)
<http://llvm.org/docs/LangRef.html#instref>
I assume the decision was made that it wasn't worth adding the extra
unary instructions when they can easily be handled in codegen
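
As a small illustration of the point above: a bitwise NOT can always be written as an XOR with an all-ones constant, so a separate NOT instruction adds nothing. The sketch below is Python rather than LLVM IR; the helper name `not_via_xor` and the fixed 32-bit width are assumptions made for the example, not anything taken from LLVM.

```python
MASK32 = 0xFFFFFFFF  # all-ones 32-bit constant, i.e. -1 viewed as an i32


def not_via_xor(x: int) -> int:
    """Bitwise NOT of a 32-bit value, expressed as XOR with all ones."""
    return (x ^ MASK32) & MASK32


# Python's own ~ operator agrees once its result is truncated to 32 bits.
for value in (0, 1, 4096, 0xDEADBEEF, MASK32):
    assert not_via_xor(value) == (~value & MASK32)
```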
2011 Jul 27
2
[LLVMdev] XOR Optimization
After a few more tests, I found out that if we set -unroll-threshold to a
value large enough, and run "opt -std-compile-opts" or "opt -O3" 3 times,
the unroller will be able to unroll the original loop 32 times, and once you
have it unrolled at least 32 times an optimization is triggered, folding
it to a single "%xor.3.3.1 = xor i32 %tmp6, -1" (don't know why it does
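
One way to see why 32 unrolled iterations can collapse into a single "xor ..., -1": the snippet does not show the loop body, so the sketch below simply assumes each iteration toggles one distinct bit of a 32-bit word. Under that assumption, the 32 single-bit XORs combine into one XOR with an all-ones mask. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # all-ones 32-bit mask, i.e. -1 viewed as an i32


def toggle_bits_one_by_one(word: int) -> int:
    """Assumed loop shape: flip bit 0, then bit 1, ... up to bit 31."""
    for bit in range(32):
        word ^= 1 << bit
    return word & MASK32


def toggle_all_at_once(word: int) -> int:
    """The folded form: a single XOR with the all-ones constant."""
    return (word ^ MASK32) & MASK32


# Both forms agree on every sample word.
for sample in (0, 1, 0x12345678, MASK32):
    assert toggle_bits_one_by_one(sample) == toggle_all_at_once(sample)
```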
2004 Sep 10
3
patch
So here is a quick patch solving the problem; now it should be PIC.
--
Miroslav Lichvar
lichvarm@phoenix.inf.upol.cz
-------------- next part --------------
--- lpc_asm.nasm.orig Wed Jul 18 02:23:40 2001
+++ lpc_asm.nasm Sat Nov 17 21:09:46 2001
@@ -59,10 +59,10 @@
;
ALIGN 16
cident FLAC__lpc_compute_autocorrelation_asm_ia32
- ;[esp + 24] == autoc[]
- ;[esp + 20] == lag
- ;[esp + 16] ==
2008 Aug 08
0
[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR
On Aug 7, 2008, at 12:13 PM, David Greene wrote:
> On Tuesday 05 August 2008 13:27, David Greene wrote:
>
>> Neither solution eliminates the need for instcombine to be careful
>> and
>> consult masks from time to time.
>>
>> Perhaps I'm totally missing something. Concrete examples would be
>> helpful.
>
> Ok, so I took my own advice and
2004 Sep 10
2
An assembly optimization and fix
I have optimized the FLAC__fixed_compute_best_predictor_asm_ia32_mmx_cmov
function and fixed a bug when data_len == 0. Now the function is about
50% faster and flac -5 is about 5% faster on my box. I have tested it
thoroughly; I think it can go into flac 1.0.4.
--
Miroslav Lichvar
-------------- next part --------------
--- src/libFLAC/ia32/fixed_asm.nasm.orig 2002-01-26 19:05:12.000000000 +0100
+++
2016 Sep 13
2
undef * 0
Thanks for your answers.
Here is another example of an unsound transformation on Boolean algebra.
According to the LLVM documentation
(http://llvm.org/docs/LangRef.html#undefined-values) it is unsafe to
consider ' a & undef = undef ' and ' a | undef = undef ' but 'undef xor
undef = undef' is safe.
Now, given an expression ((a & (~b)) | ((~a) & b)) where a and b are
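
For fully defined inputs, the expression quoted above is the standard expansion of XOR into AND, OR and NOT. The brute-force check below is only a sketch of that identity in Python; it says nothing about how undef values behave, which is the actual subject of the thread, and the function name `xor_expanded` is hypothetical.

```python
from itertools import product


def xor_expanded(a: int, b: int) -> int:
    """((a & ~b) | (~a & b)), the expression from the message above."""
    return (a & ~b) | (~a & b)


# Compare against the built-in XOR over a small range of defined values.
for a, b in product(range(-8, 8), repeat=2):
    assert xor_expanded(a, b) == a ^ b
```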
2011 Jul 26
2
[LLVMdev] XOR optimization
Hi folks,
I couldn't find a specific XOR (OR and AND) optimization on llvm, and
therefore I am about to implement it.
But first I would like to check with you guys that it really does not exist.
For a simple loop like this:
nbits = 128;
bit_addr = 0;
while(nbits--)
{
bindex=bit_addr>>5; /* Index is number /32 */
bitnumb=bit_addr % 32; /* Bit number in longword */
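
The two assignments shown above split a bit address into a 32-bit word index and a bit position inside that word. A minimal Python sketch of just that arithmetic follows; the helper name `split_bit_address` is hypothetical, and the rest of the loop body is not shown in the snippet, so it is not reproduced here.

```python
from typing import Tuple


def split_bit_address(bit_addr: int) -> Tuple[int, int]:
    """Return (word index, bit number) for a non-negative bit address."""
    bindex = bit_addr >> 5    # same as bit_addr // 32
    bitnumb = bit_addr % 32   # same as bit_addr & 31 for non-negative values
    return bindex, bitnumb


# Walk all 128 bit addresses, as the loop does, and check the split is lossless.
for addr in range(128):
    word, bit = split_bit_address(addr)
    assert word * 32 + bit == addr
```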
2011 Jul 26
0
[LLVMdev] XOR Optimization
Hi,
On Tue, Jul 26, 2011 at 11:32 AM, Matt Johnson
<johnso87 at crhc.illinois.edu> wrote:
> Hi Daniel,
>
> > Hi folks,
> >
> > I couldn't find a specific XOR (OR and AND) optimization on llvm, and
> > therefore I am about to implement it.
> > But first I would like to check with you guys that it really does not
> exist.
> >
> > For a
2011 Jul 26
0
[LLVMdev] XOR Optimization
Hi Duncan,
When I run "opt -std-compile-opts" on the original source code, it gives the
same output as -O3.
When I run "opt -std-compile-opts" on the -O3 optimized code, things get
even weirder; it outputs the following code:
while.body: ; preds = %while.body, %entry
%indvar = phi i32 [ 0, %entry ], [ %indvar.next.3, %while.body ]
%tmp
2024 Sep 02
0
[merged mm-nonmm-stable] crypto-arm-xor-add-missing-module_description-macro.patch removed from -mm tree
The quilt patch titled
Subject: crypto: arm/xor - add missing MODULE_DESCRIPTION() macro
has been removed from the -mm tree. Its filename was
crypto-arm-xor-add-missing-module_description-macro.patch
This patch was dropped because it was merged into the mm-nonmm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
2024 Jul 30
0
+ crypto-arm-xor-add-missing-module_description-macro.patch added to mm-nonmm-unstable branch
The patch titled
Subject: crypto: arm/xor - add missing MODULE_DESCRIPTION() macro
has been added to the -mm mm-nonmm-unstable branch. Its filename is
crypto-arm-xor-add-missing-module_description-macro.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/crypto-arm-xor-add-missing-module_description-macro.patch
This
2012 Oct 22
2
bitwise XOR of Matrix
Hi,
I would like to xor (bitwise) two matrices filled with binary values
(0,1). The result of such XOR is expected to be 0,1.
But apparently neither xor nor bitXor works in this case.
I got a "binary operation on non-conformable arrays" error message
when I used xor(M1, M2).
The problem with bitXor(M1,M2) is that it just truncates the result
into a vector rather than a
2011 Jul 26
2
[LLVMdev] XOR Optimization
Hi Daniel,
> Hi folks,
>
> I couldn't find a specific XOR (OR and AND) optimization on llvm, and
> therefore I am about to implement it.
> But first I would like to check with you guys that it really does not exist.
>
> For a simple loop like this:
>
> nbits = 128;
> bit_addr = 0;
> while(nbits--)
> {
> bindex=bit_addr>>5; /* Index is
2010 Jul 06
2
[LLVMdev] ConstantFold 'undef xor undef'
Hi,
At line 2292 of lib/VMCore/ConstantFold.cpp (LLVM 2.7 release):
Constant *llvm::ConstantFoldBinaryInstruction(unsigned Opcode,
Constant *C1, Constant *C2) {
...
// Handle UndefValue up front.
if (isa<UndefValue>(C1) || isa<UndefValue>(C2)) {
switch (Opcode) {
case Instruction::Xor:
if (isa<UndefValue>(C1)
2008 Aug 07
6
[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR
On Tuesday 05 August 2008 13:27, David Greene wrote:
> Neither solution eliminates the need for instcombine to be careful and
> consult masks from time to time.
>
> Perhaps I'm totally missing something. Concrete examples would be helpful.
Ok, so I took my own advice and thought about CSE and instcombine a bit.
I wrote the code by hand in a sort of pseudo-llvm language, so
2012 Sep 26
0
[LLVMdev] Folding nodes with more than one use during ISel
I'm working on a backend for the Freescale CPU12 family as a hobby project and I'm having difficulty getting the instruction selection pass to handle the indirect indexed addressing modes. I'd really appreciate advice on how best to handle them.
The following llvm instructions:
%arrayidx = getelementptr inbounds i8** %p, i16 3
%0 = load i8** %arrayidx, align 2, !tbaa !0
%1 =
2011 Jul 28
1
[LLVMdev] XOR Optimization
Hey guys,
I still think there is no optimization doing what I want. When the loop is
unrolled 32 times, llvm is able to identify that the loop is working on a
whole word; it finds some constants and propagates them, resulting in the
folded XOR instruction. However, when the loop operates on only some bits of the
word, llvm is still not able to fold those XORs, even when the operated bits
do not
2011 Jul 26
0
[LLVMdev] XOR Optimization
"The fact that the loop is unrolled explains why the XORs, SHLs, and ORs are
not folded into 1."
I don't see why the unrolling explains it.
"I think he is trying to say this expression generated by unrolling by a
factor of 4 can indeed be folded into a single XOR, SHL and OR. "
Precisely. The code generated by unrolling can be folded into a single XOR
and SHL. And even if it
2019 Oct 03
2
[cfe-dev] CFG simplification question, and preservation of branching in the original code
Hi all,
> On 2 Oct 2019, at 14:34, Sanjay Patel <spatel at rotateright.com> wrote
> Providing target options/overrides to code that is supposed to be target-independent sounds self-defeating to me. I doubt that proposal would gain much support.
> Of course, if you're customizing LLVM for your own out-of-trunk backend, you can do anything you'd like if you're willing to
2019 Mar 04
2
Where's the optimiser gone (part 11): use the proper instruction for sign extension
Compile with -O3 -m32 (see <https://godbolt.org/z/yCpBpM>):
long lsign(long x)
{
return (x > 0) - (x < 0);
}
long long llsign(long long x)
{
return (x > 0) - (x < 0);
}
While the code generated for the "long" version of this function is quite
OK, the code for the "long long" version misses an obvious optimisation:
lsign: # @lsign
mov