thr3ads.net - search: "imull"

llvm-3.6 MCAsmParser x64 Error "invalid operand for instruction" when msb set

2015 Dec 17

2

llvm-3.6 MCAsmParser x64 Error "invalid operand for instruction" when msb set

Hello, I am experiencing problems, when trying to assemble these two x86-64 Opcodes "add r64, imm32" "imul r64, r64, imm32" When having the most significant bit set for imm32, for example: "add rax, 0x80000000", "add rax, 0xffffffff", ... "imul rbx, rsi, 0x80000000", "imul rbx, rsi, 0xffffffff", ... The Error Message I receive is the

[LLVMdev] IMUL x86 instruction

2014 Jul 13

2

[LLVMdev] IMUL x86 instruction

Hi, The x86 CPU IMUL instruction has forms such as: IMUL reg EDX:EAX ← EAX ∗ reg reg, EAX and EDX are 32bit registers. How can I represent this sort of instruction in LLVM IR ? It is really a 32bit * 32 bit = 64 bit, but no LLVM IR exists to do that. Or, a similar question: What LLVM IR would produce this IMUL instruction form? For context, I am writing a x86 to LLVM IR decompiler, so wish to

Where's the optimiser gone? (part 5.c): missed tail calls, and more...

2018 Dec 01

2

Where's the optimiser gone? (part 5.c): missed tail calls, and more...

Compile the following functions with "-O3 -target i386-win32" (see <https://godbolt.org/z/exmjWY>): __int64 __fastcall div(__int64 foo, __int64 bar) { return foo / bar; } On the left the generated code; on the right the expected, properly optimised code: push dword ptr [esp + 16] | push dword ptr [esp + 16] | push dword ptr [esp + 16] |

[LLVMdev] Area for improvement

2005 Feb 22

5

[LLVMdev] Area for improvement

...is generates the following X86 code: .text .align 16 .globl init_board .type init_board, @function init_board: subl $4, %esp movl %esi, (%esp) movl 8(%esp), %eax movl $0, %ecx .LBBinit_board_1: # loopexit.1 imull $7, %ecx, %edx movl %eax, %esi addl %edx, %esi movb $46, (%esi) imull $7, %ecx, %edx movl %eax, %esi addl %edx, %esi leal 1(%esi), %edx movb $46, (%edx) imull $7, %ecx, %edx movl %eax, %esi addl %edx, %esi...

[LLVMdev] Area for improvement

2005 Feb 22

0

[LLVMdev] Area for improvement

...t > .align 16 > .globl init_board > .type init_board, @function > init_board: > subl $4, %esp > movl %esi, (%esp) > movl 8(%esp), %eax > movl $0, %ecx > .LBBinit_board_1: # loopexit.1 > imull $7, %ecx, %edx > movl %eax, %esi > addl %edx, %esi > movb $46, (%esi) > imull $7, %ecx, %edx > movl %eax, %esi > addl %edx, %esi > leal 1(%esi), %edx > movb $46, (%edx) > imull $7, %ecx, %edx >...

[LLVMdev] Area for improvement

2005 Feb 22

0

[LLVMdev] Area for improvement

...WS+1]) > { > int i,j; > > for (i=0;i<COLS;i++) > for (j=0;j<ROWS;j++) > b[i][j]='.'; > for (i=0;i<COLS;i++) > b[i][ROWS]=0; > } > > This generates the following X86 code: > imull $7, %ecx, %edx > movl %eax, %esi > addl %edx, %esi > movb $46, (%esi) > imull $7, %ecx, %edx > movl %eax, %esi > addl %edx, %esi > leal 1(%esi), %edx ... (many many copies of this, see the end of the email for full output) ... > Th...

[LLVMdev] Area for improvement

2005 Feb 22

2

[LLVMdev] Area for improvement

...t; >> for (i=0;i<COLS;i++) >> for (j=0;j<ROWS;j++) >> b[i][j]='.'; >> for (i=0;i<COLS;i++) >> b[i][ROWS]=0; >> } >> >> This generates the following X86 code: >> imull $7, %ecx, %edx >> movl %eax, %esi >> addl %edx, %esi >> movb $46, (%esi) >> imull $7, %ecx, %edx >> movl %eax, %esi >> addl %edx, %esi >> leal 1(%esi), %edx > > ... (many many copies of this, see the end of th...

[LLVMdev] Failure to optimize ? operator

2011 Dec 14

2

[LLVMdev] Failure to optimize ? operator

...e for the two functions: ============================================== _f1: pushl %ebp xorl %eax, %eax movl %esp, %ebp movl 8(%ebp), %edx testl %edx, %edx jle L5 popl %ebp ret .p2align 4,,7L5: movl %edx, %ecx imull %edx, %ecx popl %ebp leal 3(%ecx,%ecx,4), %eax imull %edx, %eax leal 1(%eax,%ecx,2), %eax ret .p2align 4,,15 _f2: pushl %ebp xorl %eax, %eax movl %esp, %ebp movl 8(%ebp), %edx testl %edx, %edx jle...

[LLVMdev] equivalent IR, different asm

2010 Sep 01

5

[LLVMdev] equivalent IR, different asm

...20RenderBoxModelObjectEPNS_10StyleImageE ## BB#0: pushq %r14 pushq %rbx subq $8, %rsp movq %rsi, %rbx movq %rdi, %r14 movq %rdx, %rdi movq %rcx, %rsi callq __ZN7WebCore4viziEPKNS_20RenderBoxModelObjectEPNS_10StyleImageE movq %rax, %rcx shrq $32, %rcx testl %ecx, %ecx je LBB0_2 ## BB#1: imull (%rbx), %eax cltd idivl %ecx movl %eax, (%r14) LBB0_2: addq $8, %rsp popq %rbx popq %r14 ret $ llc opt-fail.ll -o - .section __TEXT,__text,regular,pure_instructions .globl __ZN7WebCore6kolos1ERiS0_PKNS_20RenderBoxModelObjectEPNS_10StyleImageE .align 4, 0x90 __ZN7WebCore6kolos1ERiS0_PKN...

[LLVMdev] Boostrap Failure -- Expected Differences?

2007 Apr 30

0

[LLVMdev] Boostrap Failure -- Expected Differences?

On Apr 27, 2007, at 3:50 PM, David Greene wrote: > The saga continues. > > I've been tracking the interface changes and merging them with > the refactoring work I'm doing. I got as far as building stage3 > of llvm-gcc but the object files from stage2 and stage3 differ: > > > warning: ./cc1-checksum.o differs > warning: ./cc1plus-checksum.o differs > >

[LLVMdev] Area for improvement

2005 Feb 22

0

[LLVMdev] Area for improvement

...S;i++) >>> for (j=0;j<ROWS;j++) >>> b[i][j]='.'; >>> for (i=0;i<COLS;i++) >>> b[i][ROWS]=0; >>> } >>> >>> This generates the following X86 code: >>> imull $7, %ecx, %edx >>> movl %eax, %esi >>> addl %edx, %esi >>> movb $46, (%esi) >>> imull $7, %ecx, %edx >>> movl %eax, %esi >>> addl %edx, %esi >>> leal 1(%esi), %edx >> >> ... (many m...

Where's the optimiser gone? (part 5.b): missed tail calls, and more...

2018 Dec 01

2

Where's the optimiser gone? (part 5.b): missed tail calls, and more...

Compile the following functions with "-O3 -target i386" (see <https://godbolt.org/z/VmKlXL>): long long div(long long foo, long long bar) { return foo / bar; } On the left the generated code; on the right the expected, properly optimised code: div: # @div push ebp | mov ebp, esp | push dword ptr [ebp + 20] | push

[LLVMdev] Folding an insertelt chain

2012 Feb 17

0

[LLVMdev] Folding an insertelt chain

On Feb 17, 2012, at 12:50 AM, Ivan Llopard wrote: > Hello, > > I've added a little combining operation in DAGCombiner to fold a chain of insertelt nodes if that chain is proved to fully overwrite the very first source vector. In which case, I supposed a build_vector is better. It seems to be safe but I don't know if it is correctly implemented or if it is already done somewhere

[LLVMdev] Boostrap Failure -- Expected Differences?

2007 Apr 27

2

[LLVMdev] Boostrap Failure -- Expected Differences?

The saga continues. I've been tracking the interface changes and merging them with the refactoring work I'm doing. I got as far as building stage3 of llvm-gcc but the object files from stage2 and stage3 differ: warning: ./cc1-checksum.o differs warning: ./cc1plus-checksum.o differs (Are the above two ok?) The list below is clearly bad. I think it's every object file in the

Condition removed? Difference between LLVM and GCC on a small testcase

2019 Mar 01

2

Condition removed? Difference between LLVM and GCC on a small testcase

Hello Dev, I have a very simple testcase, which shows strange difference between LLVM and GCC. Does anyone know which optimization pass removes the condition? Thanks! C code: extern void bar(int, int); void foo(int a) { int b, d; if (a > 114) { b = a * 58; } else { d = a * 51; } bar(b, d); } clang.7.0.1 -O2, LLVM generated assembly: 0: 6b c7 3a

[LLVMdev] Failure to optimize ? operator

2011 Dec 14

0

[LLVMdev] Failure to optimize ? operator

On Tue, Dec 13, 2011 at 5:59 AM, Brent Walker <brenthwalker at gmail.com> wrote: > The following seemingly identical functions, get compiled to quite > different machine code. The first is correctly optimized (the > computation of var y is nicely moved into the else branch of the "if" > statement), which the second one is not (the full computation of var y > is

[LLVMdev] equivalent IR, different asm

2010 Sep 01

0

[LLVMdev] equivalent IR, different asm

...; pushq %rbx > subq $8, %rsp > movq %rsi, %rbx > movq %rdi, %r14 > movq %rdx, %rdi > movq %rcx, %rsi > callq __ZN7WebCore4viziEPKNS_20RenderBoxModelObjectEPNS_10StyleImageE > movq %rax, %rcx > shrq $32, %rcx > testl %ecx, %ecx > je LBB0_2 > ## BB#1: > imull (%rbx), %eax > cltd > idivl %ecx > movl %eax, (%r14) > LBB0_2: > addq $8, %rsp > popq %rbx > popq %r14 > ret > > > $ llc opt-fail.ll -o - > > .section __TEXT,__text,regular,pure_instructions > .globl __ZN7WebCore6kolos1ERiS0_PKNS_20RenderBoxMode...

[LLVMdev] Folding an insertelt chain

2012 Feb 17

3

[LLVMdev] Folding an insertelt chain

Hello, I've added a little combining operation in DAGCombiner to fold a chain of insertelt nodes if that chain is proved to fully overwrite the very first source vector. In which case, I supposed a build_vector is better. It seems to be safe but I don't know if it is correctly implemented or if it is already done somewhere else. Please find attached the patch. Regards, Ivan

KNL Assembly Code for Matrix Multiplication

2017 Jul 01

2

KNL Assembly Code for Matrix Multiplication

Thank You, It means vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15] zmm22 will contain 64 bit constant values which are indexes here zmm22=8, 9, 10, 11, 12,13,14,15. not the values loaded from these locations. and zmm2 contains constant 4000. so, vpmuludq zmm14, zmm10, zmm2 ; will multiply the indexes values with 4000, as for array b the stride is 4000. zmm14=

[LLVMdev] IndVar widening in IndVarSimplify causing performance regression on GPU programs

2014 Oct 24

3

[LLVMdev] IndVar widening in IndVarSimplify causing performance regression on GPU programs

Hi, I noticed a significant performance regression (up to 40%) on some internal CUDA benchmarks (a reduced example presented below). The root cause of this regression seems that IndVarSimpilfy widens induction variables assuming arithmetics on wider integer types are as cheap as those on narrower ones. However, this assumption is wrong at least for the NVPTX64 target. Although the NVPTX64 target

search for: imull