search for: imull

Displaying 20 results from an estimated 93 matches for "imull".

Did you mean: smull
2015 Dec 17
2
llvm-3.6 MCAsmParser x64 Error "invalid operand for instruction" when msb set
Hello, I am experiencing problems, when trying to assemble these two x86-64 Opcodes "add r64, imm32" "imul r64, r64, imm32" When having the most significant bit set for imm32, for example: "add rax, 0x80000000", "add rax, 0xffffffff", ... "imul rbx, rsi, 0x80000000", "imul rbx, rsi, 0xffffffff", ... The Error Message I receive is the
2014 Jul 13
2
[LLVMdev] IMUL x86 instruction
Hi, The x86 CPU IMUL instruction has forms such as: IMUL reg EDX:EAX ← EAX ∗ reg reg, EAX and EDX are 32bit registers. How can I represent this sort of instruction in LLVM IR ? It is really a 32bit * 32 bit = 64 bit, but no LLVM IR exists to do that. Or, a similar question: What LLVM IR would produce this IMUL instruction form? For context, I am writing a x86 to LLVM IR decompiler, so wish to
2018 Dec 01
2
Where's the optimiser gone? (part 5.c): missed tail calls, and more...
Compile the following functions with "-O3 -target i386-win32" (see <https://godbolt.org/z/exmjWY>): __int64 __fastcall div(__int64 foo, __int64 bar) { return foo / bar; } On the left the generated code; on the right the expected, properly optimised code: push dword ptr [esp + 16] | push dword ptr [esp + 16] | push dword ptr [esp + 16] |
2005 Feb 22
5
[LLVMdev] Area for improvement
...is generates the following X86 code: .text .align 16 .globl init_board .type init_board, @function init_board: subl $4, %esp movl %esi, (%esp) movl 8(%esp), %eax movl $0, %ecx .LBBinit_board_1: # loopexit.1 imull $7, %ecx, %edx movl %eax, %esi addl %edx, %esi movb $46, (%esi) imull $7, %ecx, %edx movl %eax, %esi addl %edx, %esi leal 1(%esi), %edx movb $46, (%edx) imull $7, %ecx, %edx movl %eax, %esi addl %edx, %esi...
2005 Feb 22
0
[LLVMdev] Area for improvement
...t > .align 16 > .globl init_board > .type init_board, @function > init_board: > subl $4, %esp > movl %esi, (%esp) > movl 8(%esp), %eax > movl $0, %ecx > .LBBinit_board_1: # loopexit.1 > imull $7, %ecx, %edx > movl %eax, %esi > addl %edx, %esi > movb $46, (%esi) > imull $7, %ecx, %edx > movl %eax, %esi > addl %edx, %esi > leal 1(%esi), %edx > movb $46, (%edx) > imull $7, %ecx, %edx >...
2005 Feb 22
0
[LLVMdev] Area for improvement
...WS+1]) > { > int i,j; > > for (i=0;i<COLS;i++) > for (j=0;j<ROWS;j++) > b[i][j]='.'; > for (i=0;i<COLS;i++) > b[i][ROWS]=0; > } > > This generates the following X86 code: > imull $7, %ecx, %edx > movl %eax, %esi > addl %edx, %esi > movb $46, (%esi) > imull $7, %ecx, %edx > movl %eax, %esi > addl %edx, %esi > leal 1(%esi), %edx ... (many many copies of this, see the end of the email for full output) ... > Th...
2005 Feb 22
2
[LLVMdev] Area for improvement
...t; >> for (i=0;i<COLS;i++) >> for (j=0;j<ROWS;j++) >> b[i][j]='.'; >> for (i=0;i<COLS;i++) >> b[i][ROWS]=0; >> } >> >> This generates the following X86 code: >> imull $7, %ecx, %edx >> movl %eax, %esi >> addl %edx, %esi >> movb $46, (%esi) >> imull $7, %ecx, %edx >> movl %eax, %esi >> addl %edx, %esi >> leal 1(%esi), %edx > > ... (many many copies of this, see the end of th...
2011 Dec 14
2
[LLVMdev] Failure to optimize ? operator
...e for the two functions: ============================================== _f1:        pushl   %ebp        xorl    %eax, %eax        movl %esp, %ebp        movl    8(%ebp), %edx        testl   %edx, %edx   jle     L5        popl    %ebp        ret        .p2align 4,,7L5:     movl    %edx, %ecx        imull   %edx, %ecx        popl    %ebp     leal    3(%ecx,%ecx,4), %eax        imull   %edx, %eax leal    1(%eax,%ecx,2), %eax        ret        .p2align 4,,15 _f2:         pushl   %ebp        xorl    %eax, %eax        movl    %esp, %ebp        movl    8(%ebp), %edx        testl   %edx, %edx        jle...
2010 Sep 01
5
[LLVMdev] equivalent IR, different asm
...20RenderBoxModelObjectEPNS_10StyleImageE ## BB#0: pushq %r14 pushq %rbx subq $8, %rsp movq %rsi, %rbx movq %rdi, %r14 movq %rdx, %rdi movq %rcx, %rsi callq __ZN7WebCore4viziEPKNS_20RenderBoxModelObjectEPNS_10StyleImageE movq %rax, %rcx shrq $32, %rcx testl %ecx, %ecx je LBB0_2 ## BB#1: imull (%rbx), %eax cltd idivl %ecx movl %eax, (%r14) LBB0_2: addq $8, %rsp popq %rbx popq %r14 ret $ llc opt-fail.ll -o - .section __TEXT,__text,regular,pure_instructions .globl __ZN7WebCore6kolos1ERiS0_PKNS_20RenderBoxModelObjectEPNS_10StyleImageE .align 4, 0x90 __ZN7WebCore6kolos1ERiS0_PKN...
2007 Apr 30
0
[LLVMdev] Boostrap Failure -- Expected Differences?
On Apr 27, 2007, at 3:50 PM, David Greene wrote: > The saga continues. > > I've been tracking the interface changes and merging them with > the refactoring work I'm doing. I got as far as building stage3 > of llvm-gcc but the object files from stage2 and stage3 differ: > > > warning: ./cc1-checksum.o differs > warning: ./cc1plus-checksum.o differs > >
2005 Feb 22
0
[LLVMdev] Area for improvement
...S;i++) >>> for (j=0;j<ROWS;j++) >>> b[i][j]='.'; >>> for (i=0;i<COLS;i++) >>> b[i][ROWS]=0; >>> } >>> >>> This generates the following X86 code: >>> imull $7, %ecx, %edx >>> movl %eax, %esi >>> addl %edx, %esi >>> movb $46, (%esi) >>> imull $7, %ecx, %edx >>> movl %eax, %esi >>> addl %edx, %esi >>> leal 1(%esi), %edx >> >> ... (many m...
2018 Dec 01
2
Where's the optimiser gone? (part 5.b): missed tail calls, and more...
Compile the following functions with "-O3 -target i386" (see <https://godbolt.org/z/VmKlXL>): long long div(long long foo, long long bar) { return foo / bar; } On the left the generated code; on the right the expected, properly optimised code: div: # @div push ebp | mov ebp, esp | push dword ptr [ebp + 20] | push
2012 Feb 17
0
[LLVMdev] Folding an insertelt chain
On Feb 17, 2012, at 12:50 AM, Ivan Llopard wrote: > Hello, > > I've added a little combining operation in DAGCombiner to fold a chain of insertelt nodes if that chain is proved to fully overwrite the very first source vector. In which case, I supposed a build_vector is better. It seems to be safe but I don't know if it is correctly implemented or if it is already done somewhere
2007 Apr 27
2
[LLVMdev] Boostrap Failure -- Expected Differences?
The saga continues. I've been tracking the interface changes and merging them with the refactoring work I'm doing. I got as far as building stage3 of llvm-gcc but the object files from stage2 and stage3 differ: warning: ./cc1-checksum.o differs warning: ./cc1plus-checksum.o differs (Are the above two ok?) The list below is clearly bad. I think it's every object file in the
2019 Mar 01
2
Condition removed? Difference between LLVM and GCC on a small testcase
Hello Dev, I have a very simple testcase, which shows strange difference between LLVM and GCC. Does anyone know which optimization pass removes the condition? Thanks! C code: extern void bar(int, int); void foo(int a) { int b, d; if (a > 114) { b = a * 58; } else { d = a * 51; } bar(b, d); } clang.7.0.1 -O2, LLVM generated assembly: 0: 6b c7 3a
2011 Dec 14
0
[LLVMdev] Failure to optimize ? operator
On Tue, Dec 13, 2011 at 5:59 AM, Brent Walker <brenthwalker at gmail.com> wrote: > The following seemingly identical functions, get compiled to quite > different machine code.  The first is correctly optimized (the > computation of var y is nicely moved into the else branch of the "if" > statement), which the second one is not (the full computation of var y > is
2010 Sep 01
0
[LLVMdev] equivalent IR, different asm
...; pushq %rbx > subq $8, %rsp > movq %rsi, %rbx > movq %rdi, %r14 > movq %rdx, %rdi > movq %rcx, %rsi > callq __ZN7WebCore4viziEPKNS_20RenderBoxModelObjectEPNS_10StyleImageE > movq %rax, %rcx > shrq $32, %rcx > testl %ecx, %ecx > je LBB0_2 > ## BB#1: > imull (%rbx), %eax > cltd > idivl %ecx > movl %eax, (%r14) > LBB0_2: > addq $8, %rsp > popq %rbx > popq %r14 > ret > > > $ llc opt-fail.ll -o - > > .section __TEXT,__text,regular,pure_instructions > .globl __ZN7WebCore6kolos1ERiS0_PKNS_20RenderBoxMode...
2012 Feb 17
3
[LLVMdev] Folding an insertelt chain
Hello, I've added a little combining operation in DAGCombiner to fold a chain of insertelt nodes if that chain is proved to fully overwrite the very first source vector. In which case, I supposed a build_vector is better. It seems to be safe but I don't know if it is correctly implemented or if it is already done somewhere else. Please find attached the patch. Regards, Ivan
2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
Thank You, It means vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15] zmm22 will contain 64 bit constant values which are indexes here zmm22=8, 9, 10, 11, 12,13,14,15. not the values loaded from these locations. and zmm2 contains constant 4000. so, vpmuludq zmm14, zmm10, zmm2 ; will multiply the indexes values with 4000, as for array b the stride is 4000. zmm14=
2014 Oct 24
3
[LLVMdev] IndVar widening in IndVarSimplify causing performance regression on GPU programs
Hi, I noticed a significant performance regression (up to 40%) on some internal CUDA benchmarks (a reduced example presented below). The root cause of this regression seems that IndVarSimpilfy widens induction variables assuming arithmetics on wider integer types are as cheap as those on narrower ones. However, this assumption is wrong at least for the NVPTX64 target. Although the NVPTX64 target