Displaying 20 results from an estimated 32 matches for "andq".
Did you mean:
and
2015 Jan 19
2
[LLVMdev] X86TargetLowering::LowerToBT
Which BTQ? There are three flavors.
BTQ reg/reg
BTQ reg/mem
BTQ reg/imm
I can imagine that the reg/reg and especially the reg/mem versions would be
slow. However the shrq/and versions *with the same operands* would be slow
as well. There's even a compiler comment about the reg/mem version saying
"this is for disassembly only".
But I doubt BTQ reg/imm would be microcoded.
--
Ite
2015 Jan 19
2
[LLVMdev] X86TargetLowering::LowerToBT
Sure. Attached is the file but here are the functions. The first uses a
fixed bit offset. The second has an indexed bit offset. Compiling with llc
-O3, LLVM version 3.7.0svn, it compiles the IR from IsBitSetB() using btq %rsi,
%rdi. Good. But then it compiles IsBitSetA() with shrq/andq, which is
pretty much what Clang had generated as IR.
shrq $25, %rdi
andq $1, %rdi
LLVM should be able to replace these two with a single X86_64 instruction:
btq reg,25
The generated code is correct in both cases. It just isn't optimized in the
immediate operand case.
unsigned long long...
2014 Jul 23
4
[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops
..., %rcx
leaq 4(%rdi), %rax
cmpq %rax, %rcx
cmovaq %rcx, %rax
movq %rdi, %rsi
notq %rsi
addq %rax, %rsi
shrq $2, %rsi
incq %rsi
xorl %edx, %edx
movabsq $9223372036854775800, %rax # imm = 0x7FFFFFFFFFFFFFF8
andq %rsi, %rax
pxor %xmm0, %xmm0
je .LBB0_1
# BB#2: # %vector.body.preheader
leaq (%rdi,%rax,4), %r8
addq $16, %rdi
movq %rsi, %rdx
andq $-8, %rdx
pxor %xmm0, %xmm0
pxor %xmm1, %xmm1
.align...
2015 Mar 03
2
[LLVMdev] Need a clue to improve the optimization of some C code
...Thanks for any feedback.
Ciao
Nat!
P.S. In case someone is interested, here is the assembler code and the IR that produced it.
Relevant LLVM generated x86_64 assembler portion with -Os
~~~
testq %r12, %r12
je LBB0_5
## BB#1:
movq -8(%r12), %rcx
movq (%rcx), %rax
movq -8(%rax), %rdx
andq %r15, %rdx
cmpq %r15, (%rax,%rdx)
je LBB0_2
## BB#3:
addq $8, %rcx
jmp LBB0_4
LBB0_2:
leaq 8(%rdx,%rax), %rcx
LBB0_4:
movq %r12, %rdi
movq %r15, %rsi
movq %r14, %rdx
callq *(%rcx)
movq %rax, %rbx
LBB0_5:
~~~
Better/tighter assembler code would be (saves 2 instructions, one jump less)
~~~...
2011 Jul 12
0
[LLVMdev] GCC Atomic NAND implementation
Hey Guys,
I have a newbie question about supporting the GNU atomic
builtin, __sync_fetch_and_nand. It appears that LLVM 29 produces X86
assembly like the GCC versions below v4.4, i.e.
NEGATE and AND
notq %rax
movq 48(%rsp), %rcx
andq %rcx, %rax
I'm looking to produce X86 assembly like GCC v4.4 and greater, i.e.
NOT AND
movq 48(%rsp), %rcx
andq %rcx, %rax
notq %rax
I currently have custom code to make the switch between implementations, but
it's invasive at best. Has the newer...
2017 Jun 06
4
LLD support for ld64 mach-o linker synthesised symbols
...$__section
- section$end$__SEGMENT$__section
In asm:
/* get imagebase and slide for static PIE and ASLR support in x86_64-xnu-musl */
.align 3
__image_base:
.quad segment$start$__TEXT
__start_static:
.quad start
.text
.align 3
.global start
start:
xor %rbp,%rbp
mov %rsp,%rdi
andq $-16,%rsp
movq __image_base(%rip), %rsi
leaq start(%rip), %rdx
subq __start_static(%rip), %rdx
call __start_c
In C:
/* run C++ constructors in __libc_start_main for x86_64-xnu-musl */
typedef void (*__init_fn)(int, char **, char **, char **);
extern __init_fn __init_...
2015 Jan 19
6
[LLVMdev] X86TargetLowering::LowerToBT
I'm tracking down an X86 code generation malfeasance regarding BT (bit
test) and I have some questions.
This IR *matches* and then *X86TargetLowering::LowerToBT **is called:*
%and = and i64 %shl, %val * ; (val & (1 << index)) != 0 ; *bit test
with a *register* index
This IR *does not match* and so *X86TargetLowering::LowerToBT **is not
called:*
%and = lshr i64 %val, 25
2015 Jun 26
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
...# %for.cond.preheader
imull %r9d, %ebx
testl %ebx, %ebx
jle .LBB10_63
# BB#4: # %for.body.preheader
leal -1(%rbx), %eax
incq %rax
xorl %edx, %edx
movabsq $8589934584, %rcx # imm = 0x1FFFFFFF8
andq %rax, %rcx
je .LBB10_8
I changed all the scalar operands to <2 x ValueType> ones. The IR becomes
the following
for.cond.preheader: ; preds = %if.end18
%mulS44_D = mul <2 x i32> %splatLDS24_D.splat, %splatLDS7_D.splat
%cmp21128S45_D = icmp sgt...
2017 Jun 06
2
LLD support for ld64 mach-o linker synthesised symbols
...gebase and slide for static PIE and ASLR support in x86_64-xnu-musl */
>
> .align 3
> __image_base:
> .quad segment$start$__TEXT
> __start_static:
> .quad start
> .text
> .align 3
> .global start
> start:
> xor %rbp,%rbp
> mov %rsp,%rdi
> andq $-16,%rsp
> movq __image_base(%rip), %rsi
> leaq start(%rip), %rdx
> subq __start_static(%rip), %rdx
> call __start_c
>
> In C:
>
> /* run C++ constructors in __libc_start_main for x86_64-xnu-musl */
>
> typedef void (*__init_fn)(int, cha...
2015 Jan 22
2
[LLVMdev] X86TargetLowering::LowerToBT
...>
>>> Sure. Attached is the file but here are the functions. The first uses a fixed bit offset. The second has an indexed bit offset. Compiling with llc -O3, LLVM version 3.7.0svn, it compiles the IR from IsBitSetB() using btq %rsi, %rdi. Good. But then it compiles IsBitSetA() with shrq/andq, which is pretty much what Clang had generated as IR.
>>>
>>> shrq $25, %rdi
>>> andq $1, %rdi
>>>
>>> LLVM should be able to replace these two with a single X86_64 instruction: btq reg,25
>>> The generated code is correct in both cases. It...
2015 Mar 03
2
[LLVMdev] Need a clue to improve the optimization of some C code
...d the IR that produced it.
>>
>>
>>
>> Relevant LLVM generated x86_64 assembler portion with -Os
>> ~~~
>> testq %r12, %r12
>> je LBB0_5
>> ## BB#1:
>> movq -8(%r12), %rcx
>> movq (%rcx), %rax
>> movq -8(%rax), %rdx
>> andq %r15, %rdx
>> cmpq %r15, (%rax,%rdx)
>> je LBB0_2
>> ## BB#3:
>> addq $8, %rcx
>> jmp LBB0_4
>> LBB0_2:
>> leaq 8(%rdx,%rax), %rcx
>> LBB0_4:
>> movq %r12, %rdi
>> movq %r15, %rsi
>> movq %r14, %rdx
>> callq *(%rcx)
&g...
2012 Nov 20
12
[PATCH v2 00/11] xen: Initial kexec/kdump implementation
Hi,
This set of patches contains initial kexec/kdump implementation for Xen v2
(previous version were posted to few people by mistake; sorry for that).
Currently only dom0 is supported, however, almost all infrastructure
required for domU support is ready.
Jan Beulich suggested to merge Xen x86 assembler code with baremetal x86 code.
This could simplify and reduce a bit size of kernel code.
2012 Nov 20
12
[PATCH v2 00/11] xen: Initial kexec/kdump implementation
Hi,
This set of patches contains initial kexec/kdump implementation for Xen v2
(previous version were posted to few people by mistake; sorry for that).
Currently only dom0 is supported, however, almost all infrastructure
required for domU support is ready.
Jan Beulich suggested to merge Xen x86 assembler code with baremetal x86 code.
This could simplify and reduce a bit size of kernel code.
2012 Nov 20
12
[PATCH v2 00/11] xen: Initial kexec/kdump implementation
Hi,
This set of patches contains initial kexec/kdump implementation for Xen v2
(previous version were posted to few people by mistake; sorry for that).
Currently only dom0 is supported, however, almost all infrastructure
required for domU support is ready.
Jan Beulich suggested to merge Xen x86 assembler code with baremetal x86 code.
This could simplify and reduce a bit size of kernel code.
2017 Jun 07
3
LLD support for ld64 mach-o linker synthesised symbols
...x86_64-xnu-musl */
>>
>> .align 3
>> __image_base:
>> .quad segment$start$__TEXT
>> __start_static:
>> .quad start
>> .text
>> .align 3
>> .global start
>> start:
>> xor %rbp,%rbp
>> mov %rsp,%rdi
>> andq $-16,%rsp
>> movq __image_base(%rip), %rsi
>> leaq start(%rip), %rdx
>> subq __start_static(%rip), %rdx
>> call __start_c
>>
>>
>> In C:
>>
>> /* run C++ constructors in __libc_start_main for x86_64-xnu-musl */
>&g...
2015 Feb 03
2
[LLVMdev] RFC: Constant Hoisting
I've had a bug/pessimization which I've tracked down for 1 bit bitmasks:
if (((xx) & (1ULL << (40))))
return 1;
if (!((yy) & (1ULL << (40))))
...
The second time Constant Hoisting sees the value (1<<40) it wraps it up
with a bitcast.
That value then gets hoisted. However, the first (1<<40) is not bitcast and
gets recognized
as a BT. The second
2007 Apr 18
0
[RFC/PATCH PV_OPS X86_64 03/17] paravirt_ops - system routines
...__("movq %%cr4,%%rax\n\t"
- "orq %0,%%rax\n\t"
- "movq %%rax,%%cr4\n"
- : : "irg" (mask)
- :"ax");
-}
-
-static inline void clear_in_cr4 (unsigned long mask)
-{
- mmu_cr4_features &= ~mask;
- __asm__("movq %%cr4,%%rax\n\t"
- "andq %0,%%rax\n\t"
- "movq %%rax,%%cr4\n"
- : : "irg" (~mask)
- :"ax");
-}
-
-
-/*
* User space process size. 47bits minus one guard page.
*/
#define TASK_SIZE64 (0x800000000000UL - 4096)
@@ -299,6 +270,10 @@ struct thread_struct {
set_fs(USER_DS); \...
2007 Apr 18
0
[RFC/PATCH PV_OPS X86_64 03/17] paravirt_ops - system routines
...__("movq %%cr4,%%rax\n\t"
- "orq %0,%%rax\n\t"
- "movq %%rax,%%cr4\n"
- : : "irg" (mask)
- :"ax");
-}
-
-static inline void clear_in_cr4 (unsigned long mask)
-{
- mmu_cr4_features &= ~mask;
- __asm__("movq %%cr4,%%rax\n\t"
- "andq %0,%%rax\n\t"
- "movq %%rax,%%cr4\n"
- : : "irg" (~mask)
- :"ax");
-}
-
-
-/*
* User space process size. 47bits minus one guard page.
*/
#define TASK_SIZE64 (0x800000000000UL - 4096)
@@ -299,6 +270,10 @@ struct thread_struct {
set_fs(USER_DS); \...
2017 May 04
4
Xen package security updates for jessie 4.4, XSA-213, XSA-214
...RCX, R11, [DS-GS,] [ERRCODE,] RIP, CS, RFLAGS, RSP, SS } */
+ /* %rdx: trap_bounce, %rbx: struct vcpu */
+ /* On return only %rbx and %rdx are guaranteed non-clobbered. */
+ create_bounce_frame:
+@@ -366,7 +366,7 @@ create_bounce_frame:
+ 2: andq $~0xf,%rsi # Stack frames are 16-byte aligned.
+ movq $HYPERVISOR_VIRT_START,%rax
+ cmpq %rax,%rsi
+- movq $HYPERVISOR_VIRT_END+60,%rax
++ movq $HYPERVISOR_VIRT_END+12*8,%rax
+ sbb %ecx,%ecx # In +ve address space? Then oka...
2015 Jun 24
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
Hi,
Is LLVM be able to generate code for the following code?
%mul = mul <2 x i32> %1, %2, where %1 and %2 are <2 x i32> type.
I am running it on a Haswell processor with LLVM-3.4.2. It seems that it
will generates really complicated code with vpaddq, vpmuludq, vpsllq,
vpsrlq.
Thanks,
Zhi
-------------- next part --------------
An HTML attachment was scrubbed...
URL: