thr3ads.net - search: "incq"

2015 Aug 08

2

RFC: PGO Late instrumentation for LLVM

...llbacks) >> can incur overhead due to indirect call target profiling. >> >> >> 1.1 Redundant Counter Update >> >> If we check the assembly of the instrumented binary generated by the current >> LLVM implementation, we can find many sequences of consecutive 'incq' >> instructions that update different counters in the same basic block. As >> an example extracted from a real binary: >> ... >> incq 0xa91d80(%rip) # 14df4b8 >> <__llvm_profile_counters__ZN13LowLevelAlloc5ArenaC2Ev+0x1b8> >> incq...

RFC: PGO Late instrumentation for LLVM

2015 Aug 10

3

RFC: PGO Late instrumentation for LLVM

...target profiling. > >>> > >>> > >>> 1.1 Redundant Counter Update > >>> > >>> If we check the assembly of the instrumented binary generated by the > current > >>> LLVM implementation, we can find many sequences of consecutive 'incq' > >>> instructions that update different counters in the same basic > block. As > >>> an example extracted from a real binary: > >>> ... > >>> incq 0xa91d80(%rip) # 14df4b8 > >>> <__llvm_profile_counters__ZN13L...

RFC: PGO Late instrumentation for LLVM

2015 Aug 08

3

RFC: PGO Late instrumentation for LLVM

.... Small and hot callee functions taking function pointers (callbacks) can incur overhead due to indirect call target profiling. 1.1 Redundant Counter Update If we check the assembly of the instrumented binary generated by the current LLVM implementation, we can find many sequences of consecutive 'incq' instructions that update different counters in the same basic block. As an example extracted from a real binary: ... incq 0xa91d80(%rip) # 14df4b8 <__llvm_profile_counters__ZN13LowLevelAlloc5ArenaC2Ev+0x1b8> incq 0xa79011(%rip) # 14c6750 <__llvm_profile_cou...

AVX2 codegen - question reg. FMA generation

2019 Sep 02

3

AVX2 codegen - question reg. FMA generation

...'s the snippet in the output it generates: $ llc -O3 -mcpu=skylake --------------------- .LBB0_2: # =>This Inner Loop Header: Depth=1 vbroadcastss (%rsi,%rdx,4), %ymm0 vmulps (%rdi,%rcx), %ymm0, %ymm0 vaddps (%rax,%rcx), %ymm0, %ymm0 vmovups %ymm0, (%rax,%rcx) incq %rdx addq $32, %rcx cmpq $15, %rdx jle .LBB0_2 ----------------------- $ llc --version LLVM (http://llvm.org/): LLVM version 8.0.0 Optimized build. Default target: x86_64-unknown-linux-gnu Host CPU: skylake (llvm commit 198009ae8db11d7c0b0517f17358870dc486fcfb from Aug 31) Using opt -O3 f...

[LLVMdev] asan coverage

2014 Feb 18

2

[LLVMdev] asan coverage

...ompared it with AsanCoverage. AsanCoverage produces code like this: mov 0xe86cce(%rip),%al test %al,%al je 48b4a0 # to call __sanitizer_cov ... callq 4715b0 <__sanitizer_cov> A simple counter-based thing (which just increments counters and does nothing else useful) produces this: incq 0xe719c6(%rip) The performance is more or less the same, although the issue with false sharing still remains (http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-October/066116.html) Do you have any more details about the planned clang coverage? Thanks, --kcc On Tue, Feb 18, 2014 at 1:00 PM, K...

[LLVMdev] asan coverage

2014 Feb 19

2

[LLVMdev] asan coverage

...produces code like this: > mov 0xe86cce(%rip),%al > test %al,%al > je 48b4a0 # to call __sanitizer_cov > ... > callq 4715b0 <__sanitizer_cov> > > A simple counter-based thing (which just increments counters and does > nothing else useful) produces this: > incq 0xe719c6(%rip) > > The performance is more or less the same, although the issue with false > sharing still remains > (http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-October/066116.html) > > Do you have any more details about the planned clang coverage? > > Thanks, >...

[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)

2014 Apr 18

4

[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)

On Fri, Apr 18, 2014 at 12:13 AM, Dmitry Vyukov <dvyukov at google.com> wrote: > Hi, > > This is a long thread, so I will combine several comments into a single email. > > > >> - 8-bit per-thread counters, dumping into central counters on overflow. > >The overflow will happen very quickly with an 8-bit counter. > > Yes, but it reduces contention by 256x (a thread
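
The idea quoted in this excerpt (small per-thread counters that spill into a shared counter when they overflow) can be sketched in C11 as follows. This is only an illustration of the general technique, not code from the thread or from LLVM's profile runtime; the names `central_counter`, `local_counter`, `count_event`, `flush_local` and `read_central` are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical names; this is not LLVM's profile runtime. */
static _Atomic uint64_t central_counter;     /* shared between threads */
static _Thread_local uint8_t local_counter;  /* private to each thread */

/* Count one event. Shared memory is touched only once per 256 events,
 * when the 8-bit local counter wraps from 255 back to 0. */
static void count_event(void) {
    if (++local_counter == 0) {
        atomic_fetch_add_explicit(&central_counter, 256,
                                  memory_order_relaxed);
    }
}

/* Fold whatever the calling thread has not yet spilled into the
 * central counter, e.g. before the thread exits. */
static void flush_local(void) {
    atomic_fetch_add_explicit(&central_counter, local_counter,
                              memory_order_relaxed);
    local_counter = 0;
}

/* Read the shared total (excludes unflushed per-thread residue). */
static uint64_t read_central(void) {
    return atomic_load_explicit(&central_counter, memory_order_relaxed);
}
```

Under these assumptions each thread writes to the shared cache line once per 256 events instead of on every event, which is where the "256x" figure in the excerpt comes from. The cost is that the central value lags by up to 255 events per thread until `flush_local` runs.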

2017 Jul 17

2

A bug related with undef value when bootstrap MemorySSA.cpp

....LBB1_1: # =>This Inner Loop Header: Depth=1 86 testb $1, %sil 87 je .LBB1_3 88 # BB#2: # in Loop: Header=BB1_1 Depth=1 89 movq b(%rip), %rsi 90 addq %rax, %rsi 91 movq %rsi, c(%rip) 92 movq $3, i_hasval(%rip) 93 incq %rdx 94 xorl %esi, %esi 95 cmpq %rcx, %rdx 96 jl .LBB1_1 97 .LBB1_3: 98 retq ``` IMHO, enhancing `isGuaranteedNotToBeUndefOrPoison` and using it as a precondition in loop unswitching is not enough. undef (and poison) value can be stored into memory, and also be passed by a function arg...

[LLVMdev] Can LLVM vectorize <2 x i32> type

2015 Jun 26

2

[LLVMdev] Can LLVM vectorize <2 x i32> type

...label %middle.block, label %vector.ph The corresponding assembly code is: # BB#3: # %for.cond.preheader imull %r9d, %ebx testl %ebx, %ebx jle .LBB10_63 # BB#4: # %for.body.preheader leal -1(%rbx), %eax incq %rax xorl %edx, %edx movabsq $8589934584, %rcx # imm = 0x1FFFFFFF8 andq %rax, %rcx je .LBB10_8 I changed all the scalar operands to <2 x ValueType> ones. The IR becomes the following for.cond.preheader: ; preds = %if.end18 %...

2017 Jul 17

3

A bug related with undef value when bootstrap MemorySSA.cpp

...>> 86 testb $1, %sil >> 87 je .LBB1_3 >> 88 # BB#2: # in Loop: Header=BB1_1 >> Depth=1 >> 89 movq b(%rip), %rsi >> 90 addq %rax, %rsi >> 91 movq %rsi, c(%rip) >> 92 movq $3, i_hasval(%rip) >> 93 incq %rdx >> 94 xorl %esi, %esi >> 95 cmpq %rcx, %rdx >> 96 jl .LBB1_1 >> 97 .LBB1_3: >> 98 retq >> ``` >> >> IMHO, enhancing `isGuaranteedNotToBeUndefOrPoison` and using it as a >> precondition in loop unswitching is >> not enough....

[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops

2014 Jul 23

4

[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops

...fferent (not just inlined) to the_func clang -DITER -O2 clang -DITER -O3 gives: the_func: leaq 12(%rdi), %rcx leaq 4(%rdi), %rax cmpq %rax, %rcx cmovaq %rcx, %rax movq %rdi, %rsi notq %rsi addq %rax, %rsi shrq $2, %rsi incq %rsi xorl %edx, %edx movabsq $9223372036854775800, %rax # imm = 0x7FFFFFFFFFFFFFF8 andq %rsi, %rax pxor %xmm0, %xmm0 je .LBB0_1 # BB#2: # %vector.body.preheader leaq (%rdi,%rax,4), %r8 addq $16, %rdi...

2017 Jul 17

3

A bug related with undef value when bootstrap MemorySSA.cpp

...>> 88 # BB#2: # in Loop: Header=BB1_1 >> >> Depth=1 >> >> 89 movq b(%rip), %rsi >> >> 90 addq %rax, %rsi >> >> 91 movq %rsi, c(%rip) >> >> 92 movq $3, i_hasval(%rip) >> >> 93 incq %rdx >> >> 94 xorl %esi, %esi >> >> 95 cmpq %rcx, %rdx >> >> 96 jl .LBB1_1 >> >> 97 .LBB1_3: >> >> 98 retq >> >> ``` >> >> >> >> IMHO, enhancing `isGuaranteedNotToBeUndefOrPoison` and using i...

[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)

2014 Apr 23

4

[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)

...) { v[0] = 42; } > > Here we have a single basic block and a call, but since the coverage is emitted by the > FE before inlining (and is also emitted for std::vector methods) we get this assembler at -O2: > 0000000000400b90 <_Z3foov>: > 400b90: 48 ff 05 11 25 20 00 incq 0x202511(%rip) # 6030a8 <__llvm_profile_counters__Z3foov> > 400b97: 48 ff 05 42 25 20 00 incq 0x202542(%rip) # 6030e0 <__llvm_profile_counters__ZNSt6vectorIiSaIiEEixEm> > 400b9e: 48 8b 05 4b 26 20 00 mov 0x20264b(%rip),%rax # 6031f...

[LLVMdev] Bignum development

2010 Jun 13

2

[LLVMdev] Bignum development

...t; # => This Inner Loop Header: Depth=2 > addq (%rbx,%rsi,8), %rdi > movl $0, %r8d > adcq $0, %r8 > addq (%r14,%rsi,8), %rdi > adcq $0, %r8 > movq %rdi, (%r15,%rsi,8) > incq %rsi > cmpq $1000, %rsi # imm = 0x3E8 > movq %r8, %rdi > jne .LBB1_7 > > So it basically tries to keep track of the carry in %r8 instead of in > the carry flag. > > As hinted, the other optimisation missed here, is that instead o...

[LLVMdev] Bignum development

2010 Jun 12

0

[LLVMdev] Bignum development

...# Parent Loop BB1_6 Depth=1 # => This Inner Loop Header: Depth=2 addq (%rbx,%rsi,8), %rdi movl $0, %r8d adcq $0, %r8 addq (%r14,%rsi,8), %rdi adcq $0, %r8 movq %rdi, (%r15,%rsi,8) incq %rsi cmpq $1000, %rsi # imm = 0x3E8 movq %r8, %rdi jne .LBB1_7 So it basically tries to keep track of the carry in %r8 instead of in the carry flag. As hinted, the other optimisation missed here, is that instead of comparing with $1000 it can start...
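
The inner loop in this excerpt appears to add two multi-word integers one 64-bit limb at a time, rebuilding the carry in a general-purpose register (`%r8`) after each `addq`. The thread's source code is not shown in the excerpt, so the following C is only a sketch of a loop with that shape; `add_limbs` and its parameters are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst = a + b over n 64-bit limbs, least
 * significant limb first. Returns the carry out of the top limb. */
static uint64_t add_limbs(uint64_t *dst, const uint64_t *a,
                          const uint64_t *b, size_t n) {
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t sum = carry + a[i];
        uint64_t next = sum < carry;  /* carry out of the first add */
        sum += b[i];
        next += sum < b[i];           /* carry out of the second add */
        dst[i] = sum;
        carry = next;                 /* carry lives in a register */
    }
    return carry;
}
```

Because portable C has no way to name the carry flag, each carry is recovered with an unsigned comparison. That corresponds to the `adcq $0, %r8` pairs in the listing, rather than a single `adcq` chained from limb to limb.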

2017 Jul 18

4

A bug related with undef value when bootstrap MemorySSA.cpp

...# in Loop: Header=BB1_1 >>>> >> Depth=1 >>>> >> 89 movq b(%rip), %rsi >>>> >> 90 addq %rax, %rsi >>>> >> 91 movq %rsi, c(%rip) >>>> >> 92 movq $3, i_hasval(%rip) >>>> >> 93 incq %rdx >>>> >> 94 xorl %esi, %esi >>>> >> 95 cmpq %rcx, %rdx >>>> >> 96 jl .LBB1_1 >>>> >> 97 .LBB1_3: >>>> >> 98 retq >>>> >> ``` >>>> >> >>>> >>...

[LLVMdev] tail call optimization question

2011 Dec 22

1

[LLVMdev] tail call optimization question

...## %if.no movq %rdi, %rbx testq %rsi, %rsi jle LBB1_4 ## BB#2: ## %if.no2 decq %rsi movq %rbx, %rdi callq _ack.15 movq %rbx, %rdi decq %rdi movq %rax, %rsi popq %rbx jmp _ack.15 ## TAILCALL LBB1_3: ## %if.yes incq %rsi movq %rsi, %rax popq %rbx ret LBB1_4: ## %if.yes1 movq %rbx, %rdi decq %rdi movl $1, %esi popq %rbx jmp _ack.15 ## TAILCALL Leh_func_end1: <snip> Thanks very much, N

[LLVMdev] Bignum development

2010 Jun 13

0

[LLVMdev] Bignum development

...# => This Inner Loop Header: Depth=2 >> addq (%rbx,%rsi,8), %rdi >> movl $0, %r8d >> adcq $0, %r8 >> addq (%r14,%rsi,8), %rdi >> adcq $0, %r8 >> movq %rdi, (%r15,%rsi,8) >> incq %rsi >> cmpq $1000, %rsi # imm = 0x3E8 >> movq %r8, %rdi >> jne .LBB1_7 >> >> So it basically tries to keep track of the carry in %r8 instead of in >> the carry flag. >> >> As hinted, the other optimisat...

[LLVMdev] asan coverage

2014 Feb 18

2

[LLVMdev] asan coverage

On Feb 17, 2014, at 5:13 AM, Kostya Serebryany <kcc at google.com> wrote: > Then my question: will there be any objection if I disentangle AsanCoverage from ASan and make it a separate LLVM phase with the proper clang driver support? > Or it will be an unwelcome competition with the planned clang coverage? I don’t view it as a competition, but assuming that we both succeed in our

[LLVMdev] better code for IV

2014 Feb 19

2

[LLVMdev] better code for IV

....LBB1_1: # %L_entry # =>This Inner Loop Header: Depth=1 movslq %eax, %r8 movss (%rdi,%r8,4), %xmm0 addss (%rsi,%r8,4), %xmm0 movss %xmm0, (%rdx,%r8,4) incq %rax cmpq %rax, %rcx jne .LBB1_1 # BB#2: Ret ...
