thr3ads.net - search: "lbb0

[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops

2014 Jul 23

4

    ...paddd %xmm1, %xmm0
    movdqa %xmm0, %xmm1
    movhlps %xmm1, %xmm1 # xmm1 = xmm1[1,1]
    paddd %xmm0, %xmm1
    pshufd $1, %xmm1, %xmm0 # xmm0 = xmm1[1,0,0,0]
    paddd %xmm1, %xmm0
    movd %xmm0, %eax
    cmpq %rdx, %rsi
    je .LBB0_7
    .align 16, 0x90
.LBB0_6: # %scalar.ph
    # =>This Inner Loop Header: Depth=1
    addl (%rdi), %eax
    addq $4, %rdi
    cmpq %rcx, %rdi
    jb .LBB0_6
.LBB0_7: # %._...
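
The `.LBB0_6` block in this snippet (labelled `%scalar.ph`) adds one 32-bit value per iteration and advances a pointer by 4 bytes until it reaches an end pointer. The C source is not shown in the search result, so the following is only a guess at the kind of loop that could lead to such a scalar remainder; the name `sum_range` and its signature are assumptions for illustration.

```c
#include <assert.h>

/* Hypothetical source: sums the ints in the half-open range [p, end).
   The comments map each statement to the scalar loop in the snippet. */
int sum_range(const int *p, const int *end) {
    assert(p <= end);   /* both pointers must refer to the same array */
    int sum = 0;
    while (p < end) {   /* cmpq %rcx, %rdi ; jb .LBB0_6 */
        sum += *p;      /* addl (%rdi), %eax */
        ++p;            /* addq $4, %rdi */
    }
    return sum;
}
```

When a loop like this is vectorized, the `paddd`/`pshufd` sequence before the scalar block would be the horizontal reduction of the vector accumulator, and the scalar loop handles whatever elements are left over.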

LLVM Loop vectorizer - 2 vector.body blocks appear

2016 Aug 01

2

Hello. Mikhail, with the more recent version of the LoopVectorize.cpp code (retrieved at the beginning of July 2016) I ran the following piece of C code: void foo(long *A, long *B, long *C, long N) { for (long i = 0; i < N; ++i) { C[i] = A[i] + B[i]; } } The vectorized LLVM program I obtain contains 2 vector.body blocks - one named
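
For anyone wanting to reproduce this, the loop from the message can be placed in a standalone file as below. Only `foo` comes from the message; `foo_self_check` is a hypothetical helper added here to confirm the result on an input whose length (11) is not a multiple of a typical vector width.

```c
#include <assert.h>

/* The loop from the message, unchanged apart from layout. */
void foo(long *A, long *B, long *C, long N) {
    for (long i = 0; i < N; ++i) {
        C[i] = A[i] + B[i];
    }
}

/* Hypothetical helper (not in the original message): runs foo on a small
   input and asserts every element. Returns the number of elements checked. */
long foo_self_check(void) {
    enum { N = 11 };
    long a[N], b[N], c[N];
    for (long i = 0; i < N; ++i) {
        a[i] = i;
        b[i] = 10 * i;
        c[i] = -1;
    }
    foo(a, b, c, N);
    for (long i = 0; i < N; ++i) {
        assert(c[i] == 11 * i);
    }
    return N;
}
```

Compiling with `clang -O2 -S -emit-llvm` emits the IR for inspection, and `-Rpass=loop-vectorize` reports whether the loop was vectorized. Because the function receives three unrestricted pointers, the vectorizer may also emit a runtime overlap check in front of the vector code.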

enabling interleaved access loop vectorization

2016 Aug 05

3

    ...movdqu 16(%rdi,%rcx,2), %xmm2
    pshufd $132, %xmm2, %xmm2 # xmm2 = xmm2[0,1,0,2]
    pshufd $232, %xmm0, %xmm0 # xmm0 = xmm0[0,2,2,3]
    pblendw $240, %xmm2, %xmm0 # xmm0 = xmm0[0,1,2,3],xmm2[4,5,6,7]
    paddd %xmm1, %xmm0
    movdqu %xmm0, (%rsi,%rcx)
    cmpq $992, %rcx # imm = 0x3E0
    jne .LBB0_7

The performance I see out of the 3 versions (with a 500K-iteration outer loop):

    Scalar: 0m10.320s
    Vector (Non-interleaved): 0m8.054s
    Vector (Interleaved): 0m3.541s

This is far from being the perfect use case for interleaved access:
1) There's no real interleaving, just one strided gather, so...

enabling interleaved access loop vectorization

2016 May 26

2

Interleaved access is not enabled on X86 yet. We looked at this feature and came to the conclusion that interleaving (as loads + shuffles) is not always profitable on X86. We should provide the right cost, which depends on the number of shuffles. The number of shuffles depends on the permutation (shuffle mask). And even if we estimate the number of shuffles, the shuffles are not generated in-place. Vectorizer
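
As a rough illustration of the access pattern under discussion, the sketch below reads an array whose even and odd elements form two interleaved streams. The function `sum_pairs` is hypothetical and does not come from the thread. A vectorizer with interleaved-access support could cover both stride-2 reads with wide contiguous loads and then use shuffles to separate the even and odd lanes; the cost of those shuffles is what the message says has to be modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: `in` holds 2*n ints laid out as
   e0, o0, e1, o1, ...; out[i] receives e_i + o_i. */
void sum_pairs(const int *in, int *out, size_t n) {
    assert(n == 0 || (in != NULL && out != NULL));
    for (size_t i = 0; i < n; ++i) {
        int even = in[2 * i];      /* stride-2 read, member 0 of the group */
        int odd  = in[2 * i + 1];  /* stride-2 read, member 1 of the group */
        out[i] = even + odd;
    }
}
```

Both members of the group are used here, so each wide load is fully consumed. A loop that reads only `in[2 * i]` would be closer to the single strided gather described in the later message of this thread.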

enabling interleaved access loop vectorization

2016 Aug 05

2

...# xmm2 = xmm2[0,1,0,2]
> > pshufd $232, %xmm0, %xmm0 # xmm0 = xmm0[0,2,2,3]
> > pblendw $240, %xmm2, %xmm0 # xmm0 = xmm0[0,1,2,3],xmm2[4,5,6,7]
> > paddd %xmm1, %xmm0
> > movdqu %xmm0, (%rsi,%rcx)
> > cmpq $992, %rcx # imm = 0x3E0
> > jne .LBB0_7
> >
> > The performance I see out of the 3 versions (with a 500K-iteration outer loop):
> >
> > Scalar: 0m10.320s
> > Vector (Non-interleaved): 0m8.054s
> > Vector (Interleaved): 0m3.541s
> >
> > This is far from being the perfect use case for in...

Trouble when suppressing a portion of fast-math-transformations

2017 Sep 29

2

...changing the sense of the associated branch) when comparing '-O2' with '-O2 -ffast-math -fno-reciprocal-math':

    $ # New Clang behavior:
    $ # nearly identical, but there should be many diffs
    $ diff O2.s O2fm.no_arcp.s
    188,189c188,189
    < ucomisd %xmm5, %xmm6
    < ja .LBB0_7
    ---
    > ucomisd %xmm6, %xmm5
    > jb .LBB0_7
    $

In full disclosure, for this "mandelbrot.c" test-case, I don't know if any of the changes in code-gen done by us or by GCC when '-ffast-math' is enabled are helpful (from a performance perspective) or dangerous...

enabling interleaved access loop vectorization

2016 May 26

0

On 26 May 2016 at 19:12, Sanjay Patel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Is there a compile-time and/or potential runtime cost that makes
> enableInterleavedAccessVectorization() default to 'false'?
>
> I notice that this is set to true for ARM, AArch64, and PPC.
>
> In particular, I'm wondering if there's a reason it's not enabled for

Trouble when suppressing a portion of fast-math-transformations

2017 Sep 29

0

...ng '-O2' with
> > '-O2 -ffast-math -fno-reciprocal-math':
> > $ # New Clang behavior:
> > $ # nearly identical, but there should be many diffs
> > $ diff O2.s O2fm.no_arcp.s
> > 188,189c188,189
> > < ucomisd %xmm5, %xmm6
> > < ja .LBB0_7
> > ---
> > > ucomisd %xmm6, %xmm5
> > > jb .LBB0_7
> > $
> > In full disclosure, for this "mandelbrot.c" test-case, I don't know if any of
> > the changes in code-gen done by us or by GCC when '-ffast-math' is enabled a...

[LLVMdev] Question regarding basic-block placement optimization

2011 Oct 19

0

...LBB0_3: # %then2
    movl %ebp, %edi
    movl $1, %esi
    movl %ebx, %edx
    callq error
.LBB0_5: # %then3
    movl %ebp, %edi
    movl $1, %esi
    movl %ebx, %edx
    callq error
.LBB0_7: # %then4
    movl %ebp, %edi
    movl $1, %esi
    movl %ebx, %edx
    callq error
.LBB0_9: # %then5
    movl %ebp, %edi
    movl $1, %esi
    movl %ebx, %edx
    callq error
.Ltmp11...

bpf compilation using clang

2018 Sep 25

2

...<= 0x20
    34: 77 02 00 00 20 00 00 00 r2 >>= 0x20
    35: 2d 29 ee ff 00 00 00 00 if r9 > r2 goto -0x12 <LBB0_2>
    36: b7 09 00 00 0e 00 00 00 r9 = 0xe
    37: 55 08 03 00 a8 88 00 00 if r8 != 0x88a8 goto +0x3 <LBB0_7>
    38: b7 09 00 00 12 00 00 00 r9 = 0x12
_______________________________________________________________________________
> > Cheers.
> > Tim.
--
Respectfully,
Sameeh Jubran
Linkedin
Software Engineer @ Daynix.

bpf compilation using clang

2018 Sep 13

4

Hi all, I am trying to insert instructions into the bpf using the bpf syscall. The instructions were generated using the following command line:

    clang -I ~/Builds/bpf_rss/iproute2/include -Wall -target bpf -O2 -emit-llvm -c upstream/qemu/hw/net/rss_tap_bpf_program.c -o - | llc -march=bpf -filetype=obj -o tap_bpf_program.o

and then were translated to bpf instructions using the BPFCparser tool

[LLVMdev] Question regarding basic-block placement optimization

2011 Oct 19

3

On Tue, Oct 18, 2011 at 6:58 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
> On Oct 18, 2011, at 5:22 PM, Chandler Carruth wrote:
>> As for why it should be an IR pass, mostly because once the selection dag
>> runs through the code, we can never recover all of the freedom we have at
>> the IR level. To start with, splicing MBBs around requires known about

search for: lbb0_7