Displaying 12 results from an estimated 12 matches for "lbb0_7".
2014 Jul 23
4
[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops
...paddd %xmm1, %xmm0
movdqa %xmm0, %xmm1
movhlps %xmm1, %xmm1 # xmm1 = xmm1[1,1]
paddd %xmm0, %xmm1
pshufd $1, %xmm1, %xmm0 # xmm0 = xmm1[1,0,0,0]
paddd %xmm1, %xmm0
movd %xmm0, %eax
cmpq %rdx, %rsi
je .LBB0_7
.align 16, 0x90
.LBB0_6: # %scalar.ph
# =>This Inner Loop Header: Depth=1
addl (%rdi), %eax
addq $4, %rdi
cmpq %rcx, %rdi
jb .LBB0_6
.LBB0_7: # %._...
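The assembly in this snippet looks like the tail of a vectorized integer sum: a horizontal add of the four 32-bit lanes in %xmm0, followed by a scalar remainder loop (%scalar.ph) that is skipped when nothing is left over. A minimal C sketch of that shape follows. It assumes a plain sum over an int array, which the snippet does not show; the name sum_ints and the 4-lane width are illustrative guesses, not the poster's code.

```c
#include <stddef.h>

/* Hypothetical reconstruction: sum n ints using a 4-lane "vector" body,
 * a horizontal reduction, and a scalar remainder loop. */
int sum_ints(const int *a, size_t n) {
    int lane[4] = {0, 0, 0, 0};
    size_t i = 0;
    size_t vec_end = n - (n % 4);      /* largest multiple of 4 that is <= n */

    /* "Vector" body: four independent partial sums (one per paddd lane). */
    for (; i < vec_end; i += 4) {
        for (int k = 0; k < 4; ++k) {
            lane[k] += a[i + k];
        }
    }

    /* Horizontal reduction: high half onto low half (movhlps + paddd),
     * then lane 1 onto lane 0 (pshufd + paddd). */
    int sum = (lane[0] + lane[2]) + (lane[1] + lane[3]);

    /* Scalar remainder: runs zero times when n is a multiple of 4,
     * which corresponds to the "je .LBB0_7" that skips the loop. */
    for (; i < n; ++i) {
        sum += a[i];
    }
    return sum;
}
```

Calling sum_ints on a 7-element array exercises both the 4-wide body and the 3-element remainder.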
2016 Aug 01
2
LLVM Loop vectorizer - 2 vector.body blocks appear
Hello.
Mikhail, with the more recent version of the LoopVectorize.cpp code (retrieved at the
beginning of July 2016) I ran the following piece of C code:
void foo(long *A, long *B, long *C, long N) {
for (long i = 0; i < N; ++i) {
C[i] = A[i] + B[i];
}
}
The vectorized LLVM program I obtain contains 2 vector.body blocks - one named
2016 Aug 05
3
enabling interleaved access loop vectorization
...movdqu 16(%rdi,%rcx,2), %xmm2
pshufd $132, %xmm2, %xmm2 # xmm2 = xmm2[0,1,0,2]
pshufd $232, %xmm0, %xmm0 # xmm0 = xmm0[0,2,2,3]
pblendw $240, %xmm2, %xmm0 # xmm0 = xmm0[0,1,2,3],xmm2[4,5,6,7]
paddd %xmm1, %xmm0
movdqu %xmm0, (%rsi,%rcx)
cmpq $992, %rcx # imm = 0x3E0
jne .LBB0_7
The performance I see out of the 3 versions (with a 500K-iteration outer loop):
Scalar: 0m10.320s
Vector (Non-interleaved): 0m8.054s
Vector (Interleaved): 0m3.541s
This is far from being the perfect use case for interleaved access:
1) There's no real interleaving, just one strided gather, so...
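The shuffle sequence in this snippet (pshufd/pblendw keeping the even-indexed 32-bit elements, then paddd and a contiguous store) is consistent with a loop that reads every second element of one array and writes a dense result. A minimal C sketch of such a "one strided gather" loop is below. The function name, the parameter names, and the added constant k are hypothetical; the snippet does not show the original source, and %xmm1 could equally hold another loop-invariant vector.

```c
#include <stddef.h>

/* Hypothetical stride-2 loop: one strided load, one contiguous store.
 * "in" must hold at least 2*n - 1 elements, "out" at least n. */
void strided_add(const int *in, int *out, size_t n, int k) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = in[2 * i] + k;   /* only the even-indexed inputs are used */
    }
}
```

Because half of every loaded vector is discarded, the vectorized form needs shuffles to compact the even elements, which is the cost the thread is discussing.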
2016 May 26
2
enabling interleaved access loop vectorization
Interleaved access is not enabled on X86 yet.
We looked at this feature and came to the conclusion that interleaving (as loads + shuffles) is not always profitable on X86. We should provide the right cost, which depends on the number of shuffles. The number of shuffles depends on the permutation (shuffle mask). And even if we estimate the number of shuffles, the shuffles are not generated in place. Vectorizer
2016 Aug 05
2
enabling interleaved access loop vectorization
...# xmm2 = xmm2[0,1,0,2]
> pshufd $232, %xmm0, %xmm0 # xmm0 = xmm0[0,2,2,3]
> pblendw $240, %xmm2, %xmm0 # xmm0 = xmm0[0,1,2,3],xmm2[4,5,6,7]
> paddd %xmm1, %xmm0
> movdqu %xmm0, (%rsi,%rcx)
> cmpq $992, %rcx # imm = 0x3E0
> jne .LBB0_7
>
> The performance I see out of the 3 versions (with a 500K-iteration outer loop):
>
> Scalar: 0m10.320s
> Vector (Non-interleaved): 0m8.054s
> Vector (Interleaved): 0m3.541s
>
> This is far from being the perfect use case for in...
2017 Sep 29
2
Trouble when suppressing a portion of fast-math-transformations
...changing
the sense of the associated branch) when comparing '-O2' with
'-O2 -ffast-math -fno-reciprocal-math':
$ # New Clang behavior:
$ # nearly identical, but there should be many diffs
$ diff O2.s O2fm.no_arcp.s
188,189c188,189
< ucomisd %xmm5, %xmm6
< ja .LBB0_7
---
> ucomisd %xmm6, %xmm5
> jb .LBB0_7
$
In full disclosure, for this "mandelbrot.c" test-case, I don't know if any of
the changes in code-gen done by us or by GCC when '-ffast-math' is enabled are
helpful (from a performance perspective) or dangerous...
2016 May 26
0
enabling interleaved access loop vectorization
On 26 May 2016 at 19:12, Sanjay Patel via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Is there a compile-time and/or potential runtime cost that makes
> enableInterleavedAccessVectorization() default to 'false'?
>
> I notice that this is set to true for ARM, AArch64, and PPC.
>
> In particular, I'm wondering if there's a reason it's not enabled for
2017 Sep 29
0
Trouble when suppressing a portion of fast-math-transformations
...ng '-O2' with
> '-O2 -ffast-math -fno-reciprocal-math':
>
> $ # New Clang behavior:
> $ # nearly identical, but there should be many diffs
> $ diff O2.s O2fm.no_arcp.s
> 188,189c188,189
> < ucomisd %xmm5, %xmm6
> < ja .LBB0_7
> ---
> > ucomisd %xmm6, %xmm5
> > jb .LBB0_7
> $
>
> In full disclosure, for this "mandelbrot.c" test-case, I don't know if any of
> the changes in code-gen done by us or by GCC when '-ffast-math' is enabled a...
2011 Oct 19
0
[LLVMdev] Question regarding basic-block placement optimization
...LBB0_3: # %then2
movl %ebp, %edi
movl $1, %esi
movl %ebx, %edx
callq error
.LBB0_5: # %then3
movl %ebp, %edi
movl $1, %esi
movl %ebx, %edx
callq error
.LBB0_7: # %then4
movl %ebp, %edi
movl $1, %esi
movl %ebx, %edx
callq error
.LBB0_9: # %then5
movl %ebp, %edi
movl $1, %esi
movl %ebx, %edx
callq error
.Ltmp11...
2018 Sep 25
2
bpf compilation using clang
...<= 0x20
34: 77 02 00 00 20 00 00 00 r2 >>= 0x20
35: 2d 29 ee ff 00 00 00 00 if r9 > r2 goto -0x12 <LBB0_2>
36: b7 09 00 00 0e 00 00 00 r9 = 0xe
37: 55 08 03 00 a8 88 00 00 if r8 != 0x88a8 goto +0x3 <LBB0_7>
38: b7 09 00 00 12 00 00 00 r9 = 0x12
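Each line of the disassembly above pairs eight raw bytes with the decoded instruction. As a rough illustration of how those bytes map to fields, here is a small C decoder for the basic 64-bit eBPF instruction layout (8-bit opcode, 4-bit destination and source registers, 16-bit offset, 32-bit immediate). It assumes the little-endian encoding that the byte dumps in this snippet appear to use; the struct and function names are hypothetical and are not taken from the thread or from any BPF header.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields of one eBPF instruction. */
struct bpf_fields {
    uint8_t opcode;   /* operation code, e.g. 0x55 = jump if dst != imm */
    uint8_t dst_reg;  /* low nibble of byte 1 */
    uint8_t src_reg;  /* high nibble of byte 1 */
    int16_t off;      /* signed jump/memory offset, bytes 2-3 */
    int32_t imm;      /* signed immediate, bytes 4-7 */
};

/* Decode one 8-byte instruction, assuming little-endian byte order. */
struct bpf_fields decode_bpf(const uint8_t b[8]) {
    struct bpf_fields f;
    f.opcode  = b[0];
    f.dst_reg = (uint8_t)(b[1] & 0x0f);
    f.src_reg = (uint8_t)(b[1] >> 4);
    /* Reassemble as unsigned first, then reinterpret as signed
     * (two's-complement conversion, as on all mainstream compilers). */
    f.off = (int16_t)(uint16_t)(b[2] | (b[3] << 8));
    f.imm = (int32_t)((uint32_t)b[4] | ((uint32_t)b[5] << 8) |
                      ((uint32_t)b[6] << 16) | ((uint32_t)b[7] << 24));
    return f;
}
```

Feeding it the bytes of instruction 37 (55 08 03 00 a8 88 00 00) yields opcode 0x55, destination register 8, offset 3 and immediate 0x88a8, matching "if r8 != 0x88a8 goto +0x3"; instruction 35 (2d 29 ee ff ...) yields registers 9 and 2 with offset -0x12.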
_______________________________________________________________________________
>
> Cheers.
>
> Tim.
--
Respectfully,
Sameeh Jubran
Linkedin
Software Engineer @ Daynix.
2018 Sep 13
4
bpf compilation using clang
Hi all,
I am trying to insert instructions into the bpf using the bpf syscall,
the instructions were generated using the following command line:
clang -I ~/Builds/bpf_rss/iproute2/include -Wall -target bpf -O2
-emit-llvm -c upstream/qemu/hw/net/rss_tap_bpf_program.c -o - | llc
-march=bpf -filetype=obj -o tap_bpf_program.o
and then were translated to bpf instructions using the BPFCparser tool
2011 Oct 19
3
[LLVMdev] Question regarding basic-block placement optimization
On Tue, Oct 18, 2011 at 6:58 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>
> On Oct 18, 2011, at 5:22 PM, Chandler Carruth wrote:
>
>> As for why it should be an IR pass, mostly because once the selection dag
>> runs through the code, we can never recover all of the freedom we have at
>> the IR level. To start with, splicing MBBs around requires known about