thr3ads.net - search: "btver2"

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

2015 Jan 25

4

I ran the benchmarking subset of test-suite on a btver2 machine, optimizing for btver2 (so enabling AVX codegen). I don't see anything outside of the noise with x86-experimental-vector-shuffle-legality=1. On Fri, Jan 23, 2015 at 5:19 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote: > Hi Chandler, > > On Fri, Jan 23, 2...

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

2015 Jan 23

5

Greetings LLVM hackers and x86 vector shufflers! I would like to flip on another chunk of the new vector shuffling, specifically the logic to mark ~all shuffles as "legal". This can be tested today with the flag "-x86-experimental-vector-shuffle-legality". I would essentially like to make this the default (by removing the "false" path). Doing this will allow me to

Generating object files more efficiently

2019 Mar 23

2

...andybridge, corei7-avx, ivybridge, core-avx-i, haswell, core-avx2, broadwell, skylake, skylake-avx512, skx, cascadelake, cannonlake, icelake-client, icelake-server, knl, knm, k8, athlon64, athlon-fx, opteron, k8-sse3, athlon64-sse3, opteron-sse3, amdfam10, barcelona, btver1, btver2, bdver1, bdver2, bdver3, bdver4, znver1, znver2, x86-64 ________________________________ From: Doerfert, Johannes <jdoerfert at anl.gov> Sent: Saturday, March 23, 2019 1:15 PM To: J S Cc: via llvm-dev Subject: Re: [llvm-dev] Generating object files more efficiently I would have guess...

Generating object files more efficiently

2019 Mar 23

4

...andybridge, corei7-avx, ivybridge, core-avx-i, haswell, core-avx2, broadwell, skylake, skylake-avx512, skx, cascadelake, cannonlake, icelake-client, icelake-server, knl, knm, k8, athlon64, athlon-fx, opteron, k8-sse3, athlon64-sse3, opteron-sse3, amdfam10, barcelona, btver1, btver2, bdver1, bdver2, bdver3, bdver4, znver1, znver2, x86-64 ________________________________ From: Doerfert, Johannes <jdoerfert at anl.gov> Sent: Saturday, March 23, 2019 1:15 PM To: J S Cc: via llvm-dev Subject: Re: [llvm-dev] Generating object files more efficiently I would have guess...

[LLVMdev] Poor register allocation (constants causing spilling)

2015 Jul 14

4

...ortunately, the full report is fairly long and detailed. However, in short, I found that not splitting rematerializable live-ranges led to significantly better register allocation, and an overall performance improvement of 3%. *** The Problem Compile the attached testcase as follows: llc -mcpu=btver2 test.ll Examining the assembly in test.s we can see a constant is being loaded into %xmm8 (second instruction in foo). Tracing the constant we can see the following: foo: ... vmovaps .LCPI0_0(%rip), %xmm8 # xmm8 = [6.366197e-01,6.366197e-01,...] ... vmulps %xmm8, %xmm0, %x...

Generating object files more efficiently

2019 Mar 23

2

...vx, ivybridge, core-avx-i, haswell, > core-avx2, broadwell, skylake, skylake-avx512, skx, cascadelake, > cannonlake, icelake-client, icelake-server, knl, knm, k8, athlon64, > athlon-fx, opteron, k8-sse3, athlon64-sse3, opteron-sse3, amdfam10, > barcelona, btver1, btver2, bdver1, bdver2, bdver3, bdver4, znver1, > znver2, > x86-64 > > > ------------------------------ > *From:* Doerfert, Johannes <jdoerfert at anl.gov> > *Sent:* Saturday, March 23, 2019 1:15 PM > *To:* J S > *Cc:* via llvm-dev > *Subject:* Re: [llvm-dev] Gene...

[LLVMdev] [x86] Prefetch intrinsics and prefetchw

2015 Jul 30

0

...%5, i32 1, i32 1, i32 1) tail call void @llvm.prefetch(i8* %6, i32 1, i32 2, i32 1) tail call void @llvm.prefetch(i8* %7, i32 1, i32 3, i32 1) The generated x86_64 code for the first 4 calls, where the read/write parameter is 0 (read) is exactly as expected: (Generated with clang -O2 -S -march=btver2 test.c) prefetchnta foo(%rip) prefetcht2 foo(%rip) prefetcht1 foo(%rip) prefetcht0 foo(%rip) The question is what should be expected when the r/w parameter is 1 (write). Currently the backend generates: prefetchnta foo(%rip) prefetcht2 foo(%rip) prefetcht1 foo(%rip) prefetchw foo(%rip)...
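The test.c from that thread is not shown in the snippet, so the following is only a minimal sketch of a C file that should produce similar @llvm.prefetch calls. It assumes the GCC/Clang builtin `__builtin_prefetch(addr, rw, locality)`; the global `foo` and both function names are hypothetical and chosen only to mirror the `foo(%rip)` operand in the quoted assembly.

```c
/* Hypothetical global standing in for the `foo` symbol in the quoted output. */
char foo[64];

/* Read hints (rw = 0) at locality 0..3. The thread reports these lowering to
   prefetchnta, prefetcht2, prefetcht1 and prefetcht0 respectively. */
int prefetch_for_read(void) {
    __builtin_prefetch(foo, 0, 0);
    __builtin_prefetch(foo, 0, 1);
    __builtin_prefetch(foo, 0, 2);
    __builtin_prefetch(foo, 0, 3);
    return 4; /* number of hints issued */
}

/* Write hints (rw = 1) at locality 0..3. The thread asks what these should
   lower to; per the snippet, only the locality-3 call became prefetchw. */
int prefetch_for_write(void) {
    __builtin_prefetch(foo, 1, 0);
    __builtin_prefetch(foo, 1, 1);
    __builtin_prefetch(foo, 1, 2);
    __builtin_prefetch(foo, 1, 3);
    return 4; /* number of hints issued */
}
```

Compiling a file like this with the command quoted above (clang -O2 -S -march=btver2 test.c) and inspecting the resulting assembly is one way to compare the read and write cases; the exact instructions emitted may differ between LLVM versions.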

Generating object files more efficiently

2019 Mar 23

2

Currently I compile my C code in 2 steps in order to generate .o files:

clang -emit-llvm -c foo.c -o foo.bc
llc -march=XYZ foo.bc -filetype=obj

Is there a way to generate either .o or .elf files in just 1 command? Thanks.

[RFC] llvm-mca: a static performance analysis tool

2018 Mar 01

9