search for: btver2

Displaying 13 results from an estimated 13 matches for "btver2".

2015 Jan 25
4
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
I ran the benchmarking subset of test-suite on a btver2 machine, optimizing for btver2 (so enabling AVX codegen). I don't see anything outside of the noise with x86-experimental-vector-shuffle-legality=1. On Fri, Jan 23, 2015 at 5:19 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote: > Hi Chandler, > > On Fri, Jan 23, 2...
2015 Jan 23
5
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
Greetings LLVM hackers and x86 vector shufflers! I would like to flip on another chunk of the new vector shuffling, specifically the logic to mark ~all shuffles as "legal". This can be tested today with the flag "-x86-experimental-vector-shuffle-legality". I would essentially like to make this the default (by removing the "false" path). Doing this will allow me to
2019 Mar 23
2
Generating object files more efficiently
...andybridge, corei7-avx, ivybridge, core-avx-i, haswell, core-avx2, broadwell, skylake, skylake-avx512, skx, cascadelake, cannonlake, icelake-client, icelake-server, knl, knm, k8, athlon64, athlon-fx, opteron, k8-sse3, athlon64-sse3, opteron-sse3, amdfam10, barcelona, btver1, btver2, bdver1, bdver2, bdver3, bdver4, znver1, znver2, x86-64 ________________________________ From: Doerfert, Johannes <jdoerfert at anl.gov> Sent: Saturday, March 23, 2019 1:15 PM To: J S Cc: via llvm-dev Subject: Re: [llvm-dev] Generating object files more efficiently I would have guess...
2019 Mar 23
4
Generating object files more efficiently
...andybridge, corei7-avx, ivybridge, core-avx-i, haswell, core-avx2, broadwell, skylake, skylake-avx512, skx, cascadelake, cannonlake, icelake-client, icelake-server, knl, knm, k8, athlon64, athlon-fx, opteron, k8-sse3, athlon64-sse3, opteron-sse3, amdfam10, barcelona, btver1, btver2, bdver1, bdver2, bdver3, bdver4, znver1, znver2, x86-64 ________________________________ From: Doerfert, Johannes <jdoerfert at anl.gov> Sent: Saturday, March 23, 2019 1:15 PM To: J S Cc: via llvm-dev Subject: Re: [llvm-dev] Generating object files more efficiently I would have guess...
2015 Jul 14
4
[LLVMdev] Poor register allocation (constants causing spilling)
...ortunately, the full report is fairly long and detailed. However, in short, I found that not splitting rematerializable live-ranges led to significantly better register allocation, and an overall performance improvement of 3%. *** The Problem Compile the attached testcase as follows: llc -mcpu=btver2 test.ll Examining the assembly in test.s we can see a constant is being loaded into %xmm8 (second instruction in foo). Tracing the constant we can see the following: foo: ... vmovaps .LCPI0_0(%rip), %xmm8 # xmm8 = [6.366197e-01,6.366197e-01,...] ... vmulps %xmm8, %xmm0, %x...
2019 Mar 23
2
Generating object files more efficiently
...vx, ivybridge, core-avx-i, haswell, > core-avx2, broadwell, skylake, skylake-avx512, skx, cascadelake, > cannonlake, icelake-client, icelake-server, knl, knm, k8, athlon64, > athlon-fx, opteron, k8-sse3, athlon64-sse3, opteron-sse3, amdfam10, > barcelona, btver1, btver2, bdver1, bdver2, bdver3, bdver4, znver1, > znver2, > x86-64 > > > ------------------------------ > *From:* Doerfert, Johannes <jdoerfert at anl.gov> > *Sent:* Saturday, March 23, 2019 1:15 PM > *To:* J S > *Cc:* via llvm-dev > *Subject:* Re: [llvm-dev] Gene...
2015 Jul 30
0
[LLVMdev] [x86] Prefetch intrinsics and prefetchw
...%5, i32 1, i32 1, i32 1) tail call void @llvm.prefetch(i8* %6, i32 1, i32 2, i32 1) tail call void @llvm.prefetch(i8* %7, i32 1, i32 3, i32 1) The generated x86_64 code for the first 4 calls, where the read/write parameter is 0 (read) is exactly as expected: (Generated with clang -O2 -S -march=btver2 test.c) prefetchnta foo(%rip) prefetcht2 foo(%rip) prefetcht1 foo(%rip) prefetcht0 foo(%rip) The question is what should be expected when the r/w parameter is 1 (write). Currently the backend generates: prefetchnta foo(%rip) prefetcht2 foo(%rip) prefetcht1 foo(%rip) prefetchw foo(%rip)...
2019 Mar 23
2
Generating object files more efficiently
Currently I compile my C code in 2 steps in order to generate .o files: (1) clang -emit-llvm -c foo.c -o foo.bc (2) llc -march=XYZ foo.bc -filetype=obj Is there a way to generate either .o or .elf files in just 1 command? Thanks.
2018 Mar 01
9
[RFC] llvm-mca: a static performance analysis tool
...tool has been mostly tested for x86 targets, but there are still several limitations, some of which could be overcome by integrating extra information into the scheduling models. As was mentioned before, this tool has been (and is still being) used internally in Sony to debug/triage issues in the btver2 scheduling model. We have also tested it on other targets to check how generic the tool is. In our experience, the tool makes it easy to identify simple mistakes like "wrong number of micro opcodes specified for an instruction", or "wrong set of hardware resources". Some of thes...
2018 Mar 02
0
[RFC] llvm-mca: a static performance analysis tool
...targets, but there are still several > limitations, some of which could be overcome by integrating extra > information > into the scheduling models. > > As was mentioned before, this tool has been (and is still being) used > internally > in Sony to debug/triage issues in the btver2 scheduling model. We have > also > tested it on other targets to check how generic the tool is. In our > experience, > the tool makes it easy to identify simple mistakes like "wrong number > of micro > opcodes specified for an instruction", or "wrong set of hardw...
2018 Mar 02
0
[RFC] llvm-mca: a static performance analysis tool
...86 targets, but there are still several > limitations, some of which could be overcome by integrating extra > information > into the scheduling models. > > As was mentioned before, this tool has been (and is still being) used > internally > in Sony to debug/triage issues in the btver2 scheduling model. We have also > tested it on other targets to check how generic the tool is. In our > experience, > the tool makes it easy to identify simple mistakes like "wrong number of > micro > opcodes specified for an instruction", or "wrong set of hardware >...
2018 Mar 02
0
[RFC] llvm-mca: a static performance analysis tool
...ted for x86 targets, but there are still several > limitations, some of which could be overcome by integrating extra information > into the scheduling models. > > As was mentioned before, this tool has been (and is still being) used internally > in Sony to debug/triage issues in the btver2 scheduling model. We have also > tested it on other targets to check how generic the tool is. In our experience, > the tool makes it easy to identify simple mistakes like "wrong number of micro > opcodes specified for an instruction", or "wrong set of hardware resources"...
2018 Mar 02
5
[RFC] llvm-mca: a static performance analysis tool
...86 targets, but there are still several > limitations, some of which could be overcome by integrating extra > information > into the scheduling models. > > As was mentioned before, this tool has been (and is still being) used > internally > in Sony to debug/triage issues in the btver2 scheduling model. We have also > tested it on other targets to check how generic the tool is. In our > experience, > the tool makes it easy to identify simple mistakes like "wrong number of > micro > opcodes specified for an instruction", or "wrong set of hardware >...