thr3ads.net - similar to: "RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available"

Displaying 20 results from an estimated 1000 matches similar to: "RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available"

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

2017 Nov 03

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

On Thu, Nov 2, 2017 at 7:05 PM James Y Knight via llvm-dev < llvm-dev at lists.llvm.org> wrote: > On Wed, Nov 1, 2017 at 7:35 PM, Craig Topper via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> Hello all, >> >> >> >> I would like to propose adding the -mprefer-avx256 and -mprefer-avx128 >> command line flags supported by latest GCC to

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

2017 Nov 07

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

On Fri, Nov 3, 2017, at 05:47, Craig Topper via llvm-dev wrote: > That's a very good point about the ordering of the command line options. > gcc's current implementation treats -mprefer-avx256 has "prefer 256 over > 512" and -mprefer-avx128 as "prefer 128 over 256". Which feels weird for > other reasons, but has less of an ordering ambiguity. > >

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

2017 Nov 09

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

I agree that a less x86 specific command line makes sense. I've been having an internal discussions with gcc folks and their evaluating switching to something like -mprefer-vector-width=128/256/512/none Based on the current performance data we're seeing, we think we need to ultimately default skylake-avx512 to -mprefer-vector-width=256. If we go with a target independent

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

2017 Nov 11

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

Are you referring to the X86TargetLowering::isFsqrtCheap hook? ~Craig On Fri, Nov 10, 2017 at 7:39 AM, Sanjay Patel <spatel at rotateright.com> wrote: > We can tie a user preference / override to a CPU model. We do something > like that for square root estimates already (although it does use a > SubtargetFeature currently for x86; ideally, we'd key that off of something >

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

2017 Nov 12

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

If skylake is that bad at AVX2 it belongs in -mcpu / -march IMO. Most people will build for the standard x86_64-pc-linux or whatever anyway, and completely ignore the change. This will mainly affect those who build their own software and optimize for their system, and lots there have probably caught on to this already. I always thought that's what -march was made for, really. GNOMETOYS

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

2017 Nov 13

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

On Sat, Nov 11, 2017 at 8:52 PM, Hal Finkel via llvm-dev < llvm-dev at lists.llvm.org> wrote: > > On 11/11/2017 09:52 PM, UE US via llvm-dev wrote: > > If skylake is that bad at AVX2 > > > I don't think this says anything negative about AVX2, but AVX-512. > > it belongs in -mcpu / -march IMO. > > > No. We'd still want to enable the architectural

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

2017 Nov 13

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

On 11/13/2017 05:49 PM, Eric Christopher wrote: > > > On Mon, Nov 13, 2017 at 2:15 PM Craig Topper via llvm-dev > <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: > > On Sat, Nov 11, 2017 at 8:52 PM, Hal Finkel via llvm-dev > <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: > > > On

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

2017 Nov 14

RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

I haven't looked into actually implementing revectorization, so we may just want to ignore that possibility for now. But I imagined that revectorization could hit the same problem that we're trying to avoid here: if the cost models say that wider vectors are legal and cheaper, but the reality is that perf will suffer when using those wider vectors, then we want to avoid using the wider

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

2016 Nov 23

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

Hi All. This is an RFC for a proposed target specific X86 optimization for reducing code size in the encoding of AVX-512 instructions when possible. When the AVX512F instruction set was introduced in X86 it included additional 32 registers of 512bit size each ZMM0 - ZMM31, as well as additional 16 XMM registers XMM16-XMM31 and 16 YMM registers YMM16-YMM31. In order to encode the new registers of

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

2016 Nov 23

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

I would like a command line option to disable this optimization. That way tests can still verify that EVEX instructions came out of isel by using -show-mc-encoding. On Wed, Nov 23, 2016 at 5:01 AM Hal Finkel via llvm-dev < llvm-dev at lists.llvm.org> wrote: > > ------------------------------ > > *From: *"Gadi via llvm-dev Haber" <llvm-dev at lists.llvm.org> >

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

2016 Nov 24

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

> I would like a command line option to disable this optimization. That way tests can still verify that EVEX instructions came out of isel by using -show-mc-encoding. I think that keeping tests compatibility is not a reason for an additional “llc” flag. We check encoding in test/MC/X86 dir. Is there any option to report-out from llc in non-debug mode? It should be an option to control

[LLVMdev] LLVM Loop Vectorizer

2012 Oct 05

[LLVMdev] LLVM Loop Vectorizer

If -simd option is specified opt could do validity checks, dependency analysis and such and recognize that a loop can be executed in parallel and as the -simd option is specified, convert the data types to vector instructions and add the scaling factor to the loop's iterators. Following this there can be an early machine function pass that sets up processor specific value in all of

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

2016 Nov 28

RFC: code size reduction in X86 by replacing EVEX with VEX encoding

Hal, that’s a good point. There are more manually-maintained tables in the X86 backend that should probably be tablegened: the memory-folding tables and ReplaceableInstrs, to name a couple. If you have ideas on how to get these auto-generated, please let us know. From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Hal Finkel via llvm-dev Sent: Wednesday, November 23, 2016

invalid code generated on Windows x86_64 using skylake-specific features

2017 Sep 30

invalid code generated on Windows x86_64 using skylake-specific features

I have this code, which works fine on MacOS and Linux hosts: const char *target_specific_cpu_args; const char *target_specific_features; if (g->is_native_target) { target_specific_cpu_args = ZigLLVMGetHostCPUName(); target_specific_features = ZigLLVMGetNativeFeatures(); } else { target_specific_cpu_args = ""; target_specific_features =

invalid code generated on Windows x86_64 using skylake-specific features

2017 Oct 01

invalid code generated on Windows x86_64 using skylake-specific features

I suspect that there are 2 issues here: * I have incorrect alignment somewhere * MSVC / .pdb / CodeView debugging is not working correctly. I think the latter would help solve the former. I will send out a new email later talking about the issues I'm having debugging llvm-generated binaries with MSVC. On Sat, Sep 30, 2017 at 3:33 PM, Andrew Kelley <superjoe30 at gmail.com> wrote:

[LLVMdev] LLVM Loop Vectorizer

2012 Oct 05

[LLVMdev] LLVM Loop Vectorizer

----- Original Message ----- > From: "Ramshankar Ramanarayanan" <Ramshankar.Ramanarayanan at amd.com> > To: "Hal Finkel" <hfinkel at anl.gov>, "Dibyendu Das" <Dibyendu.Das at amd.com> > Cc: "llvmdev at cs.uiuc.edu Mailing List" <llvmdev at cs.uiuc.edu> > Sent: Friday, October 5, 2012 11:00:39 AM > Subject: RE: [LLVMdev]

invalid code generated on Windows x86_64 using skylake-specific features

2017 Oct 03

invalid code generated on Windows x86_64 using skylake-specific features

I figured it out. I was using this implementation of __chkstk from compiler-rt: DEFINE_COMPILERRT_FUNCTION(___chkstk) push %rcx cmp $0x1000,%rax lea 16(%rsp),%rcx // rsp before calling this routine -> rcx jb 1f 2: sub $0x1000,%rcx test %rcx,(%rcx) sub $0x1000,%rax cmp $0x1000,%rax ja 2b 1:

Possible AVX512 codegen bug in LLVM 10.0.1?

2020 Sep 05

Possible AVX512 codegen bug in LLVM 10.0.1?

Hey LLVMDev, Perhaps I'm missing something, but I think I've stumbled across a codegen bug in LLVM 10.0.1 related to AVX512. I've attached a small LLVM IR testcase and generated x86_64 assembly file that shows the bug. The test case is small, but not quite minimal, mostly because of driver code included in the test case so one can compile and run the program. The program does a

Major internal changes, TI DSP build change

2006 Apr 21

Major internal changes, TI DSP build change

> The C5x and C6x output diverges in build 10143, which has log message "lpc > floor converted to fixed-point." Also, the measured SNR changed from 11.05 > in builds 9854-10141 to 9.22 and 9.24 in 10143. Actually, build 10143 introduced another bug, that was the reason for the 1.1.11.1 release. > There is just four lines in modes.c which declare the constant, and one

[RFC] Using Intel MPX to harden SafeStack

2017 Feb 18

[RFC] Using Intel MPX to harden SafeStack

On 2/7/2017 20:02, Kostya Serebryany wrote: > ... > > My understanding is that BNDCU is the cheapest possible instruction, > just like XOR or ADD, > so the overhead should be relatively small. > Still my guesstimate would be >= 5% since stores are very numerous. > And such overhead will be on top of whatever overhead SafeStack has. > Do you have any measurements to

similar to: RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available