thr3ads.net - similar to: "getCacheSize() / subtarget machine id"

Displaying 20 results from an estimated 200 matches similar to: "getCacheSize() / subtarget machine id"

help/hints/suggestions/tips please: how to give _generic_ compilation for a particular ISA a non-zero LoopMicroOpBufferSize?

2016 Dec 16

Dear all, Some benchmarking experimentation I've done recently -- all on AArch64 -- has shown that it might be beneficial for all AArch64 targets to have a positive LoopMicroOpBufferSize, whereas the default that applies to all ISAs seems to be zero. Although I've tried going as far down the rabbit hole as I can, I haven't found a way to set DefaultLoopMicroOpBufferSize on a per-ISA basis or

Subtarget Initialization in <ARCH>TargetMachine constructor

2017 Aug 22

Hi, I found some different discrepancy on how Subtarget is created between some arch specific TargetMachine constructor. For example, for BPF/Lanai: BPFTargetMachine::BPFTargetMachine(const Target &T, const Triple &TT, StringRef CPU, StringRef FS, const TargetOptions &Options,

Subtarget Initialization in <ARCH>TargetMachine constructor

2017 Aug 23

Thanks, Alex. See my comments below.

On Wed, Aug 23, 2017 at 12:59 AM, Alex Bradbury <asb at asbradbury.org> wrote:
> On 22 August 2017 at 23:39, Y Song via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> Hi,
>
> Hi Yonghong.
>
>> I found some different discrepancy on how Subtarget is created
>> between some arch specific TargetMachine constructor.

[RFC] llvm-mca: a static performance analysis tool

2018 Mar 06

On Tue, Mar 6, 2018 at 5:55 AM, Andrew Trick via llvm-dev < llvm-dev at lists.llvm.org> wrote: > > > On Mar 5, 2018, at 6:28 PM, Matthias Braun <mbraun at apple.com> wrote: > > > > On Mar 5, 2018, at 6:14 PM, Andrew Trick via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > > > On Mar 5, 2018, at 3:38 PM, Quentin Colombet <qcolombet at

[RFC] llvm-mca: a static performance analysis tool

2018 Mar 06

> On Mar 5, 2018, at 6:14 PM, Andrew Trick via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > > >> On Mar 5, 2018, at 3:38 PM, Quentin Colombet <qcolombet at apple.com> wrote: >> >> When Ahmed and I worked on the decompiler, we first targeted MC. Going to MI was more difficult and really wouldn’t have gotten us a

[RFC] llvm-mca: a static performance analysis tool

2018 Mar 06

> On Mar 6, 2018, at 4:20 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote: > > To be clear then, resolveSchedClass should be moved from TargetSchedModel into MCSchedModel (which is where I originally wanted it). Any TargetInstrInfo APIs called from SchedPredicate should be moved to MCInstrInfo, which should be straightforward but annoying. > > Personally, I

[RFC] llvm-mca: a static performance analysis tool

2018 Mar 06

> On Mar 5, 2018, at 6:28 PM, Matthias Braun <mbraun at apple.com> wrote: > > > >> On Mar 5, 2018, at 6:14 PM, Andrew Trick via llvm-dev <llvm-dev at lists.llvm.org> wrote: >> >> >> >>> On Mar 5, 2018, at 3:38 PM, Quentin Colombet <qcolombet at apple.com <mailto:qcolombet at

DFAPacketizer, Scheduling and LoadLatency

2015 Nov 16

I'm unclear how DFAPacketizer and the scheduler know that a given instruction is a load. Here is what I'm talking about. Let's assume my VLIW target is described as follows: def MyTargetItineraries : ProcessorItineraries<[Slot0, Slot1], [], [ .............................. InstrItinData<RI, [InstrStage<1, [Slot0, Slot1]>]>,

[PATCH] x86, Allow x2apic without IR on VMware platform.

2013 Jan 17

Please consider this patch to allow x2apic without IR support when running on VMware platform. Tested on top of 3.8-rc3. Thanks, Alok -- Allow x2apic without IR on VMware platform. From: Alok N Kataria <akataria at vmware.com> This patch updates x2apic initialization code to allow x2apic on VMware platform even without interrupt remapping support. The hypervisor_x2apic_available hook

[PATCH] x86, Allow x2apic without IR on VMware platform.

2013 Jan 17

[RFC] llvm-mca: a static performance analysis tool

2018 Mar 06

> On Mar 5, 2018, at 3:38 PM, Quentin Colombet <qcolombet at apple.com> wrote: > > When Ahmed and I worked on the decompiler, we first targeted MC. Going to MI was more difficult and really wouldn’t have gotten us a lot of benefits. Instead, Ahmed pushed for directly decompiling to IR (look for dagger). Thanks for the pointer Quentin. > I would actually be in favor for more

[LLVMdev] Instruction Scheduling - migration from v3.1 to v3.2

2013 Apr 30

On Apr 26, 2013, at 3:53 AM, Martin J. O'Riordan <Martin.ORiordan at movidius.com> wrote: > I am migrating the llvm/clang derived compiler for our processor from the > v3.1 to v3.2 codebase. This has mostly gone well except that instruction > latency scheduling is no longer happening. > > The people who implemented this previously sub-classed 'ScheduleDAGInstrs'

[LLVMdev] Artificial deps and stores

2014 Jan 18

On Jan 17, 2014, at 4:03 PM, Andrew Trick <atrick at apple.com> wrote: > > On Jan 17, 2014, at 3:54 PM, Hal Finkel <hfinkel at anl.gov> wrote: > >> Andy, et al., >> >> In ScheduleDAGInstrs::buildSchedGraph, the code for handling stores has this: >> >> if (!ExitSU.isPred(SU)) >> // Push store's up a bit to avoid them

[RFC] llvm-mca: a static performance analysis tool

2018 Mar 05

Thanks Andrea for working on this! I’ve been willing to do this for quite some time now. Looks like procrastination was the right approach here ;). > On Mar 2, 2018, at 9:33 AM, Andrew Trick via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > +Ahmed > >> On Mar 2, 2018, at 6:42 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com <mailto:andrea.dibiagio at

Incorrect Cortex-R4/R4F/R5 ProcessorModel in ARM.td

2020 Sep 23

In ARM.td, I see that the ProcessorModel for cortex-r4, cortex-r4f, and cortex-r5 (as well as r7 and r8) is based on "CortexA8Model", which seems incorrect. When this was added in 2015, there were also comments associated with this configuration, such as "// FIXME: R5 has currently the same ProcessorModel as A8" (later removed). The processor model for Cortex-r52 appears to

RFC: System (cache, etc.) model for LLVM

2018 Nov 01

On Thu, 1 Nov 2018 at 15:21, David Greene <dag at cray.com> wrote: > > thank you for sharing the system hierarchy model. IMHO it makes a lot > > of sense, although I don't know which of today's passes would make use > > of it. Here are my remarks. > > LoopDataPrefetch would use it via the existing TTI interfaces, but I > think that's about it

what can cause a "CPU table is not sorted" assertion

2015 Oct 15

I'm trying to create a simplified 2-slot VLIW from an OR1K. The codebase I'm working with is here <https://github.com/openrisc/llvm-or1k>. I've created an initial MyTargetSchedule.td

    def MyTargetModel : SchedMachineModel {
      // HW can decode 2 instructions per cycle.
      let IssueWidth = 2;
      let LoadLatency = 4;
      let MispredictPenalty = 16;
      // This flag is set to allow the

Performance degradation on ARMv7 (cortex-a9)

2016 Feb 24

Hi Bradley, I was doing some performance analysis for ARMv7 (cortex-a9) and I noticed that one of my benchmarks degraded by 93%. I have tracked the regression down to the following commit by you: commit 7c1b77248baaeafec5d6433c3d1da9a2e2b69595 Author: Bradley Smith <bradley.smith at arm.com> Date: Mon Nov 16 11:10:19 2015 +0000 [ARM] Introduce subtarget features per

Performance degradation on ARMv7 (cortex-a9)

2016 Feb 24

Thanks Bradley. I see that the features set in ARM.td get written to the generated file <build>/llvm/lib/Target/ARM/ARMGenSubtargetInfo.inc. Here the ProcA9 features appear in ARMFeatureKV table: { "a9", "Cortex-A9 ARM processors", { ARM::ProcA9 }, { ARM::FeatureFP16 } }, With your change, the features for ProcA9 in the above entry are empty. This

RFC: System (cache, etc.) model for LLVM

2018 Nov 01

Hi, thank you for sharing the system hierarchy model. IMHO it makes a lot of sense, although I don't know which of today's passes would make use of it. Here are my remarks. I am wondering how one could model the following features using this model, or whether they should be part of a performance model at all: * ARM's big.LITTLE * NUMA hierarchies (are the NUMA domains
