thr3ads.net - similar to: "Problems with undef subranges in identity copies"

Displaying 20 results from an estimated 100 matches similar to: "Problems with undef subranges in identity copies"

Bug in TableGen RegisterBankEmitter

2017 May 16

On 05/16/2017 11:57 AM, Daniel Sanders wrote: >> If that's right, one possible fix would be to rename some of the subregister indices but that's likely to be quite painful. I'll have a think and see if I can come up with something nicer. > > I haven't been able to come up with a better answer for this, just an alternate choice as to where the complexity is. If we were

Bug in TableGen RegisterBankEmitter

2017 May 10

Hi Tom, The output:
Added VReg_64(explicit)
Added VS_32(explicit (VS_32) VReg_64 class-with-subregs: VReg_64)
is saying that VS_32 was added because VReg_64 was explicitly specified and that while inspecting VS_32, it noticed that every register in VS_32 was a subregister of a register from VReg_64 using a single common subregister index. I've added some more tracing to my local copy and
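The criterion described in that trace (every register in a candidate class is reachable from the explicitly listed class through one common subregister index) can be modeled in a few lines. This is a minimal standalone C++ sketch, not the RegisterBankEmitter implementation: the `SubRegMap` layout, the helper name `commonSubRegIndex`, and the register names used with it are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <utility>

// Hypothetical toy model: (super-register, subregister index) -> subregister.
using SubRegMap = std::map<std::pair<std::string, std::string>, std::string>;

// Returns a subregister index Idx such that every register in Candidate is the
// Idx-subregister of some register in Explicit, or std::nullopt if no single
// index works.
std::optional<std::string> commonSubRegIndex(const std::set<std::string>& Explicit,
                                             const std::set<std::string>& Candidate,
                                             const SubRegMap& SubRegs) {
  if (Candidate.empty()) return std::nullopt;

  // Every subregister index that appears anywhere in the map.
  std::set<std::string> Indices;
  for (const auto& Entry : SubRegs) Indices.insert(Entry.first.second);

  for (const std::string& Idx : Indices) {
    // Registers reachable from the explicit class through this one index.
    std::set<std::string> Reachable;
    for (const std::string& Super : Explicit) {
      auto It = SubRegs.find({Super, Idx});
      if (It != SubRegs.end()) Reachable.insert(It->second);
    }
    bool AllReachable = true;
    for (const std::string& Reg : Candidate)
      if (Reachable.count(Reg) == 0) { AllReachable = false; break; }
    if (AllReachable) return Idx;
  }
  return std::nullopt;
}
```

For example, with `V0_V1` and `V2_V3` split by `sub0`/`sub1` (hypothetical names), the class `{V0, V2}` is found via `sub0`, while `{V0, V1}` has no single common index.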

Bug in TableGen RegisterBankEmitter

2017 May 10

Hi, I've run into an issue with the RegisterBankEmitter on the AMDGPU backend. AMDGPU has a register class: VS_32, which is non-allocatable and contains registers from both defined register banks (SGPRRegBank and VGPRRegBank). The RegisterBankEmitter is adding this class to the CoverageData array for both register classes, because it contains sub-registers of one of the classes explicitly
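A non-allocatable class like VS_32 that mixes registers from both banks can be detected by collecting the banks its members belong to. The sketch below assumes a hypothetical register-to-bank map and hypothetical register names; it only illustrates why such a class can end up associated with more than one bank and does not model how the emitter actually builds CoverageData.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical toy model: each physical register belongs to exactly one bank.
using BankOf = std::map<std::string, std::string>;

// Returns the names of all banks that own at least one register of the class.
std::set<std::string> banksSpanned(const std::set<std::string>& Class, const BankOf& Banks) {
  std::set<std::string> Result;
  for (const std::string& Reg : Class) {
    auto It = Banks.find(Reg);
    assert(It != Banks.end() && "every register should have a bank in this model");
    if (It != Banks.end()) Result.insert(It->second);
  }
  return Result;
}
```

A result with more than one entry marks a class that straddles banks, which is the situation the report describes for VS_32.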

[LLVMdev] [PATCH] R600/SI: Embed disassembly in ELF object

2013 Oct 10

Hi, This patch adds R600/SI disassembly text to compiled object files, when a code dump is requested, to assist debugging in Mesa clients. Here's an example of the output in a Mesa client with a corresponding patch and RADEON_DUMP_SHADERS set:
Shader Disassembly:
S_WQM_B64 EXEC, EXEC ; BEFE0A7E
S_MOV_B32 M0, SGPR6 ; BEFC0306

Fwd: MachineScheduler not scheduling for latency

2019 Sep 09

Hi, I'm trying to understand why MachineScheduler does a poor job in straight-line code in cases like the one in the attached debug dump. This is on AMDGPU, an in-order target, and the problem is that the IMAGE_SAMPLE instructions have very high (80-cycle) latency, but in the resulting schedule they are often placed right next to their uses like this:
1784B %140:vgpr_32 =
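To make the latency point concrete, here is a toy top-down list scheduler that always picks the ready instruction with the longest latency-weighted path to the end of the block. It is a simplified sketch, not GenericScheduler's heuristics: it ignores issue width, resources, and register pressure, and the `Node` type and helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical DAG node: result latency plus the nodes that consume the result.
struct Node {
  unsigned Latency;
  std::vector<std::size_t> Succs;
};

// Height: longest latency-weighted path from node N to the end of the DAG.
static unsigned height(const std::vector<Node>& G, std::size_t N, std::vector<int>& Memo) {
  if (Memo[N] >= 0) return static_cast<unsigned>(Memo[N]);
  unsigned H = 0;
  for (std::size_t S : G[N].Succs) H = std::max(H, height(G, S, Memo));
  Memo[N] = static_cast<int>(G[N].Latency + H);
  return static_cast<unsigned>(Memo[N]);
}

// Toy top-down list scheduler: among ready nodes, pick the greatest height,
// breaking ties by the lower index.
std::vector<std::size_t> scheduleByHeight(const std::vector<Node>& G) {
  std::vector<int> Memo(G.size(), -1);
  std::vector<unsigned> PendingPreds(G.size(), 0);
  for (const Node& N : G)
    for (std::size_t S : N.Succs) ++PendingPreds[S];

  std::vector<std::size_t> Ready, Order;
  for (std::size_t I = 0; I < G.size(); ++I)
    if (PendingPreds[I] == 0) Ready.push_back(I);

  while (!Ready.empty()) {
    auto Best = std::max_element(Ready.begin(), Ready.end(), [&](std::size_t A, std::size_t B) {
      unsigned HA = height(G, A, Memo), HB = height(G, B, Memo);
      return HA < HB || (HA == HB && A > B);
    });
    std::size_t Picked = *Best;
    Ready.erase(Best);
    Order.push_back(Picked);
    // Release successors whose predecessors have all been scheduled.
    for (std::size_t S : G[Picked].Succs)
      if (--PendingPreds[S] == 0) Ready.push_back(S);
  }
  assert(Order.size() == G.size() && "the dependence graph must be acyclic");
  return Order;
}
```

With an 80-cycle sample (node 2) feeding one use (node 3) and two independent 1-cycle instructions (nodes 0 and 1), this picks the order 2, 0, 1, 3, so the independent work lands between the sample and its use.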

RFC: atomic operations on SI+

2016 Mar 28

On Fri, Mar 25, 2016 at 02:22:11PM -0400, Jan Vesely wrote: > Hi Tom, Matt, > > I'm working on a project that needs a few coherent atomic operations (HSA > mode: load, store, compare-and-swap) for std::atomic_uint in HCC. > > The attached patch implements atomic compare-and-swap for SI+ > (untested). I tried to stay within what was available, but there are > a few issues

MachineScheduler not scheduling for latency

2019 Sep 10

Hi Andy, Thanks for the explanations. Yes, AMDGPU is in-order and has MicroOpBufferSize = 1. Re "issue limited" and instruction groups: could it make sense to disable the generic scheduler's detection of issue limitation on in-order CPUs, or on CPUs that don't define instruction groups, or some similar condition? Something like:
--- a/lib/CodeGen/MachineScheduler.cpp
+++

[LLVMdev] Data sharing between two ALUs and avoiding illegal copies

2012 Oct 26

Hi, I'm working on support for the latest generation of AMD GPUs (Southern Islands) in the R600 backend, and I need some advice on how to handle interactions between two different ALUs. The processors on Southern Islands GPUs are grouped into compute units, which contain 1 Scalar ALU (sALU) and 64 Vector ALUs (vALU). The sALU is mainly responsible for flow control (implemented using
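A common constraint behind "illegal copies" on a split scalar/vector design is that a scalar value (one per wavefront) can be broadcast to the vALUs, while a per-lane vALU value cannot in general be moved into a single scalar register. Assuming that rule (the snippet is cut off before stating its details), a checker that flags offending copies might look like this; the `RegFile` and `Copy` types and both helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical register files: scalar (sALU) and vector (vALU).
enum class RegFile { SGPR, VGPR };

// Hypothetical copy record: destination and source register files.
struct Copy {
  RegFile Dst;
  RegFile Src;
};

// Assumed rule: scalar -> vector is a broadcast and is fine; vector -> scalar
// would have to collapse 64 lanes into one value, so it is treated as illegal.
bool isLegalCopy(const Copy& C) {
  return !(C.Dst == RegFile::SGPR && C.Src == RegFile::VGPR);
}

// Returns the indices of copies that would need rewriting, for example by
// moving the destination (and its users) to the vector register file.
std::vector<std::size_t> illegalCopies(const std::vector<Copy>& Copies) {
  std::vector<std::size_t> Bad;
  for (std::size_t I = 0; I < Copies.size(); ++I)
    if (!isLegalCopy(Copies[I])) Bad.push_back(I);
  return Bad;
}
```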

RFC: atomic operations on SI+

2016 Mar 25

Hi Tom, Matt, I'm working on a project that needs a few coherent atomic operations (HSA mode: load, store, compare-and-swap) for std::atomic_uint in HCC. The attached patch implements atomic compare-and-swap for SI+ (untested). I tried to stay within what was available, but there are a few issues that I was unsure how to address: 1.) It currently uses v2i32 for both input and output. This
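For reference, the compare-and-swap semantics needed for `std::atomic_uint` can be written with `std::atomic<unsigned>::compare_exchange_strong`, returning the pre-operation value the way a hardware cmpswap that returns its old value would. This is a host-side C++ sketch of the semantics only; the helper names `cmpSwap` and `fetchAddViaCas` are hypothetical, and it says nothing about how the patch packs its operands into v2i32.

```cpp
#include <atomic>
#include <cassert>

// Returns the value that was in A before the operation.
unsigned cmpSwap(std::atomic<unsigned>& A, unsigned NewVal, unsigned Cmp) {
  unsigned Expected = Cmp;
  // On success Expected still equals Cmp, which was the old value; on failure
  // compare_exchange_strong overwrites Expected with the current value.
  A.compare_exchange_strong(Expected, NewVal);
  return Expected;
}

// Example built on top of it: atomically add Delta with a CAS retry loop.
unsigned fetchAddViaCas(std::atomic<unsigned>& A, unsigned Delta) {
  unsigned Old = A.load();
  for (;;) {
    unsigned Seen = cmpSwap(A, Old + Delta, Old);
    if (Seen == Old) return Old;
    Old = Seen; // lost the race; retry with the freshly observed value
  }
}
```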

[LLVMdev] [PATCH] R600/SI: Embed disassembly in ELF object

2013 Oct 10

On Wed, Oct 09, 2013 at 08:06:42PM -0500, Jay Cornwall wrote: > Hi, > > This patch adds R600/SI disassembly text to compiled object files, when > a code dump is requested, to assist debugging in Mesa clients. > > Here's an example of the output in a Mesa client with a corresponding > patch and RADEON_DUMP_SHADERS set: > > Shader Disassembly: > >

pre-RA scheduling/live register analysis optimization (handle move) forcing spill of registers

2018 Apr 23

Hi, I have a question related to pre-RA scheduling and spilling of registers. I'm writing a backend for a two-operand instruction set, so FPU operation results have an implicit destination. For example, the result of FMUL_A_oo is implicitly the register FA_ROUTMUL. I have defined FPUaROUTMULRegisterClass containing only FA_ROUTMUL. During the instruction lowering, in order to avoid frequent spill
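When every FMUL result lands in a single-register class such as FPUaROUTMULRegisterClass, any two of those results whose live ranges overlap cannot both stay in registers, so the allocator has to spill or copy one of them. A minimal sketch of that pressure check over half-open live intervals follows; the `Interval` type, its slot numbering, and the helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical half-open live interval [Start, End) in instruction slots.
struct Interval {
  unsigned Start, End;
};

// Maximum number of intervals that are live at the same slot.
unsigned maxPressure(const std::vector<Interval>& Live) {
  std::vector<std::pair<unsigned, int>> Events;
  for (const Interval& I : Live) {
    assert(I.Start < I.End && "intervals must be non-empty");
    Events.push_back({I.Start, +1});
    Events.push_back({I.End, -1});
  }
  // Sorting pairs puts an end (-1) before a start (+1) at the same slot,
  // which matches half-open intervals: [0,2) and [2,4) do not overlap.
  std::sort(Events.begin(), Events.end());
  int Current = 0, Max = 0;
  for (const auto& E : Events) {
    Current += E.second;
    Max = std::max(Max, Current);
  }
  return static_cast<unsigned>(Max);
}

// A class with NumRegs registers cannot keep every value in a register when
// more than NumRegs of them are live at once.
bool needsSpill(const std::vector<Interval>& Live, unsigned NumRegs) {
  return maxPressure(Live) > NumRegs;
}
```

In this model, a schedule that keeps two FMUL results live at once in a one-register class forces a spill, while a schedule that consumes each result before the next FMUL does not.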

[LLVMdev] How to use TargetLowering::addRegisterClass() for multiple register classes

2012 Oct 25

Hi, On my target, most value types can be stored in two register classes. For example:
def SReg_64 : RegisterClass<"AMDGPU", [i64], 64, (add SGPR_64, VCC, EXEC)>;
def VReg_64 : RegisterClass<"AMDGPU", [i64], 64, (add VGPR_64)>;
What criteria should I use to decide which register class to associate with each type using TargetLowering::addRegisterClass()? Thanks,

[LLVMdev] subregs in trivial coalescing

2010 Nov 18

I'm running into a problem with subregs during trivial coalescing in the linear scan allocator. Should RALinScan::attemptTrivialCoalescing be allowed to coalesce a COPY that uses a subreg as a destination? I've got the following sequence of code (unfortunately for an out-of-tree target) that is moving 32 and 64 bit sub-registers around within a 128 bit register. By the time the register

virtual subregister liveness?

2019 Aug 30

Hi, After dead-mi-elimination I'm experiencing a machine verifier failure at this virtual subregister write: %5.sub1 = COPY undef %11 The machine verifier essentially complains that the rest of the register is undefined (a subregister write implies a "read" of the other parts). So the problem is that dead-mi-elimination has removed the previously existing defines of %5.sub0.
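The rule described here (a subregister def without an `undef` flag implicitly reads the other lanes) can be modeled with lane masks. The sketch below is a toy approximation under that assumption, not the MachineVerifier's actual algorithm; `SubRegDef`, `firstBadPartialDef`, and the lane assignments in the example are hypothetical. It flags a partial def when some of the untouched lanes were never defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using LaneMask = std::uint32_t;

// Hypothetical toy model of one def of a virtual register.
struct SubRegDef {
  LaneMask Written; // lanes written by this instruction
  bool Undef;       // models an `undef` flag on the def operand
};

// Walks the defs of one virtual register in program order and returns the
// index of the first partial def that implicitly reads lanes that were never
// defined, or -1 if there is none.
int firstBadPartialDef(const std::vector<SubRegDef>& Defs, LaneMask FullMask) {
  LaneMask Defined = 0;
  for (std::size_t I = 0; I < Defs.size(); ++I) {
    const SubRegDef& D = Defs[I];
    assert((D.Written & ~FullMask) == 0 && "writes must stay inside the register");
    LaneMask Others = FullMask & ~D.Written;
    // A partial def without `undef` is treated as a read of the other lanes.
    if (Others != 0 && !D.Undef && (Defined & Others) != Others)
      return static_cast<int>(I);
    // `undef` discards the other lanes; otherwise they keep their values.
    Defined = D.Undef ? D.Written : (Defined | D.Written);
  }
  return -1;
}
```

With sub0 = 0x1 and sub1 = 0x2, the sequence "undef def of sub0, then def of sub1" is accepted, but once the sub0 def is deleted the lone sub1 write (index 0) is flagged; marking that write `undef` makes it acceptable again in this model.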

virtual subregister liveness?

2019 Sep 02

On Fri, 2019-08-30 at 10:03 -0700, Quentin Colombet wrote: > > On Aug 30, 2019, at 8:31 AM, Jesper Antonsson via llvm-dev < > > llvm-dev at lists.llvm.org> wrote: > > > > Hi, > > > > After dead-mi-elimination I'm experiencing a machine verifier > > failure > > at this virtual subregister write: > > > > %5.sub1 = COPY undef

[LLVMdev] Combining physical registers

2013 May 16

On 5/16/2013 11:17 AM, Jakob Stoklund Olesen wrote:
>
> Would this TRI function solve your problem?
>[...]
> ///
> /// Covering = getCoveringLanes();
> /// MaskA = getSubRegIndexLaneMask(SubA);
> /// MaskB = getSubRegIndexLaneMask(SubB);
> ///
> /// If (MaskA & ~(MaskB & Covering)) == 0, then SubA is completely covered by
> /// SubB.
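The covering test in the quoted comment reduces to plain bit arithmetic. The sketch below evaluates that expression on 64-bit integers; the helper name `isCoveredBy` and the mask values in the example are hypothetical, and how LLVM actually computes the covering-lanes mask is not modeled here.

```cpp
#include <cassert>
#include <cstdint>

using Mask = std::uint64_t; // plain stand-in for a lane mask

// The test from the quoted comment: SubA is completely covered by SubB when
// (MaskA & ~(MaskB & Covering)) == 0.
bool isCoveredBy(Mask MaskA, Mask MaskB, Mask Covering) {
  return (MaskA & ~(MaskB & Covering)) == 0;
}
```

With an all-ones covering mask, a low half (0x1) is covered by the full register (0x3) but not the other way around; restricting the covering mask to 0x2 makes the first test fail as well, since lane 0x1 no longer counts toward coverage.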

How to describe the RegisterInfo?

2016 Aug 23

Yes, the arch is just as you said, something like AMD GPU, but Intel GPUs don't have a separate register file for 'scalar/vector'. In fact, my idea of defining the register tuples was borrowed from SIRegisterInfo.td in AMD GPU. But it seems that AMD GPU mainly supports i32/i64 register types, while Intel GPU also supports byte/short register types. So I have to start defining the registers from

Debugging UNREACHABLE "Couldn't join subrange" in RegisterCoalescer (out-of-tree backend)

2017 Apr 24

Hello, I have a minimal testcase which crashes RegisterCoalescer in my out-of-tree target. It only crashes in Debug builds of llc---not in Release builds. Also, interesting to note that the x86 backend lowers this same testcase successfully. I did a quick search of bugs.llvm.org and found no matches. This implies that the problem is in my backend and/or how my backend interacts with

[LLVMdev] Combining physical registers

2013 May 16

On May 16, 2013, at 8:13 AM, Krzysztof Parzyszek <kparzysz at codeaurora.org> wrote: > The function TII::canCombineSubRegIndices has been gone for a while now, and I was wondering if there is a target-independent way of determining if a certain set of physical registers "adds up" to a larger register. For example, on X86, AL and AH together form AX. On Hexagon, R0 and R1 are

[LLVMdev] Combining physical registers

2013 May 16

The function TII::canCombineSubRegIndices has been gone for a while now, and I was wondering if there is a target-independent way of determining if a certain set of physical registers "adds up" to a larger register. For example, on X86, AL and AH together form AX. On Hexagon, R0 and R1 are D0. The context here is an attempt to coalesce multiple loads/stores into fewer loads/stores
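One target-independent way to phrase the question is in terms of lane masks: a set of subregisters adds up to a larger register when their masks are pairwise disjoint and their union equals the larger register's mask. The sketch below assumes such masks are available (for example, hypothetical values AL = 0x1, AH = 0x2, AX = 0x3); `partsFormWhole` is a hypothetical helper, not an existing LLVM API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Mask = std::uint64_t; // plain stand-in for a lane mask

// True if the parts' lane masks are pairwise disjoint and together cover
// exactly the lanes of the whole register.
bool partsFormWhole(const std::vector<Mask>& Parts, Mask Whole) {
  Mask Seen = 0;
  for (Mask P : Parts) {
    if (P == 0 || (Seen & P) != 0) return false; // empty or overlapping part
    Seen |= P;
  }
  return Seen == Whole;
}
```

With those example masks, {AL, AH} forms AX, {AL} alone does not, and {AL, AX} is rejected because the parts overlap.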
