similar to: Problems with undef subranges in identity copies

Displaying 20 results from an estimated 100 matches similar to: "Problems with undef subranges in identity copies"

2017 May 16
2
Bug in TableGen RegisterBankEmitter
On 05/16/2017 11:57 AM, Daniel Sanders wrote: >> If that's right, one possible fix would be to rename some of the subregister indices but that's likely to be quite painful. I'll have a think and see if I can come up with something nicer. > > I haven't been able to come up with a better answer for this, just an alternate choice as to where the complexity is. If we were
2017 May 10
2
Bug in TableGen RegisterBankEmitter
Hi Tom, The output: Added VReg_64(explicit) Added VS_32(explicit (VS_32) VReg_64 class-with-subregs: VReg_64) is saying that VS_32 was added because VReg_64 was explicitly specified and that while inspecting VS_32, it noticed that every register in VS_32 was a subregister of a register from VReg_64 using a single common subregister index. I've added some more tracing to my local copy and
2017 May 10
2
Bug in TableGen RegisterBankEmitter
Hi, I've run into an issue with the RegisterBankEmitter on the AMDGPU backend. AMDGPU has a register class: VS_32, which is non-allocatable and contains registers from both defined register banks (SGPRRegBank and VGPRRegBank). The RegisterBankEmitter is adding this class to the CoverageData array for both register classes, because it contains sub-registers of one of the classes explicitly
2013 Oct 10
2
[LLVMdev] [PATCH] R600/SI: Embed disassembly in ELF object
Hi, This patch adds R600/SI disassembly text to compiled object files, when a code dump is requested, to assist debugging in Mesa clients. Here's an example of the output in a Mesa client with a corresponding patch and RADEON_DUMP_SHADERS set: Shader Disassembly: S_WQM_B64 EXEC, EXEC ; BEFE0A7E S_MOV_B32 M0, SGPR6 ; BEFC0306
2019 Sep 09
2
Fwd: MachineScheduler not scheduling for latency
Hi, I'm trying to understand why MachineScheduler does a poor job in straight line code in cases like the one in the attached debug dump. This is on AMDGPU, an in-order target, and the problem is that the IMAGE_SAMPLE instructions have very high (80 cycle) latency, but in the resulting schedule they are often placed right next to their uses like this: 1784B %140:vgpr_32 =
2016 Mar 28
0
RFC: atomic operations on SI+
On Fri, Mar 25, 2016 at 02:22:11PM -0400, Jan Vesely wrote: > Hi Tom, Matt, > > I'm working on a project that needs few coherent atomic operations (HSA > mode: load, store, compare-and-swap) for std::atomic_uint in HCC. > > the attached patch implements atomic compare and swap for SI+ > (untested). I tried to stay within what was available, but there are > few issues
2019 Sep 10
2
MachineScheduler not scheduling for latency
Hi Andy, Thanks for the explanations. Yes AMDGPU is in-order and has MicroOpBufferSize = 1. Re "issue limited" and instruction groups: could it make sense to disable the generic scheduler's detection of issue limitation on in-order CPUs, or on CPUs that don't define instruction groups, or some similar condition? Something like: --- a/lib/CodeGen/MachineScheduler.cpp +++
2012 Oct 26
0
[LLVMdev] Data sharing between two ALUs and avoiding illegal copies
Hi, I'm working on support for the latest generation of AMD GPUs (Southern Islands) in the R600 backend, and I need some advice on how to handle interactions between two different ALUs. The processors on Southern Islands GPUs are grouped into compute units, which contain 1 Scalar ALU (sALU) and 64 Vector ALUs (vALU). The sALU is mainly responsible for flow control (implemented using
2016 Mar 25
2
RFC: atomic operations on SI+
Hi Tom, Matt, I'm working on a project that needs few coherent atomic operations (HSA mode: load, store, compare-and-swap) for std::atomic_uint in HCC. the attached patch implements atomic compare and swap for SI+ (untested). I tried to stay within what was available, but there are few issues that I was unsure how to address: 1.) it currently uses v2i32 for both input and output. This
2013 Oct 10
0
[LLVMdev] [PATCH] R600/SI: Embed disassembly in ELF object
On Wed, Oct 09, 2013 at 08:06:42PM -0500, Jay Cornwall wrote: > Hi, > > This patch adds R600/SI disassembly text to compiled object files, when > a code dump is requested, to assist debugging in Mesa clients. > > Here's an example of the output in a Mesa client with a corresponding > patch and RADEON_DUMP_SHADERS set: > > Shader Disassembly: > >
2018 Apr 23
2
pre-RA scheduling/live register analysis optimization (handle move) forcing spill of registers
Hi, I have a question related to pre-RA scheduling and spill of registers. I'm writing a backend for two operands instructions set, so FPU operations result have implicit destination. For example, the result of FMUL_A_oo is implicitly the register FA_ROUTMUL. I have defined FPUaROUTMULRegisterClass containing only FA_ROUTMUL. During the instruction lowering, in order to avoid frequent spill
2012 Oct 25
0
[LLVMdev] How to use TargetLowering::addRegisterClass() for multiple register classes
Hi, On my target, most value types can be stored in two register classes. For example: def SReg_64 : RegisterClass<"AMDGPU", [i64], 64, (add SGPR_64, VCC, EXEC)>; def VReg_64 : RegisterClass<"AMDGPU", [i64], 64, (add VGPR_64)>; What criteria should I use to decide which register class to associate with each type using TargetLowering::addRegisterClass() ? Thanks,
2010 Nov 18
1
[LLVMdev] subregs in trivial coalescing
I'm running into a problem with subregs during trivial coalescing in the linear scan allocator. Should RALinScan::attemptTrivialCoalescing be allowed to coalesce a COPY that uses a subreg as a destination? I've got the following sequence of code (unfortunately for an out of tree target) that is moving 32 and 64 bit sub-registers around within a 128 bit register. By the time the register
2019 Aug 30
2
virtual subregister liveness?
Hi, After dead-mi-elimination I'm experiencing a machine verifier failure at this virtual subregister write: %5.sub1 = COPY undef %11 The machine verifier essentially complains that the rest of the register is undefined (a subregister write implies a "read" of the other parts). So the problem is that dead-mi-elimination has removed the previously existing defines of %5.sub0.
2019 Sep 02
2
virtual subregister liveness?
On Fri, 2019-08-30 at 10:03 -0700, Quentin Colombet wrote: > > On Aug 30, 2019, at 8:31 AM, Jesper Antonsson via llvm-dev < > > llvm-dev at lists.llvm.org> wrote: > > > > Hi, > > > > After dead-mi-elimination I'm experiencing a machine verifier > > failure > > at this virtual subregister write: > > > > %5.sub1 = COPY undef
2013 May 16
0
[LLVMdev] Combining physical registers
On 5/16/2013 11:17 AM, Jakob Stoklund Olesen wrote: > > Would this TRI function solve your problem? >[...] > /// > /// Covering = getCoveringLanes(); > /// MaskA = getSubRegIndexLaneMask(SubA); > /// MaskB = getSubRegIndexLaneMask(SubB); > /// > /// If (MaskA & ~(MaskB & Covering)) == 0, then SubA is completely covered by > /// SubB.
2016 Aug 23
2
How to describe the RegisterInfo?
Yes, the arch is just as you said, something like AMD GPU, but Intel GPU don't have separate register file for 'scalar/vector'. In fact my idea of defining the register tuples was borrowed from SIRegisterInfo.td in AMD GPU. But seems that AMD GPU mainly support i32/i64 register type, while Intel GPU also support byte/short register type. So I have to start defining the registers from
2017 Apr 24
3
Debugging UNREACHABLE "Couldn't join subrange" in RegisterCoalescer (out-of-tree backend)
Hello, I have a minimal testcase which crashes RegisterCoalescer in my out-of-tree target. It only crashes in Debug builds of llc---not in Release builds. Also, interesting to note that the x86 backend lowers this same testcase successfully. I did a quick search of bugs.llvm.org and found no matches. This implies that the problem is in my backend and/or how my backend interacts with
2013 May 16
1
[LLVMdev] Combining physical registers
On May 16, 2013, at 8:13 AM, Krzysztof Parzyszek <kparzysz at codeaurora.org> wrote: > The function TII::canCombineSubRegIndices has been gone for a while now, and I was wondering if there is a target-independent way of determining if a certain set of physical registers "adds up" to a larger register. For example, on X86, AL and AH together form AX. On Hexagon, R0 and R1 are
2013 May 16
2
[LLVMdev] Combining physical registers
The function TII::canCombineSubRegIndices has been gone for a while now, and I was wondering if there is a target-independent way of determining if a certain set of physical registers "adds up" to a larger register. For example, on X86, AL and AH together form AX. On Hexagon, R0 and R1 are D0. The context here is an attempt to coalesce multiple loads/stores into fewer loads/stores