Displaying 20 results from an estimated 100 matches similar to: "Problems with undef subranges in identity copies"
2017 May 16
2
Bug in TableGen RegisterBankEmitter
On 05/16/2017 11:57 AM, Daniel Sanders wrote:
>> If that's right, one possible fix would be to rename some of the subregister indices but that's likely to be quite painful. I'll have a think and see if I can come up with something nicer.
>
> I haven't been able to come up with a better answer for this, just an alternate choice as to where the complexity is. If we were
2017 May 10
2
Bug in TableGen RegisterBankEmitter
Hi Tom,
The output:
Added VReg_64(explicit)
Added VS_32(explicit (VS_32) VReg_64 class-with-subregs: VReg_64)
is saying that VS_32 was added because VReg_64 was explicitly specified and that while inspecting VS_32, it noticed that every register in VS_32 was a subregister of a register from VReg_64 using a single common subregister index.
I've added some more tracing to my local copy and
2017 May 10
2
Bug in TableGen RegisterBankEmitter
Hi,
I've run into an issue with the RegisterBankEmitter on the AMDGPU backend.
AMDGPU has a register class: VS_32, which is non-allocatable and contains
registers from both defined register banks (SGPRRegBank and VGPRRegBank).
The RegisterBankEmitter is adding this class to the CoverageData array
for both register classes, because it contains sub-registers of one
of the classes explicitly
2013 Oct 10
2
[LLVMdev] [PATCH] R600/SI: Embed disassembly in ELF object
Hi,
This patch adds R600/SI disassembly text to compiled object files, when
a code dump is requested, to assist debugging in Mesa clients.
Here's an example of the output in a Mesa client with a corresponding
patch and RADEON_DUMP_SHADERS set:
Shader Disassembly:
S_WQM_B64 EXEC, EXEC ; BEFE0A7E
S_MOV_B32 M0, SGPR6 ; BEFC0306
2019 Sep 09
2
Fwd: MachineScheduler not scheduling for latency
Hi,
I'm trying to understand why MachineScheduler does a poor job in
straight line code in cases like the one in the attached debug dump.
This is on AMDGPU, an in-order target, and the problem is that the
IMAGE_SAMPLE instructions have very high (80 cycle) latency, but in
the resulting schedule they are often placed right next to their uses
like this:
1784B %140:vgpr_32 =
2016 Mar 28
0
RFC: atomic operations on SI+
On Fri, Mar 25, 2016 at 02:22:11PM -0400, Jan Vesely wrote:
> Hi Tom, Matt,
>
> I'm working on a project that needs few coherent atomic operations (HSA
> mode: load, store, compare-and-swap) for std::atomic_uint in HCC.
>
> the attached patch implements atomic compare and swap for SI+
> (untested). I tried to stay within what was available, but there are
> few issues
2019 Sep 10
2
MachineScheduler not scheduling for latency
Hi Andy,
Thanks for the explanations. Yes AMDGPU is in-order and has
MicroOpBufferSize = 1.
Re "issue limited" and instruction groups: could it make sense to
disable the generic scheduler's detection of issue limitation on
in-order CPUs, or on CPUs that don't define instruction groups, or
some similar condition? Something like:
--- a/lib/CodeGen/MachineScheduler.cpp
+++
2012 Oct 26
0
[LLVMdev] Data sharing between two ALUs and avoiding illegal copies
Hi,
I'm working on support for the latest generation of AMD GPUs (Southern
Islands) in the R600 backend, and I need some advice on how to handle
interactions between two different ALUs.
The processors on Southern Islands GPUs are grouped into compute units,
which contain 1 Scalar ALU (sALU) and 64 Vector ALUs (vALU). The sALU
is mainly responsible for flow control (implemented using
2016 Mar 25
2
RFC: atomic operations on SI+
Hi Tom, Matt,
I'm working on a project that needs few coherent atomic operations (HSA
mode: load, store, compare-and-swap) for std::atomic_uint in HCC.
the attached patch implements atomic compare and swap for SI+
(untested). I tried to stay within what was available, but there are
few issues that I was unsure how to address:
1.) it currently uses v2i32 for both input and output. This
2013 Oct 10
0
[LLVMdev] [PATCH] R600/SI: Embed disassembly in ELF object
On Wed, Oct 09, 2013 at 08:06:42PM -0500, Jay Cornwall wrote:
> Hi,
>
> This patch adds R600/SI disassembly text to compiled object files, when
> a code dump is requested, to assist debugging in Mesa clients.
>
> Here's an example of the output in a Mesa client with a corresponding
> patch and RADEON_DUMP_SHADERS set:
>
> Shader Disassembly:
>
>
2018 Apr 23
2
pre-RA scheduling/live register analysis optimization (handle move) forcing spill of registers
Hi,
I have a question related to pre-RA scheduling and spill of registers.
I'm writing a backend for two operands instructions set, so FPU operations result have implicit destination.
For example, the result of FMUL_A_oo is implicitly the register FA_ROUTMUL.
I have defined FPUaROUTMULRegisterClass containing only FA_ROUTMUL.
During the instruction lowering, in order to avoid frequent spill
2012 Oct 25
0
[LLVMdev] How to use TargetLowering::addRegisterClass() for multiple register classes
Hi,
On my target, most value types can be stored in two register classes.
For example:
def SReg_64 : RegisterClass<"AMDGPU", [i64], 64, (add SGPR_64, VCC, EXEC)>;
def VReg_64 : RegisterClass<"AMDGPU", [i64], 64, (add VGPR_64)>;
What criteria should I use to decide which register class to associate
with each type using TargetLowering::addRegisterClass() ?
Thanks,
2010 Nov 18
1
[LLVMdev] subregs in trivial coalescing
I'm running into a problem with subregs during trivial coalescing in the
linear scan allocator.
Should RALinScan::attemptTrivialCoalescing be allowed to coalesce a COPY
that uses a subreg as a destination?
I've got the following sequence of code (unfortunately for an out of tree
target) that is moving 32 and 64 bit sub-registers around within a 128 bit
register. By the time the register
2019 Aug 30
2
virtual subregister liveness?
Hi,
After dead-mi-elimination I'm experiencing a machine verifier failure
at this virtual subregister write:
%5.sub1 = COPY undef %11
The machine verifier essentially complains that the rest of the
register is undefined (a subregister write implies a "read" of the
other parts).
So the problem is that dead-mi-elimination has removed the previously
existing defines of %5.sub0.
2019 Sep 02
2
virtual subregister liveness?
On Fri, 2019-08-30 at 10:03 -0700, Quentin Colombet wrote:
> > On Aug 30, 2019, at 8:31 AM, Jesper Antonsson via llvm-dev <
> > llvm-dev at lists.llvm.org> wrote:
> >
> > Hi,
> >
> > After dead-mi-elimination I'm experiencing a machine verifier
> > failure
> > at this virtual subregister write:
> >
> > %5.sub1 = COPY undef
2013 May 16
0
[LLVMdev] Combining physical registers
On 5/16/2013 11:17 AM, Jakob Stoklund Olesen wrote:
>
> Would this TRI function solve your problem?
>[...]
> ///
> /// Covering = getCoveringLanes();
> /// MaskA = getSubRegIndexLaneMask(SubA);
> /// MaskB = getSubRegIndexLaneMask(SubB);
> ///
> /// If (MaskA & ~(MaskB & Covering)) == 0, then SubA is completely covered by
> /// SubB.
2016 Aug 23
2
How to describe the RegisterInfo?
Yes, the arch is just as you said, something like AMD GPU, but Intel GPU
don't have separate register file for 'scalar/vector'.
In fact my idea of defining the register tuples was borrowed from
SIRegisterInfo.td in AMD GPU.
But seems that AMD GPU mainly support i32/i64 register type, while Intel
GPU also support byte/short register type.
So I have to start defining the registers from
2017 Apr 24
3
Debugging UNREACHABLE "Couldn't join subrange" in RegisterCoalescer (out-of-tree backend)
Hello,
I have a minimal testcase which crashes RegisterCoalescer in my out-of-tree target. It only crashes in Debug builds of llc---not in Release builds. Also, interesting to note that the x86 backend lowers this same testcase successfully. I did a quick search of bugs.llvm.org and found no matches.
This implies that the problem is in my backend and/or how my backend interacts with
2013 May 16
1
[LLVMdev] Combining physical registers
On May 16, 2013, at 8:13 AM, Krzysztof Parzyszek <kparzysz at codeaurora.org> wrote:
> The function TII::canCombineSubRegIndices has been gone for a while now, and I was wondering if there is a target-independent way of determining if a certain set of physical registers "adds up" to a larger register. For example, on X86, AL and AH together form AX. On Hexagon, R0 and R1 are
2013 May 16
2
[LLVMdev] Combining physical registers
The function TII::canCombineSubRegIndices has been gone for a while now,
and I was wondering if there is a target-independent way of determining
if a certain set of physical registers "adds up" to a larger register.
For example, on X86, AL and AH together form AX. On Hexagon, R0 and R1
are D0.
The context here is an attempt to coalesce multiple loads/stores into
fewer loads/stores