Displaying 9 results from an estimated 9 matches for "vreg_64".
2017 May 16
2
Bug in TableGen RegisterBankEmitter
...e or not. This would allow us to prevent it from following the subreg indices into the wrong classes, but it would also make it harder to define the register banks.
>
I'm a little confused about what the issue is. AMDGPU has two 64-bit register
classes, each with sub0 and sub1 sub-registers:
VReg_64:sub0=VGPR_32
VReg_64:sub1=VGPR_32
SReg_64:sub0=SGPR_32
SReg_64:sub1=SGPR_32
Are you saying that TableGen considers VReg_64:sub0 and SReg_64:sub0 to be
the same sub-register class because they are both called sub0?
-Tom
>> On 10 May 2017, at 21:58, Daniel Sanders via llvm-dev <llvm-de...
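To make the distinction in Tom's question concrete, here is a minimal C++ sketch of a sub-register class table. The table layout and the names subRegClassTable, subRegClass and subRegClassesByIndexName are hypothetical and are not TableGen's internal API. A lookup keyed on both the class and the index keeps VReg_64:sub0 and SReg_64:sub0 apart; a lookup keyed on the index name alone merges them, which is the behaviour the question asks about (the thread excerpt does not confirm that TableGen does this).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical model, not TableGen's data structures:
// (register class, sub-register index) -> class of the resulting sub-registers.
using SubRegClassKey = std::pair<std::string, std::string>;

const std::map<SubRegClassKey, std::string>& subRegClassTable() {
  static const std::map<SubRegClassKey, std::string> table = {
      {{"VReg_64", "sub0"}, "VGPR_32"},
      {{"VReg_64", "sub1"}, "VGPR_32"},
      {{"SReg_64", "sub0"}, "SGPR_32"},
      {{"SReg_64", "sub1"}, "SGPR_32"},
  };
  return table;
}

// Lookup keyed on the class *and* the index: VReg_64:sub0 and SReg_64:sub0
// stay distinct. Returns "" if the pair is unknown.
std::string subRegClass(const std::string& rc, const std::string& idx) {
  const auto& table = subRegClassTable();
  auto it = table.find({rc, idx});
  return it == table.end() ? std::string() : it->second;
}

// Lookup keyed on the index name alone, i.e. the conflation Tom asks about:
// every class reachable through the same index name is lumped together.
std::set<std::string> subRegClassesByIndexName(const std::string& idx) {
  std::set<std::string> result;
  for (const auto& entry : subRegClassTable())
    if (entry.first.second == idx) result.insert(entry.second);
  return result;
}
```

With the index-only lookup, "sub0" yields both VGPR_32 and SGPR_32, so anything built on it would treat the two 64-bit classes as interchangeable.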
2017 May 10
2
Bug in TableGen RegisterBankEmitter
Hi Tom,
The output:
Added VReg_64(explicit)
Added VS_32(explicit (VS_32) VReg_64 class-with-subregs: VReg_64)
is saying that VS_32 was added because VReg_64 was explicitly specified and that while inspecting VS_32, it noticed that every register in VS_32 was a subregister of a register from VReg_64 using a single common subregiste...
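The rule Daniel describes can be sketched as follows, assuming it means: there is one sub-register index Idx such that every register in the candidate class is the Idx sub-register of some register in the explicitly added class. The register names, the SubRegMap type and commonSubRegIndex are hypothetical; this models the rule as stated in the (truncated) message, not the RegisterBankEmitter source.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical register model: (super-register, sub-register index) -> sub-register.
using SubRegMap = std::map<std::pair<std::string, std::string>, std::string>;

// Returns an index Idx such that every register in `candidate` is the Idx
// sub-register of some register in `explicitClass`, or std::nullopt if no
// single index works. Models the rule as described in the message only.
std::optional<std::string> commonSubRegIndex(
    const std::set<std::string>& candidate,
    const std::set<std::string>& explicitClass,
    const std::vector<std::string>& indices,
    const SubRegMap& subRegs) {
  if (candidate.empty()) return std::nullopt;
  for (const std::string& idx : indices) {
    // Every register reachable from the explicit class through this index.
    std::set<std::string> reachable;
    for (const std::string& super : explicitClass) {
      auto it = subRegs.find({super, idx});
      if (it != subRegs.end()) reachable.insert(it->second);
    }
    bool allReachable = true;
    for (const std::string& reg : candidate) {
      if (reachable.count(reg) == 0) {
        allReachable = false;
        break;
      }
    }
    if (allReachable) return idx;
  }
  return std::nullopt;
}
```

Under this literal reading, a class holding registers from both banks finds no common index, because its SGPRs are not sub-registers of any VReg_64 register.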
2020 Nov 19
1
Problems with undef subranges in identity copies
...minates identity
copies. The fundamental problem is the complexity that arises because undef
values are a special case: they don't have an associated
VNInfo/Segment unless the value is used across blocks.
For example, in this case %0 has two subregisters, sub0 and sub1:
bb.0:
  undef %0.sub1:vreg_64 = COPY killed $vgpr0
bb.1:
  %0:vreg_64 = COPY %0
  S_CBRANCH_EXECNZ %bb.1, implicit $exec
bb.2:
  undef %0.sub1:vreg_64 = nofpexcept V_CEIL_F32_e32 killed %0.sub1, implicit $mode, implicit $exec
  S_BRANCH %bb.1
sub0 has no defined values anywhere in this function. The value only
ex...
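A simplified model of the point about sub0: treat each half of the 64-bit virtual register as a lane bit and ask which lanes ever receive a real value. An identity COPY only forwards what each lane already holds, so it is not counted as a definition. The names (kSub0, DefOfVReg, lanesNeverDefined, ...) are hypothetical, control flow is ignored, and this is not LLVM's LaneBitmask or LiveInterval API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical lane masks for a 64-bit virtual register made of two halves.
constexpr std::uint32_t kSub0 = 0x1;
constexpr std::uint32_t kSub1 = 0x2;
constexpr std::uint32_t kAllLanes = kSub0 | kSub1;

// Simplified effect of one instruction that writes %0.
struct DefOfVReg {
  std::uint32_t writtenLanes;  // lanes the instruction writes
  bool identityCopy;           // %0 = COPY %0: each lane keeps its old value
};

// Lanes that receive a real value somewhere. An identity copy only forwards
// what a lane already held (possibly undef), so it is not counted.
std::uint32_t lanesWithDefinedValue(const std::vector<DefOfVReg>& defs) {
  std::uint32_t defined = 0;
  for (const DefOfVReg& def : defs)
    if (!def.identityCopy) defined |= def.writtenLanes;
  return defined;
}

// Lanes that are undef everywhere in the function.
std::uint32_t lanesNeverDefined(const std::vector<DefOfVReg>& defs) {
  return kAllLanes & ~lanesWithDefinedValue(defs);
}
```

For the three instructions that write %0 in the example, sub1 is defined by the first COPY and by V_CEIL_F32_e32, while sub0 is never defined.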
2017 May 10
2
Bug in TableGen RegisterBankEmitter
...from both defined register banks (SGPRRegBank and VGPRRegBank).
The RegisterBankEmitter is adding this class to the CoverageData array
for both register banks, because it contains sub-registers of one
of the classes explicitly added to the RegisterBank, for example:
Added VS_32(explicit (VS_32) VReg_64 class-with-subregs: VReg_64)
This is a problem, because both RegisterBanks think they cover
VS_32, even though neither of them actually do.
What exactly is the best way to fix this? It seems like we need some
additional checks in the RegisterBankEmitter to fix this, but it's not
clear to me...
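One kind of additional check, sketched here under stated assumptions, is a consistency test over the coverage data: report any register class claimed by more than one bank, and separately verify that a bank really covers a class by checking that every register of the class belongs to the bank. The Coverage type, the bank contents in the usage check and the function names are hypothetical; the real RegisterBankEmitter data structures differ.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical coverage data: bank name -> register classes the bank claims.
using Coverage = std::map<std::string, std::set<std::string>>;

// Register classes that appear in the coverage of more than one bank.
std::set<std::string> classesClaimedByMultipleBanks(const Coverage& coverage) {
  std::map<std::string, int> claims;
  for (const auto& bankEntry : coverage)
    for (const std::string& rc : bankEntry.second) ++claims[rc];
  std::set<std::string> result;
  for (const auto& claim : claims)
    if (claim.second > 1) result.insert(claim.first);
  return result;
}

// True only if every register of the class is a register of the bank.
bool bankFullyCovers(const std::set<std::string>& classRegs,
                     const std::set<std::string>& bankRegs) {
  for (const std::string& reg : classRegs)
    if (bankRegs.count(reg) == 0) return false;
  return true;
}
```

A class that mixes SGPRs and VGPRs shows up in the first check and fails the second for both banks, matching the observation that neither bank actually covers VS_32.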
2019 Sep 09
2
Fwd: MachineScheduler not scheduling for latency
...ne in the attached debug dump.
This is on AMDGPU, an in-order target, and the problem is that the
IMAGE_SAMPLE instructions have very high (80-cycle) latency, but in
the resulting schedule they are often placed right next to their uses
like this:
1784B %140:vgpr_32 = IMAGE_SAMPLE_LZ_V1_V2 %533:vreg_64,
      %30:sreg_256, %26:sreg_128, 8, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec
      :: (dereferenceable load 4 from custom TargetCustom8)
1792B %142:vgpr_32 = V_MUL_F32_e32 %44:sreg_32, %140:vgpr_32, implicit $exec
...
1784B %140:vgpr_32 = IMAGE_SAMPLE_LZ_V1_V2 %533:vreg_64,
%30:sreg_256, %26:sreg_128,...
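The cost of placing the use right after the sample can be estimated with a simplified in-order model: a single-issue pipeline where an instruction waits until every operand is ready. The 80-cycle latency comes from the message; the Instr struct, scheduleInOrder and the extra independent instructions in the usage check are hypothetical, and this is not how MachineScheduler or the AMDGPU machine model computes latency.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified single-issue, in-order pipeline (an assumption, not the AMDGPU
// machine model): one instruction may issue per cycle, and an instruction
// cannot issue until every value it reads is ready.
struct Instr {
  int latency;                    // cycles from issue until the result is ready
  std::vector<std::size_t> deps;  // indices of instructions whose results it reads
};

// Returns the issue cycle of each instruction for the given order. `order`
// must list every instruction once and place producers before consumers.
std::vector<int> scheduleInOrder(const std::vector<Instr>& instrs,
                                 const std::vector<std::size_t>& order) {
  assert(order.size() == instrs.size());
  std::vector<int> issueAt(instrs.size(), 0);
  int nextFree = 0;  // earliest cycle the pipeline can accept another instruction
  for (std::size_t i : order) {
    int cycle = nextFree;
    for (std::size_t d : instrs[i].deps)
      cycle = std::max(cycle, issueAt[d] + instrs[d].latency);  // in-order stall
    issueAt[i] = cycle;
    nextFree = cycle + 1;
  }
  return issueAt;
}
```

With an 80-cycle sample (instruction 0), its dependent multiply (1) and four independent 1-cycle instructions, placing the multiply directly after the sample makes it issue at cycle 80 and pushes the independent work to cycles 81 to 84; moving the independent instructions between the sample and its use lets them issue in cycles 1 to 4, so the whole sequence finishes issuing at cycle 80.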
2016 Mar 28
0
RFC: atomic operations on SI+
...et/AMDGPU/CIInstructions.td
> @@ -156,7 +156,7 @@ defm FLAT_ATOMIC_SWAP : FLAT_ATOMIC <
> flat<0x30, 0x40>, "flat_atomic_swap", VGPR_32
> >;
> defm FLAT_ATOMIC_CMPSWAP : FLAT_ATOMIC <
> - flat<0x31, 0x41>, "flat_atomic_cmpswap", VGPR_32, VReg_64
> + flat<0x31, 0x41>, "flat_atomic_cmpswap", VReg_64
> >;
> defm FLAT_ATOMIC_ADD : FLAT_ATOMIC <
> flat<0x32, 0x42>, "flat_atomic_add", VGPR_32
> @@ -322,6 +322,7 @@ def : FlatAtomicPat <FLAT_ATOMIC_SMIN_RTN, atomic_min_global, i32>;...
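flat_atomic_cmpswap takes its compare and new values together in a 64-bit data operand, which is why VReg_64 appears in the definition above. A single-threaded C++ model of the 32-bit operation is sketched below. It assumes the GCN ISA description of ATOMIC_CMPSWAP, where DATA[0] (the low 32 bits) holds the value to store, DATA[1] (the high 32 bits) holds the value to compare against, and the old memory value is returned; packCmpSwapData and cmpSwap32 are hypothetical helpers, not LLVM or HCC functions, and the model is not atomic.

```cpp
#include <cassert>
#include <cstdint>

// Packs the two 32-bit operands into one 64-bit data value. Lane order is an
// assumption taken from the GCN ISA description of ATOMIC_CMPSWAP:
// DATA[0] (low half) = value to store, DATA[1] (high half) = compare value.
std::uint64_t packCmpSwapData(std::uint32_t newValue, std::uint32_t compareValue) {
  return (static_cast<std::uint64_t>(compareValue) << 32) | newValue;
}

// Single-threaded model of the memory side: store the new value only if the
// current value equals the compare value, and return the old value either way.
std::uint32_t cmpSwap32(std::uint32_t& mem, std::uint64_t data) {
  const std::uint32_t src = static_cast<std::uint32_t>(data);        // DATA[0]
  const std::uint32_t cmp = static_cast<std::uint32_t>(data >> 32);  // DATA[1]
  const std::uint32_t old = mem;
  if (old == cmp) mem = src;
  return old;
}
```

In this model the input is 64 bits wide but the returned value is a single 32-bit word.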
2016 Mar 25
2
RFC: atomic operations on SI+
Hi Tom, Matt,
I'm working on a project that needs a few coherent atomic operations (HSA
mode: load, store, compare-and-swap) for std::atomic_uint in HCC.
The attached patch implements atomic compare-and-swap for SI+
(untested). I tried to stay within what was available, but there are
a few issues that I was unsure how to address:
1.) it currently uses v2i32 for both input and output. This
2019 Sep 10
2
MachineScheduler not scheduling for latency
...an in-order target, and the problem is that the
> > IMAGE_SAMPLE instructions have very high (80 cycle) latency, but in
> > the resulting schedule they are often placed right next to their uses
> > like this:
> >
> > 1784B %140:vgpr_32 = IMAGE_SAMPLE_LZ_V1_V2 %533:vreg_64,
> > %30:sreg_256, %26:sreg_128, 8, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec
> > :: (dereferenceable load 4 from custom TargetCustom8)
> > 1792B %142:vgpr_32 = V_MUL_F32_e32 %44:sreg_32, %140:vgpr_32, implicit $exec
> > ...
> > 1784B %140:vgpr_32 = IMAGE_SAMPLE_L...
2012 Oct 25
0
[LLVMdev] How to use TargetLowering::addRegisterClass() for multiple register classes
Hi,
On my target, most value types can be stored in two register classes.
For example:
def SReg_64 : RegisterClass<"AMDGPU", [i64], 64, (add SGPR_64, VCC, EXEC)>;
def VReg_64 : RegisterClass<"AMDGPU", [i64], 64, (add VGPR_64)>;
What criteria should I use to decide which register class to associate
with each type using TargetLowering::addRegisterClass() ?
Thanks,
Tom
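A minimal model of why the question arises: in LLVM, addRegisterClass(VT, RC) records one register class per simple value type, so registering both SReg_64 and VReg_64 for i64 leaves only the last one as the class associated with i64. The sketch mirrors that behaviour with strings in place of MVT and TargetRegisterClass; RegClassTable and regClassFor are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the per-type table behind addRegisterClass():
// strings replace MVT and TargetRegisterClass.
class RegClassTable {
 public:
  // A later call for the same type replaces the earlier class.
  void addRegisterClass(const std::string& vt, const std::string& rc) {
    regClassForVT_[vt] = rc;
  }

  // The class recorded for `vt`, or "" if none was registered.
  std::string regClassFor(const std::string& vt) const {
    auto it = regClassForVT_.find(vt);
    return it == regClassForVT_.end() ? std::string() : it->second;
  }

 private:
  std::map<std::string, std::string> regClassForVT_;
};
```

Because only one class survives per type, the choice between SReg_64 and VReg_64 for i64 has to be made explicitly rather than by registering both.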