search for: vreg_64

Displaying 9 results from an estimated 9 matches for "vreg_64".

Did you mean: vreg64
2017 May 16
2
Bug in TableGen RegisterBankEmitter
...e or not. This would allow us to prevent it from following the subreg indices into the wrong classes but it would also make it harder to define the register banks. > I'm a little confused about what the issue is. AMDGPU has 2 64-bit register classes each with sub0 and sub1 sub-registers: VReg_64:sub0=VGPR_32 VReg_64:sub1=VGPR_32 SReg_64:sub0=SGPR_32 SReg_64:sub1=SGPR_32 Are you saying that tablegen considers VReg_64:sub0 and SReg_64:sub0 to be the same sub-register class because they are both called sub0 ? -Tom >> On 10 May 2017, at 21:58, Daniel Sanders via llvm-dev <llvm-de...
2017 May 10
2
Bug in TableGen RegisterBankEmitter
Hi Tom, The output: Added VReg_64(explicit) Added VS_32(explicit (VS_32) VReg_64 class-with-subregs: VReg_64) is saying that VS_32 was added because VReg_64 was explicitly specified and that while inspecting VS_32, it noticed that every register in VS_32 was a subregister of a register from VReg_64 using a single common subregiste...
2020 Nov 19
1
Problems with undef subranges in identity copies
...minates identity copies. The fundamental problem is complexity from the fact that undef values are a special case since they don't have an associated VNInfo/Segment unless the value is used across blocks. For example, in this case, %0 has 2 subregisters sub0 and sub1: bb.0: undef %0.sub1:vreg_64 = COPY killed $vgpr0 bb.1: %0:vreg_64 = COPY %0 S_CBRANCH_EXECNZ %bb.1, implicit $exec bb.2: undef %0.sub1:vreg_64 = nofpexcept V_CEIL_F32_e32 killed %0.sub1, implicit $mode, implicit $exec S_BRANCH %bb.1 sub0 has no defined values anywhere in this function. The value only ex...
2017 May 10
2
Bug in TableGen RegisterBankEmitter
...from both defined register banks (SGPRRegBank and VGPRRegBank). The RegisterBankEmitter is adding this class to the CoverageData array for both register classes, because it contains sub-registers of one of the classes explicitly added to the RegisterBank, for example: Added VS_32(explicit (VS_32) VReg_64 class-with-subregs: VReg_64) This is a problem, because both RegisterBanks think they cover VS_32, even though neither of them actually do. What exactly is the best way to fix this? It seems like we need some additional checks in the RegisterBankEmitter to fix this, but it's not clear to me...
2019 Sep 09
2
Fwd: MachineScheduler not scheduling for latency
...ne in the attached debug dump. This is on AMDGPU, an in-order target, and the problem is that the IMAGE_SAMPLE instructions have very high (80 cycle) latency, but in the resulting schedule they are often placed right next to their uses like this: 1784B %140:vgpr_32 = IMAGE_SAMPLE_LZ_V1_V2 %533:vreg_64, %30:sreg_256, %26:sreg_128, 8, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4 from custom TargetCustom8) 1792B %142:vgpr_32 = V_MUL_F32_e32 %44:sreg_32, %140:vgpr_32, implicit $exec ... 1784B %140:vgpr_32 = IMAGE_SAMPLE_LZ_V1_V2 %533:vreg_64, %30:sreg_256, %26:sreg_128,...
2016 Mar 28
0
RFC: atomic operations on SI+
...et/AMDGPU/CIInstructions.td > @@ -156,7 +156,7 @@ defm FLAT_ATOMIC_SWAP : FLAT_ATOMIC < > flat<0x30, 0x40>, "flat_atomic_swap", VGPR_32 > >; > defm FLAT_ATOMIC_CMPSWAP : FLAT_ATOMIC < > - flat<0x31, 0x41>, "flat_atomic_cmpswap", VGPR_32, VReg_64 > + flat<0x31, 0x41>, "flat_atomic_cmpswap", VReg_64 > >; > defm FLAT_ATOMIC_ADD : FLAT_ATOMIC < > flat<0x32, 0x42>, "flat_atomic_add", VGPR_32 > @@ -322,6 +322,7 @@ def : FlatAtomicPat <FLAT_ATOMIC_SMIN_RTN, atomic_min_global, i32>;...
2016 Mar 25
2
RFC: atomic operations on SI+
Hi Tom, Matt, I'm working on a project that needs few coherent atomic operations (HSA mode: load, store, compare-and-swap) for std::atomic_uint in HCC. the attached patch implements atomic compare and swap for SI+ (untested). I tried to stay within what was available, but there are few issues that I was unsure how to address: 1.) it currently uses v2i32 for both input and output. This
2019 Sep 10
2
MachineScheduler not scheduling for latency
...an in-order target, and the problem is that the > > IMAGE_SAMPLE instructions have very high (80 cycle) latency, but in > > the resulting schedule they are often placed right next to their uses > > like this: > > > > 1784B %140:vgpr_32 = IMAGE_SAMPLE_LZ_V1_V2 %533:vreg_64, > > %30:sreg_256, %26:sreg_128, 8, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec > > :: (dereferenceable load 4 from custom TargetCustom8) > > 1792B %142:vgpr_32 = V_MUL_F32_e32 %44:sreg_32, %140:vgpr_32, implicit $exec > > ... > > 1784B %140:vgpr_32 = IMAGE_SAMPLE_L...
2012 Oct 25
0
[LLVMdev] How to use TargetLowering::addRegisterClass() for multiple register classes
Hi, On my target, most value types can be stored in two register classes. For example: def SReg_64 : RegisterClass<"AMDGPU", [i64], 64, (add SGPR_64, VCC, EXEC)>; def VReg_64 : RegisterClass<"AMDGPU", [i64], 64, (add VGPR_64)>; What criteria should I use to decide which register class to associate with each type using TargetLowering::addRegisterClass() ? Thanks, Tom