Displaying 4 results from an estimated 4 matches for "sreg_128".

2015 Mar 27
2
[LLVMdev] Question about load clustering in the machine scheduler
...I restrict load clustering to 4 at a time, but when I look at the debug output, the loads are always being scheduled based on the fact that they are clustered. e.g. Pick Top CLUSTER Scheduling SU(10) %vreg13<def> = S_BUFFER_LOAD_DWORD_IMM %vreg9, 4; mem:LD4[<unknown>] SGPR_32:%vreg13 SReg_128:%vreg9 I have a feeling there is something wrong with my machine model in the R600 backend, but I've experimented with a few variations of it and have been unable to solve this problem. Does anyone have any idea what I might be doing wrong? Here are my resource definitions from lib/Target/R6...
2019 Sep 09
2
Fwd: MachineScheduler not scheduling for latency
...mp. This is on AMDGPU, an in-order target, and the problem is that the IMAGE_SAMPLE instructions have very high (80 cycle) latency, but in the resulting schedule they are often placed right next to their uses like this: 1784B %140:vgpr_32 = IMAGE_SAMPLE_LZ_V1_V2 %533:vreg_64, %30:sreg_256, %26:sreg_128, 8, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 4 from custom TargetCustom8) 1792B %142:vgpr_32 = V_MUL_F32_e32 %44:sreg_32, %140:vgpr_32, implicit $exec ... 1784B %140:vgpr_32 = IMAGE_SAMPLE_LZ_V1_V2 %533:vreg_64, %30:sreg_256, %26:sreg_128, 8, 0, 0, 0, 0, 0, 0, 0, 0, i...
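The cost of placing a long-latency instruction right next to its use can be illustrated with a toy model. The sketch below is not LLVM's MachineScheduler and does not model AMDGPU; it assumes a hypothetical single-issue, in-order machine that stalls until all operands are ready. The 80-cycle latency comes from the snippet above; the 1-cycle ALU latency, the ten independent filler instructions, and all names (`Instr`, `in_order_cycles`, `alu0` and so on) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    name: str
    latency: int                               # cycles until the result is available
    deps: list = field(default_factory=list)   # names of producer instructions


def in_order_cycles(schedule):
    """Cycle at which the last result is ready on a toy single-issue,
    in-order machine that stalls while an operand is not yet available."""
    ready_at = {}   # instruction name -> cycle its result becomes available
    cycle = 0       # earliest cycle at which the next instruction may issue
    for ins in schedule:
        # In-order issue: wait for the previous issue slot and for every operand.
        issue = max([cycle] + [ready_at[d] for d in ins.deps])
        ready_at[ins.name] = issue + ins.latency
        cycle = issue + 1
    return max(ready_at.values())


# Hypothetical instruction stream: one 80-cycle sample, one dependent multiply,
# and ten independent 1-cycle ALU operations (assumed, not from the thread).
sample = Instr("sample", latency=80)
mul = Instr("mul", latency=1, deps=["sample"])
fillers = [Instr(f"alu{i}", latency=1) for i in range(10)]

adjacent = in_order_cycles([sample, mul] + fillers)    # use placed right after the load
separated = in_order_cycles([sample] + fillers + [mul])  # independent work fills the gap

print(adjacent, separated)  # 91 81
```

In this toy model the adjacent placement stalls the whole pipeline behind the multiply, so the ten independent operations run only after the 80-cycle wait; moving them between the sample and its use hides ten cycles of that latency.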
2015 Mar 27
2
[LLVMdev] Question about load clustering in the machine scheduler
...ime, but when I look at the debug output, the loads are > > always being scheduled based on the fact that they are clustered. e.g. > > > > Pick Top CLUSTER > > Scheduling SU(10) %vreg13<def> = S_BUFFER_LOAD_DWORD_IMM %vreg9, 4; mem:LD4[<unknown>] SGPR_32:%vreg13 SReg_128:%vreg9 > > Well, only 4 loads in a sequence should have the “cluster” edges. You should be able to see that when the DAG is printed before scheduling. > There are 4 consecutive 'Pick Top CLUSTER' then a 'Pick Top WEAK' and then the pattern repeats itself. All of these a...
2019 Sep 10
2
MachineScheduler not scheduling for latency
...is that the
> > IMAGE_SAMPLE instructions have very high (80 cycle) latency, but in
> > the resulting schedule they are often placed right next to their uses
> > like this:
> >
> > 1784B %140:vgpr_32 = IMAGE_SAMPLE_LZ_V1_V2 %533:vreg_64,
> > %30:sreg_256, %26:sreg_128, 8, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec
> > :: (dereferenceable load 4 from custom TargetCustom8)
> > 1792B %142:vgpr_32 = V_MUL_F32_e32 %44:sreg_32, %140:vgpr_32, implicit $exec
> > ...
> > 1784B %140:vgpr_32 = IMAGE_SAMPLE_LZ_V1_V2 %533:vreg_64,
> > %30:sr...