Displaying 4 results from an estimated 4 matches for "sreg_128".
2015 Mar 27
2
[LLVMdev] Question about load clustering in the machine scheduler
...I restrict load clustering
to 4 at a time, but when I look at the debug output, the loads are
always being scheduled based on the fact that they are clustered. e.g.
Pick Top CLUSTER
Scheduling SU(10) %vreg13<def> = S_BUFFER_LOAD_DWORD_IMM %vreg9, 4; mem:LD4[<unknown>] SGPR_32:%vreg13 SReg_128:%vreg9
I have a feeling there is something wrong with my machine model in the
R600 backend, but I've experimented with a few variations of it and have
been unable to solve this problem. Does anyone have any idea what I
might be doing wrong?
Here are my resource definitions from lib/Target/R6...
2019 Sep 09
2
Fwd: MachineScheduler not scheduling for latency
...mp.
This is on AMDGPU, an in-order target, and the problem is that the
IMAGE_SAMPLE instructions have very high (80 cycle) latency, but in
the resulting schedule they are often placed right next to their uses
like this:
1784B %140:vgpr_32 = IMAGE_SAMPLE_LZ_V1_V2 %533:vreg_64,
%30:sreg_256, %26:sreg_128, 8, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec
:: (dereferenceable load 4 from custom TargetCustom8)
1792B %142:vgpr_32 = V_MUL_F32_e32 %44:sreg_32, %140:vgpr_32, implicit $exec
...
1784B %140:vgpr_32 = IMAGE_SAMPLE_LZ_V1_V2 %533:vreg_64,
%30:sreg_256, %26:sreg_128, 8, 0, 0, 0, 0, 0, 0, 0, 0, i...
2015 Mar 27
2
[LLVMdev] Question about load clustering in the machine scheduler
...ime, but when I look at the debug output, the loads are
> > always being scheduled based on the fact that they are clustered. e.g.
> >
> > Pick Top CLUSTER
> > Scheduling SU(10) %vreg13<def> = S_BUFFER_LOAD_DWORD_IMM %vreg9, 4; mem:LD4[<unknown>] SGPR_32:%vreg13 SReg_128:%vreg9
>
> Well, only 4 loads in a sequence should have the “cluster” edges. You should be able to see that when the DAG is printed before scheduling.
>
There are 4 consecutive 'Pick Top CLUSTER' then a 'Pick Top WEAK' and
then the pattern repeats itself. All of these a...
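
To make the reply's point about cluster edges concrete, here is a toy sketch of which neighboring loads would be linked when clustering is capped at 4. This is not LLVM code: the function name `cluster_edges`, the parameter `max_cluster`, and the assumption that loads are already sorted and chained pairwise are all hypothetical simplifications for illustration only.

```python
def cluster_edges(num_loads, max_cluster=4):
    """Toy model: link consecutive loads pairwise, starting a new
    chain once the current chain holds max_cluster loads.

    Loads are represented by their index in an already-sorted list
    (an assumption of this sketch, not something read from LLVM).
    """
    edges = []
    run_length = 1  # number of loads in the current chain
    for i in range(1, num_loads):
        if run_length < max_cluster:
            # Still room in the chain: link load i-1 to load i.
            edges.append((i - 1, i))
            run_length += 1
        else:
            # Chain is full: load i starts a new chain, no edge added.
            run_length = 1
    return edges


if __name__ == "__main__":
    for edge in cluster_edges(10):
        print(edge)
```

Under these assumptions, ten loads form chains of 4, 4, and 2, so no edge links load 3 to load 4 or load 7 to load 8. Comparing that expectation against the DAG printed before scheduling is the check suggested in the reply.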
2019 Sep 10
2
MachineScheduler not scheduling for latency
...is that the
> > IMAGE_SAMPLE instructions have very high (80 cycle) latency, but in
> > the resulting schedule they are often placed right next to their uses
> > like this:
> >
> > 1784B %140:vgpr_32 = IMAGE_SAMPLE_LZ_V1_V2 %533:vreg_64,
> > %30:sreg_256, %26:sreg_128, 8, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec
> > :: (dereferenceable load 4 from custom TargetCustom8)
> > 1792B %142:vgpr_32 = V_MUL_F32_e32 %44:sreg_32, %140:vgpr_32, implicit $exec
> > ...
> > 1784B %140:vgpr_32 = IMAGE_SAMPLE_LZ_V1_V2 %533:vreg_64,
> > %30:sr...