Displaying 2 results from an estimated 2 matches for "image_sample_lz_v1_v2".
2019 Sep 09
2
Fwd: MachineScheduler not scheduling for latency
...ne code in cases like the one in the attached debug dump.
This is on AMDGPU, an in-order target, and the problem is that the
IMAGE_SAMPLE instructions have very high (80 cycle) latency, but in
the resulting schedule they are often placed right next to their uses
like this:
1784B %140:vgpr_32 = IMAGE_SAMPLE_LZ_V1_V2 %533:vreg_64,
%30:sreg_256, %26:sreg_128, 8, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec
:: (dereferenceable load 4 from custom TargetCustom8)
1792B %142:vgpr_32 = V_MUL_F32_e32 %44:sreg_32, %140:vgpr_32, implicit $exec
...
1784B %140:vgpr_32 = IMAGE_SAMPLE_LZ_V1_V2 %533:vreg_64,
%30:sreg_256, %...
2019 Sep 10
2
MachineScheduler not scheduling for latency
...> > This is on AMDGPU, an in-order target, and the problem is that the
> > IMAGE_SAMPLE instructions have very high (80 cycle) latency, but in
> > the resulting schedule they are often placed right next to their uses
> > like this:
> >
> > 1784B %140:vgpr_32 = IMAGE_SAMPLE_LZ_V1_V2 %533:vreg_64,
> > %30:sreg_256, %26:sreg_128, 8, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec
> > :: (dereferenceable load 4 from custom TargetCustom8)
> > 1792B %142:vgpr_32 = V_MUL_F32_e32 %44:sreg_32, %140:vgpr_32, implicit $exec
> > ...
> > 1784B %140:vgpr_32 = I...
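
To make the cost of the schedule quoted above concrete, here is a minimal sketch of a toy in-order issue model. It is not LLVM's MachineScheduler and not a model of real AMDGPU hardware: it assumes a single-issue pipeline that issues at most one instruction per cycle and stalls until every operand is ready. The function name `stall_cycles`, the instruction labels, and the 1-cycle ALU latency are hypothetical; only the 80-cycle sample latency comes from the message.

```python
def stall_cycles(schedule, latency, deps):
    """Count stall cycles for an issue order on a toy in-order machine.

    schedule: instruction names in issue order.
    latency:  name -> cycles until the result is available (default 1).
    deps:     name -> list of producer names the instruction reads.
    """
    ready_at = {}   # name -> cycle at which its result becomes available
    cycle = 0       # next free issue cycle
    stalls = 0
    for instr in schedule:
        # An in-order machine cannot issue until all operands are ready.
        earliest = max((ready_at[p] for p in deps.get(instr, [])), default=0)
        if earliest > cycle:
            stalls += earliest - cycle
            cycle = earliest
        ready_at[instr] = cycle + latency.get(instr, 1)
        cycle += 1
    return stalls


# Assumed latencies: 80 cycles for the sample (from the message), 1 otherwise.
latency = {"image_sample": 80}
deps = {"v_mul": ["image_sample"]}

# Use placed right after the sample, as in the quoted dump.
adjacent = ["image_sample", "v_mul"]

# Same pair with 79 hypothetical independent instructions in between.
fillers = [f"alu{i}" for i in range(79)]
separated = ["image_sample", *fillers, "v_mul"]

print(stall_cycles(adjacent, latency, deps))   # 79
print(stall_cycles(separated, latency, deps))  # 0
```

Under these assumptions, the adjacent placement leaves 79 cycles with nothing issued, while enough independent work between the sample and its use hides the latency entirely. That gap is what a latency-aware schedule would try to fill.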