search for: v_mov_b32

Displaying 14 results from an estimated 14 matches for "v_mov_b32".

2014 Apr 04
2
[LLVMdev] How should I update LiveIntervals after removing a use of a register?
Hi, I am working on a simple copy propagation pass for the R600 backend that propagates immediates rather than registers. For example, I want to transform: ... %vreg1 = V_MOV_B32 1 %vreg2 = V_ADD_I32 %vreg1, %vreg0 ... into: %vreg1 = V_MOV_B32 1 ; <- Only delete this if it is dead %vreg2 = V_ADD_I32 1, %vreg0 For best results, I am trying to run this pass after the TwoAddressInstruction pass, which means I need to preserve the LiveIntervals analysis. My question is:...
2016 Oct 03
5
Is this undefined behavior optimization legal?
...a 32-bit vector, a common way to implement this function on a target with 32-bit registers would be to zero initialize a 32-bit register to hold the initial vector and then 'mask' and 'or' the inserted value with the initial vector. In AMDGPU assembly it would look something like: v_mov_b32 v0, 0 v_cvt_u32_f32_e32 v1, s0 v_and_b32 v1, v1, 0x000000ff v_or_b32 v0, v0, v1 The optimization the SelectionDAG does for us in this function, though, ends up removing the mask operation. Which gives us: v_mov_b32 v0, 0 v_cvt_u32_f32_e32 v1, s0 v_or_b32 v0, v0, v1 The reason the SelectionDAG i...
2017 Jun 15
2
Implementing cross-thread reduction in the AMDGPU backend
...;>>>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace >>>>>>>> v_foo_f32 with the appropriate operation): >>>>>>>> >>>>>>>> ; v0 is the input register >>>>>>>> v_mov_b32 v1, v0 >>>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 >>>>>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 >>>>>>>> v_foo_f32 v1, v0, v1 row_shr:3/ / Instruction 3 >>>>>>>> v_nop // Add two...
2017 Jun 15
1
Implementing cross-thread reduction in the AMDGPU backend
...cn-assembly-cross-lane-operations/ >>>>>>>>> (replace >>>>>>>>> v_foo_f32 with the appropriate operation): >>>>>>>>> >>>>>>>>> ; v0 is the input register >>>>>>>>> v_mov_b32 v1, v0 >>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 >>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 >>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:3/ / Instruction 3 v_nop // Add >>>>>&gt...
2017 Jun 14
5
Implementing cross-thread reduction in the AMDGPU backend
...>>>> this, modified from >>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace >>>>>> v_foo_f32 with the appropriate operation): >>>>>> >>>>>> ; v0 is the input register >>>>>> v_mov_b32 v1, v0 >>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 >>>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 >>>>>> v_foo_f32 v1, v0, v1 row_shr:3/ / Instruction 3 >>>>>> v_nop // Add two independent instructions to avo...
2019 Nov 14
2
imm COPY generated by PHI elim not propagated
In this case the load imm is foldable into the copy, once converted to a mov. Directly folding this would be 4 v_mov_b32 instead of 5 produced currently -Matt On 11/14/19, 07:20, "llvm-dev on behalf of Quentin Colombet via llvm-dev" <llvm-dev-bounces at lists.llvm.org on behalf of llvm-dev at lists.llvm.org> wrote: Hi Ryan, Unless you can fold your immediate directly in an instruc...
2019 Nov 15
2
imm COPY generated by PHI elim not propagated
...nd the copy in > expand pseudo after regalloc. > > > On Nov 14, 2019, at 12:20 AM, Arsenault, Matthew < > Matthew.Arsenault at amd.com> wrote: > > > > In this case the load imm is foldable into the copy, once converted to a > mov. Directly folding this would be 4 v_mov_b32 instead of 5 produced > currently > > > > -Matt > > > > On 11/14/19, 07:20, "llvm-dev on behalf of Quentin Colombet via > llvm-dev" <llvm-dev-bounces at lists.llvm.org on behalf of > llvm-dev at lists.llvm.org> wrote: > > > > Hi Ryan,...
2019 Nov 20
2
imm COPY generated by PHI elim not propagated
...nd pseudo after regalloc. >> >> > On Nov 14, 2019, at 12:20 AM, Arsenault, Matthew < >> Matthew.Arsenault at amd.com> wrote: >> > >> > In this case the load imm is foldable into the copy, once converted to >> a mov. Directly folding this would be 4 v_mov_b32 instead of 5 produced >> currently >> > >> > -Matt >> > >> > On 11/14/19, 07:20, "llvm-dev on behalf of Quentin Colombet via >> llvm-dev" <llvm-dev-bounces at lists.llvm.org on behalf of >> llvm-dev at lists.llvm.org> wrote: &gt...
2017 Jun 13
2
Implementing cross-thread reduction in the AMDGPU backend
...d to do a code sequence that looks like >>>> this, modified from >>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace >>>> v_foo_f32 with the appropriate operation): >>>> >>>> ; v0 is the input register >>>> v_mov_b32 v1, v0 >>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 >>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 >>>> v_foo_f32 v1, v0, v1 row_shr:3/ / Instruction 3 >>>> v_nop // Add two independent instructions to avoid a data hazard >>>&gt...
2017 Jun 14
0
Implementing cross-thread reduction in the AMDGPU backend
...fied from >>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ >>>>>> (replace >>>>>> v_foo_f32 with the appropriate operation): >>>>>> >>>>>> ; v0 is the input register >>>>>> v_mov_b32 v1, v0 >>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 >>>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 >>>>>> v_foo_f32 v1, v0, v1 row_shr:3/ / Instruction 3 v_nop // Add two >>>>>> independent instructions to av...
2017 Jun 12
4
Implementing cross-thread reduction in the AMDGPU backend
...ut I can think of a few concerns/questions. First of all, to implement the prefix scan, we'll need to do a code sequence that looks like this, modified from http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace v_foo_f32 with the appropriate operation): ; v0 is the input register v_mov_b32 v1, v0 v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 v_foo_f32 v1, v0, v1 row_shr:3/ / Instruction 3 v_nop // Add two independent instructions to avoid a data hazard v_nop v_foo_f32 v1, v1, v1 row_shr:4 bank_mask:0xe // Instruction 4 v_nop // Add tw...
2017 Jun 12
2
Implementing cross-thread reduction in the AMDGPU backend
...mplement >> the prefix scan, we'll need to do a code sequence that looks like >> this, modified from >> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace >> v_foo_f32 with the appropriate operation): >> >> ; v0 is the input register >> v_mov_b32 v1, v0 >> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 >> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 >> v_foo_f32 v1, v0, v1 row_shr:3/ / Instruction 3 >> v_nop // Add two independent instructions to avoid a data hazard >> v_nop >> v_foo_f32 v1, v1, v1 ro...
2019 Nov 13
2
imm COPY generated by PHI elim not propagated
I have some code such that: vgpr1 = mov 0 branch bb bb: PHI vgpr2 = vgpr1, …. PHI vgpr3 = vgpr1, …. PHI vgpr4 = vgpr1, …. PHI vgpr5 = vgpr1, …. PHI node elimination is generating copies for all these PHIs (and hoisting them) as such: vgpr1 = 0 vgpr20 = COPY vgpr1 // old vgpr2 vgpr30 = COPY vgpr1 // old vgpr3 vgpr40 = COPY vgpr1 // old vgpr4 vgpr 50 = COPY vgprt1 // old vgpr5 I expect the zero
2015 Oct 24
2
[AMDGPU] AMDGPUAsmParser fails to parse several instructions
Thanks you. I'm new to LLVM backend, so the help is much appreciated. On Sat, Oct 24, 2015 at 2:12 AM, Matt Arsenault <arsenm2 at gmail.com> wrote: > > > On Oct 23, 2015, at 3:36 AM, 李弘宇 via llvm-dev <llvm-dev at lists.llvm.org> > wrote: > > > The first line has the following error message: > > > > sop1-playground.s:1:15: error: invalid immediate: