thr3ads.net - search: "v_mov

Displaying 14 results from an estimated 14 matches for "v_mov_b32".

[LLVMdev] How should I update LiveIntervals after removing a use of a register?

2014 Apr 04

Hi, I am working on a simple copy propagation pass for the R600 backend that propagates immediates rather than registers. For example, I want to transform:

    ...
    %vreg1 = V_MOV_B32 1
    %vreg2 = V_ADD_I32 %vreg1, %vreg0
    ...

into:

    %vreg1 = V_MOV_B32 1 ; <- Only delete this if it is dead
    %vreg2 = V_ADD_I32 1, %vreg0

For best results, I am trying to run this pass after the TwoAddressInstruction pass, which means I need to preserve the LiveIntervals analysis. My question is:...
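The transformation in this snippet can be sketched on a toy instruction list. This is only an illustration in Python, not LLVM code: the tuple-based instruction format and the helper names `propagate_immediates` and `remove_dead_movs` are hypothetical, each virtual register is assumed to be defined once, and the LiveIntervals update the post asks about is not modeled.

```python
# Toy model: an instruction is (dst, opcode, operands), where each operand is
# either a register name (str) or an immediate (int). Hypothetical format.

def propagate_immediates(instrs):
    """Replace uses of registers defined by 'V_MOV_B32 <imm>' with the immediate."""
    known_imm = {}  # register name -> immediate value
    out = []
    for dst, opcode, operands in instrs:
        # Substitute any operand whose defining mov loaded an immediate.
        new_ops = tuple(
            known_imm.get(op, op) if isinstance(op, str) else op
            for op in operands
        )
        if opcode == "V_MOV_B32" and len(new_ops) == 1 and isinstance(new_ops[0], int):
            known_imm[dst] = new_ops[0]
        else:
            known_imm.pop(dst, None)  # dst no longer holds a known immediate
        out.append((dst, opcode, new_ops))
    return out


def remove_dead_movs(instrs, live_out=()):
    """Drop V_MOV_B32 definitions whose result is never read (and not live out)."""
    used = set(live_out)
    for _, _, operands in instrs:
        used.update(op for op in operands if isinstance(op, str))
    return [
        ins for ins in instrs
        if not (ins[1] == "V_MOV_B32" and ins[0] not in used)
    ]


program = [
    ("%vreg1", "V_MOV_B32", (1,)),
    ("%vreg2", "V_ADD_I32", ("%vreg1", "%vreg0")),
]
propagated = propagate_immediates(program)
cleaned = remove_dead_movs(propagated)
```

After propagation the mov is still present; it is removed only by the separate dead-definition step, mirroring the "only delete this if it is dead" comment in the post.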

Is this undefined behavior optimization legal?

2016 Oct 03

...a 32-bit vector, a common way to implement this function on a target with 32-bit registers would be to zero initialize a 32-bit register to hold the initial vector and then 'mask' and 'or' the inserted value with the initial vector. In AMDGPU assembly it would look something like:

    v_mov_b32 v0, 0
    v_cvt_u32_f32_e32 v1, s0
    v_and_b32 v1, v1, 0x000000ff
    v_or_b32 v0, v0, v1

The optimization the SelectionDAG does for us in this function, though, ends up removing the mask operation. Which gives us:

    v_mov_b32 v0, 0
    v_cvt_u32_f32_e32 v1, s0
    v_or_b32 v0, v0, v1

The reason the SelectionDAG i...
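A small Python sketch can show what the mask contributes. Plain integers stand in for 32-bit registers, the function names are hypothetical, and the converted value is simply passed in as an integer rather than produced by a float conversion.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit register with a Python int

def insert_low_byte_masked(vec, value):
    """'mask' then 'or': only the low 8 bits of value reach the vector."""
    return (vec | (value & 0x000000FF)) & MASK32

def insert_low_byte_unmasked(vec, value):
    """Same sequence with the v_and_b32 step removed."""
    return (vec | value) & MASK32

in_range = 0x7F       # fits in 8 bits: both versions agree
out_of_range = 0x1FF  # needs 9 bits: the versions diverge

same = insert_low_byte_masked(0, in_range) == insert_low_byte_unmasked(0, in_range)
masked = insert_low_byte_masked(0, out_of_range)
unmasked = insert_low_byte_unmasked(0, out_of_range)
```

The two sequences agree whenever the converted value already fits in eight bits, so dropping the mask only changes the result for inputs outside that range.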

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 15

...>>>>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace
>>>>>>>> v_foo_f32 with the appropriate operation):
>>>>>>>>
>>>>>>>> ; v0 is the input register
>>>>>>>> v_mov_b32 v1, v0
>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3
>>>>>>>> v_nop // Add two...

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 15

...cn-assembly-cross-lane-operations/
>>>>>>>>> (replace
>>>>>>>>> v_foo_f32 with the appropriate operation):
>>>>>>>>>
>>>>>>>>> ; v0 is the input register
>>>>>>>>> v_mov_b32 v1, v0
>>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
>>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3 v_nop // Add
>>>>>>...

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 14

...>>>> this, modified from
>>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace
>>>>>> v_foo_f32 with the appropriate operation):
>>>>>>
>>>>>> ; v0 is the input register
>>>>>> v_mov_b32 v1, v0
>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>>>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
>>>>>> v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3
>>>>>> v_nop // Add two independent instructions to avo...

imm COPY generated by PHI elim not propagated

2019 Nov 14

In this case the load imm is foldable into the copy, once converted to a mov. Directly folding this would be 4 v_mov_b32 instead of 5 produced currently

-Matt

On 11/14/19, 07:20, "llvm-dev on behalf of Quentin Colombet via llvm-dev" <llvm-dev-bounces at lists.llvm.org on behalf of llvm-dev at lists.llvm.org> wrote:

Hi Ryan, Unless you can fold your immediate directly in an instruc...

imm COPY generated by PHI elim not propagated

2019 Nov 15

...nd the copy in
> expand pseudo after regalloc.
>
> > On Nov 14, 2019, at 12:20 AM, Arsenault, Matthew <
> Matthew.Arsenault at amd.com> wrote:
> >
> > In this case the load imm is foldable into the copy, once converted to a
> mov. Directly folding this would be 4 v_mov_b32 instead of 5 produced
> currently
> >
> > -Matt
> >
> > On 11/14/19, 07:20, "llvm-dev on behalf of Quentin Colombet via
> llvm-dev" <llvm-dev-bounces at lists.llvm.org on behalf of
> llvm-dev at lists.llvm.org> wrote:
> >
> > Hi Ryan,...

imm COPY generated by PHI elim not propagated

2019 Nov 20

...nd pseudo after regalloc.
>>
>> > On Nov 14, 2019, at 12:20 AM, Arsenault, Matthew <
>> Matthew.Arsenault at amd.com> wrote:
>> >
>> > In this case the load imm is foldable into the copy, once converted to
>> a mov. Directly folding this would be 4 v_mov_b32 instead of 5 produced
>> currently
>> >
>> > -Matt
>> >
>> > On 11/14/19, 07:20, "llvm-dev on behalf of Quentin Colombet via
>> llvm-dev" <llvm-dev-bounces at lists.llvm.org on behalf of
>> llvm-dev at lists.llvm.org> wrote: >...

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 13

...d to do a code sequence that looks like
>>>> this, modified from
>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace
>>>> v_foo_f32 with the appropriate operation):
>>>>
>>>> ; v0 is the input register
>>>> v_mov_b32 v1, v0
>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
>>>> v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3
>>>> v_nop // Add two independent instructions to avoid a data hazard
>>>>...

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 14

...fied from
>>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/
>>>>>> (replace
>>>>>> v_foo_f32 with the appropriate operation):
>>>>>>
>>>>>> ; v0 is the input register
>>>>>> v_mov_b32 v1, v0
>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>>>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
>>>>>> v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3 v_nop // Add two
>>>>>> independent instructions to av...

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 12

...ut I can think of a few concerns/questions. First of all, to implement the prefix scan, we'll need to do a code sequence that looks like this, modified from http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace v_foo_f32 with the appropriate operation):

    ; v0 is the input register
    v_mov_b32 v1, v0
    v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
    v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
    v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3
    v_nop // Add two independent instructions to avoid a data hazard
    v_nop
    v_foo_f32 v1, v1, v1 row_shr:4 bank_mask:0xe // Instruction 4
    v_nop // Add tw...
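The effect of the first three instructions can be approximated with a small lane-by-lane simulation in Python. This is a rough sketch under stated assumptions, not a model of the hardware: it assumes `row_shr:n` makes each lane read its first source from the lane n positions lower within the same 16-lane row, and that a lane whose source would fall outside its row keeps its old destination value. The helper names are hypothetical, and the truncated remainder of the sequence (Instruction 4 onward) is not modeled.

```python
import operator

ROW = 16  # assumption: cross-lane rows are 16 lanes wide

def row_shr_step(op, dst, src0, src1, shift):
    """Model 'v_foo dst, src0, src1 row_shr:shift' across all lanes."""
    out = list(dst)
    for lane in range(len(dst)):
        src_lane = lane - shift
        # Only lanes whose shifted source stays inside the same row are written.
        if src_lane >= 0 and src_lane // ROW == lane // ROW:
            out[lane] = op(src0[src_lane], src1[lane])
    return out

def first_three_instructions(v0, op):
    """v_mov_b32 v1, v0 followed by Instructions 1 to 3 from the snippet."""
    v1 = list(v0)
    for shift in (1, 2, 3):
        v1 = row_shr_step(op, v1, v0, v1, shift)
    return v1

v0 = list(range(1, 33))  # 32 lanes holding 1..32
v1 = first_three_instructions(v0, operator.add)
```

Under these assumptions, each lane ends up combining its own input with up to three preceding inputs from the same row, which is the partial result the later instructions in the sequence would build on.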

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 12

...mplement
>> the prefix scan, we'll need to do a code sequence that looks like
>> this, modified from
>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace
>> v_foo_f32 with the appropriate operation):
>>
>> ; v0 is the input register
>> v_mov_b32 v1, v0
>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
>> v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3
>> v_nop // Add two independent instructions to avoid a data hazard
>> v_nop
>> v_foo_f32 v1, v1, v1 ro...

imm COPY generated by PHI elim not propagated

2019 Nov 13

I have some code such that:

    vgpr1 = mov 0
    branch bb
    bb:
    PHI vgpr2 = vgpr1, ….
    PHI vgpr3 = vgpr1, ….
    PHI vgpr4 = vgpr1, ….
    PHI vgpr5 = vgpr1, ….

PHI node elimination is generating copies for all these PHIs (and hoisting them) as such:

    vgpr1 = 0
    vgpr20 = COPY vgpr1 // old vgpr2
    vgpr30 = COPY vgpr1 // old vgpr3
    vgpr40 = COPY vgpr1 // old vgpr4
    vgpr50 = COPY vgpr1 // old vgpr5

I expect the zero
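The kind of folding this post appears to be asking for can be sketched on the copies listed above. This is a toy Python illustration, not the LLVM pass: the `(dst, opcode, src)` tuple format, the `MOV_IMM` opcode name, and the helper `fold_imm_copies` are all hypothetical.

```python
def fold_imm_copies(instrs):
    """Rewrite 'COPY r' into 'MOV_IMM k' when r is known to hold immediate k."""
    known_imm = {}  # register name -> immediate value
    out = []
    for dst, opcode, src in instrs:
        if opcode == "MOV_IMM":
            known_imm[dst] = src
        elif opcode == "COPY" and isinstance(src, str) and src in known_imm:
            # The copied register holds a known immediate: materialize it directly.
            opcode, src = "MOV_IMM", known_imm[src]
            known_imm[dst] = src
        else:
            known_imm.pop(dst, None)  # dst is overwritten with an unknown value
        out.append((dst, opcode, src))
    return out


after_phi_elim = [
    ("vgpr1", "MOV_IMM", 0),
    ("vgpr20", "COPY", "vgpr1"),  # old vgpr2
    ("vgpr30", "COPY", "vgpr1"),  # old vgpr3
    ("vgpr40", "COPY", "vgpr1"),  # old vgpr4
    ("vgpr50", "COPY", "vgpr1"),  # old vgpr5
]
folded = fold_imm_copies(after_phi_elim)
remaining_copies = [ins for ins in folded if ins[1] == "COPY"]
```

In this toy model every copy becomes an immediate move, so none of them depends on vgpr1 any longer.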

[AMDGPU] AMDGPUAsmParser fails to parse several instructions

2015 Oct 24

Thank you. I'm new to the LLVM backend, so the help is much appreciated.

On Sat, Oct 24, 2015 at 2:12 AM, Matt Arsenault <arsenm2 at gmail.com> wrote:
>
> > On Oct 23, 2015, at 3:36 AM, 李弘宇 via llvm-dev <llvm-dev at lists.llvm.org>
> wrote:
>
> > The first line has the following error message:
> >
> > sop1-playground.s:1:15: error: invalid immediate:
