Displaying 14 results from an estimated 14 matches for "v_mov_b32".
2014 Apr 04
2
[LLVMdev] How should I update LiveIntervals after removing a use of a register?
Hi,
I am working on a simple copy propagation pass for the R600 backend that
propagates immediates rather than registers. For example, I want to
transform:
...
%vreg1 = V_MOV_B32 1
%vreg2 = V_ADD_I32 %vreg1, %vreg0
...
into:
%vreg1 = V_MOV_B32 1 ; <- Only delete this if it is dead
%vreg2 = V_ADD_I32 1, %vreg0
For best results, I am trying to run this pass after the
TwoAddressInstruction pass, which means I need to preserve
the LiveIntervals analysis.
My question is:...
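To make the intended transformation concrete, here is a minimal sketch on a toy instruction list. It models only the rewrite itself, not the LiveIntervals bookkeeping the question is about. The tuple layout `(dest, opcode, operands)` and the function name `propagate_immediates` are hypothetical, and the sketch assumes straight-line code in which a register that is never read inside the list is dead.

```python
def propagate_immediates(instrs):
    """Replace uses of registers defined by 'V_MOV_B32 <imm>' with the immediate.

    instrs is a list of (dest, opcode, operands) tuples; register operands are
    strings and immediates are ints. This is a toy model, not LLVM's MachineInstr.
    """
    known_imm = {}  # register name -> immediate it currently holds
    rewritten = []
    for dest, opcode, operands in instrs:
        # Substitute any operand whose value is a known immediate.
        new_operands = tuple(known_imm.get(op, op) for op in operands)
        # A new definition invalidates whatever we knew about dest.
        known_imm.pop(dest, None)
        if opcode == "V_MOV_B32" and isinstance(new_operands[0], int):
            known_imm[dest] = new_operands[0]
        rewritten.append((dest, opcode, new_operands))

    # Only delete a mov if it is dead, i.e. its result is no longer read.
    still_used = {op for _, _, ops in rewritten for op in ops if isinstance(op, str)}
    return [
        ins for ins in rewritten
        if not (ins[1] == "V_MOV_B32"
                and isinstance(ins[2][0], int)
                and ins[0] not in still_used)
    ]


program = [
    ("%vreg1", "V_MOV_B32", (1,)),
    ("%vreg2", "V_ADD_I32", ("%vreg1", "%vreg0")),
]
result = propagate_immediates(program)
```

In a real pass, the removed use of %vreg1 is exactly the point where its live interval would have to be shrunk or recomputed.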
2016 Oct 03
5
Is this undefined behavior optimization legal?
...a 32-bit vector, a common way to implement this function on a target
with 32-bit registers would be to zero initialize a 32-bit register to hold
the initial vector and then 'mask' and 'or' the inserted value with the
initial vector. In AMDGPU assembly it would look something like:
v_mov_b32 v0, 0
v_cvt_u32_f32_e32 v1, s0
v_and_b32 v1, v1, 0x000000ff
v_or_b32 v0, v0, v1
The optimization the SelectionDAG does for us in this function, though, ends
up removing the mask operation, which gives us:
v_mov_b32 v0, 0
v_cvt_u32_f32_e32 v1, s0
v_or_b32 v0, v0, v1
The reason the SelectionDAG i...
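A small sketch of why the mask matters, modelling the two instruction sequences above on plain integers. The function names are hypothetical, and the converted value is passed in directly as an unsigned integer rather than modelling v_cvt_u32_f32. It takes no position on whether dropping the mask is legal; it only shows when the two sequences differ.

```python
def insert_with_mask(initial_vector, converted_value):
    """v_and_b32 followed by v_or_b32: only the low 8 bits are inserted."""
    return initial_vector | (converted_value & 0x000000FF)


def insert_without_mask(initial_vector, converted_value):
    """v_or_b32 alone: every bit of the converted value is inserted."""
    return initial_vector | converted_value


# The converted value fits in 8 bits: both sequences agree.
in_range_masked = insert_with_mask(0, 0x7F)
in_range_unmasked = insert_without_mask(0, 0x7F)

# The converted value does not fit in 8 bits: the extra bits leak into
# the neighbouring vector elements when the mask is missing.
out_of_range_masked = insert_with_mask(0, 0x1FF)
out_of_range_unmasked = insert_without_mask(0, 0x1FF)
```

The two versions only diverge when the converted value needs more than 8 bits, which is the out-of-range case the thread is asking about.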
2017 Jun 15
2
Implementing cross-thread reduction in the AMDGPU backend
...>>>>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace
>>>>>>>> v_foo_f32 with the appropriate operation):
>>>>>>>>
>>>>>>>> ; v0 is the input register
>>>>>>>> v_mov_b32 v1, v0
>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3
>>>>>>>> v_nop // Add two...
2017 Jun 15
1
Implementing cross-thread reduction in the AMDGPU backend
...cn-assembly-cross-lane-operations/
>>>>>>>>> (replace
>>>>>>>>> v_foo_f32 with the appropriate operation):
>>>>>>>>>
>>>>>>>>> ; v0 is the input register
>>>>>>>>> v_mov_b32 v1, v0
>>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
>>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3
>>>>>>>>> v_nop // Add
>>>>>>...
2017 Jun 14
5
Implementing cross-thread reduction in the AMDGPU backend
...>>>> this, modified from
>>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace
>>>>>> v_foo_f32 with the appropriate operation):
>>>>>>
>>>>>> ; v0 is the input register
>>>>>> v_mov_b32 v1, v0
>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>>>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
>>>>>> v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3
>>>>>> v_nop // Add two independent instructions to avo...
2019 Nov 14
2
imm COPY generated by PHI elim not propagated
In this case the load imm is foldable into the copy, once converted to a mov. Directly folding this would be 4 v_mov_b32 instead of 5 produced currently
-Matt
On 11/14/19, 07:20, "llvm-dev on behalf of Quentin Colombet via llvm-dev" <llvm-dev-bounces at lists.llvm.org on behalf of llvm-dev at lists.llvm.org> wrote:
Hi Ryan,
Unless you can fold your immediate directly in an instruc...
2019 Nov 15
2
imm COPY generated by PHI elim not propagated
...nd the copy in
> expand pseudo after regalloc.
>
> > On Nov 14, 2019, at 12:20 AM, Arsenault, Matthew <
> Matthew.Arsenault at amd.com> wrote:
> >
> > In this case the load imm is foldable into the copy, once converted to a
> mov. Directly folding this would be 4 v_mov_b32 instead of 5 produced
> currently
> >
> > -Matt
> >
> > On 11/14/19, 07:20, "llvm-dev on behalf of Quentin Colombet via
> llvm-dev" <llvm-dev-bounces at lists.llvm.org on behalf of
> llvm-dev at lists.llvm.org> wrote:
> >
> > Hi Ryan,...
2019 Nov 20
2
imm COPY generated by PHI elim not propagated
...nd pseudo after regalloc.
>>
>> > On Nov 14, 2019, at 12:20 AM, Arsenault, Matthew <
>> Matthew.Arsenault at amd.com> wrote:
>> >
>> > In this case the load imm is foldable into the copy, once converted to
>> a mov. Directly folding this would be 4 v_mov_b32 instead of 5 produced
>> currently
>> >
>> > -Matt
>> >
>> > On 11/14/19, 07:20, "llvm-dev on behalf of Quentin Colombet via
>> llvm-dev" <llvm-dev-bounces at lists.llvm.org on behalf of
>> llvm-dev at lists.llvm.org> wrote:
>...
2017 Jun 13
2
Implementing cross-thread reduction in the AMDGPU backend
...d to do a code sequence that looks like
>>>> this, modified from
>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace
>>>> v_foo_f32 with the appropriate operation):
>>>>
>>>> ; v0 is the input register
>>>> v_mov_b32 v1, v0
>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
>>>> v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3
>>>> v_nop // Add two independent instructions to avoid a data hazard
>>>>...
2017 Jun 14
0
Implementing cross-thread reduction in the AMDGPU backend
...fied from
>>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/
>>>>>> (replace
>>>>>> v_foo_f32 with the appropriate operation):
>>>>>>
>>>>>> ; v0 is the input register
>>>>>> v_mov_b32 v1, v0
>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>>>>>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
>>>>>> v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3
>>>>>> v_nop // Add two independent instructions to av...
2017 Jun 12
4
Implementing cross-thread reduction in the AMDGPU backend
...ut
I can think of a few concerns/questions. First of all, to implement
the prefix scan, we'll need to do a code sequence that looks like
this, modified from
http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace
v_foo_f32 with the appropriate operation):
; v0 is the input register
v_mov_b32 v1, v0
v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3
v_nop // Add two independent instructions to avoid a data hazard
v_nop
v_foo_f32 v1, v1, v1 row_shr:4 bank_mask:0xe // Instruction 4
v_nop // Add tw...
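As a rough illustration of what the sequence above computes, here is a Python simulation of one 16-lane row. Several things are assumptions rather than taken from the excerpt: addition stands in for v_foo_f32, a lane whose shifted source falls off the start of the row keeps its old value, each bank covers four consecutive lanes, and the final row_shr:8 bank_mask:0xc step is not visible in the excerpt and is added here only to complete the row. The v_nop hazard padding is not modelled.

```python
ROW_SIZE = 16   # lanes per row
BANK_SIZE = 4   # lanes per bank (assumed)


def row_shr(values, shift):
    """Lane i reads lane i - shift of the same row; None if that lane does not exist."""
    return [values[i - shift] if i - shift >= 0 else None for i in range(ROW_SIZE)]


def dpp_add(old_dst, shifted_src0, src1, bank_mask=0xF):
    """Model 'v_add dst, src0, src1 row_shr:N bank_mask:M' for one row."""
    result = list(old_dst)
    for lane in range(ROW_SIZE):
        bank_enabled = (bank_mask >> (lane // BANK_SIZE)) & 1
        if bank_enabled and shifted_src0[lane] is not None:
            result[lane] = shifted_src0[lane] + src1[lane]
    return result


def row_inclusive_scan(v0):
    v1 = list(v0)                                        # v_mov_b32 v1, v0
    for shift in (1, 2, 3):                              # Instructions 1-3
        v1 = dpp_add(v1, row_shr(v0, shift), v1)
    v1 = dpp_add(v1, row_shr(v1, 4), v1, bank_mask=0xE)  # Instruction 4
    v1 = dpp_add(v1, row_shr(v1, 8), v1, bank_mask=0xC)  # assumed final step
    return v1


lanes = list(range(1, ROW_SIZE + 1))
scanned = row_inclusive_scan(lanes)
```

After instructions 1 to 3 each lane holds the sum of itself and up to three lower lanes; the masked shifts by 4 and 8 then combine those partial sums without touching the lanes that are already complete.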
2017 Jun 12
2
Implementing cross-thread reduction in the AMDGPU backend
...mplement
>> the prefix scan, we'll need to do a code sequence that looks like
>> this, modified from
>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace
>> v_foo_f32 with the appropriate operation):
>>
>> ; v0 is the input register
>> v_mov_b32 v1, v0
>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
>> v_foo_f32 v1, v0, v1 row_shr:3/ / Instruction 3
>> v_nop // Add two independent instructions to avoid a data hazard
>> v_nop
>> v_foo_f32 v1, v1, v1 ro...
2019 Nov 13
2
imm COPY generated by PHI elim not propagated
I have some code such that:
vgpr1 = mov 0
branch bb
bb:
PHI vgpr2 = vgpr1, ….
PHI vgpr3 = vgpr1, ….
PHI vgpr4 = vgpr1, ….
PHI vgpr5 = vgpr1, ….
PHI node elimination is generating copies for all these PHIs (and hoisting
them) as such:
vgpr1 = 0
vgpr20 = COPY vgpr1 // old vgpr2
vgpr30 = COPY vgpr1 // old vgpr3
vgpr40 = COPY vgpr1 // old vgpr4
vgpr50 = COPY vgpr1 // old vgpr5
I expect the zero
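For illustration, the kind of folding being asked about can be sketched on a toy instruction list. The tuple layout `(dest, opcode, source)`, the function name `fold_immediate_copies`, and the explicit `live_out` set are all hypothetical; this is not how the LLVM pass is structured.

```python
def fold_immediate_copies(instrs, live_out):
    """Turn 'COPY reg' into 'MOV imm' when reg is known to hold an immediate.

    instrs is a list of (dest, opcode, source) tuples; live_out names the
    registers that later code (not shown) still reads.
    """
    known_imm = {}
    folded = []
    for dest, opcode, source in instrs:
        known_imm.pop(dest, None)          # a new definition replaces old knowledge
        if opcode == "COPY" and source in known_imm:
            opcode, source = "MOV", known_imm[source]
        if opcode == "MOV" and isinstance(source, int):
            known_imm[dest] = source
        folded.append((dest, opcode, source))

    # Drop definitions that nothing reads any more.
    read = {src for _, _, src in folded if isinstance(src, str)}
    return [ins for ins in folded if ins[0] in read or ins[0] in live_out]


program = [
    ("vgpr1", "MOV", 0),
    ("vgpr20", "COPY", "vgpr1"),
    ("vgpr30", "COPY", "vgpr1"),
    ("vgpr40", "COPY", "vgpr1"),
    ("vgpr50", "COPY", "vgpr1"),
]
result = fold_immediate_copies(program, live_out={"vgpr20", "vgpr30", "vgpr40", "vgpr50"})
```

With the copies folded, the original mov of zero has no remaining readers, so the five instructions become four immediate moves.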
2015 Oct 24
2
[AMDGPU] AMDGPUAsmParser fails to parse several instructions
Thank you. I'm new to the LLVM backend, so the help is much appreciated.
On Sat, Oct 24, 2015 at 2:12 AM, Matt Arsenault <arsenm2 at gmail.com> wrote:
>
> > On Oct 23, 2015, at 3:36 AM, 李弘宇 via llvm-dev <llvm-dev at lists.llvm.org>
> wrote:
>
> > The first line has the following error message:
> >
> > sop1-playground.s:1:15: error: invalid immediate: