thr3ads.net - llvm dev - [llvm-dev] How to implement load/store for vector predicate register [Jun 2020]

林政宗 via llvm-dev

2020-Jun-26 06:58 UTC

[llvm-dev] How to implement load/store for vector predicate register

Hi,

I am planning to expand the pseudo instructions in
XXXTargetLowering::EmitInstrWithCustomInserter() and to use temporary virtual
registers as operands.
If I use virtual registers, do I need to mark them as "early clobber"?
I saw that the MIPS backend sometimes marks virtual registers as "early clobber"
in EmitInstrWithCustomInserter().
What is the effect of marking a virtual register as "early clobber"
before RA?

Thanks,
Jerry

At 2020-06-25 20:29:30, "Hal Finkel" <hfinkel at anl.gov> wrote:

On 6/25/20 1:11 AM, 林政宗 via llvm-dev wrote:

Hi there,
I am writing a backend, and I have run into a problem.
We don't have load/store instructions for vector predicate registers (vpr for
short).
The hardware has 64 vector registers (vr for short) and 8 vector predicate
registers, and there are no move instructions between vr and vpr.
vr supports many operations, and vpr supports the vpror, vprxor, vprand and
vprinv operations.
A vr has 512 bits, and a vpr has 128 bits. vr is used for v16i32, v32i16 and
v64i8. A scalar register has 32 bits.
If we compare or add two v16i32, an element in vpr has 8 bits. If we compare or
add two v64i8, an element in vpr has 2 bits (one bit for the compare flag and
one bit for the carry flag).
Each element in vpr contains a carry flag and a compare flag.
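
The layout described above implies a fixed number of predicate bits per vector element: the 128-bit vpr is divided among the lanes. A minimal sketch of that arithmetic, assuming the split is always even, as the v16i32 and v64i8 cases suggest. The helper name is made up for illustration, and the 4-bit result for v32i16 is inferred rather than stated in the message.

```cpp
#include <cassert>

// Width of one vector predicate register in bits, as stated above.
constexpr unsigned kVprBits = 128;

// Hypothetical helper: predicate bits available per vector element,
// assuming the vpr is split evenly among the lanes.
unsigned vprBitsPerElement(unsigned numLanes) {
  assert(numLanes != 0 && kVprBits % numLanes == 0 &&
         "lane count must divide the vpr width");
  return kVprBits / numLanes;
}
```

With this model, vprBitsPerElement(16) gives 8 and vprBitsPerElement(64) gives 2, matching the two cases in the text.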
We have defined registers and a new type (vpr) for vector predicate registers in
the backend.
Although there is no direct instruction to move vpr to vr or vr to vpr, there is
a workaround, and we do have load/store instructions for vr.

Move vpr to vr for v32i16 (from vpr0 to vr1):
1    vclr           vr0                  // clear vr0
2    ldi            r5, 0x00010001       // load immediate (compare bit mask for
                                         // v32i16) into scalar register r5
3    movr2vr.dup    vr2, r5              // duplicate the content of r5 into vr2
4    vadd.t.s16     vr1, vr0, vr2, vpr0  // vector add if the element's compare
                                         // bit is set; element type is 16-bit
                                         // signed integer; the compare bits
                                         // have now moved from vpr0 to vr1
5    ldi            r5, 0x00020002       // load immediate (carry bit mask for
                                         // v32i16) into scalar register r5
6    movr2vr.dup    vr2, r5              // duplicate the content of r5 into vr2
7    vadd.c.s16     vr1, vr1, vr2, vpr0  // vr1 = vr1 + vr2, vector add if the
                                         // element's carry bit is set; element
                                         // type is 16-bit signed integer; the
                                         // carry bits have now moved from vpr0
                                         // to vr1 too
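
To make the effect of steps 1-7 concrete, here is a small scalar model of the sequence. It is only a sketch under stated assumptions: each vpr element is reduced to the two flags named above, lanes not selected by the predicate in step 4 are assumed to end up as 0, and any flag updates performed by the adds themselves are ignored. All type and function names are invented for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 32;  // v32i16: 32 lanes of 16 bits

// Simplified model of one vpr element: only the two flags named in the text.
struct PredElem {
  bool compare;
  bool carry;
};

using VReg = std::array<std::uint16_t, kLanes>;
using VPReg = std::array<PredElem, kLanes>;

// Models steps 1-7: bit 0 of each lane receives the compare flag and
// bit 1 receives the carry flag.
VReg moveVprToVr(const VPReg &vpr0) {
  const std::uint16_t compareMask = 0x0001;  // one 16-bit half of 0x00010001
  const std::uint16_t carryMask = 0x0002;    // one 16-bit half of 0x00020002
  VReg vr1{};  // assumption: unselected lanes are 0 after step 4
  for (std::size_t i = 0; i < kLanes; ++i) {
    if (vpr0[i].compare)  // step 4: vadd.t.s16 computes 0 + compareMask
      vr1[i] = compareMask;
    if (vpr0[i].carry)    // step 7: vadd.c.s16 computes vr1 + carryMask
      vr1[i] = static_cast<std::uint16_t>(vr1[i] + carryMask);
  }
  return vr1;
}

// Small self-check of the model.
void demoMoveVprToVr() {
  VPReg p{};  // all flags clear
  p[0].compare = true;
  p[1].carry = true;
  p[2].compare = true;
  p[2].carry = true;
  const VReg r = moveVprToVr(p);
  assert(r[0] == 1 && r[1] == 2 && r[2] == 3 && r[3] == 0);
}
```

Each lane of the result holds a value from 0 to 3, which an ordinary vr store can then write to memory.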


Move vr to vpr for v32i16 (from vr1 to vpr0):
8    vclr           vr0                  // clear vr0
9    ldi            r5, 0x00010001       // load immediate (compare bit mask for
                                         // v32i16) into r5
10   movr2vr.dup    vr2, r5              // duplicate the content of r5 into vr2
11   vand.u16       vr2, vr1, vr2        // vector and, element type is 16-bit
                                         // unsigned integer, vr2 = vr1 & vr2;
                                         // the compare bits have now moved
                                         // from vr1 to vr2
12   vslt.s16       vpr0, vr0, vr2       // vector set when less than, element
                                         // type is 16-bit signed integer; the
                                         // compare bits have now moved from
                                         // vr1 to vpr0
13   ldi            r5, 0x00020002       // load immediate (carry bit mask for
                                         // v32i16) into r5
14   movr2vr.dup    vr2, r5              // duplicate the content of r5 into vr2
15   vand.u16       vr2, vr1, vr2        // vector and, element type is 16-bit
                                         // unsigned integer; vr2 now holds the
                                         // carry bits
16   ldi            r5, 0x7FFF7FFF       // maximum 16-bit signed integer
17   movr2vr.dup    vr3, r5              // duplicate r5 into vr3
18   vadd.s16       vr1, vr2, vr3, vpr0  // vpr0 now has the carry bits set

Each vector type needs a different instruction sequence, because the bit mask
and element type differ.
I have tried to lower load/store for vpr in XXXISelLowering.cpp, but there is no
guarantee that line 12 and line 18 will be assigned the same register for vpr0.
vpr0 in line 18 is an output, not an input.
The vpr0 results of line 12 and line 18 are parallel in the SelectionDAG graph;
both are outputs.
I think I will define three pseudo instructions, one per vector type, and as a
next step expand each pseudo instruction into its instruction sequence before
register allocation. But I'm not sure it will work.
What should I do?

This somewhat depends on how you're modeling things, but a late-expanded
pseudo-instruction seems like a workable approach. If the pseudo-instruction
needs temporary registers (and it looks like it does), then the
pseudo-instruction should take them as register operands (so that RA will
allocate them for you and you don't need to worry about scavenging them
later). You might, however, need to mark such operands as "early clobber" to
prevent RA from assigning the same register to an input and an output
(sometimes, depending on how the expanded code uses the registers, this is
necessary).

 -Hal

Thanks and best regards,
Jerry

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Hal Finkel via llvm-dev

2020-Jun-26 17:11 UTC

[llvm-dev] How to implement load/store for vector predicate register

On 6/26/20 1:58 AM, 林政宗 wrote:
> Hi,
>
> I am planning to expand the pseudo instructions in
> XXXTargetLowering::EmitInstrWithCustomInserter(), and use temporary
> virtual registers as operands.
> If I use virtual registers, do I need to mark them as "early clobber"?

If I have an instruction XYZ, and it takes an input register VI, and an 
output register VO, such that the instruction:

   VO = XYZ VI

reads VI and computes VO, and if the value in VI is no longer needed 
after this instruction (or was undef in the first place), then the 
register allocator might assign the same physical register to both VI 
and VO. You might end up with:

RA = XYZ RA.

If XYZ is really a pseudo instruction, this might not be acceptable. You 
might need two distinct registers just because of how the expansion 
works. For example, maybe this expands to:

   VO = OP1 VI
   VO = OP2 VO, VI

Note that, in this case, the expansion needs VI in two different places.
If VO and VI are assigned to be the same register, OP1 overwrites VI before
OP2 reads it, and the expansion just won't work correctly. In this case, you
need earlyclobber on your pseudo-instruction.
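
The failure mode described above can be reproduced with a toy register file. This is a sketch only: the arithmetic chosen for OP1 and OP2 is invented (the explanation above leaves them abstract), and the register file is a plain array rather than anything from LLVM.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using RegFile = std::array<int, 4>;

// Hypothetical expansion of "VO = XYZ VI" into two real instructions:
//   VO = OP1 VI        (modeled here as VO = VI + 1)
//   VO = OP2 VO, VI    (modeled here as VO = VO * VI)
// vo and vi are physical register numbers chosen by the allocator.
int expandXYZ(RegFile &regs, std::size_t vo, std::size_t vi) {
  regs[vo] = regs[vi] + 1;         // OP1 writes VO
  regs[vo] = regs[vo] * regs[vi];  // OP2 still needs the original VI
  return regs[vo];
}

// Small self-check: distinct registers give (5 + 1) * 5 = 30, while
// sharing one register gives (5 + 1) * (5 + 1) = 36.
void demoEarlyClobber() {
  RegFile distinct{};
  distinct[0] = 5;
  assert(expandXYZ(distinct, 1, 0) == 30);

  RegFile shared{};
  shared[0] = 5;
  assert(expandXYZ(shared, 0, 0) == 36);
}
```

Marking the output as earlyclobber tells the allocator that the output is written before all inputs have been read, so it will not pick the shared assignment.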

> I saw that sometimes they marked virtual register as "early clobber"
> in EmitInstrWithCustomInserter() in MIPS backend.
> What is the effect of marking a virtual register as "early clobber"
> before RA?

I don't recall any effect.

  -Hal


林政宗 via llvm-dev

2020-Jun-29 02:39 UTC

[llvm-dev] How to implement load/store for vector predicate register

Hi,

After I lowered "store vpr" into "move vpr to vr, store vr" at the SelectionDAG
Legalize step, I ran into another problem.
I expanded the pseudo instruction "move vpr to vr" in
XXXTargetLowering::EmitInstrWithCustomInserter(),
but the instructions after expansion don't meet the SSA requirement.

Before expansion:
  %4:vprregs = VSLT_u16 killed %2:vregs, killed %3:vregs
      // vector set when less than; element type is 16-bit unsigned integer;
      // %2 and %3 are vector registers, %4 is a vector predicate register
  %5:vregs = PseudoMoveVPR2VR_e32 killed %4:vprregs
      // move vpr to vr, 32 elements total
  VSTORE killed %5:vregs, %stack.3.v, 0 :: (store 64 into %ir.v, align 128)
      // store vr to the stack

After expansion:
112B      %4:vprregs = VSLT_u16 killed %2:vregs, killed %3:vregs
128B      %14:vregs = VCLR_VR
              // clear vr
144B      %19:gregs = MOVi32 65537
              // move 0x10001 (compare bit mask) to a scalar register
160B      %15:vregs = MOVR2VR_DUP %19:gregs
              // duplicate the content of the scalar into a vector register
176B      %16:vregs, %4:vprregs = V_ADD_t_u16 %14:vregs, %15:vregs, %16:vregs(tied-def 0), %4:vprregs(tied-def 1)
              // conditional vector add: do the element add if the element's
              // compare bit is set, %16 = %14 + %15; it reads the vpr compare
              // bits and updates the vpr carry bits
192B      %20:gregs = MOVi32 131074
              // move 0x20002 (carry bit mask) to a scalar register
208B      %17:vregs = MOVR2VR_DUP %20:gregs
              // duplicate the content of the scalar into a vector register
224B      %18:vregs, %4:vprregs = V_ADD_c_u16 %14:vregs, %17:vregs, %18:vregs(tied-def 0), %4:vprregs(tied-def 1)
              // conditional vector add: do the element add if the element's
              // carry bit is set, %18 = %14 + %17; it reads the vpr carry
              // bits and updates the vpr carry bits
240B      %5:vregs = V_OR_a_u16 %16:vregs, %18:vregs


The definition of V_ADD_t_u16 in the .td file has the vpr register in both ins
and outs, with a constraint that the two vpr operands must be the same
register.
llc crashes after the expansion:

********** PROCESS IMPLICIT DEFS **********
********** Function: test
llc: /home/jerry/Develop/llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:404:
llvm::MachineInstr* llvm::MachineRegisterInfo::getVRegDef(llvm::Register) const:
Assertion `(I.atEnd() || std::next(I) == def_instr_end()) && "getVRegDef assumes a single definition or no definition"' failed.
Stack dump:
0.      Program arguments: llc -march=dtu -mcpu=x -debug dtu-vcc-u16.ll
1.      Running pass 'Function Pass Manager' on module 'dtu-vcc-u16.ll'.
2.      Running pass 'Live Variable Analysis' on function '@test'
 #0 0x00007efcc508c4e1 llvm::sys::PrintStackTrace(llvm::raw_ostream&)
    /home/jerry/Develop/llvm-project/llvm/lib/Support/Unix/Signals.inc:564:0
 #1 0x00007efcc508c574 PrintStackTraceSignalHandler(void*)
    /home/jerry/Develop/llvm-project/llvm/lib/Support/Unix/Signals.inc:625:0
 #2 0x00007efcc508a2fc llvm::sys::RunSignalHandlers()
    /home/jerry/Develop/llvm-project/llvm/lib/Support/Signals.cpp:68:0
 #3 0x00007efcc508be5b SignalHandler(int)
    /home/jerry/Develop/llvm-project/llvm/lib/Support/Unix/Signals.inc:406:0
 #4 0x00007efcc37484b0 (/lib/x86_64-linux-gnu/libc.so.6+0x354b0)
 #5 0x00007efcc3748428 raise
    /build/glibc-Cl5G7W/glibc-2.23/signal/../sysdeps/unix/sysv/linux/raise.c:54:0
 #6 0x00007efcc374a02a abort
    /build/glibc-Cl5G7W/glibc-2.23/stdlib/abort.c:91:0
 #7 0x00007efcc3740bd7 __assert_fail_base
    /build/glibc-Cl5G7W/glibc-2.23/assert/assert.c:92:0
 #8 0x00007efcc3740c82 (/lib/x86_64-linux-gnu/libc.so.6+0x2dc82)
 #9 0x00007efcc88e04b0 llvm::MachineRegisterInfo::getVRegDef(llvm::Register) const
    /home/jerry/Develop/llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:403:0
#10 0x00007efcc8747235 llvm::LiveVariables::HandleVirtRegUse(unsigned int, llvm::MachineBasicBlock*, llvm::MachineInstr&)
    /home/jerry/Develop/llvm-project/llvm/lib/CodeGen/LiveVariables.cpp:133:0
#11 0x00007efcc87498b4 llvm::LiveVariables::runOnInstr(llvm::MachineInstr&, llvm::SmallVectorImpl<unsigned int>&)
    /home/jerry/Develop/llvm-project/llvm/lib/CodeGen/LiveVariables.cpp:544:0
#12 0x00007efcc8749d53 llvm::LiveVariables::runOnBlock(llvm::MachineBasicBlock*, unsigned int)
    /home/jerry/Develop/llvm-project/llvm/lib/CodeGen/LiveVariables.cpp:581:0
#13 0x00007efcc874a3fe llvm::LiveVariables::runOnMachineFunction(llvm::MachineFunction&)
    /home/jerry/Develop/llvm-project/llvm/lib/CodeGen/LiveVariables.cpp:649:0
#14 0x00007efcc8817b8c llvm::MachineFunctionPass::runOnFunction(llvm::Function&)
    /home/jerry/Develop/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:73:0
#15 0x00007efcc78cca01 llvm::FPPassManager::runOnFunction(llvm::Function&)
    /home/jerry/Develop/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1482:0
#16 0x00007efcc78ccc9b llvm::FPPassManager::runOnModule(llvm::Module&)
    /home/jerry/Develop/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1518:0
#17 0x00007efcc78cd0cf (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&)
    /home/jerry/Develop/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1583:0
#18 0x00007efcc78cd88b llvm::legacy::PassManagerImpl::run(llvm::Module&)
    /home/jerry/Develop/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1695:0
#19 0x00007efcc78cda9b llvm::legacy::PassManager::run(llvm::Module&)
    /home/jerry/Develop/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1727:0
#20 0x0000000000445ba9 compileModule(char**, llvm::LLVMContext&)
    /home/jerry/Develop/llvm-project/llvm/tools/llc/llc.cpp:620:0
#21 0x0000000000444064 main
    /home/jerry/Develop/llvm-project/llvm/tools/llc/llc.cpp:356:0
#22 0x00007efcc3733830 __libc_start_main
    /build/glibc-Cl5G7W/glibc-2.23/csu/../csu/libc-start.c:325:0
#23 0x0000000000441bf9 _start (/home/jerry/Develop/llvm-project/build/bin/llc+0x441bf9)

Aborted (core dumped)


I think the reason is that there are three definitions of %4.
Is there a method to work around this? What should I do?
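
The single-definition property that the assertion checks can be illustrated with a toy model. The sketch below is not LLVM code: Inst, hasSingleDefs and renameRedefinitions are invented names, the block is straight-line only, and the tied input uses of %16 and %18 are left out for brevity. It shows one common way to restore the property, which is to give every redefinition a fresh register number and point later uses at the newest name. In an actual backend the fresh register would presumably come from MachineRegisterInfo::createVirtualRegister(), with the tied-def constraint linking each new def to the previous value.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy stand-in for a machine instruction: only the virtual register
// numbers it defines and uses.
struct Inst {
  std::vector<unsigned> defs;
  std::vector<unsigned> uses;
};

// True if every virtual register is defined at most once in the block.
bool hasSingleDefs(const std::vector<Inst> &block) {
  std::map<unsigned, unsigned> defCount;
  for (const Inst &inst : block)
    for (unsigned reg : inst.defs)
      if (++defCount[reg] > 1)
        return false;
  return true;
}

// Gives each redefinition a fresh register number (starting at nextFreeReg,
// assumed to be above every number already in use) and rewrites later uses
// to read the newest name.
std::vector<Inst> renameRedefinitions(std::vector<Inst> block,
                                      unsigned nextFreeReg) {
  std::map<unsigned, unsigned> latest;  // original register -> newest name
  for (Inst &inst : block) {
    for (unsigned &reg : inst.uses) {
      auto it = latest.find(reg);
      if (it != latest.end())
        reg = it->second;
    }
    for (unsigned &reg : inst.defs) {
      auto it = latest.find(reg);
      if (it == latest.end()) {
        latest[reg] = reg;  // first definition keeps its number
      } else {
        it->second = nextFreeReg;  // later definitions get a fresh one
        reg = nextFreeReg++;
      }
    }
  }
  return block;
}

// Self-check using the register numbers from the dump above.
void demoRename() {
  std::vector<Inst> block;
  block.push_back(Inst{{4}, {2, 3}});           // %4 = VSLT_u16 %2, %3
  block.push_back(Inst{{16, 4}, {14, 15, 4}});  // %16, %4 = V_ADD_t_u16 ...
  block.push_back(Inst{{18, 4}, {14, 17, 4}});  // %18, %4 = V_ADD_c_u16 ...
  assert(!hasSingleDefs(block));
  const std::vector<Inst> fixed = renameRedefinitions(block, 21);
  assert(hasSingleDefs(fixed));
  assert(fixed[1].defs[1] == 21);
  assert(fixed[2].uses[2] == 21 && fixed[2].defs[1] == 22);
}
```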






Thanks,
Jerry



















