shivam gupta via llvm-dev
2020-Feb-25 18:18 UTC
[llvm-dev] Adding new vector instructions to LLVM Sparc backend
Hello all,

As a major degree project, I have started working on adding vector instructions to the LLVM Sparc backend (modified for the AJIT processor).

My work is to implement the VADDD, VSUBD, VUMULD and VSMULD instructions.

Their instruction format is as follows:

    31-30  op (always 10)
    29-25  rd
    24-19  op3
    18-14  rs1
    13     i (always 1)
    12-10  (unused)
    9-7    datatype (8 -> 001, 16 -> 010, 32 -> 100)
    6-5    (always 10)
    4-0    rs2

https://llvm.org/docs/ExtendingLLVM.html suggests using a custom LLVM intrinsic to represent the VADDD operation. Is there any detailed example code for other architectures that I could look at?

Do I need to define a new class in SparcInstrFormats.td <https://github.com/llvm-mirror/llvm/blob/master/lib/Target/Sparc/SparcInstrFormats.td#L106> because these instructions can't use the predefined format-3 class of the other arithmetic instructions (the 8-bit asi field is changed to specify the vector datatype)?

Does the implementation of Sparc VIS <https://github.com/llvm/llvm-project/blob/master/llvm/lib/Target/Sparc/SparcInstrVIS.td> resemble these instructions?

Could some LLVM backend experts give me an initial idea of what steps I should take to add these instructions?

I have gone through the LLVM target-independent code generator documentation.

The SPARC architecture manual and the AJIT processor ISA are attached to this mail.
https://www.gaisler.com/doc/sparcv8.pdf

Thanks and Regards,
Shivam

-------------- next part --------------

64-bit ISA extensions to the AJIT processor
Madhav Desai

1. Overview
--------

The AJIT processor implements the Sparc-V8 ISA. We propose to extend this ISA to provide support for a native 64-bit integer datatype. The proposed extensions use the existing instruction encodings to the maximum extent possible.

All proposed extensions are RegisterXRegister -> Register,Condition-codes type instructions.
The load/store instructions are not modified.

We list the additional instructions in the subsequent sections. In each case, only the differences in the encoding relative to an existing Sparc-V8 instruction are provided.

2. Integer-unit extensions: Arithmetic-logic instructions
-------------------------------------------------------

These instructions provide 64-bit arithmetic/logic support in the integer unit. The instructions work on 64-bit register pairs in most cases. Register-pairs are identified by a 5-bit even number (lowest bit must be 0).

ADDD
    encoding: same as ADD, but with Instr[13]=0 (i=0), and Instr[5]=1.
    rd(pair) <- rs1(pair) + rs2(pair)

ADDDCC
    encoding: same as ADDCC, but with Instr[13]=0 (i=0), and Instr[5]=1.
    rd(pair) <- rs1(pair) + rs2(pair), set Z,N

SUBD
    encoding: same as SUB, but with Instr[13]=0 (i=0), and Instr[5]=1.
    rd(pair) <- rs1(pair) - rs2(pair)

SUBDCC
    encoding: same as SUBCC, but with Instr[13]=0 (i=0), and Instr[5]=1.
    rd(pair) <- rs1(pair) - rs2(pair), set Z,N

// shifts

SLLD
    encoding: same as SLL, but with Instr[6:5]=2.
        if imm bit (Instr[13]) is 1, then Instr[5:0] is the shift-amount.
        else shift-amount is the lowest 5 bits of rs2.
    Note that rs2 is a 32-bit register.
    rd(pair) <- rs1(pair) << shift-amount

SRLD
    encoding: same as SRL, but with Instr[6:5]=2.
        if imm bit (Instr[13]) is 1, then Instr[5:0] is the shift-amount.
        else shift-amount is the lowest 5 bits of rs2.
    Note that rs2 is a 32-bit register.
    rd(pair) <- rs1(pair) >> shift-amount

SRAD
    encoding: same as SRA, but with Instr[6:5]=2.
        if imm bit (Instr[13]) is 1, then Instr[5:0] is the shift-amount.
        else shift-amount is the lowest 5 bits of rs2.
    Note that rs2 is a 32-bit register.
    rd(pair) <- rs1(pair) >> shift-amount (with sign extension).

// mul/div

UMULD
    encoding: same as UMUL, but with Instr[13]=0 (i=0), and Instr[5]=1.
    rd(pair) <- rs1(pair) * rs2(pair)

UMULDCC
    encoding: same as UMULCC, but with Instr[13]=0 (i=0), and Instr[5]=1.
    rd(pair) <- rs1(pair) * rs2(pair), sets Z, Ovflow

SMULD
    encoding: same as SMUL, but with Instr[13]=0 (i=0), and Instr[5]=1.
    rd(pair) <- rs1(pair) * rs2(pair) (signed)

SMULDCC
    encoding: same as SMULCC, but with Instr[13]=0 (i=0), and Instr[5]=1.
    rd(pair) <- rs1(pair) * rs2(pair) (signed)
    sets condition codes Z,N,Ovflow

UDIVD
    encoding: same as UDIV, but with Instr[13]=0 (i=0), and Instr[5]=1.
    rd(pair) <- rs1(pair) / rs2(pair)
    note: can generate div-by-zero trap.

UDIVDCC
    encoding: same as UDIVCC, but with Instr[13]=0 (i=0), and Instr[5]=1.
    rd(pair) <- rs1(pair) / rs2(pair)
    sets condition codes Z,Ovflow
    note: can generate div-by-zero trap.

SDIVD
    encoding: same as SDIV, but with Instr[13]=0 (i=0), and Instr[5]=1.
    rd(pair) <- rs1(pair) / rs2(pair) (signed)

SDIVDCC
    encoding: same as SDIVCC, but with Instr[13]=0 (i=0), and Instr[5]=1.
    rd(pair) <- rs1(pair) / rs2(pair) (signed)
    sets condition codes Z,N,Ovflow
    note: can generate div-by-zero trap.

// 64-bit logical.

ORD
    encoding: same as OR, but with Instr[13]=0 (i=0), and Instr[5]=1.
    rd(pair) <- rs1(pair) | rs2(pair)

ORDCC
    encoding: same as ORCC, but with Instr[13]=0 (i=0), and Instr[5]=1.
    rd(pair) <- rs1(pair) | rs2(pair), sets Z.

ORDN
    encoding: same as ORN, but with Instr[13]=0 (i=0), and Instr[5]=1.
    rd(pair) <- rs1(pair) | (~rs2(pair))

ORDNCC
    encoding: same as ORNCC, but with Instr[13]=0 (i=0), and Instr[5]=1.
    rd(pair) <- rs1(pair) | (~rs2(pair)), sets Z.

XORD
    encoding: same as XOR, but with Instr[13]=0 (i=0), and Instr[5]=1.
    rd(pair) <- rs1(pair) ^ rs2(pair)

XORDCC
    encoding: same as XORCC, but with Instr[13]=0 (i=0), and Instr[5]=1.
    rd(pair) <- rs1(pair) ^ rs2(pair), sets Z.

XNORD
    encoding: same as XNOR, but with Instr[13]=0 (i=0), and Instr[5]=1.
    rd(pair) <- rs1(pair) ^ (~rs2(pair))

XNORDCC
    encoding: same as XNORCC, but with Instr[13]=0 (i=0), and Instr[5]=1.
    rd(pair) <- rs1(pair) ^ (~rs2(pair)), sets Z

ANDD
    encoding: same as AND, but with Instr[13]=0 (i=0), and Instr[5]=1.
    rd(pair) <- rs1(pair) . rs2(pair)

ANDDCC
    encoding: same as ANDCC, but with Instr[13]=0 (i=0), and Instr[5]=1.
    rd(pair) <- rs1(pair) . rs2(pair), sets Z

ANDDN
    encoding: same as ANDN, but with Instr[13]=0 (i=0), and Instr[5]=1.
    rd(pair) <- rs1(pair) . (~rs2(pair))

ANDDNCC
    encoding: same as ANDNCC, but with Instr[13]=0 (i=0), and Instr[5]=1.
    rd(pair) <- rs1(pair) . (~rs2(pair)), sets Z

3. Integer-unit extensions: SIMD instructions
-------------------------------------------------------

These instructions are vector instructions which work on two source registers (each a 64-bit register pair), and produce a 64-bit vector result. The vector elements can be 8-bit/16-bit/32-bit.

VADDD8, VADDD16, VADDD32
    encoding: same as ADDD, but with Instr[13]=0 (i=0), and Instr[6:5]=2.
    bits Instr[9:7] are a 3-bit field, which specify the data type
        001  byte               (VADDD8)
        010  half-word (16-bits) (VADDD16)
        100  word (32-bits)      (VADDD32)
    performs a vector operation by considering the 64-bit operands as a vector of objects with specified data-type.
        vadd8  rs1, rs2, rd
        vadd16
        vadd32

VSUBD8, VSUBD16, VSUBD32
    encoding: same as SUBD, but with Instr[13]=0 (i=0), and Instr[6:5]=2.
    bits Instr[9:7] are a 3-bit field, which specify the data type
        001  byte               (VSUBD8)
        010  half-word (16-bits) (VSUBD16)
        100  word (32-bits)      (VSUBD32)
    performs a vector operation by considering the 64-bit operands as a vector of objects with specified data-type.

VUMULD8, VUMULD16, VUMULD32
    encoding: same as UMULD, but with Instr[13]=0 (i=0), and Instr[6:5]=2.
    bits Instr[9:7] are a 3-bit field, which specify the data type
        001  byte               (VUMULD8)
        010  half-word (16-bits) (VUMULD16)
        100  word (32-bits)      (VUMULD32)
    performs a vector operation by considering the 64-bit operands as a vector of objects with specified data-type.

VSMULD8, VSMULD16, VSMULD32
    encoding: same as SMULD, but with Instr[13]=0 (i=0), and Instr[6:5]=2.
    bits Instr[9:7] are a 3-bit field, which specify the data type
        001  byte               (VSMULD8)
        010  half-word (16-bits) (VSMULD16)
        100  word (32-bits)      (VSMULD32)
    performs a vector operation by considering the 64-bit operands as a vector of objects with specified data-type.

4. Integer-unit extensions: SIMD instructions
-------------------------------------------------------

These instructions are vector instructions which reduce a source register to a byte result.

// byte-reduce add
ADDDBYTER
    op=2, op3[3:0]=0xd, op3[5:4]=0x2, contents[7:0] of rs2 specify a mask.
    encoding
        Instr[31:30] (op)   = 0x2
        Instr[29:25] (rd)   rd is a 32-bit register.
        Instr[24:19] (op3)  = 101101
        Instr[18:14] (rs1)  lowest bit assumed 0.
        Instr[13]    (i)    = 0 (ignored)
        Instr[12:5]         (zero)
        Instr[4:0]   (rs2)  32-bit register is read.
    rd <- (rs1_7.m7 + rs1_6.m6 + rs1_5.m5 ... + rs1_0.m0)
    (The final sum will be a 13-bit number, stored in the least significant bytes. It is up to software to decide which byte(s) to use).
        addbyter %rs1, %rs2/imm, rd

// byte-reduce or
ORDBYTER
    op=2, op3[3:0]=0xe, op3[5:4]=0x2, contents[7:0] of rs2 specify a mask.
    encoding
        Instr[31:30] (op)   = 0x2
        Instr[29:25] (rd)   rd is a 32-bit register.
        Instr[24:19] (op3)  = 101110
        Instr[18:14] (rs1)  lowest bit assumed 0.
        Instr[13]    (i)    = 0 (ignored)
        Instr[12:5]         (zero)
        Instr[4:0]   (rs2)  32-bit register is read.
    rd <- (rs1_7.m7 | rs1_6.m6 | rs1_5.m5 ... | rs1_0.m0)

// byte-reduce and
ANDDBYTER
    op=2, op3[3:0]=0xf, op3[5:4]=0x2, contents[7:0] of rs2 specify a mask.
    encoding
        Instr[31:30] (op)   = 0x2
        Instr[29:25] (rd)   rd is a 32-bit register.
        Instr[24:19] (op3)  = 101111
        Instr[18:14] (rs1)  lowest bit assumed 0.
        Instr[13]    (i)    = 0 (ignored)
        Instr[12:5]         (zero)
        Instr[4:0]   (rs2)  32-bit register is read.
    rd <- ( (m7 ? rs1_7 : 0xff) . (m6 ? rs1_6 : 0xff) .... (m0 ? rs1_0 : 0xff))

// byte-reduce xor
XORDBYTER
    op=2, op3[3:0]=0xe, op3[5:4]=0x3, contents[7:0] of rs2 specify a mask.
    encoding
        Instr[31:30] (op)   = 0x2
        Instr[29:25] (rd)   rd is a 32-bit register.
        Instr[24:19] (op3)  = 111110
        Instr[18:14] (rs1)  lowest bit assumed 0.
        Instr[13]    (i)    = 0 (ignored)
        Instr[12:5]         (zero)
        Instr[4:0]   (rs2)  32-bit register is read.
    rd <- (rs1_7.m7 ^ rs1_6.m6 ^ rs1_5.m5 ... ^ rs1_0.m0)

// positions-of-zero-bytes in d-word.
ZBYTEDPOS
    op=2, op3[3:0]=0xf, op3[5:4]=0x3, contents[7:0] of rs2/imm-value specify a mask.
    encoding
        Instr[31:30] (op)   = 0x2
        Instr[29:25] (rd)   rd is a 32-bit register.
        Instr[24:19] (op3)  = 111111
        Instr[18:14] (rs1)  lowest bit assumed 0.
        Instr[13]    (i)    = if 0, use rs2, else Instr[7:0]
        Instr[12:5]         = 0 (ignored if i=0)
        Instr[4:0]   (rs2, if i=0) 32-bit register is read.
    rd <- [b7_zero b6_zero b5_zero b4_zero .. b0_zero]
    (if mask-bit is zero then b*_zero is zero)

5. Vector floating point instructions
---------------------------------------

These are vector float operations which work on two single precision operand pairs to produce two single precision results.

// SIMD float ops.
// NaN propagated, but no traps.
// For each of these, rs1,rs2,rd are
// considered even numbers pointing to
// a floating point register-pair.
//
VFADD   op=2, op3=0x34, opf=0x142
            vfadd %f1, %f2, %f3
VFSUB   op=2, op3=0x34, opf=0x146
VFMUL   op=2, op3=0x34, opf=0x14a
VFDIV   op=2, op3=0x34, opf=0x14e
VFSQRT  op=2, op3=0x34, opf=0x12a

6. CSWAP instruction
---------------------------------------

The Sparc-V8 ISA does not include a compare-and-swap (CAS) instruction which is very useful in achieving consensus among distributed agents when the number of agents is > 2.
We introduce a CSWAP instruction in two flavours

    CSWAPD rs1, rs2-pair/immediate, rd-pair
        op=3 op3= 10 1111
        (rest of instruction similar to SWAP)

    CSWAPDA rs1, rs2-pair/immediate, rd-pair, asi
        op=3 op3= 11 1111
        (rest of instruction similar to SWAPA)

The semantics of the instruction (the entire sequence is atomic)

    TMPVAL = mem[rs1]            (load double, lock system bus)
    if <rs2-pair/immediate> == TMPVAL
        (store double, unlock)
        mem[rs1] = <rd-pair>
        <rd-pair> = TMPVAL
    else
        (store double, unlock)
        mem[rs1] = TMPVAL

The write under else is redundant but is required in order to unlock the bus.

Similar to SWAP,
    - mem[rs1] is left either with its value prior to the instruction or with the value in rd-pair.
    - <rd-pair> is left either with its value prior to the instruction or with the value in mem[rs1].

The processor can check rd-pair after execution to confirm if the swap succeeded.
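
The CSWAPD semantics above can be followed step by step in a small software model. This is only a sketch under stated assumptions: `ToyMachine` and `cswapd` are hypothetical names, memory is a plain dictionary of 64-bit values, and the bus lock is not modeled, so it illustrates the data movement only, not the atomicity.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyMachine:
    """Hypothetical single-threaded model; the bus lock is not modeled."""

    mem: Dict[int, int] = field(default_factory=dict)  # address -> 64-bit value

    def cswapd(self, addr: int, expected: int, rd_pair: int) -> int:
        """Return the contents of rd-pair after CSWAPD.

        addr     : address taken from rs1
        expected : comparison value from rs2-pair/immediate
        rd_pair  : value held in rd-pair before the instruction
        """
        tmpval = self.mem.get(addr, 0)   # load double (bus would be locked here)
        if expected == tmpval:
            self.mem[addr] = rd_pair     # store double, unlock
            return tmpval                # rd-pair <- TMPVAL
        self.mem[addr] = tmpval          # redundant store, only needed to unlock
        return rd_pair                   # rd-pair keeps its previous value


if __name__ == "__main__":
    m = ToyMachine({0x100: 5})
    print(m.cswapd(0x100, expected=5, rd_pair=9), m.mem[0x100])
```

In this model a swap succeeded when the returned rd-pair value equals the expected value and differs from what rd-pair held before, which matches the remark that software can inspect rd-pair afterwards.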
Kai Nacke via llvm-dev
2020-Feb-26 08:36 UTC
[llvm-dev] Adding new vector instructions to LLVM Sparc backend
Hi Shivam,

have a look at this talk:
https://archive.fosdem.org/2015/schedule/event/llvm_internal_asm/
It shows how to add new instructions to LLVM.

Regards,
Kai

On 25.02.2020 19:18, shivam gupta via llvm-dev wrote:
> Hello all,
>
> As a major degree project, I have started working on adding vector
> instructions to the LLVM Sparc backend (modified for the AJIT processor).
>
> My work is to implement the VADDD, VSUBD, VUMULD and VSMULD instructions.
>
> Their instruction format is as follows:
>
>     31-30  op (always 10)
>     29-25  rd
>     24-19  op3
>     18-14  rs1
>     13     i (always 1)
>     12-10  (unused)
>     9-7    datatype (8 -> 001, 16 -> 010, 32 -> 100)
>     6-5    (always 10)
>     4-0    rs2
>
> https://llvm.org/docs/ExtendingLLVM.html suggests using a custom LLVM
> intrinsic to represent the VADDD operation. Is there any detailed example
> code for other architectures that I could look at?
>
> Do I need to define a new class in SparcInstrFormats.td
> <https://github.com/llvm-mirror/llvm/blob/master/lib/Target/Sparc/SparcInstrFormats.td#L106>
> because these instructions can't use the predefined format-3 class of the
> other arithmetic instructions (the 8-bit asi field is changed to specify
> the vector datatype)?
>
> Does the implementation of Sparc VIS
> <https://github.com/llvm/llvm-project/blob/master/llvm/lib/Target/Sparc/SparcInstrVIS.td>
> resemble these instructions?
>
> Could some LLVM backend experts give me an initial idea of what steps
> I should take to add these instructions?
>
> I have gone through the LLVM target-independent code generator documentation.
>
> The SPARC architecture manual and the AJIT processor ISA are attached to this mail.
> https://www.gaisler.com/doc/sparcv8.pdf
>
> Thanks and Regards,
> Shivam
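
Before writing any TableGen, it can help to have a reference encoder for the bit layout quoted above, so that the words produced by the assembler can be checked against it. The sketch below is not LLVM code and makes several assumptions: the helper names `encode_vector` and `vadd_lanes` are hypothetical, the op3 values are those of the SPARC V8 ADD, SUB, UMUL and SMUL instructions (the attached document derives the vector encodings from them), and per-lane wraparound on overflow is assumed because the document does not state it. The message lists the i bit as "always 1" while the attached document says Instr[13]=0, so `i` is left as a parameter.

```python
# op3 values of the SPARC V8 base instructions (ADD, SUB, UMUL, SMUL) that the
# AJIT document says the vector encodings are derived from.
OP3 = {"vaddd": 0b000000, "vsubd": 0b000100, "vumuld": 0b001010, "vsmuld": 0b001011}

# Instr[9:7] element-type field, as listed in the thread.
DTYPE = {8: 0b001, 16: 0b010, 32: 0b100}


def encode_vector(mnemonic: str, width: int, rd: int, rs1: int, rs2: int, i: int = 0) -> int:
    """Pack one vector instruction into a 32-bit word (hypothetical helper)."""
    for reg in (rd, rs1, rs2):
        if not 0 <= reg <= 30 or reg % 2:
            raise ValueError("register pairs must be even numbers in 0..30")
    if i not in (0, 1):
        raise ValueError("i must be 0 or 1")
    return (
        (0b10 << 30)              # op, always 10
        | (rd << 25)              # destination register pair
        | (OP3[mnemonic] << 19)   # op3 of the base instruction
        | (rs1 << 14)             # first source register pair
        | (i << 13)               # i bit (the two descriptions disagree on its value)
        | (DTYPE[width] << 7)     # element width selector
        | (0b10 << 5)             # Instr[6:5] = 2 marks the vector form
        | rs2                     # second source register pair
    )


def vadd_lanes(a: int, b: int, width: int) -> int:
    """Lane-wise add of two 64-bit values; wraparound per lane is an assumption."""
    mask = (1 << width) - 1
    out = 0
    for shift in range(0, 64, width):
        lane = ((a >> shift) + (b >> shift)) & mask  # carries never cross lanes
        out |= lane << shift
    return out


if __name__ == "__main__":
    print(hex(encode_vector("vaddd", 8, rd=2, rs1=4, rs2=6)))
    print(hex(vadd_lanes(0x00FF, 0x0001, 8)), hex(vadd_lanes(0x00FF, 0x0001, 16)))
```

Under these assumptions, `encode_vector("vaddd", 8, 2, 4, 6)` gives 0x840100C6 with i=0 and 0x840120C6 with i=1, which makes it easy to see which of the two descriptions a given toolchain follows.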