thr3ads.net - llvm dev - [llvm-dev] multiply-accumulate instruction [Sep 2015]

Chris.Dewhurst via llvm-dev

2015-Sep-18 14:19 UTC

[llvm-dev] multiply-accumulate instruction

I'm trying to define a multiply-accumulate instruction for the LEON
processor, a Subtarget of the Sparc target.

The documentation for the processor is as follows:

==
To accelerate DSP algorithms, two multiply&accumulate instructions are
implemented: UMAC and SMAC. The UMAC performs an unsigned 16-bit multiply,
producing a 32-bit result, and adds the result to a 40-bit accumulator made up
by the 8 lsb bits from the %y register and the %asr18 register. The least
significant 32 bits are also written to the destination register. SMAC works
similarly but performs signed multiply and accumulate. The MAC instructions
execute in one clock but have two clocks latency, meaning that one pipeline
stall cycle will be inserted if the following instruction uses the destination
register of the MAC as a source operand.

Assembler syntax:
    smac rs1, reg_imm, rd

Operation:
    prod[31:0] = rs1[15:0] * reg_imm[15:0]
    result[39:0] = (Y[7:0] & %asr18[31:0]) + prod[31:0]
    (Y[7:0] & %asr18[31:0]) = result[39:0]
    rd = result[31:0]

%asr18 can be read and written using the rdasr and wrasr instructions.
==
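
[Editor's sketch] The operation quoted above can be modelled in plain C++ to make the data flow between %y, %asr18 and rd explicit. This is only a software illustration, not LEON or LLVM code. It assumes that "&" in the operation listing means bit concatenation (as the prose "a 40-bit accumulator made up by the 8 lsb bits from the %y register and the %asr18 register" suggests), that the signed product is sign-extended before the 40-bit add, and that the upper bits of %y are left untouched. The names MacState, accumulate40, umac and smac are hypothetical.

```cpp
#include <cstdint>

// Hypothetical software model of the MAC state described above.
struct MacState {
    uint32_t y = 0;      // only Y[7:0] takes part in the accumulator
    uint32_t asr18 = 0;  // low 32 bits of the accumulator
};

// Adds a product (already widened to 64 bits) into the 40-bit accumulator
// Y[7:0]:%asr18 and returns the low 32 bits, i.e. the value written to rd.
inline uint32_t accumulate40(MacState &s, uint64_t widened_prod) {
    const uint64_t mask40 = (uint64_t{1} << 40) - 1;
    uint64_t acc = (uint64_t(s.y & 0xFFu) << 32) | s.asr18;
    acc = (acc + widened_prod) & mask40;
    // Assumption: bits of %y above bit 7 are left untouched.
    s.y = (s.y & ~uint32_t{0xFF}) | uint32_t(acc >> 32);
    s.asr18 = uint32_t(acc);
    return s.asr18;
}

// Models `umac rs1, rs2, rd`: unsigned 16 x 16 -> 32-bit product.
inline uint32_t umac(MacState &s, uint32_t rs1, uint32_t rs2) {
    uint32_t prod = (rs1 & 0xFFFFu) * (rs2 & 0xFFFFu);
    return accumulate40(s, prod);
}

// Models `smac rs1, rs2, rd`: signed 16 x 16 -> 32-bit product.
// Assumption: the product is sign-extended to 40 bits before the add.
inline uint32_t smac(MacState &s, uint32_t rs1, uint32_t rs2) {
    int32_t a = static_cast<int16_t>(rs1 & 0xFFFFu);
    int32_t b = static_cast<int16_t>(rs2 & 0xFFFFu);
    int64_t prod = int64_t(a) * b;
    return accumulate40(s, static_cast<uint64_t>(prod));
}
```

In this model every MAC both reads and writes the accumulator state, which is the dependency that the Defs/Uses lists in an instruction definition would have to express.
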
I have the following in SparcInstrInfo to define the lowering rules for this
instruction, but I feel that this isn't likely to work as I need to somehow
tie together the fact that %Y, %ASR18 and %rd are all related to each other in
the output.

let Predicates = [HasLeon3, HasLeon4], Defs = [Y, ASR18], Uses = [Y, ASR18] in
def SMACrr : F3_1<3, 0b111110,
                 (outs IntRegs:$rd),
                 (ins IntRegs:$rs1, IntRegs:$rs2, ASRRegs:$asr18),
                 "smac $rs1, $rs2, $rd",
                 [(set i32:$rd,
                     (add i32:$asr18, (mul i32:$rs1, i32:$rs2)))]>;

Perhaps a well-chosen "let Constraints=" might be used here? If so,
I'm not sure I know what to put in there. If not, can anyone help me how I
might define the lowering rules for this instruction please?

Chris Dewhurst, University of Limerick.

Hal Finkel via llvm-dev

2015-Sep-18 14:46 UTC

[llvm-dev] multiply-accumulate instruction

----- Original Message -----
> From: "Chris.Dewhurst via llvm-dev" <llvm-dev at lists.llvm.org>
> To: llvm-dev at lists.llvm.org
> Sent: Friday, September 18, 2015 9:19:48 AM
> Subject: [llvm-dev] multiply-accumulate instruction
> Perhaps a well-chosen “let Constraints=” might be used here? If so,
> I’m not sure I know what to put in there. If not, can anyone help me
> how I might define the lowering rules for this instruction please?

You don't need to encode that relationship if the values placed in Y and
ASR18 will be ignored. If you want to use those results, I suspect that
you'll need to manually select the instruction in *ISelDAGToDAG.cpp,
grabbing the result from the fixed registers by generating a glued CopyFromReg
node.

-Hal 
> Chris Dewhurst, University of Limerick.
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 

Chris.Dewhurst via llvm-dev

2015-Sep-18 14:51 UTC

[llvm-dev] multiply-accumulate instruction

The initial thought I had was that I could ignore the ASR18 and Y registers, but
the SMAC instruction is designed to be used in a loop and ASR18 (and Y) feed
back into the inputs each time too.

Does this imply that a hand-coded ISelDAGToDAG.cpp implementation is going to be
virtually required?

Any chance of some Pseudo-code? I haven’t had to write any ISelDAGToDAG code up
to now and any starter would be appreciated.


James Y Knight via llvm-dev

2015-Sep-18 15:39 UTC

[llvm-dev] multiply-accumulate instruction

Do you only want to define assembler syntax for this, or do you need to be
able to automatically emit it from some higher-level construct? I'd expect
the former to be entirely sufficient, in which case this should do:

let Predicates = [HasLeon3, HasLeon4], Defs = [Y, ASR18], Uses = [Y, ASR18] in
def SMACrr : F3_1<3, 0b111110,
                 (outs IntRegs:$rd), (ins IntRegs:$rs1, IntRegs:$rs2),
                 "smac $rs1, $rs2, $rd",
                 []>;


If you want the latter, I'm not sure how you'd go about pattern-matching
it, because of the unusual 40-bit accumulate input and output, and the
16-bit inputs, which are unusual for SPARC. Hopefully you don't really
need that. :)
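
[Editor's sketch] To illustrate why a plain 32-bit pattern does not describe this instruction, the small C++ comparison below contrasts what a DAG pattern of the form (add acc, (mul a, b)) on i32 values computes with what SMAC writes to rd according to the quoted documentation. It is a software illustration only. The function names are hypothetical, and smac_rd models just the low 32 bits of the result, ignoring the Y[7:0] part of the accumulator.

```cpp
#include <cstdint>

// What an i32 pattern (add acc, (mul a, b)) computes:
// a full 32-bit wrapping multiply followed by a 32-bit add.
inline uint32_t pattern_mul_add(uint32_t acc, uint32_t a, uint32_t b) {
    return acc + a * b;
}

// What SMAC writes to rd per the quoted documentation: only the low
// 16 bits of each operand are multiplied (as signed values), and the
// product is added to the low 32 bits of the accumulator.
inline uint32_t smac_rd(uint32_t acc_lo, uint32_t a, uint32_t b) {
    int32_t lhs = static_cast<int16_t>(a & 0xFFFFu);
    int32_t rhs = static_cast<int16_t>(b & 0xFFFFu);
    int32_t prod = lhs * rhs;  // magnitude at most 2^30, so no overflow
    return acc_lo + static_cast<uint32_t>(prod);
}
```

The two agree when both operands are already sign-extended 16-bit values, for example pattern_mul_add(0, 3, 4) and smac_rd(0, 3, 4) both give 12. They diverge once an operand needs more than 16 bits: with a = 0x10000 and b = 1 the pattern yields 0x10000 while the SMAC model yields 0. A selection pattern would therefore also have to prove that the operands fit in 16 bits.
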


Hal Finkel via llvm-dev

2015-Sep-18 15:46 UTC

[llvm-dev] multiply-accumulate instruction

----- Original Message -----
> From: "James Y Knight via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "Chris.Dewhurst" <Chris.Dewhurst at lero.ie>
> Cc: llvm-dev at lists.llvm.org
> Sent: Friday, September 18, 2015 10:39:20 AM
> Subject: Re: [llvm-dev] multiply-accumulate instruction
>
> Do you only want to define assembler syntax for this, or do you need
> to be able to automatically emit it from some higher
> level construct? I'd expect the former would be entirely sufficient,
> in which case this should be sufficient:
>
> let Predicates = [HasLeon3, HasLeon4], Defs = [Y, ASR18], Uses = [Y, ASR18] in
> def SMACrr : F3_1<3, 0b111110,
>     (outs IntRegs:$rd), (ins IntRegs:$rs1, IntRegs:$rs2),
>     "smac $rs1, $rs2, $rd",
>     []>;
>
> If you want the latter, I'm not sure how you'd go about being able to
> pattern-match it, because of the unusual 40 bit accumulate input and
> output, and the unusual for sparc 16-bit inputs. Hopefully you don't
> really need that. :)

To do that, you'll likely need a target-specific IR-level pass that runs in
the backend to recognize the desired pattern and transform it into a loop using
some target-specific intrinsics.

-Hal 

Chris.Dewhurst via llvm-dev

2015-Sep-21 07:43 UTC

[llvm-dev] multiply-accumulate instruction

I've been looking to see if there's a way to get the instruction below
(SMAC) emitted from a higher-level construct, but I'm starting to think this
is unrealistic.

To do so, I'd have to tie in two other instructions: first, one clearing the
ASR18 and Y registers somewhere near the start of the method; second, one
copying out the value of these registers somewhere near the end of the method,
or wherever the value needs to be used.

In addition, it would only make sense to use the construct inside a loop of some
form, otherwise, some variation on MUL would be better. That would either
require detecting the loop, or optimising further down the line to convert the
above construct *into* a simple MUL.
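
[Editor's sketch] The clear / accumulate-in-a-loop / read-out sequence described above can be pictured with a small C++ model of a 16-bit dot product. It is a software illustration, not generated code. The names Mac40, mac_clear, mac_smac, mac_read and dot16 are hypothetical, and treating the 40-bit accumulator as a signed value on readout is an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the 40-bit accumulator Y[7:0]:%asr18.
struct Mac40 { uint64_t acc = 0; };

// Models clearing %y and %asr18 before the loop.
inline void mac_clear(Mac40 &m) { m.acc = 0; }

// Models one SMAC: signed 16 x 16 product added into 40 bits.
// Returns the low 32 bits, i.e. what the instruction writes to rd.
inline uint32_t mac_smac(Mac40 &m, int16_t a, int16_t b) {
    int64_t prod = int64_t(a) * int64_t(b);
    m.acc = (m.acc + static_cast<uint64_t>(prod)) & ((uint64_t{1} << 40) - 1);
    return static_cast<uint32_t>(m.acc);
}

// Models copying the accumulator out after the loop.
// Assumption: the 40-bit value is interpreted as signed.
inline int64_t mac_read(const Mac40 &m) {
    if (m.acc & (uint64_t{1} << 39))
        return static_cast<int64_t>(m.acc) - (int64_t{1} << 40);
    return static_cast<int64_t>(m.acc);
}

// Dot product of two 16-bit vectors using the three steps above.
inline int64_t dot16(const int16_t *x, const int16_t *y, std::size_t n) {
    Mac40 m;
    mac_clear(m);
    for (std::size_t i = 0; i < n; ++i)
        mac_smac(m, x[i], y[i]);
    return mac_read(m);
}
```

With a cleared accumulator, a single mac_smac returns just the 16 x 16 product, which matches the observation that outside a loop some variation on MUL would do the same job. The extra 8 bits only matter once the running sum no longer fits in 32 bits.
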

This now feels to me to be unrealistic and likely to be prone to bugs.

On that basis, I'm going to go with the simple "assembler-only
support" recommended below, unless anyone can recommend a simple way of
achieving the above (and direct me to a suitable reference). I can't find
anything sufficiently similar in any of the other processors supported by LLVM.

Thanks for the feedback
Chris Dewhurst
University of Limerick.
________________________________
From: James Y Knight [jyknight at google.com]
Sent: 18 September 2015 16:39
To: Chris.Dewhurst
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] multiply-accumulate instruction

Do you only want to define assembler syntax for this, or do you need to be able
to automatically emit it from some higher level construct? I'd
expect the former would be entirely sufficient, in which case this should be
sufficient:
let Predicates = [HasLeon3, HasLeon4], Defs = [Y, ASR18], Uses = [Y, ASR18] in
def SMACrr :  F3_1<3, 0b111110,
                (outs IntRegs:$rd), (ins IntRegs:$rs1, IntRegs:$rs2),
                 "smac $rs1, $rs2, $rd",
                 []>;

If you want the latter, I'm not sure how you'd go about being able to
pattern-match it, because of the unusual 40 bit accumulate input and output, and
the unusual for sparc 16-bit inputs. Hopefully you don't really need that.
:)

