thr3ads.net - llvm dev - [llvm-dev] A question about AArch64 Cortex-A57 subtarget definition [May 2016]


Xing Su via llvm-dev

2016-May-13 00:13 UTC

[llvm-dev] A question about AArch64 Cortex-A57 subtarget definition

Hello everybody,

I'm reading the .td files that define the Cortex-A57 processor, which is a subtarget of the AArch64 target, and something in the `AArch64SchedA57.td` file is confusing me.

At the top of `AArch64SchedA57.td`, several processor resources are defined as follows:


```
def A57UnitB : ProcResource<1>;  // Type B micro-ops
def A57UnitI : ProcResource<2>;  // Type I micro-ops
def A57UnitM : ProcResource<1>;  // Type M micro-ops
def A57UnitL : ProcResource<1>;  // Type L micro-ops
def A57UnitS : ProcResource<1>;  // Type S micro-ops
def A57UnitX : ProcResource<1>;  // Type X micro-ops
def A57UnitW : ProcResource<1>;  // Type W micro-ops
let SchedModel = CortexA57Model in {
  def A57UnitV : ProcResGroup<[A57UnitX, A57UnitW]>;    // Type V micro-ops
}
```


According to the Cortex-A57 software optimization manual, the Cortex-A57 has 8 functional units in the backend:

- Branch (B)
- Integer 0 (I0)
- Integer 1 (I1)
- Integer Multi-Cycle (M)
- Load (L)
- Store (S)
- FP/ASIMD 0 (F0)
- FP/ASIMD 1 (F1)


So I think `A57UnitW` and `A57UnitX` should be the TableGen records defining pipelines F0 and F1, respectively. Together, `A57UnitW` and `A57UnitX` compose a `ProcResGroup`, `A57UnitV`, which can execute a 128-bit ASIMD floating-point operation, such as FMLA (Q-form), in a single clock cycle.


But consider lines 479-483 of `AArch64SchedA57.td`, shown below:


```
def A57WriteFPVMAD : SchedWriteRes<[A57UnitV]> { let Latency = 9;  }
def A57WriteFPVMAQ : SchedWriteRes<[A57UnitV, A57UnitV]> { let Latency = 10;  }
def A57ReadFPVMA5  : SchedReadAdvance<5, [A57WriteFPVMAD, A57WriteFPVMAQ]>;
def : InstRW<[A57WriteFPVMAD, A57ReadFPVMA5], (instregex "^FML[AS](v2f32|v1i32|v2i32|v1i64)")>;
def : InstRW<[A57WriteFPVMAQ, A57ReadFPVMA5], (instregex "^FML[AS](v4f32|v2f64|v4i32|v2i64)")>;
```


In this code, a 128-bit ASIMD FP multiply-accumulate (FMLA/FMLS Q-form) requires two `A57UnitV`s, meaning that two clock cycles are needed.

There must be something wrong with my understanding. Could anyone help me figure out the problem? Thanks a lot!




Xing


James Molloy via llvm-dev

2016-May-13 07:36 UTC



Hi Xing,

Most of what you said was correct, up until the end:
> In this code, a 128-bit ASIMD FP multiply-accumulate (FMLA/FMLS Q-form) requires two `A57UnitV`s, meaning that two clock cycles are needed.

The ProcResGroup is an "OR" relationship, not an "AND". It says that a V op can go to EITHER the W or the X pipe, not both. So a 128-bit FP op is modelled as having two V ops, which could be issued as [W, X] (simultaneously), [W, W] (requiring two cycles), or [X, X] (requiring two cycles).
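
To make the "OR" reading concrete, here is a small toy model in Python. It is only a sketch and not how LLVM's scheduler is implemented: the names `V_GROUP`, `cycles_needed` and `placements` are made up for illustration, and it assumes each pipe accepts one micro-op per cycle. It enumerates every way two V micro-ops can be placed on the W and X pipes and counts the issue cycles for each placement.

```python
from collections import Counter
from itertools import product

# Hypothetical stand-ins for the two pipes behind A57UnitV; not an LLVM API.
V_GROUP = ("W", "X")


def cycles_needed(assignment):
    """Issue cycles for one placement, assuming one micro-op per pipe per cycle."""
    return max(Counter(assignment).values())


def placements(num_uops, group=V_GROUP):
    """Map every possible pipe assignment of num_uops micro-ops to its issue cycles."""
    return {
        combo: cycles_needed(combo)
        for combo in product(group, repeat=num_uops)
    }


# A Q-form FMLA is modelled as two V micro-ops.
q_form = placements(2)
for combo, cycles in sorted(q_form.items()):
    print(combo, cycles)
```

Under these assumptions, the mixed placements (one micro-op on each pipe) take one cycle, while placing both micro-ops on the same pipe takes two, which matches the three cases listed above.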


Cheers,

James


Xing Su via llvm-dev

2016-May-13 10:16 UTC



OK, got it! Thanks!

Sent from my iPhone

