Xing Su via llvm-dev
2016-May-13 00:13 UTC
[llvm-dev] A question about AArch64 Cortex-A57 subtarget definition
Hello everybody,

I'm reading the .td files that define the Cortex-A57 processor, a subtarget of the AArch64 target, and something in the `AArch64SchedA57.td` file confuses me.

At the top of `AArch64SchedA57.td`, various processor resources are defined as follows:

```
def A57UnitB : ProcResource<1>; // Type B micro-ops
def A57UnitI : ProcResource<2>; // Type I micro-ops
def A57UnitM : ProcResource<1>; // Type M micro-ops
def A57UnitL : ProcResource<1>; // Type L micro-ops
def A57UnitS : ProcResource<1>; // Type S micro-ops
def A57UnitX : ProcResource<1>; // Type X micro-ops
def A57UnitW : ProcResource<1>; // Type W micro-ops
let SchedModel = CortexA57Model in {
  def A57UnitV : ProcResGroup<[A57UnitX, A57UnitW]>; // Type V micro-ops
}
```

According to the Cortex-A57 software optimization manual, Cortex-A57 has 8 functional units in the backend:

- Branch (B)
- Integer 0 (I0)
- Integer 1 (I1)
- Integer Multi-Cycle (M)
- Load (L)
- Store (S)
- FP/ASIMD 0 (F0)
- FP/ASIMD 1 (F1)

So I think `A57UnitW` and `A57UnitX` should be the TableGen records defining pipelines F0 and F1, respectively. `A57UnitW` and `A57UnitX` together compose a `ProcResGroup`, `A57UnitV`, which can execute a 128-bit ASIMD floating-point operation, such as FMLA (Q-form), in a single clock cycle.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

But lines 479-483 of `AArch64SchedA57.td` read as follows:

```
def A57WriteFPVMAD : SchedWriteRes<[A57UnitV]> { let Latency = 9; }
def A57WriteFPVMAQ : SchedWriteRes<[A57UnitV, A57UnitV]> { let Latency = 10; }
def A57ReadFPVMA5 : SchedReadAdvance<5, [A57WriteFPVMAD, A57WriteFPVMAQ]>;
def : InstRW<[A57WriteFPVMAD, A57ReadFPVMA5], (instregex "^FML[AS](v2f32|v1i32|v2i32|v1i64)")>;
def : InstRW<[A57WriteFPVMAQ, A57ReadFPVMA5], (instregex "^FML[AS](v4f32|v2f64|v4i32|v2i64)")>;
```

In this code, a 128-bit ASIMD FP multiply-accumulate (FMLA/FMLS Q-form) requires two `A57UnitV`s, meaning that two clock cycles are needed.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There must be something wrong with my understanding. Could anyone help me figure out the problem? Thanks a lot!

Xing
James Molloy via llvm-dev
2016-May-13 07:36 UTC
[llvm-dev] A question about AArch64 Cortex-A57 subtarget definition
Hi Xing,

Most of what you said was correct, up until the end!

> In this code, an 128bit ASIMD FP multiply accumulate(FMLA/FMLS Q-form) requires two `A57UnitV`s, meaning that two clock cycles are needed.

The ProcResGroup is an "OR" relationship, not an "AND". It says that a V op can go to EITHER the W or X pipes, not both. So a 128-bit FP op is modelled as having two V ops, which could either be [W, X] (simultaneously), [W, W] (requiring two cycles), or [X, X] (requiring two cycles).

Cheers,
James
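The three cases James lists can be checked with a small toy model. The sketch below is not how LLVM's machine scheduler is implemented. It only assumes that each pipe accepts one micro-op per cycle and enumerates the ways two V micro-ops can land on the W and X pipes. The names `PIPES`, `cycles_needed`, and `enumerate_assignments` are hypothetical and chosen for illustration.

```python
from itertools import product

# The two units behind the A57UnitV group in the quoted .td snippet.
PIPES = ("W", "X")


def cycles_needed(assignment):
    """Cycles to issue all micro-ops, assuming one micro-op per pipe per cycle."""
    return max(assignment.count(pipe) for pipe in PIPES)


def enumerate_assignments(num_uops):
    """Map each unordered pipe assignment to the number of cycles it needs."""
    results = {}
    for combo in product(PIPES, repeat=num_uops):
        key = tuple(sorted(combo))  # ("X", "W") and ("W", "X") are the same case
        results[key] = cycles_needed(key)
    return results


if __name__ == "__main__":
    # Two V micro-ops, as in SchedWriteRes<[A57UnitV, A57UnitV]>.
    for pipes, cycles in sorted(enumerate_assignments(2).items()):
        print(pipes, "->", cycles, "cycle(s)")
```

Under these assumptions, only the mixed assignment finishes in one cycle, while putting both micro-ops on the same pipe takes two, which matches the "OR" reading of the group.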
Xing Su via llvm-dev
2016-May-13 10:16 UTC
[llvm-dev] A question about AArch64 Cortex-A57 subtarget definition
OK, got it! Thanks!

Sent from my iPhone