thr3ads.net - similar to: "RFC: Adding Support For Vectorcall Calling Convention"

KNL Assembly Code for Matrix Multiplication

2017 Jul 01

2

KNL Assembly Code for Matrix Multiplication

Thank you. It means: vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15]. zmm22 will contain 64-bit constant values, which are the indexes themselves (8, 9, 10, 11, 12, 13, 14, 15), not the values loaded from those locations. zmm2 contains the constant 4000, so vpmuludq zmm14, zmm10, zmm2 multiplies the index values by 4000, since the stride for array b is 4000. zmm14=
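The index scaling described in this post can be modeled in scalar C. This is only a sketch: the function names are made up for illustration, and it assumes the multiplied register holds indexes like the [8..15] constant quoted above and that the second operand has 4000 in every lane. vpmuludq multiplies the low 32 bits of each 64-bit lane as unsigned values and keeps the full 64-bit product.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8  /* a 512-bit zmm register holds eight 64-bit lanes */

/* Scalar model of vpmuludq: in each 64-bit lane, multiply the low
 * 32 bits of both operands as unsigned values into a 64-bit product. */
void vpmuludq_model(uint64_t dst[LANES],
                    const uint64_t a[LANES],
                    const uint64_t b[LANES])
{
    assert(dst && a && b);
    for (int i = 0; i < LANES; i++) {
        dst[i] = (uint64_t)(uint32_t)a[i] * (uint64_t)(uint32_t)b[i];
    }
}

/* Hypothetical helper: put the same value in every lane, the way a
 * broadcast constant such as the stride 4000 would appear. */
void broadcast_model(uint64_t dst[LANES], uint64_t value)
{
    assert(dst);
    for (int i = 0; i < LANES; i++) {
        dst[i] = value;
    }
}
```

With indexes 8 through 15 and a stride of 4000, the lanes come out as 32000, 36000, and so on up to 60000: element offsets into b, not values loaded from b.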

[LLVMdev] ABI incompatability when passing vector parameters on 32-bit x86

2014 Dec 15

2

[LLVMdev] ABI incompatability when passing vector parameters on 32-bit x86

Hi all, Recently, Reid Kleckner found an ABI incompatibility between clang and GCC in the way vector parameters are passed on 32-bit x86. (This is documented in PR21510.) Specifically, GCC uses XMM0-XMM2 to pass the first 3 __m128 parameters, and the rest are passed on the stack. Clang passes an additional parameter by register, using XMM0-XMM3. The same applies to __m256 with YMM0-2 vs. YMM0-3.
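A deliberately simplified model of the mismatch, using only the register counts stated in the post (three for GCC, four for Clang). The function and constant names are illustrative, and the model ignores everything else that affects argument passing (other parameter types, calling-convention attributes, varargs).

```c
#include <assert.h>

/* Vector argument registers used on 32-bit x86, per the post:
 * GCC uses XMM0-XMM2, Clang uses XMM0-XMM3. */
enum { GCC_VECTOR_ARG_REGS = 3, CLANG_VECTOR_ARG_REGS = 4 };

/* Returns 1 if the vector parameter at zero-based position `index`
 * (counting vector parameters only) goes in a register, 0 if it
 * goes on the stack. */
int passed_in_register(int index, int reg_limit)
{
    assert(index >= 0 && reg_limit >= 0);
    return index < reg_limit;
}

/* Returns 1 when the two conventions place parameter `index`
 * in different locations. */
int conventions_disagree(int index)
{
    return passed_in_register(index, GCC_VECTOR_ARG_REGS)
        != passed_in_register(index, CLANG_VECTOR_ARG_REGS);
}
```

Under this model only the fourth vector parameter (index 3) differs: a Clang-compiled caller would place it in XMM3 while a GCC-compiled callee would look for it on the stack.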

[PATCH net V2 4/4] vhost: log dirty page correctly

2018 Dec 25

2

[PATCH net V2 4/4] vhost: log dirty page correctly

On Tue, Dec 25, 2018 at 05:43:25PM +0800, Jason Wang wrote: > > On 2018/12/25 ??1:41, Michael S. Tsirkin wrote: > > On Mon, Dec 24, 2018 at 11:43:31AM +0800, Jason Wang wrote: > > > On 2018/12/14 ??9:20, Michael S. Tsirkin wrote: > > > > On Fri, Dec 14, 2018 at 10:43:03AM +0800, Jason Wang wrote: > > > > > On 2018/12/13 ??10:31, Michael S. Tsirkin

[PATCH net V2 4/4] vhost: log dirty page correctly

2018 Dec 26

1

[PATCH net V2 4/4] vhost: log dirty page correctly

On Wed, Dec 26, 2018 at 01:43:26PM +0800, Jason Wang wrote: > > On 2018/12/26 ??12:25, Michael S. Tsirkin wrote: > > On Tue, Dec 25, 2018 at 05:43:25PM +0800, Jason Wang wrote: > > > On 2018/12/25 ??1:41, Michael S. Tsirkin wrote: > > > > On Mon, Dec 24, 2018 at 11:43:31AM +0800, Jason Wang wrote: > > > > > On 2018/12/14 ??9:20, Michael S. Tsirkin

[PATCH net V2 4/4] vhost: log dirty page correctly

2018 Dec 24

2

[PATCH net V2 4/4] vhost: log dirty page correctly

On Mon, Dec 24, 2018 at 11:43:31AM +0800, Jason Wang wrote: > > On 2018/12/14 ??9:20, Michael S. Tsirkin wrote: > > On Fri, Dec 14, 2018 at 10:43:03AM +0800, Jason Wang wrote: > > > On 2018/12/13 ??10:31, Michael S. Tsirkin wrote: > > > > > Just to make sure I understand this. It looks to me we should: > > > > > > > > > > - allow

[PATCH net V2 4/4] vhost: log dirty page correctly

2018 Dec 14

2

[PATCH net V2 4/4] vhost: log dirty page correctly

On Fri, Dec 14, 2018 at 10:43:03AM +0800, Jason Wang wrote: > > On 2018/12/13 ??10:31, Michael S. Tsirkin wrote: > > > Just to make sure I understand this. It looks to me we should: > > > > > > - allow passing GIOVA->GPA through UAPI > > > > > > - cache GIOVA->GPA somewhere but still use GIOVA->HVA in device IOTLB for > > >

[PATCH net V2 4/4] vhost: log dirty page correctly

2018 Dec 13

2

[PATCH net V2 4/4] vhost: log dirty page correctly

On Thu, Dec 13, 2018 at 10:39:41AM +0800, Jason Wang wrote: > > On 2018/12/12 ??10:32, Michael S. Tsirkin wrote: > > On Wed, Dec 12, 2018 at 06:08:19PM +0800, Jason Wang wrote: > > > Vhost dirty page logging API is designed to sync through GPA. But we > > > try to log GIOVA when device IOTLB is enabled. This is wrong and may > > > lead to missing data after

[RFC][VECLIB] how should we legalize VECLIB calls?

2018 Jun 29

2

[RFC][VECLIB] how should we legalize VECLIB calls?

Illustrative Example:

    clang -fveclib=SVML -O3 svml.c -mavx

    #include <math.h>
    void foo(double *a, int N){
      int i;
    #pragma clang loop vectorize_width(8)
      for (i=0;i<N;i++){
        a[i] = sin(i);
      }
    }

Currently, this results in a call to <8 x double> __svml_sin8(<8 x double>) after the vectorizer. This is 8-element SVML sin() called with 8-element argument. On the surface,

[PATCH net V2 4/4] vhost: log dirty page correctly

2018 Dec 12

2

[PATCH net V2 4/4] vhost: log dirty page correctly

On Wed, Dec 12, 2018 at 06:08:19PM +0800, Jason Wang wrote: > Vhost dirty page logging API is designed to sync through GPA. But we > try to log GIOVA when device IOTLB is enabled. This is wrong and may > lead to missing data after migration. > > To solve this issue, when logging with device IOTLB enabled, we will: > > 1) reuse the device IOTLB translation result of

[RFC][VECLIB] how should we legalize VECLIB calls?

2018 Jun 29

2

[RFC][VECLIB] how should we legalize VECLIB calls?

Ashutosh, Thanks for the reply. A related earlier discussion of this topic appears in the review of the SVML patch (@mmasten). Adding a few names from there. https://reviews.llvm.org/D19544 There, I see Hal's review comment "let's start only with the directly-legal calls". Apparently, what we have right now in the trunk is "not legal enough". I'll work on the patch to stop

[RFC][VECLIB] how should we legalize VECLIB calls?

2018 Jul 02

8

[RFC][VECLIB] how should we legalize VECLIB calls?

On 07/02/2018 04:33 PM, Saito, Hideki wrote: > > > > >It may not be a full solution for the problems you're trying to solve > > > > If we are inventing a new solution, I’d like it also to solve OpenMP > declare simd legalization issue. If a small extension of existing scheme > > works for mathlib only, I’m happy to take that and discuss OpenMP >

[RFC][VECLIB] how should we legalize VECLIB calls?

2018 Jul 02

2

[RFC][VECLIB] how should we legalize VECLIB calls?

Adding to Ashutosh's comments, We are also interested in making LLVM generate vector math library calls that are available with glibc (version > 2.22). reference: https://sourceware.org/glibc/wiki/libmvec Using the example case given in the reference, we found there are 2 vector versions for "sin" (4 X double) with same VF namely _ZGVcN4v_sin (avx) version and _ZGVdN4v_sin
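The two names quoted in this post follow the x86 vector function ABI mangling, `_ZGV<isa><mask><vlen><parameters>_<scalar name>`, in which the ISA letter is what separates the AVX variant (`c`) from the AVX2 variant (`d`) at the same vector length. A minimal sketch of a decoder follows; the struct and function names are hypothetical, and it assumes the parameter field contains no underscore.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

struct vec_variant {
    char isa;                 /* 'b', 'c', 'd' or 'e' */
    int masked;               /* 1 for 'M', 0 for 'N' */
    int vlen;                 /* number of lanes */
    const char *scalar_name;  /* points into the input string */
};

/* Parse a name of the form _ZGV<isa><mask><vlen><params>_<scalar name>.
 * Returns 0 on success, -1 if the name does not fit that shape. */
int parse_vector_name(const char *name, struct vec_variant *out)
{
    assert(name && out);
    if (strncmp(name, "_ZGV", 4) != 0) return -1;

    const char *p = name + 4;
    if (*p == '\0' || strchr("bcde", *p) == NULL) return -1;
    out->isa = *p++;

    if (*p != 'N' && *p != 'M') return -1;
    out->masked = (*p++ == 'M');

    if (!isdigit((unsigned char)*p)) return -1;
    char *end;
    out->vlen = (int)strtol(p, &end, 10);

    /* Skip the parameter letters; the scalar name follows the '_'. */
    const char *sep = strchr(end, '_');
    if (sep == NULL || sep[1] == '\0') return -1;
    out->scalar_name = sep + 1;
    return 0;
}

/* Map the ISA letter to a readable label. */
const char *isa_label(char isa)
{
    switch (isa) {
    case 'b': return "SSE";
    case 'c': return "AVX";
    case 'd': return "AVX2";
    case 'e': return "AVX-512";
    default:  return "unknown";
    }
}
```

Parsing `_ZGVcN4v_sin` and `_ZGVdN4v_sin` gives the same scalar name and the same vector length of 4; only the ISA letter differs, so a vectorization factor alone cannot select between them.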

AVX Scheduling and Parallelism

2017 Jun 24

4

AVX Scheduling and Parallelism

Hello, After generating AVX code for a large number of iterations, I came to realize that it still uses only 2 registers, zmm0 and zmm1, when the loop unroll factor is 1024. I wonder if this register allocation allows operations in parallel? Also, I know all the elements within a single vector instruction are computed in parallel, but are the elements of multiple instructions computed in parallel? like are

AVX Scheduling and Parallelism

2017 Jun 25

2

AVX Scheduling and Parallelism

Hi Ahmed, From what can be seen in the code snippet you provided, the reuse of XMM0 and XMM1 across loop-unroll instances does not inhibit instruction-level parallelism. Modern X86 processors use register renaming that can eliminate the dependencies in the instruction stream. In the example you provided, the processor should be able to identify the 2-vloads + vadd + vstore sequences as

AVX Scheduling and Parallelism

2017 Jun 25

0

AVX Scheduling and Parallelism

Hi, Zvi, I agree. In the context of targeting the KNL, however, I'm a bit concerned about the addressing, and specifically, the size of the resulting encoding: > vmovdqu32 zmm0, zmmword ptr [rax + c+401280] ;load b[401280] in > zmm0 > > vpaddd zmm1, zmm1, zmmword ptr [rax + b+401344] > ; zmm1<-zmm1+b[401344] The KNL can only
