similar to: RFC: Adding Support For Vectorcall Calling Convention

Displaying 20 results from an estimated 1000 matches similar to: "RFC: Adding Support For Vectorcall Calling Convention"

2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
Thank you. It means that vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15] loads 64-bit constants into zmm22. These are the indexes 8, 9, 10, 11, 12, 13, 14, 15 themselves, not the values loaded from those locations. zmm2 holds the constant 4000, so vpmuludq zmm14, zmm10, zmm2 multiplies the index values by 4000, because the stride for array b is 4000. zmm14=
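The index scaling described in this snippet can be modelled lane by lane in plain C. This is only a sketch: the name scale_indices and the LANES constant are made up for illustration, and the loop stands in for what a single vpmuludq does across eight 64-bit lanes (it multiplies the low 32 bits of each lane).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* eight 64-bit lanes in one zmm register */

/* Hypothetical helper: per-lane model of vpmuludq.
 * Each 64-bit lane multiplies the low 32 bits of its two inputs
 * and keeps the full 64-bit product. */
static void scale_indices(const uint64_t idx[LANES], uint64_t stride,
                          uint64_t out[LANES])
{
    assert(idx != NULL && out != NULL);
    for (size_t lane = 0; lane < LANES; ++lane) {
        uint64_t a = idx[lane] & 0xFFFFFFFFu; /* low 32 bits of the index */
        uint64_t b = stride & 0xFFFFFFFFu;    /* low 32 bits of the stride */
        out[lane] = a * b;
    }
}
```

With idx = {8, ..., 15} and a stride of 4000, the lanes hold the scaled element offsets 32000 through 60000, which is what the snippet means by multiplying the indexes by the stride of array b.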
2014 Dec 15
2
[LLVMdev] ABI incompatability when passing vector parameters on 32-bit x86
Hi all, Recently, Reid Kleckner found an ABI incompatibility between clang and GCC in the way vector parameters are passed on 32-bit x86. (This is documented in PR21510.) Specifically, GCC uses XMM0-XMM2 to pass the first 3 __m128 parameters, and the rest are passed on the stack. Clang passes an additional parameter by register, using XMM0-XMM3. The same applies to __m256 with YMM0-2 vs. YMM0-3.
2018 Dec 25
2
[PATCH net V2 4/4] vhost: log dirty page correctly
On Tue, Dec 25, 2018 at 05:43:25PM +0800, Jason Wang wrote: > > On 2018/12/25 ??1:41, Michael S. Tsirkin wrote: > > On Mon, Dec 24, 2018 at 11:43:31AM +0800, Jason Wang wrote: > > > On 2018/12/14 ??9:20, Michael S. Tsirkin wrote: > > > > On Fri, Dec 14, 2018 at 10:43:03AM +0800, Jason Wang wrote: > > > > > On 2018/12/13 ??10:31, Michael S. Tsirkin
2018 Dec 26
1
[PATCH net V2 4/4] vhost: log dirty page correctly
On Wed, Dec 26, 2018 at 01:43:26PM +0800, Jason Wang wrote: > > On 2018/12/26 ??12:25, Michael S. Tsirkin wrote: > > On Tue, Dec 25, 2018 at 05:43:25PM +0800, Jason Wang wrote: > > > On 2018/12/25 ??1:41, Michael S. Tsirkin wrote: > > > > On Mon, Dec 24, 2018 at 11:43:31AM +0800, Jason Wang wrote: > > > > > On 2018/12/14 ??9:20, Michael S. Tsirkin
2018 Dec 24
2
[PATCH net V2 4/4] vhost: log dirty page correctly
On Mon, Dec 24, 2018 at 11:43:31AM +0800, Jason Wang wrote: > > On 2018/12/14 ??9:20, Michael S. Tsirkin wrote: > > On Fri, Dec 14, 2018 at 10:43:03AM +0800, Jason Wang wrote: > > > On 2018/12/13 ??10:31, Michael S. Tsirkin wrote: > > > > > Just to make sure I understand this. It looks to me we should: > > > > > > > > > > - allow
2018 Dec 14
2
[PATCH net V2 4/4] vhost: log dirty page correctly
On Fri, Dec 14, 2018 at 10:43:03AM +0800, Jason Wang wrote:
>
> On 2018/12/13 ??10:31, Michael S. Tsirkin wrote:
> > > Just to make sure I understand this. It looks to me we should:
> > >
> > > - allow passing GIOVA->GPA through UAPI
> > >
> > > - cache GIOVA->GPA somewhere but still use GIOVA->HVA in device IOTLB for
> > >
2018 Dec 13
2
[PATCH net V2 4/4] vhost: log dirty page correctly
On Thu, Dec 13, 2018 at 10:39:41AM +0800, Jason Wang wrote:
>
> On 2018/12/12 ??10:32, Michael S. Tsirkin wrote:
> > On Wed, Dec 12, 2018 at 06:08:19PM +0800, Jason Wang wrote:
> > > Vhost dirty page logging API is designed to sync through GPA. But we
> > > try to log GIOVA when device IOTLB is enabled. This is wrong and may
> > > lead to missing data after
2018 Jun 29
2
[RFC][VECLIB] how should we legalize VECLIB calls?
Illustrative Example:

    clang -fveclib=SVML -O3 svml.c -mavx

    #include <math.h>
    void foo(double *a, int N){
      int i;
    #pragma clang loop vectorize_width(8)
      for (i=0;i<N;i++){
        a[i] = sin(i);
      }
    }

Currently, this results in a call to <8 x double> __svml_sin8(<8 x double>) after the vectorizer. This is 8-element SVML sin() called with 8-element argument. On the surface,
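For context, -mavx gives 256-bit registers, which hold four doubles, so an 8-element call does not map onto one register. One conceivable way to legalize such a call is to split it into two 4-element calls. The sketch below shows that idea in scalar C only; it is not the thread's proposal, and vec4_fn, halve4 and call_split8 are hypothetical names, with halve4 standing in for a real 4-wide math routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature for a 4-wide vector math routine. */
typedef void (*vec4_fn)(const double in[4], double out[4]);

/* Stand-in kernel so the sketch is self-contained; a real build
 * would call a 4-wide library routine here instead. */
static void halve4(const double in[4], double out[4])
{
    for (size_t i = 0; i < 4; ++i)
        out[i] = in[i] * 0.5;
}

/* Split one 8-element call into two 4-element calls. */
static void call_split8(vec4_fn fn, const double in[8], double out[8])
{
    assert(fn != NULL);
    fn(in, out);         /* lanes 0..3 */
    fn(in + 4, out + 4); /* lanes 4..7 */
}
```

Calling call_split8(halve4, in, out) applies the 4-wide kernel to the low half and then to the high half, so the caller still sees an 8-element result.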
2018 Dec 12
2
[PATCH net V2 4/4] vhost: log dirty page correctly
On Wed, Dec 12, 2018 at 06:08:19PM +0800, Jason Wang wrote:
> Vhost dirty page logging API is designed to sync through GPA. But we
> try to log GIOVA when device IOTLB is enabled. This is wrong and may
> lead to missing data after migration.
>
> To solve this issue, when logging with device IOTLB enabled, we will:
>
> 1) reuse the device IOTLB translation result of
2018 Jun 29
2
[RFC][VECLIB] how should we legalize VECLIB calls?
Ashutosh, Thanks for the reply. A related earlier discussion of this appears in the review of the SVML patch (@mmasten). Adding a few names from there. https://reviews.llvm.org/D19544 There, I see Hal's review comment "let's start only with the directly-legal calls". Apparently, what we have right now in the trunk is "not legal enough". I'll work on the patch to stop
2018 Jul 02
8
[RFC][VECLIB] how should we legalize VECLIB calls?
On 07/02/2018 04:33 PM, Saito, Hideki wrote:
> > It may not be a full solution for the problems you're trying to solve
>
> If we are inventing a new solution, I'd like it also to solve OpenMP
> declare simd legalization issue. If a small extension of existing scheme
> works for mathlib only, I'm happy to take that and discuss OpenMP
2018 Jul 02
2
[RFC][VECLIB] how should we legalize VECLIB calls?
Adding to Ashutosh's comments, We are also interested in making LLVM generate vector math library calls that are available with glibc (version > 2.22). reference: https://sourceware.org/glibc/wiki/libmvec Using the example case given in the reference, we found there are 2 vector versions for "sin" (4 X double) with same VF namely _ZGVcN4v_sin (avx) version and _ZGVdN4v_sin
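The two symbols quoted in this snippet follow the x86 vector function ABI naming scheme: _ZGV, then an ISA letter, a mask letter (N or M), the vector length, the parameter kinds, an underscore, and the scalar name. A small parser makes the difference between the two "sin" variants visible. This is a sketch for illustration; vec_variant and parse_vec_variant are hypothetical names, not part of glibc or LLVM.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char isa;                /* 'b' SSE, 'c' AVX, 'd' AVX2, 'e' AVX-512 */
    int masked;              /* 1 for 'M' (masked), 0 for 'N' (unmasked) */
    int vlen;                /* number of lanes, e.g. 4 */
    const char *scalar_name; /* points into the input string */
} vec_variant;

/* Hypothetical helper: split a name such as "_ZGVcN4v_sin" into fields.
 * Returns 0 on success and -1 if the name does not fit the scheme. */
static int parse_vec_variant(const char *sym, vec_variant *out)
{
    assert(sym != NULL && out != NULL);
    if (strncmp(sym, "_ZGV", 4) != 0)
        return -1;
    const char *p = sym + 4;
    if (*p == '\0')
        return -1;
    out->isa = *p++;
    if (*p != 'N' && *p != 'M')
        return -1;
    out->masked = (*p++ == 'M');
    if (!isdigit((unsigned char)*p))
        return -1;
    char *end;
    out->vlen = (int)strtol(p, &end, 10);
    /* Parameter letters (such as 'v') sit between the length and '_'. */
    const char *sep = strchr(end, '_');
    if (sep == NULL || sep[1] == '\0')
        return -1;
    out->scalar_name = sep + 1;
    return 0;
}
```

Parsing _ZGVcN4v_sin and _ZGVdN4v_sin yields the same vlen (4) and scalar name (sin) but different ISA letters, which is why the vectorization factor alone does not identify which variant to call.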
2017 Jun 24
4
AVX Scheduling and Parallelism
Hello, After generating AVX code for a large number of iterations, I came to realize that it still uses only 2 registers, zmm0 and zmm1, when the loop unroll factor is 1024. I wonder whether this register allocation allows operations to run in parallel? Also, I know all the elements within a single vector instruction are computed in parallel, but are the elements of multiple instructions computed in parallel? like are
2017 Jun 25
2
AVX Scheduling and Parallelism
Hi Ahmed, From what can be seen in the code snippet you provided, the reuse of XMM0 and XMM1 across loop-unroll instances does not inhibit instruction-level parallelism. Modern X86 processors use register renaming that can eliminate the dependencies in the instruction stream. In the example you provided, the processor should be able to identify the 2-vloads + vadd + vstore sequences as
2017 Jun 25
0
AVX Scheduling and Parallelism
Hi, Zvi, I agree. In the context of targeting the KNL, however, I'm a bit concerned about the addressing, and specifically, the size of the resulting encoding: > vmovdqu32 zmm0, zmmword ptr [rax + c+401280] ;load b[401280] in > zmm0 > > vpaddd zmm1, zmm1, zmmword ptr [rax + b+401344] > ; zmm1<-zmm1+b[401344] The KNL can only