Displaying 20 results from an estimated 1000 matches similar to: "RFC: Adding Support For Vectorcall Calling Convention"
2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
Thank you.
So it means that
vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15]
loads 64-bit constants into zmm22, and these are the indexes themselves
(8, 9, 10, 11, 12, 13, 14, 15), not the values loaded from
those locations. zmm2 contains the constant 4000, so
vpmuludq zmm14, zmm10, zmm2
multiplies the index values by 4000, since the stride for array b is 4000.
zmm14=
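As a rough scalar model of the multiply described in this message (a sketch, not the compiler's actual output): vpmuludq multiplies the low 32 bits of each 64-bit lane as unsigned integers and writes a 64-bit product per lane. The helper name scale_indexes and the constants LANES and STRIDE are made up for illustration; the index values and the stride of 4000 come from the message.

```c
#include <stdint.h>

/* Illustrative constants: 8 lanes of 64 bits in a zmm register, stride of array b. */
enum { LANES = 8, STRIDE = 4000 };

/* Scalar model of vpmuludq: for every 64-bit lane, take the low 32 bits
 * of both operands, multiply them as unsigned values, and keep the full
 * 64-bit product. */
static void scale_indexes(const uint64_t idx[LANES], uint64_t stride,
                          uint64_t out[LANES])
{
    for (int i = 0; i < LANES; i++) {
        uint32_t lo_idx = (uint32_t)idx[i];    /* low half of the index lane */
        uint32_t lo_stride = (uint32_t)stride; /* low half of the stride lane */
        out[i] = (uint64_t)lo_idx * (uint64_t)lo_stride;
    }
}
```

With the indexes 8 through 15 and a stride of 4000, the lanes come out as 32000, 36000, and so on up to 60000.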
2014 Dec 15
2
[LLVMdev] ABI incompatability when passing vector parameters on 32-bit x86
Hi all,
Recently, Reid Kleckner found an ABI incompatibility between clang and GCC in the way vector parameters are passed on 32-bit x86.
(This is documented in PR21510.)
Specifically, GCC uses XMM0-XMM2 to pass the first 3 __m128 parameters, and the rest are passed on the stack. Clang passes an additional parameter by register, using XMM0-XMM3. The same applies to __m256 with YMM0-2 vs. YMM0-3.
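A minimal function that would exercise the difference, assuming the GCC/Clang vector_size extension as a stand-in for __m128 (same 16-byte layout); the names v4f and sum4 are invented for this sketch. Per the description above, a 32-bit x86 build would pass the fourth argument on the stack with GCC and in XMM3 with Clang as it behaved at the time.

```c
/* 16-byte vector of four floats, laid out like __m128
 * (GCC/Clang vector extension; name chosen for this sketch). */
typedef float v4f __attribute__((vector_size(16)));

/* Four vector parameters: the fourth one, d, is the argument on which
 * the two compilers disagree according to the message above. */
v4f sum4(v4f a, v4f b, v4f c, v4f d)
{
    return a + b + c + d; /* element-wise addition */
}
```

Compiling a caller with one compiler and sum4 with the other (for example with -m32 -msse2) is the kind of mixed build where such a mismatch would show up as a wrong value for d.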
2018 Dec 25
2
[PATCH net V2 4/4] vhost: log dirty page correctly
On Tue, Dec 25, 2018 at 05:43:25PM +0800, Jason Wang wrote:
>
> On 2018/12/25 ??1:41, Michael S. Tsirkin wrote:
> > On Mon, Dec 24, 2018 at 11:43:31AM +0800, Jason Wang wrote:
> > > On 2018/12/14 ??9:20, Michael S. Tsirkin wrote:
> > > > On Fri, Dec 14, 2018 at 10:43:03AM +0800, Jason Wang wrote:
> > > > > On 2018/12/13 ??10:31, Michael S. Tsirkin
2018 Dec 26
1
[PATCH net V2 4/4] vhost: log dirty page correctly
On Wed, Dec 26, 2018 at 01:43:26PM +0800, Jason Wang wrote:
>
> On 2018/12/26 ??12:25, Michael S. Tsirkin wrote:
> > On Tue, Dec 25, 2018 at 05:43:25PM +0800, Jason Wang wrote:
> > > On 2018/12/25 ??1:41, Michael S. Tsirkin wrote:
> > > > On Mon, Dec 24, 2018 at 11:43:31AM +0800, Jason Wang wrote:
> > > > > On 2018/12/14 ??9:20, Michael S. Tsirkin
2018 Dec 24
2
[PATCH net V2 4/4] vhost: log dirty page correctly
On Mon, Dec 24, 2018 at 11:43:31AM +0800, Jason Wang wrote:
>
> On 2018/12/14 ??9:20, Michael S. Tsirkin wrote:
> > On Fri, Dec 14, 2018 at 10:43:03AM +0800, Jason Wang wrote:
> > > On 2018/12/13 ??10:31, Michael S. Tsirkin wrote:
> > > > > Just to make sure I understand this. It looks to me we should:
> > > > >
> > > > > - allow
2018 Dec 14
2
[PATCH net V2 4/4] vhost: log dirty page correctly
On Fri, Dec 14, 2018 at 10:43:03AM +0800, Jason Wang wrote:
>
> On 2018/12/13 ??10:31, Michael S. Tsirkin wrote:
> > > Just to make sure I understand this. It looks to me we should:
> > >
> > > - allow passing GIOVA->GPA through UAPI
> > >
> > > - cache GIOVA->GPA somewhere but still use GIOVA->HVA in device IOTLB for
> > >
2018 Dec 13
2
[PATCH net V2 4/4] vhost: log dirty page correctly
On Thu, Dec 13, 2018 at 10:39:41AM +0800, Jason Wang wrote:
>
> On 2018/12/12 ??10:32, Michael S. Tsirkin wrote:
> > On Wed, Dec 12, 2018 at 06:08:19PM +0800, Jason Wang wrote:
> > > Vhost dirty page logging API is designed to sync through GPA. But we
> > > try to log GIOVA when device IOTLB is enabled. This is wrong and may
> > > lead to missing data after
2018 Jun 29
2
[RFC][VECLIB] how should we legalize VECLIB calls?
Illustrative Example:
clang -fveclib=SVML -O3 svml.c -mavx
#include <math.h>
void foo(double *a, int N) {
  int i;
#pragma clang loop vectorize_width(8)
  for (i = 0; i < N; i++) {
    a[i] = sin(i);
  }
}
Currently, this results in a call to <8 x double> __svml_sin8(<8 x double>) after the vectorizer.
This is the 8-element SVML sin(), called with an 8-element argument. On the surface,
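One conceivable way to handle such a call when only 4-element double vectors are directly legal (with -mavx, a 256-bit register holds four doubles) is to split the 8-element request into two 4-element calls. The sketch below only illustrates that idea and is not necessarily what the thread proposes; kernel4_fn, double4 and call8_via_two_call4 are hypothetical names, and double4 is a trivial stand-in for a real 4-lane library routine.

```c
#include <stddef.h>

enum { LEGAL_WIDTH = 4, REQUESTED_WIDTH = 8 };

/* Hypothetical signature of a directly legal 4-lane library entry point. */
typedef void (*kernel4_fn)(const double in[LEGAL_WIDTH],
                           double out[LEGAL_WIDTH]);

/* Stand-in 4-lane kernel (doubles each lane) so the sketch does not
 * depend on any real vector math library. */
static void double4(const double in[LEGAL_WIDTH], double out[LEGAL_WIDTH])
{
    for (size_t i = 0; i < LEGAL_WIDTH; i++)
        out[i] = 2.0 * in[i];
}

/* Serve one 8-lane request with two 4-lane calls:
 * first the low half, then the high half. */
static void call8_via_two_call4(kernel4_fn f4,
                                const double in[REQUESTED_WIDTH],
                                double out[REQUESTED_WIDTH])
{
    f4(in, out);
    f4(in + LEGAL_WIDTH, out + LEGAL_WIDTH);
}
```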
2018 Dec 12
2
[PATCH net V2 4/4] vhost: log dirty page correctly
On Wed, Dec 12, 2018 at 06:08:19PM +0800, Jason Wang wrote:
> Vhost dirty page logging API is designed to sync through GPA. But we
> try to log GIOVA when device IOTLB is enabled. This is wrong and may
> lead to missing data after migration.
>
> To solve this issue, when logging with device IOTLB enabled, we will:
>
> 1) reuse the device IOTLB translation result of
2018 Jun 29
2
[RFC][VECLIB] how should we legalize VECLIB calls?
Ashutosh,
Thanks for the reply.
A related earlier discussion appears in the review of the SVML patch (@mmasten). Adding a few names from there.
https://reviews.llvm.org/D19544
There, I see Hal's review comment "let's start only with the directly-legal calls". Apparently, what we have right now
in the trunk is "not legal enough". I'll work on the patch to stop
2018 Jul 02
8
[RFC][VECLIB] how should we legalize VECLIB calls?
On 07/02/2018 04:33 PM, Saito, Hideki wrote:
>
>
>
> >It may not be a full solution for the problems you're trying to solve
>
>
>
> If we are inventing a new solution, I’d like it also to solve OpenMP
> declare simd legalization issue. If a small extension of existing scheme
>
> works for mathlib only, I’m happy to take that and discuss OpenMP
>
2018 Jul 02
2
[RFC][VECLIB] how should we legalize VECLIB calls?
Adding to Ashutosh's comments, we are also interested in making LLVM
generate vector math library calls that are available with glibc (version >
2.22).
reference: https://sourceware.org/glibc/wiki/libmvec
Using the example case given in the reference, we found there are 2 vector
versions for "sin" (4 X double) with same VF namely _ZGVcN4v_sin (avx)
version and _ZGVdN4v_sin
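The two names follow the x86 vector function ABI mangling: the prefix _ZGV, an ISA letter (b for SSE, c for AVX, d for AVX2, e for AVX-512), N or M for unmasked or masked, the vector length, one letter per parameter kind (v for a vector argument), an underscore, and the scalar name. A small sketch that assembles such names; vector_variant_name is a hypothetical helper, not a glibc or LLVM function.

```c
#include <stdio.h>

/* Build "_ZGV<isa><mask><vlen><params>_<scalar_name>" into buf.
 * Returns what snprintf returns: the length the full name needs. */
int vector_variant_name(char *buf, size_t n, char isa, int masked,
                        unsigned vlen, const char *params,
                        const char *scalar_name)
{
    return snprintf(buf, n, "_ZGV%c%c%u%s_%s",
                    isa, masked ? 'M' : 'N', vlen, params, scalar_name);
}
```

Calling it with isa 'c' or 'd', masked 0, vlen 4, params "v" and scalar_name "sin" reproduces the two names quoted in the message, which differ only in the ISA letter although the vectorization factor is the same.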
2017 Jun 24
4
AVX Scheduling and Parallelism
Hello,
After generating AVX code for a large number of iterations, I came to realize that
it still uses only 2 registers, zmm0 and zmm1, when the loop unroll
factor=1024.
I wonder if this register allocation allows operations in parallel?
Also, I know all the elements within a single vector instruction are
computed in parallel, but are the elements of multiple instructions
computed in parallel? Like, are
2017 Jun 25
2
AVX Scheduling and Parallelism
Hi Ahmed,
From what can be seen in the code snippet you provided, the reuse of XMM0 and XMM1 across loop-unroll instances does not inhibit instruction-level parallelism.
Modern X86 processors use register renaming that can eliminate the dependencies in the instruction stream. In the example you provided, the processor should be able to identify the 2-vloads + vadd + vstore sequences as
2017 Jun 25
0
AVX Scheduling and Parallelism
Hi, Zvi,
I agree. In the context of targeting the KNL, however, I'm a bit
concerned about the addressing, and specifically, the size of the
resulting encoding:
> vmovdqu32 zmm0, zmmword ptr [rax + c+401280] ;load b[401280] in
> zmm0
>
> vpaddd zmm1, zmm1, zmmword ptr [rax + b+401344]
> ; zmm1<-zmm1+b[401344]
The KNL can only