similar to: error of using GATHER intrinsic

Displaying 20 results from an estimated 900 matches similar to: "error of using GATHER intrinsic"

2016 Jan 20
3
error of using GATHER intrinsic
> On Jan 20, 2016, at 12:59 PM, Tim Northover via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > Hi Zhi, > > On 18 January 2016 at 11:28, zhi chen via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >> Any idea about this error? Or could anyone give me an example how to use the >> gather intrinsic if there is something wrong with the way I am using it?
2016 Jan 20
2
error of using GATHER intrinsic
Hi Tim, Thanks for your response. The attached is the .bc file after my pass. I could generate the assembly with -mcpu=skx but not with -mcpu=core-avx2. Could you please take a look? BTW, I am using LLVM-3.7. Best, Zhi On Wed, Jan 20, 2016 at 1:21 PM, Tim Northover <t.p.northover at gmail.com> wrote: > > Only typo that caught my eye is ‘llvm.masked.gather.v8f64’ which should >
2016 Jan 20
2
error of using GATHER intrinsic
Got it. Thanks. I will try it with the trunk version. On Wed, Jan 20, 2016 at 1:36 PM, Tim Northover <t.p.northover at gmail.com> wrote: > Hi Zhi, > On 20 January 2016 at 13:33, zhi chen <zchenhn at gmail.com> wrote: > > Thanks for your response. The attached is the .bc file after my pass. I > > could generate the assembly with -mcpu=skx but not with
2016 Jan 23
3
how to force llvm generate gather intrinsic
Thanks for your response, Sanjay. I know there are intrinsics available in C/C++. But the problem is that I want to instrument my code at the IR level and generate those instructions. I don't want to touch the source code. Best, Zhi On Fri, Jan 22, 2016 at 4:54 PM, Sanjay Patel <spatel at rotateright.com> wrote: > I was just looking at the related masked load/store operations, and
2016 Jan 23
2
how to force llvm generate gather intrinsic
Hi, I used clang -O3 -c -emit-llvm on the follow code to generate a bitcode, say a.bc. I read the .ll file and didn't see any gather intrinsic. Also, I used opt -O3 -mcpu=core-avx2/-mcpu=skx, but there is still no gather intrinsic generated. int foo(int A[800], int B[800], int C[800]) { for (int i = 0; i < 800; i++) { A[B[i]] = i + 5; } for (int i = 0; i < 800;
2016 Feb 25
2
how to force llvm generate gather intrinsic
It seems that http://reviews.llvm.org/D15690 only implemented gather/scatter for AVX-512, but not for AVX/AVX2. Is there any plan to enable gather for AVX/2? Thanks. Best, Zhi On Thu, Feb 25, 2016 at 8:28 AM, Sanjay Patel <spatel at rotateright.com> wrote: > I don't think gather has been enabled for AVX2 as of r261875. > Masked load/store were enabled for AVX with: >
2016 Feb 25
2
how to force llvm generate gather intrinsic
Yes, masked load/store/gather/scatter are completed. - Elena From: zhi chen [mailto:zchenhn at gmail.com] Sent: Thursday, February 25, 2016 01:20 To: Demikhovsky, Elena <elena.demikhovsky at intel.com> Cc: Sanjay Patel <spatel at rotateright.com>; Nema, Ashutosh <Ashutosh.Nema at amd.com>; llvm-dev <llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] how to
2016 Feb 26
2
how to force llvm generate gather intrinsic
If I'm understanding correctly, you're saying that vgather* is slow on all of Excavator, Haswell, Broadwell, and Skylake (client). Therefore, we will not generate it for any of those machines. Even if that's true, we should not define "gatherIsSlow()" as "hasAVX2() && !hasAVX512()". It could break for some hypothetical future processor that manages to
2016 Feb 25
0
how to force llvm generate gather intrinsic
I don't think gather has been enabled for AVX2 as of r261875. Masked load/store were enabled for AVX with: http://reviews.llvm.org/D16528 / http://reviews.llvm.org/rL258675 On Wed, Feb 24, 2016 at 11:39 PM, Demikhovsky, Elena < elena.demikhovsky at intel.com> wrote: > Yes, masked load/store/gather/scatter are completed. > > > > - * Elena* > > > >
2016 Feb 26
0
how to force llvm generate gather intrinsic
That makes great sense. It would be great if we have profitability mode to see the necessity to use gathers. Or it also would be good if there is a compiler option for the users to enable LLVM to generate the gather instructions no matter it is faster or slow. Best, Zhi On Fri, Feb 26, 2016 at 12:49 PM, Sanjay Patel <spatel at rotateright.com> wrote: > If I'm understanding
2016 Feb 26
0
how to force llvm generate gather intrinsic
No. Gather operation is slow on AVX2 processors. - Elena From: zhi chen [mailto:zchenhn at gmail.com] Sent: Thursday, February 25, 2016 20:48 To: Sanjay Patel <spatel at rotateright.com> Cc: Demikhovsky, Elena <elena.demikhovsky at intel.com>; Nema, Ashutosh <Ashutosh.Nema at amd.com>; llvm-dev <llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] how to force
2016 Feb 24
0
how to force llvm generate gather intrinsic
Hi Elena, Are the masked_load and gather working now? Best, Zhi On Sat, Jan 23, 2016 at 12:06 PM, Demikhovsky, Elena < elena.demikhovsky at intel.com> wrote: > Ø Can we legalize the same set of masked load/store operations for AVX1 > as AVX2? > > Yes, of course. > > > > - * Elena* > > > > *From:* Sanjay Patel [mailto:spatel at
2017 May 05
2
load instruction to gather intrinsics
Hi All, Can I change a vector load to gather intrinsic? If so, how can I do it? For example, I want to change the following IR code %1 = load <2 x i64>* %arrayidx1, align 8 to %1 = call <2 x i64> @llvm.masked.gather.v2i64(<2 x i64*> %arrayidx1, i32 8, <2 x i1> <i1 true, i1 true>, <2 x i64> undef) Basically, I am not sure how to get two consecutive
2016 Jan 23
2
how to force llvm generate gather intrinsic
Ø Can we legalize the same set of masked load/store operations for AVX1 as AVX2? Yes, of course. - Elena From: Sanjay Patel [mailto:spatel at rotateright.com] Sent: Saturday, January 23, 2016 18:42 To: Nema, Ashutosh <Ashutosh.Nema at amd.com> Cc: Demikhovsky, Elena <elena.demikhovsky at intel.com>; zhi chen <zchenhn at gmail.com>; llvm-dev <llvm-dev at
2017 May 05
2
load instruction to gather intrinsics
The frontend would generate the load in the IR. I am using IRBuilder to generate gather. I know it is mainly for discontinuous memory locations. It's a long story why I want to use this. I want to gather some memory locations. Suppose there are an array A, I manually duplicated it somewhere with an offset x. Now, we have two arrays A and A', where A'[i] - A[i] = offset. I want to
2016 Jan 23
2
how to force llvm generate gather intrinsic
Thanks Sanjay for highlighting this, few days back I also faced similar problem while generating masked store in avx1 mode, found its only supported under avx2 else we scalarize it. > 1) I did not switch-on masked_load/store to AVX1, I can do this. Yes Elena, This should be supported for FP type in avx1 mode (for INT type, I doubt X86 has masked_load/store instruction in avx1 mode).
2014 Sep 18
2
[LLVMdev] troubles with ISD::FPOWI
Hi, I'm stumped by how to handle fpowi. Here is the context: my architecture has i64, f32, and f64 registers. No i32. For calls & returns, we promote i32 to i64. There is no support in the architecture to perform fpowi - it has to go through the runtime. I'm using gfortran + dragonegg + llvm3.4 to generate .ll files via plugin. The fortran expression REAL = REAL ** INTEGER*4
2012 Jul 01
7
btrfs_print_tree?
HI, Do anyone know where btrfs_print_tree is invoked? thanks. -- Regards, Zhi Yong Wu -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2015 Nov 19
2
Get timestamp and processor ID in the IR
Hi Hal, Thanks for the pointer. Is it possible to get the processor ID on X86 architecture? There is a library call in linux, sched_getcpu(), to the ID. Also, is it possible to get the program counter in the IR? Best, Zhi On Thu, Nov 19, 2015 at 1:33 PM, Hal Finkel <hfinkel at anl.gov> wrote: > Hi Zhi, > > There is no standard (architecture-independent) way to get the processor
2015 May 14
4
[LLVMdev] how to disable sse and avx
Thanks, Mats. Actually, it is able to generate the assembly now if I use the follow command: clang++ -O3 -S -mllvm --x86-asm-syntax=intel -mno-sse -o test_nosee.s test.cpp However, when I use g++ -O3 -o test_nosse test_nosse.s -lm to generate the executable, if gives me the following errors: Error: too many memory references for `sub' Error: too many memory references for `mov' Error: