search for: _mm_

Displaying 10 results from an estimated 10 matches for "_mm_".

2009 Jan 31
0
[LLVMdev] Optimized code analysis problems
...y to extract the names > through any of the passes. > Where could I potentially insert a hack so that any function call to > intrinsic functions or library functions can be retrieved? > Could you gimme any ideas for the start? Basically, there is no mapping from the llvm.* names to the _mm_* names; the transformation is lossy. You have a couple options here: one is to manipulate the source to let you see the _mm_ names, and the other is to catch the _mm_ names before the inliner runs. Manipulating the source isn't actually very hard, although it's a non-trivial amount of wor...
2009 Jan 31
2
[LLVMdev] Optimized code analysis problems
...pun On Fri, Jan 30, 2009 at 10:39 PM, Eli Friedman <eli.friedman at gmail.com> wrote: > On Fri, Jan 30, 2009 at 7:10 PM, Nipun Arora <nipun2512 at gmail.com> wrote: > > Essentially I would like to extract the control flow graph representation > > with function names (eg. _mm_cvtsi32_si128) instead of the functions > being > > replaced by 'llvm.*' > > Is there anyway to extract these names directly as function calls? > > The names disappear in an unrecoverable way once the first inlining > pass runs to take care of always_inline. You migh...
2009 Feb 01
1
[LLVMdev] Optimized code analysis problems
...> through any of the passes. > > Where could I potentially insert a hack so that any function call to > > intrinsic functions or library functions can be retrieved? > > Could you gimme any ideas for the start? > > Basically, there is no mapping from the llvm.* names to the _mm_* > names; the transformation is lossy. > > You have a couple options here: one is to manipulate the source to let > you see the _mm_ names, and the other is to catch the _mm_ names > before the inliner runs. > > Manipulating the source isn't actually very hard, although it...
2016 Jan 23
3
how to force llvm generate gather intrinsic
...> bit vectors with AVX2 (not just AVX512). > > I looked at this for the first time today, so I may be missing something... > > So for the moment, the answer to your question is 'no'; there's no generic > way to produce these instructions. You should be able to use the _mm_* > intrinsics in C though. > > > > > On Fri, Jan 22, 2016 at 5:00 PM, zhi chen via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> Hi, >> >> I used clang -O3 -c -emit-llvm on the follow code to generate a bitcode, >> say a.bc. I read the .l...
2016 Jan 23
2
how to force llvm generate gather intrinsic
Hi, I used clang -O3 -c -emit-llvm on the follow code to generate a bitcode, say a.bc. I read the .ll file and didn't see any gather intrinsic. Also, I used opt -O3 -mcpu=core-avx2/-mcpu=skx, but there is still no gather intrinsic generated. int foo(int A[800], int B[800], int C[800]) { for (int i = 0; i < 800; i++) { A[B[i]] = i + 5; } for (int i = 0; i < 800;
2016 Jan 14
2
RFC: non-temporal fencing in LLVM IR
...mfence). Other languages have private stacks where this isn't > > an issue, and where the stack top can reasonably be assumed to be > > in > > cache. > > > How will this affect non-user-mode code (i.e. kernel code)? > > > Kernel code still has to ask for _mm_mfence if it wants mfence: > > C11 > > and C++11 barriers aren't specified as a specific instruction. > > > Is it safe to access top-of-stack? > > > AFAIK yes, and the ABI-specified red zone has our back (or front if > > the stack grows up ☻). > >...
2016 Jan 23
2
how to force llvm generate gather intrinsic
...) should be legal for 128/256 bit vectors with AVX2 (not just AVX512). I looked at this for the first time today, so I may be missing something... So for the moment, the answer to your question is 'no'; there's no generic way to produce these instructions. You should be able to use the _mm_* intrinsics in C though. On Fri, Jan 22, 2016 at 5:00 PM, zhi chen via llvm-dev <llvm-dev at lists.llvm.org> wrote: Hi, I used clang -O3 -c -emit-llvm on the follow code to generate a bitcode, say a.bc. I read the .ll file and didn't see any gat...
2016 Jan 13
4
RFC: non-temporal fencing in LLVM IR
...r a locked top-of-stack idempotent operation is better than mfence). Other languages have private stacks where this isn't an issue, and where the stack top can reasonably be assumed to be in cache. *How will this affect non-user-mode code (i.e. kernel code)?* Kernel code still has to ask for _mm_mfence if it wants mfence: C11 and C++11 barriers aren't specified as a specific instruction. *Is it safe to access top-of-stack?* AFAIK yes, and the ABI-specified red zone has our back (or front if the stack grows up ☻). *What about non-x86 architectures?* Architectures such as ARMv8 supp...
2019 Dec 21
13
[PATCH 0/8] Convert the intel iommu driver to the dma-iommu api
This patchset converts the intel iommu driver to the dma-iommu api. While converting the driver I exposed a bug in the intel i915 driver which causes a huge amount of artifacts on the screen of my laptop. You can see a picture of it here: https://github.com/pippy360/kernelPatches/blob/master/IMG_20191219_225922.jpg This issue is most likely in the i915 driver and is most likely caused by the