similar to: [X86][AVX512] RFC: make i1 illegal in the Codegen


2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
Thank you. So vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15] means that zmm22 holds 64-bit constant values, which are indexes here (zmm22 = 8, 9, 10, 11, 12, 13, 14, 15), not values loaded from those locations. zmm2 contains the constant 4000, so vpmuludq zmm14, zmm10, zmm2 multiplies the index values by 4000, since the stride for array b is 4000. zmm14=
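To make the index arithmetic concrete, here is a scalar C model of what vpmuludq does in each of its eight 64-bit lanes. It multiplies the unsigned low 32 bits of each source lane and keeps the full 64-bit product. The sketch makes two assumptions. First, zmm10 holds index values like the [8..15] constant in zmm22. Second, zmm2 holds 4000 broadcast into every lane; the snippet only says zmm2 "contains constant 4000". The name vpmuludq_model is hypothetical.

```c
#include <stdint.h>

/* Scalar model of: vpmuludq dst, a, b  (512-bit form, 8 qword lanes).
   Per lane: zero-extend the low 32 bits of a and b, multiply, and keep
   the full 64-bit unsigned product. The upper 32 bits of each input
   lane are ignored. */
static void vpmuludq_model(const uint64_t a[8], const uint64_t b[8],
                           uint64_t out[8]) {
    for (int lane = 0; lane < 8; lane++) {
        uint64_t lo_a = a[lane] & 0xFFFFFFFFu;
        uint64_t lo_b = b[lane] & 0xFFFFFFFFu;
        out[lane] = lo_a * lo_b;
    }
}
```

With indexes 8..15 and a stride of 4000 in every lane, the lanes come out as 32000, 36000, ..., 60000. These are the scaled offsets into array b.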
2018 Apr 12
3
[RFC] __builtin_constant_p() Improvements
Hello again! I took a stab at PR4898[1]. The attached patch improves Clang's __builtin_constant_p support so that the Linux kernel is happy. With this improvement, Clang can determine whether __builtin_constant_p is true or false after inlining. As an example:

static __attribute__((always_inline)) int foo(int x) {
  if (__builtin_constant_p(x))
    return 1;
  return 0;
}
static
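The pattern the patch targets puts __builtin_constant_p on a parameter of an always_inline helper. The builtin can only fold to 1 after the call has been inlined with a constant argument. Here is a minimal sketch of that pattern, assuming GCC or Clang. The names ilog2_sketch and ilog2_runtime are hypothetical and do not come from the thread or the kernel. Both branches compute the same value, so the program is correct whichever way the builtin folds.

```c
#include <limits.h>

/* Runtime fallback: index of the highest set bit, or -1 for 0. */
static int ilog2_runtime(unsigned v) {
    int r = -1;
    while (v) {
        v >>= 1;
        r++;
    }
    return r;
}

/* Hypothetical helper: when the optimizer proves v is a compile-time
   constant (typically after inlining), the first branch can fold to a
   constant. Otherwise the loop runs. Both branches must agree, because
   whether __builtin_constant_p(v) yields 1 depends on optimization. */
static inline __attribute__((always_inline)) int ilog2_sketch(unsigned v) {
    if (__builtin_constant_p(v))
        return v ? (int)(sizeof(unsigned) * CHAR_BIT) - 1 - __builtin_clz(v)
                 : -1;
    return ilog2_runtime(v);
}
```

At -O0 the builtin typically evaluates to 0 and the loop runs. With optimization and a literal argument, the first branch can fold away entirely.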
2018 Apr 13
0
[RFC] __builtin_constant_p() Improvements
I was actually also working on an updated patch for the LLVM side of this. :) I was just working on some test cases; I'll post it soon. It's somewhat different from yours. I haven't touched the Clang side yet, but I think it needs to be more complex than what you have there. I think it actually needs to be able to evaluate the intrinsic as a constant _false_ in the front-end in some
2015 Aug 31
2
[RFC] New pass: LoopExitValues
Hello LLVM, This is a proposal for a new pass that improves performance and code size in some nested-loop situations. The pass is target independent. From the description in the file header: This optimization finds loop exit values that are re-evaluated after the loop executes and replaces them with the corresponding exit values if they are available. Such sequences can arise after the
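As a source-level analogy only (the pass works on LLVM IR, and its actual matching rules are not in the snippet), here is a C sketch of the kind of redundancy described. After the inner loop, the next row's address is recomputed even though the inner pointer already ends there. Both function names are hypothetical.

```c
#include <stddef.h>

/* Before: after the inner loop, the start of the next row is
   recomputed from i and n (a re-evaluated exit value). */
long sum_rows_recompute(const int *m, size_t rows, size_t n) {
    long total = 0;
    const int *row = m;
    for (size_t i = 0; i < rows; i++) {
        const int *p = row;
        for (size_t j = 0; j < n; j++)
            total += *p++;
        row = m + (i + 1) * n;  /* same value p already holds */
    }
    return total;
}

/* After (the idea of the transformation): reuse the value the inner
   loop's pointer has when the loop exits, which equals m + (i + 1) * n. */
long sum_rows_reuse(const int *m, size_t rows, size_t n) {
    long total = 0;
    const int *row = m;
    for (size_t i = 0; i < rows; i++) {
        const int *p = row;
        for (size_t j = 0; j < n; j++)
            total += *p++;
        row = p;                /* exit value of p */
    }
    return total;
}
```

The second form drops a multiply-add per outer iteration. This is plausibly the sort of saving behind the matrix-heavy Coremark gains reported later in the thread, though the snippet does not show the generated code.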
2015 Sep 01
2
[RFC] New pass: LoopExitValues
On Mon, Aug 31, 2015 at 5:52 PM, Jake VanAdrighem <jvanadrighem at gmail.com> wrote: > Do you have some specific performance measurements? Averaging 4 runs of 10000 iterations each of Coremark on my X86_64 desktop showed: -O2 performance: +2.9% faster with the L.E.V. pass -Os size: 1.5% smaller with the L.E.V. pass In the case of Coremark, the benefit comes mainly from the matrix
2002 Jun 27
2
Fastest way to find the last index k such that x[k] < y in a sorted vector x?
Hi, I am trying to find the fastest way to "find the last index k such that x[k] < y in a *sorted* vector x". These are my two alternatives:

x <- sort(rnorm(1e4))
y <- 0.2
# Alt 1
k <- max(1, sum(x < y))
# Alt 2 "divide and conquer"
lastIndexLessThan <- function(x, y) {
  k0 <- 1; k1 <- length(x)
  while ((dk <- (k1 - k0)) >
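For a sorted vector, the number of elements less than y is exactly the 1-based index of the last such element. Alt 1 is therefore correct, but it scans all 1e4 elements. The divide-and-conquer idea of Alt 2 is a binary search. Here is a minimal C sketch, assuming x is sorted in ascending order and contains no NaNs. The name count_less_than is hypothetical.

```c
#include <stddef.h>

/* Number of elements of the sorted array x[0..n) that are strictly
   less than y. For sorted x this equals the 1-based index of the last
   element < y (0 if there is none), i.e. what sum(x < y) computes in R,
   using O(log n) comparisons instead of O(n). */
size_t count_less_than(const double *x, size_t n, double y) {
    size_t lo = 0, hi = n;   /* invariant: x[0..lo) < y and x[hi..n) >= y */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (x[mid] < y)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```

The R code's max(1, ...) maps "no element below y" to 1. The sketch returns 0 in that case and leaves the choice to the caller.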
2018 Oct 17
3
pcie-expander-bus doesn't support pcie-pci-bridge and pcie-switch-upstream-port
In libvirt, I found that the pcie-expander-bus controller doesn't support pcie-to-pci-bridge and pcie-switch-upstream-port. Version: libvirt-4.9

# cat /tmp/c.xml
...
<controller type='pci' index='0' model='pcie-root'/>
<controller type='pci' index='1' model='pcie-expander-bus'>
  <model name='pxb-pcie'/>
2016 Oct 20
2
[AVX512BW] Nasty KAND issue
Hey guys, I've hit a pretty nasty issue on SKX with ANDs of masks <= 4 bits. In the IR, we represent a 4b vector mask as <4 x i1>. This assumes that the storage container for this type is also 4b, but it's not. The smallest mask register on SKX is 8b. This also implies that the smallest load/store moves 8b. We run into problems when we try to optimize ANDs (full test case
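The full test case is cut off above, so the following does not reproduce the specific bug. It is a plain-C model (not AVX-512 intrinsics) of the underlying storage point: a <4 x i1> value held in an 8-bit container has four bits that are not part of the value. Some mask operations leave those bits clear and others do not. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model: a 4-lane mask stored in the low 4 bits of an
   8-bit container. Bits 4..7 are not part of the <4 x i1> value. */
typedef uint8_t mask4_in_8;

/* AND leaves bits 4..7 clear whenever at least one input has them clear. */
static mask4_in_8 kand4(mask4_in_8 a, mask4_in_8 b) { return a & b; }

/* NOT sets bits 4..7, so a full 8-bit read or store sees extra set bits... */
static mask4_in_8 knot4_unsafe(mask4_in_8 a) { return (mask4_in_8)~a; }

/* ...unless the result is re-masked to the 4 meaningful bits. */
static mask4_in_8 knot4(mask4_in_8 a) { return (mask4_in_8)(~a & 0x0F); }
```

Any optimization that assumes the upper bits are zero, such as dropping an AND with 0x0F, is only valid when every producer of the mask guarantees that.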
2016 Oct 20
2
[AVX512BW] Nasty KAND issue
On Thu, Oct 20, 2016 at 12:05 PM, Mehdi Amini <mehdi.amini at apple.com> wrote: > >> On Oct 20, 2016, at 8:54 AM, Cameron McInally via llvm-dev <llvm-dev at lists.llvm.org> wrote: >> >> Hey guys, >> >> I've hit a pretty nasty issue on SKX with ANDs of masks <= 4 bits. >> >> In the IR, we represent a 4b vector mask as <4 x i1>.
2012 May 15
2
Renaming names in R matrix
I have the following matrix:

> dat
         [,1]      [,2]       [,3]        [,4]
foo 0.7574657 0.2104075 0.02922241 0.002705617
foo 0.0000000 0.0000000 0.00000000 0.000000000
foo 0.0000000 0.0000000 0.00000000 0.000000000
foo 0.0000000 0.0000000 0.00000000 0.000000000
foo 0.0000000 0.0000000 0.00000000 0.000000000
foo 0.0000000 0.0000000 0.00000000 0.000000000

and given this:
2017 Dec 19
4
A code layout related side-effect introduced by rL318299
Hi, recently a 10% performance regression on an important benchmark showed up after we integrated https://reviews.llvm.org/rL318299. The analysis showed that rL318299 triggered loop rotation on a multi-exit loop, and the loop rotation introduced a code layout issue. The performance regression is a side effect of rL318299. I attached two test cases, a.ll and b.ll, to illustrate the problem. a.ll
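For readers unfamiliar with loop rotation, here is a source-level sketch of what it does to a loop with two exit conditions. The loop test moves from the top to the bottom behind a one-time guard, which changes which blocks fall through to which. This is a hypothetical illustration, not the a.ll/b.ll test cases from the thread, and the function names are invented.

```c
/* Original shape: the exit conditions are tested at the top of every
   iteration (entry -> while.cond -> while.body -> while.cond ...). */
int count_until_zero(const int *a, int n) {
    int i = 0;
    while (i < n && a[i] != 0)
        i++;
    return i;
}

/* Rotated shape: one guard before the loop, then the test sits in the
   latch (a do-while), so the hot back edge returns straight to the body. */
int count_until_zero_rotated(const int *a, int n) {
    int i = 0;
    if (i < n && a[i] != 0) {          /* guard: the loop body runs at least once */
        do {
            i++;
        } while (i < n && a[i] != 0);  /* latch now tests both exit conditions */
    }
    return i;
}
```

Both versions compute the same result. The regression discussed in the thread concerns how the rotated control flow is laid out, not correctness.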
2016 Oct 20
2
[AVX512BW] Nasty KAND issue
On 10/20/2016 9:28 AM, Cameron McInally via llvm-dev wrote: > I should have attached the generated asm to save some trouble. > Apologies for that and attaching now... > > > > On Thu, Oct 20, 2016 at 12:26 PM, Cameron McInally > <cameron.mcinally at nyu.edu> wrote: >> On Thu, Oct 20, 2016 at 12:05 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
2017 Dec 19
2
A code layout related side-effect introduced by rL318299
On Mon, Dec 18, 2017 at 5:46 PM Xinliang David Li <davidxl at google.com> wrote: > The introduction of cleanup.cond block in b.ll without loop-rotation > already makes the layout worse than a.ll. > > > Without introducing cleanup.cond block, the layout is > > entry->while.cond -> while.body->ret > > All the arrows are hot fall through edges which is
2018 Oct 17
1
Re: pcie-expander-bus doesn't support pcie-pci-bridge and pcie-switch-upstream-port
On 10/17/2018 08:56 AM, Andrea Bolognani wrote: > On Wed, 2018-10-17 at 10:50 +0800, Han Han wrote: >> In libvirt, I found pcie-expander-bus controller doesn't support pcie-to-pci-bridge and pcie-switch-upstream-port. > [...] >> # virsh -k0 -K0 define /tmp/c.xml > Aside: the -k and -K virsh options are documented as > > -k | --keepalive-interval=NUM >
2006 Jun 07
1
knn - 10 fold cross validation
Hi, I was trying to get the optimal 'k' for the knn. To do this I was using the following function:

knn.cvk <- function(datmat, cl, k = 2:9) {
  datmatT <- (datmat)
  cv.err <- cl.pred <- c()
  for (i in k) {
    newpre <- as.vector(knn.cv(datmatT, cl, k = i))
    cl.pred <- cbind(cl.pred, newpre)
    cv.err <- c(cv.err, sum(cl != newpre))
  }
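Note that knn.cv in the class package performs leave-one-out cross-validation rather than 10-fold. The loop above therefore counts leave-one-out errors for each k. Here is a minimal C sketch of that error count, under several assumptions: one-dimensional data, binary labels 0/1, and a plain majority vote. Odd k avoids voting ties for two classes, and distance ties go to the lower index. The function name is hypothetical.

```c
#include <stdlib.h>

/* Leave-one-out k-NN error count on 1-D data with labels 0/1.
   Returns the number of points whose prediction (from their k nearest
   other points) differs from their label, like sum(cl != newpre)
   in the R loop, or -1 if memory allocation fails. */
int loo_knn_errors(const double *x, const int *label, int n, int k) {
    int errors = 0;
    char *used = malloc((size_t)n);
    if (used == NULL) return -1;
    for (int i = 0; i < n; i++) {
        for (int t = 0; t < n; t++) used[t] = 0;
        used[i] = 1;                          /* leave point i out */
        int votes1 = 0, taken = 0;
        while (taken < k && taken < n - 1) {  /* pick the k nearest, one at a time */
            int best = -1;
            double bestd = 0.0;
            for (int j = 0; j < n; j++) {
                if (used[j]) continue;
                double d = x[j] - x[i];
                if (d < 0) d = -d;
                if (best < 0 || d < bestd) { best = j; bestd = d; }
            }
            used[best] = 1;
            votes1 += label[best];
            taken++;
        }
        int pred = (2 * votes1 > taken) ? 1 : 0;  /* majority vote */
        if (pred != label[i]) errors++;
    }
    free(used);
    return errors;
}
```

Choosing k then means calling this for each candidate (for example 1, 3, 5, 7, 9) and keeping the k with the fewest errors, which is what cv.err collects in the R function.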
2020 Jan 14
2
[R] choose(n, k) as n approaches k
> On 14 Jan 2020, at 16:21 , Duncan Murdoch <murdoch.duncan at gmail.com> wrote: > > On 14/01/2020 10:07 a.m., peter dalgaard wrote: >> Yep, that looks wrong (probably want to continue discussion over on R-devel) >> I think the culprit is here (in src/nmath/choose.c) >> if (k < k_small_max) { >> int j; >> if(n-k < k
2020 Jan 14
4
[R] choose(n, k) as n approaches k
OK, I see what you mean. But in those cases, we don't get the catastrophic failures from the if (k < 0) return 0.; if (k == 0) return 1.; /* else: k >= 1 */ part, because at that point k is sure to be integer, possibly after rounding. It is when n-k is approximately but not exactly zero and we should return 1, that we either return 0 (negative case) or n
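The quoted choose.c excerpt follows one order of steps in its small-k branch. It first applies the symmetry k -> n-k when n is (nearly) an integer. It then returns early for k < 0 and k == 0. Otherwise it multiplies out n(n-1)...(n-k+1)/k!. Here is a C sketch of that shape with one assumed change: n itself is snapped to the nearest integer before n - k is computed, so n - k cannot be "approximately but not exactly zero". This illustrates the idea under discussion. It is not R's source and not the fix that was eventually committed. The helper names and the 1e-7 relative tolerance are assumptions.

```c
/* Hypothetical helpers (no libm needed). Assumes |v| fits in long long. */
static double round_half_away(double v) {
    return (double)(long long)(v < 0 ? v - 0.5 : v + 0.5);
}
static double abs_d(double v) { return v < 0 ? -v : v; }

/* Sketch of a small-k choose(n, k) for real n and integer k.
   n counts as an integer when within a relative 1e-7 of one. */
double choose_small(double n, int k) {
    double nr = round_half_away(n);
    double scale = abs_d(n) > 1.0 ? abs_d(n) : 1.0;
    int n_is_int = abs_d(n - nr) <= 1e-7 * scale;
    if (n_is_int) {
        n = nr;                  /* snap n first, so n - k is exact */
        if (n - k < k && n >= 0)
            k = (int)(n - k);    /* symmetry: choose(n, k) == choose(n, n - k) */
    }
    if (k < 0) return 0.0;
    if (k == 0) return 1.0;
    double r = n;                /* n (n-1) ... (n-k+1) / k! */
    for (int j = 2; j <= k; j++)
        r *= (n - j + 1) / j;
    return n_is_int ? round_half_away(r) : r;  /* undo rounding drift */
}
```

With n snapped first, inputs such as 4 - 1e-9 or 4 + 1e-9 with k = 4 take the symmetric path to k = 0 and return exactly 1.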
2011 Aug 13
1
Own R function doubt
Hi again, everyone. I was writing a simple function in R and would like to collect the results in an Excel file. The work goes as follows:

Ciervos <- function(K1, K0, A, R, M, Pi, Hembras) {
  B <- (K1 - K0)/A
  T1 <- (R*Pi*Hembras - M*Pi + B)/(Pi - M*Pi + R*Pi*Hembras)
  P1 <- Pi - B
  R1 <- P1*Hembras*R
  M1 <- P1*M
  T2 <- (R1 - M1 + B)/(P1 - M1 + R1)
  P2 <- P1 - B
  R2 <- P2*Hembras*R
  M2 <- P2*M
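The unrolled assignments follow a recurrence. With P_0 = Pi, each step computes R_t = P_t*Hembras*R and M_t = P_t*M, then T_{t+1} = (R_t - M_t + B)/(P_t - M_t + R_t) and P_{t+1} = P_t - B. For t = 0 this is exactly T1 as written, and for t = 1 it is T2. The snippet is cut off after M2, so continuing the pattern past T2 is an assumption. Here is a loop sketch in C that fills an array instead of naming T1, T2, ... by hand. The function name ciervos is hypothetical.

```c
/* Loop form of the unrolled Ciervos() computation (pattern past T2 assumed).
   T must have room for `steps` values; T[0] is T1, T[1] is T2, ... */
void ciervos(double K1, double K0, double A, double R, double M,
             double Pi, double Hembras, int steps, double *T) {
    double B = (K1 - K0) / A;
    double P = Pi;                               /* P_0 = Pi */
    for (int t = 0; t < steps; t++) {
        double Rt = P * Hembras * R;             /* R_t */
        double Mt = P * M;                       /* M_t */
        T[t] = (Rt - Mt + B) / (P - Mt + Rt);    /* T_{t+1} */
        P = P - B;                               /* P_{t+1} */
    }
}
```

In R the same loop would fill a numeric vector, which write.csv() can export in a form Excel opens directly.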
2014 May 11
2
[LLVMdev] [cfe-dev] Code generation for noexcept functions
On Sun, May 11, 2014 at 8:19 AM, Stephan Tolksdorf <st at quanttec.com> wrote: > Hi, > > When clang/LLVM can't prove that a noexcept function only contains > non-throwing code, it seems to insert an explicit exception handler that > calls std::terminate. Why doesn't clang leave it to the eh personality > function to call std::terminate when an exception is thrown
2014 Jul 23
4
[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops
The clang 3.5 loop optimizer seems to kick in unintentionally for simple loops. A very simple example:

----
const int SIZE = 3;

int the_func(int* p_array)
{
  int dummy = 0;
#if defined(ITER)
  for(int* p = &p_array[0]; p < &p_array[SIZE]; ++p)
    dummy += *p;
#else
  for(int i = 0; i < SIZE; ++i)
    dummy += p_array[i];
#endif
  return dummy;
}

int main(int argc, char** argv)
{
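To compare what the optimizer does with each variant without rebuilding with and without -DITER, the two #if branches can be split into separate functions in one translation unit. The result can then be inspected with, for example, clang -O2 -S. This is a sketch: the function names are hypothetical, and the original main() is cut off, so it is not reproduced.

```c
static const int SIZE = 3;

/* The -DITER variant: pointer-based iteration up to one past the end. */
int the_func_iter(int *p_array) {
    int dummy = 0;
    for (int *p = &p_array[0]; p < &p_array[SIZE]; ++p)
        dummy += *p;
    return dummy;
}

/* The default variant: index-based iteration. */
int the_func_index(int *p_array) {
    int dummy = 0;
    for (int i = 0; i < SIZE; ++i)
        dummy += p_array[i];
    return dummy;
}
```

Both functions must return the same sum. With a trip count of 3 known at compile time, the optimizer may fully unroll either loop, so differences between them show up only in the generated code, not in results.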