thr3ads.net - similar to: "[X86][AVX512] RFC: make i1 illegal in the Codegen"

Displaying 20 results from an estimated 1000 matches similar to: "[X86][AVX512] RFC: make i1 illegal in the Codegen"

KNL Assembly Code for Matrix Multiplication

2017 Jul 01


Thank you. It means that vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15] loads 64-bit constant values into zmm22. These constants are the indexes themselves (zmm22 = 8, 9, 10, 11, 12, 13, 14, 15), not the values loaded from those locations. zmm2 contains the constant 4000, so vpmuludq zmm14, zmm10, zmm2 multiplies the index values by 4000, because the stride for array b is 4000. zmm14=
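The index-times-stride step described in this snippet can be modelled in a few lines. This is only a sketch: the function name vpmuludq is a hypothetical Python stand-in for the instruction, and it assumes the index register holds the values 8 to 15 shown for zmm22 (the snippet does not show the contents of zmm10).

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF
STRIDE = 4000  # stride of array b, as stated in the snippet


def vpmuludq(a, b):
    """Model of the instruction: per 64-bit lane, multiply the low
    32 bits of each operand as unsigned values, giving a 64-bit product."""
    return [((x & MASK32) * (y & MASK32)) & MASK64 for x, y in zip(a, b)]


indexes = list(range(8, 16))  # assumed lane contents: 8, 9, ..., 15
strides = [STRIDE] * 8        # 4000 broadcast to every lane, like zmm2
offsets = vpmuludq(indexes, strides)
print(offsets)
```

Each lane ends up holding an element offset (index * 4000), not a value read from memory, which is the point the reply makes.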

[RFC] __builtin_constant_p() Improvements

2018 Apr 12


Hello again! I took a stab at PR4898[1]. The attached patch improves Clang's __builtin_constant_p support so that the Linux kernel is happy. With this improvement, Clang can determine if __builtin_constant_p is true or false after inlining. As an example: static __attribute__((always_inline)) int foo(int x) { if (__builtin_constant_p(x)) return 1; return 0; } static

[RFC] __builtin_constant_p() Improvements

2018 Apr 13


I actually was working on an updated patch for the LLVM-side of this, also. :) I was just working on some test cases; I'll post it soon. It's somewhat different than yours. I haven't touched the clang side yet, but I think it needs to be more complex than what you have there. I think it actually needs to be able to evaluate the intrinsic as a constant _false_ in the front-end in some

[RFC] New pass: LoopExitValues

2015 Aug 31


Hello LLVM, This is a proposal for a new pass that improves performance and code size in some nested loop situations. The pass is target independent. From the description in the file header: This optimization finds loop exit values reevaluated after the loop execution and replaces them by the corresponding exit values if they are available. Such sequences can arise after the

[RFC] New pass: LoopExitValues

2015 Sep 01


On Mon, Aug 31, 2015 at 5:52 PM, Jake VanAdrighem <jvanadrighem at gmail.com> wrote: > Do you have some specific performance measurements? Averaging 4 runs of 10000 iterations each of Coremark on my X86_64 desktop showed: -O2 performance: +2.9% faster with the L.E.V. pass -Os size: 1.5% smaller with the L.E.V. pass In the case of Coremark, the benefit comes mainly from the matrix

Fastest way to find the last index k such that x[k] < y in a sorted vector x?

2002 Jun 27


Hi, I am trying to find the fastest way to "find the last index k such that x[k] < y in a *sorted* vector x" These are my two alternatives: x <- sort(rnorm(1e4)) y <- 0.2 # Alt 1 k <- max(1, sum(x < y)) # Alt 2 "divide and conquer" lastIndexLessThan <- function(x, y) { k0 <- 1; k1 <- length(x) while ((dk <- (k1 - k0)) >
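The "divide and conquer" alternative in this question is a binary search. A minimal sketch in Python, assuming the standard-library bisect module and a hypothetical function name last_index_less_than; it is not the poster's R code, which is cut off above.

```python
from bisect import bisect_left


def last_index_less_than(x, y):
    """Return the last 1-based index k with x[k] < y in a sorted list x.

    bisect_left gives the number of elements strictly less than y,
    which is exactly that 1-based index. Returns 0 if no element is
    less than y.
    """
    return bisect_left(x, y)


print(last_index_less_than([0.1, 0.15, 0.2, 0.3], 0.2))
```

This runs in O(log n) comparisons, against O(n) for the sum(x < y) approach. Note that the poster's Alt 1 clamps the result to at least 1 with max(1, ...), whereas this sketch returns 0 when nothing is smaller than y.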

pcie-expander-bus doesn't support pcie-pci-bridge and pcie-switch-upstream-port

2018 Oct 17


In libvirt, I found that the pcie-expander-bus controller doesn't support pcie-to-pci-bridge and pcie-switch-upstream-port. Version: libvirt-4.9 # cat /tmp/c.xml ... <controller type='pci' index='0' model='pcie-root'/> <controller type='pci' index='1' model='pcie-expander-bus'> <model name='pxb-pcie'/>

[AVX512BW] Nasty KAND issue

2016 Oct 20


Hey guys, I've hit a pretty nasty issue on SKX with ANDs of masks <= 4 bits. In the IR, we represent a 4b vector mask as <4 x i1>. This assumes that the storage container for this type is also 4b, but it's not. The smallest mask register on SKX is 8b. This also implies that the smallest load/store moves 8b. We run into problems when we try to optimize ANDs (full test case
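The thread's full test case is cut off here, so the following is only a generic illustration of why the container width matters, not a reproduction of the reported bug. It assumes a 4-lane mask kept in the low bits of an 8-bit container whose upper 4 bits hold stale data; the name kand8 and the bit patterns are hypothetical.

```python
LANES = 4
LANE_MASK = (1 << LANES) - 1  # 0b1111, the bits that belong to <4 x i1>


def kand8(a, b):
    """AND two 8-bit mask containers, as an 8-bit mask AND would."""
    return (a & b) & 0xFF


# Low 4 bits are the logical masks 0101 and 1010; the upper 4 bits
# are assumed to be stale (hypothetical garbage) in both containers.
a = 0b1010_0101
b = 0b1010_1010

raw = kand8(a, b)
print(bin(raw))              # whole container, upper bits included
print(bin(raw & LANE_MASK))  # only the 4 lanes that really exist
```

The logical 4-lane result is all zeros, but the 8-bit container is non-zero, so any check that looks at the whole container instead of the low 4 bits would give a different answer.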

[AVX512BW] Nasty KAND issue

2016 Oct 20


On Thu, Oct 20, 2016 at 12:05 PM, Mehdi Amini <mehdi.amini at apple.com> wrote: > >> On Oct 20, 2016, at 8:54 AM, Cameron McInally via llvm-dev <llvm-dev at lists.llvm.org> wrote: >> >> Hey guys, >> >> I've hit a pretty nasty issue on SKX with ANDs of masks <= 4 bits. >> >> In the IR, we represent a 4b vector mask as <4 x i1>.

Renaming names in R matrix

2012 May 15


I have the following matrix: > dat [,1] [,2] [,3] [,4] foo 0.7574657 0.2104075 0.02922241 0.002705617 foo 0.0000000 0.0000000 0.00000000 0.000000000 foo 0.0000000 0.0000000 0.00000000 0.000000000 foo 0.0000000 0.0000000 0.00000000 0.000000000 foo 0.0000000 0.0000000 0.00000000 0.000000000 foo 0.0000000 0.0000000 0.00000000 0.000000000 and given this:

A code layout related side-effect introduced by rL318299

2017 Dec 19


Hi, Recently a 10% performance regression on an important benchmark showed up after we integrated https://reviews.llvm.org/rL318299. The analysis showed that rL318299 triggered loop rotation on a multi-exit loop, and the loop rotation introduced a code layout issue. The performance regression is a side-effect of rL318299. I have attached two testcases, a.ll and b.ll, to illustrate the problem. a.ll

[AVX512BW] Nasty KAND issue

2016 Oct 20


On 10/20/2016 9:28 AM, Cameron McInally via llvm-dev wrote: > I should have attached the generated asm to save some trouble. > Apologies for that and attaching now... > > > > On Thu, Oct 20, 2016 at 12:26 PM, Cameron McInally > <cameron.mcinally at nyu.edu> wrote: >> On Thu, Oct 20, 2016 at 12:05 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:

A code layout related side-effect introduced by rL318299

2017 Dec 19


On Mon, Dec 18, 2017 at 5:46 PM Xinliang David Li <davidxl at google.com> wrote: > The introduction of cleanup.cond block in b.ll without loop-rotation > already makes the layout worse than a.ll. > > > Without introducing cleanup.cond block, the layout is > > entry->while.cond -> while.body->ret > > All the arrows are hot fall through edges which is

Re: pcie-expander-bus doesn't support pcie-pci-bridge and pcie-switch-upstream-port

2018 Oct 17


On 10/17/2018 08:56 AM, Andrea Bolognani wrote: > On Wed, 2018-10-17 at 10:50 +0800, Han Han wrote: >> In libvirt, I found pcie-expander-bus controller doesn't support pcie-to-pci-bridge and pcie-switch-upstream-port. > [...] >> # virsh -k0 -K0 define /tmp/c.xml > Aside: the -k and -K virsh options are documented as > > -k | --keepalive-interval=NUM >

knn - 10 fold cross validation

2006 Jun 07


Hi, I was trying to get the optimal 'k' for the knn. To do this I was using the following function : knn.cvk <- function(datmat, cl, k = 2:9) { datmatT <- (datmat) cv.err <- cl.pred <- c() for (i in k) { newpre <- as.vector(knn.cv(datmatT, cl, k = i)) cl.pred <- cbind(cl.pred, newpre) cv.err <- c(cv.err, sum(cl != newpre)) }

[R] choose(n, k) as n approaches k

2020 Jan 14


> On 14 Jan 2020, at 16:21 , Duncan Murdoch <murdoch.duncan at gmail.com> wrote: > > On 14/01/2020 10:07 a.m., peter dalgaard wrote: >> Yep, that looks wrong (probably want to continue discussion over on R-devel) >> I think the culprit is here (in src/nmath/choose.c) >> if (k < k_small_max) { >> int j; >> if(n-k < k

[R] choose(n, k) as n approaches k

2020 Jan 14


OK, I see what you mean. But in those cases, we don't get the catastrophic failures from the if (k < 0) return 0.; if (k == 0) return 1.; /* else: k >= 1 */ part, because at that point k is sure to be integer, possibly after rounding. It is when n-k is approximately but not exactly zero and we should return 1, that we either return 0 (negative case) or n

Own R function doubt

2011 Aug 13


Hi to all the people again, I was writing a simple function in R, and wish to collect the results in an Excel file. The work goes as follows, Ciervos<-function(K1, K0, A, R,M,Pi,Hembras) {B<-(K1-K0)/A T1<-(R*Pi*Hembras-M*Pi+B)/(Pi-M*Pi+R*Pi*Hembras) P1<-Pi-B R1<-P1*Hembras*R M1<-P1*M T2<-(R1-M1+B)/(P1-M1+R1) P2<-P1-B R2<-P2*Hembras*R M2<-P2*M

[LLVMdev] [cfe-dev] Code generation for noexcept functions

2014 May 11


On Sun, May 11, 2014 at 8:19 AM, Stephan Tolksdorf <st at quanttec.com> wrote: > Hi, > > When clang/LLVM can't prove that a noexcept function only contains > non-throwing code, it seems to insert an explicit exception handler that > calls std::terminate. Why doesn't clang leave it to the eh personality > function to call std::terminate when an exception is thrown

[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops

2014 Jul 23


the clang 3.5 loop optimizer seems to jump in unintentional for simple loops the very simple example ---- const int SIZE = 3; int the_func(int* p_array) { int dummy = 0; #if defined(ITER) for(int* p = &p_array[0]; p < &p_array[SIZE]; ++p) dummy += *p; #else for(int i = 0; i < SIZE; ++i) dummy += p_array[i]; #endif return dummy; } int main(int argc, char** argv) {
