thr3ads.net - similar to: "[RFC] jump threading on std::pair<int, bool>"

Displaying 20 results from an estimated 3000 matches similar to: "[RFC] jump threading on std::pair<int, bool>"

[AVR] [MSP430] Code gen improvements for 8 bit and 16 bit targets

2019 Nov 14

For any of the examples shown below, if the logical equivalent using cmp + other IR instructions needs no more IR instructions than the variant that uses shift, we should consider reversing the canonicalization. To make that happen, you would need to show that at least the minimal cases have codegen that is equal or better using the cmp form for at least a few in-tree targets. My

RFC: SROA for method argument

2017 May 09

Hi, I am working to improve SROA to generate better code when a method has a struct in its arguments. I would appreciate it if I could have any suggestions or comments on how I can best proceed with this optimization. * Problem * I observed that LLVM often generates redundant instructions around glibc’s istreambuf_iterator. The problem comes from the scalar replacement (SROA) for methods with an

Intel AMX programming model discussion.

2020 Aug 15

Hi Philip, Your idea makes sense to me at first thought. Thank you for the idea. I will take more time to think it over and see whether it can help reduce the complexity of tile register allocation. Yuanke From: Philip Reames <listmail at philipreames.com> Sent: Saturday, August 15, 2020 11:29 AM To: Luo, Yuanke <yuanke.luo at intel.com>; llvm-dev at lists.llvm.org; florian_hahn at

Intel AMX programming model discussion.

2020 Aug 14

From: Hal Finkel <hfinkel at anl.gov> Sent: Friday, August 14, 2020 11:27 PM To: Luo, Yuanke <yuanke.luo at intel.com>; llvm-dev at lists.llvm.org; florian_hahn at apple.com; Kaylor, Andrew <andrew.kaylor at intel.com>; Topper, Craig <craig.topper at intel.com>; Lu, Hongjiu <hongjiu.lu at intel.com> Subject: Re: [llvm-dev] Intel AMX programming model discussion. On

[LLVMdev] LiveIntervals analysis problem

2013 Feb 14

Hello everyone, I need your help. To reproduce my problem, I created a simple pass for backends (TestPass.cpp in the attached files). I call that pass from the Mips backend in this way (MipsTargetMachine.cpp): bool MipsPassConfig::addPreRegAlloc() { addPass(createTestPass()); return false; } The problem appears when I try to compile the file ldtoa.ll (in the attached files). Compiling

Intel AMX programming model discussion.

2020 Aug 14

[Yuanke] The AMX register is special. It needs to be configured before use, and the config instruction is expensive. To avoid unnecessary tile configuration, we collect as much tile shape information as possible and combine it into one ldtilecfg instruction. The ldtilecfg instruction should dominate any AMX instruction that accesses a tile register. On the other side, the ldtilecfg should post-dominated

Intel AMX programming model discussion.

2020 Aug 14

Hi, Intel Advanced Matrix Extensions (Intel AMX) is a new programming paradigm consisting of two components: a set of 2-dimensional registers (tiles) representing sub-arrays from a larger 2-dimensional memory image, and accelerators able to operate on tiles. Capability of Intel AMX implementation is enumerated by palettes. Two palettes are supported: palette 0 represents the initialized state and

[LLVMdev] Unnecessary i16 -> i32 type promotion

2009 Jul 19

If I have an '&' operator inside an 'if' statement, LLVM seems to always promote a 16 bit integer type to a 32 bit integer type. I don't want this to happen because my back-end only supports 16 bit types. Why is this happening? Where might this be happening, so I can fix it? It doesn't seem to happen with the '|' operator, only '&'. Thanks!

Intel AMX programming model discussion.

2020 Aug 18

The AMX registers are complicated. The single configuration register (which is mostly used implicitly, similar to MXCSR for floating point) controls the shape of all the tile registers, and if you change the tile configuration, every single tile register is cleared. In practice, if we have to change the configuration while any of the tile registers are live, performance is going to be terrible.

Intel AMX programming model discussion.

2020 Aug 19

Hi Hal, There are 3 aspects to be solved. 1. The HW supports a max shape of 16x16, so there are many register classes from 1x1 to 16x16. We need 256 register classes. 2. We want to support variable shapes, so the compiler doesn't know which register class fits the tile shape, as it is only known at runtime. 3. The tile configure is to configure physical tile register, so we need to allocate

[LLVMdev] Alias analysis issue with structs on PPC

2015 Mar 17

Hal Finkel <hfinkel at anl.gov> wrote on 16.03.2015 17:56:20: > If you want to do it at a clang level, the right thing to do is to > fixup the ABI lowerings for pointers to keep them pointers in this case. > So this is an artifact of the way that we pass structures, and > constructing a general solution at the ABI level might be tricky. > I've cc'd Uli, who did most

[AVR] [MSP430] Code gen improvements for 8 bit and 16 bit targets

2019 Nov 13

As before, I'm not convinced that we want to allow target-based enable/disable in instcombine for performance. That undermines having a target-independent canonical form in the 1st place. It's not clear to me what the remaining motivating cases look like. If you could post those here or as bugs, I think you'd have a better chance of finding an answer. Let's take a minimal example

[LLVMdev] failed folding with constant array with opt -O3

2014 Sep 10

I came in to an email this morning that said basically the same thing for the reduced example we were looking at. However, the original IR it came from (before hand reduction) had the data layout set correctly, so there's probably still *something* going on. It's just not what I thought at first. :) Philip On 09/10/2014 02:26 AM, Roel Jordans wrote: > Looking at the -debug

Intel AMX programming model discussion.

2020 Aug 19

> When the tile shape is unknown at compile time, how do you plan to do the register allocation of the tiles? My question is: do you do the allocation for this case in the same way as you would if you knew the size was 16x16 (i.e., conservatively assume the largest size)? I think what will happen is that the registers are allocated based on a number of runtime values that are assumed to be

Remove zext-unfolding from InstCombine

2016 Aug 04

Hi Sanjay, > Am 02.08.2016 um 21:39 schrieb Sanjay Patel <spatel at rotateright.com>: > > Hi Matthias - > > Sorry for the delayed reply. I think you're on the right path with D22864. No problem, thank you for your answer! > If I'm understanding it correctly, my foo() example and zext_or_icmp_icmp() will be equivalent after your patch is added to InstCombine.

Remove zext-unfolding from InstCombine

2016 Jul 27

Hi Sanjay, thank you a lot for your answer. I understand that in your examples it is desirable that `foo` and `goo` are canonicalized to the same IR, i.e., something like `@goo`. However, I still have a few open questions, but please correct me in case I'm thinking in the wrong direction. > Am 21.07.2016 um 18:51 schrieb Sanjay Patel <spatel at rotateright.com>: > > I've

Intel AMX programming model discussion.

2020 Aug 19

Having 256 register classes is not a problem; it just seems like a lot of register classes to me. We don't assume the shape of each physical register is 16x16; it is defined by the user. By variable shape, I mean the shape is known at runtime and unknown at compile time. Take the code below as an example: %row and %col are variables rather than constants. Compiler recognizes llvm.x86.tileloadd64

Intel AMX programming model discussion.

2020 Aug 20

On 8/20/20 2:47 PM, Topper, Craig wrote: > > I think I’m still missing something here. The configuration is per > tile. The multiply instructions take a MxK tile and multiply it by a > KxN tile and accumulate into an MxN tile. So the configuration needs > to know how many of each size of tile it needs to avoid a spill. > Wouldn’t the register allocator then need to know which

[LLVMdev] failed folding with constant array with opt -O3

2014 Sep 09

I have the following simplified LLVM IR, which returns a value based on the first element of a constant array.
----
; ModuleID = 'simple_ir3.txt'
@f.b = constant [1 x i32] [i32 1], align 4 ; constant array with value 1 at the first element
define void @f(i32* nocapture %l0) {
entry:
  %fc_ = alloca [1 x i32]
  %f.b.v = load [1 x i32]* @f.b
  store [1 x i32] %f.b.v, [1 x

Intel AMX programming model discussion.

2020 Aug 21

Hi Hal, The proposal is attractive to me, but there is something I still can't figure out. Let's take the MIR below as an example. We assume we have 256 register classes (vtile1x1, vtile1x2, ..., tile16x16). 1. After instruction selection, the pseudo AMX instructions are generated. The names of the pseudo instructions have a 'P' prefix. Now all the AMX pseudo instructions take vtile as
