Displaying 20 results from an estimated 3000 matches similar to: "[RFC] jump threading on std::pair<int, bool>"
2019 Nov 14
2
[AVR] [MSP430] Code gen improvements for 8 bit and 16 bit targets
For any of the examples shown below, if the logical equivalent using cmp +
other IR instructions is no more than the number of IR instructions as the
variant that uses shift, we should consider reversing the canonicalization.
To make that happen, you would need to show that at least the minimal cases
have codegen that is equal or better using the cmp form for at least a few
in-tree targets. My
2017 May 09
3
RFC: SROA for method argument
Hi,
I am working to improve SROA to generate better code when a method has a
struct in its arguments. I would appreciate it if I could have any
suggestions or comments on how I can best proceed with this optimization.
* Problem *
I observed that LLVM often generates redundant instructions around glibc’s
istreambuf_iterator. The problem comes from the scalar replacement (SROA)
for methods with an
2020 Aug 15
2
Intel AMX programming model discussion.
Hi Philip,
Your idea make sense to me in my first thought. Thank you for the idea. I will take more time to think it over to see it can help to reduce the complexity of tile register allocation.
Yuanke
From: Philip Reames <listmail at philipreames.com>
Sent: Saturday, August 15, 2020 11:29 AM
To: Luo, Yuanke <yuanke.luo at intel.com>; llvm-dev at lists.llvm.org; florian_hahn at
2020 Aug 14
2
Intel AMX programming model discussion.
From: Hal Finkel <hfinkel at anl.gov>
Sent: Friday, August 14, 2020 11:27 PM
To: Luo, Yuanke <yuanke.luo at intel.com>; llvm-dev at lists.llvm.org; florian_hahn at apple.com; Kaylor, Andrew <andrew.kaylor at intel.com>; Topper, Craig <craig.topper at intel.com>; Lu, Hongjiu <hongjiu.lu at intel.com>
Subject: Re: [llvm-dev] Intel AMX programming model discussion.
On
2013 Feb 14
1
[LLVMdev] LiveIntervals analysis problem
Hello everyone,
please I need your help.
To reproduce my problem I created simple pass for backends (TestPass.cpp
in attached files). That pass I call from Mips backend in this way
(MipsTargetMachine.cpp):
bool MipsPassConfig::addPreRegAlloc() {
addPass(createTestPass());
return false;
}
The problem becomes, when I am trying compile file ldtoa.ll (in attached
files). Compiling
2020 Aug 14
3
Intel AMX programming model discussion.
[Yuanke] AMX register is special. It needs to be configured before use and the config instruction is expensive. To avoid unnecessary tile configure, we collect the tile shape information as much as possible and combine them into one ldtilecfg instruction. The ldtilecfg instruction should dominate any AMX instruction that access tile register. On the other side, the ldtilecfg should post-dominated
2020 Aug 14
6
Intel AMX programming model discussion.
Hi,
Intel Advanced Matrix Extensions (Intel AMX) is a new programming paradigm consisting of two components: a set of 2-dimensional registers (tiles) representing sub-arrays from a larger 2-dimensional memory image, and accelerators able to operate on tiles. Capability of Intel AMX implementation is enumerated by palettes. Two palettes are supported: palette 0 represents the initialized state and
2009 Jul 19
2
[LLVMdev] Unnecessary i16 -> i32 type promotion
If I have an '&' operator inside an 'if' statement, LLVM seems to always
promote a 16 bit integer type to a 32 bit integer type. I don't want this
to happen because my back-end only supports 16 bit types. Why is this
happening? Where might this be happening, so I can fix it? It doesn't seem
to happen with the '|' operator, only '&'. Thanks!
2020 Aug 18
2
Intel AMX programming model discussion.
The AMX registers are complicated. The single configuration register (which is mostly used implicitly, similar to MXCSR for floating point) controls the shape of all the tile registers, and if you change the tile configuration every single tile register is cleared. In practice, if we have to change the the configuration while any of the tile registers are live, performance is going to be terrible.
2020 Aug 19
2
Intel AMX programming model discussion.
Hi Hal,
There is 3 aspect to be solved.
1. The HW support max shape 16x16, so there are many register classes from 1x1 to 16x16. We need 256 register classes.
2. We want to support variable shape, so compiler don't know what register class to fit tile shape as it is only known in runtime.
3. The tile configure is to configure physical tile register, so we need to allocate
2015 Mar 17
2
[LLVMdev] Alias analysis issue with structs on PPC
Hal Finkel <hfinkel at anl.gov> wrote on 16.03.2015 17:56:20:
> If you want to do it at a clang level, the right thing to do is to
> fixup the ABI lowerings for pointers to keep them pointers in this case.
> So this is an artifact of the way that we pass structures, and
> constructing a general solution at the ABI level might be tricky.
> I've cc'd Uli, who did most
2019 Nov 13
2
[AVR] [MSP430] Code gen improvements for 8 bit and 16 bit targets
As before, I'm not convinced that we want to allow target-based
enable/disable in instcombine for performance. That undermines having a
target-independent canonical form in the 1st place.
It's not clear to me what the remaining motivating cases look like. If you
could post those here or as bugs, I think you'd have a better chance of
finding an answer.
Let's take a minimal example
2014 Sep 10
3
[LLVMdev] failed folding with constant array with opt -O3
I came in to an email this morning that said basically the same thing
for the reduced example we were looking at. However, the original IR it
came from (before hand reduction) had the data layout set correctly, so
there's probably still *something* going on. It's just not what I
thought at first. :)
Philip
On 09/10/2014 02:26 AM, Roel Jordans wrote:
> Looking at the -debug
2020 Aug 19
2
Intel AMX programming model discussion.
> When the tile shape is unknown at compile time, how do you plan to do the register allocation of the tiles? My question is: do you do the allocation for this case in the same way as you would if you knew the size was 16x16 (i.e., conservatively assume the largest size)?
I think what will happen is that the registers are allocated based on a number of runtime values that are assumed to be
2016 Aug 04
2
Remove zext-unfolding from InstCombine
Hi Sanjay,
> Am 02.08.2016 um 21:39 schrieb Sanjay Patel <spatel at rotateright.com>:
>
> Hi Matthias -
>
> Sorry for the delayed reply. I think you're on the right path with D22864.
No problem, thank you for your answer!
> If I'm understanding it correctly, my foo() example and zext_or_icmp_icmp() will be equivalent after your patch is added to InstCombine.
2016 Jul 27
2
Remove zext-unfolding from InstCombine
Hi Sanjay,
thank you a lot for your answer. I understand that in your examples it is desirable that `foo` and `goo` are canonicalized to the same IR, i.e., something like `@goo`. However, I still have a few open questions, but please correct me in case I'm thinking in the wrong direction.
> Am 21.07.2016 um 18:51 schrieb Sanjay Patel <spatel at rotateright.com>:
>
> I've
2020 Aug 19
3
Intel AMX programming model discussion.
There is no problem to have 256 register classes. Just a lot of register classes to me.
We don't assume the shape of each physical register be 16x16, it is defined by user. For variable shape, I mean the shape is known in runtime and in compile time the shape is unknown. Take below code as an example, the %row and %col are variable instead of constant. Compiler recognizes llvm.x86.tileloadd64
2020 Aug 20
1
Intel AMX programming model discussion.
On 8/20/20 2:47 PM, Topper, Craig wrote:
>
> I think I’m still missing something here. The configuration is per
> tile. The multiply instructions take a MxK tile and multiply it by a
> KxN tile and accumulate into an MxN tile. So the configuration needs
> to know how many of each size of tile it needs to avoid a spill.
> Wouldn’t the register allocator then need to know which
2014 Sep 09
3
[LLVMdev] failed folding with constant array with opt -O3
I have the following simplified llvm ir, which basically returns value
based on the first value of a constant array.
----
; ModuleID = 'simple_ir3.txt'
@f.b = constant [1 x i32] [i32 1], align 4 ; constant array with
value 1 at the first element
define void @f(i32* nocapture %l0) {
entry:
%fc_ = alloca [1 x i32]
%f.b.v = load [1 x i32]* @f.b
store [1 x i32] %f.b.v, [1 x
2020 Aug 21
2
Intel AMX programming model discussion.
Hi Hal,
The proposal is attractive to me, but there is something I still can't figure out. Let's take below MIR as an example. We assume we have 256 register classes (vtile1x1, vtile1x2, ..., tile16x16).
1. After instruction selection, the pseudo AMX instruction is generated. The name of pseudo instructions have 'P' prefix. Now all the AMX pseudo instruction take vtile as