Displaying 20 results from an estimated 3000 matches similar to: "Intel AMX programming model discussion."
2020 Aug 14
3
Intel AMX programming model discussion.
[Yuanke] AMX registers are special. They need to be configured before use, and the configuration instruction is expensive. To avoid unnecessary tile configuration, we collect as much tile shape information as possible and combine it into one ldtilecfg instruction. The ldtilecfg instruction should dominate any AMX instruction that accesses a tile register. On the other side, the ldtilecfg should post-dominated
2020 Aug 18
2
Intel AMX programming model discussion.
The AMX registers are complicated. The single configuration register (which is mostly used implicitly, similar to MXCSR for floating point) controls the shape of all the tile registers, and if you change the tile configuration every single tile register is cleared. In practice, if we have to change the configuration while any of the tile registers are live, performance is going to be terrible.
2020 Aug 19
3
Intel AMX programming model discussion.
There is no problem to have 256 register classes. Just a lot of register classes to me.
We don't assume the shape of each physical register to be 16x16; it is defined by the user. By variable shape, I mean the shape is known at runtime and unknown at compile time. Take the code below as an example: %row and %col are variables instead of constants. Compiler recognizes llvm.x86.tileloadd64
2020 Aug 19
2
Intel AMX programming model discussion.
Hi Hal,
There are 3 aspects to be solved.
1. The HW supports a max shape of 16x16, so there are many register classes from 1x1 to 16x16. We need 256 register classes.
2. We want to support variable shapes, so the compiler doesn't know which register class fits the tile shape, as it is only known at runtime.
3. The tile configure is to configure physical tile register, so we need to allocate
2020 Aug 19
2
Intel AMX programming model discussion.
> When the tile shape is unknown at compile time, how do you plan to do the register allocation of the tiles? My question is: do you do the allocation for this case in the same way as you would if you knew the size was 16x16 (i.e., conservatively assume the largest size)?
I think what will happen is that the registers are allocated based on a number of runtime values that are assumed to be
2020 Aug 19
3
Intel AMX programming model discussion.
The width and height can be runtime values that we would just copy into the 64-byte configuration block we pass to ldtilecfg. So the code doesn't need to be multiversioned. The user code would also use those values to update pointers in the loops they write using the tiles. If we can't determine that two tiles were defined with the same width and height we need to assume the shape is different
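As a rough illustration of copying runtime shapes into the 64-byte configuration block, here is a minimal C sketch. The struct layout follows the palette 1 format documented for LDTILECFG (eight tiles, at most 16 rows of 64 bytes each); the names `tile_config`, `tile_config_init`, and `tile_config_set` are hypothetical and are not taken from the thread or from LLVM. The sketch only fills the block in memory and does not execute any AMX instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 64-byte block in the layout LDTILECFG reads (palette 1). */
typedef struct {
    uint8_t  palette_id;    /* byte 0: 1 selects the eight-tile palette */
    uint8_t  start_row;     /* byte 1: 0 for a fresh configuration */
    uint8_t  reserved[14];  /* bytes 2-15: must be zero */
    uint16_t colsb[16];     /* bytes 16-47: bytes per row, one entry per tile */
    uint8_t  rows[16];      /* bytes 48-63: row count, one entry per tile */
} tile_config;

_Static_assert(sizeof(tile_config) == 64, "config block must be 64 bytes");

/* Hypothetical helper: zero the block and select palette 1. */
void tile_config_init(tile_config *cfg) {
    memset(cfg, 0, sizeof *cfg);
    cfg->palette_id = 1;
}

/* Hypothetical helper: record a runtime shape for tile `idx`.
 * Returns 0 on success, -1 if the index or shape is out of range. */
int tile_config_set(tile_config *cfg, unsigned idx,
                    unsigned rows, unsigned col_bytes) {
    if (idx >= 8 || rows == 0 || rows > 16 ||
        col_bytes == 0 || col_bytes > 64) {
        return -1;
    }
    cfg->rows[idx]  = (uint8_t)rows;
    cfg->colsb[idx] = (uint16_t)col_bytes;
    return 0;
}
```

Because `rows` and `col_bytes` are ordinary runtime values, the same code path serves every shape, which is the reason no multiversioning is needed.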
2020 Aug 20
1
Intel AMX programming model discussion.
On 8/20/20 2:47 PM, Topper, Craig wrote:
>
> I think I’m still missing something here. The configuration is per
> tile. The multiply instructions take a MxK tile and multiply it by a
> KxN tile and accumulate into an MxN tile. So the configuration needs
> to know how many of each size of tile it needs to avoid a spill.
> Wouldn’t the register allocator then need to know which
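The shape relationship described in the quoted text (an MxK tile times a KxN tile accumulated into an MxN tile) can be sketched with a plain scalar loop. This is only a reference for how the three shapes relate; it is not the exact semantics of the AMX dot-product instructions, which operate on packed groups of narrow elements. The function name `tile_matmul_acc` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: C (m x n) += A (m x k) * B (k x n), all row-major. */
void tile_matmul_acc(int32_t *c, const int32_t *a, const int32_t *b,
                     size_t m, size_t k, size_t n) {
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            int32_t sum = 0;
            for (size_t p = 0; p < k; p++) {
                sum += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] += sum;  /* accumulate into the existing value */
        }
    }
}
```

The three operands share only two free dimensions each (M and K, K and N, M and N), so one multiply already involves up to three distinct tile shapes.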
2020 Aug 21
2
Intel AMX programming model discussion.
Hi Hal,
The proposal is attractive to me, but there is something I still can't figure out. Let's take the MIR below as an example. We assume we have 256 register classes (vtile1x1, vtile1x2, ..., vtile16x16).
1. After instruction selection, the pseudo AMX instructions are generated. The names of the pseudo instructions have a 'P' prefix. Now all the AMX pseudo instruction take vtile as
2020 Aug 14
2
Intel AMX programming model discussion.
From: Hal Finkel <hfinkel at anl.gov>
Sent: Friday, August 14, 2020 11:27 PM
To: Luo, Yuanke <yuanke.luo at intel.com>; llvm-dev at lists.llvm.org; florian_hahn at apple.com; Kaylor, Andrew <andrew.kaylor at intel.com>; Topper, Craig <craig.topper at intel.com>; Lu, Hongjiu <hongjiu.lu at intel.com>
Subject: Re: [llvm-dev] Intel AMX programming model discussion.
On
2020 Aug 15
2
Intel AMX programming model discussion.
Hi Philip,
Your idea makes sense to me at first thought. Thank you for the idea. I will take more time to think it over to see if it can help reduce the complexity of tile register allocation.
Yuanke
From: Philip Reames <listmail at philipreames.com>
Sent: Saturday, August 15, 2020 11:29 AM
To: Luo, Yuanke <yuanke.luo at intel.com>; llvm-dev at lists.llvm.org; florian_hahn at
2020 Sep 04
2
Intel AMX programming model discussion.
Fix typo
From: Luo, Yuanke
Sent: Friday, September 4, 2020 9:47 PM
To: 'Hal Finkel' <hfinkel at anl.gov>; Topper, Craig <craig.topper at intel.com>; Kaylor, Andrew <andrew.kaylor at intel.com>; Philip Reames <listmail at philipreames.com>; llvm-dev at lists.llvm.org; florian_hahn at apple.com; Lu, Hongjiu <hongjiu.lu at intel.com>
Subject: RE: [llvm-dev]
2020 Sep 04
2
Intel AMX programming model discussion.
On 9/4/20 3:37 AM, Luo, Yuanke wrote:
>
> Hi Hal,
>
> Thank you for the ideas that help us to improve the design, and sorry
> for replying late. There is something I am not able to figure out and
> there are some special traits for tile RA.
>
You're quite welcome.
> 1.X86RegisterInfo::getRegAllocationHints can tell RA which physical
> register is preferred, but it
2020 Nov 19
2
[RFC] Intel AMX programming model
Hi,
Several months ago, we had some discussion about the Intel AMX programming model on llvm-dev. H.J. posted the AMX ABI at [1], and I sent the design for the programming model at [2]. Thanks to Hal and Philip for taking the time to review the design and for providing good ideas to improve it. After that I implemented the patch [4], and it is being reviewed in the LLVM community. The patch covers 6 components.
1. The c
2020 Aug 24
2
Intel AMX programming model discussion.
Hi, Yuanke,
Thanks for writing this up. Let me back up a bit because the scheme I
proposed last week doesn't work without further modification: within a
particular "configuration region" (i.e., the code in between the
LDTILECFG and the TILERELEASE (or next LDTILECFG)), each tile register
can only be used with one shape, and in addition, no register can have
its shape changed
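The constraint stated above (within one configuration region, each tile register is used with only one shape) can be checked with a small routine like the following. This is a hedged sketch, not code from the thread or from LLVM: the `tile_use` record, the `NUM_TILES` constant, and the function `region_shapes_consistent` are hypothetical names, and a "region" is modeled simply as an array of register uses between one LDTILECFG and the next.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_TILES 8  /* assumed: eight tile registers, tmm0..tmm7 */

/* One use of a tile register inside a configuration region. */
typedef struct {
    unsigned reg;        /* tile register index */
    unsigned rows;       /* rows the use expects */
    unsigned col_bytes;  /* bytes per row the use expects */
} tile_use;

/* Returns 1 if every register in the region is used with a single shape,
 * 0 if a register index is invalid or a register appears with two shapes. */
int region_shapes_consistent(const tile_use *uses, unsigned count) {
    int      seen[NUM_TILES]  = {0};
    unsigned rows[NUM_TILES]  = {0};
    unsigned colsb[NUM_TILES] = {0};

    for (unsigned i = 0; i < count; i++) {
        unsigned r = uses[i].reg;
        if (r >= NUM_TILES) {
            return 0;
        }
        if (!seen[r]) {
            /* First use fixes the shape for the whole region. */
            seen[r]  = 1;
            rows[r]  = uses[i].rows;
            colsb[r] = uses[i].col_bytes;
        } else if (rows[r] != uses[i].rows || colsb[r] != uses[i].col_bytes) {
            return 0;
        }
    }
    return 1;
}
```

A region that fails this check would need to be split by a further LDTILECFG, which is the costly case the thread is trying to avoid.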
2020 Nov 19
0
[RFC] Intel AMX programming model
Hi Yuanke,
As I said on the review, I think at least Craig should have a look and
approve before landing, as this is a major change in the x86 back-end.
cheers,
--renato
On Thu, 19 Nov 2020 at 02:29, Luo, Yuanke via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi,
>
>
>
> Several months ago, we had some discussion about the Intel AMX programming
> model on llvm-dev. H.J.
2005 Aug 04
2
The killer app for Asterisk in corporate deployment
We're a dealer in Europe selling commercial phone &
building management systems, some residential too.
All the new office buildings have an EIB bus to manage
the lights, clima, security access, etc. The big
companies also have Crestron or AMX automation and
media servers for the boardroom. Asterisk is an
awesome phone solution, but if we could offer a
solution that tied it all together
2024 Oct 22
4
[PATCH v4 0/3] drm/nouveau: Add drm_panic support for nv50+
This series adds basic drm_panic support for nouveau.
I've tested on GTX1650 (Turing), GeForce GT 1030 (Pascal) and
GeForce 8800 GTS (Tesla), running Gnome/Wayland desktop, and in VT.
It should work on other nv50+ cards, but I didn't test them.
To test it, you need to build your kernel with CONFIG_DRM_PANIC=y, and run:
echo c > /proc/sysrq-trigger
or you can enable
2005 Jun 27
5
adding a new log-format escape
I'm adding a new escape to log-format, %s, to print out the checksum
of a file, and I've got a couple problems. They've got to be simple
bugs, but I haven't been able to figure them out. The following patch
gives me a broken pipe and a bus error when I test it. Note that I've
applied the md5 patch beforehand.
diff -Naur rsync-2.6.5-md5/log.c rsync-2.6.5/log.c
---
2020 Mar 25
2
Status of Intel JCC Mitigations and Next Steps
I agree we shouldn’t try to guess what the user is trying to do. There shouldn’t be an unbounded set of heuristic rules; “documented” implies some sort of promise of stability in addition to the actual text in the manual. And we shouldn’t try to guess whether the user’s code cares about the length of a specific instruction.
I think you’re creating a false dichotomy here, though. There’s some
2017 Sep 15
2
What should a truncating store do?
OK, I'm clear on scalars. Data races are thankfully OK in this context.
Densely packing vectors sounds efficient and is clear in the case where
lanes * width is a multiple of 8 bits. I don't think I understand how it
works in other cases.
If we could take store <4 x i8> truncating to <4 x i7> as an example. This
can be converted into four scalar i8 -> i7 stores with