search for: ldtilecfg

Displaying 16 results from an estimated 16 matches for "ldtilecfg".

2020 Sep 04
2
Intel AMX programming model discussion.
...ructure, we need 3 schemes to adapt tile RA to each existing RA. Would you like to finalize the 3 schemes first, or would you like to review the rest of the AMX programming model? We have some limitations in supporting dynamic shapes, and I'd like to hear your advice. A dynamic shape requires that the ldtilecfg post-dominate the point that defines the shape, so we encourage users to define their shapes at the entry of the function. Take the code below as an example. Ideally, we hope to insert ldtilecfg at line 57 to configure a, b, and c, but in this function c's shape {row, col} is defined in each if/else clause. So...
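The code example referred to in this snippet is not included in the search result. As a rough illustration of the placement problem only, the C sketch below contrasts a shape defined inside each if/else clause with a shape defined at function entry. Everything in it is hypothetical: shape_t, fake_ldtilecfg (a stand-in that merely records the shape it is given), and both functions are assumptions made for illustration, not code from the thread or from LLVM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shape type: rows and columns of tile c. */
typedef struct { uint16_t row; uint16_t col; } shape_t;

/* Hypothetical stand-in for ldtilecfg: it only counts calls and
 * records the shape it was asked to configure. */
static int config_count;
static shape_t configured_c;

static void fake_ldtilecfg(shape_t c) {
    assert(c.row > 0 && c.col > 0);  /* a configured tile is non-empty */
    config_count++;
    configured_c = c;
}

/* Shape defined in each if/else clause: the single configuration point
 * cannot sit at function entry. It has to come after the merge, which is
 * the first point that post-dominates both definitions of c. */
static shape_t shape_in_branches(int cond) {
    shape_t c;
    if (cond) {
        c.row = 16; c.col = 64;
    } else {
        c.row = 8;  c.col = 32;
    }
    fake_ldtilecfg(c);
    /* tile uses would have to follow this point */
    return configured_c;
}

/* Shape defined at function entry: the configuration can be placed
 * immediately, so it dominates tile uses in both branches. */
static shape_t shape_at_entry(uint16_t row, uint16_t col, int cond) {
    shape_t c = { row, col };
    fake_ldtilecfg(c);
    if (cond) {
        /* tile uses here */
    } else {
        /* tile uses here */
    }
    return configured_c;
}
```

In the first function, any tile use inside the branches would run before the configuration, which is why the snippet recommends defining shapes at function entry.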
2020 Aug 15
2
Intel AMX programming model discussion.
...onse is skepticism. Philip On 8/14/20 4:49 PM, Luo, Yuanke wrote: [Yuanke] The AMX register is special. It needs to be configured before use, and the config instruction is expensive. To avoid unnecessary tile configuration, we collect as much tile shape information as possible and combine it into one ldtilecfg instruction. The ldtilecfg instruction should dominate any AMX instruction that accesses a tile register. On the other hand, the ldtilecfg should post-dominate the instruction that defines the tile shape. For tile register spill, it should avoid re-config due to the different tile shape, the spilled re...
2020 Nov 19
2
[RFC] Intel AMX programming model
...and provide good ideas to improve the design. After that I implemented the patch [4] and it is reviewed in the LLVM community. The patch covers 6 components. 1. The C interface to the end user. 2. The AMX intrinsics in LLVM IR. 3. The lowering from AMX intrinsics to AMX pseudo instructions. 4. Insert pseudo ldtilecfg and build the def-use between ldtilecfg and AMX instructions. 5. The register allocation for tile registers. 6. Morph AMX pseudo instructions to AMX real instructions. If there is no objection to the patch, I'd like to land it. [1] http://lists.llvm.org/pipermail/llvm-dev/2020-August/143972.html [...
2020 Aug 14
3
Intel AMX programming model discussion.
[Yuanke] The AMX register is special. It needs to be configured before use, and the config instruction is expensive. To avoid unnecessary tile configuration, we collect as much tile shape information as possible and combine it into one ldtilecfg instruction. The ldtilecfg instruction should dominate any AMX instruction that accesses a tile register. On the other hand, the ldtilecfg should post-dominate the instruction that defines the tile shape. For tile register spill, it should avoid re-config due to the different tile shape, the spilled re...
2020 Aug 21
2
Intel AMX programming model discussion.
...e of %2 and %3 is unknown at compile time, so it arbitrarily picks a tile register class that has not been assigned before and assigns that register class to %2 and %3. After register class allocation, the code is transformed as follows. The register classes for %2:vtile1x1 and %3:vtile1x2 are allocated.
    PLDTILECFG
    %1:vtile3x4 = TILELOADDV %17:gr64, 1, %18:gr64_nosp, 0, $noreg
    %2:vtile1x1 = TILELOADDV %17:gr64, 1, %18:gr64_nosp, 0, $noreg
    %3:vtile1x2 = TILELOADDV %17:gr64, 1, %18:gr64_nosp, 0, $noreg
    %21:vtile1x2 = TDPBSSDV %9:vtile1x2(tied-def 0), %1:vtile3x4, %2:vtile1x1
Something I am not figured ou...
2020 Sep 04
2
Intel AMX programming model discussion.
...s there too. Regarding the physical registers, you can grab this information in the pre-rewrite phase. Override addPreRewrite in X86TargetMachine.cpp. You'll need a small pass that records relevant information about the assignments (which, I imagine, is the same small pass that updates the LDTILECFG instructions). For an example of such a pass, see AMDGPU/GCNNSAReassign.cpp > When a tile register is spilled, the shape should also be bound to the > corresponding spill stack slot, so that it can be assigned the > physical tile register with the same shape. > I'm not sure what...
2020 Aug 20
1
Intel AMX programming model discussion.
...s needs to deal with that. We should continue to do instruction scheduling in order to minimize register pressure. Once we assign the right virtual register classes to the AMX instructions, shouldn't this automatically happen? If we do spill, since none of the original live ranges cross the ldtilecfg, then there shouldn't be any fundamental issue with using a regular load/store spill implementation. I'm definitely not an expert in this instruction set, so I may just not understand some aspect of this. If there's something I'm overlooking, a little example would be helpful....
2020 Aug 24
2
Intel AMX programming model discussion.
Hi, Yuanke, Thanks for writing this up. Let me back up a bit because the scheme I proposed last week doesn't work without further modification: within a particular "configuration region" (i.e., the code in between the LDTILECFG and the TILERELEASE (or next LDTILECFG)), each tile register can only be used with one shape, and in addition, no register can have its shape changed without zeroing out all of the tile registers. Thus, just using different register classes for the different shapes, as I had suggested, isn'...
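The constraint described in this snippet (within one configuration region, each tile register is used with a single shape) can be sketched as a small checker in C. This is only an illustration under stated assumptions: op_t, the op_kind values, and region_shapes_consistent are hypothetical names that model an instruction stream as a flat array, and the limit of 8 tile registers is assumed. It is not how LLVM represents or verifies this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an instruction stream. */
enum op_kind { OP_LDTILECFG, OP_TILERELEASE, OP_TILE_USE };

typedef struct {
    enum op_kind kind;
    uint8_t  tile;   /* tile register index, 0..7 (OP_TILE_USE only) */
    uint8_t  rows;   /* shape the instruction expects */
    uint16_t colsb;  /* bytes per row the instruction expects */
} op_t;

/* Returns true if every tile use sits inside a configuration region and
 * each tile register is used with exactly one shape per region. A region
 * starts at LDTILECFG and ends at TILERELEASE or the next LDTILECFG. */
static bool region_shapes_consistent(const op_t *ops, size_t n) {
    uint8_t  rows[8];
    uint16_t colsb[8];
    bool     seen[8];
    bool     configured = false;

    assert(ops != NULL || n == 0);
    memset(rows, 0, sizeof rows);
    memset(colsb, 0, sizeof colsb);
    memset(seen, 0, sizeof seen);

    for (size_t i = 0; i < n; i++) {
        const op_t *op = &ops[i];
        switch (op->kind) {
        case OP_LDTILECFG:            /* new region: forget old shapes */
            configured = true;
            memset(seen, 0, sizeof seen);
            break;
        case OP_TILERELEASE:          /* region ends */
            configured = false;
            memset(seen, 0, sizeof seen);
            break;
        case OP_TILE_USE:
            if (!configured || op->tile >= 8)
                return false;
            if (!seen[op->tile]) {    /* first use fixes the shape */
                seen[op->tile] = true;
                rows[op->tile] = op->rows;
                colsb[op->tile] = op->colsb;
            } else if (rows[op->tile] != op->rows ||
                       colsb[op->tile] != op->colsb) {
                return false;         /* same register, second shape */
            }
            break;
        }
    }
    return true;
}
```

A sequence that reuses tile 0 with a different shape only after a second LDTILECFG passes the check; changing the shape inside one region fails it.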
2020 Aug 19
3
Intel AMX programming model discussion.
The width and height can be runtime values that we would just copy into the 64-byte configuration block we pass to ldtilecfg. So the code doesn't need to be multiversioned. The user code would also use those values to update pointers in the loops they write using the tiles. If we can't determine that two tiles were defined with the same width and height we need to assume the shape is different and try to avoid ev...
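A minimal sketch of copying runtime shape values into the 64-byte configuration block mentioned in this snippet, assuming the palette 1 layout (8 tile registers, at most 16 rows and 64 bytes per row). The names amx_tilecfg, amx_tilecfg_init and amx_tilecfg_set are hypothetical and do not come from the thread. The sketch only fills the block in memory; it does not execute ldtilecfg, so it runs on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type name; the field layout assumes palette 1. */
typedef struct {
    uint8_t  palette_id;    /* byte 0: 1 selects palette 1 */
    uint8_t  start_row;     /* byte 1: restart row for interrupted ops */
    uint8_t  reserved[14];  /* bytes 2-15: kept zero */
    uint16_t colsb[16];     /* bytes 16-47: bytes per row, per tile */
    uint8_t  rows[16];      /* bytes 48-63: row count, per tile */
} amx_tilecfg;

static_assert(sizeof(amx_tilecfg) == 64, "config block must be 64 bytes");
static_assert(offsetof(amx_tilecfg, colsb) == 16, "colsb starts at byte 16");
static_assert(offsetof(amx_tilecfg, rows) == 48, "rows start at byte 48");

/* Zero the block and select palette 1; unused tiles stay at 0 rows. */
static void amx_tilecfg_init(amx_tilecfg *cfg) {
    memset(cfg, 0, sizeof *cfg);
    cfg->palette_id = 1;
}

/* Record a runtime shape for one tile. Returns 0 on success, or -1 if
 * the values fall outside the assumed palette 1 limits. */
static int amx_tilecfg_set(amx_tilecfg *cfg, unsigned tile,
                           unsigned rows, unsigned colsb) {
    if (tile >= 8 || rows == 0 || rows > 16 || colsb == 0 || colsb > 64)
        return -1;
    cfg->rows[tile] = (uint8_t)rows;
    cfg->colsb[tile] = (uint16_t)colsb;
    return 0;
}
```

Because rows and colsb are ordinary function arguments, the same code handles any shape known only at run time. On AMX hardware the address of the filled block would then be handed to ldtilecfg, for example through the _tile_loadconfig intrinsic.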
2020 Aug 18
2
Intel AMX programming model discussion.
...onse is skepticism. Philip On 8/14/20 4:49 PM, Luo, Yuanke wrote: [Yuanke] The AMX register is special. It needs to be configured before use, and the config instruction is expensive. To avoid unnecessary tile configuration, we collect as much tile shape information as possible and combine it into one ldtilecfg instruction. The ldtilecfg instruction should dominate any AMX instruction that accesses a tile register. On the other hand, the ldtilecfg should post-dominate the instruction that defines the tile shape. For tile register spill, it should avoid re-config due to the different tile shape, the spilled re...
2020 Aug 14
2
Intel AMX programming model discussion.
...a pseudo instruction corresponding to it. The AMX intrinsics are lowered to pseudo AMX instructions, which have extra row and column operands corresponding to the AMX intrinsic. The real AMX instructions don't need the row and column operands. The row and column information should be configured by ldtilecfg before executing any AMX instruction. 8. Register allocation. The AMX register is special. It needs to be configured before use, and the config instruction is expensive. To avoid unnecessary tile configuration, we collect as much tile shape information as possible and combine it into one ldtilec...
2020 Aug 14
6
Intel AMX programming model discussion.
...a pseudo instruction corresponding to it. The AMX intrinsics are lowered to pseudo AMX instructions, which have extra row and column operands corresponding to the AMX intrinsic. The real AMX instructions don't need the row and column operands. The row and column information should be configured by ldtilecfg before executing any AMX instruction. 8. Register allocation. The AMX register is special. It needs to be configured before use, and the config instruction is expensive. To avoid unnecessary tile configuration, we collect as much tile shape information as possible and combine it into one ldtilec...
2020 Aug 19
2
Intel AMX programming model discussion.
...n't be useful anyway). 2. Define the tile-configuration instructions so that they implicitly define all of the registers in all of the classes. Then you would still need to pre-schedule the tile operations as you've described, and collect the configuration information in order to add the ldtilecfgs, but the regular register allocator can handle the allocation itself in the usual way. What do you think? -Hal On 8/18/20 6:58 PM, Kaylor, Andrew via llvm-dev wrote: The AMX registers are complicated. The single configuration register (which is mostly used implicitly, similar to MXCSR for floati...
2020 Aug 19
2
Intel AMX programming model discussion.
...n't be useful anyway). 2. Define the tile-configuration instructions so that they implicitly define all of the registers in all of the classes. Then you would still need to pre-schedule the tile operations as you've described, and collect the configuration information in order to add the ldtilecfgs, but the regular register allocator can handle the allocation itself in the usual way. What do you think? -Hal On 8/18/20 6:58 PM, Kaylor, Andrew via llvm-dev wrote: The AMX registers are complicated. The single configuration register (which is mostly used implicitly, similar to MXCSR for floati...
2020 Aug 19
3
Intel AMX programming model discussion.
...n't be useful anyway). 2. Define the tile-configuration instructions so that they implicitly define all of the registers in all of the classes. Then you would still need to pre-schedule the tile operations as you've described, and collect the configuration information in order to add the ldtilecfgs, but the regular register allocator can handle the allocation itself in the usual way. What do you think? -Hal On 8/18/20 6:58 PM, Kaylor, Andrew via llvm-dev wrote: The AMX registers are complicated. The single configuration register (which is mostly used implicitly, similar to MXCSR for floati...
2020 Nov 19
0
[RFC] Intel AMX programming model
...r that I > implemented the patch [4] and it is reviewed in the LLVM community. The patch > covers 6 components. > > 1. The C interface to the end user. > > 2. The AMX intrinsics in LLVM IR. > > 3. The lowering from AMX intrinsics to AMX pseudo instructions. > > 4. Insert pseudo ldtilecfg and build the def-use between ldtilecfg and AMX > instructions. > > 5. The register allocation for tile registers. > > 6. Morph AMX pseudo instructions to AMX real instructions. > > > > If there is no objection to the patch, I’d like to land it. > > > > [1] http://...