search for: tmm0

Displaying 6 results from an estimated 6 matches for "tmm0".

2020 Aug 24
2
Intel AMX programming model discussion.
...mistake on sharing register unit. Can we share a register unit between tile registers that are in different tile register classes (different register classes have different tile shapes)? Think about two virtual tile registers, %2:vtile1x1 and %3:vtile1x2. First %2 is allocated to $tmm0; after that %2 is killed and %3 is allocated to $tmm0. This is not allowed, because when $tmm0 is allocated to %2, its shape is configured to 1x1. If we reallocate $tmm0 to %3, then we need to re-config $tmm0 to 1x2, which causes $tmm0~$tmm7 to be clobbered. Yuanke ...
2020 Sep 04
2
Intel AMX programming model discussion.
...sharing register unit. Can we share a register unit between tile registers that are in different tile register classes (different register classes have different tile shapes)? Think about two virtual tile registers, %2:vtile1x1 and %3:vtile1x2. First %2 is allocated to $tmm0; after that %2 is killed and %3 is allocated to $tmm0. This is not allowed, because when $tmm0 is allocated to %2, its shape is configured to 1x1. If we reallocate $tmm0 to %3, then we need to re-config $tmm0 to 1x2, which causes $tmm0~$tmm7 to be clobbered. ...
2020 Sep 04
2
Intel AMX programming model discussion.
...nke wrote: It seems I made a mistake on sharing register unit. Can we share a register unit between tile registers that are in different tile register classes (different register classes have different tile shapes)? Think about two virtual tile registers, %2:vtile1x1 and %3:vtile1x2. First %2 is allocated to $tmm0; after that %2 is killed and %3 is allocated to $tmm0. This is not allowed, because when $tmm0 is allocated to %2, its shape is configured to 1x1. If we reallocate $tmm0 to %3, then we need to re-config $tmm0 to 1x2, which causes $tmm0~$tmm7 to be clobbered. Yuanke From: Luo, Yuanke Sent: Friday, Au...
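
The reallocation problem described in this message can be illustrated with a toy model. This is only a sketch, not LLVM's register allocator: the `TileFile` class and its `assign` method are hypothetical names, and the model simply assumes, as the message states, that changing the shape of one physical tile register forces a reconfiguration that clobbers $tmm0~$tmm7.

```python
# Toy model (hypothetical names, not LLVM code): track the shape each
# physical tile register was configured with and report which registers
# would be clobbered when a reallocation changes that shape.

NUM_TILES = 8  # $tmm0 .. $tmm7


class TileFile:
    def __init__(self):
        # Configured shape per physical register; None means no virtual
        # register has been assigned to it yet.
        self.shape = [None] * NUM_TILES

    def assign(self, phys, vreg_shape):
        """Assign a virtual register of shape vreg_shape to $tmm<phys>.

        Returns the physical registers whose contents would be lost.
        """
        current = self.shape[phys]
        self.shape[phys] = vreg_shape
        if current is None or current == vreg_shape:
            # Same shape (or first use): no reconfiguration is modelled.
            return []
        # Shape change: assume the whole tile configuration is reloaded,
        # so every tile register is treated as clobbered.
        return list(range(NUM_TILES))


tiles = TileFile()
first = tiles.assign(0, (1, 1))   # %2:vtile1x1 -> $tmm0
second = tiles.assign(0, (1, 2))  # %3:vtile1x2 -> $tmm0 after %2 is killed
print(first)
print(second)
```

In this model the second assignment reports all eight registers as clobbered, which is why the message argues that tile registers of different shapes cannot freely share one physical register.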
2020 Aug 21
2
Intel AMX programming model discussion.
...at this time the register is allocated, we don't know the shape of each physical tile register. So we just insert a pseudo tile config instruction.
3. All tile register classes share the same register unit. We do register allocation by the framework, and the code is transformed as this.
    $tmm0 = TILELOADDV %17:gr64, 1, %18:gr64_nosp, 0, $noreg
    $tmm1 = TILELOADDV %17:gr64, 1, %18:gr64_nosp, 0, $noreg
    $tmm2 = TILELOADDV %17:gr64, 1, %18:gr64_nosp, 0, $noreg
    $tmm2 = TDPBSSDV $tmm2(tied-def 0), $tmm0, $tmm1
4. Run config pass to collect the shape of each physical tile register an...
2020 Aug 20
1
Intel AMX programming model discussion.
On 8/20/20 2:47 PM, Topper, Craig wrote: I think I’m still missing something here. The configuration is per tile. The multiply instructions take a MxK tile and multiply it by a KxN tile and accumulate into an MxN tile. So the configuration needs to know how many of each size of tile it needs to avoid a spill. Wouldn’t the register allocator then need to know which
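
The shape relationship mentioned in this message (an MxK tile times a KxN tile, accumulated into an MxN tile) can be sketched as a small shape calculation. The function names `multiply_shapes` and `tiles_per_shape` are hypothetical, and the dimensions here are abstract element counts; the sketch does not model how the hardware packs bytes into rows.

```python
from collections import Counter


def multiply_shapes(m, k, n):
    """Return the tile shapes used by C(MxN) += A(MxK) * B(KxN)."""
    return {"a": (m, k), "b": (k, n), "c": (m, n)}


def tiles_per_shape(m, k, n):
    """Count how many tiles of each distinct shape one multiply needs."""
    return Counter(multiply_shapes(m, k, n).values())


# Three different shapes are needed when M, K and N all differ in effect.
print(tiles_per_shape(16, 64, 16))
# A single shape suffices when M == K == N.
print(tiles_per_shape(16, 16, 16))
```

A count like this is the kind of per-shape information the message suggests the configuration would need in order to avoid spills.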
2020 Aug 19
3
Intel AMX programming model discussion.
The width and height can be runtime values that we would just copy into the 64-byte configuration block we pass to ldtilecfg. So the code doesn't need to be multiversioned. The user code would also use those values to update pointers in the loops they write using the tiles. If we can't determine that two tiles were defined with the same width and height we need to assume the shape is different
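
As a rough illustration of copying runtime width and height values into the 64-byte block, here is a sketch that builds such a block in plain Python. The helper name `make_tile_config` is hypothetical, and the byte offsets (palette at byte 0, start row at byte 1, 16-bit bytes-per-row entries from byte 16, 8-bit row counts from byte 48) are my reading of Intel's palette 1 layout; they should be checked against the ISA reference before being relied on.

```python
import struct

MAX_TILES = 8      # $tmm0 .. $tmm7
MAX_ROWS = 16      # assumed palette 1 limit
MAX_COLSB = 64     # assumed palette 1 limit, in bytes per row


def make_tile_config(shapes, palette=1):
    """Build a 64-byte tile configuration block.

    shapes: list of (rows, colsb) pairs for $tmm0, $tmm1, ...; the
    values may be computed at run time.
    """
    if len(shapes) > MAX_TILES:
        raise ValueError("at most 8 tiles can be configured")
    cfg = bytearray(64)          # unused bytes stay zero
    cfg[0] = palette             # palette id
    cfg[1] = 0                   # start row
    for i, (rows, colsb) in enumerate(shapes):
        if not (0 <= rows <= MAX_ROWS and 0 <= colsb <= MAX_COLSB):
            raise ValueError(f"tile {i}: shape out of range")
        struct.pack_into("<H", cfg, 16 + 2 * i, colsb)  # bytes per row
        cfg[48 + i] = rows                               # row count
    return bytes(cfg)


cfg = make_tile_config([(16, 64), (8, 32)])
print(len(cfg), cfg[48], cfg[49])
```

Because the shapes are ordinary function arguments, the same code path serves any width and height, which matches the point that the code does not need to be multiversioned.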