Roman Gareev via llvm-dev
2016-Jul-05 06:55 UTC
[llvm-dev] [GSoC 2016] Implementation of the packing transformation
>> 1. Could we set MemoryAccess::NewAccessRelation to modify an access >> relation of a memory access? As far as I understand, >> MemoryAccess::NewAccessRelation is only used to import memory accesses >> from JSCoPs > > I already use it in Polly-ACC to modify access relations. If you just > want to modify the subscript expression, this should just work. Changing > the accessed array may also already work, but is a lot less tested and > the information is only partially updated such that further interface > changes would be required.Thank you for the information!>> I think that MemoryAccess::NewAccessRelation could be used in the >> implementation and also help to perform, for example, array expansion. > > Yes. > >> However, the modification would probably cause issues. For example, if >> we apply tiling, we'll have to substitute old induction variables >> before analyzing access functions. > > I do not understand this issue. Tiling is a schedule-only transformation > that only considers data-dependences for legality. How do access > functions come into play here? Why do induction variables need to be > substituted. > > If you did not see this yet, any modification of the polyhedral access > function will (except for non-affine accesses) directly be > translated to LLVM-IR. This means, we need (almost) no code generation > changes to make this work.Yes, I agree, tiling is a schedule-only transformation that only considers data-dependences for legality and any modification of the polyhedral access function is directly translated to LLVM-IR. I just wanted to say that if a schedule tree is modified in ScheduleOptimizer and new induction variables are created (e.g. in case of tiling), access relations and new access relations may stay unchanged. It should probably be taken into account, if we were going to modify an access relation according to its previous state. For example, in case of matrix multiplication, we get the following isl ast if (1 && (&MemRef_7[1023][1056] <= &MemRef_5[0][0] || &MemRef_5[1055][1056] <= &MemRef_7[0][0]) && (&MemRef_6[1055][1024] <&MemRef_5[0][0] || &MemRef_5[1055][1056] <= &MemRef_6[0][0])) { for (int c0 = 0; c0 <= 1055; c0 += 1) for (int c1 = 0; c1 <= 1055; c1 += 1) Stmt_12(c0, c1); for (int c0 = 0; c0 <= 1055; c0 += 1) for (int c1 = 0; c1 <= 1055; c1 += 1) for (int c2 = 0; c2 <= 1023; c2 += 1) Stmt_17(c0, c1, c2); } else { /* original code */ } and the following memory access is generated (the corresponding new memory access is NULL): { Stmt_17[i0, i1, i2] -> MemRef_6[i0, i2] } If we try to create BLIS micro and macro kernels, we get the following isl ast { // 1st level tiling - Tiles for (int c0 = 0; c0 <= 32; c0 += 1) for (int c1 = 0; c1 <= 32; c1 += 1) { // 1st level tiling - Points for (int c2 = 0; c2 <= 31; c2 += 1) for (int c3 = 0; c3 <= 31; c3 += 1) Stmt_12(32 * c0 + c2, 32 * c1 + c3); } // 1nd level tiling - Tiles for (int c0 = 0; c0 <= 65; c0 += 1) for (int c1 = 0; c1 <= 3; c1 += 1) for (int c2 = 0; c2 <= 10; c2 += 1) { // 1nd level tiling - Points // Register tiling - Tiles for (int c3 = 0; c3 <= 3; c3 += 1) for (int c4 = 0; c4 <= 11; c4 += 1) for (int c5 = 0; c5 <= 255; c5 += 1) { // Register tiling - Points // 1st level tiling - Tiles // 1st level tiling - Points { Stmt_17(16 * c0 + 4 * c3, 96 * c2 + 8 * c4, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3, 96 * c2 + 8 * c4 + 1, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3, 96 * c2 + 8 * c4 + 2, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3, 96 * c2 + 8 * c4 + 3, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3, 96 * c2 + 8 * c4 + 4, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3, 96 * c2 + 8 * c4 + 5, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3, 96 * c2 + 8 * c4 + 6, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3, 96 * c2 + 8 * c4 + 7, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3 + 1, 96 * c2 + 8 * c4, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3 + 1, 96 * c2 + 8 * c4 + 1, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3 + 1, 96 * c2 + 8 * c4 + 2, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3 + 1, 96 * c2 + 8 * c4 + 3, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3 + 1, 96 * c2 + 8 * c4 + 4, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3 + 1, 96 * c2 + 8 * c4 + 5, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3 + 1, 96 * c2 + 8 * c4 + 6, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3 + 1, 96 * c2 + 8 * c4 + 7, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3 + 2, 96 * c2 + 8 * c4, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3 + 2, 96 * c2 + 8 * c4 + 1, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3 + 2, 96 * c2 + 8 * c4 + 2, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3 + 2, 96 * c2 + 8 * c4 + 3, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3 + 2, 96 * c2 + 8 * c4 + 4, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3 + 2, 96 * c2 + 8 * c4 + 5, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3 + 2, 96 * c2 + 8 * c4 + 6, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3 + 2, 96 * c2 + 8 * c4 + 7, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3 + 3, 96 * c2 + 8 * c4, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3 + 3, 96 * c2 + 8 * c4 + 1, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3 + 3, 96 * c2 + 8 * c4 + 2, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3 + 3, 96 * c2 + 8 * c4 + 3, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3 + 3, 96 * c2 + 8 * c4 + 4, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3 + 3, 96 * c2 + 8 * c4 + 5, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3 + 3, 96 * c2 + 8 * c4 + 6, 256 * c1 + c5); Stmt_17(16 * c0 + 4 * c3 + 3, 96 * c2 + 8 * c4 + 7, 256 * c1 + c5); } } } } else { /* original code */ } However, if we dump the memory access and the new memory access after creation of the BLIS micro-kernel (e.g. after call of the ScheduleTreeOptimizer::optimizeMatMulPattern), we'll see that they remain the same. In this case, we should probably substitute old induction variables, if we wanted to determine that the memory access actually has the following form Stmt_17[c0, c1, c2, c3, c4, c5, c6, c7] -> MemRef_6[16 * c0 + 4 * c3 + c6, 96 * c2 + 8 * c4, 256 * c1 + c5] and replace it with, for example, the following memory access Stmt_17[c0, c1, c2, c3, c4, c5, c6, c7] -> MemRef_x[4 * (c5 + 256 * c3) + c6]>> I'm not sure whether it's possible >> to do it in ScheduleOptimizer. Another possible issue could arise >> during import from JSCoPs. In this case, we probably shouldn't modify >> imported access relations. > > You mean we should not overwrite the changes that have been imported? I > think overwriting them is OK. If the user prefers to not run schedule > optimizations after the input, he can always disable the schedule > optimizer when importing changes.OK.>> 2. What should we do to change an array referenced by a memory access? >> >> If I'm not mistaken, we can set the value of MemoryAccess::BasePtr to do >> it. > > To just make it work, it should be sufficient to set an access function > that has an isl_id that points to a different ScopArrayInfo object. > > See: IslExprBuilder::createAccessAddress > > BaseExpr = isl_ast_expr_get_op_arg(Expr, 0); > BaseId = isl_ast_expr_get_id(BaseExpr); > isl_ast_expr_free(BaseExpr); > > const ScopArrayInfo *SAI = ScopArrayInfo::getFromId(BaseId); > Base = SAI->getBasePtr(); > > Now, just resetting the access function leaves the MemoryAccess in a > rather inconsistent state. It would be good if we could think about what > is needed to provide a consistent interface for such kind of > functionality.What do you mean by an inconsistent state? Is it a wrong value of, for example, the following fields: MemoryAccess::BaseName, MemoryAccess::Sizes, MemoryAccess::ElementType?> Also, having jscop test cases to test this functionality would be great.Yes, I think it would be good.>> 3. Could we copy data to a new array in IslNodeBuilder::createMark? >> >> We can probably create mark nodes, which contain references to memory >> accesses that should be modified. Subsequently, using information >> about original and new access relations, IslNodeBuilder::createMark >> would generate functions that perform such copying. > > Why mark nodes?If I'm not mistaken, position of a mark node in a schedule tree determines a place in an isl ast, where it should be generated. I think that it's convenient, if we want to specify where to perform copying.> An option I have been thinking of was to create new - > virtual ScopStmts that have specific semantics. E.g. ones that copy the > content from one array location to another. We could test these using > jscop and then use them in your schedule transformation to implement the > packing.Yes, maybe it's a better option. What are 'new virtual ScopStmts'? Is it a new class, which we should add? Would its objects be created during the work of ScheduleOptimizer?> Michael probably has some opinion here. He played in his delicm work > with some changes that also require more complex modifications of memory > accesses. There is no specific commit (AFAIU), but some changes are > mixed in with his prototype work: > > https://github.com/Meinersbur/polly/commits/delicm2Thank you for the information. I'll look for such changes. -- Cheers, Roman Gareev.