Roman Gareev via llvm-dev
2016-Jun-02 11:57 UTC
[llvm-dev] [GSoC 2016] Parameters of a target architecture
Dear LLVM contributors,

I am working on the "Improvement of vectorization process in Polly". At the moment I'm trying to implement tiling, interchanging and unrolling of specific loops based on the algorithm for analytical modeling described in [1]. It requires the following parameters of the target architecture:

1. Size of a double-precision floating-point number.
2. Number of double-precision floating-point numbers that can be held by a vector register.
3. Throughput of vector instructions per clock cycle.
4. Latency of instructions (i.e., the minimum number of cycles between the issuance of two dependent consecutive instructions).
5. Parameters of the cache levels (size of cache lines, associativity degrees, sizes).

Could you please advise me where I can find this information? If I'm not mistaken, TargetTransformInfo.h provides the size of a cache line and the width of the largest vector register (which probably helps to determine the second parameter).

I would be very grateful for your comments, feedback and ideas.

Refs.:
[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf

--
Cheers,
Roman Gareev.
Michael Kruse via llvm-dev
2016-Jun-03 11:41 UTC
2016-06-02 13:57 GMT+02:00 Roman Gareev via llvm-dev <llvm-dev at lists.llvm.org>:
> 1. Size of double-precision floating-point number.

By IEEE 754, it's always 8 bytes.

Generally, use DataLayout::getTypeAllocSize/getTypeStoreSize to get a type's size.

> 2. Number of double-precision floating-point numbers that can be held
> by a vector register.

TargetTransformInfo::getRegisterBitWidth (divided by 8 if double)

> 3. Throughput of vector instructions per clock cycle.

TargetTransformInfo::getArithmeticInstrCost

See LoopVectorizationCostModel::getInstructionCost for how to use it.

> 4. Latency of instructions (i.e., the minimum number of cycles between
> the issuance of two dependent consecutive instructions).

I think latency and throughput cannot be queried separately. They are combined as 'cost'.

> 5. Parameters of cache levels (size of cache lines, associativity
> degrees, sizes).

TargetTransformInfo::getCacheLineSize

That is available for one level only (probably the L1 data cache). I think the X86 backend doesn't even define it (it returns 0).

For the information that is missing, I suggest using command line options to get the information directly from the user. In the long term, we could add it to TargetTransformInfo as well.

Michael
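A minimal sketch of the fallback suggested above: use the value from TargetTransformInfo::getCacheLineSize when the target provides one, and otherwise a value supplied by the user. The function name `cacheLineSize` and its parameters are hypothetical (not an LLVM or Polly API); the two inputs merely stand in for the TTI query result and a command-line option, since registering a real option is outside the scope of this sketch. It assumes C++17 for `std::optional`.

```cpp
#include <optional>

// Hypothetical helper, not an LLVM or Polly API.
// `ttiLineSize` stands for the result of TargetTransformInfo::getCacheLineSize();
// 0 means the target does not provide it (as Michael suspects for X86).
// `userLineSize` stands for a value passed via a command-line option;
// 0 means the user did not set it.
constexpr std::optional<unsigned> cacheLineSize(unsigned ttiLineSize,
                                                unsigned userLineSize) {
  if (ttiLineSize != 0)
    return ttiLineSize;   // prefer what the target reports
  if (userLineSize != 0)
    return userLineSize;  // otherwise fall back to the user-supplied value
  return std::nullopt;    // unknown: the caller has to decide what to do
}
```

Returning an empty `std::optional` instead of a made-up default keeps the "unknown" case explicit, so the tiling code can decide whether to skip the optimization or require the option.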
Tobias Grosser via llvm-dev
2016-Jun-03 11:51 UTC
On 06/03/2016 01:41 PM, Michael Kruse wrote:
> For the information that is missing, I suggest using command line
> options to get the information directly from the user. In the long
> term, we could add it to TargetTransformInfo as well.

Perfect. That's what I would suggest as well.

Best,
Tobias
Michael Kruse via llvm-dev
2016-Jun-03 12:25 UTC
2016-06-03 13:41 GMT+02:00 Michael Kruse <llvmdev at meinersbur.de>:
>> 2. Number of double-precision floating-point numbers that can be held
>> by a vector register.
>
> TargetTransformInfo::getRegisterBitWidth (divided by 8 if double)

Because it is getRegister_Bit_Width, divide by 64 or, more generally, by DataLayout::getTypeSizeInBits().

Sorry for the mistake.

Michael
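The corrected arithmetic as a small standalone sketch. `elementsPerVectorRegister` is a hypothetical helper, not an LLVM function; in a real pass, `registerBitWidth` would come from TargetTransformInfo::getRegisterBitWidth (whose exact signature has changed between LLVM versions) and `elementBits` from DataLayout::getTypeSizeInBits, which is 64 for double.

```cpp
// Hypothetical helper, not an LLVM API: number of elements of a given bit
// size that fit into one vector register of the given bit width.
constexpr unsigned elementsPerVectorRegister(unsigned registerBitWidth,
                                             unsigned elementBits) {
  // Guard against a zero element size instead of dividing by zero.
  return elementBits == 0 ? 0 : registerBitWidth / elementBits;
}

// Doubles are 64 bits wide: a 256-bit AVX register holds 256 / 64 = 4 of
// them and a 128-bit SSE register holds 2. Dividing by 8 instead would
// have given 32 and 16, which counts bytes, not doubles.
static_assert(elementsPerVectorRegister(256, 64) == 4, "AVX: 4 doubles");
static_assert(elementsPerVectorRegister(128, 64) == 2, "SSE: 2 doubles");
```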
Roman Gareev via llvm-dev
2016-Jun-03 16:25 UTC
2016-06-03 16:41 GMT+05:00 Michael Kruse <llvmdev at meinersbur.de>:
> For the information that is missing, I suggest using command line
> options to get the information directly from the user. In the long
> term, we could add it to TargetTransformInfo as well.

Thank you very much for the detailed information and ideas!

--
Cheers,
Roman Gareev.