thr3ads.net - llvm dev - [llvm-dev] [GSoC 2016] Parameters of a target architecture [Jun 2016]

Roman Gareev via llvm-dev

2016-Jun-02 11:57 UTC

[llvm-dev] [GSoC 2016] Parameters of a target architecture

Dear LLVM contributors,

I am working on the project "Improvement of vectorization process in
Polly". At the moment I'm trying to implement tiling, interchanging and
unrolling of specific loops based on the analytical modeling algorithm
described in [1]. It requires information about the following
parameters of a target architecture:

1. Size of double-precision floating-point number.

2. Number of double-precision floating-point numbers that can be held
by a vector register.

3. Throughput of vector instructions per clock cycle.

4. Latency of instructions (i.e., the minimum number of cycles between
the issuance of two dependent consecutive instructions).

5. Parameters of cache levels (size of cache lines, associativity
degrees, sizes).

Could you please advise me where I can find such information? If I'm
not mistaken, we can get the size of a cache line and the width of the
largest vector register (which probably helps to determine the second
parameter) from TargetTransformInfo.h.

I would be very grateful for your comments, feedback and ideas.

Refs.:

[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf

-- 
                                    Cheers, Roman Gareev.

Michael Kruse via llvm-dev

2016-Jun-03 11:41 UTC

2016-06-02 13:57 GMT+02:00 Roman Gareev via llvm-dev <llvm-dev at lists.llvm.org>:
> 1. Size of double-precision floating-point number.
By IEEE 754, it's always 8 bytes.

Generally, use DataLayout::getTypeAllocSize/getTypeStoreSize to get a
type's size.

> 2. Number of double-precision floating-point numbers that can be held
> by a vector register.
TargetTransformInfo::getRegisterBitWidth (divided by 8 if double)

> 3. Throughput of vector instructions per clock cycle.
TargetTransformInfo::getArithmeticInstrCost

See LoopVectorizationCostModel::getInstructionCost for how to use it.

> 4. Latency of instructions (i.e., the minimum number of cycles between
> the issuance of two dependent consecutive instructions).
I think latency and throughput cannot be queried separately. They are
combined as 'cost'.

> 5. Parameters of cache levels (size of cache lines, associativity
> degrees, sizes).
TargetTransformInfo::getCacheLineSize

That is available for one level only (probably the L1 data cache). I
think the X86 backend doesn't even define it (it returns 0).


For the information that is missing, I suggest using command-line
options to get it directly from the user. In the long term, we could
add it to TargetTransformInfo as well.


Michael
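
The fallback suggested above (use what the target reports, otherwise take the value from the user) could be sketched roughly as follows. This is only an illustration in plain C++: CacheParams, pickParam, resolveCacheParams and numCacheSets are hypothetical names, not LLVM or Polly APIs, and the convention that 0 means "not reported" is an assumption taken from the remark that getCacheLineSize may return 0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the parameters of one cache level.
// This is not an LLVM type; a value of 0 means "unknown".
struct CacheParams {
  std::uint64_t LineSizeBytes;
  std::uint64_t SizeBytes;
  std::uint64_t Associativity;
};

// Prefer the value reported by the target; fall back to the value the
// user supplied (for example through a command-line option) when the
// target reports 0.
inline std::uint64_t pickParam(std::uint64_t Queried,
                               std::uint64_t UserSupplied) {
  return Queried != 0 ? Queried : UserSupplied;
}

// Apply the fallback to every field of a cache level.
inline CacheParams resolveCacheParams(const CacheParams &Queried,
                                      const CacheParams &User) {
  return CacheParams{pickParam(Queried.LineSizeBytes, User.LineSizeBytes),
                     pickParam(Queried.SizeBytes, User.SizeBytes),
                     pickParam(Queried.Associativity, User.Associativity)};
}

// Number of sets of a set-associative cache:
// size / (line size * associativity).
inline std::uint64_t numCacheSets(const CacheParams &P) {
  assert(P.LineSizeBytes != 0 && P.Associativity != 0 &&
         "line size and associativity must be known");
  return P.SizeBytes / (P.LineSizeBytes * P.Associativity);
}
```

In an actual pass the user-supplied values would presumably come from command-line options, and the queried ones from TargetTransformInfo where it provides them.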

Tobias Grosser via llvm-dev

2016-Jun-03 11:51 UTC

On 06/03/2016 01:41 PM, Michael Kruse wrote:
> For the information that is missing, I suggest using command-line
> options to get it directly from the user. In the long term, we could
> add it to TargetTransformInfo as well.
Perfect. That's what I would suggest as well.

Best,
Tobias

Michael Kruse via llvm-dev

2016-Jun-03 12:25 UTC

2016-06-03 13:41 GMT+02:00 Michael Kruse <llvmdev at meinersbur.de>:
>> 2. Number of double-precision floating-point numbers that can be held
>> by a vector register.
>
> TargetTransformInfo::getRegisterBitWidth (divided by 8 if double)
Because it is getRegisterBitWidth (a width in bits, not bytes), divide
by 64 for double or, more generally, by DataLayout::getTypeSizeInBits().

Sorry for the mistake.


Michael
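
To make the corrected arithmetic concrete, here is a small sketch in plain C++. The helper name elementsPerRegister is hypothetical; in a real pass the two inputs would presumably come from TargetTransformInfo::getRegisterBitWidth and DataLayout::getTypeSizeInBits, which are deliberately not called here so that the example stays self-contained.

```cpp
#include <cassert>
#include <cstdint>

// Number of elements of a given type that fit into one vector register.
// Both arguments are widths in bits: a 64-bit double in a 256-bit
// register gives 256 / 64 = 4 elements.
inline std::uint64_t elementsPerRegister(std::uint64_t RegisterBits,
                                         std::uint64_t ElementBits) {
  assert(ElementBits != 0 && "element width must be non-zero");
  return RegisterBits / ElementBits;
}
```

Dividing a 256-bit width by 8 instead would yield 32, which is the register size in bytes rather than the number of doubles it can hold.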

Roman Gareev via llvm-dev

2016-Jun-03 16:25 UTC

2016-06-03 16:41 GMT+05:00 Michael Kruse <llvmdev at meinersbur.de>:
> For the information that is missing, I suggest using command-line
> options to get it directly from the user. In the long term, we could
> add it to TargetTransformInfo as well.
Thank you very much for the detailed information and ideas!

-- 
                                    Cheers, Roman Gareev.
