For a while now we (Cray) have had some very primitive cache structure
information encoded into our version of LLVM. Given the more complex
memory structures introduced by Bulldozer and various accelerators, it's
time to do this Right (tm).

So I'm looking for some feedback on a proposed design.

The goal of this work is to provide Passes with useful information such
as cache sizes, resource sharing arrangements, etc. so that they may do
transformations to improve memory system performance.

Here's what I'm thinking this might look like (sketched in code after
this message):

- Add two new structures to the TargetMachine class: TargetMemoryInfo
  and TargetExecutionEngineInfo.

- TargetMemoryInfo will initially contain cache hierarchy information.
  It will hold a list of CacheLevelInfo objects, each of which specifies
  at least the total size of the cache at that level. It may also
  include other useful bits like associativity, inclusivity, etc.

- TargetMemoryInfo could later be extended with information about
  "special" memory regions such as the local and shared memories typical
  on accelerators. This should tie into the address space mechanism
  somehow.

- TargetExecutionEngineInfo (probably needs a better name) will contain
  a list of ExecutionResourceInfo objects describing threads, cores,
  modules, sockets, etc. For example, on a Bulldozer-based system we
  would have a set of cores contained in a module, a set of modules
  contained in a socket, and so on.

- Each ExecutionResourceInfo object would carry a name identifying the
  grouping ("thread", "core", etc.) along with the number of execution
  resources it contains. For example, a "core" object might specify
  that it contains two "threads".

- ExecutionResourceInfo objects would also link to CacheLevelInfo
  objects to model how the various cache levels are shared. For
  example, on a Bulldozer system the "core" object would link to the L1
  CacheLevelInfo object, indicating that L1 is private to a "core"; a
  "module" object would link to the L2 CacheLevelInfo object, indicating
  that L2 is private to a "module" but shared by the "cores" within it,
  and so on.

I don't particularly like the names TargetExecutionEngineInfo and
ExecutionResourceInfo but couldn't come up with anything better. Any
ideas?

Does this seem like a reasonable approach?

-Dave
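As a rough illustration, the proposed structures might be declared along
these lines. All names and fields here are hypothetical and simply mirror
the proposal above; none of them are existing LLVM APIs.

  // Hypothetical sketch only: these classes do not exist in LLVM and
  // simply mirror the proposal above.
  #include <cstdint>
  #include <string>
  #include <vector>

  // One level of the cache hierarchy.
  struct CacheLevelInfo {
    unsigned Level;          // 1 for L1, 2 for L2, ...
    uint64_t SizeInBytes;    // total capacity of this level
    unsigned Associativity;  // 0 if unknown
    bool Inclusive;          // inclusive of the levels below it?
  };

  // Would hang off TargetMachine; initially just the cache hierarchy.
  class TargetMemoryInfo {
    std::vector<CacheLevelInfo> CacheLevels;   // index 0 is L1
  public:
    unsigned getNumCacheLevels() const {
      return (unsigned)CacheLevels.size();
    }
    const CacheLevelInfo &getCacheLevel(unsigned Idx) const {
      return CacheLevels[Idx];
    }
  };

  // One grouping of execution resources: "thread", "core", "module", ...
  struct ExecutionResourceInfo {
    std::string Name;                        // e.g. "core"
    unsigned NumContained;                   // e.g. a core contains 2 threads
    const ExecutionResourceInfo *Contained;  // the grouping it contains, if any
    const CacheLevelInfo *PrivateCache;      // cache private to this grouping
  };

  // Would also hang off TargetMachine; describes the
  // socket/module/core/thread nesting.
  class TargetExecutionEngineInfo {
    std::vector<ExecutionResourceInfo> Resources;  // outermost grouping first
  public:
    unsigned getNumResources() const {
      return (unsigned)Resources.size();
    }
    const ExecutionResourceInfo &getResource(unsigned Idx) const {
      return Resources[Idx];
    }
  };

On Bulldozer, for instance, the "module" entry would contain two "core"
entries and point at the shared L2 CacheLevelInfo, while each "core" would
point at its private L1.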
On Tue, May 3, 2011 at 8:40 AM, David Greene <dag at cray.com> wrote:
> The goal of this work is to provide Passes with useful information such
> as cache sizes, resource sharing arrangements, etc. so that they may do
> transformations to improve memory system performance.
> [...]
> Does this seem like a reasonable approach?

The names and the exact information stored don't seem like they really
need review; they're easy to change later. Just two questions:

1. What is the expected use? Are we talking about loop optimizations here?
2. IR-level passes don't have access to a TargetMachine; is that okay?

-Eli
hi,

On Wed, May 4, 2011 at 12:20 AM, Eli Friedman <eli.friedman at gmail.com> wrote:
> 2. IR-level passes don't have access to a TargetMachine; is that okay?

I think they can be implemented as an ImmutablePass, just like TargetData.

best regards
ether
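To illustrate ether's suggestion, here is a minimal sketch of exposing the
data through an ImmutablePass, analogous to how TargetData is provided
today. It assumes the hypothetical TargetMemoryInfo class sketched above;
pass registration and initialization details are omitted, and none of this
is existing LLVM code.

  #include "llvm/Pass.h"

  namespace llvm {

  // Hypothetical: wraps the TargetMemoryInfo sketched earlier so that
  // IR-level passes can query it without a TargetMachine.
  class TargetMemoryInfoWrapper : public ImmutablePass {
    TargetMemoryInfo TMI;
  public:
    static char ID;
    TargetMemoryInfoWrapper() : ImmutablePass(ID) {}
    explicit TargetMemoryInfoWrapper(const TargetMemoryInfo &T)
        : ImmutablePass(ID), TMI(T) {}

    const TargetMemoryInfo &getTargetMemoryInfo() const { return TMI; }
  };

  char TargetMemoryInfoWrapper::ID = 0;

  } // namespace llvm

An IR-level pass would then addRequired<TargetMemoryInfoWrapper>() in its
getAnalysisUsage and call getAnalysis<TargetMemoryInfoWrapper>(), just as
passes consume TargetData today, with the driver adding the pass populated
for the chosen subtarget.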
Eli Friedman <eli.friedman at gmail.com> writes:

> 1. What is the expected use? Are we talking about loop optimizations here?

Initially, anything that is interested in cache configuration would find
this useful. This might include:

- cache blocking (a rough sketch follows this message)
- prefetching
- reuse analysis

I think mostly it would be loop-level stuff (that's where the time is
spent, after all), but I also know of various papers that do IPO
cache-related analysis and transformation.

> 2. IR-level passes don't have access to a TargetMachine; is that okay?

I thought about that too. I don't know of a better place to put it,
because this is very (sub)target-specific stuff. I think in the future we
may want to consider a generic interface for IR-level passes to query
target-specific parameters that are generally useful; cache structure
would be one.

Our (Cray) current uses are all in Machine-level passes, but that's
because most of our analysis and transformation is done outside LLVM.

Mostly I'm concerned about getting the abstraction right. Or at least
reasonable. :)

-Dave
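As a purely illustrative example of the cache-blocking use, a tiling
heuristic might consume the hypothetical TargetMemoryInfo sketched earlier
roughly like this (the function and its policy are made up for the sake of
the example):

  #include <cmath>
  #include <cstdint>

  // Pick a square tile size for a blocked matrix traversal so that three
  // tiles (e.g. A, B and C of a matmul) fit in L1. Illustrative only.
  unsigned pickTileSize(const TargetMemoryInfo &TMI, unsigned BytesPerElem) {
    if (TMI.getNumCacheLevels() == 0)
      return 64;                               // no cache data: fixed default
    uint64_t L1Bytes = TMI.getCacheLevel(0).SizeInBytes;
    uint64_t ElemsPerTile = L1Bytes / (3 * BytesPerElem);
    unsigned Tile = (unsigned)std::sqrt((double)ElemsPerTile);
    return Tile ? Tile : 1;
  }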
Hi Dave,

Can you describe which passes may benefit from this information? My
intuition is that until there are a number of passes which require this
information, there are other ways to provide it. One way would be to use
metadata.

Having said that, I do share the feeling that IR-level optimizations often
need more target-specific information. For example, vectorizing compilers
need to know which instruction set the target has, etc. To this end, we
have implemented a new 'instcombine-like' pass containing optimizations
that would have gone into 'instcombine' had we had more information about
the target.

Nadav
"Rotem, Nadav" <nadav.rotem at intel.com> writes:> Can you describe which passes may benefit from this information ? My > intuition is that until there are a number of passes which require > this information, there are other ways to provide this > information. One way would be to use Metadata.We have Cray-specific passes that use this information. Some of the stuff Polly is doing almost certainly would benefit. Metadata seems a very clunky way to do this. It is so target-specific that it would render IR files completely target-dependent. These are rather complex structures we're talking about. Encoding it in metadata would be inconvenient.> Having said that, I do share the feeling that IR-level optimization > often need more target-specific information. For example, vectorizing > compilers need to know which instructions set the target has, etc.Yep, absolutely.> To this end, we have implemented a new 'instcombine-like' pass which > has optimizations which should have gone into 'instcombine' had we had > more information about the target.Right. Exposing some target attributes via generic queries (e.g. what's the max vector length for this scalar type?) has been on my wishlist for a while now. -Dave