Renato Golin via llvm-dev
2018-Nov-05 17:08 UTC
[llvm-dev] RFC: System (cache, etc.) model for LLVM
On Mon, 5 Nov 2018 at 15:56, David Greene <dag at cray.com> wrote:
> The cache interfaces are flexible enough to allow passes to answer
> questions like, "how much effective cache is available for this core
> (thread, etc.)?" That's a critical question to reason about the
> thrashing behavior you mentioned above.
>
> Knowing the cache line size is important for prefetching and various
> other memory operations such as streaming.
>
> Knowing the number of ways can allow one to guesstimate which memory
> accesses are likely to collide in the cache.
>
> It also happens that all of these parameters are useful for simulation
> purposes, which may help projects like llvm-mca.

I see. So, IIGIR, initially, this would consolidate the prefetching
infrastructure, which is a worthy goal in itself and would require a
minimalist implementation for now.

But later, vectorisers could use that info, for example, to understand
how much it would be beneficial to unroll vectorised loops (where the
total access size should be a multiple of the cache line), etc.

Ultimately, simulations would be an interesting use of it, but
shouldn't be a driving force for additional features bundled into the
initial design.

> I'm not quite grasping this. Are you saying that a particular subtarget
> may have multiple "clusters" of big.LITTLE cores and that each cluster
> may look different from the others?

Yeah, "big.LITTLE" [1] is a marketing name and can mean a bunch of
different scenarios. For example:

- A list of big+little core pairs, each pair seen by the kernel as a
  single core but actually being two separate cores, scheduled by the
  kernel via frequency scaling.
- Two entirely separate clusters, flipped between all big or all little.
- A heterogeneous mix, which could have different numbers of big and
  little cores with no need for cache coherence between them. Junos
  have two little and four big, Tegras have one little and four big.
  There are also other designs with dozens of huge cores plus a tiny
  core for management purposes.

But it's worse, because different releases of the same family can have
different core counts or change model (clustered/bundled/heterogeneous),
and there's currently no way to represent that in TableGen.

Given that the kernel has such a high influence on how those cores get
scheduled and preempted, I don't think there's any hope that the
compiler can do a good job at predicting usage or having any real
impact amidst higher-level latency, such as context switches and
system calls.

--
cheers,
--renato

[1] https://en.wikipedia.org/wiki/ARM_big.LITTLE
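[Editor's note: to make the unrolling point above concrete, here is a
minimal, hypothetical sketch. The function and parameter names are
illustrative assumptions, not the query interface the RFC proposes; it
only shows how a vectoriser might pick an interleave factor so that the
bytes touched per unrolled iteration are a whole multiple of the cache
line.]

    // Hypothetical sketch: pick an interleave factor from the cache line
    // size and the vector register width (both assumed values here).
    #include <cstdio>

    unsigned chooseInterleave(unsigned CacheLineBytes, unsigned VectorBytes) {
      // Smallest factor <= 16 that makes VectorBytes * Factor a whole
      // multiple of the cache line; fall back to no interleaving.
      for (unsigned Factor = 1; Factor <= 16; ++Factor)
        if ((VectorBytes * Factor) % CacheLineBytes == 0)
          return Factor;
      return 1;
    }

    int main() {
      // 64-byte lines with 128-bit (16-byte) vectors -> interleave by 4.
      std::printf("interleave = %u\n", chooseInterleave(64, 16));
    }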
David Greene via llvm-dev
2018-Nov-05 19:03 UTC
[llvm-dev] RFC: System (cache, etc.) model for LLVM
Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> writes:
> So, IIGIR, initially, this would consolidate the prefetching
> infrastructure, which is a worthy goal in itself and would require a
> minimalist implementation for now.

Yes.

> But later, vectorisers could use that info, for example, to understand
> how much it would be beneficial to unroll vectorised loops (where the
> total access size should be a multiple of the cache line), etc.

Exactly!

> Ultimately, simulations would be an interesting use of it, but
> shouldn't be a driving force for additional features bundled into the
> initial design.

I agree simulation isn't the primary motivation, but it's a nice
side-effect. We use all of these parameters today, so they are useful.

> But it's worse, because different releases of the same family can have
> different core counts or change model (clustered/bundled/heterogeneous),
> and there's currently no way to represent that in TableGen.

Yes, this is exactly the SKU problem I mentioned. I don't have a good
solution for that other than to say that we've found that a generic
model per major subtarget can work well enough across different SKUs.
As currently constructed, the model is intended to be a resource for
heuristics, so getting things wrong is "just" a performance hit.

I guess it would be up to the people interested in a particular target
to figure out a reasonable, maintainable way to manage models for
possibly many subtargets. This proposal is about providing
infrastructure to allow models to be created without too much effort.
It doesn't say anything about what models for a particular
target/subtarget should look like. :)

> Given that the kernel has such a high influence on how those cores get
> scheduled and preempted, I don't think there's any hope that the
> compiler can do a good job at predicting usage or having any real
> impact amidst higher-level latency, such as context switches and
> system calls.

Sure. In those cases a model isn't that useful. Not every subtarget
needs to have a model. Alternatively, a simple "dumb" model could be
used for such targets, setting prefetch parameters, etc. to something
not totally outrageous.

The prefetcher, for example, would have to check whether a model
exists. If not, it wouldn't prefetch.

                      -David
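[Editor's note: a minimal sketch of the "no model, no prefetch"
behaviour described above. The types, names, and std::optional plumbing
are assumptions for illustration only, not the proposed interface.]

    // Hypothetical sketch: a software-prefetching pass queries an optional
    // cache model and simply bails out when no model exists for the
    // subtarget, rather than guessing parameters.
    #include <cstdio>
    #include <optional>

    struct CacheModel {
      unsigned LineSizeBytes;
      unsigned PrefetchDistance; // iterations ahead to prefetch
    };

    // Returns the model for the current subtarget, or nothing if none exists.
    std::optional<CacheModel> getCacheModel(bool HasModel) {
      if (!HasModel)
        return std::nullopt;
      return CacheModel{64, 8};
    }

    void runPrefetchPass(bool HasModel) {
      auto Model = getCacheModel(HasModel);
      if (!Model) {
        // No model for this subtarget: skip software prefetching entirely.
        std::puts("no cache model; skipping software prefetching");
        return;
      }
      std::printf("prefetching %u iterations ahead, %u-byte lines\n",
                  Model->PrefetchDistance, Model->LineSizeBytes);
    }

    int main() {
      runPrefetchPass(false);
      runPrefetchPass(true);
    }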
Renato Golin via llvm-dev
2018-Nov-05 22:24 UTC
[llvm-dev] RFC: System (cache, etc.) model for LLVM
On Mon, 5 Nov 2018 at 19:04, David Greene <dag at cray.com> wrote:
> I guess it would be up to the people interested in a particular target
> to figure out a reasonable, maintainable way to manage models for
> possibly many subtargets. This proposal is about providing
> infrastructure to allow models to be created without too much effort.
> It doesn't say anything about what models for a particular
> target/subtarget should look like. :)

Exactly! :)

--
cheers,
--renato