Renato Golin via llvm-dev
2018-Nov-02 20:06 UTC
[llvm-dev] RFC: System (cache, etc.) model for LLVM
Hey,

I've been reading back through the thread and there are a lot of ideas flying around; I may have missed more than I should, but here's my view on it.

First, I think this is a good idea. Mapping caches is certainly interesting for general architectures, but it is particularly important for massive operations like matrix multiply and stencils, which can pull a lot of data into cache and thrash it if they are not careful. With scalable and larger vectors, this will be even more important.

Overall, though, the current proposal is too detailed on the implementation and not detailed enough on the use for me to get a good idea of how and where this will be used. Can you describe a few situations where these new interfaces would be used, and how?

Some comments inline.

On Thu, 1 Nov 2018 at 21:56, David Greene via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Ok. I would like to start posting patches for review without
> speculating too much on fancy/exotic things that may come later. We
> shouldn't do anything that precludes extensions, but I don't want to
> get bogged down in a lot of details on things related to a small
> number of targets. Let's get the really common stuff in first. What
> do you think?

In theory, both big and little cores should have the same cache structure, so we don't necessarily need extra descriptions for both. In practice, sub-architectures can have multiple combinations of big.LITTLE cores, and it's simply not practical to add all of that to TableGen.

--
cheers,
--renato
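To make the thrashing concern concrete, here is a minimal cache-blocking sketch. It is purely illustrative: the 32 KiB L1 size and the tile size BS are assumptions baked in as constants, exactly the numbers a pass would instead obtain from the proposed cache model. The caller is assumed to have zero-initialized C.

#include <algorithm>
#include <cstddef>

constexpr std::size_t L1Bytes = 32 * 1024; // assumed L1D size; a pass would
                                           // query the target model instead
constexpr std::size_t BS = 32;             // tile edge, in doubles
static_assert(3 * BS * BS * sizeof(double) <= L1Bytes,
              "three tiles (one per operand) must fit in the assumed L1");

void matmulBlocked(const double *A, const double *B, double *C,
                   std::size_t N) {
  for (std::size_t ii = 0; ii < N; ii += BS)
    for (std::size_t kk = 0; kk < N; kk += BS)
      for (std::size_t jj = 0; jj < N; jj += BS)
        // Work on one BS x BS tile of each operand so the tiles stay
        // resident in cache instead of streaming whole rows and columns
        // through it.
        for (std::size_t i = ii, ie = std::min(ii + BS, N); i < ie; ++i)
          for (std::size_t k = kk, ke = std::min(kk + BS, N); k < ke; ++k)
            for (std::size_t j = jj, je = std::min(jj + BS, N); j < je; ++j)
              C[i * N + j] += A[i * N + k] * B[k * N + j];

The point of the tiling is that each tile is reused many times while resident in cache; with the model in place, BS could be derived per target rather than guessed.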
David Greene via llvm-dev
2018-Nov-05 15:56 UTC
[llvm-dev] RFC: System (cache, etc.) model for LLVM
Renato Golin <renato.golin at linaro.org> writes:

> Mapping caches is certainly interesting for general architectures, but
> it is particularly important for massive operations like matrix
> multiply and stencils, which can pull a lot of data into cache and
> thrash it if they are not careful.

Exactly right.

> With scalable and larger vectors, this will be even more important.

True.

> Overall, though, the current proposal is too detailed on the
> implementation and not detailed enough on the use for me to get a good
> idea of how and where this will be used. Can you describe a few
> situations where these new interfaces would be used, and how?

Sure. The prefetching interfaces are already used, though in a different form, by the LoopDataPrefetch pass.

The cache interfaces are flexible enough to allow passes to answer questions like, "How much effective cache is available for this core (thread, etc.)?" That's a critical question when reasoning about the thrashing behavior you mentioned above.

Knowing the cache line size is important for prefetching and for various other memory operations, such as streaming.

Knowing the number of ways allows one to guesstimate which memory accesses are likely to collide in the cache.

It also happens that all of these parameters are useful for simulation purposes, which may help projects like llvm-mca.

> On Thu, 1 Nov 2018 at 21:56, David Greene via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Ok. I would like to start posting patches for review without
>> speculating too much on fancy/exotic things that may come later. We
>> shouldn't do anything that precludes extensions, but I don't want to
>> get bogged down in a lot of details on things related to a small
>> number of targets. Let's get the really common stuff in first. What
>> do you think?
>
> In theory, both big and little cores should have the same cache
> structure, so we don't necessarily need extra descriptions for both.
>
> In practice, sub-architectures can have multiple combinations of
> big.LITTLE cores, and it's simply not practical to add all of that to
> TableGen.

I'm not quite grasping this. Are you saying that a particular subtarget may have multiple "clusters" of big.LITTLE cores and that each cluster may look different from the others?

-David
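For illustration, here is a sketch of how a pass might consume these parameters. It is built on the getCacheSize/getCacheAssociativity/getCacheLineSize hooks that already exist on TargetTransformInfo; the richer model proposed in this RFC may expose different names, so treat the exact calls as stand-ins, and the 16 KiB fallback as an arbitrary assumption.

#include "llvm/Analysis/TargetTransformInfo.h"

using namespace llvm;

// Rough answer to "how much effective cache is available?", hedged for
// low associativity and for targets that report nothing.
static unsigned effectiveCacheBudget(const TargetTransformInfo &TTI) {
  Optional<unsigned> L1 =
      TTI.getCacheSize(TargetTransformInfo::CacheLevel::L1D);
  unsigned Budget = L1.getValueOr(16 * 1024); // assumed fallback

  // Few ways means conflict misses are more likely, so leave headroom
  // rather than planning to fill the whole cache.
  Optional<unsigned> Ways =
      TTI.getCacheAssociativity(TargetTransformInfo::CacheLevel::L1D);
  if (Ways && *Ways <= 2)
    Budget /= 2;

  // Round the budget down to whole cache lines, which is the granularity
  // that prefetching and streaming decisions care about
  // (getCacheLineSize returns 0 when unknown).
  if (unsigned Line = TTI.getCacheLineSize())
    Budget -= Budget % Line;
  return Budget;
}

A loop transformation could then compare its estimated data footprint against this budget before deciding on a blocking or interleaving strategy.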
Renato Golin via llvm-dev
2018-Nov-05 17:08 UTC
[llvm-dev] RFC: System (cache, etc.) model for LLVM
On Mon, 5 Nov 2018 at 15:56, David Greene <dag at cray.com> wrote:
> The cache interfaces are flexible enough to allow passes to answer
> questions like, "How much effective cache is available for this core
> (thread, etc.)?" That's a critical question when reasoning about the
> thrashing behavior you mentioned above.
>
> Knowing the cache line size is important for prefetching and for
> various other memory operations, such as streaming.
>
> Knowing the number of ways allows one to guesstimate which memory
> accesses are likely to collide in the cache.
>
> It also happens that all of these parameters are useful for simulation
> purposes, which may help projects like llvm-mca.

I see. So, if I get it right, this would initially consolidate the prefetching infrastructure, which is a worthy goal in itself and would only need a minimalist implementation for now.

Later, vectorisers could use that info, for example, to work out how far it would be beneficial to unroll vectorised loops (where the total access size should be a multiple of the cache line), and so on.

Ultimately, simulations would be an interesting use of it, but they shouldn't be a driving force for additional features bundled into the initial design.

> I'm not quite grasping this. Are you saying that a particular
> subtarget may have multiple "clusters" of big.LITTLE cores and that
> each cluster may look different from the others?

Yes. "big.LITTLE" [1] is a marketing name and can cover a bunch of different scenarios. For example:

- A set of big+little core pairs, each seen by the kernel as a single core but actually being two separate cores, scheduled by the kernel via frequency scaling.
- Two entirely separate clusters, flipped between all-big and all-little.
- A heterogeneous mix, which could have different numbers of big and little cores with no need for cache coherence between them. Junos have two little and four big cores; Tegras have one little and four big. There are also other designs with dozens of huge cores plus a tiny core for management purposes.

It's worse than that, though, because different releases of the same family can have different core counts or change model (clustered/bundled/heterogeneous), and there is currently no way to represent that in TableGen.

Given that the kernel has such a strong influence on how those cores get scheduled and preempted, I don't think there's any hope of the compiler doing a good job of predicting usage, or of having any real impact amid higher-level latencies such as context switches and system calls.

--
cheers,
--renato

[1] https://en.wikipedia.org/wiki/ARM_big.LITTLE
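A minimal sketch of the unrolling example above: choose an interleave factor so that the bytes touched per iteration of the vector loop are a whole multiple of the cache line. The function and its inputs are hypothetical (this is not the actual vectoriser interface), and it assumes power-of-two sizes, which holds for every cache and vector width in common use.

// E.g. a 64-byte line with 16-byte (128-bit) vectors yields 4: four
// vector accesses per iteration exactly cover one cache line.
unsigned interleaveForLineMultiple(unsigned CacheLineBytes,
                                   unsigned VectorBytes) {
  if (VectorBytes == 0 || VectorBytes >= CacheLineBytes)
    return 1; // one access already covers a full line, or size unknown
  return CacheLineBytes / VectorBytes; // power-of-two sizes assumed
}

With scalable vectors the ratio would have to be computed from the runtime vector length, which is one reason a queryable model beats hard-coded constants.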