Displaying 4 results from an estimated 4 matches for "dcbt".
2018 Nov 01
3
RFC: System (cache, etc.) model for LLVM
...to what is done with cache levels to express this kind
> of sharing. We haven't found a need for it but that doesn't mean it
> wouldn't be useful for other/new targets.
The example above is IBM's Blue Gene/Q processor, so yes, such targets do exist.
> > PowerPC's dcbt/dcbtst instruction allows explicitly specifying to the
> > hardware which streams it should establish. Do the buffer counts
> > include explicitly and automatically established streams? Do
> > non-stream accesses (e.g. stack access) count towards
>
> It's up to the targe...
2018 Nov 02
2
RFC: System (cache, etc.) model for LLVM
...the Blue Gene/Q: What counts as stream is configurable at
> > runtime via a hardware register. It supports 3 settings:
> > * Interpret every memory access as start of a stream
> > * Interpret a stream when there are 2 consecutive cache misses
> > * Only establish streams via dcbt instructions.
>
> I think we're interpreting "streaming" differently. In this design, a
> "stream" is a sequence of memory operations that should bypass the cache
> because the data will never be reused (at least not in a timely manner).
I understood "stre...
2018 Nov 07
3
RFC: System (cache, etc.) model for LLVM
...could
> determine how aggressively compilers prefetch. Is that the idea or are
> you thinking of something else?
I declared streams for the CPU to prefetch (which 'run' at different
speeds over the memory), which, at some point in time I can assume to
be in the L1P cache. Using the dcbt instruction, the cache line can be
lifted from the L1P to the L1 cache, a fixed number of cycles in
advance. If the cache line had to be prefetched from L2, the
prefetch/access latency would be longer (24 cycles vs 82 cycles).
Michael
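The latency figures in the message above (24 cycles from L1P, 82 cycles from L2) suggest how a prefetch distance might be derived. The following is only a minimal sketch under stated assumptions: `prefetchDistance` and `sumWithPrefetch` are hypothetical names, the cycles-per-iteration value is an illustrative guess, and `__builtin_prefetch` is the GCC/Clang builtin, which on PowerPC targets is generally expected to lower to a dcbt-style touch instruction. It is not the model proposed in the RFC.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of loop iterations to prefetch ahead so that
// a cache line arrives before use, i.e. ceil(latency / cyclesPerIteration).
inline std::size_t prefetchDistance(std::size_t latencyCycles,
                                    std::size_t cyclesPerIteration) {
  assert(cyclesPerIteration > 0 && "loop body must take at least one cycle");
  return (latencyCycles + cyclesPerIteration - 1) / cyclesPerIteration;
}

// Sums an array while touching the element `distance` iterations ahead.
// The prefetch is only a hint; the result is the same with or without it.
inline double sumWithPrefetch(const double *data, std::size_t n,
                              std::size_t distance) {
  assert(data != nullptr || n == 0);
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    if (i + distance < n)
      __builtin_prefetch(&data[i + distance], /*rw=*/0, /*locality=*/3);
    sum += data[i];
  }
  return sum;
}
```

With an assumed loop body of 8 cycles, `prefetchDistance(24, 8)` gives 3 iterations for a line already in L1P, while `prefetchDistance(82, 8)` gives 11 iterations if the line must come from L2.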
2018 Nov 01
2
RFC: System (cache, etc.) model for LLVM
...orse.
What counts as a stream? Some processors may support streams with
strides and/or backward streams.
Is there a way to express at which level the number of streams is shared? For
instance, a core might be able to track 16 streams, but if 4 threads
are running (SMT), each can only use 4.
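The SMT example above (16 streams per core, 4 per thread with 4 threads) could be captured along these lines. This is a rough sketch only: `StreamBudget` and `streamsPerThread` are hypothetical names made up for illustration and are not part of the proposed `TargetMemorySystemInfo` interface, and even division among active threads is an assumption about the hardware.

```cpp
#include <cassert>

// Hypothetical description of a core's hardware prefetch-stream resources.
struct StreamBudget {
  unsigned streamsPerCore;   // total streams the core can track, e.g. 16
  bool sharedAcrossThreads;  // true if SMT threads divide the streams
};

// Streams available to one hardware thread, assuming an even split
// when the streams are shared among the active SMT threads.
inline unsigned streamsPerThread(const StreamBudget &budget,
                                 unsigned activeThreads) {
  assert(activeThreads > 0 && "at least one thread must be running");
  return budget.sharedAcrossThreads ? budget.streamsPerCore / activeThreads
                                    : budget.streamsPerCore;
}
```

For the numbers in the message, `streamsPerThread(StreamBudget{16, true}, 4)` yields 4.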
PowerPC's dcbt/dcbtst instruction allows explicitly specifying to the
hardware which streams it should establish. Do the buffer counts
include explicitly and automatically established streams? Do
non-stream accesses (e.g. stack access) count towards
> class TargetMemorySystemInfo {
> const TargetCac...