Displaying 4 results from an estimated 4 matches for "dcbt".

2018 Nov 01
3
RFC: System (cache, etc.) model for LLVM
...to what is done with cache levels to express this kind
> of sharing. We haven't found a need for it, but that doesn't mean it
> wouldn't be useful for other/new targets.

The example above is IBM's Blue Gene/Q processor, so yes, such targets do exist.

> > PowerPC's dcbt/dcbtst instruction allows explicitly specifying to the
> > hardware which streams it should establish. Do the buffer counts
> > include explicitly and automatically established streams? Do
> > non-stream accesses (e.g. stack access) count towards
>
> It's up to the targe...
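For context on how dcbt/dcbtst usually reach generated code: a minimal sketch, assuming GCC or Clang, where `__builtin_prefetch(addr, 0, ...)` is typically lowered to `dcbt` on PowerPC and `__builtin_prefetch(addr, 1, ...)` to `dcbtst`. The function names and the `distance` parameter are hypothetical. This only issues plain touch hints; the stream-control forms of dcbt (selected via its TH field), which the question above refers to, are not reachable through this builtin.

```cpp
#include <cassert>
#include <cstddef>

// Sum an array, touching the element `distance` positions ahead.
// On PowerPC, a read prefetch (rw = 0) is typically emitted as dcbt.
double sum_with_prefetch(const double* a, std::size_t n, std::size_t distance) {
    assert(a != nullptr || n == 0);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + distance < n) {
            // rw = 0: prefetch for reading; locality 3: high temporal
            // locality (a target is free to ignore this hint).
            __builtin_prefetch(&a[i + distance], 0, 3);
        }
        sum += a[i];
    }
    return sum;
}

// Copy an array, touching the destination for writing ahead of time.
// On PowerPC, a write prefetch (rw = 1) is typically emitted as dcbtst.
void copy_with_prefetch(double* dst, const double* src, std::size_t n,
                        std::size_t distance) {
    for (std::size_t i = 0; i < n; ++i) {
        if (i + distance < n) {
            __builtin_prefetch(&src[i + distance], 0, 3);
            __builtin_prefetch(&dst[i + distance], 1, 3);
        }
        dst[i] = src[i];
    }
}
```

The bounds check keeps the prefetch address inside the array; prefetches do not fault, but forming a pointer past the end is still undefined behavior in C++.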
2018 Nov 02
2
RFC: System (cache, etc.) model for LLVM
...the Blue Gene/Q: What counts as a stream is configurable at
> > runtime via a hardware register. It supports 3 settings:
> > * Interpret every memory access as the start of a stream
> > * Interpret a stream when there are 2 consecutive cache misses
> > * Only establish streams via dcbt instructions.
>
> I think we're interpreting "streaming" differently. In this design, a
> "stream" is a sequence of memory operations that should bypass the cache
> because the data will never be reused (at least not in a timely manner).

I understood "stre...
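To make the three Blue Gene/Q settings concrete, here is a toy model of when a stream gets established. `StreamMode` and `StreamDetector` are hypothetical names, not a real API. Two further assumptions: "2 consecutive cache misses" is read as misses to two adjacent cache lines, and the 64-byte line size is chosen only for illustration. The model also assumes an explicit dcbt starts a stream in every mode, and it does not track streams once they exist.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding of the three stream-detection settings.
enum class StreamMode {
    EveryAccess,   // every memory access starts a stream
    TwoMisses,     // misses to two adjacent cache lines start a stream
    ExplicitOnly   // only dcbt instructions start a stream
};

class StreamDetector {
public:
    // line_bytes = 64 is an assumed cache-line size for illustration.
    explicit StreamDetector(StreamMode mode, std::uint64_t line_bytes = 64)
        : mode_(mode), line_bytes_(line_bytes) {
        assert(line_bytes_ > 0);
    }

    // Returns true if this access would establish a new stream.
    // `miss`: the access missed the cache; `is_dcbt`: it is an explicit touch.
    bool on_access(std::uint64_t addr, bool miss, bool is_dcbt) {
        switch (mode_) {
        case StreamMode::EveryAccess:
            return true;
        case StreamMode::ExplicitOnly:
            return is_dcbt;
        case StreamMode::TwoMisses:
            break;
        }
        // Assumption: an explicit dcbt also starts a stream in this mode.
        if (is_dcbt) {
            return true;
        }
        if (!miss) {
            return false;  // hits neither start a stream nor reset the history
        }
        const std::uint64_t line = addr / line_bytes_;
        const bool establish = has_last_miss_ && line == last_miss_line_ + 1;
        last_miss_line_ = line;
        has_last_miss_ = true;
        return establish;
    }

private:
    StreamMode mode_;
    std::uint64_t line_bytes_;
    std::uint64_t last_miss_line_ = 0;
    bool has_last_miss_ = false;
};
```

For example, with `TwoMisses`, a miss at address 0 returns false and a following miss at address 64 (the next line) returns true.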
2018 Nov 07
3
RFC: System (cache, etc.) model for LLVM
...could
> determine how aggressively compilers prefetch. Is that the idea or are
> you thinking of something else?

I declared streams for the CPU to prefetch (which 'run' at different speeds over the memory), whose data I can, at some point in time, assume to be in the L1P cache. Using the dcbt instruction, the cache line can be lifted from the L1P to the L1 cache a fixed number of cycles in advance. If the cache line had to be prefetched from L2, the prefetch/access latency would be longer (24 cycles from L1P vs. 82 cycles from L2).

Michael
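The latency figures above translate directly into how far ahead a touch has to be issued. A minimal sketch, assuming a loop with a steady, known cost per iteration (`cycles_per_iteration` is a hypothetical tuning input, not something the message specifies): the prefetch distance in iterations is the latency divided by the per-iteration cost, rounded up.

```cpp
#include <cassert>
#include <cstdint>

// Latencies quoted in the message above, in cycles.
constexpr std::uint32_t kL1PLatency = 24;  // line already in L1P
constexpr std::uint32_t kL2Latency = 82;   // line must come from L2

// Iterations ahead a touch must be issued so the line arrives in time:
// ceil(latency_cycles / cycles_per_iteration), using integer arithmetic.
std::uint32_t prefetch_distance(std::uint32_t latency_cycles,
                                std::uint32_t cycles_per_iteration) {
    assert(cycles_per_iteration > 0);
    return (latency_cycles + cycles_per_iteration - 1) / cycles_per_iteration;
}
```

With a hypothetical 8 cycles per iteration, `prefetch_distance(kL1PLatency, 8)` is 3 while `prefetch_distance(kL2Latency, 8)` is 11, which is why knowing that a line already sits in L1P lets the touch be issued much closer to its use.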
2018 Nov 01
2
RFC: System (cache, etc.) model for LLVM
...orse. What counts as a stream? Some processors may support streams with strides and/or backward streams. Is there a way to express at which level the number of streams is shared? For instance, a core might be able to track 16 streams, but if 4 threads are running (SMT), each can only use 4. PowerPC's dcbt/dcbtst instruction allows explicitly specifying to the hardware which streams it should establish. Do the buffer counts include explicitly and automatically established streams? Do non-stream accesses (e.g. stack access) count towards
> class TargetMemorySystemInfo {
>   const TargetCac...
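The SMT example (16 streams per core, 4 threads, 4 streams each) can be captured by recording the sharing level next to the stream count. The sketch below is hypothetical: `StreamPrefetcherInfo` and `streams_per_thread` are illustrative names, not part of LLVM's `TargetMemorySystemInfo` (whose definition is truncated above), and it assumes the pool is split evenly among active threads.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a hardware prefetcher's stream capacity.
struct StreamPrefetcherInfo {
    std::uint32_t streams_per_core;  // streams one core can track in total
    bool shared_among_smt_threads;   // true: SMT threads split that pool;
                                     // false: assumed per-thread pool of that size
};

// Streams available to each thread when `active_threads` run on one core.
// Assumes an even split; real hardware may partition differently.
std::uint32_t streams_per_thread(const StreamPrefetcherInfo& info,
                                 std::uint32_t active_threads) {
    assert(active_threads > 0);
    if (!info.shared_among_smt_threads) {
        return info.streams_per_core;
    }
    return info.streams_per_core / active_threads;
}
```

With `{16, true}` and 4 active threads this yields 4 streams per thread, matching the example in the message; with a single thread the whole pool of 16 is available.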