I've been poring through the vdev_cache code and need help understanding the cache contents. Per CR 6437054, the cache was said to cache only metadata. However, looking at the current code, this no longer appears to be the case.

Can someone explain what happened here? The idea of caching only metadata is odd to me, because when you read ahead you can't be certain what you have put in the cache until someone comes knocking for that block.

As I understand the current implementation, the per-disk vdev_caches are 10MB LRU. Reads smaller than 16K (cache_max) are inflated to 64K (cache_bshift); anything larger than 16K is therefore not inflated. The cache does not differentiate between metadata blocks and data blocks when caching. Is this correct?

Thanks.

benr.

-- This message posted from opensolaris.org
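To make the arithmetic in the question concrete, here is a minimal sketch in C. It only restates the numbers quoted above (10MB per disk, a 16K threshold, 64K cache blocks). All names prefixed with `sketch_` or `SKETCH_` are hypothetical and are not the identifiers used in vdev_cache.c, and treating a read of exactly 16K as eligible is an assumption, since the question only says "smaller than" and "larger than".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants mirroring the values quoted in the question. */
#define SKETCH_CACHE_SIZE   (10ULL << 20)  /* 10MB of cache per disk */
#define SKETCH_CACHE_MAX    (1ULL << 14)   /* 16K: largest read that is inflated */
#define SKETCH_CACHE_BSHIFT 16             /* cache block is 1 << 16 = 64K */
#define SKETCH_CACHE_BSIZE  (1ULL << SKETCH_CACHE_BSHIFT)

/* Assumption: a read of exactly 16K still counts as "small". */
static bool sketch_read_is_inflated(uint64_t size)
{
    return size <= SKETCH_CACHE_MAX;
}

/* Start of the 64K-aligned block that contains the given byte offset. */
static uint64_t sketch_cache_offset(uint64_t offset)
{
    return offset & ~(SKETCH_CACHE_BSIZE - 1);
}

/* Number of 64K blocks that fit in one per-disk cache. */
static uint64_t sketch_cache_entries(void)
{
    return SKETCH_CACHE_SIZE / SKETCH_CACHE_BSIZE;
}
```

Under these assumptions, a 4K read at offset 70000 would be served from the 64K block starting at offset 65536, and each per-disk cache would hold 160 such blocks.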
Ben Rockwood wrote:
> The cache does not differentiate between meta-data blocks and data
> blocks when caching. Is this correct?

Ben,

The code you're looking for is in zio_read_bp_init():

    if (!dmu_ot[BP_GET_TYPE(bp)].ot_metadata && BP_GET_LEVEL(bp) == 0)
        zio->io_flags |= ZIO_FLAG_DONT_CACHE;

So any data blocks have the ZIO_FLAG_DONT_CACHE flag enabled and won't end up in the vdev_cache.

Thanks,
George
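The condition George quotes can be read as a small predicate: a read is kept out of the vdev_cache only when the block's object type is not metadata and the block sits at level 0. The sketch below restates that logic with stand-in types so it compiles on its own. `struct sketch_bp`, `sketch_read_flags()` and `SKETCH_FLAG_DONT_CACHE` are hypothetical names for illustration, not the real ZFS definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for ZIO_FLAG_DONT_CACHE; the value is arbitrary. */
#define SKETCH_FLAG_DONT_CACHE 0x1u

/* Hypothetical, simplified view of the two block-pointer fields tested. */
struct sketch_bp {
    bool type_is_metadata; /* stands in for dmu_ot[BP_GET_TYPE(bp)].ot_metadata */
    int  level;            /* stands in for BP_GET_LEVEL(bp) */
};

/* Return the read flags the quoted condition would produce for this block. */
static uint32_t sketch_read_flags(const struct sketch_bp *bp)
{
    uint32_t flags = 0;

    /* Only level-0 blocks of non-metadata object types are excluded. */
    if (!bp->type_is_metadata && bp->level == 0)
        flags |= SKETCH_FLAG_DONT_CACHE;

    return flags;
}
```

Read this way, a block at a level above 0 stays cacheable even when its object type is not metadata, because both halves of the condition must hold before the flag is set.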
empirical look below...

George Wilson wrote:
> So any data blocks have the ZIO_FLAG_DONT_CACHE flag enabled and won't
> end up in the vdev_cache.

You can also observe the kstats and note that there are far fewer accesses than you would expect to see for any given workload.

    kstat -n vdev_cache_stats

-- richard
George: Thanks, that was exactly what I needed.

Richard: Ya, I'd noticed that. Thanks!

benr.