I wonder if it is possible (currently or in the future as an RFE) to tell ZFS to automatically read ahead some files and cache them in RAM and/or L2ARC?

One use-case would be for home-NAS setups where multimedia (video files or catalogs of images/music) are viewed from a ZFS box. For example, if a user wants to watch a film, listen to a playlist of MP3s, or push photos to a wall display (photo frame, etc.), the storage box "should" read ahead all required data from the HDDs and save it in ARC/L2ARC. Then the HDDs can spin down for hours while the pre-fetched gigabytes of data are served to consumers from the cache. End-users get peace, quiet and less electricity used while they enjoy their multimedia entertainment ;)

Is it possible? If not, how hard would it be to implement?

In terms of scripting, would it suffice to detect reads (i.e. with DTrace) and read the files to /dev/null to get them cached along with all required metadata (so that the mechanical HDDs are not required for reads afterwards)?

Thanks,
//Jim Klimov
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Jim Klimov
>
> I wonder if it is possible (currently or in the future as an RFE)
> to tell ZFS to automatically read-ahead some files and cache them
> in RAM and/or L2ARC?
>
> One use-case would be for Home-NAS setups where multimedia (video
> files or catalogs of images/music) are viewed from a ZFS box. For
> example, if a user wants to watch a film, or listen to a playlist
> of MP3s, or push photos to a wall display (photo frame, etc.),
> the storage box "should" read-ahead all required data from HDDs
> and save it in ARC/L2ARC. Then the HDDs can spin down for hours
> while the pre-fetched gigabytes of data are used by consumers
> from the cache. End-users get peace, quiet and less electricity
> used while they enjoy their multimedia entertainment ;)

This whole subject is important and useful - and not unique to ZFS. The whole question is: how can the system predict which things are going to be requested next?

In the case of a video, there's a big file which is likely to be read sequentially. I don't know how far readahead currently reads ahead, but it is surely only smart enough to stay within a single file. If the readahead buffer starts to get low and the disks have been spun down, I don't know how low the buffer gets before it triggers more readahead. But at least in the case of streaming video files, there's a very realistic possibility that something like the existing readahead can do what you want.

In the case of your MP3 collection... Probably the only thing you can do is to write a script which will simply go read all the files you predict will be read soon. The key here is the prediction - there's no way ZFS or Solaris, or any other OS in the present day, is going to intelligently predict which files you'll be requesting soon. But you, the user, who knows your usage patterns, might be able to make these predictions and request to cache them. The request is simply telling the system to start reading those files now. So it's very easy to cache, as long as you know what to cache.
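A minimal sketch of such a manual pre-read script, for illustration only; the mount point and file extensions are assumptions, not part of any existing tool:

    #!/bin/sh
    # Pre-read selected media files so their data and metadata land in the ARC.
    # Usage: preread.sh [directory]   (the default /tank/media is only an example)
    MEDIA_DIR="${1:-/tank/media}"
    find "$MEDIA_DIR" -type f \( -name '*.avi' -o -name '*.mkv' -o -name '*.mp3' \) \
        -exec cat {} + > /dev/null
    # Alternatively, pre-read one whole directory (e.g. an album) before playback:
    #   tar cf - "$MEDIA_DIR/SomeAlbum" > /dev/null

Whether the cached blocks then survive until playback still depends on ARC eviction and memory pressure, which is exactly what the rest of this thread discusses.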
On 01/08/12 09:30, Edward Ned Harvey wrote:

> In the case of your MP3 collection... Probably the only thing you can do is
> to write a script which will simply go read all the files you predict will
> be read soon. The key here is the prediction - there's no way ZFS or
> Solaris, or any other OS in the present day, is going to intelligently
> predict which files you'll be requesting soon.

The other prediction is whether the blocks will be reused. If the blocks of a streaming read are only used once, then it may be wasteful for a file system to allow these blocks to be placed in the cache. If a file system purposely chooses not to cache streaming reads, manually scheduling a "pre-read" of particular files may simply cause the file to be read from disk twice: once on the manual pre-read and again when it is read by the actual application.

I believe Joerg Moellenkamp published a discussion several years ago on how the L1 ARC attempts to deal with the pollution of the cache by large streaming reads, but I don't have a bookmark handy (nor the knowledge of whether the behavior is still accurate).
2012-01-08 19:15, John Martin wrote:

> On 01/08/12 09:30, Edward Ned Harvey wrote:
>
>> In the case of your MP3 collection... Probably the only thing you can
>> do is to write a script which will simply go read all the files you
>> predict will be read soon. The key here is the prediction - there's no
>> way ZFS or Solaris, or any other OS in the present day, is going to
>> intelligently predict which files you'll be requesting soon.
>
> The other prediction is whether the blocks will be reused.
> If the blocks of a streaming read are only used once, then
> it may be wasteful for a file system to allow these blocks
> to be placed in the cache. If a file system purposely
> chooses not to cache streaming reads, manually scheduling a
> "pre-read" of particular files may simply cause the file to be read
> from disk twice: once on the manual pre-read and again when it is
> read by the actual application.
>
> I believe Joerg Moellenkamp published a discussion
> several years ago on how the L1 ARC attempts to deal with the pollution
> of the cache by large streaming reads, but I don't have
> a bookmark handy (nor the knowledge of whether the
> behavior is still accurate).

Well, this point is valid for intensively used servers - but there such blocks might just get evicted from the caches by newer and/or more frequently used blocks. However, for smaller servers, such as home NASes which have about one user overall, pre-reading and caching files even for a single use might be an objective per se - just to let the hard disks spin down. Say, if I sit down to watch a movie from my NAS, it is likely that for 90 or 120 minutes there will be no other IO initiated by me. The movie file can be pre-read in a few seconds, and then most of the storage system can go to sleep.

//Jim
On 01/08/12 11:30, Jim Klimov wrote:

> However for smaller servers, such as home NASes which have
> about one user overall, pre-reading and caching files even
> for a single use might be an objective per se - just to let
> the hard-disks spin down. Say, if I sit down to watch a
> movie from my NAS, it is likely that for 90 or 120 minutes
> there will be no other IO initiated by me. The movie file
> can be pre-read in a few seconds, and then most of the
> storage system can go to sleep.

Isn't this just a more extreme case of prediction? In addition to the file system knowing there will only be one client reading 90-120 minutes of (HD?) video that will fit in the memory of a small(er) server, now the hard drive power management code also knows there won't be another access for 90-120 minutes, so it is OK to spin down the hard drive(s).
2012-01-09 0:29, John Martin wrote:

> On 01/08/12 11:30, Jim Klimov wrote:
>
>> However for smaller servers, such as home NASes which have
>> about one user overall, pre-reading and caching files even
>> for a single use might be an objective per se - just to let
>> the hard-disks spin down. Say, if I sit down to watch a
>> movie from my NAS, it is likely that for 90 or 120 minutes
>> there will be no other IO initiated by me. The movie file
>> can be pre-read in a few seconds, and then most of the
>> storage system can go to sleep.

I don't find such home-NAS usage uncommon, because I am my own example user - so I see this pattern often ;)

> Isn't this just a more extreme case of prediction?

Probably it is, and this is probably not a task for ZFS alone, but for logic outside it. There are some requirements that ZFS should meet in order for this to work, though. Details follow...

> In addition to the file system knowing there will only
> be one client reading 90-120 minutes of (HD?) video
> that will fit in the memory of a small(er) server,
> now the hard drive power management code also knows there
> won't be another access for 90-120 minutes so it is OK
> to spin down the hard drive(s).

Well, in the original post I did suggest that the prediction logic might go into scripting or some other user-level tool. And it should, really, to keep the kernel clean and slim. The "predictor" might be as simple as a DTrace file-access monitor which would "cat" or "tar" files into /dev/null. I.e., if it detected access to "*.(avi|mkv|wmv)", it would cat the file; if it detected "*.(mp3|ogg|jpg)", it would tar the parent directory. Might be dumb and still sufficiently efficient ;)

However, for such use-cases this tool would need some "guarantees" from ZFS. One would be that the read-ahead data will find its way into the caches and won't be evicted for no reason (when there is no other RAM pressure). This means that the tool should be able to read all the data and metadata required by ZFS, so that no more disk access is required once it is all in cache. It might require a tunable in ZFS for home-NAS users which would disable the current "no-caching" of detected streaming reads: we need the opposite of that behavior.

Another part is HDD power management, which reportedly works in Solaris, allowing disks to spin down when there has been no access for some time. Probably there is a syscall to do this on demand as well...

On a side note, for home-NASes or other not-heavily-used storage servers, it would be wonderful to be able to cache small writes onto ZIL devices, if present, and not flush them onto the main pool until some megabyte limit is reached (i.e. the ZIL is full), or a pool export/import event occurs. This would allow the main disk arrays to remain idle for a long time while small sporadic writes initiated by the OS (logs, atimes, web-browser cache files, whatever) are persistently stored in the ZIL. Essentially, this would be like setting TXG-commit times to practical infinity, and actually committing based on byte-count limits. One possible difference would be not streaming larger writes to the pool disks at once, but also storing them in the dedicated ZIL.
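A minimal sketch of such a "predictor" watcher, assuming DTrace is available and that the media lives under a hypothetical /tank/media; the probe, patterns and actions are only illustrative and not an existing tool:

    #!/bin/sh
    # Hypothetical watcher: pre-read media files (or their folders) on first access.
    MEDIA=/tank/media     # assumption: where the media dataset is mounted
    dtrace -q -n 'syscall::open*:entry { printf("%s\n", copyinstr(arg0)); }' 2>/dev/null |
    while read f; do
        case "$f" in
            "$MEDIA"/*.avi|"$MEDIA"/*.mkv|"$MEDIA"/*.wmv)
                cat "$f" > /dev/null &              # pull the whole video into ARC
                ;;
            "$MEDIA"/*.mp3|"$MEDIA"/*.ogg|"$MEDIA"/*.jpg)
                d=`dirname "$f"`
                tar cf - "$d" > /dev/null &         # pre-read the whole album/folder
                ;;
        esac
    done

Note that copyinstr() on open(2) arguments can occasionally miss paths that are not yet faulted in, and the pre-read only helps if ZFS actually keeps those blocks cached - the "guarantee" discussed above.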
On Jan 7, 2012, at 8:59 AM, Jim Klimov wrote:

> I wonder if it is possible (currently or in the future as an RFE)
> to tell ZFS to automatically read-ahead some files and cache them
> in RAM and/or L2ARC?

See the discussions of the ZFS intelligent prefetch algorithm. I think Ben Rockwood's description is the best general description:
http://www.cuddletech.com/blog/pivot/entry.php?id=1040

And a more engineer-focused description is at:
http://www.solarisinternals.com/wiki/index.php/ZFS_Performance#Intelligent_prefetch
 -- richard

> One use-case would be for Home-NAS setups where multimedia (video
> files or catalogs of images/music) are viewed from a ZFS box. For
> example, if a user wants to watch a film, or listen to a playlist
> of MP3s, or push photos to a wall display (photo frame, etc.),
> the storage box "should" read-ahead all required data from HDDs
> and save it in ARC/L2ARC. Then the HDDs can spin down for hours
> while the pre-fetched gigabytes of data are used by consumers
> from the cache. End-users get peace, quiet and less electricity
> used while they enjoy their multimedia entertainment ;)
>
> Is it possible? If not, how hard would it be to implement?
>
> In terms of scripting, would it suffice to detect reads (i.e.
> with DTrace) and read the files to /dev/null to get them cached
> along with all required metadata (so that mechanical HDDs are
> not required for reads afterwards)?
>
> Thanks,
> //Jim Klimov
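For readers who want to see whether the file-level intelligent prefetcher is actually doing anything on their box, the DMU prefetch statistics are exposed as a kstat on OpenSolaris/illumos-era builds (exact field names may vary between builds, so treat this as a sketch):

    # dump the intelligent (file-level) prefetch counters
    kstat -p zfs:0:zfetchstats

    # or sample the hit counter every 5 seconds while streaming a large file
    kstat -p zfs:0:zfetchstats:hits 5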
2012-01-09 4:14, Richard Elling wrote:

> On Jan 7, 2012, at 8:59 AM, Jim Klimov wrote:
>
>> I wonder if it is possible (currently or in the future as an RFE)
>> to tell ZFS to automatically read-ahead some files and cache them
>> in RAM and/or L2ARC?
>
> See discussions on the ZFS intelligent prefetch algorithm. I think Ben Rockwood's
> description is the best general description:
> http://www.cuddletech.com/blog/pivot/entry.php?id=1040
>
> And a more engineer-focused description is at:
> http://www.solarisinternals.com/wiki/index.php/ZFS_Performance#Intelligent_prefetch
> -- richard

Thanks for the pointers. While I've seen those articles (in fact, one of the two non-spam comments in Ben's blog was mine), rehashing the basics is always useful ;)

Still, how does VDEV prefetch play along with file-level prefetch? For example, if ZFS prefetched 64K from disk at the SPA level, and those sectors luckily happen to contain the "next" blocks of a streaming-read file, would the file-level prefetch take the data from the RAM cache or still request them from the disk?

In what cases would it make sense to increase zfs_vdev_cache_size? Does it apply to all disks combined, or to each disk (or even slice/partition) separately?

In fact, this reading got me thinking that I might have a fundamental misunderstanding lately; hence a couple of new yes-no questions arose:

Is it true or false that: ZFS might skip the cache and go to disks for "streaming" reads? (The more I think about it, the more senseless this sentence seems, and I might have just confused it with ZIL writes of bulk data.)

Is it true or false that: ARC might evict cached blocks based on age (without new reads or other processes requiring the RAM space)?

And I guess the generic answer to my original question regarding intelligent pre-fetching of whole files is that this should be done by scripts outside ZFS itself, and that the read-prefetch as well as ARC/L2ARC is all in place already. So if no other IOs occur, the disks may spin down... if only not for those "nasty" writes that may sporadically occur and which I'd love to see pushed out to dedicated ZILs ;)

Thanks,
//Jim
On Jan 8, 2012, at 5:10 PM, Jim Klimov wrote:

> Thanks for the pointers. While I've seen those articles
> (in fact, one of the two non-spam comments in Ben's
> blog was mine), rehashing the basics is always useful ;)
>
> Still, how does VDEV prefetch play along with file-level
> prefetch?

Trick question? It doesn't. vdev prefetching is disabled in OpenSolaris b148, illumos, and Solaris 11 releases. The benefits of having the vdev cache for large numbers of disks do not appear to justify the cost. See
http://wesunsolve.net/bugid/id/6684116
https://www.illumos.org/issues/175

> For example, if ZFS prefetched 64K from disk
> at the SPA level, and those sectors luckily happen to
> contain the "next" blocks of a streaming-read file, would
> the file-level prefetch take the data from the RAM cache
> or still request them from the disk?

As of b70, vdev_cache only contains metadata. See
http://wesunsolve.net/bugid/id/6437054

> In what cases would it make sense to increase
> zfs_vdev_cache_size? Does it apply to all disks
> combined, or to each disk (or even slice/partition)
> separately?

It applies to each leaf vdev.

> In fact, this reading got me thinking that I might have
> a fundamental misunderstanding lately; hence a couple
> of new yes-no questions arose:
>
> Is it true or false that: ZFS might skip the cache and
> go to disks for "streaming" reads? (The more I think
> about it, the more senseless this sentence seems, and
> I might have just confused it with ZIL writes of bulk
> data.)

Unless the primarycache parameter is set to none, reads will look in the ARC first.

> Is it true or false that: ARC might evict cached blocks
> based on age (without new reads or other processes
> requiring the RAM space)?

False. Evictions occur when needed. NB, I'm not sure of the status of the Solaris 11 ARC no-grow issue. As that code is not open sourced, and we know that Oracle rewrote some of the ARC code, all bets are off.

> And I guess the generic answer to my original question
> regarding intelligent pre-fetching of whole files is
> that this should be done by scripts outside ZFS itself,
> and that the read-prefetch as well as ARC/L2ARC is all
> in place already. So if no other IOs occur, the disks
> may spin down... if only not for those "nasty" writes
> that may sporadically occur and which I'd love to see
> pushed out to dedicated ZILs ;)

I've set up external prefetching for specific use cases. Spin-down is another can of worms...
 -- richard
On 01/08/12 20:10, Jim Klimov wrote:

> Is it true or false that: ZFS might skip the cache and
> go to disks for "streaming" reads?

I don't believe this was ever suggested. Instead, if data is not already in the file system cache and a large read is made from disk, should the file system put this data into the cache?

BTW, I chose the term streaming to be a subset of sequential, where the access pattern is sequential but at what appear to be artificial time intervals. The suggested pre-read of the entire file would be a simple sequential read done as quickly as the hardware allows.
Thanks for the replies, some more questions follow. Your answers below seem to contradict each other somewhat. Is it true that:

1) the VDEV cache before b70 used to contain a full copy of prefetched disk contents,
2) the VDEV cache since b70 analyzes the prefetched sectors and only keeps metadata blocks,
3) the VDEV cache since b148 is disabled by default?

So in fact currently we only have file-level "intelligent" prefetching?

On my older systems I fired "kstat -p zfs:0:vdev_cache_stats" and saw hit/miss ratios ranging from 30% to 70%. On the oi_148a box I do indeed see all zeros.

While I do understand the implications of VDEV-caching lots of disks on systems with inadequate RAM, I tend to find this feature useful on smaller systems - like home NASes. It is essentially free in terms of mechanical seeks, as well as in RAM (what is 60-100MB for a small box at home?), and any nonzero hit ratio that speeds up the system seems justifiable ;)

I've tried playing with the options on my oi_148a LiveUSB repair boot, and got varying results: the VDEV cache is indeed disabled by default, but can be enabled. My system is scrubbing now, so it got a few cache hits (about 10%) right away.

root at openindiana:~# echo zfs_vdev_cache_size/W0t10000000 | mdb -kw
zfs_vdev_cache_size:    0       =       0x989680

root at openindiana:~# kstat -p zfs:0:vdev_cache_stats
zfs:0:vdev_cache_stats:class    misc
zfs:0:vdev_cache_stats:crtime   65.042318652
zfs:0:vdev_cache_stats:delegations      72
zfs:0:vdev_cache_stats:hits     11
zfs:0:vdev_cache_stats:misses   158
zfs:0:vdev_cache_stats:snaptime 114232.782154249

However, trying to increase the prefetch size hung my system almost immediately (in a couple of seconds). I'm away from it now, so I'll ask for a photo of the console screen :)

root at openindiana:~# echo zfs_vdev_cache_max/W0t16384 | mdb -kw
zfs_vdev_cache_max:     0x4000  =       0x4000
root at openindiana:~# echo zfs_vdev_cache_bshift/W0t20 | mdb -kw
zfs_vdev_cache_bshift:  0x10    =       0x14

So there are deeper questions:

1) As of Illumos bug #175 (as well as OpenSolaris b148 and, if known, Solaris 11), is the vdev prefetch feature *removed* from the codebase ("no" as of oi_148a, what about others?), or disabled by default (i.e. the limit is preset to 0, tune it yourself)?

2) If it is only disabled, are there solid plans to remove it, or can we vote to keep it for those interested? :)

3) If the feature is present and gets enabled, how would VDEV prefetch play along with file prefetch, again? ;)

4) Is there some tunable (after b70) to enable prefetching and keeping of user data as well (not only metadata)? Perhaps only so that I could test it with my use patterns to make sure that caching generic sectors is useless for me, and that I really should revert to caching only metadata?

5) Would it make sense to increase zfs_vdev_cache_bshift? For example, when I tried to set it to 20 and prefetch a whole 1MB of data, why would that cause the system to die? Can it increase cache hit ratios (if it works)?

6) Does the VDEV cache keep ZFS blocks or disk sectors? For example, on my 4k disks the blocks are 4k, even though there are only a few hundred bytes worth of data in metadata blocks and 3+KB of slack space.

7) Modern HDDs often have 32-64MB of DRAM cache onboard. Is there any reason to match the VDEV cache size to that in any way (1:1, 2:1, etc.)?
Thanks again,
//Jim Klimov

2012-01-09 6:06, Richard Elling wrote:

> On Jan 8, 2012, at 5:10 PM, Jim Klimov wrote:
>> 2012-01-09 4:14, Richard Elling wrote:
>>> On Jan 7, 2012, at 8:59 AM, Jim Klimov wrote:
>>>
>>>> I wonder if it is possible (currently or in the future as an RFE)
>>>> to tell ZFS to automatically read-ahead some files and cache them
>>>> in RAM and/or L2ARC?
>>>
>>> See discussions on the ZFS intelligent prefetch algorithm. I think Ben Rockwood's
>>> description is the best general description:
>>> http://www.cuddletech.com/blog/pivot/entry.php?id=1040
>>>
>>> And a more engineer-focused description is at:
>>> http://www.solarisinternals.com/wiki/index.php/ZFS_Performance#Intelligent_prefetch
>>> -- richard
>>
>> Thanks for the pointers. While I've seen those articles
>> (in fact, one of the two non-spam comments in Ben's
>> blog was mine), rehashing the basics is always useful ;)
>>
>> Still, how does VDEV prefetch play along with file-level
>> prefetch?
>
> Trick question? It doesn't. vdev prefetching is disabled in OpenSolaris b148, illumos,
> and Solaris 11 releases. The benefits of having the vdev cache for large numbers of
> disks do not appear to justify the cost. See
> http://wesunsolve.net/bugid/id/6684116
> https://www.illumos.org/issues/175
>
>> For example, if ZFS prefetched 64K from disk
>> at the SPA level, and those sectors luckily happen to
>> contain the "next" blocks of a streaming-read file, would
>> the file-level prefetch take the data from the RAM cache
>> or still request them from the disk?
>
> As of b70, vdev_cache only contains metadata. See
> http://wesunsolve.net/bugid/id/6437054
>
>> In what cases would it make sense to increase
>> zfs_vdev_cache_size? Does it apply to all disks
>> combined, or to each disk (or even slice/partition)
>> separately?
>
> It applies to each leaf vdev.
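For reference, a minimal sketch of how the tunables experimented with above can be toggled, either at runtime via mdb (non-persistent, as in the message above) or persistently via /etc/system; the values are only examples, and whether enabling the vdev cache is wise depends on the workload and available RAM:

    # runtime only (lost on reboot): give the vdev cache ~10 MB per leaf vdev
    echo zfs_vdev_cache_size/W0t10000000 | mdb -kw

    # persistent across reboots: add to /etc/system and reboot
    set zfs:zfs_vdev_cache_size=10000000
    set zfs:zfs_vdev_cache_max=16384

    # check effectiveness afterwards
    kstat -p zfs:0:vdev_cache_stats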
2012-01-09 18:15, John Martin wrote:

> On 01/08/12 20:10, Jim Klimov wrote:
>
>> Is it true or false that: ZFS might skip the cache and
>> go to disks for "streaming" reads?
>> (The more I think
>> about it, the more senseless this sentence seems, and
>> I might have just confused it with ZIL writes of bulk
>> data.)
>
> I don't believe this was ever suggested. Instead, if
> data is not already in the file system cache and a
> large read is made from disk, should the file system
> put this data into the cache?

Hmmm... perhaps THIS is what I mistook it for...

Thus the correct version of the question goes like this: is it true or false that some large reads from disk can be deemed by ZFS as "too big and rare to cache in ARC"? If yes, what conditions are checked to mark a read as such? Can this behavior be disabled in order to try to cache every read (further subject to normal eviction due to MRU/MFU, memory pressure and other considerations)?

Thanks again,
//Jim Klimov
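A related knob, though it controls whether reads populate the caches at all rather than the streaming-read heuristics asked about here, is the per-dataset primarycache/secondarycache property mentioned earlier in the thread. A quick check and an illustrative setting; the dataset name tank/media is only an example:

    # see what the dataset is currently allowed to cache (all | metadata | none)
    zfs get primarycache,secondarycache tank/media

    # cache both data and metadata in the ARC (the default) and in the L2ARC
    zfs set primarycache=all tank/media
    zfs set secondarycache=all tank/media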
On 01/08/12 10:15, John Martin wrote:

> I believe Joerg Moellenkamp published a discussion
> several years ago on how the L1 ARC attempts to deal with the pollution
> of the cache by large streaming reads, but I don't have
> a bookmark handy (nor the knowledge of whether the
> behavior is still accurate).

http://www.c0t0d0s0.org/archives/5329-Some-insight-into-the-read-cache-of-ZFS-or-The-ARC.html
To follow up on the subject of VDEV caching, even if only of metadata: in oi_148a I have found the disabling entry in /etc/system of the LiveUSB:

set zfs:zfs_vdev_cache_size=0

Now that I have the cache turned on and my scrub continues, cache efficiency so far happens to be 75%. Not bad for a feature turned off by default:

# kstat -p zfs:0:vdev_cache_stats
zfs:0:vdev_cache_stats:class    misc
zfs:0:vdev_cache_stats:crtime   60.67302806
zfs:0:vdev_cache_stats:delegations      22619
zfs:0:vdev_cache_stats:hits     32989
zfs:0:vdev_cache_stats:misses   10676
zfs:0:vdev_cache_stats:snaptime 39898.161717983

//Jim
2012-01-11 1:26, Jim Klimov wrote:

> To follow up on the subject of VDEV caching, even if only of metadata:
> in oi_148a I have found the disabling entry in /etc/system of the LiveUSB:
>
> set zfs:zfs_vdev_cache_size=0
>
> Now that I have the cache turned on and my scrub continues, cache
> efficiency so far happens to be 75%. Not bad for a feature turned off
> by default:
>
> # kstat -p zfs:0:vdev_cache_stats
> zfs:0:vdev_cache_stats:class    misc
> zfs:0:vdev_cache_stats:crtime   60.67302806
> zfs:0:vdev_cache_stats:delegations      22619
> zfs:0:vdev_cache_stats:hits     32989
> zfs:0:vdev_cache_stats:misses   10676
> zfs:0:vdev_cache_stats:snaptime 39898.161717983
>
> //Jim

And at this moment I would say the caching effect has become incredible (at least for a feature disabled and dismissed as useless/harmful) - if I read the numbers correctly, a 99+% cache hit ratio with just VDEV pre-reads:

# kstat -p zfs:0:vdev_cache_stats
zfs:0:vdev_cache_stats:class    misc
zfs:0:vdev_cache_stats:crtime   60.67302806
zfs:0:vdev_cache_stats:delegations      23398
zfs:0:vdev_cache_stats:hits     1309308
zfs:0:vdev_cache_stats:misses   11592
zfs:0:vdev_cache_stats:snaptime 89207.679698161

True, the task (scrubbing) is metadata-intensive :) Still, for the future, when beginning a scrub the system might auto-tune or at least suggest enabling the VDEV prefetch (perhaps with larger strokes)...

BTW, what does the "delegations" field mean? ;)

//Jim Klimov
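For anyone repeating this experiment, the hit ratio quoted above can be computed directly from the same kstat counters; a small sketch using only kstat and awk (nothing assumed beyond the counters shown above):

    # print the vdev_cache hit ratio as a percentage
    kstat -p zfs:0:vdev_cache_stats |
    awk -F'\t' '/:hits/ {h=$2} /:misses/ {m=$2} \
        END { if (h+m > 0) printf "vdev_cache hit ratio: %.1f%%\n", 100*h/(h+m) }'

With the numbers above, 1309308 / (1309308 + 11592) comes out to roughly 99.1%.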