This is just a thought exercise, but I'm curious what exactly would be involved in biasing caching such that an 'ls -al' was never slow.

In my experience, IO speed can vary, but if a user types "ls -al" in the shell and the response isn't nearly instantaneous, they start calling IT staff. Being able to cache all that data (perhaps by priming it) and ensure it's not bumped out later would be interesting.

For ZFS this is primarily a function of ZAP and DNLC, correct? Does "metadata" caching satisfy everything a directory listing could want, or are there bits of data that slip through, requiring actual disk IO?

benr.
--
This message posted from opensolaris.org
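As a rough illustration of the "priming" idea, here is a minimal sketch that walks a tree and stats every entry, which is the same per-file information 'ls -al' asks for. The function name prime_metadata is made up for this example, and reading metadata once only loads it; nothing here keeps it from being evicted later, which is the harder half of the question.

```python
import os


def prime_metadata(root):
    """Walk root and lstat every entry so its metadata is read at least once.

    Returns the number of entries touched. This only asks the filesystem to
    load the metadata; it does not pin anything in cache.
    """
    touched = 0
    stack = [root]
    while stack:
        directory = stack.pop()
        try:
            with os.scandir(directory) as entries:
                for entry in entries:
                    try:
                        # lstat-style call: the attributes ls -l displays
                        entry.stat(follow_symlinks=False)
                        touched += 1
                        if entry.is_dir(follow_symlinks=False):
                            stack.append(entry.path)
                    except OSError:
                        continue  # entry vanished or is unreadable; skip it
        except OSError:
            continue  # directory missing or unreadable; skip it
    return touched
```

Run periodically (from cron, say) against the directories users browse most, something like this would keep re-reading the metadata, at the cost of competing with real workloads for cache space.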
Ben Rockwood wrote:
> This is just a thought exercise, but I'm curious what exactly would be involved in biasing caching such that an 'ls -al' was never slow.
>
> In my experience, IO speed can vary, but if a user types "ls -al" in the shell and the response isn't nearly instantaneous, they start calling IT staff. Being able to cache all that data (perhaps by priming it) and ensure it's not bumped out later would be interesting.
>
> For ZFS this is primarily a function of ZAP and DNLC, correct? Does "metadata" caching satisfy everything a directory listing could want, or are there bits of data that slip through, requiring actual disk IO?
>

While not an extensive study, on a recent project I was working on, the default DNLC ended up being about 2 orders of magnitude too small. In the bad old days, when memory was a precious, limited resource, tuning down the DNLC made sense. Today, I think the default might be too small. Fortunately, it is easy to change.
-- richard
Ben Rockwood wrote:
> This is just a thought exercise, but I'm curious what exactly would be involved in biasing caching such that an 'ls -al' was never slow.
>
> In my experience, IO speed can vary, but if a user types "ls -al" in the shell and the response isn't nearly instantaneous, they start calling IT staff. Being able to cache all that data (perhaps by priming it) and ensure it's not bumped out later would be interesting.
>

Does that include nameservice data? Doing an ls -l where the files are owned by many users can be really slow.
-- Ian.
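To make the nameservice cost concrete: a long listing has to turn every numeric uid into a name, and each distinct uid can mean a trip to the name service. A minimal sketch of caching those lookups in-process follows; owner_name and owners are names invented for this example, and this is not a claim about how ls itself is implemented.

```python
import functools
import os
import pwd


@functools.lru_cache(maxsize=None)
def owner_name(uid):
    """Resolve a uid to a user name once; repeats are served from memory."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)  # no passwd entry: fall back to the number


def owners(directory):
    """Map each entry name in directory to its owner's name."""
    with os.scandir(directory) as entries:
        return {
            entry.name: owner_name(entry.stat(follow_symlinks=False).st_uid)
            for entry in entries
        }
```

With a cache like this, a directory of files owned by many users still pays one lookup per distinct uid, so the first listing can remain slow if the name service itself is slow.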
Ian Collins wrote:
> Ben Rockwood wrote:
>> This is just a thought exercise, but I'm curious what exactly would be involved in biasing caching such that an 'ls -al' was never slow.
>>
>> In my experience, IO speed can vary, but if a user types "ls -al" in the shell and the response isn't nearly instantaneous, they start calling IT staff. Being able to cache all that data (perhaps by priming it) and ensure it's not bumped out later would be interesting.
>>
> Does that include nameservice data? Doing an ls -l where the files are owned by many users can be really slow.
>

Good point... I hadn't considered that.

benr.
Ben Rockwood wrote:
> Ian Collins wrote:
>> Ben Rockwood wrote:
>>> This is just a thought exercise, but I'm curious what exactly would be involved in biasing caching such that an 'ls -al' was never slow.
>>>
>>> In my experience, IO speed can vary, but if a user types "ls -al" in the shell and the response isn't nearly instantaneous, they start calling IT staff. Being able to cache all that data (perhaps by priming it) and ensure it's not bumped out later would be interesting.
>>>
>> Does that include nameservice data? Doing an ls -l where the files are owned by many users can be really slow.
>>
> Good point... I hadn't considered that.
>

...and it is sorted, which may be more or less quick, depending on your locale.
-- richard
On Mon, Feb 2, 2009 at 1:43 PM, Richard Elling <richard.elling at gmail.com> wrote:
> Ben Rockwood wrote:
>> Ian Collins wrote:
>>> Ben Rockwood wrote:
>>>> This is just a thought exercise, but I'm curious what exactly would be involved in biasing caching such that an 'ls -al' was never slow.
>>>>
>>>> In my experience, IO speed can vary, but if a user types "ls -al" in the shell and the response isn't nearly instantaneous, they start calling IT staff. Being able to cache all that data (perhaps by priming it) and ensure it's not bumped out later would be interesting.
>>>>
>>> Does that include nameservice data? Doing an ls -l where the files are owned by many users can be really slow.
>>>
>> Good point... I hadn't considered that.
>>
> ...and it is sorted, which may be more or less quick, depending on your locale.
> -- richard

Related to that, I have some changes I'll be attempting to putback (eventually), which include the -U option in ls to prevent sorting of the filenames. If you want a preview, there's a webrev with the current work at http://cr.opensolaris.org/~jbk/shutils (it has a few other things; I'm not sure yet whether I'll attempt to put those back).
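The difference between a sorted and an unsorted listing can be sketched as follows. This is only an illustration in Python of the two behaviours discussed above, not the code in the webrev; list_unsorted and list_sorted are names chosen for this example.

```python
import locale
import os


def list_unsorted(directory):
    """Names in whatever order the filesystem returns them (no sorting)."""
    with os.scandir(directory) as entries:
        return [entry.name for entry in entries]


def list_sorted(directory):
    """Names sorted by the active locale's collation rules.

    locale.strxfrm transforms each name so that ordinary comparison follows
    the locale; how expensive that transform is depends on the locale.
    """
    return sorted(list_unsorted(directory), key=locale.strxfrm)
```

To compare against the environment's own collation rules rather than the default "C" locale, call locale.setlocale(locale.LC_COLLATE, "") before list_sorted. Skipping the sort also lets output start before the whole directory has been read, which matters most for very large directories.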
On Feb 02, 2009 02:39 -0800, Ben Rockwood wrote:
> This is just a thought exercise, but I'm curious what exactly would be involved in biasing caching such that an 'ls -al' was never slow.
>
> In my experience, IO speed can vary, but if a user types "ls -al" in the shell and the response isn't nearly instantaneous, they start calling IT staff. Being able to cache all that data (perhaps by priming it) and ensure it's not bumped out later would be interesting.
>
> For ZFS this is primarily a function of ZAP and DNLC, correct? Does "metadata" caching satisfy everything a directory listing could want, or are there bits of data that slip through, requiring actual disk IO?

At the SPA level, it would seem possible to use flash differently from how ZFS uses it today. Instead of using flash only as a cache (as is essentially done with the L2ARC and the Logzilla), the SPA could allocate metadata on special RAID-1 VDEV(s) built from SSDs, and avoid data allocations on these SSD VDEV(s) unless the other VDEVs were full.

My understanding is that the SPA already knows whether a specific allocation is for data or metadata, though there might need to be some tweaks so that, e.g., the meta-dnode contents and ZAPs are considered metadata. This could potentially put all "ls -l" data permanently in high-IOPS flash storage.

The main question is what fraction of the pool needs to be on SSDs to make this workable: 5%, 10%? It obviously depends on the average file size, but I'd suspect there is some rough estimate of how much SSD storage would be needed to have a good chance that all metadata is in flash.

Cheers, Andreas

PS - I'm not a ZFS developer, so don't take this as gospel... Just musings.
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
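A back-of-envelope version of that estimate can be written down, with the caveat that every constant below is an assumption of this sketch rather than a measurement: a 512-byte dnode per file, one 128-byte block pointer per data block, a 128 KiB record size, a hypothetical 64 bytes of directory-entry overhead, and two copies of all metadata. It ignores compression, snapshots, space maps and indirect-block rounding, so treat the result as an order-of-magnitude guide at best.

```python
import math


def metadata_fraction(avg_file_bytes,
                      record_bytes=128 * 1024,  # assumed record size
                      dnode_bytes=512,          # assumed per-file dnode size
                      blkptr_bytes=128,         # assumed per-block pointer size
                      dirent_bytes=64,          # hypothetical directory-entry cost
                      copies=2):                # assumed metadata copies
    """Estimate metadata as a fraction of (metadata + data) for one average file."""
    blocks = max(1, math.ceil(avg_file_bytes / record_bytes))
    per_file = copies * (dnode_bytes + blkptr_bytes * blocks + dirent_bytes)
    return per_file / (per_file + avg_file_bytes)


if __name__ == "__main__":
    for size in (4 * 1024, 64 * 1024, 1024 * 1024):
        print(f"{size:>8} bytes/file -> {metadata_fraction(size):.2%} metadata")
```

Under these assumptions the fraction falls quickly as the average file grows, which matches the intuition that pools full of small files would need proportionally more SSD space than pools of large files.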