Tomas Ögren
2008-Oct-15 20:57 UTC
[zfs-discuss] Tuning for a file server, disabling data cache (almost)
Hello.

Executive summary: I want arc_data_limit (like arc_meta_limit, but for
data) and to set it to 0.5G or so. Is there any way to "simulate" it?

We have a cluster of Linux frontends (http/ftp/rsync) for
Debian/Mozilla/etc archives, and as an NFS disk backend we currently have
a DL145 running OpenSolaris (snv98) with one pool of 3 raidz2 vdevs with
10 SATA disks each.

The frontends have a local cache of a few raid0'd disks, but need to dip
into the backend every now and then because they don't fit all the data.
Rsyncs dip into the backend for filesystem traversal, both when pulling
(for us: writing) data and when sending to others.

Obviously, the working data set (4TB or so right now) is quite a lot
larger than the RAM on the disk backend (8GB), so the data cache is
mostly useless. The metadata cache is good, because rsync walks the whole
tree every now and then and that's the only part that has a chance of
fitting in RAM (about 1.5-2M files now).

So I want to dedicate as much RAM as possible to the metadata cache; the
data cache is of less importance.

Right now, ZFS has a knob to limit the amount of metadata cache
(arc_meta_limit), but not one to limit the amount of data cache.

I've tried a few tuning tricks, but they all seem to have drawbacks:

* zfs set primarycache=metadata myfs
  - If the record size is 128k and an application reads 32k, then ZFS
    reads 128k, hands 32k to the app and throws away 96k. Repeat. (So I
    get 400% physical IO over logical IO.) If I tune recordsize down to
    32k, then each disk gets 32/8 (8 data disks per raidz2) = 4k IOs,
    which isn't optimal either: 4k/IO * 100 IOPS * 8 disks * 3 raidz2s =
    9600kB/s. I would also prefer each disk to get larger IOs than with
    rs=128k.

* zfs:zfs_prefetch_disable
  - Prefetching in small amounts could be good, but only with a limit on
    how much of it is kept. It uses up precious RAM that I want for the
    metadata cache.

* zfs:arc_meta_limit
  - I'm raising it to about the size of the whole ARC. But I want an
    arc_data_limit set to 512M or so, just for temporary buffers.

* ncsize
  - I want to keep this high, but with ZFS it seems to use huge amounts
    of memory per dnode_t/zfs_znode_cache or something..

I think most of the performance issues would be solved if I let ZFS do
all of its prefetching, but limited the amount of data it keeps.
Wouldn't this problem arise on most file servers?

When checking ::arc / ::kmastat / ::memstat, I usually see close to 1GB
on the freelist (probably due to c_max being ~7G), "ZFS File Data" at ~0
(when running primarycache=metadata), and still arc_meta_used is only
about 2GB.. where does the other 5GB of 'Kernel' memory go?

Large consumers in ::kmastat are:
dnode_t            656 1260774 1540038 1051332608B  28987865 0
rnode4_cache       968 1000000 1000000 1024000000B   1000000 0
kmem_va_16384    16384   50389   59952  982253568B  68595231 0
kmem_va_4096      4096 1247328 1268576  901120000B   5788758 0
zio_buf_16384    16384   50413   50437  826359808B 251118744 0
zio_buf_512        512 1255757 1502096  769073152B  95757654 0
vn_cache           200 2022348 2780205  759181312B  12796286 0
kmem_va_8192      8192   14923   80336  658112512B   1239789 0
zio_buf_65536    65536    5156    5160  338165760B  49769732 0
dmu_buf_impl_t     192 1306533 1594440  326541312B 103503254 0

Could I do some trickery by creating a 5-6GB ramdisk, adding it as an
L2ARC device and setting secondarycache=metadata with primarycache=all?
Or would it be better (due to how ZFS migrates data from primary to
secondary) to have primarycache=metadata and secondarycache=all with the
L2 ramdisk? How does ZFS currently cope if the L2 device is blank/missing
at boot?

Maybe this trickery will starve the DNLC too, though..

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se - 070-5858487
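[For anyone who wants to try the ramdisk idea above, a minimal sketch;
the pool name and ramdisk name are placeholders, and the 5g size assumes
the box can actually spare that much RAM:

  # create a RAM-backed pseudo disk and add it as an L2ARC cache device
  ramdiskadm -a zfsl2 5g
  zpool add tank cache /dev/ramdisk/zfsl2

  # steer user data to the ramdisk L2ARC, keep RAM for metadata
  zfs set primarycache=metadata tank
  zfs set secondarycache=all tank

  # undo the experiment (cache devices can be removed online)
  zpool remove tank /dev/ramdisk/zfsl2
  ramdiskadm -d zfsl2

Since the ramdisk does not survive a reboot, the cache device will be
missing at next boot; cache vdevs are not required for pool import, so
it should just show up as unavailable, but that is worth testing before
relying on it.]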
Richard Elling
2008-Oct-15 23:37 UTC
[zfs-discuss] Tuning for a file server, disabling data cache (almost)
Tomas Ögren wrote:
> Executive summary: I want arc_data_limit (like arc_meta_limit, but for
> data) and set it to 0.5G or so. Is there any way to "simulate" it?

We describe how to limit the size of the ARC cache in the Evil Tuning Guide.
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
 -- richard
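[For reference, the tunable described in that guide caps the ARC as a
whole (data and metadata together), not the data portion alone. A sketch
of the /etc/system entry, with an illustrative 2GB value:

  * /etc/system -- cap the total ARC at 2GB (0x80000000 bytes);
  * takes effect after a reboot
  set zfs:zfs_arc_max = 0x80000000

This is why the follow-up below asks whether it limits just the data
portion.]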
Tomas Ögren
2008-Oct-16 10:23 UTC
[zfs-discuss] Tuning for a file server, disabling data cache (almost)
On 15 October, 2008 - Richard Elling sent me these 4,3K bytes:

> Tomas Ögren wrote:
>> Executive summary: I want arc_data_limit (like arc_meta_limit, but for
>> data) and set it to 0.5G or so. Is there any way to "simulate" it?
>
> We describe how to limit the size of the ARC cache in the Evil Tuning Guide.
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide

Will that limit the _data_ portion only, or the metadata as well?

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Darren J Moffat
2008-Oct-16 10:26 UTC
[zfs-discuss] Tuning for a file server, disabling data cache (almost)
Tomas Ögren wrote:
> On 15 October, 2008 - Richard Elling sent me these 4,3K bytes:
>> We describe how to limit the size of the ARC cache in the Evil Tuning Guide.
>> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
>
> Will that limit the _data_ portion only, or the metadata as well?

Recent builds of OpenSolaris have the ability to control on a per
dataset basis what is put into the ARC and L2ARC using the
primarycache and secondarycache dataset properties:

     primarycache=all | none | metadata

         Controls what is cached in the primary cache (ARC). If
         this property is set to "all", then both user data and
         metadata is cached. If this property is set to "none",
         then neither user data nor metadata is cached. If this
         property is set to "metadata", then only metadata is
         cached. The default value is "all".

     secondarycache=all | none | metadata

         Controls what is cached in the secondary cache (L2ARC).
         If this property is set to "all", then both user data
         and metadata is cached. If this property is set to
         "none", then neither user data nor metadata is cached.
         If this property is set to "metadata", then only meta-
         data is cached. The default value is "all".

--
Darren J Moffat
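[A quick illustration of using those properties; the dataset name is a
placeholder:

  # cache only metadata in RAM for this dataset, everything in L2ARC
  zfs set primarycache=metadata tank/pub
  zfs set secondarycache=all tank/pub

  # verify the settings
  zfs get primarycache,secondarycache tank/pub

The properties apply per dataset and are inherited by descendants, so
they can be set on just the archive filesystems without touching the
rest of the pool.]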
Tomas Ögren
2008-Oct-16 10:40 UTC
[zfs-discuss] Tuning for a file server, disabling data cache (almost)
On 16 October, 2008 - Darren J Moffat sent me these 1,7K bytes:

> Recent builds of OpenSolaris have the ability to control on a per
> dataset basis what is put into the ARC and L2ARC using the
> primarycache and secondarycache dataset properties:
> [...]

Yeah, the problem is (like I wrote in the first post), if I set
primarycache=metadata, then ZFS prefetch will go into "horribly
inefficient mode" where it will do lots of prefetching, but the
prefetched data will be discarded immediately.

A 128k prefetch for a 32k read will throw away the other 96k
immediately, followed by another 128k prefetch for the next 32k read,
throwing away the other 96k.

So ZFS needs to have _some_ data cache, but I want to limit it to
"short term data" only.. Setting a data cache limit of 512M or
something should work fine, but I want to leave the rest to metadata,
as that's the place where it can help the most.

Unless I can do some trickery with a ram disk and put that as
secondarycache with data cache as well..

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
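[For the arc_meta_limit side of this, the usual live-system trick in
this era of bits is to poke the kernel variable with mdb; a sketch with
an illustrative ~7GB value (the change does not persist across
reboots):

  # look at the current ARC breakdown, including arc_meta_used/limit
  echo ::arc | mdb -k

  # raise arc_meta_limit to ~7GB (0x1C0000000 bytes) on the live system
  echo 'arc_meta_limit/Z 0x1C0000000' | mdb -kw

That raises the ceiling for metadata, but there is still no matching
ceiling for data, which is the gap being described here.]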
Ross
2008-Oct-16 11:33 UTC
[zfs-discuss] Tuning for a file server, disabling data cache (almost)
I might be misunderstanding here, but I don't see how you're going to
improve on "zfs set primarycache=metadata".

You complain that ZFS throws away 96kb of data if you're only reading
32kb at a time, but then also complain that you are IO/s bound and
that this is restricting your maximum transfer rate. If it's IO/s
that is limiting you, it makes no difference that ZFS is throwing away
96kb of data; you're going to get the same IOPS and the same throughput
at your application whether you're using 32k or 128k zfs record sizes.

Also, you're asking on one hand for each disk to get larger IO blocks,
and on the other you're complaining that with large block sizes a lot
of data is wasted. That looks like a contradictory argument to me, as
you can't have both of these. You just need to pick whichever one is
more suited to your needs.

Like I said, I may be misunderstanding, but I think you might be
looking for something that you don't actually need.
--
This message posted from opensolaris.org
Tomas Ögren
2008-Oct-16 11:52 UTC
[zfs-discuss] Tuning for a file server, disabling data cache (almost)
On 16 October, 2008 - Ross sent me these 1,1K bytes:

> You complain that ZFS throws away 96kb of data if you're only reading
> 32kb at a time, but then also complain that you are IO/s bound and
> that this is restricting your maximum transfer rate. If it's io/s
> that is limiting you it makes no difference that ZFS is throwing away
> 96kb of data, you're going to get the same iops and same throughput at
> your application whether you're using 32k or 128k zfs record sizes.

But with 1Gb FC, if I'm reading 100MB/s it matters whether 100MB/s or
25MB/s of that is actually used for something..

> Also, you're asking on one hand for each disk to get larger IO blocks,
> and on the other you are complaining that with large block sizes a lot
> of data is wasted.

.. if I turn off data caching (and only leave metadata caching on).

> Like I said, I may be misunderstanding, but I think you might be
> looking for something that you don't actually need.

Ok. ZFS prefetch can help, but I don't want it to use up all my RAM for
data cache.. Using it for small temporary buffers while reading stuff
from disk is good, but once data has been read from disk and used once
(delivered over NFS), there is a very low probability that I will need
it again before it has been flushed (because 4TB > 8GB).

With default tuning, ZFS will keep stacking up these "use once" data
blocks in the cache, pushing out metadata cache which actually has a
good chance of being used again (metadata for all our files can fit in
the 8GB of RAM, but 4TB of data can't).

So if I could tell ZFS: "Here you have 512M (or whatever) of ARC space
that you can use for prefetch etc. Leave the other 7.5GB of RAM for
metadata cache."

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
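[A quick way to check whether that is what is happening on a given box
is to compare the overall ARC size with the metadata portion; a sketch
using the same views already mentioned in the thread:

  # total ARC size and ceiling from the kstats
  kstat -p zfs:0:arcstats:size zfs:0:arcstats:c_max

  # breakdown including arc_meta_used, plus the "ZFS File Data" bucket
  echo ::arc | mdb -k
  echo ::memstat | mdb -k

If "ZFS File Data" keeps growing while arc_meta_used stalls, the
use-once data blocks are indeed winning the eviction fight over
metadata.]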
Richard Elling
2008-Oct-16 15:10 UTC
[zfs-discuss] Tuning for a file server, disabling data cache (almost)
Tomas Ögren wrote:
> Yeah, the problem is (like I wrote in the first post), if I set
> primarycache=metadata, then ZFS prefetch will go into "horribly
> inefficient mode" where it will do lots of prefetching, but the
> prefetched data will be discarded immediately.
>
> A 128k prefetch for a 32k read will throw away the other 96k
> immediately, followed by another 128k prefetch for the next 32k read,
> throwing away the other 96k.

Are you sure this is prefetch, or is it just the recordsize? The
checksum is based on the record, so to validate the checksum the entire
record must be read. If you have a fixed-record-size workload where the
size < 128 kBytes, then you might adjust the recordsize parameter.
 -- richard
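[For completeness, a sketch of that adjustment; the dataset name is a
placeholder, and recordsize only affects files written after the
change, so existing data would need to be re-copied to pick it up:

  # match the record size to the dominant read size of the workload
  zfs set recordsize=32k tank/pub

  # check what a dataset is currently using
  zfs get recordsize tank/pub

As discussed above, on a 10-disk raidz2 this also shrinks the per-disk
I/O to roughly recordsize/8, so it trades read amplification for
smaller device I/Os.]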
Al Hopper
2008-Oct-17 09:22 UTC
[zfs-discuss] Tuning for a file server, disabling data cache (almost)
On Thu, Oct 16, 2008 at 6:52 AM, Tomas Ögren <stric at acc.umu.se> wrote:
> With default tuning, ZFS will keep stacking up these "use once" data
> blocks in the cache, pushing out metadata cache which actually has a
> good chance of being used again (metadata for all our files can fit in
> the 8GB of RAM, but 4TB of data can't).
>
> So if I could tell ZFS: "Here you have 512M (or whatever) of ARC space
> that you can use for prefetch etc. Leave the other 7.5GB of RAM for
> metadata cache."

I see where you are going with this - but it looks like your
performance limit is more likely IOPS (I/O Ops/Sec) and latency, rather
than disk subsystem bandwidth. You want ZFS to know which blocks to
read to satisfy a request for the next "chunk" of a file (to be
downloaded) so that each I/O operation reads data that you need, every
time. But ZFS is filling up your cache with data blocks - which you are
unlikely to re-use.

If you set the ZFS caches for metadata only, you'll probably find that
you're still not getting enough IOPS. The way you maximize IOPS is to
use multiway mirrors for your data. For example, if you have a 5-way
mirror composed of five 15k RPM drives, you'll see 5 * 700 [1] IOPS -
and my *guess* is that you'll need about 2,500 IOPS to "busy out" a
reasonably powerful (in terms of CPU) NFS server "feeding" a gigabit
ethernet port. So what if some of those IO ops are being used to
traverse metadata to get to the data blocks... if you can do 3.5k IOPS,
and you need 2.5k IOPS, you can "afford" to "waste" some of them
because you don't have all the metadata cached.

I strongly suspect that if you talked one of the ZFS developers into
cooking you up an experimental version of ZFS to do as you ask above,
you would still not get the IOPS and system response (low latency) you
need to get the real work done.

The "correct" solution, aside from a multi-way mirror disk config (and
I know you don't want to hear this), is to equip your NFS server with
32GB (or more) of RAM. You simply need a server style motherboard with
16 DIMM slots and inexpensive (Kingston) RAM at approx $23/gigabyte.
Any server grade motherboard with one fast multi-core CPU will get the
job done here. ZFS is designed to scale beautifully with more RAM;
hence, your most viable solution is to use it the way the designers
intended it to be used.

Another point comes to mind here while thinking about the disk drives.
We have three basic categories of disk drives with the following,
broad, operational characteristics:

a) inexpensive, large capacity SATA drives running at 7,200 RPM and
   providing, approximately, 300 IOPS.
b) expensive, small capacity SAS drives running at 15k RPM and
   providing, approx, 700 IOPS.
c) SSD - currently not available with the cost per gigabyte to make
   them viable for your application, but capable of 3.5k+ IOPS.

And you need large (inexpensive) capacity but high IOPS - which you can
only get from multi-way mirror configs.

Solutions:

1) a multi-way mirror config of RAIDZ SATA disk drive based pools (lots
   of drives, lots of power)
2) a multi-way mirror config of WD VelociRaptor 10k RPM drives (the
   version that fits in a 2.5" bay is part # WD3000BLFS)

I would strongly consider option 2) above; take a look at the capacity
and IOPS available from this drive.

Regards,

--
Al Hopper Logical Approach Inc, Plano, TX
al at logical-approach.com Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
Bob Friesenhahn
2008-Oct-17 15:51 UTC
[zfs-discuss] Tuning for a file server, disabling data cache (almost)
On Fri, 17 Oct 2008, Al Hopper wrote:
>
> a) inexpensive, large capacity SATA drives running at 7,200 RPM and
>    providing, approximately, 300 IOPS.
> b) expensive, small capacity SAS drives running at 15k RPM and
>    providing, approx, 700 IOPS.

Al,

Where are you getting the above IOPS estimates from? They seem to be
inflated (by at least a factor of three) from any per-drive IOPS numbers
I have seen mentioned elsewhere and from my own measurements.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Al Hopper
2008-Oct-17 16:29 UTC
[zfs-discuss] Tuning for a file server, disabling data cache (almost)
On Fri, Oct 17, 2008 at 10:51 AM, Bob Friesenhahn
<bfriesen at simple.dallas.tx.us> wrote:
> Where are you getting the above IOPS estimates from? They seem to be
> inflated (by at least a factor of three) from any per-drive IOPS
> numbers I have seen mentioned elsewhere and from my own measurements.

These are my personal #s that I work to - and they are inflated
because, in *my* real-world experience, where there is some degree of
successful caching, this is what I mostly see. Please feel entirely
free to substitute your own numbers. I should have included a
disclaimer requesting the reader to substitute his/her own (observed)
IOPS drive numbers.

Happy Friday Bob!

--
Al Hopper Logical Approach Inc, Plano, TX
al at logical-approach.com Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
Marcelo Leal
2008-Oct-17 19:00 UTC
[zfs-discuss] Tuning for a file server, disabling data cache (almost)
Hello all,

I think he has a point here... maybe that would be an interesting
feature for that kind of workload. Caching all the metadata would make
the rsync task much faster (for many files). Trying to cache the data
is really a waste of time, because the data will not be read again, and
will just push out the "good" cached metadata. That is what I
understand from what he said about the 96k being discarded right away.
He wants to "configure" an area to "copy the data" through, and that's
it. Leave my metadata cache alone. ;-)

Leal.
--
This message posted from opensolaris.org
David Collier-Brown
2008-Oct-17 21:40 UTC
[zfs-discuss] Tuning for a file server, disabling data cache (almost)
Marcelo Leal <opensolaris at posix.brte.com.br> wrote:
> I think he has a point here... maybe that would be an interesting
> feature for that kind of workload. Caching all the metadata would make
> the rsync task much faster (for many files). Trying to cache the data
> is really a waste of time, because the data will not be read again, and
> will just push out the "good" cached metadata. That is what I
> understand from what he said about the 96k being discarded right away.
> He wants to "configure" an area to "copy the data" through, and that's
> it. Leave my metadata cache alone. ;-)

That's a common enough behavior pattern that Per Brinch Hansen defined
a distinct filetype for it in, if memory serves, the RC 4000. As soon
as it's read, it's gone.

We saw this behavior on NFS servers in the Markham ACE lab, and
absolutely with Samba almost everywhere. My Smarter Colleagues[tm]
explained it as a normal pattern whenever you have front-end caching,
as backend caching is then rendered far less effective, and sometimes
directly disadvantageous.

It sounded like, from the previous discussion, one could tune for it
with the level 1 and 2 caches, although if I understood it properly,
the particular machine also had to narrow a stripe for the particular
load being discussed...

--dave
--
David Collier-Brown            | Always do right. This will gratify
Sun Microsystems, Toronto      | some people and astonish the rest
davecb at sun.com                 |                    -- Mark Twain
cell: (647) 833-9377, bridge: (877) 385-4099 code: 506 9191#