Tomas Ögren
2008-Oct-15 20:57 UTC
[zfs-discuss] Tuning for a file server, disabling data cache (almost)
Hello.

Executive summary: I want arc_data_limit (like arc_meta_limit, but for
data) and to set it to 0.5G or so. Is there any way to "simulate" it?

We have a cluster of Linux frontends (http/ftp/rsync) for
Debian/Mozilla/etc archives, and as an NFS disk backend we currently have
a DL145 running OpenSolaris (snv98) with one pool of 3 raidz2 vdevs with
10 SATA disks each.

The frontends have a local cache of a few raid0'd disks, but need to dip
into the backend every now and then because they don't fit all the data.
Rsyncs dip into the backend for filesystem traversal, both when pulling
(for us: writing) data and when sending to others.

Obviously, the working data set (4TB or so right now) is quite a lot
larger than the RAM on the disk backend (8GB), so the data cache is
mostly useless. The metadata cache is good, because rsync walks the whole
tree every now and then and that's the only part that has a chance of
fitting in RAM (about 1.5-2M files now).

So I want to dedicate as much RAM as possible to the metadata cache; the
data cache is of less importance.

Right now, ZFS has a knob to limit the amount of metadata cache
(arc_meta_limit), but not one to limit the amount of data cache.

I've tried a few tuning tricks, but they all seem to have drawbacks:

* zfs set primarycache=metadata myfs
  - If the record size is 128k and an application reads 32k, then ZFS
    reads 128k, hands 32k to the app and throws away 96k. Repeat. (So I
    get 400% physical IO over logical IO.) If I tune recordsize down to
    32k, then each disk gets 32/8 (8 data disks per raidz2) = 4k IOs,
    which isn't optimal either: 4k/IO * 100 IOPS * 8 disks * 3 raidz2s =
    9600kB/s. I would also prefer each disk to get larger IOs than with
    rs=128k.

* zfs:zfs_prefetch_disable
  - Prefetching in small amounts could be good, but only with a limit on
    how much of it is kept. It uses up precious RAM that I want for the
    metadata cache.

* zfs:arc_meta_limit
  - I'm raising it to about the size of the whole ARC. But I want an
    arc_data_limit set to 512M or so, just for temporary buffers.

* ncsize
  - I want to keep this high, but with ZFS it seems to use huge amounts
    of memory per dnode_t/zfs_znode_cache or something..

I think most of the performance issues would be solved if I let ZFS do
all of its prefetching, but limited the amount of data it keeps.
Wouldn't this problem arise on most file servers?

When checking ::arc / ::kmastat / ::memstat, I usually see close to 1GB
on the freelist (probably due to c_max being ~7G), "ZFS File Data" at ~0
(when running primarycache=metadata), and still arc_meta_used is only
about 2GB.. where does the other 5GB of 'Kernel' memory go?

Large consumers in ::kmastat are:
dnode_t            656 1260774 1540038 1051332608B  28987865 0
rnode4_cache       968 1000000 1000000 1024000000B   1000000 0
kmem_va_16384    16384   50389   59952  982253568B  68595231 0
kmem_va_4096      4096 1247328 1268576  901120000B   5788758 0
zio_buf_16384    16384   50413   50437  826359808B 251118744 0
zio_buf_512        512 1255757 1502096  769073152B  95757654 0
vn_cache           200 2022348 2780205  759181312B  12796286 0
kmem_va_8192      8192   14923   80336  658112512B   1239789 0
zio_buf_65536    65536    5156    5160  338165760B  49769732 0
dmu_buf_impl_t     192 1306533 1594440  326541312B 103503254 0

Could I do some trickery by creating a 5-6GB ramdisk, adding it as an
L2ARC device and setting secondarycache=metadata with primarycache=all?
Or would it be better (due to how ZFS migrates data from primary to
secondary) to have primarycache=metadata and secondarycache=all with the
L2 ramdisk? How does ZFS currently cope if the L2 device is blank/missing
at boot?

Maybe this trickery will starve the DNLC too, though..

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se - 070-5858487
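[For anyone who wants to try the ramdisk idea above, a minimal sketch;
the pool name and ramdisk name are placeholders, and the 5g size assumes
the box can actually spare that much RAM:

  # create a RAM-backed pseudo disk and add it as an L2ARC cache device
  ramdiskadm -a zfsl2 5g
  zpool add tank cache /dev/ramdisk/zfsl2

  # steer user data to the ramdisk L2ARC, keep RAM for metadata
  zfs set primarycache=metadata tank
  zfs set secondarycache=all tank

  # undo the experiment (cache devices can be removed online)
  zpool remove tank /dev/ramdisk/zfsl2
  ramdiskadm -d zfsl2

Since the ramdisk does not survive a reboot, the cache device will be
missing at next boot; cache vdevs are not required for pool import, so
it should just show up as unavailable, but that is worth testing before
relying on it.]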
Richard Elling
2008-Oct-15 23:37 UTC
[zfs-discuss] Tuning for a file server, disabling data cache (almost)
Tomas Ögren wrote:
> Executive summary: I want arc_data_limit (like arc_meta_limit, but for
> data) and set it to 0.5G or so. Is there any way to "simulate" it?

We describe how to limit the size of the ARC cache in the Evil Tuning Guide.
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
 -- richard
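[For reference, the tunable described in that guide caps the ARC as a
whole (data and metadata together), not the data portion alone. A sketch
of the /etc/system entry, with an illustrative 2GB value:

  * /etc/system -- cap the total ARC at 2GB (0x80000000 bytes);
  * takes effect after a reboot
  set zfs:zfs_arc_max = 0x80000000

This is why the follow-up below asks whether it limits just the data
portion.]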
Tomas Ögren
2008-Oct-16 10:23 UTC
[zfs-discuss] Tuning for a file server, disabling data cache (almost)
On 15 October, 2008 - Richard Elling sent me these 4,3K bytes:

> Tomas Ögren wrote:
>> Executive summary: I want arc_data_limit (like arc_meta_limit, but for
>> data) and set it to 0.5G or so. Is there any way to "simulate" it?
>
> We describe how to limit the size of the ARC cache in the Evil Tuning Guide.
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide

Will that limit the _data_ portion only, or the metadata as well?

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Darren J Moffat
2008-Oct-16 10:26 UTC
[zfs-discuss] Tuning for a file server, disabling data cache (almost)
Tomas Ögren wrote:
> On 15 October, 2008 - Richard Elling sent me these 4,3K bytes:
>> We describe how to limit the size of the ARC cache in the Evil Tuning Guide.
>> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
>
> Will that limit the _data_ portion only, or the metadata as well?

Recent builds of OpenSolaris have the ability to control on a per
dataset basis what is put into the ARC and L2ARC using the
primarycache and secondarycache dataset properties:

     primarycache=all | none | metadata

         Controls what is cached in the primary cache (ARC). If
         this property is set to "all", then both user data and
         metadata is cached. If this property is set to "none",
         then neither user data nor metadata is cached. If this
         property is set to "metadata", then only metadata is
         cached. The default value is "all".

     secondarycache=all | none | metadata

         Controls what is cached in the secondary cache (L2ARC).
         If this property is set to "all", then both user data
         and metadata is cached. If this property is set to
         "none", then neither user data nor metadata is cached.
         If this property is set to "metadata", then only meta-
         data is cached. The default value is "all".

--
Darren J Moffat
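[A quick illustration of using those properties; the dataset name is a
placeholder:

  # cache only metadata in RAM for this dataset, everything in L2ARC
  zfs set primarycache=metadata tank/pub
  zfs set secondarycache=all tank/pub

  # verify the settings
  zfs get primarycache,secondarycache tank/pub

The properties apply per dataset and are inherited by descendants, so
they can be set on just the archive filesystems without touching the
rest of the pool.]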
Tomas Ögren
2008-Oct-16 10:40 UTC
[zfs-discuss] Tuning for a file server, disabling data cache (almost)
On 16 October, 2008 - Darren J Moffat sent me these 1,7K bytes:

> Recent builds of OpenSolaris have the ability to control on a per
> dataset basis what is put into the ARC and L2ARC using the
> primarycache and secondarycache dataset properties:
> [...]

Yeah, the problem is (like I wrote in the first post), if I set
primarycache=metadata, then ZFS prefetch will go into "horribly
inefficient mode" where it will do lots of prefetching, but the
prefetched data will be discarded immediately.

A 128k prefetch for a 32k read will throw away the other 96k
immediately, followed by another 128k prefetch for the next 32k read,
throwing away the other 96k.

So ZFS needs to have _some_ data cache, but I want to limit it to
"short term data" only.. Setting a data cache limit of 512M or
something should work fine, but I want to leave the rest to metadata,
as that's the place where it can help the most.

Unless I can do some trickery with a ram disk and put that as
secondarycache with data cache as well..

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
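[For the arc_meta_limit side of this, the usual live-system trick in
this era of bits is to poke the kernel variable with mdb; a sketch with
an illustrative ~7GB value (the change does not persist across
reboots):

  # look at the current ARC breakdown, including arc_meta_used/limit
  echo ::arc | mdb -k

  # raise arc_meta_limit to ~7GB (0x1C0000000 bytes) on the live system
  echo 'arc_meta_limit/Z 0x1C0000000' | mdb -kw

That raises the ceiling for metadata, but there is still no matching
ceiling for data, which is the gap being described here.]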
Ross
2008-Oct-16 11:33 UTC
[zfs-discuss] Tuning for a file server, disabling data cache (almost)
I might be misunderstanding here, but I don't see how you're going to
improve on "zfs set primarycache=metadata".

You complain that ZFS throws away 96kb of data if you're only reading
32kb at a time, but then also complain that you are IO/s bound and
that this is restricting your maximum transfer rate. If it's IO/s
that is limiting you, it makes no difference that ZFS is throwing away
96kb of data; you're going to get the same IOPS and the same throughput
at your application whether you're using 32k or 128k zfs record sizes.

Also, you're asking on one hand for each disk to get larger IO blocks,
and on the other you're complaining that with large block sizes a lot
of data is wasted. That looks like a contradictory argument to me, as
you can't have both of these. You just need to pick whichever one is
more suited to your needs.

Like I said, I may be misunderstanding, but I think you might be
looking for something that you don't actually need.
--
This message posted from opensolaris.org
Tomas Ögren
2008-Oct-16 11:52 UTC
[zfs-discuss] Tuning for a file server, disabling data cache (almost)
On 16 October, 2008 - Ross sent me these 1,1K bytes:

> You complain that ZFS throws away 96kb of data if you're only reading
> 32kb at a time, but then also complain that you are IO/s bound and
> that this is restricting your maximum transfer rate. If it's io/s
> that is limiting you it makes no difference that ZFS is throwing away
> 96kb of data, you're going to get the same iops and same throughput at
> your application whether you're using 32k or 128k zfs record sizes.

But with 1Gb FC, if I'm reading 100MB/s it matters whether 100MB/s or
25MB/s of that is actually used for something..

> Also, you're asking on one hand for each disk to get larger IO blocks,
> and on the other you are complaining that with large block sizes a lot
> of data is wasted.

.. if I turn off data caching (and only leave metadata caching on).

> Like I said, I may be misunderstanding, but I think you might be
> looking for something that you don't actually need.

Ok. ZFS prefetch can help, but I don't want it to use up all my RAM for
data cache.. Using it for small temporary buffers while reading stuff
from disk is good, but once data has been read from disk and used once
(delivered over NFS), there is a very low probability that I will need
it again before it has been flushed (because 4TB > 8GB).

With default tuning, ZFS will keep stacking up these "use once" data
blocks in the cache, pushing out metadata cache which actually has a
good chance of being used again (metadata for all our files can fit in
the 8GB of RAM, but 4TB of data can't).

So if I could tell ZFS: "Here you have 512M (or whatever) of ARC space
that you can use for prefetch etc. Leave the other 7.5GB of RAM for
metadata cache."

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
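[A quick way to check whether that is what is happening on a given box
is to compare the overall ARC size with the metadata portion; a sketch
using the same views already mentioned in the thread:

  # total ARC size and ceiling from the kstats
  kstat -p zfs:0:arcstats:size zfs:0:arcstats:c_max

  # breakdown including arc_meta_used, plus the "ZFS File Data" bucket
  echo ::arc | mdb -k
  echo ::memstat | mdb -k

If "ZFS File Data" keeps growing while arc_meta_used stalls, the
use-once data blocks are indeed winning the eviction fight over
metadata.]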
Richard Elling
2008-Oct-16 15:10 UTC
[zfs-discuss] Tuning for a file server, disabling data cache (almost)
Tomas Ögren wrote:
> Yeah, the problem is (like I wrote in the first post), if I set
> primarycache=metadata, then ZFS prefetch will go into "horribly
> inefficient mode" where it will do lots of prefetching, but the
> prefetched data will be discarded immediately.
>
> A 128k prefetch for a 32k read will throw away the other 96k
> immediately, followed by another 128k prefetch for the next 32k read,
> throwing away the other 96k.

Are you sure this is prefetch, or is it just the recordsize? The
checksum is based on the record, so to validate the checksum the entire
record must be read. If you have a fixed-record-size workload where the
size < 128 kBytes, then you might adjust the recordsize parameter.
 -- richard
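[For completeness, a sketch of that adjustment; the dataset name is a
placeholder, and recordsize only affects files written after the
change, so existing data would need to be re-copied to pick it up:

  # match the record size to the dominant read size of the workload
  zfs set recordsize=32k tank/pub

  # check what a dataset is currently using
  zfs get recordsize tank/pub

As discussed above, on a 10-disk raidz2 this also shrinks the per-disk
I/O to roughly recordsize/8, so it trades read amplification for
smaller device I/Os.]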
Al Hopper
2008-Oct-17 09:22 UTC
[zfs-discuss] Tuning for a file server, disabling data cache (almost)
On Thu, Oct 16, 2008 at 6:52 AM, Tomas Ögren <stric at acc.umu.se> wrote:
> With default tuning, ZFS will keep stacking up these "use once" data
> blocks in the cache, pushing out metadata cache which actually has a
> good chance of being used again (metadata for all our files can fit in
> the 8GB of RAM, but 4TB of data can't).
>
> So if I could tell ZFS: "Here you have 512M (or whatever) of ARC space
> that you can use for prefetch etc. Leave the other 7.5GB of RAM for
> metadata cache."

I see where you are going with this - but it looks like your
performance limit is more likely IOPS (I/O Ops/Sec) and latency, rather
than disk subsystem bandwidth. You want ZFS to know which blocks to
read to satisfy a request for the next "chunk" of a file (to be
downloaded) so that each I/O operation reads data that you need, every
time. But ZFS is filling up your cache with data blocks - which you are
unlikely to re-use.

If you set the ZFS caches for metadata only, you'll probably find that
you're still not getting enough IOPS. The way you maximize IOPS is to
use multiway mirrors for your data. For example, if you have a 5-way
mirror composed of five 15k RPM drives, you'll see 5 * 700 [1] IOPS -
and my *guess* is that you'll need about 2,500 IOPS to "busy out" a
reasonably powerful (in terms of CPU) NFS server "feeding" a gigabit
ethernet port. So what if some of those IO ops are being used to
traverse metadata to get to the data blocks... if you can do 3.5k IOPS,
and you need 2.5k IOPS, you can "afford" to "waste" some of them
because you don't have all the metadata cached.

I strongly suspect that if you talked one of the ZFS developers into
cooking you up an experimental version of ZFS to do as you ask above,
you would still not get the IOPS and system response (low latency) you
need to get the real work done.

The "correct" solution, aside from a multi-way mirror disk config (and
I know you don't want to hear this), is to equip your NFS server with
32GB (or more) of RAM. You simply need a server style motherboard with
16 DIMM slots and inexpensive (Kingston) RAM at approx $23/gigabyte.
Any server grade motherboard with one fast multi-core CPU will get the
job done here. ZFS is designed to scale beautifully with more RAM;
hence, your most viable solution is to use it the way the designers
intended it to be used.

Another point comes to mind here while thinking about the disk drives.
We have three basic categories of disk drives with the following,
broad, operational characteristics:

a) inexpensive, large capacity SATA drives running at 7,200 RPM and
   providing, approximately, 300 IOPS.
b) expensive, small capacity SAS drives running at 15k RPM and
   providing, approx, 700 IOPS.
c) SSD - currently not available with the cost per gigabyte to make
   them viable for your application, but capable of 3.5k+ IOPS.

And you need large (inexpensive) capacity but high IOPS - which you can
only get from multi-way mirror configs.

Solutions:

1) a multi-way mirror config of RAIDZ SATA disk drive based pools (lots
   of drives, lots of power)
2) a multi-way mirror config of WD VelociRaptor 10k RPM drives (the
   version that fits in a 2.5" bay is part # WD3000BLFS)

I would strongly consider option 2) above; take a look at the capacity
and IOPS available from this drive.

Regards,

--
Al Hopper Logical Approach Inc, Plano, TX
al at logical-approach.com Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
Bob Friesenhahn
2008-Oct-17 15:51 UTC
[zfs-discuss] Tuning for a file server, disabling data cache (almost)
On Fri, 17 Oct 2008, Al Hopper wrote:
>
> a) inexpensive, large capacity SATA drives running at 7,200 RPM and
>    providing, approximately, 300 IOPS.
> b) expensive, small capacity SAS drives running at 15k RPM and
>    providing, approx, 700 IOPS.

Al,

Where are you getting the above IOPS estimates from? They seem to be
inflated (by at least a factor of three) from any per-drive IOPS numbers
I have seen mentioned elsewhere and from my own measurements.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Al Hopper
2008-Oct-17 16:29 UTC
[zfs-discuss] Tuning for a file server, disabling data cache (almost)
On Fri, Oct 17, 2008 at 10:51 AM, Bob Friesenhahn
<bfriesen at simple.dallas.tx.us> wrote:
> Where are you getting the above IOPS estimates from? They seem to be
> inflated (by at least a factor of three) from any per-drive IOPS
> numbers I have seen mentioned elsewhere and from my own measurements.

These are my personal #s that I work to - and they are inflated
because, in *my* real-world experience, where there is some degree of
successful caching, this is what I mostly see. Please feel entirely
free to substitute your own numbers. I should have included a
disclaimer requesting the reader to substitute his/her own (observed)
IOPS drive numbers.

Happy Friday Bob!

--
Al Hopper Logical Approach Inc, Plano, TX
al at logical-approach.com Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
Marcelo Leal
2008-Oct-17 19:00 UTC
[zfs-discuss] Tuning for a file server, disabling data cache (almost)
Hello all,

I think he has a point here... maybe that would be an interesting
feature for that kind of workload. Caching all the metadata would make
the rsync task much faster (for many files). Trying to cache the data
is really a waste of time, because the data will not be read again, and
will just push out the "good" cached metadata. That is what I
understand from what he said about the 96k being discarded right away.
He wants to "configure" an area to "copy the data" through, and that's
it. Leave my metadata cache alone. ;-)

Leal.
--
This message posted from opensolaris.org
David Collier-Brown
2008-Oct-17 21:40 UTC
[zfs-discuss] Tuning for a file server, disabling data cache (almost)
Marcelo Leal <opensolaris at posix.brte.com.br> wrote:
> I think he has a point here... maybe that would be an interesting
> feature for that kind of workload. Caching all the metadata would make
> the rsync task much faster (for many files). Trying to cache the data
> is really a waste of time, because the data will not be read again, and
> will just push out the "good" cached metadata. That is what I
> understand from what he said about the 96k being discarded right away.
> He wants to "configure" an area to "copy the data" through, and that's
> it. Leave my metadata cache alone. ;-)

That's a common enough behavior pattern that Per Brinch Hansen defined
a distinct filetype for it in, if memory serves, the RC 4000. As soon
as it's read, it's gone.

We saw this behavior on NFS servers in the Markham ACE lab, and
absolutely with Samba almost everywhere. My Smarter Colleagues[tm]
explained it as a normal pattern whenever you have front-end caching,
as backend caching is then rendered far less effective, and sometimes
directly disadvantageous.

It sounded like, from the previous discussion, one could tune for it
with the level 1 and 2 caches, although if I understood it properly,
the particular machine also had to narrow a stripe for the particular
load being discussed...

--dave
--
David Collier-Brown            | Always do right. This will gratify
Sun Microsystems, Toronto      | some people and astonish the rest
davecb at sun.com                 |                    -- Mark Twain
cell: (647) 833-9377, bridge: (877) 385-4099 code: 506 9191#