Hello, I have a unique deployment scenario where the marriage of a ZFS zvol
and UFS seems like a perfect match.  Here is the list of feature requirements
for my use case:

 * snapshots
 * rollback
 * copy-on-write
 * ZFS-level redundancy (mirroring, raidz, ...)
 * compression
 * filesystem cache control (control what's in and out)
 * priming the filesystem cache (dd if=file of=/dev/null)
 * control over the upper boundary of RAM consumed by the filesystem.
   This helps me avoid contention between the filesystem cache and my
   application.

Before ZFS came along, I could achieve all but rollback, copy-on-write and
compression through UFS plus some volume manager.  I would like to use ZFS,
but with ZFS I cannot prime the cache and I don't have the ability to control
what is in the cache (e.g. like with the directio UFS option).  If I create a
ZFS zvol and format it as a UFS filesystem, it seems like I get the best of
both worlds.

Can anyone poke holes in this strategy?  I think the biggest possible risk
factor is if the ZFS zvol still uses the ARC cache.  If that is the case, I
may be double-dipping on the filesystem cache, e.g. the UFS filesystem uses
some RAM and the ZFS zvol uses some RAM for filesystem cache.  Is this a true
statement, or does the zvol use a minimal amount of system RAM?

Lastly, if I were to try this scenario, does anyone know how to monitor the
RAM consumed by the zvol and by UFS?  e.g. Is there a dtrace script for
monitoring ZFS or UFS memory consumption?

Thanks in advance,
Brad
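P.S.  For concreteness, the layering I have in mind is roughly the following
sketch; the pool name ("tank"), volume size, and mount point are made-up
placeholders:

  # carve a zvol out of an existing pool and put UFS on top of it
  zfs create -V 60g tank/dbvol

  # newfs runs against the raw zvol device, mount against the block device
  newfs /dev/zvol/rdsk/tank/dbvol
  mount -F ufs /dev/zvol/dsk/tank/dbvol /db

The zvol underneath would give me snapshots, rollback, redundancy and
compression, while UFS on top would give me directio and dd-based priming.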
Brad Diggs wrote:
> I would like to use ZFS but with ZFS I cannot prime the cache
> and I don't have the ability to control what is in the cache
> (e.g. like with the directio UFS option).

Why do you believe you need that at all?  What do you do to "prime" the
cache with UFS, and what benefit do you think it is giving you?

Have you tried just using ZFS and found it doesn't perform as you need, or
are you assuming it won't because it doesn't have directio?

--
Darren J Moffat
Hello Darren,

Please find responses inline below...

On Fri, 2008-02-08 at 10:52 +0000, Darren J Moffat wrote:
> Brad Diggs wrote:
> > I would like to use ZFS but with ZFS I cannot prime the cache
> > and I don't have the ability to control what is in the cache
> > (e.g. like with the directio UFS option).
>
> Why do you believe you need that at all ?

My application is directory server.  The #1 resource that directory needs to
make maximum use of is RAM.  In order to do that, I want to control every
aspect of RAM utilization, both to safely use as much RAM as possible AND to
avoid contention among the things trying to use RAM.

Let's consider the following example.  A customer has a 50M entry directory.
The sum of the data (db3 files) is approximately 60GB.  However, there is
another 2GB for the root filesystem, 30GB for the changelog, 1GB for the
transaction logs, and 10GB for the informational logs.

The system on which directory server will run has only 64GB of RAM.  The
system is configured with the following partitions:

  FS      Used(GB)  Description
  /       2         root
  /db     60        directory data
  /logs   41        changelog, txn logs, and info logs
  swap    10        system swap

I prefer to keep the directory db cache and entry cache relatively small, so
the db cache is 2GB and the entry cache is 100M.  This leaves roughly 62GB of
RAM for my 60GB of directory data and Solaris.  The only way to ensure that
the directory data (/db) is the only thing in the filesystem cache is to set
directio on / (root) and /logs.

> What do you do to "prime" the cache with UFS

cd <ds_instance_dir>/db
for i in `find . -name '*.db3'`
do
  dd if="${i}" of=/dev/null
done

> and what benefit do you think it is giving you ?

Priming the directory server data into the filesystem cache reduces LDAP
response time for directory data that is in the filesystem cache.  This can
mean the difference between a sub-millisecond response time and a response
time on the order of tens or hundreds of milliseconds, depending on the
underlying storage speed.  For telcos in particular, minimal response time is
paramount.

Another common scenario is when we do benchmark bakeoffs with another
vendor's product.  If the data isn't pre-primed, then LDAP response time and
throughput will be artificially degraded until the data is primed into either
the filesystem or directory (db or entry) cache.  Priming via LDAP operations
can take many hours or even days depending on the number of entries in the
directory server, whereas priming the same data via dd takes minutes to hours
depending on the size of the files.

As you know, in benchmarking scenarios time is the most limited resource we
typically have, so priming via dd is much preferred.

Lastly, in order to achieve optimal use of available RAM, we use directio for
the root (/) and other non-data filesystems.  This makes certain that the
only data in the filesystem cache is the directory data.

> Have you tried just using ZFS and found it doesn't perform as you need
> or are you assuming it won't because it doesn't have directio ?

We have done extensive testing with ZFS and love it.  The three areas lacking
for our use cases are as follows:

 * No ability to control what is in the cache, e.g. no directio.
 * No absolute ability to apply an upper boundary to the amount of RAM
   consumed by ZFS.  I know the ARC has a control that seems to work well;
   however, the ARC is only part of ZFS RAM consumption.
 * No ability to rapidly prime the ZFS cache with the data that I want in
   the cache.

I hope that helps explain where I am coming from!

Brad
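P.S.  By "set directio" I just mean the standard UFS forcedirectio mount
option, roughly as below; the device names are placeholders:

  # remount the non-data filesystems with directio so they bypass the page cache
  mount -F ufs -o remount,forcedirectio /
  mount -F ufs -o remount,forcedirectio /logs

  # or make it persistent via the mount-options field in /etc/vfstab, e.g.
  # /dev/dsk/c0t0d0s5  /dev/rdsk/c0t0d0s5  /logs  ufs  2  yes  forcedirectio

/db stays on a normally cached (non-directio) filesystem so that the dd loop
above actually populates the page cache with the directory data.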
Priming the cache for ZFS should work, at least right after boot when freemem
is large; any block read will make it into the cache.  Post boot, when memory
is already primed with something else (what?), it gets more difficult for
both UFS and ZFS to guess what to keep in their caches.  Did you try priming
ZFS after boot?

Next, you seem to suffer because your sequential writes to log files appear
to displace the more useful DB files from the ARC (I'd be interested to see
whether this still occurs after you've primed the ZFS cache after boot).
Note that if your logfile write rate is huge (dd-like) then ZFS cache
management will suffer, but that is well on its way to being fixed.  For a
directory server, though, I would expect the log rate to be more reasonable
and your storage to be able to keep up.  That gives ZFS cache management a
fighting chance to keep the reused data in preference to the sequential
writes.

If the default behavior is not working for you, we'll need to look at the ARC
behavior in this case; I don't see why it should not work out of the box.
But manual control will also come in the form of this DIO-like feature:

  6429855 Need way to tell ZFS that caching is a lost cause

While we try to solve your problem out of the box, you might also run a
background process that keeps priming the cache at a low I/O rate; a rough
sketch is below.  Not a great workaround, but it should be effective.

-r
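The background priming I have in mind is nothing more elaborate than a
low-rate loop along these lines; the data path, file pattern and sleep
interval are placeholders only:

  #!/bin/sh
  # crude background cache warmer: periodically re-read the DB files
  # so their blocks keep looking recently used to the ARC
  while true
  do
      for f in `find /db -name '*.db3'`   # hypothetical data path
      do
          dd if="$f" of=/dev/null bs=128k 2>/dev/null
      done
      sleep 600                           # long pause keeps the I/O rate low
  done

Once the data is resident, each pass should mostly be satisfied from the
cache, so the extra physical I/O should stay small.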