How can I determine whether I should have a separate ZIL disk? We are using ZFS as the backend for our guest domains' boot drives under LDoms, and we are seeing very slow write performance.

Thanks
On Tue, May 4, 2010 at 10:19 AM, Tony MacDoodle <tpsdoodle at gmail.com> wrote:
> How would one determine if I should have a separate ZIL disk? We are using
> ZFS as the backend of our Guest Domains boot drives using LDom's. And we are
> seeing bad/very slow write performance?

There's a DTrace script that Richard Elling wrote called zilstat.ksh. It's available at http://www.richardelling.com/Home/scripts-and-programs-1/zilstat

I'm not sure what the numbers mean (there's info at that address), but anything other than lots of 0s indicates that the ZIL is being used.

-B

--
Brandon High : bhigh at freaks.com
On 04/05/2010 18:19, Tony MacDoodle wrote:
> How would one determine if I should have a separate ZIL disk? We are
> using ZFS as the backend of our Guest Domains boot drives using
> LDom's. And we are seeing bad/very slow write performance?
>

If you can disable the ZIL, compare the performance with it disabled against the performance with it enabled. That will give you an estimate of the absolute maximum performance increase (if any) you could get from a dedicated ZIL device.

--
Robert Milkowski
http://milek.blogspot.com
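The comparison suggested above boils down to a ratio. Here is a minimal sketch in Python, assuming you have measured the same write workload twice (once with the ZIL enabled, once with it disabled); the function name and the sample figures are hypothetical, not measurements from this thread.

```python
def max_slog_speedup(ops_per_sec_zil_on: float, ops_per_sec_zil_off: float) -> float:
    """Upper bound on the speedup a dedicated log device could give.

    A disabled ZIL is the fastest case, so no log device can beat
    the ratio between the two measurements.
    """
    if ops_per_sec_zil_on <= 0:
        raise ValueError("ZIL-enabled throughput must be positive")
    # A ratio below 1.0 means the ZIL is not the bottleneck: report "no gain".
    return max(1.0, ops_per_sec_zil_off / ops_per_sec_zil_on)


if __name__ == "__main__":
    # Hypothetical benchmark results, in write operations per second.
    print(max_slog_speedup(400.0, 1800.0))
```

If the result is close to 1.0, a separate log device is unlikely to help, and the slow writes probably have another cause.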
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Robert Milkowski
>
> if you can disable ZIL and compare the performance to when it is off it
> will give you an estimate of what's the absolute maximum performance
> increase (if any) by having a dedicated ZIL device.

I'll second this suggestion. It'll cost you nothing to disable the ZIL temporarily. (You have to unmount and remount the filesystem twice: once to disable the ZIL, and once to re-enable it.) Then you can see if performance is good. If performance is good with the ZIL disabled, then you'll know you need to accelerate your ZIL, because a disabled ZIL is the fastest thing you could possibly ever do.

Generally speaking, you should not disable your ZIL for the long run. But in some cases, it makes sense.

Here's how you determine if you want to disable your ZIL permanently:

First, understand that with the ZIL disabled, all sync writes are treated as async writes. These are buffered in RAM before being written to disk, so the kernel can optimize and aggregate the write operations into one big chunk.

No matter what, if you have an ungraceful system shutdown, you will lose all the async writes that were waiting in RAM.

If you have the ZIL disabled, you will also lose the sync writes that were waiting in RAM (because those are being handled as async).

In neither case do you have data or filesystem corruption.

The risk of running with no ZIL is: in the case of an ungraceful shutdown, in addition to the (up to 30 sec) async writes that will be lost, you will also lose up to 30 sec of sync writes.
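The loss window described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: every write issued up to the last commit to disk is safe, and sync writes issued after it survive only if the ZIL is enabled. The `Write` class, `lost_writes` function, and timestamps are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Write:
    t: float    # time (seconds) at which the application issued the write
    sync: bool  # True if the application asked for a synchronous write


def lost_writes(writes, crash_time, last_commit_time, zil_enabled):
    """Return the writes that were issued before the crash but are gone afterwards."""
    lost = []
    for w in writes:
        if w.t > crash_time:
            continue  # never issued, so nothing to lose
        if w.t <= last_commit_time:
            continue  # already committed to disk with the buffered data
        if w.sync and zil_enabled:
            continue  # recorded in the intent log, so recoverable
        lost.append(w)
    return lost


if __name__ == "__main__":
    writes = [Write(5, False), Write(12, True), Write(20, False), Write(25, True)]
    # Crash at t=28 s; buffered data was last committed at t=10 s.
    print(lost_writes(writes, 28, 10, zil_enabled=True))
    print(lost_writes(writes, 28, 10, zil_enabled=False))
```

With the ZIL enabled only the async write is lost; with it disabled, the two sync writes that the application believed were safe are lost as well.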
On Wed, May 05, 2010 at 11:32:23PM -0400, Edward Ned Harvey wrote:
> [...]
> No matter what, if you have an ungraceful system shutdown, you will lose all
> the async writes that were waiting in ram.
>
> If you have ZIL disabled, you will also lose the sync writes that were
> waiting in ram (because those are being handled as async.)
>
> In neither case do you have data or filesystem corruption.
>

ZFS itself is probably still OK, since it's designed to handle this (?), but the data can't be OK if you lose 30 secs of writes: 30 secs of writes that have been acknowledged as done to the servers/applications.

> The risk of running with no ZIL is: In the case of ungraceful shutdown, in
> addition to the (up to 30 sec) async writes that will be lost, you will also
> lose up to 30 sec of sync writes.
>

-- Pasi
On 6 May 2010, at 08.17, Pasi Kärkkäinen wrote:
> On Wed, May 05, 2010 at 11:32:23PM -0400, Edward Ned Harvey wrote:
>> [...]
>> If you have ZIL disabled, you will also lose the sync writes that were
>> waiting in ram (because those are being handled as async.)
>>
>> In neither case do you have data or filesystem corruption.
>>
>
> ZFS probably is still OK, since it's designed to handle this (?),
> but the data can't be OK if you lose 30 secs of writes.. 30 secs of writes
> that have been ack'd being done to the servers/applications..

Entirely right! This is the case for many local user writes anyway, since many applications don't sync their written data to disk.

But if you have an application, protocol and/or user that demands or expects persistent storage, disabling the ZIL could of course be fatal in case of a crash. Examples are mail servers and NFS servers.

/ragge
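For readers unsure what "syncing the written data" looks like from the application side, here is a minimal Python sketch using the standard `os.fsync` call. The `durable_write` helper and the file name are hypothetical; the point is only that an application must ask explicitly for its data to reach stable storage, and those explicit requests are the sync writes discussed in this thread.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data and ask the OS to commit it to stable storage before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's userspace buffer into the kernel
        os.fsync(f.fileno())  # ask the kernel to flush the file to the device


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "message.txt")
    durable_write(target, b"accepted for delivery\n")
    with open(target, "rb") as f:
        print(f.read())
```

An application that skips the `os.fsync` step gets plain buffered (async) writes and can lose recent data in a crash whether or not the ZIL is enabled.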
> From: Pasi Kärkkäinen [mailto:pasik at iki.fi]
>
> > In neither case do you have data or filesystem corruption.
>
> ZFS probably is still OK, since it's designed to handle this (?),
> but the data can't be OK if you lose 30 secs of writes.. 30 secs of
> writes that have been ack'd being done to the servers/applications..

What I meant was: yes, there's data loss. But no corruption.

In other filesystems, if you have an ungraceful shutdown while the filesystem is writing, then, since filesystems such as EXT3 perform file-based (or inode-based) block write operations, you can have files whose contents have been corrupted: some sectors of the file still in their "old" state, and some sectors of the file in their "new" state. Likewise, in something like EXT3, you could have some file fully written, while another one hasn't been written yet, but should have been. (AKA, some files written out of order.)

In the case of EXT3, since it is a journaled filesystem, the journal only keeps the *filesystem* consistent after a crash. It's still possible to have corrupted data in the middle of a file.

These things don't happen in ZFS. ZFS takes journaling to a whole new level. Instead of just keeping your filesystem consistent, it also keeps your data consistent. Yes, data loss is possible when a system crashes, but the filesystem will never have any corruption. These are separate things now, and never were before.

In ZFS, losing n seconds of writes leading up to the crash will never result in files partially written, or written out of order. Every atomic write to the filesystem results in a filesystem-consistent and data-consistent view of *some* valid form of all the filesystem and data within it.
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Ragnar Sundblad
>
> But if you have an application, protocol and/or user that demands
> or expects persistant storage, disabling ZIL of course could be fatal
> in case of a crash. Examples are mail servers and NFS servers.

Basically, anything which writes to disk based on requests from something across a network. Because if your system goes down and comes back up, thinking itself consistent, but there's one client thinking "A" and another client thinking "B" ... even though your server is consistent, the world isn't.

Another great example would be if your server handles credit card transactions. If a user clicks "buy now" in a web interface, and the server contacts Visa or MasterCard, records the transaction, and then crashes before it records the transaction to its own disks ... then the server would come up and have no recollection of that transaction. But the user, and Visa/MasterCard, certainly would remember it.
On May 6, 2010, at 8:34 AM, Edward Ned Harvey <solaris2 at nedharvey.com> wrote:
> [...]
> What I meant was: Yes there's data loss. But no corruption. In other
> filesystems, if you have an ungraceful shutdown while the filesystem is
> writing, since filesystems such as EXT3 perform file-based (or inode-based)
> block write operations, then you can have files whose contents have been
> corrupted... Some sectors of the file still in their "old" state, and some
> sectors of the file in their "new" state. Likewise, in something like EXT3,
> you could have some file fully written, while another one hasn't been
> written yet, but should have been. (AKA, some files written out of order.)
>
> In the case of EXT3, since it is a journaled filesystem, the journal only
> keeps the *filesystem* consistent after a crash. It's still possible to
> have corrupted data in the middle of a file.

I believe ext3 has an option to journal data as well as metadata; it just defaults to metadata only.

I don't believe out-of-order writes are so much of an issue any more since Linux gained write barrier support (and most file systems and block devices now support it).

> These things don't happen in ZFS. ZFS takes journaling to a whole new
> level. Instead of just keeping your filesystem consistent, it also keeps
> your data consistent. Yes, data loss is possible when a system crashes, but
> the filesystem will never have any corruption. These are separate things
> now, and never were before.

ZFS does NOT have a journal; it has an intent log, which is completely different.

A journal logs operations that are to be performed later (the journal is read, then the operation is performed). An intent log logs operations that are being performed now; when the disk flushes, the intent entry is marked complete.

ZFS is consistent by the nature of COW (copy-on-write), which means a partial write will not become part of the file system: the old block pointer isn't updated until the new block completes the write.

> In ZFS, losing n-seconds of writes leading up to the crash will never result
> in files partially written, or written out of order. Every atomic write to
> the filesystem results in a filesystem-consistent and data-consistent view
> of *some* valid form of all the filesystem and data within it.

A ZFS file system will always be consistent, but if an application doesn't flush its data, then it can definitely have partially written data.

-Ross
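The copy-on-write point above can be shown with a toy model. This is a deliberately tiny Python sketch, not how ZFS is implemented: the class `ToyCowFile` and its `crash_before_pointer_update` flag are hypothetical, and a single dictionary stands in for the disk.

```python
class ToyCowFile:
    """A one-block 'file' that is only ever updated by switching a pointer."""

    def __init__(self, data: bytes):
        self.blocks = {0: data}  # block id -> contents (stand-in for the disk)
        self.pointer = 0         # the live block pointer
        self._next_id = 1

    def read(self) -> bytes:
        return self.blocks[self.pointer]

    def write(self, data: bytes, crash_before_pointer_update: bool = False) -> None:
        new_id = self._next_id
        self._next_id += 1
        self.blocks[new_id] = data  # step 1: write the new block somewhere else
        if crash_before_pointer_update:
            return                  # simulated crash: the old pointer is untouched
        self.pointer = new_id       # step 2: switch the pointer to the new block


if __name__ == "__main__":
    f = ToyCowFile(b"old contents")
    f.write(b"new contents", crash_before_pointer_update=True)
    print(f.read())  # still the old, complete block
    f.write(b"new contents")
    print(f.read())  # now the new block
```

Because the old block is never overwritten in place, a reader after the simulated crash sees either the complete old contents or the complete new contents, never a mixture of the two.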
On Tue, May 4, 2010 at 11:34 AM, Brandon High <bhigh at freaks.com> wrote:
> On Tue, May 4, 2010 at 10:19 AM, Tony MacDoodle <tpsdoodle at gmail.com> wrote:
> > How would one determine if I should have a separate ZIL disk? We are using
> > ZFS as the backend of our Guest Domains boot drives using LDom's. And we
> > are seeing bad/very slow write performance?
>
> There's a dtrace script that Richard Elling wrote called zilstat.ksh.
> It's available at
> http://www.richardelling.com/Home/scripts-and-programs-1/zilstat
>
> I'm not sure what the numbers mean (there's info at the address) but
> anything other than lots of 0s indicates that the ZIL is being used.

On my workstation, I peg my IOPS when using VirtualBox set to run on zvols. The zilstat line comes back with about 3000 total synchronous writes per 30 sec, which means that my disks are doing about 90 synchronous write IOPS. That is about the upper limit for 7200 rpm disks (from what I understand). This 3000 number doesn't really change much over time when running with IO load.

With the ZIL disabled, I get much better performance in terms of IO throughput. This tells me that the ZIL is the bottleneck. I will be getting an SSD soon.

--
Marc
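The arithmetic behind those figures can be checked with a short Python sketch. The zilstat count and interval come from the message above; the average seek time of 8.5 ms is an assumed, typical value for a desktop 7200 rpm drive, not something stated in this thread, and both function names are hypothetical.

```python
def iops_from_count(op_count: float, interval_s: float) -> float:
    """Average operations per second over a sampling interval."""
    return op_count / interval_s


def rough_disk_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rule-of-thumb random IOPS for one spinning disk.

    Each operation costs one average seek plus, on average,
    half a platter revolution of rotational latency.
    """
    rotational_latency_ms = 0.5 * 60_000 / rpm
    return 1000 / (avg_seek_ms + rotational_latency_ms)


if __name__ == "__main__":
    print(iops_from_count(3000, 30))          # sync writes per second from zilstat
    print(round(rough_disk_iops(7200, 8.5)))  # assumed 8.5 ms average seek
```

About 3000 writes in 30 seconds works out to roughly 100 per second, which is in the same range as the rule-of-thumb estimate for a single 7200 rpm disk. That fits the conclusion that the pool disks, and therefore the ZIL stored on them, are the limiting factor.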