I had a question for the group:

In the various ZFS discussions on zfs-discuss, I've seen a recurring
theme of disabling the write cache on disks. I would think that the
performance increase from using the write cache would be an advantage,
and that the write cache should be enabled.

Realistically, I can see only one situation where the write cache would
be an issue: if there is no way to flush it, corruption could occur on
a power loss.

The question really comes down to:

- Does the reliability gained by disabling the write cache offset the
  failure potential incurred by using it?

In recent Linux distributions, when the kernel shuts down, it forces
SCSI drives to flush their write caches. I don't know whether Solaris
does the same, but I think not, given the ongoing focus in Solaris on
disabling the write cache.

Would having a feature like a forced disk flush be sufficient to enable
the write cache?

-----
Gregory Shaw, IT Architect
Phone: (303) 673-8273        Fax: (303) 673-2773
ITCTO Group, Sun Microsystems Inc.
1 StorageTek Drive ULVL4-382      greg.shaw at sun.com (work)
Louisville, CO 80028-4382         shaw at fmsoft.com (home)
"When Microsoft writes an application for Linux, I've Won." - Linus Torvalds
Gregory Shaw wrote:
> In recent Linux distributions, when the kernel shuts down, it forces
> SCSI drives to flush their write caches. I don't know whether Solaris
> does the same, but I think not, given the ongoing focus in Solaris on
> disabling the write cache.

The Solaris sd(7D) SCSI disk driver issues a SYNCHRONIZE CACHE command
upon the last close of the device.

Rgds,
Ed

--
Edmund Nadolski
Sun Microsystems Inc.
ed.nadolski at sun.com
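For reference, here is a minimal sketch of what such a flush looks like
when issued by hand from user space, using the uscsi(7I) pass-through
ioctl to send SYNCHRONIZE CACHE(10) (opcode 0x35) to a disk. The device
path is made up for the example, and normally sd(7D) does this for you
on the last close:

/*
 * flushcache.c -- sketch only: issue SCSI SYNCHRONIZE CACHE(10) through
 * the uscsi(7I) pass-through ioctl.  The device path is hypothetical;
 * sd(7D) already sends this command on the last close of the device.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/scsi/impl/uscsi.h>

int
main(void)
{
    unsigned char cdb[10] = { 0x35 };   /* SYNCHRONIZE CACHE(10) */
    struct uscsi_cmd ucmd;
    int fd;

    fd = open("/dev/rdsk/c0t0d0s2", O_RDWR);    /* hypothetical disk */
    if (fd < 0) {
        perror("open");
        return (1);
    }

    (void) memset(&ucmd, 0, sizeof (ucmd));
    ucmd.uscsi_cdb = (caddr_t)cdb;
    ucmd.uscsi_cdblen = sizeof (cdb);
    ucmd.uscsi_flags = USCSI_SILENT;
    ucmd.uscsi_timeout = 60;            /* seconds */

    if (ioctl(fd, USCSICMD, &ucmd) < 0)
        perror("USCSICMD");
    else
        (void) printf("write cache flushed\n");

    (void) close(fd);
    return (0);
}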
Gregory Shaw wrote:
> I had a question for the group:
>
> In the various ZFS discussions on zfs-discuss, I've seen a recurring
> theme of disabling the write cache on disks. I would think that the
> performance increase from using the write cache would be an advantage,
> and that the write cache should be enabled.
>
> Realistically, I can see only one situation where the write cache
> would be an issue: if there is no way to flush it, corruption could
> occur on a power loss.

There are two failure modes associated with disk write caches:

1) For performance reasons the write cache doesn't write data (for
   different blocks) back to the platter in the order it was received,
   so transactional ordering isn't maintained and corruption can occur.

2) Writes to different disks can be subject to different caching
   policies, so transactions spanning files on different filesystems
   may not complete correctly during a power failure.

ZFS enables the write cache and flushes it when committing transaction
groups; this ensures that a transaction group either appears on disk in
its entirety or does not appear at all.

- Bart

--
Bart Smaalders              Solaris Kernel Performance
barts at cyber.eng.sun.com     http://blogs.sun.com/barts
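To make the ordering concrete, here is a rough sketch of the same
"make everything stable before publishing the pointer" pattern at the
application level, with fsync(2) standing in for the disk cache flush
and a small pointer file standing in for the uberblock. File names are
arbitrary, and this illustrates the idea rather than actual ZFS code
(ZFS writes the uberblock as a single self-checksummed block, so its
publish step is atomic on disk):

/*
 * cow_commit.c -- sketch only: the copy-on-write commit pattern used for
 * transaction groups, illustrated at the file level.  Write new data to
 * a fresh location, make it stable, then publish a pointer to it and
 * make that stable too.  fsync(2) stands in for the disk cache flush;
 * file names are arbitrary.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int
write_stable(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0 || write(fd, buf, len) != (ssize_t)len || fsync(fd) < 0) {
        perror(path);
        return (-1);
    }
    return (close(fd));
}

int
main(void)
{
    const char *data = "new copy of the data\n";

    /* Steps 1+2: write the new data to an unused location, then flush. */
    if (write_stable("data.v2", data, strlen(data)) < 0)
        return (1);

    /*
     * Steps 3+4: publish the new version by rewriting the "uberblock"
     * pointer, then flush again.  A crash before this point leaves the
     * old version intact; after it, the new version is live.
     */
    if (write_stable("current.ptr", "data.v2\n", 8) < 0)
        return (1);

    return (0);
}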
On 5/26/06, Bart Smaalders <bart.smaalders at sun.com> wrote:
>
> There are two failure modes associated with disk write caches:

Failure modes aside, is there any benefit to a write cache when command
queueing is available? It seems that the primary advantage is in allowing
old ATA hardware to issue writes in an asynchronous manner. Beyond that,
it doesn't really make much sense if the queue is deep enough.

> ZFS enables the write cache and flushes it when committing transaction
> groups; this ensures that a transaction group either appears on disk in
> its entirety or does not appear at all.

How often is the write cache flushed, and is it synchronous? Unless I am
misunderstanding something, wouldn't it be better to use ordered tags and
avoid cache flushes altogether?

Also, does ZFS disable the disk read cache? It seems that this would be
counterproductive with ZFS.

Chris
> ZFS enables the write cache and flushes it when committing transaction
> groups; this ensures that a transaction group either appears on disk in
> its entirety or does not appear at all.

It also flushes the disk write cache before returning from every
synchronous request (e.g., fsync, O_DSYNC). This is done after writing
out the intent log blocks.

Neil
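At the driver level that flush is the DKIOCFLUSHWRITECACHE request. A
minimal user-level sketch of the same kind of flush, assuming a Solaris
box with <sys/dkio.h> and a made-up device path (ZFS issues the
equivalent internally after the intent log blocks are written):

/*
 * flushwc.c -- sketch only: ask the disk driver to flush its write cache
 * using the DKIOCFLUSHWRITECACHE ioctl from <sys/dkio.h>.  The device
 * path is hypothetical; not every driver supports this request.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/dkio.h>

int
main(void)
{
    int fd = open("/dev/rdsk/c0t0d0s0", O_RDWR);    /* hypothetical disk */

    if (fd < 0) {
        perror("open");
        return (1);
    }
    if (ioctl(fd, DKIOCFLUSHWRITECACHE, NULL) < 0)
        perror("DKIOCFLUSHWRITECACHE");     /* driver may not support it */
    else
        (void) printf("write cache flush completed\n");
    (void) close(fd);
    return (0);
}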
On Fri, 2006-05-26 at 17:40, Bart Smaalders wrote:
> Gregory Shaw wrote:
> > I had a question for the group:
> >
> > In the various ZFS discussions on zfs-discuss, I've seen a recurring
> > theme of disabling the write cache on disks. I would think that the
> > performance increase from using the write cache would be an advantage,
> > and that the write cache should be enabled.
> >
> > Realistically, I can see only one situation where the write cache
> > would be an issue: if there is no way to flush it, corruption could
> > occur on a power loss.
>
> There are two failure modes associated with disk write caches:
>
> 1) For performance reasons the write cache doesn't write data (for
>    different blocks) back to the platter in the order it was received,
>    so transactional ordering isn't maintained and corruption can occur.
>

That's a pretty nasty situation. I would think that behaviour would
violate some SCSI ordering guarantee.

> 2) Writes to different disks can be subject to different caching
>    policies, so transactions spanning files on different filesystems
>    may not complete correctly during a power failure.
>

I've always felt that drives should have a small battery for this
purpose. However, what seems to make sense doesn't usually make it into
products.

> ZFS enables the write cache and flushes it when committing transaction
> groups; this ensures that a transaction group either appears on disk in
> its entirety or does not appear at all.
>

Really? It enables the cache even on devices where it was disabled?
That's pretty cool, if so.

> - Bart
>
> --
> Bart Smaalders              Solaris Kernel Performance
> barts at cyber.eng.sun.com     http://blogs.sun.com/barts
Bart Smaalders schrieb:
> ZFS enables the write cache and flushes it when committing transaction
> groups; this ensures that a transaction group either appears on disk in
> its entirety or does not appear at all.

What about IDE drives (PATA, SATA)? Currently only the sd driver
implements enabling/disabling the write cache.

Also, the WCE bit isn't reset if a zpool is destroyed. If you destroy a
zpool and later create a "classical" SVM+UFS volume on those disks, you
run the risk of data corruption.

Daniel
> What about IDE drives (PATA, SATA)? Currently only the sd driver
> implements enabling/disabling the write cache.

They typically have write caches enabled by default, and some don't
take kindly to disabling the write cache, or don't allow it at all.

> Also, the WCE bit isn't reset if a zpool is destroyed. If you destroy a
> zpool and later create a "classical" SVM+UFS volume on those disks, you
> run the risk of data corruption.

Looks like something which needs to be fixed at a layer other than ZFS,
as you may also pull disks and move them to other systems.

Casper
Casper.Dik at sun.com schrieb:
>> What about IDE drives (PATA, SATA)? Currently only the sd driver
>> implements enabling/disabling the write cache.
>
> They typically have write caches enabled by default, and some don't
> take kindly to disabling the write cache, or don't allow it at all.

But you could at least try. Enabling/disabling the write cache and
flushing the cache are both described in the ATA/ATAPI standard (at
least from ATA/ATAPI-5 onwards).

Currently the IDE driver does (by default) exactly the opposite: it
always turns on the write cache, regardless of how you set up the drive:

http://cvs.opensolaris.org/source/xref/on/usr/src/uts/intel/io/dktp/controller/ata/ata_disk.c#ata_write_cache

But disabling the write cache on ATA drives only makes sense if you
support command queueing (TCQ/NCQ); otherwise the write performance of
these drives is plain bad.

Daniel
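If you're worried about the lingering WCE setting described above, here
is a minimal sketch for inspecting (and optionally clearing) the write
cache state through the dkio(7I) ioctls. It assumes the underlying
driver implements DKIOCGETWCE/DKIOCSETWCE (sd does; the ata driver
discussed above may not), and the device path is made up for the
example:

/*
 * wce.c -- sketch only: query and optionally clear a disk's write cache
 * enable (WCE) state via the dkio(7I) ioctls.  Assumes the driver
 * supports DKIOCGETWCE/DKIOCSETWCE; the device path is hypothetical.
 * Run with any argument to disable the cache if it is enabled.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/dkio.h>

int
main(int argc, char **argv)
{
    int fd, wce;

    fd = open("/dev/rdsk/c0t0d0s2", O_RDWR);    /* hypothetical disk */
    if (fd < 0) {
        perror("open");
        return (1);
    }
    if (ioctl(fd, DKIOCGETWCE, &wce) < 0) {
        perror("DKIOCGETWCE");          /* driver doesn't support it */
        return (1);
    }
    (void) printf("write cache is %s\n", wce ? "enabled" : "disabled");

    if (argc > 1 && wce) {              /* any argument: disable it */
        wce = 0;
        if (ioctl(fd, DKIOCSETWCE, &wce) < 0)
            perror("DKIOCSETWCE");
        else
            (void) printf("write cache disabled\n");
    }
    (void) close(fd);
    return (0);
}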
Roch Bourbonnais - Performance Engineering
2006-May-29 08:46 UTC
[zfs-discuss] hard drive write cache
Chris Csanady writes:

> On 5/26/06, Bart Smaalders <bart.smaalders at sun.com> wrote:
> >
> > There are two failure modes associated with disk write caches:
>
> Failure modes aside, is there any benefit to a write cache when command
> queueing is available? It seems that the primary advantage is in allowing
> old ATA hardware to issue writes in an asynchronous manner. Beyond that,
> it doesn't really make much sense if the queue is deep enough.

The write cache decreases the effective latency by a very large factor
(> 50X on a quick test). But I guess your point is more about comparing
"write cache on + flush" with "write cache off + ordered tags"?

> > ZFS enables the write cache and flushes it when committing transaction
> > groups; this ensures that a transaction group either appears on disk in
> > its entirety or does not appear at all.
>
> How often is the write cache flushed, and is it synchronous? Unless I am
> misunderstanding something, wouldn't it be better to use ordered tags and
> avoid cache flushes altogether?

The write cache is flushed every time ZFS needs to ensure stable
storage. For every ZIL commit we flush the cache; this ensures, for
instance, that O_DSYNC writes are completed before the syscall returns.
So we flush only when required, and the flush is synchronous (issue the
flush, then wait for it to complete).

What benefit do you see in using ordered tags over the current approach?

> Also, does ZFS disable the disk read cache? It seems that this would be
> counterproductive with ZFS.

It's enabled on my system. What don't you like here?

> Chris
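To get a rough feel for the latency factor mentioned above, a small
microbenchmark like the following sketch times O_DSYNC writes, which
must be on stable storage before write(2) returns. The file path and
sizes are arbitrary, and the numbers will vary widely with the disk,
cache setting, and filesystem:

/*
 * dsync_lat.c -- sketch only: rough measurement of synchronous write
 * latency.  Each O_DSYNC write must be on stable storage before write(2)
 * returns, so on a bare disk the cost of the cache flush / platter write
 * shows up directly.  File name and counts are arbitrary.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/time.h>   /* gethrtime() */

int
main(void)
{
    enum { NWRITES = 100, BLKSZ = 8192 };
    char buf[BLKSZ];
    hrtime_t start, end;
    int fd, i;

    (void) memset(buf, 'z', sizeof (buf));
    fd = open("/tank/test/dsync.dat",
        O_WRONLY | O_CREAT | O_DSYNC, 0644);    /* hypothetical path */
    if (fd < 0) {
        perror("open");
        return (1);
    }

    start = gethrtime();
    for (i = 0; i < NWRITES; i++) {
        if (write(fd, buf, sizeof (buf)) != sizeof (buf)) {
            perror("write");
            return (1);
        }
    }
    end = gethrtime();

    (void) printf("%d O_DSYNC writes of %d bytes: %.3f ms average\n",
        NWRITES, BLKSZ, (end - start) / 1.0e6 / NWRITES);
    (void) close(fd);
    return (0);
}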