Hi,

With a dual Xeon, 4 GB of RAM (8 GB in a couple of weeks), two PCI-X
3Ware cards and 7 SATA disks (750 GB & 1 TB) under FreeBSD 8.0 (but I
think this is OS-independent), I ran some tests.

The disks are exported as JBOD, but I tried enabling/disabling the
write cache.

I tried UFS and ZFS on the same disk and the difference is overwhelming.

With a 1 GB file (larger than the ZFS cache?):

With write cache disabled:
  UFS
    time cp /mnt/ufs/rnd /mnt/ufs/rnd2
    real    2m58.073s
  ZFS
    time cp /zfs/rnd /zfs/rnd2
    real    4m33.726s

On the same card with write cache enabled:
  UFS
    time cp /mnt/ufs/rnd /mnt/ufs/rnd2
    real    0m31.406s
  ZFS
    time cp /zfs/rnd /zfs/rnd2
    real    1m0.199s

So, even though ZFS can be twice as slow as UFS, it is clear that the
write cache has to be enabled on the controller.

Is there any drawback (other than that, without a BBU, I have a problem
in case of power loss) to enabling the write cache with ZFS?

Thanks.
Best regards.

-- 
Lycée Maximilien Perret, Alfortville
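For reference, the unit write cache on 3ware cards can usually be toggled
from the CLI. A rough sketch, assuming a 9000-series card seen as /c0 with
the disk as unit /u0 (placeholders; the exact syntax varies by card series
and firmware, so check `tw_cli /c0 show` for your numbering first):

  # toggle the controller write cache for one unit (3ware tw_cli)
  tw_cli /c0/u0 set cache=on
  # confirm the setting took effect
  tw_cli /c0/u0 show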
You want the write cache enabled, for sure, with ZFS. ZFS will do the
right thing about ensuring the write cache is flushed when needed.

For the case of a single JBOD, I don't find it surprising that UFS beats
ZFS. ZFS is designed for more complex configurations, and provides much
better data integrity guarantees than UFS (so critical data is written
to the disk more than once, and in areas of the drive that are not
adjacent, to improve the chances of recovery in the event of a localized
media failure). That said, you could easily accelerate the write
performance of ZFS on that single JBOD by adding a small SSD log device.
(4GB would be enough. :-)

	- Garrett

On Thu, 2010-07-08 at 15:10 +0200, Philippe Schwarz wrote:
> With a dual Xeon, 4 GB of RAM, two PCI-X 3Ware cards and 7 SATA disks
> under FreeBSD 8.0, I ran some tests.
> [...]
> Is there any drawback (other than that, without a BBU, I have a problem
> in case of power loss) to enabling the write cache with ZFS?
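For illustration, attaching a dedicated log device is a one-line
operation; a sketch with hypothetical pool and device names (substitute
your own):

  # attach a small SSD as a dedicated ZIL (slog) device
  zpool add tank log da4
  # or, preferably, a mirrored log so a single SSD failure cannot lose
  # in-flight synchronous writes
  zpool add tank log mirror da4 da5
  zpool status tank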
On Thu, Jul 8, 2010 at 6:10 AM, Philippe Schwarz <phil at schwarz-fr.net> wrote:
> The disks are exported as JBOD, but i tried enabling/disabling write-cache.

Don't use JBOD, as that disables a lot of the advanced features of the
3Ware controllers. Instead, create "Single Disk" arrays for each disk.
That way, you get all the management features of the card and all of its
advanced features (StorSave policies, command queuing, separate
read/write cache policies, SMART monitoring, access to the onboard
cache, etc.), with only the RAID functions of the controller left unused.

You should get better performance using Single Disk over JBOD (which
basically turns your expensive RAID controller into a "dumb" SATA
controller).

-- 
Freddie Cash
fjwcash at gmail.com
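As a sketch of what that looks like with the 3ware CLI (controller and
port numbers are placeholders, and the syntax may differ slightly between
firmware revisions):

  # list the controller, its ports and existing units
  tw_cli /c0 show
  # create a "Single Disk" unit from the drive on port 0; repeat per disk
  tw_cli /c0 add type=single disk=0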
On 08/07/2010 18:52, Freddie Cash wrote:
> On Thu, Jul 8, 2010 at 6:10 AM, Philippe Schwarz <phil at schwarz-fr.net> wrote:
>> With a dual Xeon, 4 GB of RAM, two PCI-X 3Ware cards and 7 SATA disks
>> under FreeBSD 8.0, I ran some tests.

OK, thanks for all the answers:

- Test whether the controllers/disks honor the cache-flush command
- Buy an SSD for both L2ARC & ZIL
- Use Single Disk arrays instead of JBOD

OK for both the SSD and the arrays, but how can I know whether my
controller and disks are lying about flushing their cache when asked to?
It reminds me of a thread here about that "feature" ;-)

Thanks.
Best regards.

-- 
Lycée Maximilien Perret, Alfortville
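One rough way to check is to time synchronous writes: a disk that really
commits each flush to the platters can only complete on the order of
100-200 of them per second, while thousands per second suggest the flush
is being absorbed by a volatile cache. A sketch, assuming a dd that
supports oflag=dsync (GNU coreutils; the base FreeBSD/Solaris dd does
not) and a pool without an SSD slog, otherwise you just measure the SSD:

  # 1000 small synchronous writes through ZFS; each one forces a ZIL
  # commit and a cache-flush request down to the device
  time dd if=/dev/zero of=/zfs/flushtest bs=4k count=1000 oflag=dsync conv=notrunc
  # expect several seconds on a single honest 7200 rpm disk;
  # a sub-second result means something is ignoring the flush
  rm /zfs/flushtest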
On 8 jul 2010, at 17.23, Garrett D'Amore wrote:
> You want the write cache enabled, for sure, with ZFS. ZFS will do the
> right thing about ensuring write cache is flushed when needed.

That is not for sure at all. It all depends on what "the right thing"
is, which depends on the application and/or what other measures there
are for redundancy.

Many RAID controllers will respond to a flush [to persistent storage]
by doing nothing, since the data is already in the controller's write
cache buffer, which (hopefully) is battery backed or similar. They will
typically NOT flush the data to the drives, issue flush commands to the
disks, and wait for that to finish before responding.

This means that if your RAID controller dies on you, some of your most
recently written data will be gone. It may very well be vital metadata,
and in the ZFS world it may be data or metadata from several of the
latest txgs that is gone. Potentially, it could leave your file system
corrupt, or arbitrary pieces of your data could be lost.

Maybe that is acceptable in the application, or maybe it is compensated
for by other means, such as multiple RAID controllers with the data
mirrored over all of them, which could reduce the risks by a lot, but
you have to evaluate each case by itself.

/ragge
On Fri, 2010-07-09 at 00:23 +0200, Ragnar Sundblad wrote:
> Many RAID controllers will respond to a flush [to persistent storage]
> by doing nothing, since the data is already in the controller's write
> cache buffer, which (hopefully) is battery backed or similar. They will
> typically NOT flush the data to the drives, issue flush commands to the
> disks, and wait for that to finish before responding.

I consider such behavior "bad", at least if not explicitly enabled, and
probably a bug.

> This means that if your RAID controller dies on you, some of your most
> recently written data will be gone. It may very well be vital metadata,
> and in the ZFS world it may be data or metadata from several of the
> latest txgs that is gone. Potentially, it could leave your file system
> corrupt, or arbitrary pieces of your data could be lost.

Yes, failure to flush data when the OS requests it is *evil*. Enabling a
write cache should only be done if you believe your hardware is not
buggy in this respect.

I guess I was thinking more along the lines of simple JBOD controllers
when I answered the question the first time -- I failed to take RAID
controller behavior into account.

	- Garrett
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Philippe Schwarz
>
> 3Ware cards
>
> Any drawback (except that without BBU, i've got a pb in case of power
> loss) in enabling the WC with ZFS ?

If you don't have a BBU, and you care about your data, don't enable
WriteBack.

If you enable WriteBack without a BBU, you might as well just disable
the ZIL instead. It's more effective, and just as dangerous. Actually,
disabling the ZIL is probably faster *and* safer than running WriteBack
without a BBU.

But if you're impressed with the performance gain from enabling
WriteBack, you can still do better...

The most effective thing you could possibly do is to disable the
WriteBack and add an SSD log device. ZFS performs better in that
configuration than with WriteBack. And in that situation, surprisingly,
enabling the WriteBack actually hurts performance slightly. The gain of
an SSD log over WriteBack is comparable to the gain of WriteBack over a
naked disk, so prepare to be impressed one more time.

The performance with a disabled ZIL is yet another impressive step, and
that performance is unbeatable. There are situations where a disabled
ZIL is actually a good configuration. If you don't know when, I suggest
posting again to learn when it's appropriate to disable the ZIL.
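For reference, a sketch of how the ZIL is typically disabled; treat this
as illustrative, since availability of the per-dataset property depends
on the ZFS version, and the dataset name is hypothetical:

  # on builds that support the per-dataset property:
  zfs set sync=disabled tank/scratch
  # on older OpenSolaris builds, the global tunable in /etc/system instead:
  #   set zfs:zil_disable = 1
  # followed by a reboot; this disables the ZIL for every pool on the host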
On Jul 8, 2010, at 4:37 PM, Edward Ned Harvey wrote:
> If you don't have a BBU, and you care about your data, don't enable
> WriteBack.
>
> If you enable WriteBack without a BBU, you might as well just disable
> the ZIL instead. It's more effective, and just as dangerous. Actually,
> disabling the ZIL is probably faster *and* safer than running WriteBack
> without a BBU.

ZIL and data loss are orthogonal. As long as the device correctly
respects the (nonvolatile) cache flush requests, then the data can be
safe.
 -- richard

-- 
Richard Elling
richard at nexenta.com   +1-760-896-4422
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/
> Instead, create "Single Disk" arrays for each disk.

I have a question related to this, but with a different controller: if
I'm using a RAID controller to provide non-RAID single-disk volumes, do
I still lose the hardware-independence advantage of software RAID that I
would get from a basic non-RAID HBA?

In other words, if the controller dies, would I still need an identical
controller to recognise the formatting of the "single disk volumes", or
is it more standardised than the typical proprietary implementations of
hardware RAID that make it impossible to switch controllers?
On 7/10/2010 1:14 AM, Graham McArdle wrote:
> In other words, if the controller dies, would I still need an identical
> controller to recognise the formatting of the "single disk volumes", or
> is it more standardised than the typical proprietary implementations of
> hardware RAID that make it impossible to switch controllers?

Yep. You're screwed. :-)

Single-disk volumes are still RAID volumes to the controller, so they'll
have the extra controller-specific bits on them. You'll need an
identical controller (or, possibly, just one from the same OEM) to
replace a broken controller with.

Even in JBOD mode, I wouldn't trust a RAID controller not to write
proprietary bits onto the disks. It's one of the big reasons to choose
an HBA and not a RAID controller.

-- 
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
On Jul 10, 2010, at 5:46 AM, Erik Trimble <erik.trimble at oracle.com> wrote:
> Single-disk volumes are still RAID volumes to the controller, so they'll
> have the extra controller-specific bits on them. You'll need an
> identical controller (or, possibly, just one from the same OEM) to
> replace a broken controller with.
>
> Even in JBOD mode, I wouldn't trust a RAID controller not to write
> proprietary bits onto the disks. It's one of the big reasons to choose
> an HBA and not a RAID controller.

Not always: with my Dell PERC, with the drives set up as single-disk
RAID0 volumes, I was able to successfully import the pool on a regular
LSI SAS (non-RAID) controller. The only change the PERC made was to
coerce the disk size down by 128MB, so it left 128MB unused at the end
of each drive, which would mean new disks would be slightly bigger.

-Ross
On 09/07/2010 01:37, Edward Ned Harvey wrote:
> But if you're impressed with performance by enabling writeback, you can
> still do better ...
>
> The most effective thing you could possibly do is to disable the
> writeback, and add SSD for log device. ZFS is able to perform in this
> configuration, better than the WriteBack. And in this situation,
> surprisingly, enabling the WriteBack actually hurts performance slightly.

Hi,
I bought a small SSD (OCZ Agility 30GB) and added half of it as L2ARC
and the other half as ZIL:

  zpool add zfsda1 log da3s2
  zpool add zfsda1 cache da3s1

  zpool status
    pool: zfsda1
   state: ONLINE
   scrub: none requested
  config:

          NAME        STATE     READ WRITE CKSUM
          zfsda1      ONLINE       0     0     0
            da1       ONLINE       0     0     0
          logs        ONLINE       0     0     0
            da3s2     ONLINE       0     0     0
          cache
            da3s1     ONLINE       0     0     0

  errors: No known data errors

OK, let's try to burst the writes (write cache disabled on the RAID
controller)... The result is awful!

A `zpool iostat -v 1` shows:

- Although the L2ARC (da3s1) is shown separately from the pool, the ZIL
  (da3s2) is shown within the pool. Is that the normal behaviour?

- The ZIL seems to be quiet almost all the time, with occasional bursts.
  OK, that may be the normal behaviour of a cache.

                 capacity     operations    bandwidth
  pool         used  avail   read  write   read  write
  ----------  -----  -----  -----  -----  -----  -----
  zfsda1      2.36G   694G      0    176      0  21.8M
    da1       2.36G   694G      0     56      0  6.88M
    da3s2      128K  15.0G      0    119      0  15.0M
  cache           -      -      -      -      -      -
    da3s1     3.06G  11.7G      0      0      0      0
  ----------  -----  -----  -----  -----  -----  -----

  But at the end of the copy process (copying a 1GB file from and to the
  same pool), the used capacity of the ZIL remains unchanged... Puzzling.

- And, last but not least... the copy isn't faster at all!

  Without ZIL & L2ARC:
    time cp /zfsda1/rnd /zfsda1/rn2
    real    3m23.297s

  With ZIL & L2ARC:
    time cp /zfsda1/rnd /zfsda1/rn2
    real    3m34.847s

Should I call my (dummy) test into question?

Thanks.
Best regards.

-- 
Lycée Maximilien Perret, Alfortville
On 7/11/2010 3:21 AM, Philippe Schwarz wrote:
> - Although the L2ARC (da3s1) is shown separately from the pool, the ZIL
>   (da3s2) is shown within the pool. Is that the normal behaviour?

Yes, it's just a quirk of the output format.

> - And, last but not least... the copy isn't faster at all!
>
>   Without ZIL & L2ARC:
>     time cp /zfsda1/rnd /zfsda1/rn2
>     real    3m23.297s
>
>   With ZIL & L2ARC:
>     time cp /zfsda1/rnd /zfsda1/rn2
>     real    3m34.847s
>
> Should I call my (dummy) test into question?

ZIL speeds up synchronous writes only. Operations like 'cp' use async
writes, so the ZIL will be of no benefit, since it's not being used.

-- 
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
On Sun, 11 Jul 2010, Philippe Schwarz wrote:
> But at the end of the copy process (copying a 1GB file from and to the
> same pool), the used capacity of the ZIL remains unchanged... Puzzling.
>
> - And, last but not least... the copy isn't faster at all!

Note that the slog device is only used for synchronous writes, and a
local file copy is not normally going to use synchronous writes. Also,
even if the slog was used, it gets emptied pretty quickly.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
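To see the slog actually being exercised, the workload has to issue
synchronous writes (NFS, databases, or an O_DSYNC writer). A minimal
sketch, again assuming a GNU dd with oflag=dsync and a hypothetical file
name, watched from a second terminal:

  # terminal 1: watch per-device activity once a second
  zpool iostat -v zfsda1 1
  # terminal 2: generate synchronous writes; each 4k write forces a ZIL
  # commit, so the log device (da3s2 here) should show steady write
  # activity, unlike the async `cp` test
  dd if=/dev/zero of=/zfsda1/syncfile bs=4k count=5000 oflag=dsync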