Hi all,

I have a home server based on snv_127 with 8 disks:

  2 x 500GB  mirrored root pool
  6 x 1TB    raidz2 data pool

This server performs a few functions:

  NFS   : for several 'lab' ESX virtual machines
  NFS   : mythtv storage (videos, music, recordings etc.)
  Samba : home directories for all networked PCs

I back up the important data to an external USB HDD each day.

I previously had a Linux NFS server that I had mounted 'async' and, as one
would expect, NFS performance was pretty good, getting close to 900 Mb/s.
Now that I have moved to OpenSolaris, NFS performance is not very good,
I'm guessing mainly due to the 'sync' nature of NFS. I've seen various
threads and most point at two options:

  1. Disable the ZIL
  2. Add independent log device(s)

I happen to have 2 x 250GB Western Digital RE3 7200rpm drives (RAID
edition, rated for 24x7 usage etc.) sitting doing nothing, and was
wondering whether adding them as log devices to the data pool might speed
up NFS, and possibly general filesystem usage. I understand that an SSD is
considered ideal for log devices, but I'm thinking these two drives should
at least be better than having the ZIL 'inside' the zpool.

If I add these devices, should I add them mirrored, or as individual
devices to get some sort of load balancing (according to the zpool
manpage) and perhaps a little more performance?

I'm running ZFS version 19, which 'zpool upgrade -v' shows as having 'log
device removal' support. Can I easily remove these devices if I find they
have resulted in little or no performance improvement?

Any help/tips greatly appreciated.

Cheers.
--
This message posted from opensolaris.org
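For reference, the commands in question look roughly like this; 'tank' and
the c2t*d0 names are placeholders for the data pool and the two RE3 drives:

  # Add the two drives as separate (striped) log devices:
  zpool add tank log c2t0d0 c2t1d0

  # ...or add them as a single mirrored log vdev instead:
  zpool add tank log mirror c2t0d0 c2t1d0

  # With pool version 19 (log device removal), separate log devices can be
  # taken out again; a mirrored log is removed by the vdev name that
  # 'zpool status' shows for it (e.g. something like mirror-1):
  zpool remove tank c2t0d0 c2t1d0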
On Dec 2, 2009, at 6:57 AM, Brian McKerr <brian at datamatters.com.au> wrote:

> I previously had a Linux NFS server that I had mounted 'async' and, as
> one would expect, NFS performance was pretty good, getting close to
> 900 Mb/s. Now that I have moved to OpenSolaris, NFS performance is not
> very good, I'm guessing mainly due to the 'sync' nature of NFS. I've
> seen various threads and most point at two options:
>
> 1. Disable the ZIL
> 2. Add independent log device(s)
>
> [...]
>
> I understand that an SSD is considered ideal for log devices, but I'm
> thinking these two drives should at least be better than having the ZIL
> 'inside' the zpool.
>
> Any help/tips greatly appreciated.

It wouldn't hurt to try, but I'd be surprised if it helped much, if at all.
The idea of a separate ZIL is to locate it on a device with lower latency
than the pool, so that log writes don't have to compete with pool writes.

What speed are you trying to achieve for writes? Wirespeed? It's
achievable, but only with an app that uses larger block sizes and allows
more than one transaction in flight at a time.

I wouldn't disable the ZIL; look at tuning the client side instead, or
invest in a controller with a large battery-backed write cache and a good
JBOD mode, or a small, fast SSD drive.

-Ross
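As a sketch of the kind of client-side tuning Ross mentions, a Linux NFS
mount with larger transfer sizes might look like the following. The server
name, export path and transfer sizes are assumptions; check what your
client kernel actually supports:

  # NFSv3 over TCP with 64k reads and writes; 'hard,intr' matches the
  # autofs example elsewhere in this thread:
  mount -t nfs -o vers=3,proto=tcp,rsize=65536,wsize=65536,hard,intr,noatime \
      solarisserver:/tank/lab /mnt/lab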
> I previously had a Linux NFS server that I had mounted 'async' and, as
> one would expect, NFS performance was pretty good, getting close to
> 900 Mb/s. Now that I have moved to OpenSolaris, NFS performance is not
> very good, I'm guessing mainly due to the 'sync' nature of NFS. I've
> seen various threads and most point at two options:
>
> 1. Disable the ZIL
> 2. Add independent log device(s)

Really your question isn't about the ZIL on HDD (as the subject says) but
about NFS performance. I'll tell you a couple of things. I have a Solaris
ZFS and NFS server at work which noticeably outperforms the previous NFS
server. Here are the differences in our setup:

Yes, I have an SSD for the ZIL. Just one SSD, 32G. But if this is the
problem, then you'll have the same poor performance on the local machine
that you have over NFS, so I'm curious to see whether you have the same
poor performance locally. The ZIL does not need to be reliable; if it
fails, the ZIL will begin writing to the main storage, and performance
will suffer until a new SSD is put into production.

Another thing: you have 6 disks in raidz2. That is 6 disks with the
capacity of 4. You should get noticeably better performance from
3 x 2-disk mirrors (6 disks with the capacity of 3). But if your
bottleneck is the Ethernet, this difference might be irrelevant.

I have nothing special in my dfstab:

  cat /etc/dfs/dfstab
  share -F nfs -o ro=host1,rw=host2:host3,root=host2:host3,anon=4294967294 /path-to-export

But when I mount it from Linux, I took great care to create this config:

  cat /etc/auto.master
  /- /etc/auto.direct --timeout=1200

  cat /etc/auto.direct
  /mountpoint -fstype=nfs,noacl,rw,hard,intr,posix solarisserver:/path-to-export

I'm interested to hear if this sheds any light for you.
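For comparison, the two pool layouts described above would be created
roughly as follows (the c1t*d0 device names are placeholders):

  # 6-disk raidz2: capacity of 4 disks, but only one vdev's worth of
  # random-write IOPS:
  zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0

  # 3 x 2-way mirrors: capacity of 3 disks, roughly three vdevs' worth
  # of IOPS:
  zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 \
      mirror c1t4d0 c1t5d0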
> 2 x 500GB mirrored root pool
> 6 x 1TB raidz2 data pool
> I happen to have 2 x 250GB Western Digital RE3 7200rpm
> be better than having the ZIL 'inside' the zpool.

Listing two log devices (a stripe) would give you more spindles than your
single raidz2 vdev. But for low-cost fun, one might make a tiny slice on
each of the raidz2 disks and list six log devices (a 6-way stripe), and
not bother adding the other two disks.

Rob
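For anyone tempted to try the 6-way idea, the command form would be
roughly as below. It assumes a small slice (s3 here) was set aside on each
raidz2 disk when the pool was built; disks handed to ZFS whole normally
have no spare slice to use, so this is not something you can easily
retrofit:

  # Add one small slice from each of the six raidz2 disks as a 6-way
  # striped log (slice and device names are hypothetical):
  zpool add tank log c1t0d0s3 c1t1d0s3 c1t2d0s3 c1t3d0s3 c1t4d0s3 c1t5d0s3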
On Wed, Dec 2 at 10:59, Rob Logan wrote:
>> 2 x 500GB mirrored root pool
>> 6 x 1TB raidz2 data pool
>> I happen to have 2 x 250GB Western Digital RE3 7200rpm
>> be better than having the ZIL 'inside' the zpool.
>
> Listing two log devices (a stripe) would give you more spindles than
> your single raidz2 vdev. But for low-cost fun, one might make a tiny
> slice on each of the raidz2 disks and list six log devices (a 6-way
> stripe), and not bother adding the other two disks.

But if you did that, a synchronous write (FUA or with a cache flush) would
take a significant latency penalty, especially if NCQ was being used. The
ZIL is usually tiny; striping it doesn't make any sense to me.

--
Eric D. Mudama
edmudama at mail.bounceswoosh.org
On Wed, Dec 02, 2009 at 03:57:47AM -0800, Brian McKerr wrote:
> I previously had a Linux NFS server that I had mounted 'async' and, as
> one would expect, NFS performance was pretty good, getting close to
> 900 Mb/s. Now that I have moved to OpenSolaris, NFS performance is not
> very good, I'm guessing mainly due to the 'sync' nature of NFS. I've
> seen various threads and most point at two options:
>
> 1. Disable the ZIL
> 2. Add independent log device(s)

We have experienced the same performance penalty using NFS over ZFS. The
issue is indeed caused by synchronous behaviour: ZFS promises correct
semantics for the synchronous operations NFS issues, while e.g. a Linux
NFS server exported 'async' does not. The issue is described in great
detail at:

  http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine

If you want the same behaviour you had with your Linux NFS server, you can
disable the ZIL. Doing so should give the same guarantees as the Linux
'async' NFS service.

The big issue with disabling the ZIL is that it is system-wide. Although
that could be an acceptable tradeoff for one filesystem, it is not
necessarily a good system-wide setting. That is why I think the option to
disable the ZIL should be per-filesystem (which I think should be
possible, because a ZIL is actually kept per filesystem).

As for adding HDDs as ZIL devices, I'd advise against it. We tried this
and performance decreased. Using SSDs for the ZIL is probably the way to
go.

A final option is to accept the situation as it is, arguing that you have
traded performance for increased reliability.

Regards,
Auke

--
Auke Folkerts
University of Amsterdam
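For reference, the system-wide switch described above is the zil_disable
tunable on builds of this era; a minimal sketch, with the usual caveat
that it trades away the sync guarantees discussed in this thread:

  # Persistently, via /etc/system (takes effect after a reboot):
  #   set zfs:zil_disable = 1

  # Or on a live system with mdb; the change is believed to apply only to
  # datasets mounted afterwards, so remount the filesystems in question:
  echo zil_disable/W0t1 | mdb -kw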
Hello,

Edward Ned Harvey wrote:
> Yes, I have an SSD for the ZIL. Just one SSD, 32G. But if this is the
> problem, then you'll have the same poor performance on the local machine
> that you have over NFS, so I'm curious to see whether you have the same
> poor performance locally. The ZIL does not need to be reliable; if it
> fails, the ZIL will begin writing to the main storage, and performance
> will suffer until a new SSD is put into production.

I am also planning to install an SSD as a ZIL log device. Is it really
true that there are no problems if the ZIL log fails and it is not
mirrored?

What about the data that was on the ZIL log SSD at the time of failure?
Is a copy of that data still in the machine's memory, from where it can be
used to commit the transactions to the stable storage pool? And what if
the machine reboots after the SSD has failed?

The ZFS Best Practices Guide recommends mirroring the log:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Storage_Pool_Performance_Considerations

  "Mirroring the log device is highly recommended. Protecting the log
  device by mirroring will allow you to access the storage pool even if a
  log device has failed. Failure of the log device may cause the storage
  pool to be inaccessible if you are running the Solaris Nevada release
  prior to build 96 and a release prior to the Solaris 10 10/09 release.
  For more information, see CR 6707530."

  http://bugs.opensolaris.org/view_bug.do?bug_id=6707530

No problems with that if I use Sol10 U8?

Regards,
Michael.
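If you do start with a single SSD and later want the mirror that the Best
Practices Guide recommends, a second device can be attached to the
existing log; a rough sketch with placeholder device names:

  # c3t0d0 is the existing log SSD, c3t1d0 the new one; once it has
  # resilvered, the log vdev is a two-way mirror:
  zpool attach tank c3t0d0 c3t1d0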
On Thu, 3 Dec 2009, mbr wrote:
> What about the data that was on the ZIL log SSD at the time of failure?
> Is a copy of that data still in the machine's memory, from where it can
> be used to commit the transactions to the stable storage pool?

The intent log SSD is used as 'write only' unless the system reboots, in
which case it is used to support recovery. The system memory is used as
the write path in the normal case. Once the data is written to the intent
log, the data is declared to be written as far as higher-level
applications are concerned.

If the intent log SSD fails and the system spontaneously reboots, then
data may be lost.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Hello,

Bob Friesenhahn wrote:
> On Thu, 3 Dec 2009, mbr wrote:
>> What about the data that was on the ZIL log SSD at the time of failure?
>> Is a copy of that data still in the machine's memory, from where it can
>> be used to commit the transactions to the stable storage pool?
>
> The intent log SSD is used as 'write only' unless the system reboots, in
> which case it is used to support recovery. The system memory is used as
> the write path in the normal case. Once the data is written to the
> intent log, the data is declared to be written as far as higher-level
> applications are concerned.

Thank you Bob for the clarification. So I don't need a mirrored ZIL log
for safety reasons; all the information is still in memory and will be
used from there by default if only the ZIL log SSD fails.

> If the intent log SSD fails and the system spontaneously reboots, then
> data may be lost.

I can live with the data loss as long as the machine still comes up with
the faulty ZIL log SSD, otherwise without problems and with a clean zpool.

Has the following bug no consequences?

  Bug ID   6538021
  Synopsis Need a way to force pool startup when zil cannot be replayed
  State    3-Accepted (Yes, that is a problem)
  Link     http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6538021

Michael.
On Thu, 3 Dec 2009, mbr wrote:
> Has the following bug no consequences?
>
> Bug ID   6538021
> Synopsis Need a way to force pool startup when zil cannot be replayed
> State    3-Accepted (Yes, that is a problem)
> Link     http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6538021

I don't know the status of this, but it does make sense to require the
user to explicitly choose to corrupt or lose data in the storage pool. It
could be that the log device is just temporarily missing and can be
restored, so zfs should not do this by default.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On 12/03/09 09:21, mbr wrote:
> Bob Friesenhahn wrote:
>> The intent log SSD is used as 'write only' unless the system reboots,
>> in which case it is used to support recovery. The system memory is used
>> as the write path in the normal case. Once the data is written to the
>> intent log, the data is declared to be written as far as higher-level
>> applications are concerned.
>
> Thank you Bob for the clarification. So I don't need a mirrored ZIL log
> for safety reasons; all the information is still in memory and will be
> used from there by default if only the ZIL log SSD fails.

Mirrored log devices are advised to improve reliability. As previously
mentioned, if a log device fails during writing, or is temporarily full,
then we use the main pool devices to chain the log blocks.

If we get read errors when trying to replay the intent log (after a
crash/power failure), the admin is given the option to ignore the log and
continue, or to somehow fix the device (e.g. re-attach it) and then retry.
Multiple log devices would provide extra reliability here. We do not look
in memory for the log records if we can't get them from the log blocks.

>> If the intent log SSD fails and the system spontaneously reboots, then
>> data may be lost.
>
> I can live with the data loss as long as the machine still comes up with
> the faulty ZIL log SSD, otherwise without problems and with a clean
> zpool.

The log records are not required for consistency of the pool (it's not a
journal).

> Has the following bug no consequences?
>
> Bug ID   6538021
> Synopsis Need a way to force pool startup when zil cannot be replayed
> State    3-Accepted (Yes, that is a problem)
> Link     http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6538021

Er, that bug should probably be closed as a duplicate. We now have this
functionality.
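For anyone hitting the replay-failure case described above, the recovery
flow is driven from zpool status; a rough sketch (pool and device names
are placeholders, and the exact messages and options vary by build):

  # See whether the pool is waiting on an unreadable or missing log device:
  zpool status -v tank

  # If the device can be fixed or re-attached, bring it back and retry:
  zpool online tank c2t0d0

  # Otherwise, explicitly accept the loss of the unreplayed log records
  # and let the pool continue without them:
  zpool clear tank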