I was thinking of adding an SSD ZIL to my pool, but then read this in the 'Best Practice Guide'.

*********
Prior to pool version 19, if you have an unmirrored log device that fails, your whole pool is permanently lost.
Prior to pool version 19, mirroring the log device is highly recommended.
*********

I have the following.

*********
admin at nas:~$ zpool upgrade
This system is currently running ZFS pool version 14.

All pools are formatted using this version.
*********

This really worries me. For a home server, a mirrored SSD ZIL is pricey. I assume this means upgrading to a dev version? What do people do now for upgrades?

Then, when trying to figure out the size of the SSD, I got this from the guide as well.

*********
The maximum size of a log device should be approximately 1/2 the size of physical memory because that is the maximum amount of potential in-play data that can be stored. For example, if a system has 16 Gbytes of physical memory, consider a maximum log device size of 8 Gbytes.
*********

I don't think I can get an SSD that small. It seems like a waste of a larger SSD now. Is there a way to make better use of the SSD by partitioning? I have two pools, so could I partition the SSD to have each partition as a ZIL for a different pool? Any other creative uses?

Thanks,
Greg
--
This message posted from opensolaris.org
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Gregory Gee
>
> Prior to pool version 19, mirroring the log device is highly
> recommended.
>
> I have the following.
>
> This system is currently running ZFS pool version 14.
>
> This really worries me. For a home server, mirrored SSD ZIL is
> pricey. I assume this means upgrading to a dev version? What do
> people do now for upgrades?

Before anything else, a clarification of terminology: Every pool has a ZIL. By default, the ZIL is stored on disk within the primary pool. You are not talking about adding a ZIL; you are talking about adding a dedicated log device.

If you look at opensolaris.org, under the "download" page, you'll see a section for "Developer builds." It says two things: there are instructions to upgrade your existing installation in place, and there is a link to http://genunix.org to download the latest developer build in ISO format, for a fresh installation. (The latest is b134, which is about 4 months old.) My personal experience is that the upgrade process is a little bit shaky. Nothing fatal happened, but the system didn't seem quite right afterward. So it's a good idea to back up before doing an upgrade.

Another option, since this is your home server: I bet you're not processing credit card transactions. I bet you don't have a compute farm using this as a backend datastore. I bet it's not a mail server, and even more certainly, not a mail server for critical information. Etc.

You might just consider disabling the ZIL. If you run with the ZIL disabled, your sync writes behave as async writes, which means it is faster than you could possibly hope for even with a dedicated log device. The only risk is possibly 30 seconds' worth of sync writes leading up to an ungraceful system crash ... but even if you have the ZIL (dedicated log or built into the pool), 30 sec of async writes would be at risk anyway.
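For reference, on builds of this vintage (before the per-dataset sync property existed) the ZIL is disabled with a system-wide tunable. A sketch, assuming the OpenSolaris /etc/system mechanism described in the ZFS tuning guides; this affects every pool on the host:

```shell
# Sketch: disabling the ZIL system-wide on OpenSolaris builds of this era.
# (Later builds replace this with a per-dataset 'sync' property.)
# Persistent across reboots; requires a reboot to take effect:
echo 'set zfs:zil_disable = 1' >> /etc/system

# Or flip it on the live kernel with mdb; applies to filesystems
# mounted after the change:
echo 'zil_disable/W0t1' | mdb -kw
```

Either way, the change is host-wide: there is no way at this pool version to disable the ZIL for one pool and keep it for another.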
If you're not carefully and strictly paying attention, and obsessively controlling your applications to distinguish between sync and async writes, I bet you don't care about the distinction. So just run with your ZIL disabled. For all you knew, your application might have been using async writes anyway, right?

> Then, when trying to figure out the size of the SSD, I got this from
> the guide as well.
>
> The maximum size of a log device should be approximately 1/2 the size
> of physical memory because that is the maximum amount of potential in-
> play data that can be stored.

I'll make this even more shocking: Since the maximum you could possibly store in the ZIL is 30 sec worth of writes, just calculate the speed of your device and see how many GB that is. In an unrealistically fast supercomputer world, that would be 32G. Realistically, it's more like 4-6G for extremely heavy usage on real devices. Your system is simply incapable of using more than that.

> I don't think I can get an SSD that small. It seems like a waste of
> a larger SSD now. Is there a way to make better use of the SSD by
> partitioning? I have two pools, could I partition the SSD to have
> each partition as a ZIL for the different pools? Any other creative
> uses?

Yes. In a high performance environment, people generally acknowledge the waste of space and simply don't use most of their SSD. You can "format" and "fdisk" and "partition" your SSD to create smaller slices. Then you can use one slice for log and another slice for cache.
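To put numbers on that sizing argument: the slog can never hold more than the data your system can sync-write in one TXG flush interval, so a back-of-envelope calculation (the throughput figure here is an assumption; plug in your own device's speed) looks like this:

```shell
# Back-of-envelope slog sizing, with hypothetical numbers.
# The ZIL holds at most ~30 seconds of sync writes (one TXG interval),
# so the useful slog size is bounded by write throughput * 30 s.
THROUGHPUT_MB_S=150   # assumed sustained write speed of the log device
TXG_SECONDS=30        # default maximum time between TXG commits
MAX_SLOG_MB=$((THROUGHPUT_MB_S * TXG_SECONDS))
echo "max useful slog: ${MAX_SLOG_MB} MB"   # 4500 MB -- a handful of GB
```

Even a device three times that fast tops out well under the smallest SSDs on the market, which is why the rest of the drive is better spent on cache.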
On Sat, 31 Jul 2010, Gregory Gee wrote:

> The maximum size of a log device should be approximately 1/2 the
> size of physical memory because that is the maximum amount of
> potential in-play data that can be stored. For example, if a system
> has 16 Gbytes of physical memory, consider a maximum log device size
> of 8 Gbytes.

This sounds like misleading, bad advice to me, even if it is technically correct. Only synchronous writes go into the slog. Typical sources of synchronous writes are databases and NFS service. In the case of NFS, the maximum amount of synchronous writes per transaction group (TXG) is limited by wire bandwidth and other latencies. The 'zilstat' dtrace script can be used to determine the actual synchronous write load on a running server.

> I don't think I can get an SSD that small. It seems like a waste of
> a larger SSD now. Is there a way to make better use of the SSD by
> partitioning? I have two pools, could I partition the SSD to have
> each partition as a ZIL for the different pools? Any other creative
> uses?

FLASH-based SSDs wear out due to repeated writes. The formatted size of the slog is immaterial as long as it is large enough for the synchronous writes assigned to a TXG. An over-provisioned SSD is going to last much longer under heavy write load than a perfectly-sized one.

For a small server, a reasonable approach is to partition the SSD so that it satisfies both slog and L2ARC requirements. The L2ARC load is primarily read access, so it does not tend to wear out the SSD as quickly as use as a slog. A well-designed SSD will include device-wide wear leveling, so that seldom-used FLASH blocks are reassigned to replace heavily-used FLASH blocks. This means that blocks from the lightly-used partition can be reassigned to replace blocks from the heavily-used partition.
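As a concrete sketch of that approach (the pool and device names here are hypothetical; substitute your own): measure the sync load first, then hand one small slice to the slog and the rest to the L2ARC:

```shell
# 1. Measure the actual sync-write load before buying or slicing anything.
#    (zilstat is a dtrace script; run it during peak NFS activity.)
./zilstat.ksh 1 60

# 2. Suppose format(1M)/fdisk has split the SSD c2t0d0 into a small s0
#    and a large s1.  Attach s0 as the intent log and s1 as read cache:
zpool add tank log c2t0d0s0
zpool add tank cache c2t0d0s1

# 3. Confirm the new vdevs show up under "logs" and "cache":
zpool status tank
```

With two pools, the same idea extends to four slices: one small log slice and one cache slice per pool, each added with its own `zpool add` against that pool's name.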
Bob

--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
You probably would not notice the performance effects of an SSD ZIL on a home network, so the price of the ticket may not be worth the ride for you. OTOH, you would notice a significant improvement by using that SSD as an L2ARC device. Because the head latency on consumer 1TB drives is so long, the L2ARC would definitely make access to the pool "feel" faster, because the working footprint of files that your applications frequently reference will sit up front in the L2 cache; meanwhile, archival and infrequently touched items would park in the storage pool cylinders.

On a small office network, OTOH, a ZIL makes a big difference. For instance, if you had 10 software developers with their home directories all exported from a ZFS box, adding an NVRAM ZIL would significantly improve performance. That's because developers often compile hundreds of files at a time, several times per hour, plus updates to files' atime attr - and that particular scale of operation will be greatly improved by an NVRAM ZIL.

If I were to use a ZIL again, I'd use something like the ACARD DDR-2 SATA boxes, and not an SSD or an iRAM.

-- Jim
Jim, that ACARD looks really nice, but it is out of the price range for a home server.

Edward, disabling the ZIL might be ok, but let me characterize what my home server does, and tell me if disabling the ZIL is ok.

My home OpenSolaris server is only used for storage. I have a separate linux box that runs any software I need, such as media servers and such. I export all pools from the OpenSolaris box to the linux box via NFS.

The OpenSolaris box has 2 pools. The first pool stores videos, pictures, various files, and mail, all exported via NFS to the linux box. It is a mirrored zpool. The second mirrored zpool is an NFS store for VM images. The linux boxes I mentioned are actually VMs running in XenServer. The VM vdisks are stored on and run from the OpenSolaris NFS server mounted in the XenServer box.

Yes, I know that this is not a typical home setup, but I'm sure that most here don't have a 'typical home setup'.

So the question is, will disabling the ZIL have negative impacts on the VM vdisks stored on NFS? Or any other files on the NFS shares?

Thanks,
Greg
On Sun, Aug 01, 2010 at 12:36:28PM -0700, Gregory Gee wrote:

> Edward, disabling ZIL might be ok, but let me characterize what my
> home server does and tell me if disabling ZIL is ok.
>
> [...]
>
> So the question is, will disabling ZIL have negative impacts on the
> VM vdisks stored in NFS? Or any other files on the NFS shares?

You would probably see better performance, at the expense of reliability in the case of an unplanned outage.

Ray
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Gregory Gee
>
> Edward, disabling ZIL might be ok, but let me characterize what my home
> server does and tell me if disabling ZIL is ok.

You should understand what it all means, and make your own choice.

For "sync" writes, an application tells the OS to write something to disk, and the function call blocks (waits) until the data has been committed to nonvolatile storage. For "async" writes, an application tells the OS to write something to disk, and the OS is permitted to buffer the write in RAM. The application continues doing other things, even if the data is not yet committed to nonvolatile storage.

In ZFS, many async transactions can be aggregated into a single transaction group (TXG). ZFS chooses when to flush the TXG to disk based on many factors, optimized for performance, but it never waits longer than 30 sec. In ZFS, sync transactions are first written to the ZIL, so the OS can unblock the application, and then they become async transactions just like all the others. After an ungraceful crash, the OS checks the ZIL to see if anything was requested to be written but not actually written. Of course, if any unplayed entries exist, they are replayed now, before the filesystem is mounted.

In other filesystems and operating systems, it's critical to honor the "sync" mode behavior, because transactions such as file creation and removal are sync operations. So in other systems, not honoring the "sync" behavior could result in a corrupt filesystem, or corrupt data where a later write was committed to disk before an earlier write. ZFS is immune to those problems.
Because ZFS keeps an in-memory snapshot of what the filesystem looks like as a whole, and only commits to disk a newer snapshot of the filesystem (it doesn't commit individual file-based operations the way other filesystems and OSes do), and because the committal of a new TXG is an atomic operation, it is impossible to ever boot up and discover ZFS to be in a corrupt or inconsistent state. During a crash, up to 30 sec of async writes are at risk. Anything which was in a TXG not yet flushed to disk is lost.

So ... if you honor sync writes, and some NFS client issues a sync write, and the server reboots ungracefully, then after reboot the client will see things as they were expected to be. If you don't honor sync writes (ZIL disabled), it's possible for an ungraceful reboot to come up with a filesystem in a state older than what your NFS clients expect. So it's probably a good idea to reboot, or at least remount, your NFS clients along with the server reboot, just to get them all into a consistent state.

If you are using NFS to export some VMs, and some other compute servers are acting as the "heads" for those VMs ... well, VMs are naturally "sync" mode machines, because whenever an application inside the guest OS requests a sync write, the guest OS is going to issue a sync write to the host OS. I would not recommend NFS as the backend to host files for a VM guest. I would recommend iSCSI, which will perform more natively and with less overhead. In either case, NFS or iSCSI, if you disable the ZIL, just make sure to reboot your VM guests too if the server has an ungraceful reboot.
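The sync/async distinction described above is easy to feel from any Linux client. A minimal illustration with GNU dd against local files (so it only demonstrates the mechanism, not NFS itself):

```shell
# Async: writes land in the page cache and dd returns almost immediately.
dd if=/dev/zero of=/tmp/async.dat bs=4k count=256 2>/dev/null

# Sync: oflag=dsync forces every 4k block to stable storage before the
# next write is issued -- the same pattern a sync NFS write imposes.
dd if=/dev/zero of=/tmp/sync.dat bs=4k count=256 oflag=dsync 2>/dev/null

# Both files end up identical in size and content; only the latency
# of producing them differs.
ls -l /tmp/async.dat /tmp/sync.dat
```

On spinning disks without a slog, the dsync run can be orders of magnitude slower per block, which is exactly the gap a dedicated log device (or a disabled ZIL) closes.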
On Sun, 1 Aug 2010, Jim Doyle wrote:

> You probably would not notice the performance effects of a SSD ZIL
> on a home network; so the price of the ticket may not be worth the
> ride for you.

There are cases where the SSD ZIL will offer a tremendous improvement. One of the common cases is when the client uses NFS to access the files, and the files are bulk-copied or extracted from a tar file. In this case there will be a huge improvement to write performance.

> On a small office network, OTOH, ZIL makes a big difference. For
> instance, if you had 10 software developers, with their home
> directories all exported from a ZFS box, adding a NVRAM ZIL will
> significantly improve performance. That's because developers often
> compile hundreds of files at a time, several times per hour, plus
> updates to files' atime attr - and that particular scale of
> operation will be greatly improved by an NVRAM ZIL.

Smart software developers will access the source code via NFS but write the object files to a local client disk.

Bob
On Jul 31, 2010, at 8:56 PM, Gregory Gee wrote:

> I was thinking of adding an SSD ZIL to my pool, but then read this in the 'Best Practice Guide'.
>
> Prior to pool version 19, if you have an unmirrored log device that fails, your whole pool is permanently lost.
> Prior to pool version 19, mirroring the log device is highly recommended.

This depends on the failure mode, of course.

> This really worries me. For a home server, mirrored SSD ZIL is pricey. I assume this means upgrading to a dev version? What do people do now for upgrades?

Yes, until the OpenSolaris dev builds ran dry in February. There are other distributions with later releases. See http://www.genunix.org

> Then, when trying to figure out the size of the SSD, I got this from the guide as well.
>
> The maximum size of a log device should be approximately 1/2 the size of physical memory because that is the maximum amount of potential in-play data that can be stored. For example, if a system has 16 Gbytes of physical memory, consider a maximum log device size of 8 Gbytes.
>
> I don't think I can get an SSD that small. It seems like a waste of a larger SSD now. Is there a way to make better use of the SSD by partitioning? I have two pools, could I partition the SSD to have each partition as a ZIL for the different pools? Any other creative uses?

I use something like an Intel X25-V (40GB) for the boot disk and a separate log. There are a number of similar-sized SSDs targeting boot environments in the 32-40 GB range, around $120 or so.
 -- richard

--
Richard Elling
richard at nexenta.com
+1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com