I am looking for references of folks using ZFS with either NFS or iSCSI as the backing store for VMware (4.x) virtual machines. We asked the local VMware folks and they had not even heard of ZFS. Part of what we are looking for is a recommendation for NFS or iSCSI, and all VMware would say is "we support both". We are currently using Sun SE-6920, 6140, and 2540 hardware arrays via FC. We have started playing with ZFS/NFS, but have no experience with iSCSI. The ZFS backing store in some cases will be the hardware arrays (the 6920 has fallen off of VMware's supported list; if we front end it with either NFS or iSCSI it'll be supported, and VMware suggested that) and some of it will be backed by J4400 SATA disk.

I have seen some discussion of this here, but it has all been related to very specific configurations and issues; I am looking for general recommendations and experiences. Thanks.

--
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
Hi Paul,

I am using ESXi 4.0 with an NFS-on-ZFS datastore running on OpenSolaris b134. It previously ran on Solaris 10u7 with VMware Server 2.x. Disks are SATA drives in a JBOD over FC. I'll try to summarize my experience here, although our system does not provide services to end users and thus is not very stressed (it supports our internal developers only).

There are some things to know before setting up ZFS storage, in my opinion:

1. You cannot shrink a zpool. You can only detach disks from mirrors, not from raidz. You can grow it, though, by either replacing disks with higher capacity ones or by adding more disks to the same pool.

2. Due to ZFS's inherently coherent (copy-on-write) structure, sync writes (especially random ones) are its worst enemy. The cure is to bring a pair of mirrored SSDs into the pool as a log device, or to use a battery-backed write cache, especially if you want to use raidz.

3. VMware, regardless of NFS or iSCSI, will do synchronous writes. Due to point 2 above, if your workload and number of VMs is significant you will definitely need some kind of disk device based on memory and not on platters. YMMV.

4. ZFS snapshotting is great, but it can burn a sizeable amount of disk if you leave your VMs' local disks mounted without the noatime option (assuming they are Unix machines), because the vmdks will get written to even if the processes inside the VM only issue reads on files. (In my case ~60 idle Linux machines burned up to 200MB/hr, generating an average of 2MB/sec of write traffic.)

5. Avoid putting disks of different size and performance in a zpool, unless of course they are doing a different job. ZFS does not weight by size or performance and spreads data out evenly. A zpool will perform as the slowest of its members (not all the time, but when the workload is high the slowest disk will be a limiting factor).

6. ZFS performs badly on pools that are more than 80% full. Keep that in mind when you size things up.

7. ZFS compression works wonders, especially the default one: it costs little CPU, it doesn't increase latency (unless you have a very unbalanced CPU/disk system), and it saves space and bandwidth.

8. By mirroring/raiding things at the OS level, ZFS effectively multiplies the bandwidth used on the bus. My SATA disks can sustain writing 60MB/sec, but in a zpool made of 6 mirrors of 2 disks each over a 2Gbit fibre, the maximum throughput is ~95MB/sec: the fibre maxes out at 190MB/sec, but Solaris needs to write to both disks of each mirror. You can partially solve this by putting each side of a mirror on different storage and/or increasing the number of paths towards the disks.

9. Deduplication is great on paper and can be wonderful in virtualized environments, but it has a BIG cost upfront: search around and do your math, but be aware that you'll need tons of RAM and SSDs to effectively deduplicate multi-terabyte storage. Also, the common opinion is that it's not ready for production.

If the analysis/tests of your use case tell you ZFS is a viable option, I think you should give it a try. Administration is wonderfully flexible and easy: once you set up your zpools (which is the really critical phase) you can practically do anything you want in the most efficient way. So far I'm very pleased by it.

--
Simone Caldana
Senior Consultant, Critical Path
via Cuniberti 58, 10100 Torino, Italia
+39 011 4513811 (Direct)
+39 011 4513825 (Fax)
simone.caldana at criticalpath.net
http://www.cp.net/
Critical Path - A global leader in digital communications
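[Editorial note: a minimal sketch of the knobs behind points 6 and 7 above. The pool and dataset names ("tank", "tank/vmstore") are placeholders, not from the original post.]

    # enable the default (lzjb) compression on a dataset
    zfs set compression=on tank/vmstore

    # check how much compression is actually buying you
    zfs get compressratio tank/vmstore

    # keep an eye on pool occupancy; performance degrades past ~80% full
    zpool list tank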
> -----Original Message-----
> From: zfs-discuss-bounces at opensolaris.org
> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Paul Kraus
> Sent: Wednesday, August 11, 2010 3:53 PM
> To: ZFS Discussions
> Subject: [zfs-discuss] ZFS and VMware
>
> I am looking for references of folks using ZFS with either NFS
> or iSCSI as the backing store for VMware (4.x) virtual machines.
> We asked the local VMware folks and they had not even heard of ZFS.
> Part of what we are looking for is a recommendation for NFS or iSCSI,
> and all VMware would say is "we support both". We are currently using
> Sun SE-6920, 6140, and 2540 hardware arrays via FC. We have started
> playing with ZFS/NFS, but have no experience with iSCSI. The ZFS
> backing store in some cases will be the hardware arrays (the 6920 has
> fallen off of VMware's supported list and if we front end it with
> either NFS or iSCSI it'll be supported, and VMware suggested that)
> and some of it will be backed by J4400 SATA disk.
>
> I have seen some discussion of this here, but it has all been
> related to very specific configurations and issues, I am looking for
> general recommendations and experiences. Thanks.

It really depends on your VM system, what you plan on doing with VMs and how you plan to do it.

I have the vSphere Enterprise product and I am using the DRS feature, so VMs are vmotioned around my cluster all throughout the day. All of my VM users are able to create and manage their own VMs through the vSphere client. None of them care to know anything about VM storage as long as it's fast, and most of them don't want to have to make choices about which datastore to put their new VM on. Only 30-40% of the total number of VMs registered in the cluster are powered on at any given time.

I am using OpenSolaris and ZFS to provide a relatively small NFS datastore as a proof of concept. I am trying to demonstrate that it's a better solution for us than our existing solution, which is Windows Storage Server and the MS iSCSI Software Target. The ZFS-based datastore is hosted on six 146GB 10krpm SAS drives configured as a 3x2 mirror, a 30GB SSD as L2ARC and a 1GB ramdisk as the SLOG. Deduplication and compression (lzjb) are enabled. The server itself is a dual quad-core, Core 2-level system with 48GB RAM - it is going to be a VM host in the cluster after this project is concluded.

Based on the experience and information I've gathered thus far, here is what I think:

The biggest thing for me is that I think I will be able to use deduplication, compression and a bunch of mirror vdevs using ZFS, whereas with other products I would need to use RAID 5 or 6 to get enough capacity with my budget. Larger/cheaper drives are also a possibility with ZFS since dedup/compression/ARC/L2ARC cuts down on IO to the disk.

NFS Pros:
- NFS is much easier/faster to configure.
- Dedup and compression work better as the VM files sit directly on the filesystem.
- There is potential for faster provisioning by doing a local copy vs. having VMware do it remotely over NFS.
- It's nice to be able to get at each VM file directly from the fs, as opposed to remotely via the vSphere client or service console (which will disappear with the next VMware release).
- You can use fast SSD SLOGs to accelerate most (all?) writes to the ZIL.

NFS Cons:
- VMware does not do NFSv4, so each of your filesystems will require a separate mount. There is a maximum number of mounts per cluster (64 with vSphere 4.0).
- There is no opportunity for load balancing between the client and a single datastore.
- VMware makes all writes synchronous writes, so you really need to have SLOGs (or a RAID controller with BBWC) to make the hosted VMs usable.
- VMware does not give you any VM-level disk performance statistics for VMs on an NFS datastore (at least, not through the client).

iSCSI Pros:
- COMSTAR rocks - you can set up your LUNs on an iSCSI target today, and move them to FC/FCoE/SRP tomorrow.
- Cloning zvols is fast, which could be leveraged for fast VM provisioning.
- iSCSI supports multipathing, so you can take advantage of VMware's built-in NMP to do load balancing.
- You don't need a SLOG as much, because you'll only have synchronous writes if the VM requests them.

iSCSI Cons:
- It's harder to take full advantage of dedup and compression.
- Basic configuration is not hard, but is still much more complicated than NFS.
- Other than that, the rest of the cons are all VMware related. vSphere has a limit of 256 LUNs per host, which in a cluster supporting vmotion basically means 256 LUNs per cluster. This limit may mean that cloning zvols to speed up VM provisioning is not possible.
- You can have multiple VMs per LUN using VMFS, but if you make LUNs too large you run into locking issues when provisioning - general wisdom is to keep LUN sizes as small as possible while not going over 256 LUNs. This means your storage is chopped up into little pieces, which may be annoying to deal with.
- The worst thing I've experienced with iSCSI and VMFS is LUN resignaturing - if you move a LUN from one host (target?) to another, VMware is going to think it's a copy, is going to want to resignature the VMFS, and is going to want you to reregister every VM in the filesystem. vSphere 4.0 is supposed to offer the ability to mount a VMFS without resignaturing, but I've only been able to get this to work on a single host, not on every host in a cluster. Resignaturing is really painful.

Beyond the above, there are some possibilities for the future that may also inform your decision.

If VMware ever releases an NFSv4 or NFSv4.1 client, we would get multiple filesystems per NFS mount and/or pNFS, either of which would be great. Multiple filesystems per mount would allow provisioning by cloning filesystems (one VM or VM group per filesystem) or filesystem-level snapshots instead of VMware snapshots. Since vSphere 4.1 was released a couple of weeks ago without NFSv4 support, I would not anticipate this becoming available until 4.2 or whatever their next major release is.

With vSphere 4.1 VMware has introduced the vStorage API for Array Integration (VAAI), which seems to be fancy marketing wrapped around the implementation of a few 'optional' SCSI protocol commands. VAAI claims to accelerate provisioning and provide block-level locking for VMFS when used with compatible storage. If COMSTAR does or will implement these commands, I think large iSCSI LUNs become a lot easier to deal with and very attractive.

Overall I think NFS and iSCSI are both excellent ways to get VMware using ZFS as a datastore. Hope the above is helpful to you in making a decision.

-Will
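[Editorial note: for concreteness, a minimal sketch of how a pool shaped like the proof-of-concept described above might be created. Device names, the ramdisk name/size and the pool/dataset names are placeholders, not details from Will's setup.]

    # three 2-way mirrors from six SAS drives, an SSD as L2ARC, a ramdisk as SLOG
    ramdiskadm -a slogdisk 1g
    zpool create vmpool \
        mirror c1t0d0 c1t1d0 \
        mirror c1t2d0 c1t3d0 \
        mirror c1t4d0 c1t5d0 \
        cache c2t0d0 \
        log /dev/ramdisk/slogdisk

    # enable dedup and lzjb compression on the datastore filesystem
    zfs create -o dedup=on -o compression=lzjb vmpool/esx

A ramdisk SLOG offers no protection across a crash or power loss, so it only makes sense for a proof of concept like the one described here; a real deployment would use a nonvolatile log device.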
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Paul Kraus
>
> I am looking for references of folks using ZFS with either NFS
> or iSCSI as the backing store for VMware (4.x) virtual machines.

I'll try to clearly separate what I know from what I speculate:

I know you can do either one, NFS or iSCSI served by ZFS, for the backend datastore used by ESX. I know (99.9%) that VMware will issue sync-mode operations in both cases. Which means you are strongly encouraged to use a mirrored dedicated log device, presumably SSDs or some sort of high-IOPS, low-latency devices.

I speculate that iSCSI will perform better. If you serve it up via NFS, then VMware is going to create a file in your NFS filesystem, and inside that file it will create a new filesystem. So you get twice the filesystem overhead. Whereas with iSCSI, ZFS presents a raw device to VMware, and then VMware maintains its filesystem in that.
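[Editorial note: a minimal sketch of adding such a mirrored dedicated log device to an existing pool; the pool name "tank" and the SSD device names are placeholders.]

    # attach a mirrored pair of SSDs as a dedicated ZIL (slog) device
    zpool add tank log mirror c3t0d0 c3t1d0

    # verify the log vdev shows up
    zpool status tank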
On Wed, Aug 11, 2010 at 7:27 PM, Edward Ned Harvey <shill at nedharvey.com> wrote:

> I'll try to clearly separate what I know from what I speculate:
>
> I know you can do either one, NFS or iSCSI served by ZFS, for the backend
> datastore used by ESX. I know (99.9%) that VMware will issue sync-mode
> operations in both cases. Which means you are strongly encouraged to use a
> mirrored dedicated log device, presumably SSDs or some sort of high-IOPS,
> low-latency devices.
>
> I speculate that iSCSI will perform better. If you serve it up via NFS,
> then VMware is going to create a file in your NFS filesystem, and inside
> that file it will create a new filesystem. So you get twice the filesystem
> overhead. Whereas with iSCSI, ZFS presents a raw device to VMware, and then
> VMware maintains its filesystem in that.

That's not true at all. Whether you use iSCSI or NFS, VMware is laying down a file which it presents as a disk to the guest VM, which then formats it with its own filesystem. That's the advantage of virtualization: you've got a big file you can pick up and move anywhere that is hardware agnostic. With iSCSI, you're forced to use VMFS, which is an adaptation of the Legato clustered filesystem from the early 90's. It is nowhere near as robust as NFS, and I can't think of a reason you would use it if given the choice, short of a massive pre-existing investment in Fibre Channel. With NFS, you're simply using ZFS; there is no VMFS to worry about. You don't have to have another ESX box if something goes wrong - any client with an NFS client can mount the share and diagnose the VMDK.

--Tim
> -----Original Message-----
> From: zfs-discuss-bounces at opensolaris.org
> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Tim Cook
> Sent: Wednesday, August 11, 2010 8:46 PM
> To: Edward Ned Harvey
> Cc: ZFS Discussions
> Subject: Re: [zfs-discuss] ZFS and VMware
>
> That's not true at all. Whether you use iSCSI or NFS, VMware
> is laying down a file which it presents as a disk to the
> guest VM which then formats it with its own filesystem.
> That's the advantage of virtualization. You've got a big
> file you can pick up and move anywhere that is hardware
> agnostic. With iSCSI, you're forced to use VMFS, which is an
> adaptation of the Legato clustered filesystem from the early
> 90's. It is nowhere near as robust as NFS, and I can't think
> of a reason you would use it if given the choice; short of a
> massive pre-existing investment in Fibre Channel. With NFS,
> you're simply using ZFS, there is no VMFS to worry about.
> You don't have to have another ESX box if something goes
> wrong, any client with an NFS client can mount the share and
> diagnose the VMDK.
>
> --Tim

This is not entirely correct either. You're not forced to use VMFS.

You can format the LUN with VMFS, then put VM files inside the VMFS; in this case you get the Guest OS filesystem inside a VMDK file on the VMFS filesystem inside a LUN/ZVOL on your ZFS filesystem. You can also set up Raw Device Mapping (RDM) directly to a LUN, in which case you get the Guest OS filesystem inside the LUN/ZVOL on your ZFS filesystem. There has to be VMFS available somewhere to store metadata, though.

It was and may still be common to use RDM for VMs that need very high IO performance. It also used to be the only supported way to get thin provisioning for an individual VM disk. However, VMware regularly makes a lot of noise about how VMFS does not hurt performance enough to outweigh its benefits anymore, and thin provisioning has been a native/supported feature on VMFS datastores since 4.0.

I still think there are reasons why iSCSI would be better than NFS and vice versa.

-Will
> This is not entirely correct either. You're not forced to use VMFS.

It is entirely true. You absolutely cannot use ESX with a guest on a block device without formatting the LUN with VMFS. You are *FORCED* to use VMFS.

> You can format the LUN with VMFS, then put VM files inside the VMFS; in this
> case you get the Guest OS filesystem inside a VMDK file on the VMFS
> filesystem inside a LUN/ZVOL on your ZFS filesystem. You can also set up Raw
> Device Mapping (RDM) directly to a LUN, in which case you get the Guest OS
> filesystem inside the LUN/ZVOL on your ZFS filesystem. There has to be VMFS
> available somewhere to store metadata, though.

You cannot boot a VM off an RDM. You *HAVE* to use VMFS with block devices for your guest operating systems. Regardless, we aren't talking about RDMs, we're talking about storing virtual machines.

> It was and may still be common to use RDM for VMs that need very high IO
> performance. It also used to be the only supported way to get thin
> provisioning for an individual VM disk. However, VMware regularly makes a
> lot of noise about how VMFS does not hurt performance enough to outweigh its
> benefits anymore, and thin provisioning has been a native/supported feature
> on VMFS datastores since 4.0.
>
> I still think there are reasons why iSCSI would be better than NFS and vice
> versa.

I'd love for you to name one. Short of a piss-poor NFS server implementation, I've never once seen iSCSI beat out NFS in a VMware environment. I have however seen countless examples of their "clustered filesystem" causing permanent SCSI locks on a LUN that result in an entire datastore going offline.

--Tim
Actually, this brings up a related issue. Does anyone have experience with running VirtualBox on iSCSI volumes vs NFS shares, both of which would be backed by a ZFS server?

-Erik

On Wed, 2010-08-11 at 21:41 -0500, Tim Cook wrote:
> > This is not entirely correct either. You're not forced to use VMFS.
>
> It is entirely true. You absolutely cannot use ESX with a guest on a
> block device without formatting the LUN with VMFS. You are *FORCED*
> to use VMFS.
>
> > You can format the LUN with VMFS, then put VM files inside the
> > VMFS; in this case you get the Guest OS filesystem inside a
> > VMDK file on the VMFS filesystem inside a LUN/ZVOL on your ZFS
> > filesystem. You can also set up Raw Device Mapping (RDM)
> > directly to a LUN, in which case you get the Guest OS
> > filesystem inside the LUN/ZVOL on your ZFS filesystem. There
> > has to be VMFS available somewhere to store metadata, though.
>
> You cannot boot a VM off an RDM. You *HAVE* to use VMFS with block
> devices for your guest operating systems. Regardless, we aren't
> talking about RDMs, we're talking about storing virtual machines.
>
> > It was and may still be common to use RDM for VMs that need
> > very high IO performance. It also used to be the only
> > supported way to get thin provisioning for an individual VM
> > disk. However, VMware regularly makes a lot of noise about how
> > VMFS does not hurt performance enough to outweigh its benefits
> > anymore, and thin provisioning has been a native/supported
> > feature on VMFS datastores since 4.0.
> >
> > I still think there are reasons why iSCSI would be better than
> > NFS and vice versa.
>
> I'd love for you to name one. Short of a piss-poor NFS server
> implementation, I've never once seen iSCSI beat out NFS in a VMware
> environment. I have however seen countless examples of their
> "clustered filesystem" causing permanent SCSI locks on a LUN that
> result in an entire datastore going offline.
>
> --Tim

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
> -----Original Message-----
> From: Tim Cook [mailto:tim at cook.ms]
> Sent: Wednesday, August 11, 2010 10:42 PM
> To: Saxon, Will
> Cc: Edward Ned Harvey; ZFS Discussions
> Subject: Re: [zfs-discuss] ZFS and VMware
>
> > I still think there are reasons why iSCSI would be
> > better than NFS and vice versa.
>
> I'd love for you to name one. Short of a piss-poor NFS
> server implementation, I've never once seen iSCSI beat out
> NFS in a VMware environment. I have however seen countless
> examples of their "clustered filesystem" causing permanent
> SCSI locks on a LUN that result in an entire datastore going offline.

My understanding is that if you wanted to use MS Cluster Server, you'd need to use a LUN as an RDM for the quorum drive. VMDK files are locked when open, so they can't typically be shared. VMware's Fault Tolerance gets around this somehow, and I have a suspicion that their Lab Manager product does as well.

I don't think you can use VMware's built-in multipathing with NFS. Maybe it's possible; it doesn't look that way, but I'm not going to verify it one way or the other. There are probably better/alternative ways to achieve the same thing with NFS.

The new VAAI stuff that VMware announced with vSphere 4.1 does not support NFS (yet); it only works with storage servers that implement the required commands.

The locked LUN thing has happened to me once. I've had more trouble with thin provisioning and negligence leading to a totally-full VMFS, which is irritating to recover from, and with moved/restored LUNs needing VMFS resignaturing, which is also irritating.

I don't want to argue with you about the other stuff.

-Will
> My understanding is that if you wanted to use MS Cluster Server, you'd need
> to use a LUN as an RDM for the quorum drive. VMDK files are locked when
> open, so they can't typically be shared. VMware's Fault Tolerance gets
> around this somehow, and I have a suspicion that their Lab Manager product
> does as well.

Right, but again, we're talking about storing virtual machines, not RDMs. Using MSCS on top of VMware rarely makes any sense, and MS is doing their damnedest to make it as painful as possible for those that try anyway. There's nothing stopping you from putting your virtual machine on an NFS datastore and mounting a LUN directly to the guest OS with a software iSCSI client, cutting out the middleman and bypassing the RDM entirely... which just adds yet another headache when it comes to things like SRM and vmotion.

> I don't think you can use VMware's built-in multipathing with NFS. Maybe
> it's possible; it doesn't look that way, but I'm not going to verify it one
> way or the other. There are probably better/alternative ways to achieve the
> same thing with NFS.

You can achieve the same thing with a little bit of forethought on your network design. No, ALUA is not compatible with NFS; it is a block protocol feature. Then again, ALUA is also not compatible with the MSCS example you listed above.

> The new VAAI stuff that VMware announced with vSphere 4.1 does not support
> NFS (yet); it only works with storage servers that implement the required
> commands.

VAAI is an attempt to give block more NFS-like features (for instance, finer-grained locking, which already exists in NFS by default). The "features" are basically useless in an NFS environment on intelligent storage.

> The locked LUN thing has happened to me once. I've had more trouble with
> thin provisioning and negligence leading to a totally-full VMFS, which is
> irritating to recover from, and with moved/restored LUNs needing VMFS
> resignaturing, which is also irritating.
>
> I don't want to argue with you about the other stuff.

Which is why block with vmware blows :)

--Tim
On Wed, Aug 11, 2010 at 6:15 PM, Saxon, Will <Will.Saxon at sage.com> wrote:
> It really depends on your VM system, what you plan on doing with VMs and how you plan to do it.
>
> I have the vSphere Enterprise product and I am using the DRS feature, so VMs are vmotioned around
> my cluster all throughout the day. All of my VM users are able to create and manage their own VMs
> through the vSphere client. None of them care to know anything about VM storage as long as it's
> fast, and most of them don't want to have to make choices about which datastore to put their new
> VM on. Only 30-40% of the total number of VMs registered in the cluster are powered on at any given time.

We have three production VMware vSphere 4 clusters, each with four hosts. The number of guests varies, but ranges from a low of 40 on one cluster to 80 on another. We do not generally have many guests being created or destroyed, but a slow steady growth in their numbers. The guests are both production as well as test/development, and the vast majority of them are Windows, mostly Server 2008. The rule is to roll out Windows servers as VMs, with the notable exception of the Exchange servers, which are physical servers. The VMs are used for everything, including domain controllers, file servers, print servers, dhcp servers, dns servers, workstations (my physical desktop runs Linux, but I need a Windows system for Outlook and a few other applications, and that runs as a VM), SharePoint servers, MS-SQL servers, and other assorted application servers. We are using DRS and VMs do migrate around a bit (transparently). We take advantage of "maintenance mode" for exactly what the name says.

We have had a fairly constant but low rate of FC issues with VMware, from when we first rolled out VMware (version 3.0) through today (4.1). The multi-pathing seems to occasionally either lose one or more paths to a given LUN or completely lose access to a given LUN. These problems do not happen often, but when they do they have caused downtime on production VMs. Part of the reason we started looking at NFS/iSCSI was to get around the VMware (Linux) FC drivers. We also like the low overhead snapshot feature of ZFS (and are leveraging it for other data extensively).

Now we are getting serious about using ZFS + NFS/iSCSI and are looking to learn from others' experience as well as our own. For example, is anyone using NFS with Oracle Cluster for HA storage for VMs, or are sites trusting a single NFS server?

--
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
We are doing NFS in VMware 4.0U2 production, 50K users, using OpenSolaris snv_134 on SuperMicro boxes with SATA drives. Yes, I am crazy.

Our experience has been that iSCSI for ESXi 4.x is fast and works well with minimal fussing until there is a problem. When that problem happens, getting to data on VMFS LUNs, even with the free Java VMFS utility, is problematic at best and game over at worst. With NFS, data access in problem situations is a non-event. Snapshots happen and everyone is happy. The problem with it is the VMware NFS client, which makes every write an F_SYNC write. That kills NFS performance dead. To get around that, we're using DDRdrive X1s for our ZIL and the problem is solved. I have not looked at the NFS client changes in 4.1; perhaps it's better, or at least tuneable, now.

I would recommend NFS as the overall strategy, but you must get a good ZIL device to make that happen. Do not disable the ZIL. Do make sure you set your I/O queue depths correctly.
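[Editorial note: a minimal sketch of exporting a ZFS filesystem as an ESX NFS datastore. The pool/dataset name and the ESX host subnet are placeholders, not details from this post; the access-list syntax is the standard share_nfs(1M) form, with root access granted because ESX mounts the datastore as root.]

    # create the datastore filesystem and export it read/write to the ESX hosts
    zfs create tank/esx_datastore
    zfs set sharenfs=rw=@192.168.10.0/24,root=@192.168.10.0/24 tank/esx_datastore

    # confirm the share is active
    share | grep esx_datastore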
We are using ZFS-backed fibre targets for ESXi 4.1, and previously 4.0, and have had good performance with no issues. The fibre LUNs were formatted with VMFS by the ESXi boxes.

SQLIO benchmarks from a guest system running on a fibre-attached ESXi host:

File Size MB  Threads  R/W  Duration  Sector Size KB  Pattern  IOs outstanding  IO/Sec  MB/Sec  Lat. Min.  Lat. Avg.  Lat. Max.
24576         8        R    30        8               random   64               37645   294     0          1          141
24576         8        W    30        8               random   64               17304   135     0          3          303
24576         8        R    30        64              random   64               6250    391     1          9          176
24576         8        W    30        64              random   64               5742    359     1          10         203

The array is a raidz2 with 14 x 256 GB Patriot Torqx drives and a cache with 4 x 32 GB Intel G1s. When I get around to doing the next series of boxes I'll probably use C300s in place of the Indilinx-based drives.

iSCSI was disappointing and seemed to be CPU bound, possibly by a stupid amount of interrupts coming from the less than stellar NIC on the test box. NFS we have only used as an ISO store, but it has worked OK and without issues.
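[Editorial note: a minimal sketch of building a pool shaped like that one (a 14-disk raidz2 data vdev plus SSD cache devices); the pool name and device names are placeholders.]

    # 14-disk raidz2 with four SSDs as L2ARC cache devices
    zpool create fcpool \
        raidz2 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0 c4t6d0 \
               c4t7d0 c4t8d0 c4t9d0 c4t10d0 c4t11d0 c4t12d0 c4t13d0 \
        cache c5t0d0 c5t1d0 c5t2d0 c5t3d0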
>>>>> "sw" == Saxon, Will <Will.Saxon at sage.com> writes:sw> It was and may still be common to use RDM for VMs that need sw> very high IO performance. It also used to be the only sw> supported way to get thin provisioning for an individual VM sw> disk. However, VMware regularly makes a lot of noise about how sw> VMFS does not hurt performance enough to outweigh its benefits What''s the performance of configuring the guest to boot off iSCSI or NFS directly using its own initiator/client, through the virtual network adapter? Is it better or worse, and does it interfere with migration? or is this difficult enough that no one using vmware does it, that anyone who does this would already be using Xen and in-house scripts instead of vmware black-box proprietary crap? It seems to me a native NFS guest would go much easier on the DDT. I found it frustrating I could not change the blocksize of XFS: it is locked at 4kB. I would guess there is still no vIB adapter, so if you want to use SRP you are stuck presenting IB storage to guests with vmware virtual scsi card. but I don''t know if the common wisdom, ``TOE is a worthless gimmick. modern network cards and TCP stacks are as good as SCSI cards,'''' still applies when the adapters are virtual ones, so I''d be interested to hear from someone running guests in this way (without virtual storage adapters). -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20100812/93cc985e/attachment.bin>
I fully agree with your post. NFS is much simpler in administration.

Although I don't have any experience with the DDRdrive X1, I've read and heard from various people actually using them that it's the best "available" SLOG device. Before everybody starts yelling "ZEUS" or "LOGZILLA": was anybody able to buy one, apart from Sun? The DDRdrive X1 is available, and you can buy several for the price of one ZEUS.

Good to hear another success story. As soon as I have budget I'm going to buy a pair of them.

My question: what about I/O queue depths? Which queues, the ones over at VMware or the ones on OpenSolaris? Can you give some examples and actual settings? Oh, and in what chassis model have you mounted the DDRdrive?

Thanks in advance
On Fri, August 13, 2010 07:21, F. Wessels wrote:
> I fully agree with your post. NFS is much simpler in administration.
> Although I don't have any experience with the DDRdrive X1, I've read and
> heard from various people actually using them that it's the best
> "available" SLOG device. Before everybody starts yelling "ZEUS" or
> "LOGZILLA": was anybody able to buy one, apart from Sun? The DDRdrive X1
> is available, and you can buy several for the price of one ZEUS.

STEC only sells to OEMs at this time. From past discussions on this list, I think the only dependable SSD alternatives are devices based on the SandForce SF-1500 controller:

http://www.google.com/search?q=SandForce+SF-1500

For all other products, there are questions about the devices respecting SYNC commands (i.e., not lying about them), and issues with the lack of supercaps.

The SandForce/ZFS thread (January 2010: "preview of new SSD based on SandForce controller") can be found at:

http://tinyurl.com/2c6hvqs#35376
http://mail.opensolaris.org/pipermail/zfs-discuss/2010-January/thread.html#35376
Yes, the SandForce-based SSDs are also interesting. I think both could be fitted with the necessary supercap to prevent data loss in case of unexpected power loss (the SF-1500 certainly can), and the SF-1500-based models will be available with a SAS interface, which is needed for clustering - something the DDRdrive cannot do. BUT at this moment they certainly do not match the DDRdrive in performance, and probably also not in MTBF. The DDRdrive only writes to DDR RAM, hence the name; only in case of power loss are the RAM contents written to flash. At least, this is what I understand of it. The DDR RAM doesn't suffer the wear/degradation that any type of flash memory suffers.

You can buy multiple SandForce SSDs with supercap for the price of a single DDRdrive X1. Choice is good!
Don't waste your time with something other than the DDRdrive for an NFS ZIL. If it's RAM based it might work, but why risk it; and if it's an SSD, forget it. No SSD will work well for the ZIL long term. Short term the only SSD to consider would be Intel, but again, long term even that will not work out for you. The 100% write characteristics of the ZIL are an SSD's worst case scenario, especially without TRIM support. We have tried them all - Samsung, SanDisk, OCZ - and none of those worked out. In particular, anything SandForce 1500 based was the worst, so avoid those at all costs if you dare to try an SSD ZIL. Don't. :)

As for the queue depths, here's the command from the ZFS Evil Tuning Guide:

echo zfs_vdev_max_pending/W0t10 | mdb -kw

The W0t10 part is the value to change. W0t35 (the old default of 35 outstanding I/Os per vdev) was the old value; 10 is the new one. For our NFS environment, we found W0t2 was the best by looking at the actual IO using dtrace scripts. Email me if you want those scripts. They are here, but need to be edited before they work:

http://blogs.sun.com/chrisg/entry/latency_bubble_in_your_io
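[Editorial note: a small sketch around that tunable, not from Eff's post. It is usually worth reading the current value before changing it, and persisting the final choice in /etc/system so it survives a reboot.]

    # show the current value (decimal)
    echo zfs_vdev_max_pending/D | mdb -k

    # change it on the live kernel (as in the Evil Tuning Guide)
    echo zfs_vdev_max_pending/W0t10 | mdb -kw

    # make it persistent: add this line to /etc/system and reboot
    # set zfs:zfs_vdev_max_pending = 10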
> -----Original Message-----
> From: zfs-discuss-bounces at opensolaris.org
> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Eff Norwood
> Sent: Friday, August 13, 2010 10:26 AM
> To: zfs-discuss at opensolaris.org
> Subject: Re: [zfs-discuss] ZFS and VMware
>
> Don't waste your time with something other than the DDRdrive
> for an NFS ZIL. If it's RAM based it might work, but why risk it;
> and if it's an SSD, forget it. No SSD will work well for the
> ZIL long term. Short term the only SSD to consider would be
> Intel, but again, long term even that will not work out for
> you. The 100% write characteristics of the ZIL are an SSD's
> worst case scenario, especially without TRIM support. We have
> tried them all - Samsung, SanDisk, OCZ - and none of those
> worked out. In particular, anything SandForce 1500 based was
> the worst, so avoid those at all costs if you dare to try an
> SSD ZIL. Don't. :)

What was the observed behavior with the SF-1500 based SSDs? I was planning to purchase something based on these next year, specifically to be a SLOG.

-Will
I wasn't planning to buy any SSD as a ZIL. I merely acknowledged that a SandForce with supercap MIGHT be a solution. At least the supercap should take care of the data loss in case of a power failure. But they are still in the consumer realm and have not been picked up by the enterprise (yet), for whatever reason. I must admit that I've heard the SandForce drives didn't really live up to their expectations, at least as a slog device. I think a lot of people on this mailing list would be very interested in your evaluation of the SSDs, to prevent costly mistakes.

Thanks for the scripts, I'll send you an email about them.

And for everybody else, here's a good entry about the DDRdrive X1: http://blogs.sun.com/ahl/entry/ddrdrive
On Fri, August 13, 2010 11:39, F. Wessels wrote:
> I wasn't planning to buy any SSD as a ZIL. I merely acknowledged that a
> SandForce with supercap MIGHT be a solution. At least the supercap should
> take care of the data loss in case of a power failure. But they are still
> in the consumer realm and have not been picked up by the enterprise (yet),
> for whatever reason. I must admit that I've heard the SandForce drives
> didn't really live up to their expectations, at least as a slog device.

IBM appears to have used SandForce for some products:

http://tinyurl.com/3xtvch4
http://www.engadget.com/2010/05/03/sandforce-makes-ssds-cheaper-faster-more-reliable-just-how/
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Paul Kraus
>
> I am looking for references of folks using ZFS with either NFS
> or iSCSI as the backing store for VMware (4.x) virtual machines.

Since I had ulterior motives to test this, I spent a lot of time today working on this anyway. So I figured I might as well post some results here:

#1 If there's any performance difference between iSCSI vs NFS, it was undetectable to me. If there's any difference at all, NFS might be faster in some cases.

#2 I previously speculated that iSCSI would outperform NFS, because I thought VMware would create a file on NFS and then format that file with vmfs3, thus doubling filesystem overhead. I was wrong. In reality, ESXi uses the NFS datastore "raw." Meaning, if you create some new VM named "junk" with associated disks "junk.vmdk" etc., then those files are created inside the NFS file server just like any other normal files. There is no vmfs3 overhead in between.

#3 I previously believed that vmfs3 was able to handle sparse files amazingly well - like, when you create a new vmdk, it appears almost instantly regardless of size - and I believed you could copy sparse vmdks efficiently, not needing to read all the sparse consecutive zeroes. I was wrong. In reality, vmfs3 doesn't seem to have any advantage over *any* other filesystem (ntfs, ext3, hfs+, etc.) in how it creates and occupies disk space with sparse files. They do not copy efficiently. I found that copying a large sparse vmdk file, for all intents and purposes, works just as well inside vmfs3 as it does on NFS.

Those things being said ... I couldn't find one reason at all in favor of iSCSI over NFS. Except, perhaps, the authentication, which may or may not be stronger security than NFS in a less-than-trusted LAN. iSCSI requires more work to set up. iSCSI has more restrictions on it - you have to choose a size, and can't expand it. It's formatted vmfs3, so you cannot see the contents in any way other than mounting it in ESX. I could not find even one thing to promote iSCSI over NFS.

Although it seems unlikely, if you wanted to disable the ZIL instead of buying log devices on the ZFS host, you can easily do this for NFS, and I'm not aware of any way to do it with iSCSI. Maybe you can, I don't know.

I mean ... It wasn't like Mike Tyson beating up a little kid, but it was like a grown-up beating up an adolescent. ;-) Extremely one-sided as far as I can tell.
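[Editorial note: to give a sense of the extra setup work on the iSCSI side, here is a minimal COMSTAR sketch under assumed names (pool "tank", a 100 GB zvol); target and view configuration in a real cluster would be more restrictive.]

    # create a zvol and expose it as a SCSI logical unit
    zfs create -V 100g tank/esx_lun0
    svcadm enable stmf
    stmfadm create-lu /dev/zvol/rdsk/tank/esx_lun0

    # make the LU visible (restrict with host/target groups in real life)
    stmfadm add-view <LU GUID printed by create-lu>

    # bring up the iSCSI target port provider and a target
    svcadm enable -r svc:/network/iscsi/target:default
    itadm create-target

Compare that with a single "zfs set sharenfs=..." for the NFS case.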
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Edward Ned Harvey
>
> #3 I previously believed that vmfs3 was able to handle sparse files
> amazingly well - like, when you create a new vmdk, it appears almost
> instantly regardless of size - and I believed you could copy sparse
> vmdks efficiently, not needing to read all the sparse consecutive
> zeroes. I was wrong.

Correction: I was originally right. ;-)

In ESXi, if you go to the command line (which is busybox), then sparse copies are not efficient. If you go into vSphere, browse the datastore, and copy vmdk files via the GUI, then it DOES copy efficiently.

The behavior is the same regardless of NFS vs iSCSI.

You should always copy files via the GUI. That's the lesson here.
On Aug 14, 2010, at 8:26 AM, "Edward Ned Harvey" <shill at nedharvey.com> wrote:
> Correction: I was originally right. ;-)
>
> In ESXi, if you go to the command line (which is busybox), then sparse
> copies are not efficient. If you go into vSphere, browse the datastore,
> and copy vmdk files via the GUI, then it DOES copy efficiently.
>
> The behavior is the same regardless of NFS vs iSCSI.
>
> You should always copy files via the GUI. That's the lesson here.

Technically, you should always copy vmdk files with vmkfstools on the command line. That will give you wire speed transfers.

-Ross
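[Editorial note: a minimal sketch of that kind of copy from the ESX(i) command line; the paths and the thin-format option are illustrative assumptions, so check vmkfstools' usage on your build before relying on the exact flags.]

    # clone a vmdk with vmkfstools instead of cp, keeping it thin-provisioned
    vmkfstools -i /vmfs/volumes/datastore1/junk/junk.vmdk \
        -d thin \
        /vmfs/volumes/datastore1/junk-copy/junk-copy.vmdk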
On Aug 11, 2010, at 12:52 PM, Paul Kraus wrote:
> I am looking for references of folks using ZFS with either NFS
> or iSCSI as the backing store for VMware (4.x) virtual machines. We asked
> the local VMware folks and they had not even heard of ZFS. Part of what we
> are looking for is a recommendation for NFS or iSCSI, and all VMware would
> say is "we support both". We are currently using Sun SE-6920, 6140, and
> 2540 hardware arrays via FC. We have started playing with ZFS/NFS, but have
> no experience with iSCSI. The ZFS backing store in some cases will be the
> hardware arrays (the 6920 has fallen off of VMware's supported list and if
> we front end it with either NFS or iSCSI it'll be supported, and VMware
> suggested that) and some of it will be backed by J4400 SATA disk.

At Nexenta, we have many customers using ZFS as the backing store for VMware and Citrix XenServer. Nexenta also has a plugin to help you integrate your VMware, XenServer, and Hyper-V virtual hosts with the storage appliance. For more info, see
http://www.nexenta.com/corp/applications/vmdc
and the latest Nexenta docs, including the VMDC User's Guide, are at:
http://www.nexenta.com/corp/documentation/product-documentation

Please share and enjoy: the joint EMC+NetApp storage best practices for configuring ESX apply to all NFS and block storage (*SCSI) environments. Google "TR-3428" and point me in the direction of any later versions you find :-)
-- richard

--
Richard Elling
richard at nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com