Hi,

I'm looking to build a virtualized web hosting server environment accessing
files on a hybrid storage SAN. I was looking at using the Sun X-Fire x4540
with the following configuration:

  - 6 RAID-Z vdevs with one hot spare each (all 500GB 7200RPM SATA drives)
  - 2 Intel X-25 32GB SSDs as a mirrored ZIL
  - 4 Intel X-25 64GB SSDs as the L2ARC
  - Deduplication
  - LZJB compression

The clients will be Apache web hosts serving hundreds of domains.

I have the following questions:

  - Should I use NFS with all five VMs accessing the exports, or one LUN for
    each VM, accessed over iSCSI?
  - Are the FSYNC speed issues with NFS resolved?
  - Should I go with Fibre Channel, or will the 4 built-in 1GbE NICs give me
    enough speed?
  - How many SSDs should I use for the ZIL and L2ARC?
  - What pool structure should I use?

I know these questions are slightly vague, but any input would be greatly
appreciated.

Thanks!
On 6/6/2010 6:22 PM, Ken wrote:
> I'm looking to build a virtualized web hosting server environment
> accessing files on a hybrid storage SAN. I was looking at using the
> Sun X-Fire x4540 with the following configuration:
> [...]
> I know these questions are slightly vague, but any input would be
> greatly appreciated.

Which Virtual Machine technology are you going to use?

  VirtualBox
  VMware
  Xen
  Solaris Zones
  Something else...

It will make a difference as to my recommendation (or, do you want me to
recommend a VM type, too?)

<grin>

-- 
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
I'm looking at VMware ESXi 4, but I'll take any advice offered.

On Sun, Jun 6, 2010 at 19:40, Erik Trimble <erik.trimble at oracle.com> wrote:
> Which Virtual Machine technology are you going to use?
>
>   VirtualBox
>   VMware
>   Xen
>   Solaris Zones
>   Something else...
>
> It will make a difference as to my recommendation (or, do you want me to
> recommend a VM type, too?)
>
> <grin>
Comments in-line.

On 6/6/2010 9:16 PM, Ken wrote:
> I'm looking at VMware ESXi 4, but I'll take any advice offered.
>
> On Sun, Jun 6, 2010 at 19:40, Erik Trimble <erik.trimble at oracle.com> wrote:
>> On 6/6/2010 6:22 PM, Ken wrote:
>>> I'm looking to build a virtualized web hosting server environment
>>> accessing files on a hybrid storage SAN. I was looking at using the
>>> Sun X-Fire x4540 with the following configuration:
>>> [...]
>>> The clients will be Apache web hosts serving hundreds of domains.
>>>
>>> I have the following questions:
>>>
>>>   * Should I use NFS with all five VMs accessing the exports, or one
>>>     LUN for each VM, accessed over iSCSI?

Generally speaking, it depends on your comfort level with running iSCSI
volumes to put the VMs in, or serving everything out via NFS (hosting the VM
disk file in an NFS filesystem).

If you go the iSCSI route, I would definitely go the "one iSCSI volume per VM"
route - note that you can create multiple zvols per zpool on the X4540, so
it's not limiting in any way to volume-ize a VM. It's a lot simpler, easier,
and allows for nicer management (snapshots/cloning/etc. on the X4540 side) if
you go with a VM per iSCSI volume.

With NFS-hosted VM disks, do the same thing: create a single filesystem on the
X4540 for each VM.

Performance-wise, I'd have to test, but I /think/ the iSCSI route will be
faster, even with the ZIL SSDs.

In all cases, regardless of how you host the VM images themselves, I'd serve
out the website files via NFS. I'm not sure how ESXi works, but under
something like Solaris/VBox, I could set up the base Solaris system to run
CacheFS for an NFS share, and then give all the VBox instances local access to
that single NFS mountpoint. That would allow for heavy client-side caching of
important data for your web servers. If you're careful, you can separate
read-only data from write-only data, which would allow you even better
performance tweaks. I tend to like to have the host OS handle as much network
traffic and caching of data as possible instead of each VM doing it; it tends
to be more efficient that way.

>>>   * Are the FSYNC speed issues with NFS resolved?

The ZIL SSDs will compensate for synchronous write issues in NFS. Not
completely eliminate them, but you shouldn't notice issues with sync writing
until you're up at pretty heavy loads.

>>>   * Should I go with Fibre Channel, or will the 4 built-in 1GbE NICs
>>>     give me enough speed?

Depending on how much RAM and how much local data caching you do (and the
specifics of the web site accesses), 4 GbE should be fine. However, if you
want more, I'd get another quad GbE card, and then run at least 2 guest
instances per client hardware. Try very hard to have the equivalent of a full
GbE available per VM. Personally, I'd go for client hardware that has 4 GbE
interfaces: 1 each for two VMs, 1 for external internet access, and 1 for
management. I'd then run the X4540 with 8 GbE bonded (trunked/teamed/whatever)
together. This might be overkill, so see what your setup requires in terms of
available bandwidth.

>>>   * How many SSDs should I use for the ZIL and L2ARC?

Being a website mux, your data pattern is likely to be 99% read, with small
random writes being the remaining 1%.

You need just enough high-performance SSD for the ZIL. Honestly, the 32GB
X25-E is larger than you'll likely ever need. I can't recommend anything else
for the money, but the sad truth is that ZFS really only needs 1-2GB of NVRAM
for the ZIL (for most use cases). So get the smallest device you can find that
still satisfies the high-performance requirement. Caveat: look at the archives
for all the talk about protecting your ZIL device from power outages (and the
lack of a capacitor in most modern SSDs).

For L2ARC, go big. Website files tend to be /very/ small, so you're in the
worst use case for dedup. With something like an X4540 and its huge data
capacity, get as much L2ARC SSD space as you can afford. Remember: 250 bytes
per dedup block. If you have 1k blocks for all those little files, well, your
L2ARC needs to be 25% of your data size. *Ouch* Now, you don't have to buy the
super-expensive stuff for L2ARC: the good old Intel X-25M works just fine.
Don't mirror them.

Given the explosive potential size of your DDT, I'd think long and hard about
which data you really want to dedup. Disk is cheap, but SSD isn't. Good news
is that you can selectively decide which data sets to dedup. Ain't ZFS great?

>>>   * What pool structure should I use?

If it were me (and, given what little I know of your data), I'd go like this:

(1) pool for VMs:
        8 disks, MIRRORED
        1 SSD for L2ARC
        one zvol per VM instance, served via iSCSI, each with:
            dedup turned ON, compression turned OFF

(1) pool for clients to write data to (log files, incoming data, etc.)
        6 or 8 disks, MIRRORED
        2 SSDs for ZIL, mirrored
        Ideally, as many filesystems as you have webSITES, not just client
            VMs. As this might be unwieldy for 100s of websites, you should
            segregate them into obvious groupings, taking care with
            write/read permissions.
        NFS served
        dedup OFF, compression ON (or OFF, if you seem to be having CPU
            overload on the X4540)

(1) pool for client read-only data
        All the rest of the disks, split into 7 or 8-disk RAIDZ2 vdevs
        All the remaining SSDs for L2ARC
        As many filesystems as you have webSITES, not just client VMs.
            (however, see above)
        NFS served
        dedup ON for selected websites (filesystems), compression ON for
            everything

(2) Global hot spares.

>>> I know these questions are slightly vague, but any input would be
>>> greatly appreciated.
>>>
>>> Thanks!
>>
>> Which Virtual Machine technology are you going to use?
>> [...]

-- 
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
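The pool layout Erik sketches above maps roughly onto commands like the
following. This is only a sketch: the pool names, zvol sizes, and cXtYdZ
device paths are made-up placeholders (not the X4540's real device
enumeration), per-dataset dedup assumes a build with ZFS dedup support (b128
or later), and the usual COMSTAR target/portal setup (itadm create-target,
target groups) is omitted.

    # (1) mirrored pool for VM images, one SSD as L2ARC
    zpool create vmpool mirror c1t0d0 c2t0d0 mirror c3t0d0 c4t0d0 \
        mirror c5t0d0 c6t0d0 mirror c7t0d0 c8t0d0 cache c9t0d0

    # one zvol per VM: dedup on, compression off, exported over iSCSI
    zfs create -V 40G -o dedup=on -o compression=off vmpool/vm01
    sbdadm create-lu /dev/zvol/rdsk/vmpool/vm01
    stmfadm add-view 600144F0XXXXXXXX          # GUID printed by create-lu

    # (2) mirrored pool for client writes, with a mirrored SSD log
    zpool create writepool mirror c1t1d0 c2t1d0 mirror c3t1d0 c4t1d0 \
        mirror c5t1d0 c6t1d0 log mirror c9t1d0 c9t2d0
    zfs create -o compression=on -o sharenfs=on writepool/logs

    # (3) raidz2 pool for the read-mostly site data, remaining SSDs as cache
    #     (repeat the raidz2 group for the rest of the disks)
    zpool create sitepool \
        raidz2 c1t2d0 c2t2d0 c3t2d0 c4t2d0 c5t2d0 c6t2d0 c7t2d0 c8t2d0 \
        cache c9t3d0 c9t4d0 c9t5d0
    zfs create -o compression=on -o sharenfs=on sitepool/sites
    zfs set dedup=on sitepool/sites            # only where it pays off

    # hot spares (a spare can be added to more than one pool)
    zpool add sitepool spare c8t6d0 c8t7d0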
On Sun, Jun 06, 2010 at 09:16:56PM -0700, Ken wrote:
> I'm looking at VMware ESXi 4, but I'll take any advice offered.
...
> I'm looking to build a virtualized web hosting server environment accessing
> files on a hybrid storage SAN. I was looking at using the Sun X-Fire x4540
> with the following configuration:

IMHO Solaris Zones with LOFS-mounted ZFS filesystems give you the highest
flexibility in all directions, probably the best performance and least
resource consumption, fine-grained resource management (CPU, memory, storage
space), less maintenance stress, etc.

Have fun,
jel.
-- 
Otto-von-Guericke University    http://www.cs.uni-magdeburg.de/
Department of Computer Science  Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany        Tel: +49 391 67 12768
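A bare-bones sketch of the LOFS approach jel. describes, with made-up dataset
and zone names (tank/sites, webzone): one ZFS filesystem on the host,
loopback-mounted into an existing zone.

    # in the global zone: a dataset holding the site files
    zfs create -o compression=on tank/sites

    # loop it back into the zone at /sites
    zonecfg -z webzone
    zonecfg:webzone> add fs
    zonecfg:webzone:fs> set dir=/sites
    zonecfg:webzone:fs> set special=/tank/sites
    zonecfg:webzone:fs> set type=lofs
    zonecfg:webzone:fs> end
    zonecfg:webzone> commit
    zonecfg:webzone> exit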
> Which Virtual Machine technology are you going to use?
>
>   VirtualBox
>   VMware
>   Xen
>   Solaris Zones
>   Something else...
>
> It will make a difference as to my recommendation (or, do you want me to
> recommend a VM type, too?)

This is somewhat off-topic for zfs-discuss, but still. After trying to fight a
bug - http://www.virtualbox.org/ticket/6505 - for months and getting
close-to-zero feedback from the VirtualBox developers, I have abandoned using
VBox on OpenSolaris. It may work fine for a few days, perhaps even weeks, and
then boom. I don't have the equipment to set up a test system, and my server
is located some 50km from home, so I need something that works, not part of
the time, but all the time. Due to this, I'd recommend against VirtualBox on
OpenSolaris.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented
intelligibly. It is an elementary imperative for all pedagogues to avoid
excessive use of idioms of foreign origin. In most cases, adequate and
relevant synonyms exist in Norwegian.
>>>>> "et" == Erik Trimble <erik.trimble at oracle.com> writes:

    et> With NFS-hosted VM disks, do the same thing: create a single
    et> filesystem on the X4540 for each VM.

previous posters pointed out there are unreasonable hard limits in vmware to
the number of NFS mounts or iSCSI connections or something, so you will
probably run into that snag when attempting to use the much faster
snapshotting/cloning in ZFS.

    >>> * Are the FSYNC speed issues with NFS resolved?

    et> The ZIL SSDs will compensate for synchronous write issues in NFS.

okay, but sometimes for VMs I think this often doesn't matter because NFSv3
and v4 only add fsync()s on file closings, and a virtual disk is one giant
file that the client never closes. There may still be synchronous writes
coming through if they don't get blocked in LVM2 inside the guest or blocked
in the VM software, but whatever comes through ought to be exactly the same
number of them for NFS or iSCSI, unless the vm software has different bugs in
the nfs vs iscsi back-ends.

the other difference is in the latest comstar which runs in sync-everything
mode by default, AIUI. Or it does use that mode only when zvol-backed? Or
something. I've the impression it went through many rounds of quiet changes,
both in comstar and in zvols, on its way to its present form. I've heard said
here you can change the mode both from the comstar host and on the remote
initiator, but I don't know how to do it or how sticky the change is, but if
you didn't change and stuck with the default sync-everything I think NFS would
be a lot faster. This is if we are comparing one giant .vmdk or similar on
NFS, against one zvol. If we are comparing an exploded filesystem on NFS
mounted through the virtual network adapter, then of course you're right again
Erik.

The tradeoff integrity tests are, (1) reboot the solaris storage host without
rebooting the vmware hosts & guests and see what happens, (2) cord-yank the
vmware host. Both of these are probably more dangerous than (3) command the vm
software to virtual-cord-yank the guest.

    >>> * Should I go with Fibre Channel, or will the 4 built-in 1GbE
    >>> NICs give me enough speed?

FC has different QoS properties than Ethernet because of the buffer credit
mechanism---it can exert back-pressure all the way through the fabric. same
with IB, which is HOL-blocking. This is a big deal with storage, with its
large blocks of bursty writes that aren't really the case for which TCP
shines. I would try both and compare, if you can afford it!

    je> IMHO Solaris Zones with LOFS mounted ZFSs gives you the highest
    je> flexibility in all directions, probably the best performance and
    je> least resource consumption, fine grained resource management (CPU,
    je> memory, storage space) and less maintenance stress etc...

yeah zones are really awesome, especially combined with clones and snapshots.
For once the clunky post-Unix XML crappo solaris interfaces are actually
something I appreciate a little, because lots of their value comes from being
able to do consistent repeatable operations on them. The problem is that the
zones run Solaris instead of Linux. BrandZ never got far enough to, for
example, run Apache under a 2.6-kernel-based distribution, so I don't find it
useful for any real work. I do keep a CentOS 3.8 (I think?) brandz zone
around, but not for anything production---just so I can try it if I think the
new/weird version of a tool might be broken.

as for native zones, the ipkg repository, and even the jucr repository, has
two-year-old versions of everything---django/python, gcc, movabletype. Many
things are missing outright, like nginx. I'm very disappointed that Solaris
did not adopt an upstream package system like Dragonfly did. Gentoo or pkgsrc
would have been very smart, IMHO. Even opencsw is based on Nick Moffitt's GAR
system, which was an old mostly-abandoned tool for building bleeding edge
Gnome on Linux. The ancient perpetually-abandoned set of packages on jucr and
the crufty poorly-factored RPM-like spec files leave me with little interest
in contributing to jucr myself, while if Solaris had poured the effort instead
into one of these already-portable package systems like they poured it into
Mercurial after adopting that, then I'd instead look into (a) contributing
packages that I need most, and (b) using whatever system Solaris picked on my
non-Solaris systems. This crap/marginalized build system means I need to look
at a way to host Linux under Solaris, using Solaris basically just for ZFS and
nothing else. The alternative is to spend heaps of time re-inventing the wheel
only to end up with an environment less rich than competitors and charge twice
as much for it like joyent.

But, yeah, while working on Solaris I would never install anything in the
global zone after discovering how easy it is to work with ipkg zones. They are
really brilliant, and unlike everyone else's attempt at these superchroots
like freebsd jails/johncompanies.com I feel like zones are basically finished.

however... because of:

  http://mail.opensolaris.org/pipermail/zfs-discuss/2009-October/032878.html

I wonder if it might be better to mount ZFS datasets directly in the zones,
not lofs mount them. It's easy to do this. Short version is:

 1. create dataset outside the zone with mountpoint=none
 2. add dataset to the zone with zonecfg
 3. set the dataset's mountpoint from a shell inside the zone

Long version below.

postgres cheatsheet:
-----8<-----
http://blogs.sun.com/jkshah/entry/opensolaris_2008_11_and_postgresql

need to make a dataset outside the zbe for postgres data so it'll escape beadm
snapshotting/cloning once that's working within zones for image-update.
setting mountpoints for zoned datasets is weird, though:

http://mail.opensolaris.org/pipermail/zones-discuss/2009-January/004661.html

outside the zone:

 zfs list -r tub/export/zone
 NAME                                USED  AVAIL  REFER  MOUNTPOINT
 tub/export/zone                    27.1G   335G  40.3K  /export/zone
 tub/export/zone/awabagal            917M   335G  37.4K  /export/zone/awabagal
 tub/export/zone/awabagal/ROOT       917M   335G  31.4K  legacy
 tub/export/zone/awabagal/ROOT/zbe   917M   335G  2.72G  legacy

 zfs create -o mountpoint=none tub/export/zone/awabagal/postgres-data

 zonecfg -z awabagal
 zonecfg:awabagal> add dataset
 zonecfg:awabagal:dataset> set name=tub/export/zone/awabagal/postgres-data
 zonecfg:awabagal:dataset> end
 zonecfg:awabagal> commit
 zonecfg:awabagal> exit

inside the zone:

 zfs list
 NAME                                     USED  AVAIL  REFER  MOUNTPOINT
 tub                                     1.33T   335G   498K  /tub
 tub/export                               295G   335G  63.2M  /export
 tub/export/zone                         27.1G   335G  40.3K  /export/zone
 tub/export/zone/awabagal                 919M   335G  37.4K  /export/zone/awabagal
 tub/export/zone/awabagal/ROOT            919M   335G  31.4K  legacy
 tub/export/zone/awabagal/ROOT/zbe        919M   335G  2.73G  legacy
 tub/export/zone/awabagal/postgres-data  31.4K   335G  31.4K  none

 zfs set mountpoint=/var/postgres tub/export/zone/awabagal/postgres-data

the /var/postgres directory is magical and hardcoded into the package.

the rest, you do inside the zone:

 pkg install SUNWpostgr-83-server SUNWpostgr-83-client SUNWpostgr-jdbc \
   SUNWpostgr-83-contrib SUNWpostgr-83-docs SUNWpostgr-83-devel \
   SUNWpostgr-83-tcl SUNWpostgr-83-pl SUNWpgadmin3

 svccfg import /var/svc/manifest/application/database/postgresql_83.xml
 svcadm enable postgresql_83:default_64bit

add /usr/postgres/8.3/bin to {,SU}PATH in /etc/default/{login,su}
-----8<-----
On Mon, 7 Jun 2010, Miles Nordin wrote:
> FC has different QoS properties than Ethernet because of the buffer
> credit mechanism---it can exert back-pressure all the way through the
> fabric. same with IB, which is HOL-blocking. This is a big deal with
> storage, with its large blocks of bursty writes that aren't really the
> case for which TCP shines. I would try both and compare, if you can
> afford it!

FCoE is beginning to change this, with ethernet adaptors and switches which
support the new features. Without the new FCoE standards, Ethernet can exert
back pressure but only on a local-link level, and with long delays. You can be
sure that companies like Cisco will be (or are) selling FCoE hardware to
compete with FC SANs. The intention is that ethernet will put fibre channel
out of business. We shall see if history repeats itself.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Jun 7, 2010, at 11:06 AM, Miles Nordin wrote:
> the other difference is in the latest comstar which runs in
> sync-everything mode by default, AIUI. Or it does use that mode only
> when zvol-backed? Or something.

It depends on your definition of "latest." The latest OpenSolaris release is
2009.06, which treats all zvol-backed COMSTAR iSCSI writes as sync. This was
changed in the developer releases in summer 2009, b114. For a release such as
NexentaStor 3.0.2, which is based on b140 (+/-), the initiator's write cache
enable/disable request is respected, by default.

>>>> * Should I go with Fibre Channel, or will the 4 built-in 1GbE
>>>> NICs give me enough speed?
>
> FC has different QoS properties than Ethernet because of the buffer
> credit mechanism---it can exert back-pressure all the way through the
> fabric. same with IB, which is HOL-blocking. This is a big deal with
> storage, with its large blocks of bursty writes that aren't really the
> case for which TCP shines.

Please don't confuse Ethernet with IP. Ethernet has no routing and no back-off
other than that required for the link. Since GbE and higher speeds are all
implemented as switched fabrics, the ability of the switch to manage
contention is paramount. You can observe this on a Solaris system by looking
at the NIC flow control kstats.

For a LAN environment, there is little practical difference between Ethernet
and FC wrt port contention -- high quality switches will prove better than
bargain-basement switches, with direct attach (no switches) being the optimum
cost+performance solution. WANs are a different beast, and are where we find
tuning the FC buffer credits to be worth the effort. For WANs no tuning is
required for IP on modern OSes (Ethernet doesn't do WAN).

> I would try both and compare, if you can afford it!

+1
 -- richard

-- 
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/
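If you want to look at those flow-control counters yourself, the exact kstat
names vary by NIC driver, so the simple-minded way is just to grep for them:

    # dump all kstats and keep anything that looks like a pause/flow counter
    kstat -p | egrep -i 'pause|flowctrl'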
On Mon, 2010-06-07 at 13:32 -0700, Richard Elling wrote:
> On Jun 7, 2010, at 11:06 AM, Miles Nordin wrote:
>> the other difference is in the latest comstar which runs in
>> sync-everything mode by default, AIUI. Or it does use that mode only
>> when zvol-backed? Or something.
>
> It depends on your definition of "latest." The latest OpenSolaris release
> is 2009.06 which treats all Zvol-backed COMSTAR iSCSI writes as
> sync. This was changed in the developer releases in summer 2009, b114.
> For a release such as NexentaStor 3.0.2, which is based on b140 (+/-),
> the initiator's write cache enable/disable request is respected, by default.

Minor correction: NexentaStor 3.0.2 is based on 134, plus a "backport" of a
number of selected patches from OpenSolaris -- especially ZFS patches.

 -- Garrett
On Jun 7, 2010, at 2:10 AM, Erik Trimble <erik.trimble at oracle.com> wrote:
> Comments in-line.
>
> On 6/6/2010 9:16 PM, Ken wrote:
>> I'm looking at VMware ESXi 4, but I'll take any advice offered.
>> [...]
>>> Should I use NFS with all five VMs accessing the exports, or one LUN
>>> for each VM, accessed over iSCSI?
>
> Generally speaking, it depends on your comfort level with running iSCSI
> volumes to put the VMs in, or serving everything out via NFS (hosting
> the VM disk file in an NFS filesystem).
>
> If you go the iSCSI route, I would definitely go the "one iSCSI volume
> per VM" route - note that you can create multiple zvols per zpool on
> the X4540, so it's not limiting in any way to volume-ize a VM. It's a
> lot simpler, easier, and allows for nicer management
> (snapshots/cloning/etc. on the X4540 side) if you go with a VM per
> iSCSI volume.
>
> With NFS-hosted VM disks, do the same thing: create a single filesystem
> on the X4540 for each VM.

VMware has a 32 mount limit which may limit the OP somewhat here.

> Performance-wise, I'd have to test, but I /think/ the iSCSI route will
> be faster. Even with the ZIL SSDs.

Actually, properly tuned they are about the same, but VMware NFS datastores
are FSYNC on all operations, which isn't the best for data vmdk files; best to
serve the data directly to the VM using either iSCSI or NFS.

>>> Are the FSYNC speed issues with NFS resolved?
>
> The ZIL SSDs will compensate for synchronous write issues in NFS.
> Not completely eliminate them, but you shouldn't notice issues with
> sync writing until you're up at pretty heavy loads.

You will need this with VMware, as every NFS operation (not just file
open/close) coming out of VMware will be marked FSYNC (for VM data integrity
in the face of server failure).

> If it were me (and, given what little I know of your data), I'd go like
> this:
>
> (1) pool for VMs:
>         8 disks, MIRRORED
>         1 SSD for L2ARC
>         one zvol per VM instance, served via iSCSI, each with:
>             dedup turned ON, compression turned OFF
>
> (1) pool for clients to write data to (log files, incoming data, etc.)
>         6 or 8 disks, MIRRORED
>         2 SSDs for ZIL, mirrored
>         [...]
>
> (1) pool for client read-only data
>         All the rest of the disks, split into 7 or 8-disk RAIDZ2 vdevs
>         All the remaining SSDs for L2ARC
>         [...]
>
> (2) Global hot spares.

Make your life easy and use NFS for VMs and data. If you need high-performance
data such as databases, use iSCSI zvols directly into the VM, otherwise
NFS/CIFS into the VM should be good enough.

-Ross
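As a rough illustration of the split Ross suggests (the dataset names and the
192.168.10.0/24 storage subnet are invented for the example):

    # NFS datastore for the VM images, restricted to the ESXi storage subnet
    zfs create tank/esx-datastore
    zfs set sharenfs='rw=@192.168.10.0/24,root=@192.168.10.0/24' tank/esx-datastore

    # a zvol handed straight to one guest over iSCSI for its database
    zfs create -V 100G tank/db-vm01
    sbdadm create-lu /dev/zvol/rdsk/tank/db-vm01
    stmfadm add-view 600144F0YYYYYYYY          # GUID reported by create-lu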
Everyone, thank you for the comments, you've given me lots of great info to
research further.

On Mon, Jun 7, 2010 at 15:57, Ross Walker <rswwalker at gmail.com> wrote:
> [...]
> Make your life easy and use NFS for VMs and data. If you need
> high-performance data such as databases, use iSCSI zvols directly into
> the VM, otherwise NFS/CIFS into the VM should be good enough.
>
> -Ross
On Jun 7, 2010, at 16:32, Richard Elling wrote:
> Please don't confuse Ethernet with IP. Ethernet has no routing and
> no back-off other than that required for the link.

Not entirely accurate going forward. IEEE 802.1Qau defines an end-to-end
congestion notification management system:

  http://blogs.netapp.com/ethernet/8021qau/

IEEE 802.1aq provides for a link state protocol for finding the topology of an
Ethernet network:

  http://en.wikipedia.org/wiki/Shortest_Path_Bridging

See also the IETF's Transparent Interconnection of Lots of Links (TRILL):

  http://tools.ietf.org/html/rfc5556
  http://tools.ietf.org/wg/trill/

All of this is being done under the rubric of "data center bridging" (DCB):

  http://en.wikipedia.org/wiki/Data_center_bridging

Brocade and IBM (?) call this Converged Enhanced Ethernet (CEE).

Things aren't what they used to was.
>>>>> "re" == Richard Elling <richard.elling at gmail.com> writes:

    re> Please don't confuse Ethernet with IP.

okay, but I'm not. seriously, if you'll look into it.

Did you misread where I said FC can exert back-pressure? I was contrasting
with Ethernet. Ethernet output queues are either FIFO or RED, and are large
compared to FC and IB. FC is buffer-credit, which HOL-blocks to prevent the
small buffers from overflowing, and IB is...blocking (almost no buffer at
all---about 2KB per port and bandwidth*delay product of about 1KB for the
whole mesh, compared to ARISTA which has about 48MB per port, so except to the
pedantic, IB is bufferless, ie it does not even buffer one full frame). Unlike
Ethernet, both are lossless fabrics (sounds good) and have an HOL-blocking
character (sounds bad). They're fundamentally different at L2, so this is not
about IP. If you run IP over IB, it is still blocking and lossless. It does
not magically start buffering when you use IP because the fabric is simply
unable to buffer---there is no RAM in the mesh anywhere.

Both L2 and L3 switches have output queues, and both L3 and L2 output queues
can be FIFO or RED, because the output buffer exists in the same piece of
silicon of an L3 switch no matter whether it's set to forward in L2 or L3
mode, so L2 and L3 switches are like each other and unlike FC & IB. This is
not about IP. It's about Ethernet.

a relevant congestion difference between L3 and L2 switches (confusing
ethernet with IP) might be ECN, because only an L3 switch can do ECN. But I
don't think anyone actually uses ECN. It's disabled by default in Solaris and,
I think, all other Unixes. AFAICT my Extreme switches, a very old L3
flow-forwarding platform, are not able to flip the bit. I think the 6500 can,
but I'm not certain.

    re> no back-off other than that required for the link. Since
    re> GbE and higher speeds are all implemented as switched fabrics,
    re> the ability of the switch to manage contention is paramount.
    re> You can observe this on a Solaris system by looking at the NIC
    re> flow control kstats.

You're really confused, though I'm sure you're going to deny it. Ethernet flow
control mostly isn't used at all, and it is never used to manage output queue
congestion except in hardware that everyone agrees is defective. I almost feel
like I've written all this stuff already, even the part about ECN.

Ethernet flow control is never correctly used to signal output queue
congestion. The ethernet signal for congestion is a dropped packet. flow
control / PAUSE frames are *not* part of some magic mesh-wide mechanism by
which switches ``manage'' congestion. PAUSE are used, when they're used at
all, for oversubscribed backplanes: for congestion on *input*, which in
Ethernet is something you want to avoid. You want to switch ethernet frames to
the output port where they may or may not encounter congestion so that you
don't hold up input frames headed toward other output ports. If you did hold
them up, you'd have something like HOL blocking. IB takes a different
approach: you simply accept the HOL blocking, but tend to design a mesh with
little or no oversubscription, unlike ethernet LANs which are heavily
oversubscribed on their trunk ports. so...the HOL blocking happens, but not as
much as it would with a typical Ethernet topology, and it happens in a way
that in practice probably increases the performance of storage networks.

This is interesting for storage because when you try to shove a 128kByte write
into an Ethernet fabric, part of it may get dropped in an output queue
somewhere along the way. In IB, never will part of the write get dropped, but
sometimes you can't shove it into the network---it just won't go, at L2. With
Ethernet you rely on TCP to emulate this can't-shove-in condition, and it does
not work perfectly, in that it can introduce huge jitter and link underuse
(the ``incast'' problem: http://www.pdl.cmu.edu/PDL-FTP/Storage/FASTIncast.pdf ),
and secondly leave many kilobytes in transit within the mesh or TCP buffers,
like tens of megabytes and milliseconds per hop, requiring large TCP buffers
on both ends to match the bandwidth*jitter and frustrating storage QoS by
queueing commands on the link instead of in the storage device, but in
exchange you get from Ethernet no HOL blocking and the possibility of
end-to-end network QoS. It is a fair tradeoff but arguably the wrong one for
storage, based on experience with iSCSI sucking so far.

But the point is, looking at those ``flow control'' kstats will only warn you
if your switches are shit, and shit in one particular way that even cheap
switches rarely are. The metric that's relevant is how many packets are being
dropped, and in what pattern (a big bucket of them at once like FIFO, or a
scattering like RED), and how TCP is adapting to these drops. For this you
might look at TCP stats in solaris, or at output queue drop and output queue
size stats on managed switches, or simply at the overall bandwidth, the
``goodput'' in the incast paper. The flow control kstats will never be
activated by normal congestion, unless you have some $20 gamer switch that is
misdesigned:

  http://www.networkworld.com/netresources/0913flow2.html
  http://www.smallnetbuilder.com/content/view/30212/54/
  http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html

I said PAUSE frames are mostly never used, but Cisco's Nexus FCoE supposedly
does send pause frames within a CoS when it has a link partner who wants to
play its Cisco-FCoE game, so the PAUSE apply to that CoS and not to the whole
link, and these have a completely different purpose unrelated to the original
pause frames. I'm speculating from limited information because I'm not
interested in Nexus and have not read much about it, much less have any. Cisco
has a lot of slick talk about them that makes it sound like you're getting the
best of every buzzword, but AIUI the point is to create a lossless low-jitter
HOL-blocking VLAN for storage only, so that storage traffic can be transmitted
without eating huge amounts of switch output buffer and without provoking
TCP{,-like protocols} with congestion-signal packet drops, while at the same
time running other non-storage vlans in lossful, non-HOL-blocking mode where
nothing blocks on input and the fabric signals congestion by dropping packets
from output queues and color-marking diffserv-style QoS is possible, like most
TCP app developers are accustomed to.

I know some FCoE stuff got checked into Solaris, but I don't think FCoE
support necessarily implies Nexus CoS-PAUSE support, so I don't know if
Solaris even supports this type of weird pause frame. I do think it would need
to support these frames for FCoE to work well, because otherwise you just push
the incast problem out to the edge, to the first switch facing the packet
source. Anyway FCoE's not on the table for any of this discussion so far. I
only mention it so you won't try to make my whole post sound wrong by
mentioning some pedantic nit-picky detail.

    re> The latest OpenSolaris release is 2009.06 which treats all
    re> Zvol-backed COMSTAR iSCSI writes as sync. This was changed in
    re> the developer releases in summer 2009, b114. For a release
    re> such as NexentaStor 3.0.2, which is based on b140 (+/-), the
    re> initiator's write cache enable/disable request is respected,
    re> by default.

that helps a little, but it's far from a full enough picture to be useful to
anyone IMHO. In fact it's pretty close to ``it varies and is confusing'' which
I already knew:

 * how do I control the write cache from the initiator? though I think I
   already know the answer: ``it depends on which initiator,'' and ``oh,
   you're using that one? well i don't know how to do it with THAT
   initiator'' == YOU DON'T

 * when the setting has been controlled, how long does it persist? Where can
   it be inspected?

 * ``by default'' == there is a way to make it not respect the initiator's
   setting, and through a target shell command cause it to use one setting or
   the other, persistently?

 * is the behavior different for file-backed LUNs than zvols?

I guess there is less point to figuring this out until the behavior is
settled.
On Tue, 8 Jun 2010, Miles Nordin wrote:
>>>>>> "re" == Richard Elling <richard.elling at gmail.com> writes:
>
>     re> Please don't confuse Ethernet with IP.
>
> okay, but I'm not. seriously, if you'll look into it.
>
> Did you misread where I said FC can exert back-pressure? I was
> contrasting with Ethernet.
>
> You're really confused, though I'm sure you're going to deny it.

I don't think so. I think that it is time to reset and reboot yourself on the
technology curve. FC semantics have been ported onto ethernet. This is not
your grandmother's ethernet but it is capable of supporting both FCoE and
normal IP traffic. The FCoE gets per-stream QOS similar to what you are used
to from Fibre Channel. Quite naturally, you get to pay a lot more for the new
equipment and you have the opportunity to discard the equipment you bought
already.

Richard is not out in the weeds although there are probably plenty of weeds
growing at the ranch.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On 6/8/2010 6:33 PM, Bob Friesenhahn wrote:
> On Tue, 8 Jun 2010, Miles Nordin wrote:
>> Did you misread where I said FC can exert back-pressure? I was
>> contrasting with Ethernet.
>>
>> You're really confused, though I'm sure you're going to deny it.
>
> I don't think so. I think that it is time to reset and reboot
> yourself on the technology curve. FC semantics have been ported onto
> ethernet. This is not your grandmother's ethernet but it is capable
> of supporting both FCoE and normal IP traffic. The FCoE gets
> per-stream QOS similar to what you are used to from Fibre Channel.
> Quite naturally, you get to pay a lot more for the new equipment and
> you have the opportunity to discard the equipment you bought already.
>
> Richard is not out in the weeds although there are probably plenty of
> weeds growing at the ranch.
>
> Bob

Well, are you saying we might want to put certain folks out to pasture? <wink>

That said, I had a good look at FCoE about a year ago, and, unlike ATAoE,
which effectively ran over standard managed or smart switches, FCoE required
specialized switch hardware that was non-trivially expensive. That said, it
did seem to be a mature protocol implementation, so it was a viable option
once the hardware price came down (and we had wider, better software
implementations).

Also, FCoE really doesn't seem to play well with regular IP on the same link,
so you really should dedicate a link (not necessarily a switch) to FCoE, and
pipe your IP traffic via another link. It is NOT iSCSI.

-- 
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
On Tue, Jun 08, 2010 at 08:33:40PM -0500, Bob Friesenhahn wrote:
> On Tue, 8 Jun 2010, Miles Nordin wrote:
>> Did you misread where I said FC can exert back-pressure? I was
>> contrasting with Ethernet.
>>
>> You're really confused, though I'm sure you're going to deny it.
>
> I don't think so. I think that it is time to reset and reboot yourself
> on the technology curve. FC semantics have been ported onto ethernet.
> This is not your grandmother's ethernet but it is capable of supporting
> both FCoE and normal IP traffic. The FCoE gets per-stream QOS similar to
> what you are used to from Fibre Channel. Quite naturally, you get to pay
> a lot more for the new equipment and you have the opportunity to discard
> the equipment you bought already.

Yeah, today enterprise iSCSI vendors like Equallogic (bought by Dell)
_recommend_ using flow control. Their iSCSI storage arrays are designed to
work properly with flow control and perform well. Of course you need proper
("certified") switches as well.

Equallogic says the delays from flow control pause frames are shorter than TCP
retransmits, so that's why they're using and recommending it.

-- Pasi
>>>>> "pk" == Pasi Kärkkäinen <pasik at iki.fi> writes:

    >>> You're really confused, though I'm sure you're going to deny it.

    >> I don't think so. I think that it is time to reset and reboot
    >> yourself on the technology curve. FC semantics have been ported
    >> onto ethernet. This is not your grandmother's ethernet but it is
    >> capable of supporting both FCoE and normal IP traffic. The FCoE
    >> gets per-stream QOS similar to what you are used to from Fibre
    >> Channel.

FCoE != iSCSI.

FCoE was not being discussed in the part you're trying to contradict. If you
read my entire post, I talk about FCoE at the end and say more or less ``I am
talking about FCoE here only so you don't try to throw out my entire post by
latching onto some corner case not applying to the OP by dragging FCoE into
the mix'' which is exactly what you did. I'm guessing you fired off a reply
without reading the whole thing?

    pk> Yeah, today enterprise iSCSI vendors like Equallogic (bought by
    pk> Dell) _recommend_ using flow control. Their iSCSI storage arrays
    pk> are designed to work properly with flow control and perform well.

    pk> Of course you need proper ("certified") switches as well.

    pk> Equallogic says the delays from flow control pause frames are
    pk> shorter than TCP retransmits, so that's why they're using and
    pk> recommending it.

please have a look at the three links I posted about flow control not being
used the way you think it is by any serious switch vendor, and the explanation
of why this limitation is fundamental, not something that can be overcome by
``technology curve.'' It will not hurt anything to allow autonegotiation of
flow control on non-broken switches, so I'm not surprised they recommend it
with ``certified'' known-non-broken switches, but it also will not help unless
your switches have input/backplane congestion, which they usually don't, or
your end host is able to generate PAUSE frames for PCIe congestion, which is
maybe more plausible. In particular it won't help with the typical case of the
``incast'' problem in the experiment in the FAST incast paper URL I gave,
because they narrowed down what was happening in their experiment to OUTPUT
queue congestion, which (***MODULO FCoE*** mr ``reboot yourself on the
technology curve'') never invokes ethernet flow control.

HTH.

ok let me try again:

yes, I agree it would not be stupid to run iSCSI+TCP over a CoS with blocking
storage-friendly buffer semantics if your FCoE/CEE switches can manage that,
but I would like to hear of someone actually DOING it before we drag it into
the discussion. I don't think that's happening in the wild so far, and it's
definitely not the application for which these products have been flogged.

I know people run iSCSI over IB (possibly with RDMA for moving the bulk data
rather than TCP), and I know people run SCSI over FC, and of course SCSI (not
iSCSI) over FCoE. Remember the original assertion was: please try FC as well
as iSCSI if you can afford it.

Are you guys really saying you believe people are running ***iSCSI*** over the
separate HOL-blocking hop-by-hop pause frame CoS's of FCoE meshes? or are you
just spewing a bunch of noxious white paper vapours at me? because AIUI people
using the lossless/small-output-buffer channel of FCoE are running the FC
protocol over that ``virtual channel'' of the mesh, not iSCSI, are they not?
On Fri, Jun 11, 2010 at 03:30:26PM -0400, Miles Nordin wrote:
> Are you guys really saying you believe people are running ***iSCSI***
> over the separate HOL-blocking hop-by-hop pause frame CoS's of FCoE
> meshes? or are you just spewing a bunch of noxious white paper vapours
> at me? because AIUI people using the lossless/small-output-buffer
> channel of FCoE are running the FC protocol over that ``virtual
> channel'' of the mesh, not iSCSI, are they not?

I was talking about iSCSI over TCP over IP over Ethernet. No FCoE. No IB.

-- Pasi
On Fri, 11 Jun 2010, Miles Nordin wrote:
> FCoE != iSCSI.
>
> FCoE was not being discussed in the part you're trying to contradict.
> If you read my entire post, I talk about FCoE at the end and say more
> or less ``I am talking about FCoE here only so you don't try to throw
> out my entire post by latching onto some corner case not applying to
> the OP by dragging FCoE into the mix'' which is exactly what you did.
> I'm guessing you fired off a reply without reading the whole thing?

I am deeply concerned that you are relying on your extensive experience with
legacy ethernet technologies and have not done any research on modern
technologies. Entering "FCoE" into Google resulted in many useful hits which
describe technologies which are ethernet but more advanced than the "ethernet"
you generalized in your lengthy text. For example:

  http://www.fcoe.com/
  http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9670/white_paper_c11-462176.html
  http://www.brocade.com/products-solutions/solutions/connectivity/FCoE/index.page
  http://www.emulex.com/products/converged-network-adapters.html

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Jun 8, 2010, at 12:46 PM, Miles Nordin wrote:
>>>>>> "re" == Richard Elling <richard.elling at gmail.com> writes:
>
>     re> Please don't confuse Ethernet with IP.
>
> okay, but I'm not. seriously, if you'll look into it.

[fine whine elided]

I think we can agree that the perfect network has yet to be invented :-)
Meanwhile, 6Gbps SAS switches are starting to hit the market... what fun :-)

>     re> The latest OpenSolaris release is 2009.06 which treats all
>     re> Zvol-backed COMSTAR iSCSI writes as sync. This was changed in
>     re> the developer releases in summer 2009, b114. For a release
>     re> such as NexentaStor 3.0.2, which is based on b140 (+/-), the
>     re> initiator's write cache enable/disable request is respected,
>     re> by default.
>
> that helps a little, but it's far from a full enough picture to be
> useful to anyone IMHO. In fact it's pretty close to ``it varies and
> is confusing'' which I already knew:
>
>  * how do I control the write cache from the initiator? though I
>    think I already know the answer: ``it depends on which initiator,''
>    and ``oh, you're using that one? well i don't know how to do it
>    with THAT initiator'' == YOU DON'T

For ZFS over a Solaris initiator, it is done by setting DKIOCSETWCE via an
ioctl. Look on or near
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/vdev_disk.c#276

I presume that this can also be set with format -e, as is done for other
devices. Has anyone else tried?

>  * when the setting has been controlled, how long does it persist?
>    Where can it be inspected?

RTFM stmfadm(1m) and look for "wcd"

<small_rant> drives me nuts that some people prefer negatives (disables) over
positives (enables) </small_rant>

>  * ``by default'' == there is a way to make it not respect the
>    initiator's setting, and through a target shell command cause it to
>    use one setting or the other, persistently?

See above.

>  * is the behavior different for file-backed LUNs than zvols?

Yes, it can be. It can also be modified by the sync property. See CR 6794730,
need zvol support for DKIOCSETWCE and friends
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6794730

> I guess there is less point to figuring this out until the behavior is
> settled.

I think it is settled, but perhaps not well documented :-(
 -- richard

-- 
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/
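For anyone hunting for the knobs Richard points at, they live roughly here.
The LU GUID below is a placeholder, and "zfs set sync=" needs a build recent
enough to have the sync property:

    # target side: per-LU "write cache disable" (wcd) flag, see stmfadm(1M)
    stmfadm list-lu -v                               # shows the writeback-cache state per LU
    stmfadm modify-lu -p wcd=true 600144F0ZZZZZZZZ   # turn the write cache off for one LU

    # dataset side: force (or relax) synchronous semantics regardless of the initiator
    zfs set sync=always tank/vm01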