Hi,

I am running Solaris 10 with ZFS and I do not have STMS multipathing enabled. I have dual FC connections to storage using two ports on an Emulex HBA.

The Solaris ZFS admin guide says that ZFS tracks disks by both their path and their device ID, so if a disk is switched between controllers, ZFS can pick up the disk on a secondary controller. I tested this by creating a zpool on the first controller and then pulling the cable on the back of the server. The server took about 3-5 minutes to fail over, but it did fail over!

My question is: can ZFS be configured to detect path changes more quickly? I would like failover to happen in a reasonable amount of time, say 1-2 seconds rather than several minutes.

Thanks,

ljs
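For reference, a minimal sketch of the kind of test described above; the device name is the same placeholder used later in the thread and the exact output of the status check will vary:

    # create a pool on a LUN reached through the first controller path
    zpool create testpool c6t<num>d0

    # pull the cable on that port, then generate I/O and watch the pool state
    zpool status testpool
    iostat -xn 5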
Jason J. W. Williams
2006-Dec-06 17:23 UTC
[zfs-discuss] ZFS failover without multipathing
Hi Luke,

We've been using MPXIO (STMS) with ZFS quite solidly for the past few months. Failover is instantaneous when a write operation occurs after a path is pulled. Our environment is similar to yours: dual FC ports on the host and 4 FC ports on the storage (2 per controller). Depending on your gear, enabling MPXIO is ridiculously simple. For us it was as simple as enabling it on our T2000; the Opteron boxes just came up with it.

Best Regards,
Jason

On 12/6/06, Luke Schwab <lukeschwab at yahoo.com> wrote:
> My question is: can ZFS be configured to detect path changes more
> quickly?
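A hedged sketch of what "enabling it" usually involves on Solaris 10; exact steps depend on the HBA and release, and a reconfigure reboot is required either way:

    # enable STMS/MPXIO for fibre channel devices and update device paths
    stmsboot -e

    # or set mpxio-disable="no" in /kernel/drv/fp.conf and then
    # reconfigure-reboot:
    reboot -- -r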
On 12/6/06, Jason J. W. Williams <jasonjwwilliams at gmail.com> wrote:
> We've been using MPXIO (STMS) with ZFS quite solidly for the past few
> months. Failover is instantaneous when a write operation occurs after
> a path is pulled.

Jason,

Could you tell me more about your configuration? Do you have multiple LUNs defined? Do you mirror/raidz these LUNs?

-Doug
Jason J. W. Williams
2006-Dec-06 17:34 UTC
[zfs-discuss] ZFS failover without multipathing
Hi Doug,

The configuration is a T2000 connected to a StorageTek FLX210 array via QLogic QLA2342 HBAs and Brocade 3850 switches. We currently RAID-Z the LUNs across 3 array volume groups. For performance reasons we're in the process of changing to striped zpools across RAID-1 volume groups. The performance issue is more a reflection on the array than on ZFS, though RAID-Z tends to be more chatty IOPS-wise than typical RAID-5.

Overall, we've been VERY happy with ZFS. The scrub feature has saved a lot of time tracking down a corruption issue that cropped up in one of our databases; it helped prove the problem wasn't ZFS or the storage.

Does this help?

Best Regards,
Jason

On 12/6/06, Douglas Denny <douglasdenny at gmail.com> wrote:
> Could you tell me more about your configuration? Do you have multiple
> LUNs defined? Do you mirror/raidz these LUNs?
On 12/6/06, Jason J. W. Williams <jasonjwwilliams at gmail.com> wrote:
> The configuration is a T2000 connected to a StorageTek FLX210 array
> via QLogic QLA2342 HBAs and Brocade 3850 switches.

Thanks Jason,

Yes, this does help. I think you are doing all RAID through ZFS, and the disk array is being used as an FC JBOD.

Thanks!

-Doug
Jason J. W. Williams
2006-Dec-06 18:11 UTC
[zfs-discuss] ZFS failover without multipathing
Hi Doug,

Actually, our config is: 3 RAID-5 volume groups on the array (with 2 LUNs each), and 1 RAID-Z zpool per physical host composed of one LUN from each of the 3 volume groups. This allows for loss of 3 drives in a worst case, and 4 drives in a best case.

-J

On 12/6/06, Douglas Denny <douglasdenny at gmail.com> wrote:
> Yes, this does help. I think you are doing all RAID through ZFS, and
> the disk array is being used as an FC JBOD.
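To visualize that layout, a hedged sketch of the pool creation; the pool name and LUN placeholders are hypothetical, and the volume-group-to-LUN mapping itself is done on the array, not in ZFS:

    # one RAID-Z top-level vdev built from one LUN out of each of the
    # 3 array volume groups
    zpool create dbpool raidz c6t<lun1>d0 c6t<lun2>d0 c6t<lun3>d0

    # the striped-over-RAID-1 alternative mentioned earlier would simply
    # omit "raidz", leaving a plain stripe across the array mirrors
    zpool create dbpool c6t<lun4>d0 c6t<lun5>d0 c6t<lun6>d0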
Jason J. W. Williams
2006-Dec-06 18:55 UTC
[zfs-discuss] ZFS failover without multipathing
Hi Luke,

That's really strange. We did the exact same thing moving between two hosts (export/import) and it took maybe 10 seconds. How big is your zpool?

Best Regards,
Jason

On 12/6/06, Luke Schwab <lukeschwab at yahoo.com> wrote:
> Doug,
>
> I should have posted the reason behind this posting.
>
> I have 2 v280s in a clustered environment and I am going to attempt to
> fail over (migrate) the zpool between the machines. I tried the
> following two configurations:
>
> 1. I used ZFS with STMS (mpxio) enabled. I then exported a zpool and
> imported it onto another machine. The second machine took 6 minutes to
> import the zpool. (Maybe I am configuring something wrong??) Do you
> use exports/imports?
>
> 2. In a second configuration I disabled STMS (mpxio), then exported a
> zpool and imported it onto the other machine again. The second machine
> then only took 50 seconds to import the zpool.
>
> When dealing with clusters, we have a 5-minute failover requirement
> for the entire cluster to move. Therefore, it would be ideal not to
> have STMS (mpxio) enabled on the machines.
>
> Luke Schwab
>
> --- "Jason J. W. Williams" <jasonjwwilliams at gmail.com> wrote:
> > The configuration is a T2000 connected to a StorageTek FLX210 array
> > via QLogic QLA2342 HBAs and Brocade 3850 switches.
I simply created a zpool with an array disk, like this:

hosta# zpool create testpool c6t<num>d0   # runs within a second
hosta# zpool export testpool              # runs within a second
hostb# zpool import testpool              # takes 5-7 minutes

If STMS (mpxio) is disabled, the import takes 45-60 seconds. I tested this with LUNs of 10GB and 100MB and got similar results on both.

However, I am not LUN masking, and when I run a format command I can see all of the LUNs on the array (about 40 of them, roughly 1TB in total). Maybe the problem is that there are many paths/LUNs to check when importing the zpool, but then why do I get faster times when I disable STMS (mpxio)?

It is strange. I may try my testing on another array that has only a few LUNs and see what happens, or enable LUN masking. Might that help? Any thoughts?
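Two hedged ways to dig into that import time: time the scan directly, and point the import at a directory containing links to only the relevant devices so it does not have to walk every visible LUN. The directory name and slice placeholder below are hypothetical:

    # measure how long the device scan actually takes
    hostb# time zpool import testpool

    # optionally restrict the search with -d to a directory that links
    # only the LUN(s) backing the pool
    hostb# mkdir /var/tmp/pooldevs
    hostb# ln -s /dev/dsk/c6t<num>d0s0 /var/tmp/pooldevs/
    hostb# zpool import -d /var/tmp/pooldevs testpool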
James C. McPherson
2006-Dec-06 22:46 UTC
[zfs-discuss] Re: ZFS failover without multipathing
Luke Schwab wrote:
> If STMS (mpxio) is disabled, the import takes 45-60 seconds. I tested
> this with LUNs of 10GB and 100MB and got similar results on both.

First question to ask -- are you using the emlxs driver for the Emulex card?

Second question -- are you up to date on the SAN Foundation Kit (SFK) patches? I think the current version is 4.4.11. If you're not running that version, I strongly recommend that you upgrade your patch levels to it. Ditto for kernel, sd and scsi_vhci.

James C. McPherson
--
Solaris kernel software engineer
Sun Microsystems
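A hedged sketch of how one might check those two things on Solaris 10, assuming the SAN Foundation packages are installed; the patch IDs shown are the ones mentioned later in the thread, and output formats vary by release:

    # which driver is bound to the HBA ports?
    fcinfo hba-port
    prtconf -D | grep -i emlx

    # are the relevant patches and modules present and current?
    showrev -p | egrep '119130|120222'
    modinfo | egrep 'emlxs|qlc|fctl|scsi_vhci'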
Luke Schwab wrote:
> I tested this by creating a zpool on the first controller and then
> pulling the cable on the back of the server. The server took about
> 3-5 minutes to fail over, but it did fail over!

By default, the [s]sd driver will retry [3]5 times with a timeout of 60 seconds. STMS understands the lower-level FC stuff, and can make better decisions, faster.

-- richard
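Those retry and timeout defaults are tunable through /etc/system. A hedged sketch of the settings sometimes used to shorten the error-detection window; the values are purely illustrative, not recommendations, and a reboot is needed for them to take effect:

    * /etc/system -- illustrative sd/ssd timeout and retry tuning
    set sd:sd_io_time = 20
    set sd:sd_retry_count = 3
    set ssd:ssd_io_time = 20
    set ssd:ssd_retry_count = 3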
Jason,

I am no longer looking at dropping STMS multipathing, because without STMS you lose the binding to the array and I lose all traffic between the server and the array. The binding does come back after a few minutes, but that is not acceptable in our environment.

Load times vary depending on my configuration:

Scenario 1: No STMS: Really fast zpool create and zpool import/export. Less than 1 second for create/export and 5-15 seconds for an import.

Scenario 2: STMS (mpxio) enabled and no blacklists being used for LUN masking: zpool create takes 5-15 seconds, zpool imports take 5-7 minutes.

Scenario 3: STMS enabled and blacklists enabled via /kernel/drv/fp.conf: it took at least 15 minutes to do a "zpool create" before I finally stopped it. This does not appear to be a viable solution.

If you have any ideas about how to improve performance, I am all ears. I'm not sure why ZFS takes so long to create pools with STMS.

Does anyone have problems using LSI arrays? I already had problems using my LSI HBA with ZFS because the LSI HBA does not work with the Leadville stack.

R/ ljs

> Hi Luke,
>
> That's terrific!
>
> You know you might be able to tell ZFS which disks to look at. I'm not
> sure. It would be interesting if anyone with a Thumper could comment
> on whether or not they see the import time issue. What are your load
> times now with MPXIO?
>
> Best Regards,
> Jason
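To compare how many devices the create/import has to walk in each scenario, a hedged sketch; mpathadm is only present when the MPAPI tooling is installed, and luxadm comes with the SAN packages:

    # multipathed logical units and their path counts (MPXIO enabled)
    mpathadm list lu

    # FC devices seen through the Leadville stack
    luxadm probe

    # rough count of device links (slices) a scan may touch
    ls /dev/dsk | wc -l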
Jason J. W. Williams
2006-Dec-08 02:48 UTC
[zfs-discuss] Re: ZFS failover without multipathing
Hi Luke,

I wonder if it is the HBA. We had issues with Solaris and LSI HBAs back when we were using an Xserve RAID. We haven't had any of the issues you're describing between our LSI array and the QLogic HBAs we're using now. If you have another type of HBA, I'd try it. MPXIO and ZFS have never caused what you're seeing for us.

-J

On 12/7/06, Luke Schwab <lukeschwab at yahoo.com> wrote:
> Scenario 2: STMS (mpxio) enabled and no blacklists being used for LUN
> masking: zpool create takes 5-15 seconds, zpool imports take 5-7
> minutes.
Hey,

# First question to ask -- are you using the emlxs driver for
# the Emulex card?

I'm using what I believe is the latest version of SFS. I got it from a link on the Emulex website, which points to http://www.sun.com/download/products.xml?id=42c4317d

# Second question -- are you up to date on the SAN Foundation
# Kit (SFK) patches? I think the current version is 4.4.11. If
# you're not running that version, I strongly recommend that
# you upgrade your patch levels to it. Ditto for kernel, sd
# and scsi_vhci.

I downloaded all of the patches and the latest SFS from Sun. Here is a list of patches I have recently downloaded and installed this week: 119130, 120222, 119470, 119715.

Also, since my last posting I have found that I get slow 'zpool create' and 'zpool import' times when I have mpxio enabled. Running truss during the 'zpool import' shows delays on system calls like pread(), fstat(), stat64() and ioctl().

I have received similar test results, i.e. slow zpool creates/imports, on two different drivers now: the Emulex 10000DC 2Gb HBA and the QLogic X6727A (Sun) HBA. Would it also help if I purchased the latest 4Gb HBA from Sun? Maybe the QLogic HBA doesn't function well with the Leadville (scsi_vhci) driver.

Luke
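For anyone wanting to reproduce that measurement, a hedged sketch of the truss run; the output file name is arbitrary and the pool name is the one used earlier in the thread:

    # timestamp each system call during the import and save the trace
    truss -d -o /tmp/zpool-import.truss zpool import testpool

    # then look for the calls with the largest gaps between timestamps
    egrep -n 'ioctl|pread|stat64' /tmp/zpool-import.truss | less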