Hello all, Is there any sort of a "Global Hot Spare" feature in ZFS, i.e. that one sufficiently-sized spare HDD would automatically be pulled into any faulted pool on the system? So far I have only seen hot spares dedicated to a certain pool, and perhaps scripts to detach/attach hot spares between pools when bad things happen - but that adds some latency (parsing logs, or cron jobs with "zpool status" output - if that does not hang on problems, etc.). Are there any kernel-side implementations? Thanks, //Jim Klimov
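For reference, the cron-job approach described above might look roughly like the minimal sketch below; the mail recipient, paths and the exact "all pools are healthy" message are assumptions, not anything stated in this thread:

  #!/bin/sh
  # Rough sketch of a cron-driven check: mail a report if any pool is unhealthy.
  # Assumes the stock "all pools are healthy" message from zpool status -x.
  STATUS=`/usr/sbin/zpool status -x`
  if [ "$STATUS" != "all pools are healthy" ]; then
      echo "$STATUS" | mailx -s "zpool problem on `hostname`" root
  fi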
On Jun 14, 2011, at 5:18 AM, Jim Klimov wrote: > Hello all, > > Is there any sort of a "Global Hot Spare" feature in ZFS, > i.e. that one sufficiently-sized spare HDD would automatically > be pulled into any faulted pool on the system? Yes. See the ZFS Admin Guide section on Designating Hot Spares in Your Storage Pool. -- richard > > So far I have only seen hot spares dedicated to a certain pool, > and perhaps scripts to detach/attach hot spares between pools > when bad things happen - but that adds some latency (parsing > logs, or cron jobs with "zpool status" output - if that does > not hang on problems, etc.). > > Are there any kernel-side implementations? > > Thanks, > //Jim Klimov
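For anyone following along, designating a hot spare as described in that section boils down to something like this (pool and disk names are illustrative):

  # Add a spare to an existing pool
  zpool add tank spare c2t3d0
  # Or designate the spare at pool creation time
  zpool create tank mirror c1t1d0 c1t2d0 spare c2t3d0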
2011-06-14 19:23, Richard Elling wrote: > On Jun 14, 2011, at 5:18 AM, Jim Klimov wrote: > >> Hello all, >> >> Is there any sort of a "Global Hot Spare" feature in ZFS, >> i.e. that one sufficiently-sized spare HDD would automatically >> be pulled into any faulted pool on the system? > Yes. See the ZFS Admin Guide section on Designating Hot Spares in Your Storage Pool. > -- richard Yes, thanks. I've read that, but the guide and its examples only refer to hot spares for a certain single pool, except in the starting phrase: "The hot spares feature enables you to identify disks that could be used to replace a failed or faulted device in one or more storage pools." Further on in the examples, hot spares are added to specific pools, etc. Can the same spare be added to several pools? So far it is a theoretical question - I don't have a box with several pools and a spare disk in place at once, which I'd want to sacrifice to uncertain experiments ;) //Jim
On Jun 14, 2011, at 10:38 AM, Jim Klimov wrote: > 2011-06-14 19:23, Richard Elling wrote: >> On Jun 14, 2011, at 5:18 AM, Jim Klimov wrote: >> >>> Hello all, >>> >>> Is there any sort of a "Global Hot Spare" feature in ZFS, >>> i.e. that one sufficiently-sized spare HDD would automatically >>> be pulled into any faulted pool on the system? >> Yes. See the ZFS Admin Guide section on Designating Hot Spares in Your Storage Pool. >> -- richard > Yes, thanks. I've read that, but the guide and its examples > only refer to hot spares for a certain single pool, except > in the starting phrase: "The hot spares feature enables you > to identify disks that could be used to replace a failed or > faulted device in one or more storage pools." > > Further on in the examples, hot spares are added to specific > pools, etc. > > Can the same spare be added to several pools? Yes. > > So far it is a theoretical question - I don't have a box with > several pools and a spare disk in place at once, which I'd > want to sacrifice to uncertain experiments ;) ZFS won't let you intentionally compromise a pool without warning. Now, as to whether a hot spare for multiple pools is a good thing, that depends on many factors. I find for most systems I see today, warm spares are superior to hot spares. -- richard
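Illustration of the point above, with made-up names: the same device can be listed as a spare in more than one pool, and once it is actually activated in one pool it is no longer available to the others.

  zpool add pool1 spare c5t0d0
  zpool add pool2 spare c5t0d0
  zpool status pool1 pool2   # both pools now list c5t0d0 under "spares"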
What is the difference between warm spares and hot spares?
I assume the history is stored in the metadata. Is it possible to configure how long or how much history can be stored/displayed? I know it is doable via external/additional automation, like exporting it to a database. Thanks. Fred
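For context, the history referred to here is what `zpool history` reads back from the pool; as far as I know the on-disk log size is fixed when the pool is created and is not tunable, so long-term retention does indeed need external automation. Illustrative commands (pool name is a placeholder):

  zpool history tank          # pool-altering commands recorded in the pool
  zpool history -il tank      # -l adds user/host/zone detail, -i adds internal events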
Hi, Has anyone successfully polled zpool/zfs properties through SNMP? Thanks. Fred
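One possible approach, assuming a net-snmp based agent (such as the Solaris SMA) is in use: expose zpool output through the generic NET-SNMP-EXTEND-MIB rather than a dedicated ZFS MIB. A sketch, with made-up host name and community string:

  # snmpd.conf (e.g. /etc/sma/snmp/snmpd.conf on Solaris 10)
  extend zpool-health /usr/sbin/zpool status -x
  extend zpool-list   /usr/sbin/zpool list -H

  # From the monitoring host:
  snmpwalk -v2c -c public filer01 NET-SNMP-EXTEND-MIB::nsExtendOutputFull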
On Jun 14, 2011, at 2:36 PM, Fred Liu wrote: > What is the difference between warm spares and hot spares? Warm spares are connected and powered. Hot spares are connected, powered, and automatically brought online to replace a "failed" disk. The reason I'm leaning towards warm spares is because I see more replacements than "failed" disks... a bad thing. -- richard
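In other words (my paraphrase, with illustrative pool and device names): a warm spare sits powered in the enclosure but is not configured anywhere, so a human decides when and whether to use it; a hot spare is configured in the pool and pulled in automatically by the fault management agents.

  # Warm spare: nothing configured; when a disk really dies, the admin runs
  zpool replace tank c0t11d0 c0t20d0
  # Hot spare: pre-configured in the pool, activated without human judgement
  zpool add tank spare c0t20d0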
> -----Original Message----- > From: Richard Elling [mailto:richard.elling at gmail.com] > Sent: Wednesday, June 15, 2011 11:59 > To: Fred Liu > Cc: Jim Klimov; zfs-discuss at opensolaris.org > Subject: Re: [zfs-discuss] zfs global hot spares? > > On Jun 14, 2011, at 2:36 PM, Fred Liu wrote: > > > What is the difference between warm spares and hot spares? > > Warm spares are connected and powered. Hot spares are connected, > powered, and automatically brought online to replace a "failed" disk. > The reason I'm leaning towards warm spares is because I see more > replacements than "failed" disks... a bad thing. > -- richard > You mean so-called "failed" disks replaced by hot spares are not really physically damaged? Do I misunderstand? Thanks. Fred
On Jun 14, 2011, at 10:31 PM, Fred Liu wrote: > >> -----Original Message----- >> From: Richard Elling [mailto:richard.elling at gmail.com] >> Sent: Wednesday, June 15, 2011 11:59 >> To: Fred Liu >> Cc: Jim Klimov; zfs-discuss at opensolaris.org >> Subject: Re: [zfs-discuss] zfs global hot spares? >> >> On Jun 14, 2011, at 2:36 PM, Fred Liu wrote: >> >>> What is the difference between warm spares and hot spares? >> >> Warm spares are connected and powered. Hot spares are connected, >> powered, and automatically brought online to replace a "failed" disk. >> The reason I'm leaning towards warm spares is because I see more >> replacements than "failed" disks... a bad thing. >> -- richard >> > > You mean so-called "failed" disks replaced by hot spares are not really > physically damaged? Do I misunderstand? That is not how I would phrase it, let's try: assuming the disk is failed because you can't access it or it returns bad data is a bad assumption. -- richard
> -----Original Message----- > From: Richard Elling [mailto:richard.elling at gmail.com] > Sent: Wednesday, June 15, 2011 14:25 > To: Fred Liu > Cc: Jim Klimov; zfs-discuss at opensolaris.org > Subject: Re: [zfs-discuss] zfs global hot spares? > > On Jun 14, 2011, at 10:31 PM, Fred Liu wrote: > > > >> -----Original Message----- > >> From: Richard Elling [mailto:richard.elling at gmail.com] > >> Sent: Wednesday, June 15, 2011 11:59 > >> To: Fred Liu > >> Cc: Jim Klimov; zfs-discuss at opensolaris.org > >> Subject: Re: [zfs-discuss] zfs global hot spares? > >> > >> On Jun 14, 2011, at 2:36 PM, Fred Liu wrote: > >> > >>> What is the difference between warm spares and hot spares? > >> > >> Warm spares are connected and powered. Hot spares are connected, > >> powered, and automatically brought online to replace a "failed" disk. > >> The reason I'm leaning towards warm spares is because I see more > >> replacements than "failed" disks... a bad thing. > >> -- richard > >> > > > > You mean so-called "failed" disks replaced by hot spares are not > really > > physically damaged? Do I misunderstand? > > That is not how I would phrase it, let's try: assuming the disk is > failed because > you can't access it or it returns bad data is a bad assumption. > -- richard > Gotcha! But if there is a real failed disk, we have to do manual warm spare disk replacement. If the pool's "failmode" is set to "wait", we experienced an NFS service time-out. It will interrupt NFS service. Thanks. Fred
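For reference, the property mentioned here is per pool and can be checked or changed online; note that failmode governs behaviour on catastrophic pool failure (loss of access to the data), not a single disk failing in a redundant vdev. Pool name below is a placeholder:

  zpool get failmode tank
  zpool set failmode=continue tank   # alternatives: wait (the default) or panic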
On Jun 15, 2011, at 2:44 AM, Fred Liu wrote: >> -----Original Message----- >> From: Richard Elling [mailto:richard.elling at gmail.com] >> Sent: Wednesday, June 15, 2011 14:25 >> To: Fred Liu >> Cc: Jim Klimov; zfs-discuss at opensolaris.org >> Subject: Re: [zfs-discuss] zfs global hot spares? >> >> On Jun 14, 2011, at 10:31 PM, Fred Liu wrote: >>> >>>> -----Original Message----- >>>> From: Richard Elling [mailto:richard.elling at gmail.com] >>>> Sent: Wednesday, June 15, 2011 11:59 >>>> To: Fred Liu >>>> Cc: Jim Klimov; zfs-discuss at opensolaris.org >>>> Subject: Re: [zfs-discuss] zfs global hot spares? >>>> >>>> On Jun 14, 2011, at 2:36 PM, Fred Liu wrote: >>>> >>>>> What is the difference between warm spares and hot spares? >>>> >>>> Warm spares are connected and powered. Hot spares are connected, >>>> powered, and automatically brought online to replace a "failed" disk. >>>> The reason I'm leaning towards warm spares is because I see more >>>> replacements than "failed" disks... a bad thing. >>>> -- richard >>>> >>> >>> You mean so-called "failed" disks replaced by hot spares are not >> really >>> physically damaged? Do I misunderstand? >> >> That is not how I would phrase it, let's try: assuming the disk is >> failed because >> you can't access it or it returns bad data is a bad assumption. >> -- richard >> > > Gotcha! But if there is a real failed disk, we have to do manual warm spare disk replacement. > If the pool's "failmode" is set to "wait", we experienced an NFS service time-out. It will interrupt > NFS service. This is only true if the pool is not protected. Please protect your pool with mirroring or raidz*. -- richard
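A minimal example of what "protected" means here, with made-up device names; a mirror survives any single-disk failure and raidz2 survives any two:

  zpool create tank mirror c1t0d0 c2t0d0
  zpool create bigtank raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0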
> This is only true if the pool is not protected. Please protect your > pool with mirroring or raidz*. > -- richard >
Yes. We use a raidz2 without any spares. In theory, with one disk broken, there should be no problem. But in reality, we saw NFS service interrupted:
Jun 9 23:28:59 cn03 scsi_vhci: [ID 734749 kern.warning] WARNING: vhci_scsi_reset 0x1
Jun 9 23:28:59 cn03 scsi: [ID 365881 kern.info] /pci at 0,0/pci8086,3410 at 9/pci1000,72 at 0 (mpt_sas0):
Jun 9 23:28:59 cn03 Log info 0x31140000 received for target 11.
Jun 9 23:28:59 cn03 scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
.... .... Truncating similar scsi error .... ....
Jun 10 08:04:38 cn03 svc.startd[9]: [ID 122153 daemon.warning] svc:/network/nfs/server:default: Method or service exit timed out. Killing contract 71840.
Jun 10 08:04:38 cn03 svc.startd[9]: [ID 636263 daemon.warning] svc:/network/nfs/server:default: Method "/lib/svc/method/nfs-server stop 105" failed due to signal KILL.
.... .... Truncating scsi similar error .... ....
Jun 10 09:04:38 cn03 svc.startd[9]: [ID 122153 daemon.warning] svc:/network/nfs/server:default: Method or service exit timed out. Killing contract 71855.
Jun 10 09:04:38 cn03 svc.startd[9]: [ID 636263 daemon.warning] svc:/network/nfs/server:default: Method "/lib/svc/method/nfs-server stop 105" failed due to signal KILL.
This is out of my original assumption when I designed this file box. But this NFS interruption may **NOT** be due to the degraded zpool although one broken disk is almost the only **obvious** event in the night. I will add a hot spare and enable autoreplace to see if it will happen again. Thanks. Fred
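The change proposed here would be roughly the following; the pool and spare device names are placeholders:

  zpool add tank spare c0t20d0       # dedicated hot spare for the raidz2 pool
  zpool set autoreplace=on tank      # auto-replace a new device found in the same slot
  zpool get autoreplace,failmode tank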
my point exactly, more below... On Jun 15, 2011, at 8:20 PM, Fred Liu wrote:>> This is only true if the pool is not protected. Please protect your >> pool with mirroring or raidz*. >> -- richard >> > > Yes. We use a raidz2 without any spares. In theory, with one disk broken, > there should be no problem. But in reality, we saw NFS service interrupted: > > Jun 9 23:28:59 cn03 scsi_vhci: [ID 734749 kern.warning] WARNING: vhci_scsi_reset 0x1 > Jun 9 23:28:59 cn03 scsi: [ID 365881 kern.info] /pci at 0,0/pci8086,3410 at 9/pci1000,72 at 0 (mpt_sas0): > Jun 9 23:28:59 cn03 Log info 0x31140000 received for target 11. > Jun 9 23:28:59 cn03 scsi_status=0x0, ioc_status=0x8048, scsi_state=0xcThis message is from the disk saying that it aborted a command. These are usually preceded by a reset, as shown here. What caused the reset condition? Was it actually target 11 or did target 11 get caught up in the reset storm?> > .... > .... > Truncating similar scsi error > .... > .... > > > Jun 10 08:04:38 cn03 svc.startd[9]: [ID 122153 daemon.warning] svc:/network/nfs/server:default: Method or service exit timed out. Killing contract 71840. > Jun 10 08:04:38 cn03 svc.startd[9]: [ID 636263 daemon.warning] svc:/network/nfs/server:default: Method "/lib/svc/method/nfs-server stop 105" failed due to signal KILL. > > .... > .... > Truncating scsi similar error > .... > .... > > Jun 10 09:04:38 cn03 svc.startd[9]: [ID 122153 daemon.warning] svc:/network/nfs/server:default: Method or service exit timed out. Killing contract 71855. > Jun 10 09:04:38 cn03 svc.startd[9]: [ID 636263 daemon.warning] svc:/network/nfs/server:default: Method "/lib/svc/method/nfs-server stop 105" failed due to signal KILL. > > This is out of my original assumption when I designed this file box. > But this NFS interruption may **NOT** be due to the degraded zpool although one broken disk is almost the only **obvious** event in the night. > I will add a hot spare and enable autoreplace to see if it will happen again.Hot spare will not help you here. The problem is not constrained to one disk. In fact, a hot spare may be the worst thing here because it can kick in for the disk complaining about a clogged expander or spurious resets. This causes a resilver that reads from the actual broken disk, that causes more resets, that kicks out another disk that causes a resilver, and so on. -- richard
> This message is from the disk saying that it aborted a command. These > are > usually preceded by a reset, as shown here. What caused the reset > condition? > Was it actually target 11 or did target 11 get caught up in the reset > storm? >
It happened in the middle of the night and nobody touched the file box. I assume it is the transitional state before the disk is *thoroughly* damaged:
Jun 10 09:34:11 cn03 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Jun 10 09:34:11 cn03 EVENT-TIME: Fri Jun 10 09:34:11 CST 2011
Jun 10 09:34:11 cn03 PLATFORM: X8DTH-i-6-iF-6F, CSN: 1234567890, HOSTNAME: cn03
Jun 10 09:34:11 cn03 SOURCE: zfs-diagnosis, REV: 1.0
Jun 10 09:34:11 cn03 EVENT-ID: 4f4bfc2c-f653-ed20-ab13-eef72224af5e
Jun 10 09:34:11 cn03 DESC: The number of I/O errors associated with a ZFS device exceeded
Jun 10 09:34:11 cn03 acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD for more information.
Jun 10 09:34:11 cn03 AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt
Jun 10 09:34:11 cn03 will be made to activate a hot spare if available.
Jun 10 09:34:11 cn03 IMPACT: Fault tolerance of the pool may be compromised.
Jun 10 09:34:11 cn03 REC-ACTION: Run 'zpool status -x' and replace the bad device.
After I rebooted it, I got:
Jun 10 11:38:49 cn03 genunix: [ID 540533 kern.notice] ^MSunOS Release 5.11 Version snv_134 64-bit
Jun 10 11:38:49 cn03 genunix: [ID 683174 kern.notice] Copyright 1983-2010 Sun Microsystems, Inc. All rights reserved.
Jun 10 11:38:49 cn03 Use is subject to license terms.
Jun 10 11:38:49 cn03 unix: [ID 126719 kern.info] features: 7f7fffff<sse4_2,sse4_1,ssse3,cpuid,mwait,tscp,cmp,cx16,sse3,nx,asysc,htt,sse2,sse,sep,pat,cx8,pae,mca,mmx,cmov,de,pge,mtrr,msr,tsc,lgpg>
Jun 10 11:39:06 cn03 scsi: [ID 365881 kern.info] /pci at 0,0/pci8086,3410 at 9/pci1000,72 at 0 (mpt_sas0):
Jun 10 11:39:06 cn03 mptsas0 unrecognized capability 0x3
Jun 10 11:39:42 cn03 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk at g5000c50009723937 (sd3):
Jun 10 11:39:42 cn03 drive offline
Jun 10 11:39:47 cn03 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk at g5000c50009723937 (sd3):
Jun 10 11:39:47 cn03 drive offline
Jun 10 11:39:52 cn03 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk at g5000c50009723937 (sd3):
Jun 10 11:39:52 cn03 drive offline
Jun 10 11:39:57 cn03 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk at g5000c50009723937 (sd3):
Jun 10 11:39:57 cn03 drive offline
> > Hot spare will not help you here. The problem is not constrained to one > disk. > In fact, a hot spare may be the worst thing here because it can kick in > for the disk > complaining about a clogged expander or spurious resets. This causes a > resilver > that reads from the actual broken disk, that causes more resets, that > kicks out another > disk that causes a resilver, and so on. > -- richard >
So the warm spares could be "better" choice under this situation? BTW, in what condition, the scsi reset storm will happen? How can we be immune to this so as not to avoid interrupting the file service? Thanks. Fred
Fixing a typo in my last thread... > -----Original Message----- > From: Fred Liu > Sent: Thursday, June 16, 2011 17:22 > To: 'Richard Elling' > Cc: Jim Klimov; zfs-discuss at opensolaris.org > Subject: RE: [zfs-discuss] zfs global hot spares? > > > This message is from the disk saying that it aborted a command. These > > are > > usually preceded by a reset, as shown here. What caused the reset > > condition? > > Was it actually target 11 or did target 11 get caught up in the reset > > storm? > > >
It happened in the middle of the night and nobody touched the file box. I assume it is the transitional state before the disk is *thoroughly* damaged:
Jun 10 09:34:11 cn03 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Jun 10 09:34:11 cn03 EVENT-TIME: Fri Jun 10 09:34:11 CST 2011
Jun 10 09:34:11 cn03 PLATFORM: X8DTH-i-6-iF-6F, CSN: 1234567890, HOSTNAME: cn03
Jun 10 09:34:11 cn03 SOURCE: zfs-diagnosis, REV: 1.0
Jun 10 09:34:11 cn03 EVENT-ID: 4f4bfc2c-f653-ed20-ab13-eef72224af5e
Jun 10 09:34:11 cn03 DESC: The number of I/O errors associated with a ZFS device exceeded
Jun 10 09:34:11 cn03 acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD for more information.
Jun 10 09:34:11 cn03 AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt
Jun 10 09:34:11 cn03 will be made to activate a hot spare if available.
Jun 10 09:34:11 cn03 IMPACT: Fault tolerance of the pool may be compromised.
Jun 10 09:34:11 cn03 REC-ACTION: Run 'zpool status -x' and replace the bad device.
After I rebooted it, I got:
Jun 10 11:38:49 cn03 genunix: [ID 540533 kern.notice] ^MSunOS Release 5.11 Version snv_134 64-bit
Jun 10 11:38:49 cn03 genunix: [ID 683174 kern.notice] Copyright 1983-2010 Sun Microsystems, Inc. All rights reserved.
Jun 10 11:38:49 cn03 Use is subject to license terms.
Jun 10 11:38:49 cn03 unix: [ID 126719 kern.info] features: 7f7fffff<sse4_2,sse4_1,ssse3,cpuid,mwait,tscp,cmp,cx16,sse3,nx,asysc,htt,sse2,sse,sep,pat,cx8,pae,mca,mmx,cmov,de,pge,mtrr,msr,tsc,lgpg>
Jun 10 11:39:06 cn03 scsi: [ID 365881 kern.info] /pci at 0,0/pci8086,3410 at 9/pci1000,72 at 0 (mpt_sas0):
Jun 10 11:39:06 cn03 mptsas0 unrecognized capability 0x3
Jun 10 11:39:42 cn03 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk at g5000c50009723937 (sd3):
Jun 10 11:39:42 cn03 drive offline
Jun 10 11:39:47 cn03 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk at g5000c50009723937 (sd3):
Jun 10 11:39:47 cn03 drive offline
Jun 10 11:39:52 cn03 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk at g5000c50009723937 (sd3):
Jun 10 11:39:52 cn03 drive offline
Jun 10 11:39:57 cn03 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk at g5000c50009723937 (sd3):
Jun 10 11:39:57 cn03 drive offline
> > > > > > Hot spare will not help you here. The problem is not constrained to > one > > disk. > > In fact, a hot spare may be the worst thing here because it can kick > in > > for the disk > > complaining about a clogged expander or spurious resets. This causes > a > > resilver > > that reads from the actual broken disk, that causes more resets, that > > kicks out another > > disk that causes a resilver, and so on. > > -- richard > > >
So the warm spares could be "better" choice under this situation? BTW, in what condition, the scsi reset storm will happen? How can we be immune to this so as NOT to interrupt the file service? > > > Thanks. > Fred
more below... On Jun 16, 2011, at 2:27 AM, Fred Liu wrote: > Fixing a typo in my last thread... > >> -----Original Message----- >> From: Fred Liu >> Sent: Thursday, June 16, 2011 17:22 >> To: 'Richard Elling' >> Cc: Jim Klimov; zfs-discuss at opensolaris.org >> Subject: RE: [zfs-discuss] zfs global hot spares? >> >>> This message is from the disk saying that it aborted a command. These >>> are >>> usually preceded by a reset, as shown here. What caused the reset >>> condition? >>> Was it actually target 11 or did target 11 get caught up in the reset >>> storm? >>> >> > It happened in the middle of the night and nobody touched the file box. > I assume it is the transitional state before the disk is *thoroughly* > damaged: > > Jun 10 09:34:11 cn03 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS- > 8000-FD, TYPE: Fault, VER: 1, SEVERITY: > > Major > Jun 10 09:34:11 cn03 EVENT-TIME: Fri Jun 10 09:34:11 CST 2011 > Jun 10 09:34:11 cn03 PLATFORM: X8DTH-i-6-iF-6F, CSN: 1234567890, > HOSTNAME: cn03 > Jun 10 09:34:11 cn03 SOURCE: zfs-diagnosis, REV: 1.0 > Jun 10 09:34:11 cn03 EVENT-ID: 4f4bfc2c-f653-ed20-ab13-eef72224af5e > Jun 10 09:34:11 cn03 DESC: The number of I/O errors associated with a > ZFS device exceeded > Jun 10 09:34:11 cn03 acceptable levels. Refer to > http://sun.com/msg/ZFS-8000-FD for more information. > Jun 10 09:34:11 cn03 AUTO-RESPONSE: The device has been offlined and > marked as faulted. An attempt > Jun 10 09:34:11 cn03 will be made to activate a hot spare if > available. > Jun 10 09:34:11 cn03 IMPACT: Fault tolerance of the pool may be > compromised. > Jun 10 09:34:11 cn03 REC-ACTION: Run 'zpool status -x' and replace the > bad device.
zpool status -x output would be useful. These error reports do not include a pointer to the faulty device. fmadm can also give more info.
> > After I rebooted it, I got: > Jun 10 11:38:49 cn03 genunix: [ID 540533 kern.notice] ^MSunOS Release > 5.11 Version snv_134 64-bit > Jun 10 11:38:49 cn03 genunix: [ID 683174 kern.notice] Copyright 1983- > 2010 Sun Microsystems, Inc. All rights > > reserved. > Jun 10 11:38:49 cn03 Use is subject to license terms. > Jun 10 11:38:49 cn03 unix: [ID 126719 kern.info] features: > > 7f7fffff<sse4_2,sse4_1,ssse3,cpuid,mwait,tscp,cmp,cx16,sse3,nx,asysc,ht > t,sse2,sse,sep,pat,cx8,pae,mca,mmx,cmov,d > > e,pge,mtrr,msr,tsc,lgpg> > > Jun 10 11:39:06 cn03 scsi: [ID 365881 kern.info] > /pci at 0,0/pci8086,3410 at 9/pci1000,72 at 0 (mpt_sas0): > Jun 10 11:39:06 cn03 mptsas0 unrecognized capability 0x3 > > Jun 10 11:39:42 cn03 scsi: [ID 107833 kern.warning] WARNING: > /scsi_vhci/disk at g5000c50009723937 (sd3): > Jun 10 11:39:42 cn03 drive offline > Jun 10 11:39:47 cn03 scsi: [ID 107833 kern.warning] WARNING: > /scsi_vhci/disk at g5000c50009723937 (sd3): > Jun 10 11:39:47 cn03 drive offline > Jun 10 11:39:52 cn03 scsi: [ID 107833 kern.warning] WARNING: > /scsi_vhci/disk at g5000c50009723937 (sd3): > Jun 10 11:39:52 cn03 drive offline > Jun 10 11:39:57 cn03 scsi: [ID 107833 kern.warning] WARNING: > /scsi_vhci/disk at g5000c50009723937 (sd3): > Jun 10 11:39:57 cn03 drive offline
mpathadm can be used to determine the device paths for this disk. Notice how the disk is offline at multiple times. There is some sort of recovery going on here that continues to fail later. I call these "wounded soldiers" because they take a lot more care than a dead soldier. You would be better off if the drive completely died.
>> >> >>> >>> Hot spare will not help you here. The problem is not constrained to >> one >>> disk.
>>> In fact, a hot spare may be the worst thing here because it can kick >> in >>> for the disk >>> complaining about a clogged expander or spurious resets. This causes >> a >>> resilver >>> that reads from the actual broken disk, that causes more resets, that >>> kicks out another >>> disk that causes a resilver, and so on. >>> -- richard >>> >> > So the warm spares could be "better" choice under this situation? > BTW, in what condition, the scsi reset storm will happen?In my experience they start randomly and in some cases are not reproducible.> How can we be immune to this so as NOT to interrupt the file > service?Are you asking for fault tolerance? If so, then you need a fault tolerant system like a Tandem. If you are asking for a way to build a cost effective solution using commercial, off-the-shelf (COTS) components, then that is far beyond what can be easily said in a forum posting. -- richard
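For readers hitting something similar, the checks Richard suggests look roughly like this; the mpathadm logical-unit path below is only a guess based on the WWN in the log (5000c50009723937), so substitute your own device name:

  zpool status -xv                 # which pool/vdev is degraded, and why
  fmadm faulty                     # open fault diagnoses and the faulted resource
  fmdump -eV | more                # raw ereports (I/O, transport, reset events)
  mpathadm list lu                 # multipath logical units on the system
  mpathadm show lu /dev/rdsk/c0t5000C50009723937d0s2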
> > zpool status -x output would be useful. These error reports do not > include a > pointer to the faulty device. fmadm can also give more info. > Yes. Thanks. > mpathadm can be used to determine the device paths for this disk. > > Notice how the disk is offline at multiple times. There is some sort of > recovery going on here that continues to fail later. I call these > "wounded > soldiers" because they take a lot more care than a dead soldier. You > would be better off if the drive completely died. > I think it only works with mpt_sas (SAS2), where multipathing is forcibly enabled. I agree the disk was in a sort of "critical status" before it died. The difficult point is that the OS could NOT automatically offline the wounded disk, and in the middle of the night (maybe because of the coming scsi reset storm) nobody could do it manually at all. > > In my experience they start randomly and in some cases are not > reproducible. > It seems rather unpredictable then, doesn't it? :-) > > Are you asking for fault tolerance? If so, then you need a fault > tolerant system like > a Tandem. If you are asking for a way to build a cost effective > solution using > commercial, off-the-shelf (COTS) components, then that is far beyond > what can be easily > said in a forum posting. > -- richard > Yeah. High availability is another topic with even more technical challenges. Anyway, thank you very much. Fred
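For completeness, manually retiring a "wounded soldier" once it has been identified would look something like this (pool and device names are placeholders); this is essentially the warm-spare workflow discussed earlier in the thread:

  zpool offline tank c0t11d0            # quiesce the suspect disk so it stops triggering resets
  zpool replace tank c0t11d0 c0t20d0    # resilver onto the warm spare by hand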