We run a cron job that does a 'zpool status -x' to check for any degraded
pools. We just happened to find a pool degraded this morning by running
'zpool status' by hand and were surprised that it was degraded, as we didn't
get a notice from the cron job.

# uname -srvp
SunOS 5.11 snv_78 i386

# zpool status -x
all pools are healthy

# zpool status pool1
  pool: pool1
 state: DEGRADED
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        pool1        DEGRADED     0     0     0
          raidz1     DEGRADED     0     0     0
            c1t8d0   REMOVED      0     0     0
            c1t9d0   ONLINE       0     0     0
            c1t10d0  ONLINE       0     0     0
            c1t11d0  ONLINE       0     0     0

errors: No known data errors

I'm going to look into why the disk is listed as removed.

Does this look like a bug with 'zpool status -x'?

Ben
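For reference, a minimal sketch of the kind of cron check described above
(the thread never shows the actual script, so the mail recipient and the
exact healthy-message match are assumptions):

#!/bin/sh
# Hypothetical nightly check: mail root if 'zpool status -x' reports
# anything other than the all-healthy message. Note this is exactly the
# style of check that missed the DEGRADED pool above, because -x itself
# claimed the pools were healthy.
status=`/usr/sbin/zpool status -x`
if [ "$status" != "all pools are healthy" ]; then
    echo "$status" | mailx -s "zpool problem on `hostname`" root
fi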
This post from close to a year ago never received a response. We just had
the same thing happen on another server running Solaris 10 U6. One of the
disks was marked as removed and the pool degraded, but 'zpool status -x'
said all pools were healthy. After doing a 'zpool online' on the disk it
resilvered fine. Any ideas why 'zpool status -x' reports all healthy while
'zpool status' shows a pool in degraded mode?

thanks,
Ben

> We run a cron job that does a 'zpool status -x' to check for any degraded
> pools. We just happened to find a pool degraded this morning by running
> 'zpool status' by hand and were surprised that it was degraded, as we
> didn't get a notice from the cron job.
>
> # zpool status -x
> all pools are healthy
>
> # zpool status pool1
>   pool: pool1
>  state: DEGRADED
> [...]
>             c1t8d0   REMOVED      0     0     0
> [...]
>
> Does this look like a bug with 'zpool status -x'?
>
> Ben
I just put in a (low priority) bug report on this.

Ben

> This post from close to a year ago never received a response. We just had
> the same thing happen on another server running Solaris 10 U6. One of the
> disks was marked as removed and the pool degraded, but 'zpool status -x'
> said all pools were healthy. After doing a 'zpool online' on the disk it
> resilvered fine.
Bug ID is 6793967.

This problem just happened again.

% zpool status pool1
  pool: pool1
 state: DEGRADED
 scrub: resilver completed after 0h48m with 0 errors on Mon Jan 5 12:30:52 2009
config:

        NAME           STATE     READ WRITE CKSUM
        pool1          DEGRADED     0     0     0
          raidz2       DEGRADED     0     0     0
            c4t8d0s0   ONLINE       0     0     0
            c4t9d0s0   ONLINE       0     0     0
            c4t10d0s0  ONLINE       0     0     0
            c4t11d0s0  ONLINE       0     0     0
            c4t12d0s0  REMOVED      0     0     0
            c4t13d0s0  ONLINE       0     0     0

errors: No known data errors

% zpool status -x
all pools are healthy

# zpool online pool1 c4t12d0s0

% zpool status -x
  pool: pool1
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 0.12% done, 2h38m to go
config:

        NAME           STATE     READ WRITE CKSUM
        pool1          ONLINE       0     0     0
          raidz2       ONLINE       0     0     0
            c4t8d0s0   ONLINE       0     0     0
            c4t9d0s0   ONLINE       0     0     0
            c4t10d0s0  ONLINE       0     0     0
            c4t11d0s0  ONLINE       0     0     0
            c4t12d0s0  ONLINE       0     0     0
            c4t13d0s0  ONLINE       0     0     0

errors: No known data errors

Ben

> I just put in a (low priority) bug report on this.
>
> Ben
What's the output of 'zfs upgrade' and 'zpool upgrade'? (I'm just curious;
I had a similar situation which seems to be resolved now that I've gone to
Solaris 10u6 or OpenSolaris 2008.11.)

On Wed, Jan 21, 2009 at 2:11 PM, Ben Miller <miller at eecis.udel.edu> wrote:
> Bug ID is 6793967.
>
> This problem just happened again.
>
> % zpool status pool1
>   pool: pool1
>  state: DEGRADED
> [...]
>             c4t12d0s0  REMOVED      0     0     0
> [...]
>
> % zpool status -x
> all pools are healthy
>
> # zpool online pool1 c4t12d0s0
>
> Ben
The pools are upgraded to version 10. Also, this is on Solaris 10u6.

# zpool upgrade
This system is currently running ZFS pool version 10.

All pools are formatted using this version.

Ben

> What's the output of 'zfs upgrade' and 'zpool upgrade'? (I'm just curious;
> I had a similar situation which seems to be resolved now that I've gone to
> Solaris 10u6 or OpenSolaris 2008.11.)
A little gotcha that I found in my 10u6 update process was that 'zpool
upgrade [poolname]' is not the same as 'zfs upgrade [poolname]/[filesystem(s)]'.

What does 'zfs upgrade' say? I'm not saying this is the source of your
problem, but it's a detail that seemed to affect stability for me.

On Thu, Jan 22, 2009 at 7:25 AM, Ben Miller <miller at eecis.udel.edu> wrote:
> The pools are upgraded to version 10. Also, this is on Solaris 10u6.
>
> # zpool upgrade
> This system is currently running ZFS pool version 10.
>
> All pools are formatted using this version.
>
> Ben
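To make the distinction concrete, here are the two command families side by
side (pool1 is the pool name from earlier in the thread; nothing here is
specific to Ben's setup):

# zpool upgrade -v        # list the pool versions this release supports
# zpool upgrade pool1     # upgrades only the pool's on-disk format
# zfs upgrade             # reports filesystem versions, changes nothing
# zfs upgrade -r pool1    # upgrades pool1's filesystems and descendants
# zfs upgrade -a          # upgrades every filesystem on the system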
We haven't done a 'zfs upgrade ...' yet. I'll give that a try the next time
the system can be taken down.

Ben

> A little gotcha that I found in my 10u6 update process was that 'zpool
> upgrade [poolname]' is not the same as 'zfs upgrade [poolname]/[filesystem(s)]'.
>
> What does 'zfs upgrade' say? I'm not saying this is the source of your
> problem, but it's a detail that seemed to affect stability for me.
Ben Miller wrote:
> We haven't done a 'zfs upgrade ...' yet. I'll give that a try the next
> time the system can be taken down.

No need to take the system down; it can be done on the fly. The only
downside to the upgrade is that you may not be able to import the pool or
file system on an older OS release.
 -- richard
You can upgrade live. 'zfs upgrade' with no arguments shows you the zfs
version status of the filesystems present, without upgrading anything.

On Jan 24, 2009, at 10:19 AM, Ben Miller <miller at eecis.udel.edu> wrote:
> We haven't done a 'zfs upgrade ...' yet. I'll give that a try the next
> time the system can be taken down.
>
> Ben
I forgot that the pool that's having problems was recreated recently, so
it's already at zfs version 3. I just did a 'zfs upgrade -a' for another
pool, but some of those filesystems failed to upgrade because they are busy
and couldn't be unmounted.

# zfs upgrade -a
cannot unmount '/var/mysql': Device busy
cannot unmount '/var/postfix': Device busy
....
6 filesystems upgraded
821 filesystems already at this version

Ben

> You can upgrade live. 'zfs upgrade' with no arguments shows you the zfs
> version status of the filesystems present, without upgrading anything.
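For the datasets that were too busy to unmount, one possible follow-up
(a sketch only; 'pool2/var/mysql' is a hypothetical dataset name, since the
thread doesn't show which dataset backs /var/mysql):

# fuser -c /var/mysql            # list the processes keeping the mount busy
# (stop or restart those services, then retry just that dataset)
# zfs upgrade pool2/var/mysql    # hypothetical dataset backing /var/mysql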
What does 'zpool status -xv' show?

On Tue, Jan 27, 2009 at 8:01 AM, Ben Miller <miller at eecis.udel.edu> wrote:
> I forgot that the pool that's having problems was recreated recently, so
> it's already at zfs version 3. I just did a 'zfs upgrade -a' for another
> pool, but some of those filesystems failed to upgrade because they are
> busy and couldn't be unmounted.
>
> # zfs upgrade -a
> cannot unmount '/var/mysql': Device busy
> cannot unmount '/var/postfix': Device busy
> ....
> 6 filesystems upgraded
> 821 filesystems already at this version
>
> Ben
# zpool status -xv
all pools are healthy

Ben

> What does 'zpool status -xv' show?
Maybe ZFS hasn't seen an error in a long enough time that it considers the
pool healthy? You could try clearing the pool's error state and then
observing what happens.

On Wed, Jan 28, 2009 at 9:40 AM, Ben Miller <miller at eecis.udel.edu> wrote:
> # zpool status -xv
> all pools are healthy
>
> Ben
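For example, using the pool and device names from earlier in the thread:

# zpool clear pool1              # reset error counts/state for the whole pool
# zpool clear pool1 c4t12d0s0    # or for just the previously removed device
# zpool status -x                # then watch whether -x ever flags it again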