William D. Hathaway
2007-Apr-07 20:13 UTC
[zfs-discuss] misleading zpool state and panic -- nevada b60 x86
I'm running Nevada build 60 inside VMware; it is a test rig with no data of value.
SunOS b60 5.11 snv_60 i86pc i386 i86pc

I wanted to check out the FMA handling of a serious zpool error, so I did the following:

2007-04-07.08:46:31 zpool create tank mirror c0d1 c1d1
2007-04-07.15:21:37 zpool scrub tank
(inserted some errors with dd on one device to see if they showed up, which they did, and they healed fine)
2007-04-07.15:22:12 zpool scrub tank
2007-04-07.15:22:46 zpool clear tank c1d1
(added a single device without any redundancy)
2007-04-07.15:28:29 zpool add -f tank /var/500m_file
(then I copied data into /tank and removed the /var/500m_file; a panic resulted, which was expected)

I created a new /var/500m_file and then decided to destroy the pool and start over again. This caused a panic, which I wasn't expecting. On reboot, I did a "zpool status -x", which shows:

  pool: tank
 state: ONLINE
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: none requested
config:

        NAME              STATE     READ WRITE CKSUM
        tank              ONLINE       0     0     0
          mirror          ONLINE       0     0     0
            c0d1          ONLINE       0     0     0
            c1d1          ONLINE       0     0     0
          /var/500m_file  UNAVAIL      0     0     0  corrupted data

errors: No known data errors

Since there was no redundancy for the /var/500m_file vdev, I don't see how a replace will help (unless I still had the original device/file with the data intact).

When I try to destroy the pool with "zpool destroy tank", I get a panic with:

Apr  7 16:00:17 b60 genunix: [ID 403854 kern.notice] assertion failed: vdev_config_sync(rvd, txg) == 0, file: ../../common/fs/zfs/spa.c, line: 2910
Apr  7 16:00:17 b60 unix: [ID 100000 kern.notice]
Apr  7 16:00:17 b60 genunix: [ID 353471 kern.notice] d893cd0c genunix:assfail+5a (f9e87e74, f9e87e58,)
Apr  7 16:00:17 b60 genunix: [ID 353471 kern.notice] d893cd6c zfs:spa_sync+6c3 (da89cac0, 1363, 0)
Apr  7 16:00:17 b60 genunix: [ID 353471 kern.notice] d893cdc8 zfs:txg_sync_thread+1df (d4678540, 0)
Apr  7 16:00:18 b60 genunix: [ID 353471 kern.notice] d893cdd8 unix:thread_start+8 ()
Apr  7 16:00:18 b60 unix: [ID 100000 kern.notice]
Apr  7 16:00:18 b60 genunix: [ID 672855 kern.notice] syncing file systems...

My questions/comments boil down to:
1) Should the pool state really be 'ONLINE' after losing a non-redundant vdev?
2) It seems like a bug if I get a panic when trying to destroy a pool (although this clearly may be related to #1).

Am I hitting a known bug (or a misconception about how the pool should function)? I will happily provide any debugging info that I can. I haven't tried a 'zpool destroy -f tank' yet since I didn't know if there was any debugging value in my current state.

Thanks,
William Hathaway
www.williamhathaway.com
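P.S. In case it helps anyone poking at the same scenario: the FMA side can be inspected from the shell with the stock tools. This is only a generic sketch (fmdump, fmadm, and zpool status are the standard commands on snv_60; nothing below is specific to this pool beyond the name):

# dump the error reports (ereports) FMA has collected, verbosely
fmdump -eV | more

# list any diagnosed faults and their message IDs (e.g. ZFS-8000-4J)
fmadm faulty

# quick summary of only the pools that have problems
zpool status -x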
George Wilson
2007-Apr-10 02:54 UTC
[zfs-discuss] misleading zpool state and panic -- nevada b60 x86
William D. Hathaway wrote:
> My questions/comments boil down to:
> 1) Should the pool state really be 'ONLINE' after losing a non-redundant vdev?

Yeah, this seems odd and is probably a bug.

> 2) It seems like a bug if I get a panic when trying to destroy a pool (although this clearly may be related to #1).

This is a known problem and one that we're working on right now:

6413847 vdev label write failure should be handled more gracefully

In your case we are trying to update the label to indicate that the pool has been destroyed, and this results in a label write failure and thus the panic.

Thanks,
George
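P.S. If you do end up attaching debugging data to 6413847, the usual starting point is the crash dump from the panic. Roughly (just the stock savecore/mdb steps, assuming dump saving is enabled via dumpadm; nothing here is specific to this particular assertion):

# make sure the dump from the panic was extracted from the dump device
# (by default it lands in /var/crash/<hostname> as unix.N + vmcore.N)
savecore

# open the saved dump in the kernel debugger
cd /var/crash/b60
mdb unix.0 vmcore.0

# useful dcmds once inside mdb:
#   ::status   - panic string and dump details
#   ::stack    - stack trace of the panicking thread
#   ::msgbuf   - console messages leading up to the panic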