Hi,

I have a zpool made of 2 mirror vdevs, with the disks connected via a USB hub.

While one vdev was resilvering at 22% (after an HD replacement), the original disk went away (the USB hub seems to be the culprit). I turned the disk off and back on. The status of the disk came back to ONLINE, but there is no resilvering happening. The disks are cool and idle.

Any clues as to what could be happening here? Should I unplug and re-plug the new disk again?

I can't check what state the data is in, because it was being used by a non-global zone which is failing to start, but that's another problem:

# zoneadm -z ZONE boot
could not verify fs /data: could not access /tank/data: No such file or directory
zoneadm: zone ZONE failed to verify

justin
If you say 'zpool online <pool> <disk>', that should tell ZFS that the disk is healthy again and automatically kick off a resilver. Of course, that should have happened automatically. What version of ZFS / Solaris are you running?

Jeff

On Fri, Jun 20, 2008 at 06:01:25PM +0200, Justin Vassallo wrote:
> While one vdev was resilvering at 22% (HD replacement), the original disk
> went away [...] The status of the disk came back to ONLINE, but there is no
> resilvering happening.
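(For reference, the command Jeff describes spelled out with the pool and device names that appear later in this thread; they are illustrative, not necessarily yours.)

# zpool online external c13t0d0p0    <- tell ZFS the flapped disk is healthy again; a resilver should start
# zpool status external              <- "resilver in progress" should show up on the "scrub:" line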
>>>>> "jb" == Jeff Bonwick <Jeff.Bonwick at sun.com> writes:jb> If you say ''zpool online <pool> <disk>'' that should tell ZFS jb> that the disk is healthy again and automatically kick off a jb> resilver. jb> Of course, that should have happened automatically. with b71 I find that it does sometimes happen automatically, but the resilver isn''t enough to avoid checksum errors later. Only a manually-requested scrub will stop any more checksum errors from accumulating. Also, if I reboot before one of these auto-resilvers finishes, or plug in the component that flapped while powered down, the auto-resilver never resumes. >> While one vdev was resilvering at 22% (HD replacement), the >> original disk went away so if I understand you, it happened like this: #1 #2 online online t online UNPLUG i online UNPLUG <-- filesystem writes m online UNPLUG <-- filesystem writes e online online | online resilver -> online v UNPLUG xxx online --> fs reads allowed? how? online online why no resilvering? It seems to me like DTRT after #1 is unplugged is to take the whole pool UNAVAIL until the original disk #1 comes back. When the original disk #1 drops off, the only available component left is the #2 component that flapped earlier and is being resilvered, so #2 is out-of-date and should be ignored. but I''m pretty sure ZFS doesn''t work that way, right? What does it do? Will it serve incorrect, old data? Will it somehow return I/O errors for data that has changed on #1 and not been resilvered onto #2 yet? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20080620/52408513/attachment.bin>
I am running zfs 3 on SunOS zen 5.10 Generic_118855-33 i86pc i386 i86pc.

What is baffling is that the disk did come online and appear as healthy, but zpool showed the fs inconsistency. As Miles said, after the disk came back the resilver did not resume.

The only additions I have to the sequence shown are:

1) I am absolutely sure there were no disk writes in the interim, since the non-global zones which use these filesystems were halted during the operation.

2) The first time I unplugged the disk, I was upgrading to a larger disk, so I still have that original disk intact.

3) I was afraid that zfs might resilver backwards, i.e. from the 22% image back onto the original copy. I therefore pulled the new disk out again.

Current status:

# zpool status
  pool: external
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed with 0 errors on Sat Jun 21 07:42:03 2008
config:

        NAME           STATE     READ WRITE CKSUM
        external       ONLINE    26.57   114     0
          c12t0d0p0    ONLINE        4   114     0
          mirror       ONLINE    26.57     0     0
            c13t0d0p0  ONLINE    55.25 4.48K     0
            c16t0d0p0  ONLINE        0     0 53.14

Can I be sure that the unrecoverable error found is on the failed mirror?

I was thinking of the following ways forward. Any comments most welcome:

1) Run a scrub. I am thinking that kicking this off might actually corrupt data in the second vdev, so maybe starting off with 2 might be a better idea...

2) Physically replace disk1 with the ORIGINAL disk2 and attempt a scrub.

justin
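(Roughly what those options translate to on the command line; the pool and device names come from the status output above, and this is only a sketch, not a recommendation:)

# zpool scrub external               <- option 1: verify both sides and repair from whichever copy checksums clean
# zpool replace external c16t0d0p0   <- option 2-style: after physically swapping a disk in at the same device path, rebuild onto it
# zpool clear external               <- in either case, reset the READ/WRITE/CKSUM counters once the pool is consistent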
To add: zpool status -xv posted earlier ends with:

errors: No known data errors

# fmadm faulty
   STATE RESOURCE / UUID
-------- ----------------------------------------------------------------------
degraded zfs://pool=external
         cbc49380-8ebc-cf10-a8c5-fcaa0c984117
-------- ----------------------------------------------------------------------
degraded zfs://pool=external/vdev=3cfe5f6abcb5007b
         ad451007-4c3c-ee12-a9b4-fe2ad1156ef7
-------- ----------------
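(If those fmadm entries stay "degraded" after the pool itself is healthy again, they can usually be retired by marking the fault repaired; the exact subcommand varies by release, so treat this as a sketch and check fmadm(1M) first:)

# fmadm repair cbc49380-8ebc-cf10-a8c5-fcaa0c984117    <- mark the pool-level fault repaired (UUID taken from the output above)
# fmadm faulty                                         <- should show nothing once both entries are cleared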
I scrubbed the pool and it completed with no errors. I then cleared the pool and re-attached the new disk to the faulted mirror. This time, resilvering started nicely and the pool is now ONLINE and displaying no errors. So that's done.

However, the fs is still unusable by the zone:

# zfs list
NAME              USED  AVAIL  REFER  MOUNTPOINT
external          449G   427G  27.4K  /external
external/backup   447G   427G   374G  /external/backup
# zoneadm -z anzan boot
could not verify fs /backup: could not access //external/backup: No such file or directory
zoneadm: zone anzan failed to verify

Why is that, when my pool is healthy?

justin
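(The recovery sequence described above, spelled out; the exact attach invocation is a guess based on the device names in the earlier status output:)

# zpool scrub external                        <- completed with no errors
# zpool clear external                        <- reset the error counters
# zpool attach external c13t0d0p0 c16t0d0p0   <- re-attach the new disk to the surviving side of the mirror; starts a resilver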
On Tue, 24 Jun 2008, Justin Vassallo wrote:

> # zfs list
> NAME              USED  AVAIL  REFER  MOUNTPOINT
> external          449G   427G  27.4K  /external
> external/backup   447G   427G   374G  /external/backup
> # zoneadm -z anzan boot
> could not verify fs /backup: could not access //external/backup: No such
> file or directory
> zoneadm: zone anzan failed to verify

What does 'zoneadm list -cp' show?  I suspect the double / is causing problems.

Regards,
markm
# zoneadm list -cp
0:global:running:/
-:anzan:installed:/zones/anzan

Is that of any help?

justin
Problem solved. I did a zfs mount followed by a zfs unmount, and then the zone booted fine.

Thanks to William from zones-discuss and Mark Musante, both from Sun.

The more I work with zfs, the more confidence I get in it.

justin
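(The mount/unmount dance presumably looked something like the following, with the dataset name from earlier in the thread; it may equally have been a plain 'zfs mount -a':)

# zfs mount external/backup      <- mount the dataset in the global zone so its mountpoint gets created
# zfs unmount external/backup    <- unmount it again so the zone can take it over cleanly at boot
# zoneadm -z anzan boot          <- zone now verifies and boots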
Addendum:

The fs was mounting on the wrong fs within the zone, and mounting a ufs fs on the intended mount point. To fix that, I used zonecfg to remove the fs and re-added it as a zfs dataset. Once done, I changed the zfs mountpoint from within the zone.

So:

zonecfg:anzan> remove fs dir=/backup
zonecfg:anzan> add dataset
zonecfg:anzan:dataset> set name=external/backup
zonecfg:anzan:dataset> end
zonecfg:anzan> exit

# zoneadm -z anzan boot
# zlogin anzan
[Connected to zone 'anzan' pts/2]
Last login: Wed Jun 25 07:22:48 on pts/2
Welcome to anzan, a zen name meaning Quiet Mountain, Peaceful Mountain
root@anzan:/# zfs set mountpoint=/backup external/backup

Back to working condition.

justin
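(Not shown in the thread, but a quick way to double-check the delegation afterwards; treat the exact invocations as a sketch:)

# zonecfg -z anzan info dataset          <- from the global zone: confirm external/backup is listed as a delegated dataset
# zlogin anzan zfs list external/backup  <- from inside the zone: the dataset should appear, mounted at /backup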