Dear List,

I am struggling with a storage pool on a server, where I would like to offline a device for replacement. The pool consists of two-disk stripes set up in mirrors (yep, stupid, but we were running out of VDs on the controller at the time, and that's where we are now...).

Here's the pool config:

root at storage:~# zpool status -v tank
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
  scan: resilvered 321G in 36h58m with 1 errors on Wed Apr 4 06:46:10 2012
config:

        NAME           STATE     READ WRITE CKSUM
        tank           ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            c1t14d0    ONLINE       0     0     0
            c1t15d0    ONLINE       0     0     0
          mirror-1     ONLINE       0     0     0
            c1t19d0    ONLINE       0     0     0
            c1t18d0    ONLINE       0     0     0
          mirror-2     ONLINE       0     0     0
            c1t20d0    ONLINE       0     0     0
            c1t21d0    ONLINE       0     0     0
          mirror-3     ONLINE       0     0     0
            c1t22d0    ONLINE       0     0     0
            c1t23d0    ONLINE       0     0     0
        logs
          mirror-4     ONLINE       0     0     0
            c2t2d0p7   ONLINE       0     0     0
            c2t3d0p7   ONLINE       0     0     0
        cache
          c2t2d0p11    ONLINE       0     0     0
          c2t3d0p11    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <0xeb78a>:<0xa8be6b>

What I would like to do is offline or detach c1t19d0, which the server won't let me do:

root at storage:~# zpool offline tank c1t19d0
cannot offline c1t19d0: no valid replicas

The errored file above is not important to me; it was part of a snapshot since deleted. Could that be related to this?

How can I find more information about why it simultaneously seems to think mirror-1 above is ok and broken?

Any ideas would be greatly appreciated. Thanks in advance for your kind assistance.

Best regards
Jan
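One possible place to look for more device-level detail than zpool status gives is the fault management framework; the commands below are a minimal sketch using standard FMA tooling on OpenIndiana, assumed here rather than taken from the thread:

    fmadm faulty        # list resources FMA currently considers faulted
    fmdump              # summary of logged fault events
    fmdump -eV | less   # full error-report telemetry (retries, checksum
                        # errors, transport errors, etc.)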
On Apr 4, 2012, at 12:08 PM, Jan-Aage Frydenbø-Bruvoll wrote:

> Dear List,
>
> I am struggling with a storage pool on a server, where I would like to offline a device for replacement. The pool consists of two-disk stripes set up in mirrors (yep, stupid, but we were running out of VDs on the controller at the time, and that's where we are now...).

Which OS and release? There was a bug in some releases circa 2010 that you might be hitting. It is harmless, but annoying.
 -- richard
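For reference, one way to gather the release and pool-version details being asked for (a sketch; the pool name "tank" is taken from the status output above):

    cat /etc/release         # OS distribution and build, e.g. OpenIndiana oi_148
    uname -v                 # kernel build string
    zpool get version tank   # on-disk version of the pool itself
    zpool upgrade -v         # highest pool version this release supports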
Hi Richard,

Thanks for your reply.

>> I am struggling with a storage pool on a server, where I would like to offline a device for replacement.
>> The pool consists of two-disk stripes set up in mirrors (yep, stupid, but we were
>> running out of VDs on the controller at the time, and that's where we are now...).
>
> Which OS and release?

This is OpenIndiana oi_148, ZFS pool version 28.

> There was a bug in some releases circa 2010 that you might be hitting. It is
> harmless, but annoying.

Ok - what bug is this, how do I verify whether I am facing it here and what remedies are there?

Best regards
Jan
2012-04-04 23:27, Jan-Aage Frydenbø-Bruvoll wrote:
>> Which OS and release?
>
> This is OpenIndiana oi_148, ZFS pool version 28.
>
>> There was a bug in some releases circa 2010 that you might be hitting. It is
>> harmless, but annoying.
>
> Ok - what bug is this, how do I verify whether I am facing it here and what remedies are there?

Well, if this machine can afford some downtime, you can try to boot it from an oi_151a (or later, if available by the time you try) LiveCD/LiveUSB media, import this pool, and make your offlining attempt there.

However, before that test you should wait a few minutes after the import and watch "zpool iostat 10" output - in my recent experience, pools with recent errors that have since been cleared (like your broken file from a deleted snapshot) began some housekeeping after import, i.e. releasing the deferred-deleted blocks. It is possible that the imported pool would cleanse itself, and that might not be the moment where you want to interfere with offlining - just in case. If the pool is silent, then go on.

You could also use zdb to print out pool usage statistics (i.e. how many blocks are on the deferred-delete list), but IIRC on my test pools gathering those stats took zdb at least 40 minutes.

HTH,
//Jim
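A minimal sketch of those two checks, assuming the pool keeps the name "tank" when imported in the live environment (the zdb traversal reads the whole pool and can take a long time):

    zpool import tank      # import from the LiveCD/LiveUSB environment
    zpool iostat tank 10   # watch for ongoing background I/O after import;
                           # a "silent" pool shows essentially no writes
    zdb -bb tank           # traverse all blocks and print per-type usage
                           # statistics (slow on large pools)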
2012-04-05 16:04, Jim Klimov wrote:
> 2012-04-04 23:27, Jan-Aage Frydenbø-Bruvoll wrote:
>>> Which OS and release?
>>
>> This is OpenIndiana oi_148, ZFS pool version 28.
>>
>>> There was a bug in some releases circa 2010 that you might be
>>> hitting. It is
>>> harmless, but annoying.
>>
>> Ok - what bug is this, how do I verify whether I am facing it here and
>> what remedies are there?
>
> Well, if this machine can afford some downtime, you can try
> to boot it from an oi_151a (or later if available by the time
> you try) LiveCD/LiveUSB media, and import this pool, and do
> your offlining attempt.

For the sake of completeness, your other options are:

* run "pkg update" or similar to upgrade your installation to the current published baseline (and reboot into it), but that will likely take more time and traffic to test than downloading a LiveCD;

* try "zpool clear" and "zpool scrub" again in order to, apparently, force processing of deferred-deleted blocks such as the one left over from the corrupt snapshot (see the sketch below). It is possible that your current oi_148 might clean it up or not, but it is more likely to allow offlining a disk when there are no recorded errors...

HTH,
//Jim
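A sketch of that second option, using the pool and device names from the original report (scrub duration depends on pool size; let it finish before retrying):

    zpool clear tank             # clear the recorded error counts and error list
    zpool scrub tank             # re-read and verify everything in the pool
    zpool status -v tank         # check scrub progress and whether errors remain
    zpool offline tank c1t19d0   # retry the offline once no errors are reported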
On 04/ 5/12 07:08 AM, Jan-Aage Frydenbø-Bruvoll wrote:
> Dear List,
>
> I am struggling with a storage pool on a server, where I would like to offline a device for replacement. The pool consists of two-disk stripes set up in mirrors (yep, stupid, but we were running out of VDs on the controller at the time, and that's where we are now...).

Why is that configuration stupid? Your pool is a stripe of two-disk mirrors, not two-disk stripes.

--
Ian.
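For illustration, the layout shown in the status output corresponds to a pool created roughly like this (a sketch only, using the device names from the original output):

    # a stripe across four two-disk mirrors: ZFS stripes writes across the
    # top-level mirror vdevs, and each disk has a mirror partner to resilver from
    zpool create tank mirror c1t14d0 c1t15d0 \
                      mirror c1t19d0 c1t18d0 \
                      mirror c1t20d0 c1t21d0 \
                      mirror c1t22d0 c1t23d0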