LIC mesh
2010-Sep-22 20:46 UTC
[zfs-discuss] scrub: resilver in progress for 0h38m, 0.00% done, 1131207h51m to go
What options are there to turn off or reduce the priority of a resilver?

This is on a 400TB iSCSI-based zpool (8 LUNs per raidz2 vdev, 4 LUNs per shelf, 6 drives per LUN - 16 shelves total). My client has gotten to the point that they just want to get their data off, but this resilver won't stop.

Case in point:

$ dd if=/dev/zero of=out bs=64k count=20;
(Ctrl-C'd out)
6+0 records in
6+0 records out
393216 bytes (393 kB) copied, 97.6778 s, 4.0 kB/s

This is /after/ an upgrade to build 134 on the head, hoping that zpool import's recovery mode would not resilver.

Auto-snapshots are off, zfs_vdev_max_pending has been tuned down to 7, zfs_scrub_limit is set at 2, and zpool status is never run as root. Also, the filesystem has not been touched by anyone other than myself (including applications) in 2 months.

If we could flip a switch to turn this thing into "read-only mode" (it's got the redundancy for it), that would totally fix this.

I have been referred to http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6494473 - is this stable enough to build?

Are there any other options to speed up "getting-data-off-of-this-monstrously-huge-filesystem"?
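For reference, the two tunables named above (zfs_vdev_max_pending and zfs_scrub_limit) are normally read and adjusted live with mdb. The sketch below is only that - a sketch - and assumes those variable names exist in the running build 134 kernel and that the commands are run as root on the head node; mdb writes take effect immediately but do not survive a reboot.

# Read the current values (displayed in decimal):
echo "zfs_vdev_max_pending/D" | mdb -k
echo "zfs_scrub_limit/D" | mdb -k

# Lower them on the live kernel ("0t" prefix means decimal):
echo "zfs_vdev_max_pending/W0t2" | mdb -kw
echo "zfs_scrub_limit/W0t1" | mdb -kw

# To make a setting persist across reboots, add a line like this
# to /etc/system and reboot:
#   set zfs:zfs_vdev_max_pending = 2

Whether lowering these actually slows an in-flight resilver on this build is not guaranteed; they only limit queued I/Os per vdev, which is why the bug referenced above (a proper resilver throttle) is interesting.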
Richard Elling
2010-Sep-23 00:13 UTC
[zfs-discuss] scrub: resilver in progress for 0h38m, 0.00% done, 1131207h51m to go
On Sep 22, 2010, at 1:46 PM, LIC mesh wrote:
> What options are there to turn off or reduce the priority of a resilver?
>
> This is on a 400TB iSCSI-based zpool (8 LUNs per raidz2 vdev, 4 LUNs per shelf, 6 drives per LUN - 16 shelves total). My client has gotten to the point that they just want to get their data off, but this resilver won't stop.

For a healthy system, resilver is barely noticeable.

> Case in point:
> $ dd if=/dev/zero of=out bs=64k count=20;
> (Ctrl-C'd out)
> 6+0 records in
> 6+0 records out
> 393216 bytes (393 kB) copied, 97.6778 s, 4.0 kB/s
>
> This is /after/ an upgrade to build 134 on the head, hoping that zpool import's recovery mode would not resilver.

Something else is probably causing the slow I/O. What is the output of
"iostat -en"? The best answer is "all balls" (balls == zeros).

> Auto-snapshots are off, zfs_vdev_max_pending has been tuned down to 7, zfs_scrub_limit is set at 2, and zpool status is never run as root. Also, the filesystem has not been touched by anyone other than myself (including applications) in 2 months.

For SATA drives, we find that zfs_vdev_max_pending = 2 can be needed in certain recovery cases.

> If we could flip a switch to turn this thing into "read-only mode" (it's got the redundancy for it), that would totally fix this.

I think you will find that the solution lies elsewhere.

> I have been referred to http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6494473 - is this stable enough to build?
>
> Are there any other options to speed up "getting-data-off-of-this-monstrously-huge-filesystem"?

Yes. But some are not inexpensive.
 -- richard

--
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com

Richard Elling
richard at nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com
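A quick sketch of the check being suggested here. The interesting part of the "iostat -en" output is the error columns (s/w, h/w, trn, tot), which should all be zero on a healthy system; the device name in the second command is hypothetical.

# Per-device error counters since boot. Non-zero hard (h/w) or
# transport (trn) counts point at a sick LUN or a flaky iSCSI path
# rather than at the resilver itself.
iostat -en

# Extended error/identity detail for one suspect device
# (hypothetical device name):
iostat -En c3t0d0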
LIC mesh
2010-Sep-23 14:51 UTC
[zfs-discuss] scrub: resilver in progress for 0h38m, 0.00% done, 1131207h51m to go
On Wed, Sep 22, 2010 at 8:13 PM, Richard Elling <Richard at nexenta.com> wrote:
> On Sep 22, 2010, at 1:46 PM, LIC mesh wrote:
>
> Something else is probably causing the slow I/O. What is the output of
> "iostat -en"? The best answer is "all balls" (balls == zeros).

Found a number of LUNs with errors this way. It looks like it has more to do with network problems than with the hardware, so we're going to try turning off LACP and using just one NIC.

> For SATA drives, we find that zfs_vdev_max_pending = 2 can be needed in
> certain recovery cases.

We've played around with this on the individual shelves (it was originally set at 1 for quite a long time), but we left the head at the default for build 134.

> Yes. But some are not inexpensive.
> -- richard

What price range would we be looking at?

- Michael
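A minimal sketch of how the LACP theory might be checked before dropping to a single NIC. The aggregation and link names below are hypothetical, and the dladm syntax assumes the post-Clearview dladm shipped in builds of this era.

# Aggregation membership, LACP mode, and per-port state:
dladm show-aggr
dladm show-aggr -L

# Link-level error counters; one member port with climbing
# ierrors/oerrors can show up as intermittent iSCSI stalls:
netstat -i
kstat -p -c net | grep -i error

# To fall back to one NIC, remove the other member port(s) from the
# aggregation (hypothetical names):
#   dladm remove-aggr -l e1000g1 aggr0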