I've got a Thumper running snv_57 and a large ZFS pool. I recently noticed a drive throwing some read errors, so I did the right thing and zfs replaced it with a spare. Everything went well, but the resilvering process seems to be taking an eternity:

# zpool status
  pool: bigpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress, 4.66% done, 12h16m to go
config:

        NAME          STATE     READ WRITE CKSUM
        bigpool       ONLINE       0     0     0
          raidz2      ONLINE       0     0     0
            c4t4d0    ONLINE       0     0     0
            c7t4d0    ONLINE       0     0     0
            c6t4d0    ONLINE       0     0     0
            c1t4d0    ONLINE       0     0     0
            c0t4d0    ONLINE       0     0     0
            c4t0d0    ONLINE       0     0     0
            c7t0d0    ONLINE       0     0     0
            c6t0d0    ONLINE       0     0     0
            c1t0d0    ONLINE       0     0     0
            c0t0d0    ONLINE       0     0     0
          raidz2      ONLINE       0     0     0
            c5t5d0    ONLINE       0     0     0
            c4t5d0    ONLINE       0     0     0
            c7t5d0    ONLINE       0     0     0
            c6t5d0    ONLINE       0     0     0
            c1t5d0    ONLINE       0     0     0
            c0t5d0    ONLINE       0     0     0
** Here's the resilver
            spare     ONLINE       0     0     0
              c4t1d0  ONLINE      18     0     0
              c5t1d0  ONLINE       0     0     0
***
            c7t1d0    ONLINE       0     0     0
            c6t1d0    ONLINE       0     0     0
            c1t1d0    ONLINE       0     0     0
            c0t1d0    ONLINE       0     0     0
          raidz2      ONLINE       0     0     0
            c5t6d0    ONLINE       0     0     0
            c4t6d0    ONLINE       0     0     0
            c7t6d0    ONLINE       0     0     0
            c6t6d0    ONLINE       0     0     0
            c1t6d0    ONLINE       0     0     0
            c0t6d0    ONLINE       0     0     0
            c4t2d0    ONLINE       0     0     0
            c7t2d0    ONLINE       0     0     0
            c6t2d0    ONLINE       0     0     0
            c1t2d0    ONLINE       0     0     0
            c0t2d0    ONLINE       0     0     0
          raidz2      ONLINE       0     0     0
            c5t7d0    ONLINE       0     0     0
            c4t7d0    ONLINE       0     0     0
            c7t7d0    ONLINE       0     0     0
            c6t7d0    ONLINE       0     0     0
            c1t7d0    ONLINE       0     0     0
            c0t7d0    ONLINE       0     0     0
            c5t3d0    ONLINE       0     0     0
            c4t3d0    ONLINE       0     0     0
            c7t3d0    ONLINE       0     0     0
            c6t3d0    ONLINE       0     0     0
            c1t3d0    ONLINE       0     0     0
            c0t3d0    ONLINE       0     0     0
        spares
          c5t1d0      INUSE     currently in use
          c5t2d0      AVAIL

Looks just fine, except it's been running for 3 days already! These are 500 GB drives. Should I have removed the bad drive and just replaced it, rather than trying to swap in a spare? Is there some sort of contention issue because the spare and the original drive are both still up? Not sure what to think here...

--
This message posted from opensolaris.org
On 19 June, 2009 - Joe Kearney sent me these 3,8K bytes:

> I've got a Thumper running snv_57 and a large ZFS pool. I recently
> noticed a drive throwing some read errors, so I did the right thing
> and zfs replaced it with a spare.

Are you taking snapshots periodically? If so, you're using a build old enough to restart resilver/scrub whenever a snapshot is taken.

There has also been some bug where 'zpool status' as root restarts resilver/scrub as well. Try as non-root.

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Richard Elling
2009-Jun-20 15:42 UTC
[zfs-discuss] x4500 resilvering spare taking forever?
Also, b57 is about 2 years old and misses the improvements in performance, especially in scrub performance.
-- richard

Tomas Ögren wrote:
> On 19 June, 2009 - Joe Kearney sent me these 3,8K bytes:
>
>> I've got a Thumper running snv_57 and a large ZFS pool. I recently
>> noticed a drive throwing some read errors, so I did the right thing
>> and zfs replaced it with a spare.
>
> Are you taking snapshots periodically? If so, you're using a build old
> enough to restart resilver/scrub whenever a snapshot is taken.
>
> There has also been some bug where 'zpool status' as root restarts
> resilver/scrub as well. Try as non-root.
>
> /Tomas
UPDATE: It's now back down to 0.9% complete. Does anyone have a clue as to what's happening here, or where I can look for problems?
> Are you taking snapshots periodically? If so, you're using a build old
> enough to restart resilver/scrub whenever a snapshot is taken.

Actually, yes: I take snapshots of various things once an hour. I'll try disabling them for the time being and see how far along it gets.

Thanks!
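The figures in the status output at the top of the thread already hint at a recent restart. As a rough sketch, assuming the "to go" estimate is a simple linear extrapolation from the start of the current pass (an assumption about how this build computes it, not something stated in the thread), 4.66% done with 12h16m to go implies the pass had been running for well under an hour rather than 3 days, which would fit hourly snapshots. The function names below are hypothetical.

```python
import re


def parse_eta_minutes(eta: str) -> int:
    """Convert an ETA string such as '12h16m' into minutes."""
    match = re.fullmatch(r"(\d+)h(\d+)m", eta)
    if match is None:
        raise ValueError(f"unrecognized ETA format: {eta!r}")
    hours, minutes = map(int, match.groups())
    return hours * 60 + minutes


def minutes_since_restart(percent_done: float, eta: str) -> float:
    """Estimate how long the current resilver pass has been running.

    ASSUMPTION: the ETA is a linear extrapolation, i.e.
    remaining = elapsed * (1 - f) / f, hence elapsed = remaining * f / (1 - f).
    """
    if not 0 < percent_done < 100:
        raise ValueError("percent_done must be strictly between 0 and 100")
    fraction = percent_done / 100.0
    return parse_eta_minutes(eta) * fraction / (1.0 - fraction)


# Values taken from the 'scrub:' line in the original post.
elapsed = minutes_since_restart(4.66, "12h16m")
print(f"current pass has been running for about {elapsed:.0f} minutes")
```

With the values from the original post this comes out at roughly 36 minutes. If the assumption about the ETA holds, that is strong evidence the resilver keeps starting over.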
> Also, b57 is about 2 years old and misses the improvements in
> performance, especially in scrub performance.

Yep, I know. I'll upgrade them at some point down the road, but they've been serving our needs nicely so far.

Thanks!
Richard Elling
2009-Jun-21 19:10 UTC
[zfs-discuss] x4500 resilvering spare taking forever?
Joe Kearney wrote:
>> Also, b57 is about 2 years old and misses the improvements in
>> performance, especially in scrub performance.
>
> Yep, I know. I'll upgrade them at some point down the road, but they've
> been serving our needs nicely so far.

Yep, it also suffers from the bug that restarts resilvers when you take a snapshot. This was fixed in b94.
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6343667
-- richard
Andrew Gabriel
2009-Jun-21 22:49 UTC
[zfs-discuss] x4500 resilvering spare taking forever?
Joe Kearney wrote:
> UPDATE: It's now back down to 0.9% complete. Does anyone have a clue as
> to what's happening here or where I can look for problems?

There was also a bug which restarted resilvers each time you issued a zpool status command as a privileged user. Make sure to check the progress by issuing zpool status commands as a non-privileged user.

--
Andrew
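One way to confirm that a resilver is restarting, rather than just running slowly, is to record the percentage from the "scrub:" line each time it is checked and look for drops. The sketch below is only an illustration: the function names are hypothetical, the regular expression matches the line format shown in the original post and may not match other builds, and the status text is assumed to have been captured as a non-privileged user as advised above.

```python
import re
from typing import List, Optional

# Matches e.g. "scrub: resilver in progress, 4.66% done, 12h16m to go"
SCRUB_LINE = re.compile(r"scrub:\s+resilver in progress,\s+([\d.]+)% done")


def resilver_percent(status_text: str) -> Optional[float]:
    """Return the resilver percentage from 'zpool status' text, or None."""
    match = SCRUB_LINE.search(status_text)
    return float(match.group(1)) if match else None


def find_restarts(samples: List[float]) -> List[int]:
    """Return the indexes of samples that are lower than the one before."""
    return [i for i in range(1, len(samples)) if samples[i] < samples[i - 1]]


# 4.66 and 0.9 come from the thread; the other samples are made up.
history = [1.2, 3.0, 4.66, 0.9, 2.1]
print(find_restarts(history))
```

Any index reported by find_restarts marks a check where progress went backwards, as happened between the original post (4.66%) and the update (0.9%).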
> Yep, it also suffers from the bug that restarts resilvers when you
> take a snapshot. This was fixed in b94.
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6343667
> -- richard

Hats off to Richard for saving the day. This was exactly the issue. I shut off my automatic snapshots, and 3 days later my resilver is done.

Joe
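For anyone stuck on a build older than b94, the same workaround could be automated instead of disabling snapshots by hand. The following is a minimal sketch, not a tested tool: the function names are hypothetical, and it assumes the snapshot job can obtain 'zpool status' text (captured as a non-privileged user) and that a scrub in progress is reported on the same "scrub:" line as the resilver shown in this thread.

```python
def scan_in_progress(status_text: str) -> bool:
    """Return True if any 'scrub:' line reports a resilver or scrub in progress.

    ASSUMPTION: the line looks like the one in this thread, e.g.
    "scrub: resilver in progress, 4.66% done, 12h16m to go".
    """
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("scrub:") and "in progress" in stripped:
            return True
    return False


def should_take_snapshot(status_text: str) -> bool:
    """Skip snapshots while a resilver or scrub is running."""
    return not scan_in_progress(status_text)


busy = " scrub: resilver in progress, 4.66% done, 12h16m to go\n"
idle = " scrub: none requested\n"
print(should_take_snapshot(busy), should_take_snapshot(idle))
```

A periodic snapshot job could call should_take_snapshot first and simply skip that run when it returns False, so the resilver is left alone until it finishes.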